Do Transformers Use their Depth Adaptively? Evidence from a Relational Reasoning Task

This research investigates whether transformer models use their depth adaptively for relational reasoning, revealing distinct behaviors between pretraine...

Level: advanced

By Alicia Curth

Category: research