CMSA New Technologies: How do Transformers reason? First principles via automata, semigroups, and circuits

December 7, 2022, 2:00 pm - 3:00 pm
Address: CMSA, 20 Garden Street, Cambridge, MA 02138 (Room G10)
Speaker: Cyril Zhang (Microsoft Research)


The current "Transformer era" of deep learning is marked by the emergence of combinatorial and algorithmic reasoning capabilities in large sequence models, leading to dramatic advances in natural language understanding, program synthesis, and theorem proving. What is the nature of these models' internal representations (i.e., how do they represent the states and computational steps of the algorithms they execute)? How can we understand and mitigate their weaknesses, given that they resist interpretation? In this work, we present some insights (and many further mysteries) through the lens of automata and their algebraic structure.

Specifically, we investigate the apparent mismatch between recurrent models of computation (automata & Turing machines) and Transformers (which are typically shallow and non-recurrent). Using tools from circuit complexity and semigroup theory, we characterize shortcut solutions, whereby a shallow Transformer with only o(T) layers can exactly replicate T computational steps of an automaton. We show that Transformers can efficiently represent these shortcuts in theory; furthermore, in synthetic experiments, standard training successfully finds these shortcuts. We demonstrate that shortcuts can lead to statistical brittleness, and discuss mitigations.
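To give a flavor of the shortcut idea (a generic illustration, not the construction presented in the talk): each input symbol of an automaton induces a map on its state set, these maps compose associatively, and all T prefix states can therefore be obtained by a parallel prefix scan using only O(log T) rounds of independent compositions. The Python sketch below makes this concrete for a two-state parity automaton; the automaton, function names, and scan variant are illustrative assumptions.

```python
# Minimal sketch: simulating T automaton steps with O(log T) parallel depth
# via a prefix scan over the semigroup of transition maps.
# The parity automaton and all names below are illustrative, not from the paper.

# Parity-of-ones automaton: states {0, 1}; a map is a tuple m with m[q] = image of q.
IDENTITY = (0, 1)   # symbol 0: keep the state
FLIP = (1, 0)       # symbol 1: toggle the state
TRANSITION = {0: IDENTITY, 1: FLIP}

def compose(f, g):
    """Return g after f (apply f first). Function composition is associative."""
    return tuple(g[f[q]] for q in range(len(f)))

def prefix_scan(maps):
    """Inclusive prefix compositions via log2(T) doubling rounds (Hillis-Steele scan).
    Within each round the compositions are independent of one another, so the whole
    computation has depth O(log T) rather than T sequential steps."""
    prefix = list(maps)
    offset = 1
    while offset < len(prefix):
        prefix = [
            prefix[i] if i < offset else compose(prefix[i - offset], prefix[i])
            for i in range(len(prefix))
        ]
        offset *= 2
    return prefix

def run(symbols, start=0):
    """State visited after each input symbol, computed through the shortcut scan."""
    maps = [TRANSITION[s] for s in symbols]
    return [m[start] for m in prefix_scan(maps)]

# Sanity check against the ordinary step-by-step simulation.
word = [1, 1, 0, 1, 0, 1, 1]
states, q = [], 0
for s in word:
    q = TRANSITION[s][q]
    states.append(q)
assert run(word) == states
```

Each doubling round consists only of compositions that can be carried out independently across positions, which is the kind of parallel structure a shallow, non-recurrent architecture could in principle exploit to reach o(T) depth.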

Joint work with Bingbin Liu, Jordan Ash, Surbhi Goel, and Akshay Krishnamurthy.


This seminar will be held in person and on Zoom. For more information on how to join, please see: https://live-hu-cmsa-222.pantheonsite.io/event_category/new-technologies-in-mathematics-seminar-series/