How Far Can Transformers Reason? The Globality Barrier and Inductive Scratchpad
CMSA EVENTS: CMSA NEW TECHNOLOGIES IN MATHEMATICS
This talk examines how large language models (LLMs) evolve from simple word prediction to complex skills, with a focus on mathematical problem solving. A major driver of AI products today is that new skills emerge in language models as their parameter counts and training corpora are scaled up. This phenomenon is poorly understood, and a mechanistic explanation via mathematical analysis of gradient-based training seems difficult. The first part of the talk focuses on analysing emergence using the famous (and empirical) Scaling Laws of LLMs. I then discuss how LLMs can verbalize these skills by assigning labels to problems and clustering them into interpretable categories. This metacognitive ability allows us to leverage skill-based prompting, significantly improving performance on mathematical reasoning. Finally, I present a framework that combines LLMs with human oversight to generate challenging, out-of-distribution math questions. This process led to the creation of the MATH^2 dataset, which enhances both model and human performance, driving further advances in mathematical reasoning capabilities.
In-person and on Zoom:
https://harvard.zoom.us/j/92220006185?pwd=V3mrb4cNSbgRXtNJtRJkTvWFVhmbI5.1
Password: cmsa
