CMSA New Technologies in Mathematics: Toward Demystifying Transformers and Attention


February 9, 2022 2:00 pm - 3:00 pm
via Zoom Video Conferencing

Ben Edelman - Harvard Computer Science

Over the past several years, attention mechanisms (primarily in the form of the Transformer architecture) have revolutionized deep learning, leading to advances in natural language processing, computer vision, code synthesis, protein structure prediction, and beyond. Attention has a remarkable ability to enable the learning of long-range dependencies in diverse modalities of data. And yet, there is at present limited principled understanding of the reasons for its success. In this talk, I'll explain how attention mechanisms and Transformers work, and then I'll share the results of a preliminary investigation into why they work so well. In particular, I'll discuss an inductive bias of attention that we call sparse variable creation: bounded-norm Transformer layers are capable of representing sparse Boolean functions, with statistical generalization guarantees akin to sparse regression.
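The core attention computation the talk builds on can be sketched in a few lines. Below is a minimal, illustrative numpy implementation of scaled dot-product attention (the basic operation inside a Transformer layer); the function and variable names are our own choices for this sketch, not from the talk.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    # Scaled dot-product attention: each query attends to every key,
    # and the output is a convex combination of the value vectors.
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)      # (n_queries, n_keys) similarities
    weights = softmax(scores, axis=-1)   # each row sums to 1
    return weights @ V                   # (n_queries, d_v) mixed values

# Tiny example: 3 query positions attending over 4 key/value pairs.
rng = np.random.default_rng(0)
Q = rng.standard_normal((3, 8))
K = rng.standard_normal((4, 8))
V = rng.standard_normal((4, 8))
out = attention(Q, K, V)
print(out.shape)  # (3, 8)
```

Because the softmax weights form a probability distribution over positions, attention can concentrate on a few relevant inputs regardless of how far apart they are in the sequence, which is the intuition behind its ability to capture long-range dependencies.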