High-dimensional learning of narrow neural networks
CMSA EVENTS: CMSA MEMBER SEMINAR
This talk explores the interplay between neural network architectures and data structure through the lens of high-dimensional asymptotics. We focus on a class of narrow neural networks, namely networks with a finite number of hidden units, operating in high dimensions. In the limit where the data dimension and the number of samples are large and comparable, we derive a tight asymptotic characterization of the learning of these architectures. As an illustration, we discuss how this characterization enables the analysis of a solvable model of dot-product attention. We show how the latter can learn to implement either a positional attention mechanism (with tokens attending to each other based on their respective positions) or a semantic attention mechanism (with tokens attending to each other based on their meaning), and we exhibit a phase transition from positional to semantic learning as the sample complexity grows.
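To make the positional/semantic distinction concrete, the following minimal numpy sketch contrasts the two mechanisms in a single dot-product attention head. It is an illustrative toy, not the solvable model analyzed in the talk: the orthogonal positional and semantic subspaces and the hand-set query-key matrices W_pos and W_sem are assumptions chosen purely for illustration.

import numpy as np

rng = np.random.default_rng(0)
L, d = 4, 8  # sequence length; embedding dim (positional + semantic halves)

# Toy token embeddings living in two orthogonal subspaces (an assumption
# for illustration): the first L coordinates encode position (one-hot),
# the last d - L encode meaning (random semantic vectors).
pos = np.hstack([np.eye(L), np.zeros((L, d - L))])
sem = np.hstack([np.zeros((L, L)), rng.standard_normal((L, d - L))])
X = pos + sem

def attention(X, W):
    # Row-wise softmax of the dot-product score matrix X W X^T / sqrt(d).
    s = X @ W @ X.T / np.sqrt(X.shape[1])
    e = np.exp(s - s.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

# A query-key matrix supported on the positional subspace: tokens attend
# according to where they are (here, each mostly to its own position).
W_pos = np.zeros((d, d))
W_pos[:L, :L] = 3.0 * np.eye(L)

# A query-key matrix supported on the semantic subspace: tokens attend
# according to the similarity of their (random) semantic vectors.
W_sem = np.zeros((d, d))
W_sem[L:, L:] = 3.0 * np.eye(d - L)

print("positional mechanism:\n", attention(X, W_pos).round(2))
print("semantic mechanism:\n", attention(X, W_sem).round(2))

In this toy setting, which mechanism is realized is decided entirely by which subspace the query-key matrix picks out; the talk's result concerns how training selects between the two as the number of samples grows.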
Zoom ID 965 2902 1352
Passcode 322891
https://harvard.zoom.us/j/96529021352?pwd=ehXEylANVrstFfISgNJhjaPwcIuCby.1