Surya Ganguli - Dept. of Applied Physics, Stanford University
Statistical mechanics and neural network theory have long enjoyed fruitful interactions. We will review some of our recent work in this area and then focus on two vignettes. First, we will analyze the high dimensional geometry of neural network error landscapes that arise as the classical limit of a dissipative many-body quantum optimizer. In particular, we will use the Kac-Rice formula and the replica method to calculate the number, locations, energy levels, and Hessian eigenspectra of all critical points of every index. Second, we will review recent work on neural scaling laws, which show that the test error of many neural networks falls off as a power law in network size or dataset size. Such power laws have motivated significant societal investments in large scale model training and data collection efforts. Inspired by statistical mechanics calculations, we show, both in theory and in practice, how to beat power law scaling with respect to dataset size, sometimes achieving exponential scaling, by collecting small, carefully curated datasets rather than large random ones.
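To make the scaling comparison in the abstract concrete, here is a minimal illustrative sketch (not code from the talk or the cited papers): it contrasts power law error scaling E(n) = a * n^(-b), typical of training on large random datasets, with exponential scaling E(n) = a * exp(-n / n0), the best-case behavior described for carefully curated data. The constants a, b, and n0 are made-up for illustration, and the exponent-recovery trick simply uses the fact that a power law is a straight line on log-log axes.

```python
import math

def power_law_error(n: float, a: float = 1.0, b: float = 0.5) -> float:
    # Hypothetical power-law scaling: error falls as a * n**(-b).
    return a * n ** (-b)

def exponential_error(n: float, a: float = 1.0, n0: float = 500.0) -> float:
    # Hypothetical exponential scaling: error falls as a * exp(-n / n0).
    return a * math.exp(-n / n0)

def fit_power_law_exponent(n1: float, e1: float, n2: float, e2: float) -> float:
    # On log-log axes a power law is a straight line with slope -b,
    # so two (dataset size, error) points suffice to recover b.
    return -(math.log(e2) - math.log(e1)) / (math.log(n2) - math.log(n1))

if __name__ == "__main__":
    for n in (100, 1_000, 10_000):
        print(f"n={n:>6}: power-law error={power_law_error(n):.4f}, "
              f"exponential error={exponential_error(n):.2e}")
    b_hat = fit_power_law_exponent(100, power_law_error(100),
                                   10_000, power_law_error(10_000))
    print(f"recovered power-law exponent b = {b_hat:.2f}")
```

At large n the exponential curve drops far below the power law, which is why even a modest exponential regime from curated data can dominate the asymptotics of random data collection.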
Y. Bahri, J. Kadmon, J. Pennington, S. Schoenholz, J. Sohl-Dickstein, and S. Ganguli. Statistical Mechanics of Deep Learning. Annual Review of Condensed Matter Physics, 2020.
B. Sorscher, R. Geirhos, S. Shekhar, S. Ganguli, and A. S. Morcos. Beyond Neural Scaling Laws: Beating Power Law Scaling via Data Pruning. NeurIPS 2022. https://arxiv.org/abs/2206.14486