CMSA New Technologies in Mathematics: Self-induced regularization from linear regression to neural networks
Andrea Montanari - Departments of Electrical Engineering and Statistics, Stanford
Modern machine learning methods, most notably multi-layer neural networks, require fitting highly non-linear models comprising tens of thousands to millions of parameters. Yet little attention is paid to the regularization mechanisms that control the model's complexity. Indeed, the resulting models are often so complex that they achieve vanishing training error: they interpolate the data. Despite this, these models generalize well to unseen data: they have small test error. I will discuss several examples of this phenomenon, beginning with a simple linear regression model and ending with two-layer neural networks in the so-called lazy regime. For these examples, precise asymptotics can be determined mathematically using tools from random matrix theory. I will try to extract a unifying picture.
A common feature is that a complex, unregularized nonlinear model becomes essentially equivalent to a simpler model that is nonetheless regularized in a non-trivial way.
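As a rough illustration of what "interpolating the data" means in the simplest case mentioned above (linear regression with more parameters than samples), the following NumPy sketch may help. It is not taken from the talk or the papers: the sizes `n` and `d`, the noise level, the random data, and the helper names `min_norm_interpolator` and `ridge` are all assumptions chosen for illustration. It only checks two standard facts: the minimum-norm least-squares solution fits the training data exactly, and it coincides with ridge regression in the limit of vanishing penalty. It does not reproduce the asymptotic results discussed in the talk.

```python
import numpy as np


def min_norm_interpolator(X, y):
    """Minimum-l2-norm solution of X w = y (assumes more features than samples)."""
    return np.linalg.pinv(X) @ y


def ridge(X, y, lam):
    """Ridge regression solution, written in its dual (n x n) form."""
    n = X.shape[0]
    return X.T @ np.linalg.solve(X @ X.T + lam * np.eye(n), y)


# Hypothetical problem sizes: fewer samples (n) than parameters (d).
rng = np.random.default_rng(0)
n, d = 50, 200
X = rng.standard_normal((n, d))
w_true = rng.standard_normal(d) / np.sqrt(d)
y = X @ w_true + 0.1 * rng.standard_normal(n)  # noisy linear responses

# The unregularized (minimum-norm) fit interpolates the training data.
w_interp = min_norm_interpolator(X, y)
train_error = np.mean((X @ w_interp - y) ** 2)

# Ridge regression with a tiny penalty gives essentially the same weights.
w_ridge = ridge(X, y, lam=1e-8)
gap = np.linalg.norm(w_interp - w_ridge)

print(f"training error of interpolator: {train_error:.2e}")
print(f"distance to near-ridgeless ridge solution: {gap:.2e}")
```

The training error is zero up to floating-point rounding even though the responses are noisy, which is the interpolation regime the abstract refers to. Whether such a fit also has small test error depends on the data distribution and the ratio of `d` to `n`, which is the question the talk addresses.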
[Based on joint papers with: Behrooz Ghorbani, Song Mei, Theodor Misiakiewicz, Feng Ruan, Youngtak Sohn, Jun Yan, Yiqiao Zhong]