CMSA New Technologies in Mathematics: Triple Descent and a Fine-Grained Bias-Variance Decomposition
Jeffrey Pennington - Google Brain
Classical learning theory suggests that the optimal generalization performance of a machine learning model should occur at an intermediate model complexity, striking a balance between simpler models that exhibit high bias and more complex models that exhibit high variance of the predictive function. However, such a simple trade-off does not adequately describe the behavior of many modern deep learning models, which simultaneously attain low bias and low variance in the heavily overparameterized regime. Recent efforts to explain this phenomenon theoretically have focused on simple settings, such as linear regression or kernel regression with unstructured random features, which are too coarse to reveal important nuances of actual neural networks. In this talk, I will describe a precise high-dimensional asymptotic analysis of Neural Tangent Kernel regression that reveals some of these nuances, including non-monotonic behavior deep in the overparameterized regime. I will also present a novel bias-variance decomposition that unambiguously attributes these surprising observations to particular sources of randomness in the training procedure.
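As a rough illustration of the quantities the abstract refers to, here is a minimal Monte Carlo sketch of the classical bias-variance decomposition. It uses ridge regression on random ReLU features, a simpler stand-in for NTK regression. It is not the talk's fine-grained decomposition. The linear teacher, the dimensions, the ridge value, and the helper names (`relu_features`, `fit_predict`, `bias_variance`) are hypothetical choices made for this sketch. The last two outputs split the variance into a part driven by the random features and a part driven by the training sample, conditioning on the training set first.

```python
import numpy as np


def relu_features(X, W):
    """Random ReLU features; W is a fixed (d, p) Gaussian matrix (hypothetical model)."""
    return np.maximum(X @ W / np.sqrt(X.shape[1]), 0.0)


def fit_predict(X, y, X_test, W, ridge):
    """Ridge regression on random features, then predict on X_test."""
    F = relu_features(X, W)
    p = F.shape[1]
    # Solve (F^T F + ridge * I) a = F^T y for the second-layer weights a.
    a = np.linalg.solve(F.T @ F + ridge * np.eye(p), F.T @ y)
    return relu_features(X_test, W) @ a


def bias_variance(d=20, n=100, p=200, n_data=10, n_params=10,
                  noise=0.1, ridge=1e-3, n_test=200, seed=0):
    """Monte Carlo estimate over two sources of randomness:
    the training sample (X, label noise) and the random features W."""
    rng = np.random.default_rng(seed)
    beta = rng.standard_normal(d)                    # hypothetical linear teacher
    X_test = rng.standard_normal((n_test, d))
    f_star = X_test @ beta / np.sqrt(d)              # noiseless test targets

    # preds[i, j] = test predictions for training sample i and feature draw j.
    preds = np.empty((n_data, n_params, n_test))
    for i in range(n_data):
        X = rng.standard_normal((n, d))
        y = X @ beta / np.sqrt(d) + noise * rng.standard_normal(n)
        for j in range(n_params):
            W = rng.standard_normal((d, p))
            preds[i, j] = fit_predict(X, y, X_test, W, ridge)

    mean_pred = preds.mean(axis=(0, 1))
    # Error against the noiseless target (test-label noise would add noise**2).
    error = ((preds - f_star) ** 2).mean()
    bias2 = ((mean_pred - f_star) ** 2).mean()
    variance = preds.var(axis=(0, 1)).mean()
    # Law of total variance, conditioning on the training sample first:
    var_params = preds.var(axis=1).mean()             # E_data[Var_W(f | data)]
    var_data = preds.mean(axis=1).var(axis=0).mean()  # Var_data(E_W[f | data])
    return dict(error=error, bias2=bias2, variance=variance,
                var_params=var_params, var_data=var_data)


if __name__ == "__main__":
    print(bias_variance())
```

With population (ddof=0) variances, `error == bias2 + variance` and `variance == var_params + var_data` hold exactly for the sampled draws. Conditioning in the other order (on the features first) would generally assign different values to the two variance terms while leaving their sum unchanged, which illustrates why attributing variance to a single source of randomness is not automatic.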