TY - JOUR
T1 - Good Classifiers are Abundant in the Interpolating Regime
AU - Theisen, Ryan
AU - Klusowski, Jason M.
AU - Mahoney, Michael W.
N1 - Funding Information:
MM would like to acknowledge DARPA, NSF, and ONR for providing partial support of this work. JK would like to acknowledge funding from NSF DMS-1915932 and TRIPODS DATA-INSPIRE CCF-1934924. We also thank the authors of (Gessner et al., 2020) for sharing their implementation of the lin-ess algorithm.
Publisher Copyright:
Copyright © 2021 by the author(s)
PY - 2021
Y1 - 2021
AB - Within the machine learning community, the widely-used uniform convergence framework has been used to answer the question of how complex, over-parameterized models can generalize well to new data. This approach bounds the test error of the worst-case model one could have fit to the data, but it has fundamental limitations. Inspired by the statistical mechanics approach to learning, we formally define and develop a methodology to compute precisely the full distribution of test errors among interpolating classifiers from several model classes. We apply our method to compute this distribution for several real and synthetic datasets, with both linear and random feature classification models. We find that test errors tend to concentrate around a small typical value ε∗, which deviates substantially from the test error of the worst-case interpolating model on the same datasets, indicating that “bad” classifiers are extremely rare. We provide theoretical results in a simple setting in which we characterize the full asymptotic distribution of test errors, and we show that these indeed concentrate around a value ε∗, which we also identify exactly. We then formalize a more general conjecture supported by our empirical findings. Our results show that the usual style of analysis in statistical learning theory may not be fine-grained enough to capture the good generalization performance observed in practice, and that approaches based on the statistical mechanics of learning may offer a promising alternative.
UR - http://www.scopus.com/inward/record.url?scp=85161839089&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85161839089&partnerID=8YFLogxK
M3 - Conference article
AN - SCOPUS:85161839089
SN - 2640-3498
VL - 130
SP - 3376
EP - 3384
JO - Proceedings of Machine Learning Research
JF - Proceedings of Machine Learning Research
T2 - 24th International Conference on Artificial Intelligence and Statistics, AISTATS 2021
Y2 - 13 April 2021 through 15 April 2021
ER -