TY - GEN
T1 - A theory of universal learning
AU - Bousquet, Olivier
AU - Hanneke, Steve
AU - Moran, Shay
AU - van Handel, Ramon
AU - Yehudayoff, Amir
N1 - Publisher Copyright:
© 2021 Owner/Author.
PY - 2021/6/15
Y1 - 2021/6/15
N2 - How quickly can a given class of concepts be learned from examples? It is common to measure the performance of a supervised machine learning algorithm by plotting its "learning curve", that is, the decay of the error rate as a function of the number of training examples. However, the classical theoretical framework for understanding learnability, the PAC model of Vapnik-Chervonenkis and Valiant, does not explain the behavior of learning curves: the distribution-free PAC model of learning can only bound the upper envelope of the learning curves over all possible data distributions. This does not match the practice of machine learning, where the data source is typically fixed in any given scenario, while the learner may choose the number of training examples on the basis of factors such as computational resources and desired accuracy. In this paper, we study an alternative learning model that better captures such practical aspects of machine learning, but still gives rise to a complete theory of the learnable in the spirit of the PAC model. More precisely, we consider the problem of universal learning, which aims to understand the performance of learning algorithms on every data distribution, but without requiring uniformity over the distribution. The main result of this paper is a remarkable trichotomy: there are only three possible rates of universal learning. Specifically, we show that the learning curves of any given concept class decay at an exponential, linear, or arbitrarily slow rate. Moreover, each of these cases is completely characterized by appropriate combinatorial parameters, and we exhibit optimal learning algorithms that achieve the best possible rate in each case. For concreteness, we consider in this paper only the realizable case, though analogous results are expected to extend to more general learning scenarios.
KW - Distribution-Dependent Learning Rates
KW - Online Learning
KW - Statistical Learning
UR - http://www.scopus.com/inward/record.url?scp=85108176120&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85108176120&partnerID=8YFLogxK
U2 - 10.1145/3406325.3451087
DO - 10.1145/3406325.3451087
M3 - Conference contribution
AN - SCOPUS:85108176120
T3 - Proceedings of the Annual ACM Symposium on Theory of Computing
SP - 532
EP - 541
BT - STOC 2021 - Proceedings of the 53rd Annual ACM SIGACT Symposium on Theory of Computing
A2 - Khuller, Samir
A2 - Williams, Virginia Vassilevska
PB - Association for Computing Machinery
T2 - 53rd Annual ACM SIGACT Symposium on Theory of Computing, STOC 2021
Y2 - 21 June 2021 through 25 June 2021
ER -