TY - GEN
T1 - Multiclass Boosting and the Cost of Weak Learning
AU - Brukhim, Nataly
AU - Hazan, Elad
AU - Moran, Shay
AU - Mukherjee, Indraneel
AU - Schapire, Robert E.
N1 - Publisher Copyright:
© 2021 Neural Information Processing Systems Foundation. All rights reserved.
PY - 2021
Y1 - 2021
N2 - Boosting is an algorithmic approach based on the idea of combining weak and moderately inaccurate hypotheses into a strong and accurate one. In this work we study multiclass boosting with a possibly large number of classes or categories. Multiclass boosting can be formulated in various ways. Here, we focus on an especially natural formulation in which the weak hypotheses are assumed to belong to an “easy-to-learn” base class, and the weak learner is an agnostic PAC learner for that class with respect to the standard classification loss. This is in contrast with other, more complicated losses that have often been considered in the past. The goal of the overall boosting algorithm is then to learn a combination of weak hypotheses by repeatedly calling the weak learner. We study the resources required for boosting, especially how they depend on the number of classes k, for both the booster and the weak learner. We find that the boosting algorithm itself requires only O(log k) samples, as we show by analyzing a variant of AdaBoost for our setting. In stark contrast, assuming typical limits on the number of weak-learner calls, we prove that the number of samples required by a weak learner is at least polynomial in k, exponentially more than the number of samples needed by the booster. Alternatively, we prove that the weak learner's accuracy parameter must be smaller than an inverse polynomial in k, showing that the returned weak hypotheses must be nearly the best in their class when k is large. We also prove a trade-off between the number of oracle calls and the resources required of the weak learner: the fewer the calls to the weak learner, the more that is demanded of it on each call.
AB - Boosting is an algorithmic approach based on the idea of combining weak and moderately inaccurate hypotheses into a strong and accurate one. In this work we study multiclass boosting with a possibly large number of classes or categories. Multiclass boosting can be formulated in various ways. Here, we focus on an especially natural formulation in which the weak hypotheses are assumed to belong to an “easy-to-learn” base class, and the weak learner is an agnostic PAC learner for that class with respect to the standard classification loss. This is in contrast with other, more complicated losses that have often been considered in the past. The goal of the overall boosting algorithm is then to learn a combination of weak hypotheses by repeatedly calling the weak learner. We study the resources required for boosting, especially how they depend on the number of classes k, for both the booster and the weak learner. We find that the boosting algorithm itself requires only O(log k) samples, as we show by analyzing a variant of AdaBoost for our setting. In stark contrast, assuming typical limits on the number of weak-learner calls, we prove that the number of samples required by a weak learner is at least polynomial in k, exponentially more than the number of samples needed by the booster. Alternatively, we prove that the weak learner's accuracy parameter must be smaller than an inverse polynomial in k, showing that the returned weak hypotheses must be nearly the best in their class when k is large. We also prove a trade-off between the number of oracle calls and the resources required of the weak learner: the fewer the calls to the weak learner, the more that is demanded of it on each call.
UR - http://www.scopus.com/inward/record.url?scp=85126873682&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85126873682&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:85126873682
T3 - Advances in Neural Information Processing Systems
SP - 3057
EP - 3067
BT - Advances in Neural Information Processing Systems 34 - 35th Conference on Neural Information Processing Systems, NeurIPS 2021
A2 - Ranzato, Marc'Aurelio
A2 - Beygelzimer, Alina
A2 - Dauphin, Yann
A2 - Liang, Percy S.
A2 - Wortman Vaughan, Jenn
PB - Neural Information Processing Systems Foundation
T2 - 35th Conference on Neural Information Processing Systems, NeurIPS 2021
Y2 - 6 December 2021 through 14 December 2021
ER -