In this paper, we study the sample complexity of weak learning. That is, we ask how many data must be collected from an unknown distribution in order to extract a small but significant advantage in prediction. We show that it is important to distinguish between those learning algorithms that output deterministic hypotheses and those that output randomized hypotheses. We prove that in the weak learning model, any algorithm using deterministic hypotheses to weakly learn a class of Vapnik-Chervonenkis dimension d(n) requires Ω(√d(n)) examples. In contrast, when randomized hypotheses are allowed, we show that Θ(1) examples suffice in some cases. We then show that there exists an efficient algorithm using deterministic hypotheses that weakly learns against any distribution on a set of size d(n) with only O(d(n)^(2/3)) examples. Thus for the class of symmetric Boolean functions over n variables, where the strong learning sample complexity is Θ(n), the sample complexity for weak learning using deterministic hypotheses is Ω(√n) and O(n^(2/3)), and the sample complexity for weak learning using randomized hypotheses is Θ(1). Next we prove the existence of classes for which the distribution-free sample size required to obtain a slight advantage in prediction over random guessing is essentially equal to that required to obtain arbitrary accuracy. Finally, for a class of small circuits, namely all parity functions of subsets of n Boolean variables, we prove a weak learning sample complexity of Θ(n). This bound holds even if the weak learning algorithm is allowed to replace random sampling with membership queries, and the target distribution is uniform on {0, 1}^n.
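The following Python sketch offers some intuition for the parity result. It is an illustration only, not the paper's proof. It relies on two standard facts: the parities consistent with a labeled sample form an affine subspace over GF(2) whose size is 2^(n - r), where r is the rank of the sample points, and any two distinct parities agree on exactly half of {0, 1}^n. All function names are hypothetical, and points and parity subsets are encoded as n-bit integers purely for convenience.

```python
def parity(s, x):
    """Value of the parity function over subset s (bitmask) on input x (bitmask)."""
    return bin(s & x).count("1") % 2


def gf2_rank(vectors):
    """Rank over GF(2) of bit-vectors encoded as non-negative integers."""
    basis = []  # each stored vector has a leading bit absent from later ones
    for v in vectors:
        for b in basis:
            v = min(v, v ^ b)  # XOR with b only if that clears b's leading bit
        if v:
            basis.append(v)
    return len(basis)


def consistent_parity_count(n, xs):
    """Number of parities over n variables consistent with a sample on points xs.

    Assumes the labels come from some parity function, so the consistent
    set is non-empty and has size 2^(n - rank).
    """
    return 2 ** (n - gf2_rank(xs))


def brute_force_count(n, xs, target):
    """Same count by enumeration; feasible only for small n."""
    return sum(
        all(parity(s, x) == parity(target, x) for x in xs)
        for s in range(2 ** n)
    )


def agreement(n, s, t):
    """Number of inputs in {0, 1}^n on which parities s and t agree."""
    return sum(parity(s, x) == parity(t, x) for x in range(2 ** n))


if __name__ == "__main__":
    n = 4
    xs = [0b0001, 0b0011, 0b0010]  # third point is the XOR of the first two
    print(gf2_rank(xs))
    print(consistent_parity_count(n, xs), brute_force_count(n, xs, 0b0101))
    print(agreement(n, 0b0101, 0b0011))
```

While the sample has rank below n, at least two parities remain consistent with it, and those two disagree on exactly half of the inputs. This is one informal way to see why a sample size linear in n is natural for this class under the uniform distribution.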
All Science Journal Classification (ASJC) codes
- Theoretical Computer Science
- Information Systems
- Computer Science Applications
- Computational Theory and Mathematics