Abstract
This paper proves, in very general settings, that convex risk minimization is a procedure to select a unique conditional probability model determined by the classification problem. Unlike most previous work, we give results that are general enough to include cases in which no minimum exists, as occurs typically, for instance, with standard boosting algorithms. Concretely, we first show that any sequence of predictors minimizing convex risk over the source distribution will converge to this unique model when the class of predictors is linear (but potentially of infinite dimension). Secondly, we show the same result holds for empirical risk minimization whenever this class of predictors is finite dimensional, where the essential technical contribution is a norm-free generalization bound.
Original language | English (US) |
---|---|
Journal | Journal of Machine Learning Research |
Volume | 40 |
Issue number | 2015 |
State | Published - 2015 |
Externally published | Yes |
Event | 28th Conference on Learning Theory, COLT 2015 - Paris, France Duration: Jul 2 2015 → Jul 6 2015 |
All Science Journal Classification (ASJC) codes
- Software
- Artificial Intelligence
- Control and Systems Engineering
- Statistics and Probability
Keywords
- Classification
- Conditional Probability Estimation
- Consistency
- Convex Duality
- Maximum Entropy
- Orlicz Spaces