Abstract
Random forests have become an important tool for improving accuracy in regression and classification problems since their inception by Leo Breiman in 2001. In this paper, we revisit a historically important random forest model, called centered random forests, originally proposed by Breiman in 2004 and later studied by Gérard Biau in 2012, where a feature is selected at random and the splits occurs at the midpoint of the node along the chosen feature. If the regression function is d-dimensional and Lipschitz, we show that, given access to n observations, the mean-squared prediction er-1 ror is (Equation presented). This positively answers an outstanding question of Biau about whether the rate of convergence for this random forest model could be im-1 proved beyond (Equation presented). Furthermore, by a refined analysis of the approximation and estimation errors for linear models, we show that our new rate cannot be improved in general. Finally, we generalize our analysis and improve current prediction error bounds for another random forest model, called median random forests, in which each tree is constructed from subsampled data and the splits are performed at the empirical median along a chosen feature.
Original language | English (US) |
---|---|
Pages (from-to) | 757-765 |
Number of pages | 9 |
Journal | Proceedings of Machine Learning Research |
Volume | 130 |
State | Published - 2021 |
Event | 24th International Conference on Artificial Intelligence and Statistics, AISTATS 2021 - Virtual, Online, United States Duration: Apr 13 2021 → Apr 15 2021 |
All Science Journal Classification (ASJC) codes
- Artificial Intelligence
- Software
- Control and Systems Engineering
- Statistics and Probability