TY - GEN
T1 - Predicting nearly as well as the best pruning of a decision tree
AU - Helmbold, David P.
AU - Schapire, Robert E.
N1 - Publisher Copyright:
© 1995 ACM.
PY - 1995/07/05
Y1 - 1995/07/05
N2 - Many algorithms for inferring a decision tree from data involve a two-phase process: First, a very large decision tree is grown which typically ends up "over-fitting" the data. To reduce over-fitting, in the second phase, the tree is pruned using one of a number of available methods. The final tree is then output and used for classification on test data. In this paper, we suggest an alternative approach to the pruning phase. Using a given unpruned decision tree, we present a new method of making predictions on test data, and we prove that our algorithm's performance will not be "much worse" (in a precise technical sense) than the predictions made by the best reasonably small pruning of the given decision tree. Thus, our procedure is guaranteed to be competitive (in terms of the quality of its predictions) with any pruning algorithm. We prove that our procedure is very efficient and highly robust. Our method can be viewed as a synthesis of two previously studied techniques. First, we apply Cesa-Bianchi et al.'s [3] results on predicting using "expert advice" (where we view each pruning as an "expert") to obtain an algorithm that has provably low prediction loss, but that is computationally infeasible. Next, we generalize and apply a method developed by Buntine [2, 1] and Willems, Shtarkov and Tjalkens [18, 19] to derive a very efficient implementation of this procedure.
AB - Many algorithms for inferring a decision tree from data involve a two-phase process: First, a very large decision tree is grown which typically ends up "over-fitting" the data. To reduce over-fitting, in the second phase, the tree is pruned using one of a number of available methods. The final tree is then output and used for classification on test data. In this paper, we suggest an alternative approach to the pruning phase. Using a given unpruned decision tree, we present a new method of making predictions on test data, and we prove that our algorithm's performance will not be "much worse" (in a precise technical sense) than the predictions made by the best reasonably small pruning of the given decision tree. Thus, our procedure is guaranteed to be competitive (in terms of the quality of its predictions) with any pruning algorithm. We prove that our procedure is very efficient and highly robust. Our method can be viewed as a synthesis of two previously studied techniques. First, we apply Cesa-Bianchi et al.'s [3] results on predicting using "expert advice" (where we view each pruning as an "expert") to obtain an algorithm that has provably low prediction loss, but that is computationally infeasible. Next, we generalize and apply a method developed by Buntine [2, 1] and Willems, Shtarkov and Tjalkens [18, 19] to derive a very efficient implementation of this procedure.
UR - http://www.scopus.com/inward/record.url?scp=54949139410&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=54949139410&partnerID=8YFLogxK
U2 - 10.1145/225298.225305
DO - 10.1145/225298.225305
M3 - Conference contribution
AN - SCOPUS:54949139410
T3 - Proceedings of the 8th Annual Conference on Computational Learning Theory, COLT 1995
SP - 61
EP - 68
BT - Proceedings of the 8th Annual Conference on Computational Learning Theory, COLT 1995
PB - Association for Computing Machinery, Inc.
T2 - 8th Annual Conference on Computational Learning Theory, COLT 1995
Y2 - 5 July 1995 through 8 July 1995
ER -