TY - GEN
T1 - Boosting and Rocchio Applied to Text Filtering
AU - Schapire, Robert E.
AU - Singer, Yoram
AU - Singhal, Amit
N1 - Publisher Copyright: © 1998 ACM.
PY - 1998/8/1
Y1 - 1998/8/1
N2 - We discuss two learning algorithms for text filtering: modified Rocchio and a boosting algorithm called AdaBoost. We show how both algorithms can be adapted to maximize any general utility matrix that associates cost (or gain) for each pair of machine prediction and correct label. We first show that AdaBoost significantly outperforms another highly effective text filtering algorithm. We then compare AdaBoost and Rocchio over three large text filtering tasks. Overall both algorithms are comparable and are quite effective. AdaBoost produces better classifiers than Rocchio when the training collection contains a very large number of relevant documents. However, on these tasks, Rocchio runs much faster than AdaBoost.
UR - http://www.scopus.com/inward/record.url?scp=85165061552&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85165061552&partnerID=8YFLogxK
U2 - 10.1145/290941.290996
DO - 10.1145/290941.290996
M3 - Conference contribution
AN - SCOPUS:85165061552
T3 - SIGIR 1998 - Proceedings of the 21st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval
SP - 215
EP - 223
BT - SIGIR 1998 - Proceedings of the 21st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval
PB - Association for Computing Machinery, Inc
T2 - 21st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 1998
Y2 - 24 August 1998 through 28 August 1998
ER -