Boosting and Rocchio Applied to Text Filtering

Robert E. Schapire, Yoram Singer, Amit Singhal

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

12 Scopus citations

Abstract

We discuss two learning algorithms for text filtering: modified Rocchio and a boosting algorithm called AdaBoost. We show how both algorithms can be adapted to maximize any general utility matrix that associates a cost (or gain) with each pair of machine prediction and correct label. We first show that AdaBoost significantly outperforms another highly effective text filtering algorithm. We then compare AdaBoost and Rocchio over three large text filtering tasks. Overall, both algorithms are comparable and quite effective. AdaBoost produces better classifiers than Rocchio when the training collection contains a very large number of relevant documents. However, on these tasks, Rocchio runs much faster than AdaBoost.
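The utility-matrix idea in the abstract can be illustrated with a minimal sketch. It is not the authors' method: the matrix values, the function names `total_utility` and `best_threshold`, and the threshold sweep are all assumptions made for illustration. The sketch only shows how a gain or cost can be attached to each (prediction, label) pair and how a score cutoff could be tuned against that matrix.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical utility matrix, indexed as UTILITY[prediction][label].
# The values are illustrative assumptions, not taken from the paper.
UTILITY: Dict[bool, Dict[bool, float]] = {
    True: {True: 3.0, False: -2.0},   # document retrieved
    False: {True: 0.0, False: 0.0},   # document not retrieved
}


def total_utility(
    predictions: Sequence[bool],
    labels: Sequence[bool],
    utility: Dict[bool, Dict[bool, float]] = UTILITY,
) -> float:
    """Sum the utility of every (prediction, correct label) pair."""
    return sum(utility[p][y] for p, y in zip(predictions, labels))


def best_threshold(
    scores: Sequence[float],
    labels: Sequence[bool],
    utility: Dict[bool, Dict[bool, float]] = UTILITY,
) -> Tuple[float, float]:
    """Return the score cutoff with the highest total utility on this data.

    A document is retrieved when its score is >= the cutoff. The extra
    candidate float("inf") covers the option of retrieving nothing.
    """
    candidates: List[float] = sorted(set(scores)) + [float("inf")]

    def utility_at(cutoff: float) -> float:
        return total_utility([s >= cutoff for s in scores], labels, utility)

    best = max(candidates, key=utility_at)
    return best, utility_at(best)


if __name__ == "__main__":
    # Toy scores from any classifier (for example, a Rocchio similarity
    # or a boosted vote); the numbers are made up for the example.
    scores = [0.9, 0.8, 0.4, 0.2]
    labels = [True, False, True, False]
    print(best_threshold(scores, labels))
```

Changing the entries of the matrix changes which cutoff is preferred, which is the sense in which a filtering method can be adapted to different utility measures.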

Original language: English (US)
Title of host publication: SIGIR 1998 - Proceedings of the 21st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval
Publisher: Association for Computing Machinery, Inc
Pages: 215-223
Number of pages: 9
ISBN (Electronic): 9781581130157
State: Published - Aug 1 1998
Event: 21st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 1998 - Melbourne, Australia
Duration: Aug 24 1998 to Aug 28 1998

Publication series

Name: SIGIR 1998 - Proceedings of the 21st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval

Conference

Conference: 21st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 1998
Country/Territory: Australia
City: Melbourne
Period: 8/24/98 to 8/28/98

All Science Journal Classification (ASJC) codes

  • Information Systems and Management
  • Computer Science Applications
  • Information Systems
