Abstract
We describe a new family of topic-ranking algorithms for multi-labeled documents. The motivation for the algorithms stems from recent advances in online learning algorithms. The algorithms we present are simple to implement and are time and memory efficient. We evaluate the algorithms on the Reuters-21578 corpus and the new corpus released by Reuters in 2000. On both corpora the algorithms we present outperform topic-ranking adaptations of Rocchio's algorithm and the Perceptron algorithm. We also outline the formal analysis of the algorithms in the mistake bound model. To our knowledge, this work is the first to report performance results with the entire new Reuters corpus.
Original language | English (US) |
---|---|
Pages (from-to) | 151-158 |
Number of pages | 8 |
Journal | SIGIR Forum (ACM Special Interest Group on Information Retrieval) |
State | Published - 2002 |
Event | Proceedings of the Twenty-Fifth Annual International ACM SIGIR Conference on Research and Development in Information Retrieval - Tampere, Finland; Duration: Aug 11 2002 → Aug 15 2002 |
All Science Journal Classification (ASJC) codes
- Management Information Systems
- Hardware and Architecture
Keywords
- Category ranking
- Online learning
- Perceptrons