We describe a new family of topic-ranking algorithms for multi-labeled documents. The motivation for the algorithms stem from recent advances in online learning algorithms. The algorithms are simple to implement and are also time and memory efficient. We provide a unified analysis of the family of algorithms in the mistake bound model. We then discuss experiments with the proposed family of topic-ranking algorithms on the Reuters-21578 corpus and the new corpus released by Reuters in 2000. On both corpora, the algorithms we present achieve state-of-the-art results and outperforms topic-ranking adaptations of Rocchio's algorithm and of the Perceptron algorithm.
All Science Journal Classification (ASJC) codes
- Control and Systems Engineering
- Statistics and Probability
- Artificial Intelligence