Abstract
Bag-of-words document representations are often used in text, image and video processing. While it is relatively easy to determine a suitable word dictionary for text documents, there is no simple mapping from raw images or videos to dictionary terms. The classical approach builds a dictionary using vector quantization over a large set of useful visual descriptors extracted from a training set, and uses a nearest-neighbor algorithm to count the number of occurrences of each dictionary word in the documents to be encoded. More robust approaches have been proposed recently that represent each visual descriptor as a sparse weighted combination of dictionary words. While favoring a sparse representation at the level of visual descriptors, those methods do not, however, ensure that images have a sparse representation. In this work, we use mixed-norm regularization to achieve sparsity at the image level as well as a small overall dictionary. This approach can also be used to encourage using the same dictionary words for all the images in a class, providing a discriminative signal in the construction of image representations. Experimental results on a benchmark image classification dataset show that when compact image or dictionary representations are needed for computational efficiency, the proposed approach yields better mean average precision in classification.
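The two encodings contrasted in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The abstract does not say which mixed norm is used, so the sketch assumes an l1/l2 norm (a sum of per-word Euclidean norms), which is one common choice. The function names, the toy data, and the threshold `lam` are all hypothetical. The first function performs the classical nearest-neighbor word count. The other two show how a mixed-norm penalty, applied through its proximal operator, can switch off a dictionary word for every descriptor of an image at once.

```python
import numpy as np


def bow_histogram(descriptors, dictionary):
    """Classical encoding: count, per dictionary word, the descriptors nearest to it."""
    # Squared Euclidean distance between every descriptor and every word.
    d2 = ((descriptors[:, None, :] - dictionary[None, :, :]) ** 2).sum(axis=2)
    nearest = d2.argmin(axis=1)
    return np.bincount(nearest, minlength=dictionary.shape[0])


def l1_l2_norm(A):
    """Assumed mixed norm: sum over rows (dictionary words) of each row's l2 norm."""
    return np.linalg.norm(A, axis=1).sum()


def group_soft_threshold(A, lam):
    """Proximal operator of lam * l1_l2_norm: shrink each row, zeroing weak rows."""
    norms = np.linalg.norm(A, axis=1, keepdims=True)
    # Guard against division by zero for rows that are already all zeros.
    scale = np.maximum(0.0, 1.0 - lam / np.maximum(norms, 1e-12))
    return A * scale


# Toy data (hypothetical): 3 dictionary words in 2-D, 4 descriptors from one image.
dictionary = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
descriptors = np.array([[0.1, 0.0], [0.9, 0.1], [1.1, -0.1], [0.0, 0.8]])
histogram = bow_histogram(descriptors, dictionary)

# Rows = dictionary words, columns = descriptors belonging to the same image.
A = np.array([[3.0, 4.0], [0.3, 0.4], [0.0, 0.0]])
A_shrunk = group_soft_threshold(A, lam=1.0)
```

In this toy case the second row of `A` has a norm below `lam`, so the whole row is set to zero: that word is unused by the image as a whole, rather than being dropped for individual descriptors only.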
Original language | English (US) |
---|---|
Title of host publication | Advances in Neural Information Processing Systems 22 - Proceedings of the 2009 Conference |
Pages | 82-89 |
Number of pages | 8 |
State | Published - Dec 1 2009 |
Externally published | Yes |
Event | 23rd Annual Conference on Neural Information Processing Systems, NIPS 2009 - Vancouver, BC, Canada; Duration: Dec 7 2009 → Dec 10 2009 |
All Science Journal Classification (ASJC) codes
- Information Systems