Efficiently matching sets of features with random histograms

Wei Dong, Zhe Wang, Moses Charikar, Kai Li

Research output: Chapter in Book/Report/Conference proceedingConference contribution

47 Scopus citations

Abstract

As the commonly used representation of a feature-rich data object has evolved from a single feature vector to a set of feature vectors, a key challenge in building a content-based search engine for feature-rich data is to match feature-sets efficiently. Although substantial progress has been made during the past few years, existing approaches are still inef- ficient and inflexible for building a search engine for massive datasets. This paper presents a randomized algorithm to embed a set of features into a single high-dimensional vec- tor to simplify the feature-set matching problem. The main idea is to project feature vectors into an auxiliary space using locality sensitive hashing and to represent a set of features as a histogram in the auxiliary space. A histogram is simply a high dimensional vector, and efficient similarity measures like L1 and L2 distances can be employed to approximate feature-set distance measures. We evaluated the proposed approach under three differ- ent task settings, i.e. content-based image search, image object recognition and near-duplicate video clip detection. The experimental results show that the proposed approach is indeed effective and flexible. It can achieve accuracy com- parable to the feature-set matching methods, while requir- ing significantly less space and time. For object recognition with Caltech 101 dataset, our method runs 25 times faster to achieve the same precision as Pyramid Matching Kernel, the state-of-the-art feature-set matching method.

Original languageEnglish (US)
Title of host publicationMM'08 - Proceedings of the 2008 ACM International Conference on Multimedia, with co-located Symposium and Workshops
Pages179-188
Number of pages10
DOIs
StatePublished - 2008
Event16th ACM International Conference on Multimedia, MM '08 - Vancouver, BC, Canada
Duration: Oct 26 2008Oct 31 2008

Publication series

NameMM'08 - Proceedings of the 2008 ACM International Conference on Multimedia, with co-located Symposium and Workshops

Other

Other16th ACM International Conference on Multimedia, MM '08
Country/TerritoryCanada
CityVancouver, BC
Period10/26/0810/31/08

All Science Journal Classification (ASJC) codes

  • Computer Graphics and Computer-Aided Design
  • Computer Vision and Pattern Recognition
  • Software

Keywords

  • Locality sensitive hashing
  • Random histogram
  • Set of features

Fingerprint

Dive into the research topics of 'Efficiently matching sets of features with random histograms'. Together they form a unique fingerprint.

Cite this