Asymmetric distance estimation with sketches for similarity search in high-dimensional spaces

Wei Dong, Moses Charikar, Kai Li

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

88 Scopus citations

Abstract

Efficient similarity search in high-dimensional spaces is important to content-based retrieval systems. Recent studies have shown that sketches can effectively approximate L1 distance in high-dimensional spaces, and that filtering with sketches can speed up similarity search by an order of magnitude. It is a challenge to further reduce the size of sketches, which are already compact, without compromising the accuracy of distance estimation. This paper presents an efficient sketch algorithm for similarity search with L2 distance and a novel asymmetric distance estimation technique. Our new asymmetric estimator takes advantage of the original feature vector of the query to boost distance estimation accuracy. We also apply this asymmetric method to existing sketches for cosine similarity and L1 distance. Evaluations with datasets extracted from images and telephone records show that our L2 sketch outperforms existing methods, and that the asymmetric estimators consistently improve the accuracy of different sketch methods. To achieve the same search quality, asymmetric estimators can reduce the sketch size by 10% to 40%.
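The abstract contrasts symmetric estimation, where both the query and the database item are reduced to sketches, with asymmetric estimation, which keeps the query's original feature vector. The sketch below illustrates that distinction using random-hyperplane bit sketches for cosine similarity (one of the existing sketch families the abstract mentions). It is not the paper's L2 sketch or its estimator: the function names, the +1/-1 correlation rule in `asymmetric_score`, and all parameters are hypothetical choices for illustration, and the asymmetric value is a ranking score rather than a calibrated distance estimate.

```python
import math
import random


def make_hyperplanes(dim, n_bits, seed=0):
    """Draw n_bits random Gaussian hyperplane normals (illustrative parameters)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]


def project(vec, planes):
    """Real-valued projection of vec onto each hyperplane normal."""
    return [sum(a * x for a, x in zip(plane, vec)) for plane in planes]


def sketch(vec, planes):
    """Compact sketch: one bit per hyperplane, the sign of the projection."""
    return [1 if v >= 0 else 0 for v in project(vec, planes)]


def symmetric_angle(sk_q, sk_x):
    """Symmetric estimate: both sides are sketches, so only the Hamming
    distance h is available; the angle is estimated as pi * h / n_bits."""
    h = sum(a != b for a, b in zip(sk_q, sk_x))
    return math.pi * h / len(sk_q)


def asymmetric_score(query_proj, sk_x):
    """Asymmetric score (hypothetical rule): keep the query's real-valued
    projections and correlate them with the database bits mapped to +1/-1.
    A disagreement on a bit where the query projection is near zero costs
    little; a higher score suggests a smaller angle."""
    return sum(p if b else -p for p, b in zip(query_proj, sk_x))


def true_angle(u, v):
    """Exact angle between u and v, for comparison."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return math.acos(max(-1.0, min(1.0, dot / (nu * nv))))


if __name__ == "__main__":
    rng = random.Random(1)
    dim, n_bits = 16, 64
    planes = make_hyperplanes(dim, n_bits)
    database = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(5)]
    sketches = [sketch(x, planes) for x in database]  # only sketches are stored

    query = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    q_sketch = sketch(query, planes)  # symmetric side: query reduced to bits
    q_proj = project(query, planes)   # asymmetric side: query keeps its floats

    for i, (x, sk) in enumerate(zip(database, sketches)):
        print(i, round(true_angle(query, x), 3),
              round(symmetric_angle(q_sketch, sk), 3),
              round(asymmetric_score(q_proj, sk), 3))
```

The design point is that the database side stays equally compact in both cases; only the query, which is processed once per search, keeps its extra precision. That is the general trade the abstract describes, though the paper's actual estimators may weight disagreements differently.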

Original language: English (US)
Title of host publication: ACM SIGIR 2008 - 31st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, Proceedings
Pages: 123-130
Number of pages: 8
State: Published - 2008
Event: 31st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, ACM SIGIR 2008 - Singapore, Singapore
Duration: Jul 20 2008 to Jul 24 2008

Publication series

Name: ACM SIGIR 2008 - 31st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, Proceedings

Other

Other: 31st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, ACM SIGIR 2008
Country/Territory: Singapore
City: Singapore
Period: 7/20/08 to 7/24/08

All Science Journal Classification (ASJC) codes

  • Information Systems
  • Software

Keywords

  • Asymmetric distance estimation
  • Similarity search
  • Sketch
