Modeling LSH for performance tuning

Wei Dong, Zhe Wang, William Josephson, Moses Charikar, Kai Li

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

109 Scopus citations


Although Locality-Sensitive Hashing (LSH) is a promising approach to similarity search in high-dimensional spaces, it has not been considered practical, partly because its search quality is sensitive to several parameters that are quite data dependent. Previous research on LSH, though it has obtained interesting asymptotic results, provides little guidance on how these parameters should be chosen, and tuning parameters for a given dataset remains a tedious process. To address this problem, we present a statistical performance model of Multi-probe LSH, a state-of-the-art variant of LSH. Our model can accurately predict the average search quality and latency given a small sample dataset. Apart from automatic parameter tuning with the performance model, we also use the model to devise an adaptive LSH search algorithm that determines the probing parameter dynamically for each query. The adaptive probing method addresses the problem that even when the average performance is tuned to be optimal, the variance of the performance is extremely high. We experimented with three different datasets, including audio, images and 3D shapes, to evaluate our methods. The results show the accuracy of the proposed model: the errors in predicted recall are within 5% of the real values in most cases; the adaptive search method reduces the standard deviation of recall by about 50% over the existing method.

Original language: English (US)
Title of host publication: Proceedings of the 17th ACM Conference on Information and Knowledge Management, CIKM'08
Number of pages: 10
State: Published - 2008
Externally published: Yes
Event: 17th ACM Conference on Information and Knowledge Management, CIKM'08 - Napa Valley, CA, United States
Duration: Oct 26 2008 - Oct 30 2008

Publication series

Name: International Conference on Information and Knowledge Management, Proceedings


Other: 17th ACM Conference on Information and Knowledge Management, CIKM'08
Country/Territory: United States
City: Napa Valley, CA

All Science Journal Classification (ASJC) codes

  • General Decision Sciences
  • General Business, Management and Accounting


Keywords

  • Locality sensitive hashing
  • Similarity search

