TY - GEN
T1 - Modeling LSH for performance tuning
AU - Dong, Wei
AU - Wang, Zhe
AU - Josephson, William
AU - Charikar, Moses
AU - Li, Kai
N1 - Copyright:
Copyright 2009 Elsevier B.V., All rights reserved.
PY - 2008
Y1 - 2008
N2 - Although Locality-Sensitive Hashing (LSH) is a promising approach to similarity search in high-dimensional spaces, it has not been considered practical partly because its search quality is sensitive to several parameters that are quite data dependent. Previous research on LSH, though obtained in-teresting asymptotic results, provides little guidance on how these parameters should be chosen, and tuning parameters for a given dataset remains a tedious process. To address this problem, we present a statistical performance model of Multi-probe LSH, a state-of-the-art variance of LSH. Our model can accurately predict the average search quality and latency given a small sample dataset. Apart from automatic parameter tuning with the performance model, we also use the model to devise an adaptive LSH search algorithm to determine the probing parameter dynamically for each query. The adaptive probing method addresses the problem that even though the average performance is tuned for optimal, the variance of the performance is extremely high. We experimented with three different datasets including audio, images and 3D shapes to evaluate our methods. The results show the accuracy of the proposed model: the recall errors predicted are within 5% from the real values for most cases; the adaptive search method reduces the standard deviation of recall by about 50% over the existing method.
AB - Although Locality-Sensitive Hashing (LSH) is a promising approach to similarity search in high-dimensional spaces, it has not been considered practical partly because its search quality is sensitive to several parameters that are quite data dependent. Previous research on LSH, though obtained in-teresting asymptotic results, provides little guidance on how these parameters should be chosen, and tuning parameters for a given dataset remains a tedious process. To address this problem, we present a statistical performance model of Multi-probe LSH, a state-of-the-art variance of LSH. Our model can accurately predict the average search quality and latency given a small sample dataset. Apart from automatic parameter tuning with the performance model, we also use the model to devise an adaptive LSH search algorithm to determine the probing parameter dynamically for each query. The adaptive probing method addresses the problem that even though the average performance is tuned for optimal, the variance of the performance is extremely high. We experimented with three different datasets including audio, images and 3D shapes to evaluate our methods. The results show the accuracy of the proposed model: the recall errors predicted are within 5% from the real values for most cases; the adaptive search method reduces the standard deviation of recall by about 50% over the existing method.
KW - Locality sensitive hashing
KW - Similarity search
UR - http://www.scopus.com/inward/record.url?scp=70349248484&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=70349248484&partnerID=8YFLogxK
U2 - 10.1145/1458082.1458172
DO - 10.1145/1458082.1458172
M3 - Conference contribution
AN - SCOPUS:70349248484
SN - 9781595939913
T3 - International Conference on Information and Knowledge Management, Proceedings
SP - 669
EP - 678
BT - Proceedings of the 17th ACM Conference on Information and Knowledge Management, CIKM'08
T2 - 17th ACM Conference on Information and Knowledge Management, CIKM'08
Y2 - 26 October 2008 through 30 October 2008
ER -