TY - GEN
T1 - Multi-probe LSH
T2 - 33rd International Conference on Very Large Data Bases, VLDB 2007
AU - Lv, Qin
AU - Josephson, William
AU - Wang, Zhe
AU - Charikar, Moses
AU - Li, Kai
N1 - Publisher Copyright:
Copyright 2007 VLDB Endowment, ACM.
PY - 2007
Y1 - 2007
N2 - Similarity indices for high-dimensional data are very desirable for building content-based search systems for feature-rich data such as audio, images, videos, and other sensor data. Recently, locality sensitive hashing (LSH) and its variations have been proposed as indexing techniques for approximate similarity search. A significant drawback of these approaches is the requirement for a large number of hash tables in order to achieve good search quality. This paper proposes a new indexing scheme called multi-probe LSH that overcomes this drawback. Multi-probe LSH is built on the well-known LSH technique, but it intelligently probes multiple buckets that are likely to contain query results in a hash table. Our method is inspired by and improves upon recent theoretical work on entropy-based LSH designed to reduce the space requirement of the basic LSH method. We have implemented the multi-probe LSH method and evaluated the implementation with two different high-dimensional datasets. Our evaluation shows that the multi-probe LSH method substantially improves upon previously proposed methods in both space and time efficiency. To achieve the same search quality, multi-probe LSH has a similar time-efficiency as the basic LSH method while reducing the number of hash tables by an order of magnitude. In comparison with the entropy-based LSH method, to achieve the same search quality, multi-probe LSH uses less query time and 5 to 8 times fewer number of hash tables.
AB - Similarity indices for high-dimensional data are very desirable for building content-based search systems for feature-rich data such as audio, images, videos, and other sensor data. Recently, locality sensitive hashing (LSH) and its variations have been proposed as indexing techniques for approximate similarity search. A significant drawback of these approaches is the requirement for a large number of hash tables in order to achieve good search quality. This paper proposes a new indexing scheme called multi-probe LSH that overcomes this drawback. Multi-probe LSH is built on the well-known LSH technique, but it intelligently probes multiple buckets that are likely to contain query results in a hash table. Our method is inspired by and improves upon recent theoretical work on entropy-based LSH designed to reduce the space requirement of the basic LSH method. We have implemented the multi-probe LSH method and evaluated the implementation with two different high-dimensional datasets. Our evaluation shows that the multi-probe LSH method substantially improves upon previously proposed methods in both space and time efficiency. To achieve the same search quality, multi-probe LSH has a similar time-efficiency as the basic LSH method while reducing the number of hash tables by an order of magnitude. In comparison with the entropy-based LSH method, to achieve the same search quality, multi-probe LSH uses less query time and 5 to 8 times fewer number of hash tables.
UR - http://www.scopus.com/inward/record.url?scp=84955245129&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84955245129&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:84955245129
T3 - 33rd International Conference on Very Large Data Bases, VLDB 2007 - Conference Proceedings
SP - 950
EP - 961
BT - 33rd International Conference on Very Large Data Bases, VLDB 2007 - Conference Proceedings
A2 - Gehrke, Johannes
A2 - Koch, Christoph
A2 - Garofalakis, Minos
A2 - Aberer, Karl
A2 - Kanne, Carl-Christian
A2 - Neuhold, Erich J.
A2 - Ganti, Venkatesh
A2 - Klas, Wolfgang
A2 - Chan, Chee-Yong
A2 - Srivastava, Divesh
A2 - Florescu, Dana
A2 - Deshpande, Anand
PB - Association for Computing Machinery, Inc
Y2 - 23 September 2007 through 27 September 2007
ER -