Abstract
The past decade has been marked by the (continued) explosion of diverse data content and the fast development of intelligent data analytics techniques. One problem we identified in the mid-2000s was similarity search of feature-rich data. The challenge here was achieving both high accuracy and high effciency in high-dimensional spaces. Locality sensitive hashing (LSH), which uses certain random space partitions and hash table lookups to find approximate nearest neighbors, was a promising approach with theoretical guarantees. But LSH alone was insuffcient since a large number of hash tables were required to achieve good search quality. Building on an idea of Panigrahy, our multi-probe LSH method introduced the idea of intelligent probing. Given a query object, we strategically probe its neighboring hash buckets (in a query-dependent fashion) by calculating the statistical probabilities of similar objects falling into each bucket. Such intelligent probing can significantly reduce the number of hash tables while achieving high quality. In this paper, we revisit the problem motivation, the challenges, the key design considerations of multi-probe LSH, as well as discuss recent developments in this space and some questions for further research.
Original language | English (US) |
---|---|
Pages (from-to) | 2021-2024 |
Number of pages | 4 |
Journal | Proceedings of the VLDB Endowment |
Volume | 10 |
Issue number | 12 |
DOIs | |
State | Published - Aug 1 2017 |
Event | 43rd International Conference on Very Large Data Bases, VLDB 2017 - Munich, Germany Duration: Aug 28 2017 → Sep 1 2017 |
All Science Journal Classification (ASJC) codes
- Computer Science (miscellaneous)
- General Computer Science