TY - GEN
T1 - Data-dependent hashing via nonlinear spectral gaps
AU - Andoni, Alexandr
AU - Naor, Assaf
AU - Nikolov, Aleksandar
AU - Razenshteyn, Ilya
AU - Waingarten, Erik
N1 - Publisher Copyright:
© 2018 Association for Computing Machinery.
PY - 2018/6/20
Y1 - 2018/6/20
N2 - We establish a generic reduction from nonlinear spectral gaps of metric spaces to data-dependent Locality-Sensitive Hashing, yielding a new approach to the high-dimensional Approximate Near Neighbor Search problem (ANN) under various distance functions. Using this reduction, we obtain the following results: For general d-dimensional normed spaces and n-point datasets, we obtain a cell-probe ANN data structure with approximation O(log d/ε^2), space d^{O(1)} n^{1+ε}, and d^{O(1)} n^ε cell probes per query, for any ε > 0. No non-trivial approximation was known before in this generality other than the O(√d) bound which follows from embedding a general norm into ℓ2. For ℓp and Schatten-p norms, we improve the data structure further, to obtain approximation O(p) and sublinear query time. For ℓp, this improves upon the previous best approximation 2^{O(p)} (which required space polynomial in n, as opposed to near-linear). For the Schatten-p norm, no non-trivial ANN data structure was known before this work. Previous approaches to the ANN problem either exploit the low dimensionality of a metric, requiring space exponential in the dimension, or circumvent the curse of dimensionality by embedding a metric into a “tractable” space, such as ℓ1. Our new generic reduction proceeds differently from both of these approaches, using a novel partitioning method.
AB - We establish a generic reduction from nonlinear spectral gaps of metric spaces to data-dependent Locality-Sensitive Hashing, yielding a new approach to the high-dimensional Approximate Near Neighbor Search problem (ANN) under various distance functions. Using this reduction, we obtain the following results: For general d-dimensional normed spaces and n-point datasets, we obtain a cell-probe ANN data structure with approximation O(log d/ε^2), space d^{O(1)} n^{1+ε}, and d^{O(1)} n^ε cell probes per query, for any ε > 0. No non-trivial approximation was known before in this generality other than the O(√d) bound which follows from embedding a general norm into ℓ2. For ℓp and Schatten-p norms, we improve the data structure further, to obtain approximation O(p) and sublinear query time. For ℓp, this improves upon the previous best approximation 2^{O(p)} (which required space polynomial in n, as opposed to near-linear). For the Schatten-p norm, no non-trivial ANN data structure was known before this work. Previous approaches to the ANN problem either exploit the low dimensionality of a metric, requiring space exponential in the dimension, or circumvent the curse of dimensionality by embedding a metric into a “tractable” space, such as ℓ1. Our new generic reduction proceeds differently from both of these approaches, using a novel partitioning method.
KW - Locality-sensitive hashing
KW - Nearest neighbor search
KW - Nonlinear spectral gaps
KW - Randomized space partitions
UR - http://www.scopus.com/inward/record.url?scp=85049894854&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85049894854&partnerID=8YFLogxK
U2 - 10.1145/3188745.3188846
DO - 10.1145/3188745.3188846
M3 - Conference contribution
AN - SCOPUS:85049894854
T3 - Proceedings of the Annual ACM Symposium on Theory of Computing
SP - 685
EP - 698
BT - STOC 2018 - Proceedings of the 50th Annual ACM SIGACT Symposium on Theory of Computing
A2 - Henzinger, Monika
A2 - Kempe, David
A2 - Diakonikolas, Ilias
PB - Association for Computing Machinery
T2 - 50th Annual ACM Symposium on Theory of Computing, STOC 2018
Y2 - 25 June 2018 through 29 June 2018
ER -