TY - GEN

T1 - Data-Dependent hashing via nonlinear spectral gaps

AU - Andoni, Alexandr

AU - Naor, Assaf

AU - Nikolov, Aleksandar

AU - Razenshteyn, Ilya

AU - Waingarten, Erik

PY - 2018/6/20

Y1 - 2018/6/20

N2 - We establish a generic reduction from nonlinear spectral gaps of metric spaces to data-dependent Locality-Sensitive Hashing, yielding a new approach to the high-dimensional Approximate Near Neighbor Search problem (ANN) under various distance functions. Using this reduction, we obtain the following results: For general d-dimensional normed spaces and n-point datasets, we obtain a cell-probe ANN data structure with approximation O(log ε2 d), space dO(1)n1+ε, and dO(1)nε cell probes per query, for any > 0. No non-trivial approximation was known before in this generality other than the O(d) bound which follows from embedding a general norm into ℓ2. For ℓp and Schatten-p norms, we improve the data structure further, to obtain approximation O(p) and sublinear query time. For ℓp, this improves upon the previous best approximation 2O(p) (which required polynomial as opposed to near-linear in n space). For the Schatten-p norm, no non-trivial ANN data structure was known before this work. Previous approaches to the ANN problem either exploit the low dimensionality of a metric, requiring space exponential in the dimension, or circumvent the curse of dimensionality by embedding a metric into a “tractable” space, such as ℓ1. Our new generic reduction proceeds differently from both of these approaches using a novel partitioning method.

AB - We establish a generic reduction from nonlinear spectral gaps of metric spaces to data-dependent Locality-Sensitive Hashing, yielding a new approach to the high-dimensional Approximate Near Neighbor Search problem (ANN) under various distance functions. Using this reduction, we obtain the following results: For general d-dimensional normed spaces and n-point datasets, we obtain a cell-probe ANN data structure with approximation O(log ε2 d), space dO(1)n1+ε, and dO(1)nε cell probes per query, for any > 0. No non-trivial approximation was known before in this generality other than the O(d) bound which follows from embedding a general norm into ℓ2. For ℓp and Schatten-p norms, we improve the data structure further, to obtain approximation O(p) and sublinear query time. For ℓp, this improves upon the previous best approximation 2O(p) (which required polynomial as opposed to near-linear in n space). For the Schatten-p norm, no non-trivial ANN data structure was known before this work. Previous approaches to the ANN problem either exploit the low dimensionality of a metric, requiring space exponential in the dimension, or circumvent the curse of dimensionality by embedding a metric into a “tractable” space, such as ℓ1. Our new generic reduction proceeds differently from both of these approaches using a novel partitioning method.

KW - Locality-sensitive hashing

KW - Nearest neighbor search

KW - Nonlinear spectral gaps

KW - Randomized space partitions

UR - http://www.scopus.com/inward/record.url?scp=85049894854&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85049894854&partnerID=8YFLogxK

U2 - 10.1145/3188745.3188846

DO - 10.1145/3188745.3188846

M3 - Conference contribution

AN - SCOPUS:85049894854

T3 - Proceedings of the Annual ACM Symposium on Theory of Computing

SP - 685

EP - 698

BT - STOC 2018 - Proceedings of the 50th Annual ACM SIGACT Symposium on Theory of Computing

A2 - Henzinger, Monika

A2 - Kempe, David

A2 - Diakonikolas, Ilias

PB - Association for Computing Machinery

T2 - 50th Annual ACM Symposium on Theory of Computing, STOC 2018

Y2 - 25 June 2018 through 29 June 2018

ER -