TY - GEN
T1 - Asymmetric distance estimation with sketches for similarity search in high-dimensional spaces
AU - Dong, Wei
AU - Charikar, Moses
AU - Li, Kai
PY - 2008
Y1 - 2008
N2 - Efficient similarity search in high-dimensional spaces is important to content-based retrieval systems. Recent studies have shown that sketches can effectively approximate L1 distance in high-dimensional spaces, and that filtering with sketches can speed up similarity search by an order of magnitude. It is a challenge to further reduce the size of sketches, which are already compact, without compromising accuracy of distance estimation. This paper presents an efficient sketch algorithm for similarity search with L 2 distances and a novel asymmetric distance estimation technique. Our new asymmetric estimator takes advantage of the original feature vector of the query to boost the distance estimation accuracy. We also apply this asymmetric method to existing sketches for cosine similarity and Li distance. Evaluations with datasets extracted from images and telephone records show that our L 2 sketch outperforms existing methods, and the asymmetric estimators consistently improve the accuracy of different sketch methods. To achieve the same search quality, asymmetric estimators can reduce the sketch size by 10% to 40%.
AB - Efficient similarity search in high-dimensional spaces is important to content-based retrieval systems. Recent studies have shown that sketches can effectively approximate L1 distance in high-dimensional spaces, and that filtering with sketches can speed up similarity search by an order of magnitude. It is a challenge to further reduce the size of sketches, which are already compact, without compromising accuracy of distance estimation. This paper presents an efficient sketch algorithm for similarity search with L 2 distances and a novel asymmetric distance estimation technique. Our new asymmetric estimator takes advantage of the original feature vector of the query to boost the distance estimation accuracy. We also apply this asymmetric method to existing sketches for cosine similarity and Li distance. Evaluations with datasets extracted from images and telephone records show that our L 2 sketch outperforms existing methods, and the asymmetric estimators consistently improve the accuracy of different sketch methods. To achieve the same search quality, asymmetric estimators can reduce the sketch size by 10% to 40%.
KW - Asymmetric distance estimation
KW - Similarity search
KW - Sketch
UR - http://www.scopus.com/inward/record.url?scp=57349114053&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=57349114053&partnerID=8YFLogxK
U2 - 10.1145/1390334.1390358
DO - 10.1145/1390334.1390358
M3 - Conference contribution
AN - SCOPUS:57349114053
SN - 9781605581644
T3 - ACM SIGIR 2008 - 31st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, Proceedings
SP - 123
EP - 130
BT - ACM SIGIR 2008 - 31st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, Proceedings
T2 - 31st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, ACM SIGIR 2008
Y2 - 20 July 2008 through 24 July 2008
ER -