High-confidence near-duplicate image detection

Wei Dong, Zhe Wang, Moses Charikar, Kai Li

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

31 Scopus citations

Abstract

In this paper, we propose two techniques for near-duplicate image detection at high confidence and large scale. First, we show that entropy-based filtering eliminates the ambiguous SIFT features that cause most false positives, and enables claiming near-duplicity with a single match of the retained high-quality features. Second, we show that graph cut can be used for query expansion with a duplicity graph computed offline to substantially improve search quality. Evaluation with web images shows that when combined with sketch embedding [6], our methods achieve a false positive rate orders of magnitude lower than the standard visual word approach. We demonstrate the proposed techniques with a large-scale image search engine which, using an indexing data structure computed offline with a Hadoop cluster, is capable of serving more than 50 million web images with a single commodity server.
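As a rough illustration of the first technique, the sketch below filters local descriptors by entropy. It is not the authors' implementation: the abstract does not define the entropy measure or the cutoff, so this assumes the Shannon entropy of the empirical distribution of a descriptor's integer values (for example, the 128 byte-valued entries of a SIFT descriptor). The names `descriptor_entropy` and `filter_by_entropy` and the threshold value are hypothetical.

```python
import math
from collections import Counter
from typing import List, Sequence


def descriptor_entropy(descriptor: Sequence[int]) -> float:
    """Shannon entropy, in bits, of the values in one descriptor.

    Assumption: entropy is taken over the empirical distribution of the
    descriptor's integer entries; the paper may define it differently.
    """
    total = len(descriptor)
    if total == 0:
        return 0.0
    counts = Counter(descriptor)
    # Sum p * log2(1/p) over the distinct values that occur.
    return sum((c / total) * math.log2(total / c) for c in counts.values())


def filter_by_entropy(
    descriptors: Sequence[Sequence[int]], threshold: float
) -> List[Sequence[int]]:
    """Keep only descriptors whose entropy is at least `threshold`.

    `threshold` is a hypothetical tuning parameter, not a value from the paper.
    """
    return [d for d in descriptors if descriptor_entropy(d) >= threshold]


if __name__ == "__main__":
    flat = [7] * 128            # every entry identical: carries no information
    varied = list(range(128))   # every entry distinct: maximal entropy here
    kept = filter_by_entropy([flat, varied], threshold=4.0)
    print(len(kept))
```

Under this assumption, a descriptor whose entries are nearly all the same value scores close to zero and is discarded, which matches the abstract's idea of dropping ambiguous features before matching.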

Original language: English (US)
Title of host publication: Proceedings of the 2nd ACM International Conference on Multimedia Retrieval, ICMR 2012
State: Published - Jul 27 2012
Event: 2nd ACM International Conference on Multimedia Retrieval, ICMR 2012 - Hong Kong, China
Duration: Jun 5 2012 to Jun 8 2012

Publication series

Name: Proceedings of the 2nd ACM International Conference on Multimedia Retrieval, ICMR 2012

Other

Other: 2nd ACM International Conference on Multimedia Retrieval, ICMR 2012
Country: China
City: Hong Kong
Period: 6/5/12 to 6/8/12

All Science Journal Classification (ASJC) codes

  • Computer Graphics and Computer-Aided Design
  • Computer Vision and Pattern Recognition
  • Software

Keywords

  • Entropy
  • Graph cut
  • Near-duplicate
  • Query expansion
