Ferret: A toolkit for content-based similarity search of feature-rich data

Qin Lv, William Josephson, Zhe Wang, Moses Charikar, Kai Li

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

25 Scopus citations

Abstract

Building content-based search tools for feature-rich data has been a challenging problem because feature-rich data such as audio recordings, digital images, and sensor data are inherently noisy and high dimensional. Comparing noisy data requires comparisons based on similarity instead of exact matches, and thus searching for noisy data requires similarity search instead of exact search.

The Ferret toolkit is designed to help system builders quickly construct content-based similarity search systems for feature-rich data types. The key component of the toolkit is a content-based similarity search engine for generic, multi-feature object representations. To solve the similarity search problem in high-dimensional spaces, we have developed approximation methods inspired by recent theoretical results on dimension reduction. The search engine constructs sketches from feature vectors as highly compact data structures for matching, filtering and ranking data objects. The toolkit also includes several other components to help system builders address search system infrastructure issues. We have implemented the toolkit and used it to successfully construct content-based similarity search systems for four data types: audio recordings, digital photos, 3D shape models and genomic microarray data.
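The abstract does not say how the sketches are built, so the following is only a minimal illustration of the general idea of sketch-based filtering followed by ranking. It assumes a generic random-hyperplane sign sketch compared by Hamming distance, which is swapped in here for illustration and is not necessarily the construction Ferret uses. All function names and parameters (make_hyperplanes, sketch, hamming, search, filter_k, top_k) are hypothetical and not part of the toolkit.

```python
import random


def make_hyperplanes(dim, bits, seed=0):
    """Draw `bits` random Gaussian hyperplanes in `dim` dimensions (assumed scheme)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(bits)]


def sketch(vec, planes):
    """Compress a feature vector into an integer holding one sign bit per hyperplane."""
    s = 0
    for plane in planes:
        dot = sum(p * x for p, x in zip(plane, vec))
        s = (s << 1) | (1 if dot >= 0 else 0)
    return s


def hamming(a, b):
    """Number of differing bits between two sketches."""
    return bin(a ^ b).count("1")


def search(query, dataset, planes, sketches, filter_k=3, top_k=1):
    """Filter candidates by sketch distance, then rank them by exact distance."""
    q = sketch(query, planes)
    # Filtering step: keep the filter_k objects whose sketches are closest.
    candidates = sorted(range(len(dataset)),
                        key=lambda i: hamming(q, sketches[i]))[:filter_k]

    # Ranking step: squared Euclidean distance on the original feature vectors.
    def exact(i):
        return sum((a - b) ** 2 for a, b in zip(query, dataset[i]))

    return sorted(candidates, key=exact)[:top_k]


# Toy data: four 4-dimensional feature vectors (made up for illustration).
dataset = [[1.0, 0.0, 0.0, 0.0],
           [0.0, 1.0, 0.0, 0.0],
           [-1.0, 0.0, 0.0, 0.0],
           [0.9, 0.1, 0.0, 0.0]]
planes = make_hyperplanes(dim=4, bits=32)
sketches = [sketch(v, planes) for v in dataset]
result = search([1.0, 0.0, 0.0, 0.0], dataset, planes, sketches)
```

In this toy setup each object is reduced to a 32-bit sketch, the cheap Hamming comparison prunes the candidate set, and only the survivors are compared on their full feature vectors. A real system would choose the sketch size and the filter threshold according to its accuracy and memory budget.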

Original language: English (US)
Title of host publication: Proceedings of the 2006 EuroSys Conference
Pages: 317-330
Number of pages: 14
State: Published - Dec 1 2006
Event: 2006 EuroSys Conference - Leuven, Belgium
Duration: Apr 18 2006 – Apr 21 2006

Publication series

Name: Proceedings of the 2006 EuroSys Conference

Other

Other: 2006 EuroSys Conference
Country: Belgium
City: Leuven
Period: 4/18/06 – 4/21/06

All Science Journal Classification (ASJC) codes

  • Computer Networks and Communications
  • Hardware and Architecture
  • Information Systems

Keywords

  • Feature-rich data
  • Similarity search
  • Sketch
  • Toolkit
