Audio Similarity is Unreliable as a Proxy for Audio Quality

Pranay Manocha, Zeyu Jin, Adam Finkelstein

Research output: Contribution to journal; Conference article; peer-review

Abstract

Many audio processing tasks require perceptual assessment. However, the time and expense of obtaining “gold standard” human judgments limit the availability of such data. Most applications incorporate full-reference or other similarity-based metrics (e.g., PESQ) that depend on a clean reference. Researchers have relied on such metrics to evaluate and compare various proposed methods, often concluding that small measured differences imply that one method is more effective than another. This paper demonstrates several practical scenarios where similarity metrics fail to agree with human perception, because they (1) vary with clean references; (2) rely on attributes that humans factor out when considering quality; and (3) are sensitive to imperceptible signal level differences. In those scenarios, we show that no-reference metrics do not suffer from such shortcomings and correlate better with human perception. We therefore conclude that similarity serves as an unreliable proxy for audio quality.
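
As a rough illustration of point (3) only, the sketch below shows how a simple full-reference measure reacts to a pure level change. It is not the paper's experiment: plain waveform SNR stands in for metrics such as PESQ, the test signal is a synthetic 440 Hz tone, and the 0.5 dB gain offset is an assumed value picked to represent a small level difference. The helper names `snr_db` and `apply_gain_db` are hypothetical.

```python
import math


def snr_db(reference, estimate):
    """Waveform signal-to-noise ratio of `estimate` against `reference`, in dB.

    Stand-in for a full-reference similarity metric (assumption: the paper's
    metrics, e.g. PESQ, are far more elaborate than this).
    """
    signal_power = sum(r * r for r in reference)
    noise_power = sum((r - e) ** 2 for r, e in zip(reference, estimate))
    if noise_power == 0.0:
        return math.inf  # identical signals: no measurable error
    return 10.0 * math.log10(signal_power / noise_power)


def apply_gain_db(signal, gain_db):
    """Scale every sample by a gain expressed in decibels."""
    scale = 10.0 ** (gain_db / 20.0)
    return [scale * s for s in signal]


# One second of a 440 Hz tone at 16 kHz (synthetic, for illustration only).
sample_rate = 16000
reference = [math.sin(2.0 * math.pi * 440.0 * n / sample_rate)
             for n in range(sample_rate)]

# Same content, shifted by an assumed small level offset of +/- 0.5 dB.
louder = apply_gain_db(reference, 0.5)
quieter = apply_gain_db(reference, -0.5)

print("identical:", snr_db(reference, reference))
print("+0.5 dB  :", round(snr_db(reference, louder), 1), "dB")
print("-0.5 dB  :", round(snr_db(reference, quieter), 1), "dB")
```

Under these assumptions the identical copy scores infinite SNR, while the level-shifted copies fall to roughly 25 dB even though nothing but the gain changed. A metric of this kind would therefore rank the shifted copies as noticeably degraded, which is the sort of disagreement with listeners that the abstract describes.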

Original language: English (US)
Pages (from-to): 3553-3557
Number of pages: 5
Journal: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
Volume: 2022-September
State: Published - 2022
Event: 23rd Annual Conference of the International Speech Communication Association, INTERSPEECH 2022 - Incheon, Korea, Republic of
Duration: Sep 18 2022 - Sep 22 2022

All Science Journal Classification (ASJC) codes

  • Language and Linguistics
  • Human-Computer Interaction
  • Signal Processing
  • Software
  • Modeling and Simulation

Keywords

  • audio quality
  • perceptual metric
  • similarity metrics
  • speech enhancement
  • speech quality
