SQAPP: NO-REFERENCE SPEECH QUALITY ASSESSMENT VIA PAIRWISE PREFERENCE

Pranay Manocha, Zeyu Jin, Adam Finkelstein

Research output: Chapter in Book/Report/Conference proceeding (Conference contribution)

Abstract

Automatic speech quality assessment remains challenging, as we lack complete models of human auditory perception. Many existing full-reference models correlate well with human perception, but cannot be used in real-world scenarios where ground-truth clean reference recordings are not available. On the other hand, no-reference metrics typically suffer from several shortcomings, such as lack of robustness to unseen perturbations and reliance on (limited) labeled data for training. Moreover, noise or large variance among the labels makes it difficult to learn generalizable representations, especially for recordings with subtle differences. This paper proposes a learning framework for estimating the quality of a recording without any reference, and without any human judgments. The main component of this framework is a pairwise quality-preference strategy that reduces label noise, thereby making learning more robust. From pairwise preferences, we first learn a content-invariant quality ordering, and then we re-target the model to predict quality on an absolute scale. We show that the resulting learned metric is well-calibrated with human judgments. Since it is a deep network, the metric is differentiable, making it suitable as a loss function for downstream tasks. For example, we show that adding this metric to an existing speech enhancement method yields significant improvement.
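
As a rough illustration of the pairwise-preference idea described in the abstract, the sketch below shows one common way to turn "recording A is preferred over recording B" into a training signal: a logistic (Bradley-Terry style) loss on the difference of two scalar quality scores. This is an assumption made for illustration only; the abstract does not specify the loss, the network, or how preference pairs are constructed, and the names `softplus` and `pairwise_preference_loss` are hypothetical.

```python
import math


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def pairwise_preference_loss(score_preferred: float, score_other: float) -> float:
    """Logistic loss for one preference pair (illustrative assumption, not the paper's loss).

    Equals -log(sigmoid(score_preferred - score_other)): small when the
    preferred recording already has the higher score, large otherwise.
    """
    return softplus(-(score_preferred - score_other))


# Hypothetical scores produced by some quality model for two recordings.
correct_order = pairwise_preference_loss(score_preferred=2.0, score_other=0.5)
wrong_order = pairwise_preference_loss(score_preferred=0.5, score_other=2.0)
print(correct_order < wrong_order)
```

Because the loss depends only on the score difference within a pair, it constrains the ordering of recordings rather than their absolute values, which is consistent with the abstract's separate step of re-targeting the model to an absolute scale.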

Original language: English (US)
Title of host publication: 2022 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2022 - Proceedings
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 891-895
Number of pages: 5
ISBN (Electronic): 9781665405409
State: Published - 2022
Event: 47th IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2022 - Virtual, Online, Singapore
Duration: May 23 2022 to May 27 2022

Publication series

Name: ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
Volume: 2022-May
ISSN (Print): 1520-6149

Conference

Conference: 47th IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2022
Country/Territory: Singapore
City: Virtual, Online
Period: 5/23/22 to 5/27/22

All Science Journal Classification (ASJC) codes

  • Software
  • Signal Processing
  • Electrical and Electronic Engineering

Keywords

  • audio quality
  • no-reference metric
  • pairwise preference
  • perceptual metric
  • speech enhancement
  • speech quality
