HiFi-GAN: High-fidelity denoising and dereverberation based on speech deep features in adversarial networks

Jiaqi Su, Zeyu Jin, Adam Finkelstein

Research output: Contribution to journal › Conference article › peer-review

10 Scopus citations

Abstract

Real-world audio recordings are often degraded by factors such as noise, reverberation, and equalization distortion. This paper introduces HiFi-GAN, a deep learning method to transform recorded speech to sound as though it had been recorded in a studio. We use an end-to-end feed-forward WaveNet architecture, trained with multi-scale adversarial discriminators in both the time domain and the time-frequency domain. It relies on the deep feature matching losses of the discriminators to improve the perceptual quality of enhanced speech. The proposed model generalizes well to new speakers, new speech content, and new environments. It significantly outperforms state-of-the-art baseline methods in both objective and subjective experiments.
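To make the abstract's "deep feature matching losses of the discriminators" concrete, here is a minimal sketch in plain NumPy: the loss is taken (as commonly defined for GAN feature matching) as the mean L1 distance between a discriminator's intermediate activations on real audio and on enhanced audio, averaged over layers. The function name, shapes, and averaging scheme are illustrative assumptions, not the paper's exact formulation, which further aggregates over multiple discriminator scales in both the time and time-frequency domains.

```python
import numpy as np

def feature_matching_loss(real_feats, fake_feats):
    """Mean L1 distance between discriminator activations on real vs.
    enhanced speech, averaged over layers. Illustrative sketch only:
    HiFi-GAN additionally sums this over multiple discriminator scales
    in both the waveform and spectrogram domains."""
    layer_losses = [np.mean(np.abs(r - f)) for r, f in zip(real_feats, fake_feats)]
    return sum(layer_losses) / len(layer_losses)

# Toy example: two "layers" of hypothetical discriminator features.
real = [np.ones((4, 8)), np.zeros((2, 16))]
fake = [np.zeros((4, 8)), np.zeros((2, 16))]
loss = feature_matching_loss(real, fake)  # layer losses 1.0 and 0.0, mean 0.5
```

Matching internal discriminator features, rather than only the final real/fake score, gives the generator a dense perceptual training signal, which is what the abstract credits for the improved perceptual quality of the enhanced speech.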

Original language: English (US)
Pages (from-to): 4506-4510
Number of pages: 5
Journal: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
Volume: 2020-October
DOIs
State: Published - 2020
Event: 21st Annual Conference of the International Speech Communication Association, INTERSPEECH 2020 - Shanghai, China
Duration: Oct 25 2020 - Oct 29 2020

All Science Journal Classification (ASJC) codes

  • Language and Linguistics
  • Human-Computer Interaction
  • Signal Processing
  • Software
  • Modeling and Simulation

Keywords

  • Deep features
  • Denoising
  • Dereverberation
  • Generative adversarial networks
  • Speech enhancement
