HiFi-GAN: High-fidelity denoising and dereverberation based on speech deep features in adversarial networks

Jiaqi Su, Zeyu Jin, Adam Finkelstein

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

61 Scopus citations

Abstract

Real-world audio recordings are often degraded by factors such as noise, reverberation, and equalization distortion. This paper introduces HiFi-GAN, a deep learning method to transform recorded speech to sound as though it had been recorded in a studio. We use an end-to-end feed-forward WaveNet architecture, trained with multi-scale adversarial discriminators in both the time domain and the time-frequency domain. It relies on the deep feature matching losses of the discriminators to improve the perceptual quality of enhanced speech. The proposed model generalizes well to new speakers, new speech content, and new environments. It significantly outperforms state-of-the-art baseline methods in both objective and subjective experiments.
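
The deep feature matching idea mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact formulation, so the code below assumes a common form: the mean L1 distance between a discriminator's intermediate activations on the clean recording and on the enhanced output, averaged over all layers of all discriminators. The names `l1_mean` and `feature_matching_loss`, and the nested-list layout of the features, are hypothetical and chosen only for illustration.

```python
from typing import Sequence

# Assumption: each feature map is a flattened list of activations
# taken from one discriminator layer.
FeatureMap = Sequence[float]


def l1_mean(a: FeatureMap, b: FeatureMap) -> float:
    """Mean absolute difference between two equally sized feature maps."""
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("feature maps must be non-empty and equally sized")
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def feature_matching_loss(
    clean_feats: Sequence[Sequence[FeatureMap]],
    enhanced_feats: Sequence[Sequence[FeatureMap]],
) -> float:
    """Average the per-layer L1 distance over every layer of every discriminator.

    clean_feats[d][l] holds layer l of discriminator d evaluated on the clean
    recording; enhanced_feats[d][l] holds the same layer evaluated on the
    generator output.
    """
    total, count = 0.0, 0
    for d_clean, d_enh in zip(clean_feats, enhanced_feats):
        for f_clean, f_enh in zip(d_clean, d_enh):
            total += l1_mean(f_clean, f_enh)
            count += 1
    if count == 0:
        raise ValueError("no feature maps supplied")
    return total / count


# Toy example: two discriminators, with two layers and one layer respectively.
clean = [[[1.0, 2.0], [0.0, 0.0, 0.0]], [[0.5]]]
enhanced = [[[1.0, 4.0], [1.0, 1.0, 1.0]], [[0.0]]]
loss = feature_matching_loss(clean, enhanced)
```

In this toy example the three per-layer distances are 1.0, 1.0, and 0.5, so the loss is their mean. In a real training setup the feature maps would be tensors produced by the time-domain and time-frequency discriminators, and this term would be combined with the adversarial losses.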

Original language: English (US)
Title of host publication: Interspeech 2020
Publisher: International Speech Communication Association
Pages: 4506-4510
Number of pages: 5
ISBN (Print): 9781713820697
State: Published - 2020
Event: 21st Annual Conference of the International Speech Communication Association, INTERSPEECH 2020 - Shanghai, China
Duration: Oct 25 2020 → Oct 29 2020

Publication series

Name: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
Volume: 2020-October
ISSN (Print): 2308-457X
ISSN (Electronic): 1990-9772

Conference

Conference: 21st Annual Conference of the International Speech Communication Association, INTERSPEECH 2020
Country/Territory: China
City: Shanghai
Period: 10/25/20 → 10/29/20

All Science Journal Classification (ASJC) codes

  • Language and Linguistics
  • Human-Computer Interaction
  • Signal Processing
  • Software
  • Modeling and Simulation

Keywords

  • Deep features
  • Denoising
  • Dereverberation
  • Generative adversarial networks
  • Speech enhancement
