Environment adaptation for robust speaker verification

Kwok Kwong Yiu, Man Wai Mak, Sun Yuan Kung

Research output: Contribution to conference (Paper)

15 Scopus citations

Abstract

In speaker verification over public telephone networks, utterances can be obtained from different types of handsets, and different handsets may introduce different degrees of distortion to the speech signals. This paper attempts to combine a handset selector with (1) handset-specific transformations and (2) handset-dependent speaker models to reduce the effect of this acoustic distortion. Specifically, a number of Gaussian mixture models are independently trained to identify the most likely handset given a test utterance. During recognition, the speaker model and the background model are then either transformed by an MLLR-based handset-specific transformation or replaced, respectively, by a handset-dependent speaker model and a handset-dependent background model whose parameters were adapted by reinforced learning to fit the new environment. Experimental results based on 150 speakers of the HTIMIT corpus show that environment adaptation based on both MLLR and reinforced learning outperforms the classical CMS, Hnorm and Tnorm approaches, with MLLR adaptation achieving the best performance.
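The selection-then-adaptation pipeline described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: all models are diagonal-covariance GMMs, each handset's MLLR transform is a single global affine mean transform (A, b) that has already been estimated, the verification score is the average log-likelihood ratio between the speaker and background models, and the names DiagGMM, select_handset, mllr_transform and verify are hypothetical. The reinforced-learning alternative is not shown.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Sequence, Tuple

Frames = Sequence[Sequence[float]]


@dataclass
class DiagGMM:
    """Gaussian mixture model with diagonal covariance matrices (hypothetical helper)."""
    weights: List[float]
    means: List[List[float]]
    variances: List[List[float]]

    def frame_loglik(self, x: Sequence[float]) -> float:
        # log sum_k w_k N(x; mu_k, diag(var_k)), evaluated with log-sum-exp for stability
        comps = []
        for w, mu, var in zip(self.weights, self.means, self.variances):
            ll = math.log(w)
            for xi, mi, vi in zip(x, mu, var):
                ll -= 0.5 * (math.log(2.0 * math.pi * vi) + (xi - mi) ** 2 / vi)
            comps.append(ll)
        top = max(comps)
        return top + math.log(sum(math.exp(c - top) for c in comps))

    def avg_loglik(self, frames: Frames) -> float:
        # Average per-frame log-likelihood of an utterance
        return sum(self.frame_loglik(x) for x in frames) / len(frames)


def select_handset(frames: Frames, handset_gmms: Dict[str, DiagGMM]) -> str:
    # Handset selector: the handset GMM with the highest likelihood wins
    return max(handset_gmms, key=lambda h: handset_gmms[h].avg_loglik(frames))


def mllr_transform(gmm: DiagGMM, A: List[List[float]], b: List[float]) -> DiagGMM:
    # Assumed global MLLR-style mean update mu' = A mu + b; weights and variances unchanged
    new_means = [
        [sum(a_ij * m_j for a_ij, m_j in zip(row, mu)) + b_i for row, b_i in zip(A, b)]
        for mu in gmm.means
    ]
    return DiagGMM(list(gmm.weights), new_means, [list(v) for v in gmm.variances])


def verify(
    frames: Frames,
    speaker: DiagGMM,
    background: DiagGMM,
    handset_gmms: Dict[str, DiagGMM],
    transforms: Dict[str, Tuple[List[List[float]], List[float]]],
    threshold: float = 0.0,
) -> Tuple[str, float, bool]:
    # 1) identify the most likely handset, 2) adapt both models with that handset's transform,
    # 3) score with the average log-likelihood ratio and compare it with a threshold
    handset = select_handset(frames, handset_gmms)
    A, b = transforms[handset]
    score = (mllr_transform(speaker, A, b).avg_loglik(frames)
             - mllr_transform(background, A, b).avg_loglik(frames))
    return handset, score, score > threshold
```

Applying the same handset-specific transform to both the speaker and the background model keeps the likelihood ratio balanced, so the handset mismatch is compensated in the numerator and denominator alike.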
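The three baselines named in the abstract (CMS, Hnorm and Tnorm) operate on features or scores rather than on the models. The sketch below shows their commonly used textbook forms; the paper's exact settings (cohort size, how the Hnorm statistics were collected) are not given here, so the function names and inputs are hypothetical.

```python
import statistics
from typing import Dict, List, Sequence, Tuple


def cms(frames: Sequence[Sequence[float]]) -> List[List[float]]:
    # Cepstral mean subtraction: remove the per-utterance mean of each cepstral dimension,
    # which cancels a stationary (convolutional) channel offset
    dim = len(frames[0])
    mean = [sum(f[d] for f in frames) / len(frames) for d in range(dim)]
    return [[f[d] - mean[d] for d in range(dim)] for f in frames]


def tnorm(raw_score: float, cohort_scores: Sequence[float]) -> float:
    # T-norm: normalise by the mean and standard deviation of scores that a cohort of
    # impostor models obtains on the same test utterance
    mu = statistics.fmean(cohort_scores)
    sigma = statistics.pstdev(cohort_scores)
    return (raw_score - mu) / sigma


def hnorm(raw_score: float, handset: str,
          handset_stats: Dict[str, Tuple[float, float]]) -> float:
    # H-norm: normalise by (mean, std) of impostor scores for this speaker model, estimated
    # offline per handset type; a handset detector chooses which pair to apply
    mu, sigma = handset_stats[handset]
    return (raw_score - mu) / sigma
```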

Original language: English (US)
Pages: 2973-2976
Number of pages: 4
State: Published - Jan 1 2003
Event: 8th European Conference on Speech Communication and Technology, EUROSPEECH 2003 - Geneva, Switzerland
Duration: Sep 1 2003 - Sep 4 2003


All Science Journal Classification (ASJC) codes

  • Computer Science Applications
  • Software
  • Linguistics and Language
  • Communication


Citation

Yiu, K. K., Mak, M. W., & Kung, S. Y. (2003). Environment adaptation for robust speaker verification. Paper presented at 8th European Conference on Speech Communication and Technology, EUROSPEECH 2003, Geneva, Switzerland (pp. 2973-2976).