Blind stochastic feature transformation for speaker verification over cellular networks

Kwok Kwong Yiu, Man Wai Mak, Ming Cheung Cheung, Sun-Yuan Kung

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Acoustic mismatch between training and recognition conditions is one of the serious challenges facing speaker recognition researchers today. The goal of channel compensation is to approach the performance of a "matched condition" system without requiring a large amount of training data. The channel compensation algorithms in these systems must compensate for channel variation rather than speaker variation. This paper addresses unsupervised compensation, in which the features of a test utterance are transformed to fit the clean speaker model and a gender-dependent background model. Specifically, a feature-based transformation is estimated from the statistical difference between a test utterance and a composite acoustic model formed by combining the speaker and background models. Because the features are transformed to fit both models, the transformation is implicitly constrained. Experimental results on the 2001 NIST evaluation set show that the proposed transformation achieves significant improvements in both equal error rate and minimum detection cost compared with cepstral mean subtraction, Znorm, and short-time Gaussianization.
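
The idea of estimating a transformation from the mismatch between a test utterance and a composite model can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes diagonal-covariance Gaussian mixture models, a bias-only transformation (x = y + b), equal weighting of the speaker and background models, and an EM-style update in the spirit of standard stochastic matching. The function names `composite_gmm`, `log_gauss`, and `estimate_bias` are hypothetical.

```python
import numpy as np

def composite_gmm(spk, bkg, spk_weight=0.5):
    """Merge two diagonal GMMs, each given as (weights, means, variances).

    The equal split between speaker and background is an assumption.
    """
    w = np.concatenate([spk_weight * spk[0], (1.0 - spk_weight) * bkg[0]])
    mu = np.vstack([spk[1], bkg[1]])
    var = np.vstack([spk[2], bkg[2]])
    return w, mu, var

def log_gauss(x, mu, var):
    """Log density of each frame in x (T, D) under each component (K, D)."""
    diff = x[:, None, :] - mu[None, :, :]
    log_norm = np.sum(np.log(2.0 * np.pi * var), axis=1)[None, :]
    return -0.5 * (log_norm + np.sum(diff ** 2 / var[None, :, :], axis=2))

def estimate_bias(y, gmm, n_iter=10):
    """Estimate b so that y + b better fits the composite GMM (bias-only sketch)."""
    w, mu, var = gmm
    inv_var = 1.0 / var
    b = np.zeros(y.shape[1])
    for _ in range(n_iter):
        # E-step: component posteriors for the currently transformed frames.
        log_p = np.log(w)[None, :] + log_gauss(y + b, mu, var)
        log_p -= log_p.max(axis=1, keepdims=True)
        gamma = np.exp(log_p)
        gamma /= gamma.sum(axis=1, keepdims=True)
        # M-step: closed-form, precision-weighted bias update per dimension.
        num = (np.einsum('tk,kd->d', gamma, mu * inv_var)
               - np.einsum('tk,kd,td->d', gamma, inv_var, y))
        den = np.einsum('tk,kd->d', gamma, inv_var)
        b = num / den
    return b
```

In this sketch the test features are compensated as `y + estimate_bias(y, gmm)` before scoring. Fitting the speaker and background components jointly is what keeps the estimated shift from simply pulling every utterance toward the claimed speaker.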

Original language: English (US)
Title of host publication: 2004 International Symposium on Intelligent Multimedia, Video and Speech Processing, ISIMP 2004
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 679-682
Number of pages: 4
ISBN (Print): 0780386884, 9780780386884
State: Published - Jan 1 2004
Event: 2004 International Symposium on Intelligent Multimedia, Video and Speech Processing, ISIMP 2004 - Hong Kong, China
Duration: Oct 20 2004 to Oct 22 2004

Publication series

Name: 2004 International Symposium on Intelligent Multimedia, Video and Speech Processing, ISIMP 2004

Other

Other: 2004 International Symposium on Intelligent Multimedia, Video and Speech Processing, ISIMP 2004
Country: Hong Kong
City: Hong Kong, China
Period: 10/20/04 to 10/22/04

All Science Journal Classification (ASJC) codes

  • Engineering(all)

Keywords

  • Channel compensation
  • Feature transformation
  • MAP adaptation
  • Robustness
  • Speaker verification
