Context-dependent modeling and speaker normalization applied to reservoir-based phone recognition

Fabian Triefenbach, Azarakhsh Jalalvand, Kris Demuynck, Jean Pierre Martens

Research output: Contribution to journal, Conference article, peer-reviewed

2 Scopus citations

Abstract

Reservoir Computing (RC) has recently been introduced as an interesting alternative for acoustic modeling. For phone and continuous digit recognition, the reservoir approach obtained quite promising results. In this work, we further elaborate this concept by porting some well-known techniques used to enhance recognition rates of GMM-based models to Reservoir Computing. In particular, we introduce context-dependent (CD) triphone states to model co-articulation and pronunciation mismatches arising from an imperfect lexicon. We also propose to incorporate two speaker normalization methods in the feature space, namely mean & variance normalization and vocal tract length normalization. The impact of the investigated techniques is studied in the context of phone recognition on the TIMIT corpus. Our CD-RC-HMM hybrid yields a speaker-independent phone error rate (PER) of 22% and a speaker-dependent PER of 20.5%. By combining GMM and RC-based likelihoods at the state level, these scores can be reduced further.
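As a rough illustration of the mean and variance normalization mentioned in the abstract, the sketch below rescales each feature dimension to zero mean and unit variance separately for every speaker. It is a minimal sketch and not the authors' implementation: the function name `mean_variance_normalize`, the per-frame `speaker_ids` layout, and the `eps` guard are all assumptions made for the example, and the abstract does not say whether statistics were gathered per speaker or per utterance.

```python
import numpy as np


def mean_variance_normalize(features, speaker_ids, eps=1e-8):
    """Scale each feature dimension to zero mean and unit variance per speaker.

    features:    array of shape (n_frames, n_dims), e.g. acoustic feature vectors
    speaker_ids: array of shape (n_frames,), one speaker label per frame
                 (hypothetical layout chosen for this sketch)
    eps:         small constant that avoids division by zero (assumption)
    """
    features = np.asarray(features, dtype=float)
    speaker_ids = np.asarray(speaker_ids)
    normalized = np.empty_like(features)

    for speaker in np.unique(speaker_ids):
        mask = speaker_ids == speaker
        frames = features[mask]
        mean = frames.mean(axis=0)   # per-dimension mean for this speaker
        std = frames.std(axis=0)     # per-dimension standard deviation
        normalized[mask] = (frames - mean) / (std + eps)

    return normalized
```

After this step, the frames of every speaker share the same first- and second-order statistics in each dimension, which removes part of the speaker-specific offset and scale before the features reach the acoustic model.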

Original language: English (US)
Pages (from-to): 3342-3346
Number of pages: 5
Journal: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
State: Published - 2013
Externally published: Yes
Event: 14th Annual Conference of the International Speech Communication Association, INTERSPEECH 2013 - Lyon, France
Duration: Aug 25 2013 to Aug 29 2013

All Science Journal Classification (ASJC) codes

  • Language and Linguistics
  • Human-Computer Interaction
  • Signal Processing
  • Software
  • Modeling and Simulation

Keywords

  • Acoustic modeling
  • Context-dependency
  • Reservoir computing
  • Speaker normalization
