Abstract
Reservoir Computing (RC) has recently been introduced as an interesting alternative for acoustic modeling. For phone and continuous digit recognition, the reservoir approach obtained quite promising results. In this work, we further elaborate this concept by porting some well-known techniques used to enhance recognition rates of GMM-based models to Reservoir Computing. In particular, we introduce context-dependent (CD) triphone states to model co-articulation and pronunciation mismatches arising from an imperfect lexicon. We also propose to incorporate two speaker normalization methods in the feature space, namely mean & variance normalization and vocal tract length normalization. The impact of the investigated techniques is studied in the context of phone recognition on the TIMIT corpus. Our CD-RC-HMM hybrid yields a speaker-independent phone error rate (PER) of 22% and a speaker-dependent PER of 20.5%. By combining GMM and RC-based likelihoods at the state level, these scores can be reduced further.
Original language | English (US) |
---|---|
Pages (from-to) | 3342-3346 |
Number of pages | 5 |
Journal | Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH |
State | Published - 2013 |
Externally published | Yes |
Event | 14th Annual Conference of the International Speech Communication Association, INTERSPEECH 2013 - Lyon, France Duration: Aug 25 2013 → Aug 29 2013 |
All Science Journal Classification (ASJC) codes
- Language and Linguistics
- Human-Computer Interaction
- Signal Processing
- Software
- Modeling and Simulation
Keywords
- Acoustic modeling
- Context-dependency
- Reservoir computing
- Speaker normalization