Abstract
It is acknowledged that Hidden Markov Models (HMMs) with Gaussian Mixture Models (GMMs) as the observation density functions achieve excellent digit recognition performance at high signal-to-noise ratios (SNRs). Moreover, many years of research have led to good techniques to reduce the impact of noise, distortion and mismatch between training and test conditions on the recognition accuracy. Nevertheless, we still await systems that are truly robust against these confounding factors. The present paper extends recent work on acoustic modeling based on Reservoir Computing (RC), a concept that has its roots in Machine Learning. By introducing a novel analysis of reservoirs as non-linear dynamical systems, new insights are gained and translated into a new reservoir design recipe that is extremely simple and highly comprehensible in terms of the dynamics of the acoustic features and the modeled acoustic units. By tuning the reservoir to these dynamics, one can create RC-based systems that not only compete well with conventional systems in clean conditions, but also degrade more gracefully in noisy conditions. Control experiments show that noise-robustness follows from the random fixation of the reservoir neurons, whereas tuning the reservoir dynamics increases the accuracy without compromising the noise-robustness.
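To make the notions of randomly fixed reservoir neurons and tunable reservoir dynamics concrete, the following is a minimal sketch of one common RC formulation: a leaky-integrator echo state network with a linear readout trained by ridge regression. It is not necessarily the design used in the paper. The function names, the `leak` and `spectral_radius` parameters and their default values are illustrative assumptions.

```python
import numpy as np


def make_reservoir(n_in, n_res, spectral_radius=0.9, seed=0):
    """Draw input and recurrent weights once; they are never trained."""
    rng = np.random.default_rng(seed)
    w_in = rng.uniform(-1.0, 1.0, size=(n_res, n_in))
    w_res = rng.normal(size=(n_res, n_res))
    # Rescale so the largest absolute eigenvalue equals spectral_radius
    # (an assumed, commonly used way to control the reservoir memory).
    w_res *= spectral_radius / np.max(np.abs(np.linalg.eigvals(w_res)))
    return w_in, w_res


def run_reservoir(inputs, w_in, w_res, leak=0.3):
    """Map a (frames, n_in) feature sequence to (frames, n_res) states."""
    x = np.zeros(w_res.shape[0])
    states = np.zeros((len(inputs), w_res.shape[0]))
    for t, u in enumerate(inputs):
        # Leaky integration: a smaller leak gives slower neuron dynamics.
        x = (1.0 - leak) * x + leak * np.tanh(w_in @ u + w_res @ x)
        states[t] = x
    return states


def train_readout(states, targets, ridge=1e-2):
    """Fit the linear readout, the only trained part, by ridge regression."""
    gram = states.T @ states + ridge * np.eye(states.shape[1])
    return np.linalg.solve(gram, states.T @ targets)
```

In this sketch, `leak` and `spectral_radius` are the knobs that set the time scale of the reservoir; matching them to the dynamics of the acoustic features is the kind of tuning the abstract refers to, under the assumptions stated above.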
Original language | English (US) |
---|---|
Pages (from-to) | 135-158 |
Number of pages | 24 |
Journal | Computer Speech and Language |
Volume | 30 |
Issue number | 1 |
State | Published - Mar 2015 |
Externally published | Yes |
All Science Journal Classification (ASJC) codes
- Software
- Theoretical Computer Science
- Human-Computer Interaction
Keywords
- Acoustic modeling
- Automatic Speech Recognition
- Noise robust spoken digit recognition
- Recurrent Neural Networks
- Reservoir Computing