Abstract
The ETSI has recently published a front-end processing standard for distributed speech recognition systems. The key idea of the standard is to extract the spectral features of speech signals at the front-end terminals so that acoustic distortion caused by communication channels can be avoided. This paper investigates the effect of extracting spectral features from different stages of the front-end processing on the performance of distributed speaker verification systems. A technique that combines handset selectors with stochastic feature transformation is also employed in a back-end speaker verification system to reduce the acoustic mismatch between different handsets. Because the feature vectors obtained from the back-end server are vector quantized, the paper proposes two approaches to adding Gaussian noise to the quantized feature vectors for training the Gaussian mixture speaker models. In one approach, the variances of the Gaussian noise are made dependent on the codeword distance. In another approach, the variances are a function of the distance between some unquantized training vectors and their closest code vector. The HTIMIT corpus was used in the experiments and results based on 150 speakers show that stochastic feature transformation can be added to the back-end server for compensating transducer distortion. It is also found that better verification performance can be achieved when the LMS-based blind equalization in the standard is replaced by stochastic feature transformation.
Original language | English (US) |
---|---|
Pages (from-to) | 67-77 |
Number of pages | 11 |
Journal | International Journal of Speech Technology |
Volume | 8 |
Issue number | 1 |
DOIs | |
State | Published - Jan 2005 |
All Science Journal Classification (ASJC) codes
- Software
- Language and Linguistics
- Human-Computer Interaction
- Linguistics and Language
- Computer Vision and Pattern Recognition
Keywords
- DSR
- DSR front-end processing
- Distributed speaker verification
- Feature transformation