TY - GEN
T1 - Robust blind source separation in a reverberant room based on beamforming with a large-aperture microphone array
AU - Sanz-Robinson, Josue
AU - Huang, Liechao
AU - Moy, Tiffany
AU - Rieutort-Louis, Warren
AU - Hu, Yingzhe
AU - Wagner, Sigurd
AU - Sturm, James C.
AU - Verma, Naveen
N1 - Publisher Copyright: © 2016 IEEE.
PY - 2016/5/18
Y1 - 2016/5/18
N2 - Large-Area Electronics (LAE) technology has enabled the development of physically expansive sensing systems with a flexible form factor, including large-aperture microphone arrays. We propose an approach to blind source separation that leverages such an array. Our algorithm performs delay-sum beamforming but uses frequency-dependent time delays, making it well suited to a practical reverberant room. This is followed by a binary mask stage for further interference cancellation. A key feature is that the algorithm is fully "blind": it requires no prior information about the location of the speakers or microphones. Instead, we carry out k-means cluster analysis to estimate time delays in the background from acquired audio signals that represent the mixture of simultaneous sources. We tested this algorithm in a conference room (T60 = 350 ms) using two linear arrays consisting of (1) commercial electret capsules and (2) LAE microphones fabricated in-house. We achieved high-quality separation, obtaining a mean PESQ MOS improvement (relative to the unprocessed signal) of 0.7 for two sources and 0.6 for four simultaneous sources with the electret array, and of 0.5 and 0.3, respectively, with the LAE array.
KW - BSS
KW - LAE
KW - beamforming
KW - large-area electronics
KW - microphone array
KW - reverberant room
KW - source separation
UR - http://www.scopus.com/inward/record.url?scp=84973345455&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84973345455&partnerID=8YFLogxK
U2 - 10.1109/ICASSP.2016.7471713
DO - 10.1109/ICASSP.2016.7471713
M3 - Conference contribution
AN - SCOPUS:84973345455
T3 - ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
SP - 440
EP - 444
BT - 2016 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2016 - Proceedings
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 41st IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2016
Y2 - 20 March 2016 through 25 March 2016
ER -