Cute: A concatenative method for voice conversion using exemplar-based unit selection

Zeyu Jin, Adam Finkelstein, Stephen Diverdi, Jingwan Lu, Gautham J. Mysore

Research output: Chapter in Book/Report/Conference proceedingConference contribution

16 Scopus citations

Abstract

State-of-the art voice conversion methods re-synthesize voice from spectral representations such as MFCCs and STRAIGHT, thereby introducing muffled artifacts. We propose a method that circumvents this concern using concatenative synthesis coupled with exemplar-based unit selection. Given parallel speech from source and target speakers as well as a new query from the source, our method stitches together pieces of the target voice. It optimizes for three goals: matching the query, using long consecutive segments, and smooth transitions between the segments. To achieve these goals, we perform unit selection at the frame level and introduce triphone-based preselection that greatly reduces computation and enforces selection of long, contiguous pieces. Our experiments show that the proposed method has better quality than baseline methods, while preserving high individuality.

Original languageEnglish (US)
Title of host publication2016 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2016 - Proceedings
PublisherInstitute of Electrical and Electronics Engineers Inc.
Pages5660-5664
Number of pages5
ISBN (Electronic)9781479999880
DOIs
StatePublished - May 18 2016
Event41st IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2016 - Shanghai, China
Duration: Mar 20 2016Mar 25 2016

Publication series

NameICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
Volume2016-May
ISSN (Print)1520-6149

Other

Other41st IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2016
CountryChina
CityShanghai
Period3/20/163/25/16

All Science Journal Classification (ASJC) codes

  • Software
  • Signal Processing
  • Electrical and Electronic Engineering

Fingerprint Dive into the research topics of 'Cute: A concatenative method for voice conversion using exemplar-based unit selection'. Together they form a unique fingerprint.

  • Cite this

    Jin, Z., Finkelstein, A., Diverdi, S., Lu, J., & Mysore, G. J. (2016). Cute: A concatenative method for voice conversion using exemplar-based unit selection. In 2016 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2016 - Proceedings (pp. 5660-5664). [7472761] (ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings; Vol. 2016-May). Institute of Electrical and Electronics Engineers Inc.. https://doi.org/10.1109/ICASSP.2016.7472761