TY - GEN
T1 - Real-time online singing voice separation from monaural recordings using robust low-rank modeling
AU - Sprechmann, Pablo
AU - Bronstein, Alex
AU - Sapiro, Guillermo
PY - 2012
Y1 - 2012
AB - Separating the leading vocals from the musical accompaniment is a challenging task that arises naturally in several music processing applications. Robust principal component analysis (RPCA) has recently been applied to this problem with very successful results. The method decomposes the signal into a low-rank component corresponding to the accompaniment with its repetitive structure, and a sparse component corresponding to the voice with its quasi-harmonic structure. In this paper we first introduce a non-negative variant of RPCA, termed robust low-rank non-negative matrix factorization (RNMF). This new framework better suits audio applications. We then propose two efficient feed-forward architectures that approximate RPCA and RNMF with low latency and a fraction of the complexity of the original optimization method. These approximants allow incorporating elements of unsupervised, semi-supervised, and fully supervised learning into the RPCA and RNMF frameworks. Our basic implementation shows a speedup of several orders of magnitude over the exact solvers with no performance degradation, and allows online and faster-than-real-time processing. Evaluation on the MIR-1K dataset demonstrates state-of-the-art performance.
UR - http://www.scopus.com/inward/record.url?scp=84873423755&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84873423755&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:84873423755
SN - 9789727521449
T3 - Proceedings of the 13th International Society for Music Information Retrieval Conference, ISMIR 2012
SP - 67
EP - 72
BT - Proceedings of the 13th International Society for Music Information Retrieval Conference, ISMIR 2012
T2 - 13th International Society for Music Information Retrieval Conference, ISMIR 2012
Y2 - 8 October 2012 through 12 October 2012
ER -