TY - GEN
T1 - Multipitch tracking in music signals using echo state networks
AU - Steiner, Peter
AU - Stone, Simon
AU - Birkholz, Peter
AU - Jalalvand, Azarakhsh
N1 - Publisher Copyright: © 2021 European Signal Processing Conference, EUSIPCO. All rights reserved.
PY - 2021/1/24
Y1 - 2021/1/24
AB - Currently, convolutional neural networks (CNNs) define the state of the art for multipitch tracking in music signals. Echo State Networks (ESNs), a recently introduced recurrent neural network architecture, have achieved results similar to those of CNNs on various tasks, such as phoneme or digit recognition. However, they have not yet received much attention in the Music Information Retrieval community. The core of an ESN is the reservoir, a group of unordered, randomly connected neurons that non-linearly transforms the low-dimensional input space into a high-dimensional feature space. Because only the weights of the connections between the reservoir and the output are trained, using linear regression, ESNs are easier to train than deep neural networks. This paper presents a first exploration of ESNs for the challenging task of multipitch tracking in music signals. The best results presented in this paper were achieved with a bidirectional two-layer ESN with 20 000 neurons in each layer. Although the final F-score of 0.7198 still falls below the state of the art (0.7370), the proposed ESN-based approach serves as a baseline for further investigations of ESNs in audio signal processing.
KW - Echo State Network
KW - MIR
KW - Multipitch
KW - RNN
KW - Reservoir Computing
UR - http://www.scopus.com/inward/record.url?scp=85099292718&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85099292718&partnerID=8YFLogxK
U2 - 10.23919/Eusipco47968.2020.9287638
DO - 10.23919/Eusipco47968.2020.9287638
M3 - Conference contribution
AN - SCOPUS:85099292718
T3 - European Signal Processing Conference
SP - 126
EP - 130
BT - 28th European Signal Processing Conference, EUSIPCO 2020 - Proceedings
PB - European Signal Processing Conference, EUSIPCO
T2 - 28th European Signal Processing Conference, EUSIPCO 2020
Y2 - 24 August 2020 through 28 August 2020
ER -