TY - GEN
T1 - An Upper-Bound on the Required Size of a Neural Network Classifier
AU - Valavi, Hossein
AU - Ramadge, Peter J.
N1 - Publisher Copyright:
© 2018 IEEE.
PY - 2018/9/10
Y1 - 2018/9/10
N2 - There is growing interest in understanding the impact of architectural parameters such as depth, width, and the type of activation function on the performance of a neural network. We provide an upper bound on the number of free parameters a ReLU-type neural network needs to exactly fit the training data. Whether a net of this size generalizes to test data will be governed by the fidelity of the training data and the applicability of the principle of Occam's Razor. We introduce the concept of s-separability and show that for the special case of (c-1)-separable training data with c classes, a neural network with (d + 2c) parameters can achieve 100% training classification accuracy, where d is the dimension of the data. It is also shown that if the number of free parameters is at least (d + 2p), where p is the size of the training set, the neural network can memorize each training example. Finally, a framework is introduced for finding a neural network that achieves a given training error, subject to an upper bound on layer width.
AB - There is growing interest in understanding the impact of architectural parameters such as depth, width, and the type of activation function on the performance of a neural network. We provide an upper bound on the number of free parameters a ReLU-type neural network needs to exactly fit the training data. Whether a net of this size generalizes to test data will be governed by the fidelity of the training data and the applicability of the principle of Occam's Razor. We introduce the concept of s-separability and show that for the special case of (c-1)-separable training data with c classes, a neural network with (d + 2c) parameters can achieve 100% training classification accuracy, where d is the dimension of the data. It is also shown that if the number of free parameters is at least (d + 2p), where p is the size of the training set, the neural network can memorize each training example. Finally, a framework is introduced for finding a neural network that achieves a given training error, subject to an upper bound on layer width.
KW - Deep Learning
KW - Neural Networks
UR - http://www.scopus.com/inward/record.url?scp=85054199232&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85054199232&partnerID=8YFLogxK
U2 - 10.1109/ICASSP.2018.8461635
DO - 10.1109/ICASSP.2018.8461635
M3 - Conference contribution
AN - SCOPUS:85054199232
SN - 9781538646588
T3 - ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
SP - 2356
EP - 2360
BT - 2018 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2018 - Proceedings
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2018 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2018
Y2 - 15 April 2018 through 20 April 2018
ER -