Abstract
Before backpropagation training, it is common to randomly initialize a neural network so that the mean and variance of neural activity are uniform across neurons. Classically, these statistics were defined over an ensemble of random networks. Alternatively, they can be defined over a random sample of inputs to the network. We show analytically and numerically that these two formulations of the principle of mean-variance preservation are very different in deep networks using the rectified linear unit (ReLU) nonlinearity. We numerically investigate training speed after data-dependent initialization of networks to preserve sample mean and variance.
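The abstract does not include code; the following is a minimal sketch of one common form of data-dependent initialization, in which each layer's weights and biases are rescaled so that the sample mean and variance of its pre-activations, measured on a batch of real inputs, become 0 and 1. The plain NumPy feedforward ReLU network, the layer sizes, and the per-neuron rescaling rule are illustrative assumptions, not the authors' exact procedure.

```python
# Sketch: data-dependent initialization that preserves *sample* mean and variance.
# All architectural choices below are assumptions for illustration only.
import numpy as np

rng = np.random.default_rng(0)

def init_layer(fan_in, fan_out):
    # Standard "ensemble" He-style random initialization for a ReLU layer.
    W = rng.normal(0.0, np.sqrt(2.0 / fan_in), size=(fan_in, fan_out))
    b = np.zeros(fan_out)
    return W, b

def data_dependent_rescale(x, layers, eps=1e-8):
    """Adjust each layer so its pre-activations have sample mean 0 and
    variance 1 on the input batch x; returns the adjusted layers."""
    h = x
    adjusted = []
    for W, b in layers:
        z = h @ W + b
        # Per-neuron sample statistics over the batch of inputs.
        mu, sigma = z.mean(axis=0), z.std(axis=0) + eps
        W, b = W / sigma, (b - mu) / sigma
        z = h @ W + b                # recompute with adjusted parameters
        h = np.maximum(z, 0.0)       # ReLU activation fed to the next layer
        adjusted.append((W, b))
    return adjusted

# Example: a batch of 256 inputs through two hypothetical 512-unit ReLU layers.
x = rng.normal(size=(256, 512))
layers = [init_layer(512, 512), init_layer(512, 512)]
layers = data_dependent_rescale(x, layers)
```

In contrast, the classical "ensemble" formulation fixes the weight scale analytically (as in `init_layer` above) without looking at data, which is the distinction the paper examines.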
Original language | English (US)
---|---
Article number | 033135
Journal | Physical Review Research
Volume | 2
Issue number | 3
DOIs |
State | Published - Jul 2020
All Science Journal Classification (ASJC) codes
- General Physics and Astronomy