## Abstract

We consider the problem of learning a one-hidden-layer neural network: we assume the input x ∈ R^{d} is drawn from a Gaussian distribution and the label is y = a^{⊤}σ(Bx) + ξ, where a is a nonnegative vector in R^{m} with m ≤ d, B ∈ R^{m×d} is a full-rank weight matrix, and ξ is a noise vector. We first give an analytic formula for the population risk of the standard squared loss and demonstrate that it implicitly attempts to decompose a sequence of low-rank tensors simultaneously. Inspired by the formula, we design a non-convex objective function G(·) whose landscape is guaranteed to have the following properties: 1. All local minima of G are also global minima. 2. All global minima of G correspond to the ground-truth parameters. 3. The value and gradient of G can be estimated using samples. With these properties, stochastic gradient descent on G provably converges to the global minimum and learns the ground-truth parameters. We also prove finite sample complexity results and validate the results by simulations.
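As a rough illustration of the data model described in the abstract, the following NumPy sketch samples Gaussian inputs, generates labels y = a^{⊤}σ(Bx) + ξ, and evaluates the empirical version of the standard squared loss. Several choices here are assumptions rather than details from the abstract: the activation σ is taken to be ReLU, the noise is taken to be zero-mean Gaussian with standard deviation `noise_std`, and the function names `sample_data` and `squared_loss` are hypothetical. The sketch does not implement the paper's objective G, which the abstract does not specify.

```python
import numpy as np


def relu(z):
    """ReLU activation (an assumed choice of sigma; the abstract does not fix it)."""
    return np.maximum(z, 0.0)


def sample_data(n, a, B, noise_std=0.1, activation=relu, rng=None):
    """Draw n samples from the model y = a^T sigma(B x) + xi.

    x is standard Gaussian in R^d; xi is assumed to be Gaussian noise.
    """
    rng = np.random.default_rng() if rng is None else rng
    m, d = B.shape
    X = rng.standard_normal((n, d))          # rows are the inputs x
    xi = noise_std * rng.standard_normal(n)  # one noise value per sample
    y = activation(X @ B.T) @ a + xi         # labels, shape (n,)
    return X, y


def squared_loss(a_hat, B_hat, X, y, activation=relu):
    """Empirical squared loss of candidate parameters (a_hat, B_hat)."""
    pred = activation(X @ B_hat.T) @ a_hat
    return float(np.mean((pred - y) ** 2))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d, m = 8, 4                               # m <= d, as in the abstract
    a = rng.uniform(0.5, 1.5, size=m)         # nonnegative output weights
    B = rng.standard_normal((m, d))           # full rank with probability 1
    X, y = sample_data(2000, a, B, noise_std=0.1, rng=rng)
    print("loss at ground truth:", squared_loss(a, B, X, y))
```

At the ground-truth parameters the empirical loss reduces to the average squared noise, so with `noise_std=0.1` it should be close to 0.01 for a large sample.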

Original language | English (US) |
---|---|
State | Published - 2018 |
Externally published | Yes |
Event | 6th International Conference on Learning Representations, ICLR 2018 - Vancouver, Canada; Duration: Apr 30 2018 → May 3 2018 |

### Conference

Conference | 6th International Conference on Learning Representations, ICLR 2018 |
---|---|
Country/Territory | Canada |
City | Vancouver |
Period | 4/30/18 → 5/3/18 |

## All Science Journal Classification (ASJC) codes

- Language and Linguistics
- Education
- Computer Science Applications
- Linguistics and Language