We show that, with random initialization, gradient descent converges to a local minimizer almost surely. The proof applies the Stable Manifold Theorem from dynamical systems theory.
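The sketch below illustrates the claim on a toy function. The function is our own choice and does not come from the paper: f(x, y) = x^2/2 + y^4/4 - y^2/2. It has a strict saddle at the origin, where the Hessian is diag(1, -1), and its minimizers are at (0, 1) and (0, -1). The saddle's stable manifold is the line y = 0, which has measure zero. Gradient descent started exactly on that line converges to the saddle, while a random start almost never lands there. The step size 0.05 is also an assumption. It was picked to stay below 1/L on the box [-2, 2]^2, where the Hessian's largest eigenvalue is at most L = 11.

```python
import random

# Toy objective (illustrative assumption, not from the paper):
#   f(x, y) = x^2/2 + y^4/4 - y^2/2
# (0, 0) is a strict saddle (Hessian diag(1, -1)); (0, +1) and (0, -1) are minimizers.


def grad(x, y):
    """Gradient of f at (x, y)."""
    return x, y**3 - y


def gradient_descent(x, y, step=0.05, iters=1000):
    """Fixed-step gradient descent; step is assumed below 1/L on [-2, 2]^2."""
    for _ in range(iters):
        gx, gy = grad(x, y)
        x, y = x - step * gx, y - step * gy
    return x, y


# Started exactly on the saddle's stable manifold (the line y = 0),
# the y-gradient stays zero and the iterates converge to the saddle.
print("from (1, 0):", gradient_descent(1.0, 0.0))

# A uniform random start hits y = 0 with probability zero, so the
# iterates escape the saddle and approach a minimizer (0, +1) or (0, -1).
random.seed(0)
for _ in range(5):
    x0, y0 = random.uniform(-2, 2), random.uniform(-2, 2)
    print(f"from ({x0:+.3f}, {y0:+.3f}):", gradient_descent(x0, y0))
```

Near the saddle, each step multiplies the y-coordinate by roughly 1 + step. This unstable direction is the one the Stable Manifold Theorem argument relies on: only starting points on the measure-zero line y = 0 fail to be pushed away.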
Keywords
- Gradient descent
- Local minimum
- Saddle points