Convergence of adversarial training in overparametrized neural networks

Ruiqi Gao, Tianle Cai, Haochuan Li, Liwei Wang, Cho Jui Hsieh, Jason D. Lee

Research output: Contribution to journalConference articlepeer-review

57 Scopus citations


Neural networks are vulnerable to adversarial examples, i.e. inputs that are imperceptibly perturbed from natural data and yet incorrectly classified by the network. Adversarial training [31], a heuristic form of robust optimization that alternates between minimization and maximization steps, has proven to be among the most successful methods to train networks to be robust against a pre-defined family of perturbations. This paper provides a partial answer to the success of adversarial training, by showing that it converges to a network where the surrogate loss with respect to the the attack algorithm is within e of the optimal robust loss. Then we show that the optimal robust loss is also close to zero, hence adversarial training finds a robust classifier. The analysis technique leverages recent work on the analysis of neural networks via Neural Tangent Kernel (NTK), combined with motivation from online-learning when the maximization is solved by a heuristic, and the expressiveness of the NTK kernel in the `8-norm. In addition, we also prove that robust interpolation requires more model capacity, supporting the evidence that adversarial training requires wider networks.

Original languageEnglish (US)
JournalAdvances in Neural Information Processing Systems
StatePublished - 2019
Event33rd Annual Conference on Neural Information Processing Systems, NeurIPS 2019 - Vancouver, Canada
Duration: Dec 8 2019Dec 14 2019

All Science Journal Classification (ASJC) codes

  • Computer Networks and Communications
  • Information Systems
  • Signal Processing


Dive into the research topics of 'Convergence of adversarial training in overparametrized neural networks'. Together they form a unique fingerprint.

Cite this