Adapting to Evolving Adversaries with Regularized Continual Robust Training

  • Sihui Dai
  • Christian Cianfarani
  • Vikash Sehwag
  • Prateek Mittal
  • Arjun Bhagoji

Research output: Contribution to journal › Conference article › peer-review

Abstract

Robust training methods typically defend against specific attack types, such as ℓp attacks with fixed budgets, and rarely account for the fact that defenders may encounter new attacks over time. A natural solution is to adapt the defended model to new adversaries as they arise via fine-tuning, a method which we call continual robust training (CRT). However, when implemented naively, fine-tuning on new attacks degrades robustness on previous attacks. This raises the question: how can we improve the initial training and fine-tuning of the model to simultaneously achieve robustness against previous and new attacks? We present theoretical results which show that the gap in a model's robustness against different attacks is bounded by how far each attack perturbs a sample in the model's logit space, suggesting that regularizing with respect to this logit space distance can help maintain robustness against previous attacks. Extensive experiments on 3 datasets (CIFAR-10, CIFAR-100, and ImageNette) and over 100 attack combinations demonstrate that the proposed regularization improves robust accuracy with little overhead in training time. Our findings and open-source code lay the groundwork for the deployment of models robust to evolving attacks.
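
The idea of penalizing logit-space distance can be illustrated with a small sketch. This is not the authors' implementation: the function names, the `reg_strength` weight, and the choice of a Euclidean (ℓ2) distance between clean and perturbed logits are all assumptions made for illustration, and the paper's exact objective may differ. The sketch only shows the general shape of a per-sample loss that combines cross-entropy on the perturbed input with a penalty on how far the attack moved the logits.

```python
import math
from typing import Sequence


def cross_entropy(logits: Sequence[float], label: int) -> float:
    """Softmax cross-entropy for a single sample, computed stably."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[label]


def logit_distance(clean_logits: Sequence[float],
                   adv_logits: Sequence[float]) -> float:
    """Euclidean distance between clean and perturbed logits.

    Assumption: an l2 distance; the abstract does not name the norm.
    """
    return math.sqrt(sum((a - c) ** 2
                         for c, a in zip(clean_logits, adv_logits)))


def regularized_robust_loss(clean_logits: Sequence[float],
                            adv_logits: Sequence[float],
                            label: int,
                            reg_strength: float = 0.5) -> float:
    """Cross-entropy on the perturbed input plus a logit-distance penalty.

    `reg_strength` is a hypothetical hyperparameter name.
    """
    penalty = logit_distance(clean_logits, adv_logits)
    return cross_entropy(adv_logits, label) + reg_strength * penalty
```

In a training loop, `clean_logits` and `adv_logits` would come from the model's forward pass on an input and its adversarially perturbed version; setting `reg_strength` to zero reduces the loss to plain adversarial cross-entropy.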

Original language: English (US)
Pages (from-to): 11954-12000
Number of pages: 47
Journal: Proceedings of Machine Learning Research
Volume: 267
State: Published - 2025
Externally published: Yes
Event: 42nd International Conference on Machine Learning, ICML 2025 - Vancouver, Canada
Duration: Jul 13 2025 → Jul 19 2025

All Science Journal Classification (ASJC) codes

  • Software
  • Control and Systems Engineering
  • Statistics and Probability
  • Artificial Intelligence
