Abstract
Robust training methods typically defend against specific attack types, such as ℓp attacks with fixed budgets, and rarely account for the fact that defenders may encounter new attacks over time. A natural solution is to adapt the defended model to new adversaries as they arise via fine-tuning, a method which we call continual robust training (CRT). However, when implemented naively, fine-tuning on new attacks degrades robustness on previous attacks. This raises the question: how can we improve the initial training and fine-tuning of the model to simultaneously achieve robustness against previous and new attacks? We present theoretical results which show that the gap in a model's robustness against different attacks is bounded by how far each attack perturbs a sample in the model's logit space, suggesting that regularizing with respect to this logit-space distance can help maintain robustness against previous attacks. Extensive experiments on 3 datasets (CIFAR-10, CIFAR-100, and Imagenette) and over 100 attack combinations demonstrate that the proposed regularization improves robust accuracy with little overhead in training time. Our findings and open-source code lay the groundwork for the deployment of models robust to evolving attacks.
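The core idea described in the abstract, penalizing how far an attack displaces a sample in the model's logit space alongside the usual robust classification loss, can be sketched as follows. This is a minimal illustrative loss in NumPy, not the paper's implementation; the function name `crt_regularized_loss` and the weighting parameter `lam` are hypothetical placeholders, and the model's clean and adversarial logits are assumed to be computed elsewhere.

```python
import numpy as np

def cross_entropy(logits, label):
    """Numerically stable softmax cross-entropy for a single example."""
    z = logits - logits.max()
    log_probs = z - np.log(np.exp(z).sum())
    return -log_probs[label]

def crt_regularized_loss(logits_clean, logits_adv, label, lam=1.0):
    """Hypothetical sketch of logit-space regularization for continual
    robust training: classification loss on the adversarial example plus
    a penalty on the logit-space distance the attack induces."""
    task_loss = cross_entropy(logits_adv, label)
    # Distance the attack moved the sample in logit space.
    logit_dist = np.linalg.norm(logits_adv - logits_clean)
    return task_loss + lam * logit_dist
```

Under this sketch, an attack that barely moves the logits incurs almost no extra penalty, while one that displaces them far in logit space is penalized heavily, which is the mechanism the abstract suggests for preserving robustness to previously seen attacks during fine-tuning.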
| Original language | English (US) |
|---|---|
| Pages (from-to) | 11954-12000 |
| Number of pages | 47 |
| Journal | Proceedings of Machine Learning Research |
| Volume | 267 |
| State | Published - 2025 |
| Externally published | Yes |
| Event | 42nd International Conference on Machine Learning, ICML 2025 - Vancouver, Canada Duration: Jul 13 2025 → Jul 19 2025 |
All Science Journal Classification (ASJC) codes
- Software
- Control and Systems Engineering
- Statistics and Probability
- Artificial Intelligence