TY - GEN
T1 - PatchGuard: A Provably Robust Defense against Adversarial Patches via Small Receptive Fields and Masking
T2 - 30th USENIX Security Symposium, USENIX Security 2021
AU - Xiang, Chong
AU - Bhagoji, Arjun Nitin
AU - Sehwag, Vikash
AU - Mittal, Prateek
N1 - Funding Information:
We are grateful to David Wagner for shepherding the paper and to the anonymous reviewers at USENIX Security for their valuable feedback. This work was supported in part by the National Science Foundation under grants CNS-1553437 and CNS-1704105, the ARL’s Army Artificial Intelligence Innovation Institute (A2I2), the Office of Naval Research Young Investigator Award, the Army Research Office Young Investigator Prize, a faculty research award from Facebook, a Schmidt DataX award, and a Princeton E-ffiliates Award.
Publisher Copyright:
© 2021 by The USENIX Association. All rights reserved.
PY - 2021
Y1 - 2021
AB - Localized adversarial patches aim to induce misclassification in machine learning models by arbitrarily modifying pixels within a restricted region of an image. Such attacks can be realized in the physical world by attaching the adversarial patch to the object to be misclassified, and defending against them remains an open problem. In this paper, we propose a general defense framework called PatchGuard that achieves high provable robustness against localized adversarial patches while maintaining high clean accuracy. The cornerstone of PatchGuard is the use of CNNs with small receptive fields, which bound the number of features that an adversarial patch can corrupt. Given a bounded number of corrupted features, the problem of designing an adversarial patch defense reduces to that of designing a secure feature aggregation mechanism. Towards this end, we present our robust masking defense, which detects and masks corrupted features to recover the correct prediction. Notably, we can prove the robustness of our defense against any adversary within our threat model. Our extensive evaluation on the ImageNet, ImageNette (a 10-class subset of ImageNet), and CIFAR-10 datasets demonstrates that our defense achieves state-of-the-art provable robust accuracy and clean accuracy.
UR - http://www.scopus.com/inward/record.url?scp=85106619580&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85106619580&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:85106619580
T3 - Proceedings of the 30th USENIX Security Symposium
SP - 2237
EP - 2254
BT - Proceedings of the 30th USENIX Security Symposium
PB - USENIX Association
Y2 - 11 August 2021 through 13 August 2021
ER -