TY - GEN
T1 - Privacy Auditing of Large Language Models
AU - Panda, Ashwinee
AU - Tang, Xinyu
AU - Nasr, Milad
AU - Choquette-Choo, Christopher A.
AU - Mittal, Prateek
N1 - Publisher Copyright: © 2025 13th International Conference on Learning Representations, ICLR 2025. All rights reserved.
PY - 2025
Y1 - 2025
N2 - Current techniques for privacy auditing of large language models (LLMs) have limited efficacy: they rely on basic approaches to generate canaries, which leads to weak membership inference attacks that in turn give loose lower bounds on the empirical privacy leakage. We develop canaries that are far more effective than those used in prior work under threat models that cover a range of realistic settings. We demonstrate through extensive experiments on multiple families of fine-tuned LLMs that our approach sets a new standard for detection of privacy leakage. For measuring the memorization rate of non-privately trained LLMs, our designed canaries surpass prior approaches. For example, on the Qwen2.5-0.5B model, our designed canaries achieve 49.6% TPR at 1% FPR, vastly surpassing the prior approach's 4.2% TPR at 1% FPR. Our method can be used to provide a privacy audit of ε ≈ 1 for a model trained with theoretical ε of 4. To the best of our knowledge, this is the first time that a privacy audit of LLM training has achieved nontrivial auditing success in the setting where the attacker cannot train shadow models, insert gradient canaries, or access the model at every iteration.
UR - https://www.scopus.com/pages/publications/105010220635
UR - https://www.scopus.com/pages/publications/105010220635#tab=citedBy
M3 - Conference contribution
AN - SCOPUS:105010220635
T3 - 13th International Conference on Learning Representations, ICLR 2025
SP - 10573
EP - 10589
BT - 13th International Conference on Learning Representations, ICLR 2025
PB - International Conference on Learning Representations, ICLR
T2 - 13th International Conference on Learning Representations, ICLR 2025
Y2 - 24 April 2025 through 28 April 2025
ER -