TY - JOUR
T1 - Federated Learning with Differential Privacy
T2 - Algorithms and Performance Analysis
AU - Wei, Kang
AU - Li, Jun
AU - Ding, Ming
AU - Ma, Chuan
AU - Yang, Howard H.
AU - Farokhi, Farhad
AU - Jin, Shi
AU - Quek, Tony Q.S.
AU - Poor, H. Vincent
N1 - Funding Information:
Manuscript received December 6, 2019; revised March 20, 2020; accepted April 11, 2020. Date of publication April 17, 2020; date of current version June 16, 2020. This work was supported in part by the National Key Research and Development Program under Grant 2018YFB1004800, in part by the National Natural Science Foundation of China under Grant 61872184 and Grant 61727802, in part by the SUTD Growth Plan Grant for AI, in part by the U.S. National Science Foundation under Grant CCF-1908308, and in part by the Princeton Center for Statistics and Machine Learning under a Data X Grant. The associate editor coordinating the review of this manuscript and approving it for publication was Dr. Aris Gkoulalas-Divanis. (Corresponding authors: Jun Li; Chuan Ma.) Kang Wei and Chuan Ma are with the School of Electronic and Optical Engineering, Nanjing University of Science and Technology, Nanjing 210094, China (e-mail: kang.wei@njust.edu.cn; chuan.ma@njust.edu.cn).
Publisher Copyright:
© 2020 IEEE.
PY - 2020
Y1 - 2020
N2 - Federated learning (FL), as a type of distributed machine learning, is capable of significantly preserving clients' private data from being exposed to adversaries. Nevertheless, private information can still be divulged by analyzing uploaded parameters from clients, e.g., weights trained in deep neural networks. In this paper, to effectively prevent information leakage, we propose a novel framework based on the concept of differential privacy (DP), in which artificial noise is added to parameters at the clients' side before aggregation, namely, noising before model aggregation FL (NbAFL). First, we prove that NbAFL can satisfy DP under distinct protection levels by properly adapting the variances of the artificial noise. Then we develop a theoretical convergence bound on the loss function of the trained FL model in NbAFL. Specifically, the theoretical bound reveals the following three key properties: 1) there is a tradeoff between convergence performance and privacy protection levels, i.e., better convergence performance leads to a lower protection level; 2) given a fixed privacy protection level, increasing the number N of overall clients participating in FL can improve the convergence performance; and 3) there is an optimal number of aggregation times (communication rounds) in terms of convergence performance for a given protection level. Furthermore, we propose a K-client random scheduling strategy, where K (1 ≤ K < N) clients are randomly selected from the N overall clients to participate in each aggregation. We also develop a corresponding convergence bound for the loss function in this case and show that the K-client random scheduling strategy retains the above three properties. Moreover, we find that there is an optimal K that achieves the best convergence performance at a fixed privacy level. Evaluations demonstrate that our theoretical results are consistent with simulations, thereby facilitating the design of various privacy-preserving FL algorithms with different tradeoff requirements on convergence performance and privacy levels.
AB - Federated learning (FL), as a type of distributed machine learning, is capable of significantly preserving clients' private data from being exposed to adversaries. Nevertheless, private information can still be divulged by analyzing uploaded parameters from clients, e.g., weights trained in deep neural networks. In this paper, to effectively prevent information leakage, we propose a novel framework based on the concept of differential privacy (DP), in which artificial noise is added to parameters at the clients' side before aggregation, namely, noising before model aggregation FL (NbAFL). First, we prove that NbAFL can satisfy DP under distinct protection levels by properly adapting the variances of the artificial noise. Then we develop a theoretical convergence bound on the loss function of the trained FL model in NbAFL. Specifically, the theoretical bound reveals the following three key properties: 1) there is a tradeoff between convergence performance and privacy protection levels, i.e., better convergence performance leads to a lower protection level; 2) given a fixed privacy protection level, increasing the number N of overall clients participating in FL can improve the convergence performance; and 3) there is an optimal number of aggregation times (communication rounds) in terms of convergence performance for a given protection level. Furthermore, we propose a K-client random scheduling strategy, where K (1 ≤ K < N) clients are randomly selected from the N overall clients to participate in each aggregation. We also develop a corresponding convergence bound for the loss function in this case and show that the K-client random scheduling strategy retains the above three properties. Moreover, we find that there is an optimal K that achieves the best convergence performance at a fixed privacy level. Evaluations demonstrate that our theoretical results are consistent with simulations, thereby facilitating the design of various privacy-preserving FL algorithms with different tradeoff requirements on convergence performance and privacy levels.
KW - Federated learning
KW - client selection
KW - convergence performance
KW - differential privacy
KW - information leakage
UR - http://www.scopus.com/inward/record.url?scp=85083775015&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85083775015&partnerID=8YFLogxK
U2 - 10.1109/TIFS.2020.2988575
DO - 10.1109/TIFS.2020.2988575
M3 - Article
AN - SCOPUS:85083775015
SN - 1556-6013
VL - 15
SP - 3454
EP - 3469
JO - IEEE Transactions on Information Forensics and Security
JF - IEEE Transactions on Information Forensics and Security
M1 - 9069945
ER -
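
Note: as a companion to the abstract above, the following is a minimal Python sketch of the two mechanisms this record describes, noising before model aggregation and K-client random scheduling. It is an illustration only, not the authors' implementation: the function names, the clipping threshold C, and the single-exposure Gaussian calibration are assumptions, whereas the paper derives noise variances tailored to the privacy level and the number of aggregation rounds.

import math
import numpy as np

def clip_update(w, C):
    # Bound each client's contribution: scale the parameter vector so its
    # L2 norm is at most C (this caps the sensitivity of one upload).
    norm = np.linalg.norm(w)
    return w if norm <= C else w * (C / norm)

def gaussian_sigma(C, epsilon, delta):
    # Standard Gaussian-mechanism noise scale for a single exposure of an
    # update with L2 sensitivity C; an assumption here, since the paper
    # refines the variance for repeated aggregations and server-side noise.
    return C * math.sqrt(2.0 * math.log(1.25 / delta)) / epsilon

def nbafl_round(client_weights, C, epsilon, delta, K, rng):
    # One communication round: K of the N clients are scheduled uniformly
    # at random, each clips and noises its local model before upload, and
    # the server averages the noisy uploads into the global model.
    N = len(client_weights)
    sigma = gaussian_sigma(C, epsilon, delta)
    selected = rng.choice(N, size=K, replace=False)
    noisy = [clip_update(client_weights[i], C) +
             rng.normal(0.0, sigma, size=client_weights[i].shape)
             for i in selected]
    return np.mean(noisy, axis=0)

# Example: 10 clients with 5-dimensional models, K = 4, (epsilon, delta) = (1.0, 1e-5).
rng = np.random.default_rng(0)
weights = [rng.standard_normal(5) for _ in range(10)]
global_model = nbafl_round(weights, C=1.0, epsilon=1.0, delta=1e-5, K=4, rng=rng)

The choice 1 ≤ K < N mirrors the abstract's scheduling constraint; the paper's analysis is what identifies the optimal K and the optimal number of rounds for a fixed privacy level.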