TY - JOUR
T1 - Convergence of Update Aware Device Scheduling for Federated Learning at the Wireless Edge
AU - Amiri, Mohammad Mohammadi
AU - Gunduz, Deniz
AU - Kulkarni, Sanjeev R.
AU - Poor, H. Vincent
N1 - Funding Information:
Manuscript received April 8, 2020; revised August 13, 2020 and December 4, 2020; accepted January 10, 2021. Date of publication January 27, 2021; date of current version June 10, 2021. This work was supported in part by the U.S. National Science Foundation under Grant CCF-0939370 and Grant CCF-1908308, in part by the European Research Council Starting Grant BEACON under Grant 677854, and in part by the U.K. EPSRC under Grant EP/T023600/1. The associate editor coordinating the review of this article and approving it for publication was L.-C. Wang. (Corresponding author: Mohammad Mohammadi Amiri.) Mohammad Mohammadi Amiri, Sanjeev R. Kulkarni, and H. Vincent Poor are with the Department of Electrical Engineering, Princeton University, Princeton, NJ 08544 USA (e-mail: mamiri@princeton.edu; kulkarni@princeton.edu; poor@princeton.edu).
Publisher Copyright:
© 2021 IEEE.
PY - 2021/6
Y1 - 2021/6
N2 - We study federated learning (FL) at the wireless edge, where power-limited devices with local datasets collaboratively train a joint model with the help of a remote parameter server (PS). We assume that the devices are connected to the PS through a bandwidth-limited shared wireless channel. At each iteration of FL, a subset of the devices is scheduled to transmit their local model updates to the PS over orthogonal channel resources, while each participating device must compress its model update to accommodate its link capacity. We design novel scheduling and resource allocation policies that decide on the subset of devices to transmit at each round, and how the resources should be allocated among the participating devices, based not only on their channel conditions but also on the significance of their local model updates. We then establish convergence of a wireless FL algorithm with device scheduling, where devices have limited capacity to convey their messages. The results of numerical experiments show that the proposed scheduling policy, based on both the channel conditions and the significance of the local model updates, provides better long-term performance than scheduling policies based on either of the two metrics alone. Furthermore, we observe that when the data is independent and identically distributed (i.i.d.) across devices, selecting a single device at each round provides the best performance, while when the data distribution is non-i.i.d., scheduling multiple devices at each round improves the performance. This observation is verified by the convergence result, which shows that the number of scheduled devices should increase for a less diverse and more biased data distribution.
AB - We study federated learning (FL) at the wireless edge, where power-limited devices with local datasets collaboratively train a joint model with the help of a remote parameter server (PS). We assume that the devices are connected to the PS through a bandwidth-limited shared wireless channel. At each iteration of FL, a subset of the devices is scheduled to transmit their local model updates to the PS over orthogonal channel resources, while each participating device must compress its model update to accommodate its link capacity. We design novel scheduling and resource allocation policies that decide on the subset of devices to transmit at each round, and how the resources should be allocated among the participating devices, based not only on their channel conditions but also on the significance of their local model updates. We then establish convergence of a wireless FL algorithm with device scheduling, where devices have limited capacity to convey their messages. The results of numerical experiments show that the proposed scheduling policy, based on both the channel conditions and the significance of the local model updates, provides better long-term performance than scheduling policies based on either of the two metrics alone. Furthermore, we observe that when the data is independent and identically distributed (i.i.d.) across devices, selecting a single device at each round provides the best performance, while when the data distribution is non-i.i.d., scheduling multiple devices at each round improves the performance. This observation is verified by the convergence result, which shows that the number of scheduled devices should increase for a less diverse and more biased data distribution.
KW - Federated learning
KW - stochastic gradient descent
KW - update aware device selection
UR - http://www.scopus.com/inward/record.url?scp=85100507147&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85100507147&partnerID=8YFLogxK
U2 - 10.1109/TWC.2021.3052681
DO - 10.1109/TWC.2021.3052681
M3 - Article
AN - SCOPUS:85100507147
SN - 1536-1276
VL - 20
SP - 3643
EP - 3658
JO - IEEE Transactions on Wireless Communications
JF - IEEE Transactions on Wireless Communications
IS - 6
M1 - 9337227
ER -