TY - GEN
T1 - Communication-Constrained Distributed Learning
T2 - 2023 IEEE Global Communications Conference, GLOBECOM 2023
AU - Yu, Siyuan
AU - Chen, Wei
AU - Poor, H. Vincent
N1 - Publisher Copyright:
© 2023 IEEE.
PY - 2023
Y1 - 2023
N2 - Distributed machine learning, including federated learning, has attracted considerable attention due to its potential to scale computational resources, reduce training time, and help protect user privacy. As one of the key enablers of distributed learning, asynchronous optimization allows multiple workers to process data simultaneously without paying the cost of synchronization delay. However, given limited communication bandwidth, asynchronous optimization can be hampered by gradient staleness, which severely hinders the learning process. In this paper, we present a communication-constrained distributed learning scheme, in which asynchronous stochastic gradients generated by parallel workers are transmitted over a shared medium or link. Our aim is to minimize the average training time by striking the optimal tradeoff between the number of parallel workers and their gradient staleness. To this end, a queueing-theoretic model is formulated, which allows us to find the optimal number of workers participating in the asynchronous optimization. Furthermore, we also leverage the packet arrival time at the parameter server, also referred to as Timing Side Information (TSI), to compress the staleness information for staleness-aware Asynchronous Stochastic Gradient Descent (Asyn-SGD). Numerical results demonstrate a substantial reduction in training time owing to both the worker selection and the TSI-aided compression of staleness information.
AB - Distributed machine learning, including federated learning, has attracted considerable attention due to its potential to scale computational resources, reduce training time, and help protect user privacy. As one of the key enablers of distributed learning, asynchronous optimization allows multiple workers to process data simultaneously without paying the cost of synchronization delay. However, given limited communication bandwidth, asynchronous optimization can be hampered by gradient staleness, which severely hinders the learning process. In this paper, we present a communication-constrained distributed learning scheme, in which asynchronous stochastic gradients generated by parallel workers are transmitted over a shared medium or link. Our aim is to minimize the average training time by striking the optimal tradeoff between the number of parallel workers and their gradient staleness. To this end, a queueing-theoretic model is formulated, which allows us to find the optimal number of workers participating in the asynchronous optimization. Furthermore, we also leverage the packet arrival time at the parameter server, also referred to as Timing Side Information (TSI), to compress the staleness information for staleness-aware Asynchronous Stochastic Gradient Descent (Asyn-SGD). Numerical results demonstrate a substantial reduction in training time owing to both the worker selection and the TSI-aided compression of staleness information.
KW - Asynchronous optimization
KW - federated learning
KW - gradient staleness
KW - stochastic gradient descent
KW - timing side information
UR - http://www.scopus.com/inward/record.url?scp=85187365396&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85187365396&partnerID=8YFLogxK
U2 - 10.1109/GLOBECOM54140.2023.10437351
DO - 10.1109/GLOBECOM54140.2023.10437351
M3 - Conference contribution
AN - SCOPUS:85187365396
T3 - Proceedings - IEEE Global Communications Conference, GLOBECOM
SP - 1495
EP - 1500
BT - GLOBECOM 2023 - 2023 IEEE Global Communications Conference
PB - Institute of Electrical and Electronics Engineers Inc.
Y2 - 4 December 2023 through 8 December 2023
ER -