TY - GEN
T1 - Distributed Stochastic Optimization with Random Communication and Computational Delays
T2 - 59th Annual IEEE International Conference on Communications, ICC 2024
AU - Yu, Siyuan
AU - Chen, Wei
AU - Poor, H. Vincent
N1 - Publisher Copyright:
© 2024 IEEE.
PY - 2024
Y1 - 2024
N2 - Distributed stochastic optimization has attracted considerable attention due to its potential for scaling computational resources, reducing training time, and helping protect user privacy in decentralized machine learning. However, stragglers and limited bandwidth may induce random computational and communication delays, thereby severely hindering the optimization or learning process. Accordingly, we are interested in optimal policies and their performance analysis for latency-aware distributed Stochastic Gradient Descent (SGD). To understand the effects of gradient staleness and gradient error in distributed optimization, both of which may determine the convergence time, we present a unified framework based on stochastic delay differential equations to characterize the random convergence time. Interestingly, we find that the average convergence time is much more sensitive to gradient staleness than to gradient error. To provide further insight, we show that the time cost of fully asynchronous SGD is approximately determined by the product of the gradient staleness and the 2-norm of the Hessian matrix of the objective function. Moreover, small staleness may slightly accelerate SGD, while large staleness will result in divergence.
AB - Distributed stochastic optimization has attracted considerable attention due to its potential for scaling computational resources, reducing training time, and helping protect user privacy in decentralized machine learning. However, stragglers and limited bandwidth may induce random computational and communication delays, thereby severely hindering the optimization or learning process. Accordingly, we are interested in optimal policies and their performance analysis for latency-aware distributed Stochastic Gradient Descent (SGD). To understand the effects of gradient staleness and gradient error in distributed optimization, both of which may determine the convergence time, we present a unified framework based on stochastic delay differential equations to characterize the random convergence time. Interestingly, we find that the average convergence time is much more sensitive to gradient staleness than to gradient error. To provide further insight, we show that the time cost of fully asynchronous SGD is approximately determined by the product of the gradient staleness and the 2-norm of the Hessian matrix of the objective function. Moreover, small staleness may slightly accelerate SGD, while large staleness will result in divergence.
KW - Asynchronous optimization
KW - federated learning
KW - gradient staleness
KW - stochastic gradient descent
UR - http://www.scopus.com/inward/record.url?scp=85197369376&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85197369376&partnerID=8YFLogxK
U2 - 10.1109/ICC51166.2024.10622795
DO - 10.1109/ICC51166.2024.10622795
M3 - Conference contribution
AN - SCOPUS:85197369376
T3 - IEEE International Conference on Communications
SP - 3791
EP - 3796
BT - ICC 2024 - IEEE International Conference on Communications
A2 - Valenti, Matthew
A2 - Reed, David
A2 - Torres, Melissa
PB - Institute of Electrical and Electronics Engineers Inc.
Y2 - 9 June 2024 through 13 June 2024
ER -