TY - GEN
T1 - Communication-computation efficient gradient coding
AU - Ye, Min
AU - Abbe, Emmanuel
N1 - Publisher Copyright:
© 35th International Conference on Machine Learning, ICML 2018. All Rights Reserved.
PY - 2018
Y1 - 2018
AB - This paper develops coding techniques to reduce the running time of distributed learning tasks. It characterizes the fundamental tradeoff in computing gradients among three parameters: computation load, straggler tolerance and communication cost. It further gives an explicit coding scheme, based on recursive polynomial constructions, that achieves the optimal tradeoff by coding both across data subsets and across vector components. As a result, the proposed scheme makes it possible to minimize the running time of gradient computations. The scheme is implemented in Python with the mpi4py package and run on Amazon EC2 clusters. Results show that the proposed scheme maintains the same generalization error while reducing the running time by 32% compared to uncoded schemes and by 23% compared to prior coded schemes that focus only on stragglers (Tandon et al., ICML 2017).
UR - http://www.scopus.com/inward/record.url?scp=85057297083&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85057297083&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:85057297083
T3 - 35th International Conference on Machine Learning, ICML 2018
SP - 9716p
BT - 35th International Conference on Machine Learning, ICML 2018
A2 - Dy, Jennifer
A2 - Krause, Andreas
PB - International Machine Learning Society (IMLS)
T2 - 35th International Conference on Machine Learning, ICML 2018
Y2 - 10 July 2018 through 15 July 2018
ER -