TY - GEN
T1 - Decentralized Gossip-Based Stochastic Bilevel Optimization over Communication Networks
AU - Yang, Shuoguang
AU - Zhang, Xuezhou
AU - Wang, Mengdi
N1 - Publisher Copyright:
© 2022 Neural Information Processing Systems Foundation. All rights reserved.
PY - 2022
Y1 - 2022
N2 - Bilevel optimization has gained growing interest, with numerous applications in meta learning, minimax games, reinforcement learning, and nested composition optimization. This paper studies decentralized distributed stochastic bilevel optimization over a network where each agent can only communicate with its neighbors, with motivating examples from multi-task learning, multi-agent learning, and federated learning. We propose a gossip-based decentralized bilevel learning algorithm that allows networked agents to solve both the inner and outer optimization problems in a single timescale and to share information through network propagation. We show that our algorithm enjoys an Õ(1/(Kε²)) per-agent sample complexity for general nonconvex bilevel optimization and Õ(1/(Kε)) for Polyak-Łojasiewicz objectives, achieving a speedup that scales linearly with the network size K. The sample complexities are optimal in both ε and K. We test our algorithm on hyperparameter tuning and decentralized reinforcement learning. Simulated experiments confirm that our algorithm achieves state-of-the-art training efficiency and test accuracy.
AB - Bilevel optimization has gained growing interest, with numerous applications in meta learning, minimax games, reinforcement learning, and nested composition optimization. This paper studies decentralized distributed stochastic bilevel optimization over a network where each agent can only communicate with its neighbors, with motivating examples from multi-task learning, multi-agent learning, and federated learning. We propose a gossip-based decentralized bilevel learning algorithm that allows networked agents to solve both the inner and outer optimization problems in a single timescale and to share information through network propagation. We show that our algorithm enjoys an Õ(1/(Kε²)) per-agent sample complexity for general nonconvex bilevel optimization and Õ(1/(Kε)) for Polyak-Łojasiewicz objectives, achieving a speedup that scales linearly with the network size K. The sample complexities are optimal in both ε and K. We test our algorithm on hyperparameter tuning and decentralized reinforcement learning. Simulated experiments confirm that our algorithm achieves state-of-the-art training efficiency and test accuracy.
UR - http://www.scopus.com/inward/record.url?scp=85143077207&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85143077207&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:85143077207
T3 - Advances in Neural Information Processing Systems
BT - Advances in Neural Information Processing Systems 35 - 36th Conference on Neural Information Processing Systems, NeurIPS 2022
A2 - Koyejo, S.
A2 - Mohamed, S.
A2 - Agarwal, A.
A2 - Belgrave, D.
A2 - Cho, K.
A2 - Oh, A.
PB - Neural Information Processing Systems Foundation
T2 - 36th Conference on Neural Information Processing Systems, NeurIPS 2022
Y2 - 28 November 2022 through 9 December 2022
ER -