TY - JOUR
T1 - Multi-Agent Reinforcement Learning for Cooperative Coded Caching via Homotopy Optimization
AU - Wu, Xiongwei
AU - Li, Jun
AU - Xiao, Ming
AU - Ching, P. C.
AU - Vincent Poor, H.
N1 - Funding Information:
Manuscript received June 14, 2020; revised December 30, 2020; accepted March 5, 2021. Date of publication March 23, 2021; date of current version August 12, 2021. This work was supported in part by the U.S. National Science Foundation under Grant CCF-1908308, in part by the EU Marie Sklodowska-Curie Actions Project entitled “High-reliability Low-latency Communications with Network Coding,” in part by the Swedish Foundation for International Cooperation in Research and Higher Education (STINT), project “Efficient and Secure Distributed Machine Learning with Gradient Descend,” and in part by the National Natural Science Foundation of China under Grant 61872184. The associate editor coordinating the review of this article and approving it for publication was X. Cheng. (Corresponding authors: Jun Li; Xiongwei Wu.) Xiongwei Wu and P. C. Ching are with the Department of Electronic Engineering, The Chinese University of Hong Kong, Hong Kong, SAR, China (e-mail: xwwu@ee.cuhk.edu.hk; pcching@ee.cuhk.edu.hk).
Publisher Copyright:
© 2021 IEEE.
PY - 2021/8
Y1 - 2021/8
N2 - Introducing cooperative coded caching into small cell networks is a promising approach to reducing traffic loads. By encoding content via maximum distance separable (MDS) codes, coded fragments can be collectively cached at small-cell base stations (SBSs) to enhance caching efficiency. However, content popularity is usually time-varying and unknown in practice. As a result, cached content needs to be updated intelligently, taking into account limited caching storage and the interactions among SBSs. In response to these challenges, we propose a multi-agent deep reinforcement learning (DRL) framework to intelligently update cached content in dynamic environments. With the goal of minimizing long-term expected fronthaul traffic loads, we first model dynamic coded caching as a cooperative multi-agent Markov decision process. Owing to the use of MDS coding, the resulting decision-making falls into a class of constrained reinforcement learning problems with continuous decision variables. To deal with this difficulty, we custom-build a novel DRL algorithm by embedding homotopy optimization into a deep deterministic policy gradient formalism. Next, to equip the caching framework with an effective trade-off between complexity and performance, we propose centralized, partially decentralized, and fully decentralized caching controls by applying the derived DRL approach. Simulation results demonstrate the superior performance of the proposed multi-agent framework.
AB - Introducing cooperative coded caching into small cell networks is a promising approach to reducing traffic loads. By encoding content via maximum distance separable (MDS) codes, coded fragments can be collectively cached at small-cell base stations (SBSs) to enhance caching efficiency. However, content popularity is usually time-varying and unknown in practice. As a result, cached content needs to be updated intelligently, taking into account limited caching storage and the interactions among SBSs. In response to these challenges, we propose a multi-agent deep reinforcement learning (DRL) framework to intelligently update cached content in dynamic environments. With the goal of minimizing long-term expected fronthaul traffic loads, we first model dynamic coded caching as a cooperative multi-agent Markov decision process. Owing to the use of MDS coding, the resulting decision-making falls into a class of constrained reinforcement learning problems with continuous decision variables. To deal with this difficulty, we custom-build a novel DRL algorithm by embedding homotopy optimization into a deep deterministic policy gradient formalism. Next, to equip the caching framework with an effective trade-off between complexity and performance, we propose centralized, partially decentralized, and fully decentralized caching controls by applying the derived DRL approach. Simulation results demonstrate the superior performance of the proposed multi-agent framework.
KW - MDS codes
KW - Small cell networks
KW - deep multi-agent reinforcement learning
KW - homotopy optimization
UR - http://www.scopus.com/inward/record.url?scp=85103245099&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85103245099&partnerID=8YFLogxK
U2 - 10.1109/TWC.2021.3066458
DO - 10.1109/TWC.2021.3066458
M3 - Article
AN - SCOPUS:85103245099
SN - 1536-1276
VL - 20
SP - 5258
EP - 5272
JO - IEEE Transactions on Wireless Communications
JF - IEEE Transactions on Wireless Communications
IS - 8
M1 - 9384286
ER -