TY - GEN
T1 - Meta-Reinforcement Learning for Trajectory Design in Wireless UAV Networks
AU - Hu, Ye
AU - Chen, Mingzhe
AU - Saad, Walid
AU - Poor, H. Vincent
AU - Cui, Shuguang
N1 - Funding Information:
This work was supported by the Key Area R&D Program of Guangdong Province with grant No. 2018B030338001, and in part by the Natural Science Foundation of China with grant NSFC-61629101, and by the U.S. National Science Foundation under Grants CNS-1617896, CCF-0939370 and CCF-1908308.
Publisher Copyright:
© 2020 IEEE.
PY - 2020/12
Y1 - 2020/12
N2 - In this paper, the design of an optimal trajectory for an energy-constrained drone operating in dynamic network environments is studied. In the considered model, a drone base station (DBS) is dispatched to provide uplink connectivity to ground users whose demand is dynamic and unpredictable. In this case, the DBS's trajectory must be adaptively adjusted to satisfy the dynamic user access requests. To this end, a meta-learning algorithm is proposed to adapt the DBS's trajectory when it encounters novel environments by tuning a reinforcement learning (RL) solution. The meta-learning algorithm provides a solution that quickly adapts the DBS to novel environments based on limited prior experience. The meta-tuned RL is shown to yield faster convergence to the optimal coverage in unseen environments, with considerably low computational complexity, compared to the baseline policy gradient algorithm. Simulation results show that the proposed meta-learning solution yields a 25% improvement in convergence speed and about a 10% improvement in the DBS's communication performance compared to a baseline policy gradient algorithm. Meanwhile, the probability that the DBS serves over 50% of user requests increases by about 27% compared to the baseline policy gradient algorithm.
AB - In this paper, the design of an optimal trajectory for an energy-constrained drone operating in dynamic network environments is studied. In the considered model, a drone base station (DBS) is dispatched to provide uplink connectivity to ground users whose demand is dynamic and unpredictable. In this case, the DBS's trajectory must be adaptively adjusted to satisfy the dynamic user access requests. To this end, a meta-learning algorithm is proposed to adapt the DBS's trajectory when it encounters novel environments by tuning a reinforcement learning (RL) solution. The meta-learning algorithm provides a solution that quickly adapts the DBS to novel environments based on limited prior experience. The meta-tuned RL is shown to yield faster convergence to the optimal coverage in unseen environments, with considerably low computational complexity, compared to the baseline policy gradient algorithm. Simulation results show that the proposed meta-learning solution yields a 25% improvement in convergence speed and about a 10% improvement in the DBS's communication performance compared to a baseline policy gradient algorithm. Meanwhile, the probability that the DBS serves over 50% of user requests increases by about 27% compared to the baseline policy gradient algorithm.
UR - http://www.scopus.com/inward/record.url?scp=85099887801&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85099887801&partnerID=8YFLogxK
U2 - 10.1109/GLOBECOM42002.2020.9322414
DO - 10.1109/GLOBECOM42002.2020.9322414
M3 - Conference contribution
AN - SCOPUS:85099887801
T3 - 2020 IEEE Global Communications Conference, GLOBECOM 2020 - Proceedings
BT - 2020 IEEE Global Communications Conference, GLOBECOM 2020 - Proceedings
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2020 IEEE Global Communications Conference, GLOBECOM 2020
Y2 - 7 December 2020 through 11 December 2020
ER -