TY - GEN
T1 - Primal Dual PPO Learning Resource Allocation in Indoor IRS-Aided Networks
AU - Zhang, Haijun
AU - Liu, Xiangnan
AU - Long, Keping
AU - Poor, H. Vincent
N1 - Funding Information:
This work was supported by the National Key R&D Program of China (2019YFB1803304), the National Natural Science Foundation of China (61822104, 61771044), the Fundamental Research Funds for the Central Universities (FRFTP-19-002C1, RC1631), and the Beijing Top Discipline for Artificial Intelligent Science and Engineering, University of Science and Technology Beijing. The corresponding authors are Keping Long and Haijun Zhang.
Publisher Copyright:
© 2021 IEEE.
PY - 2021
Y1 - 2021
N2 - Terahertz communication is regarded as a promising technology owing to its higher bandwidth and narrower beamwidths, which can improve capacity and coverage for indoor wireless users. In this paper, the intelligent reflecting surface (IRS) technique and non-orthogonal multiple access (NOMA) are utilized to compensate for the drawbacks of indoor transmission mismatch in the terahertz band. Wireless resource allocation optimization in indoor terahertz IRS-aided systems is then transformed into a universal optimization problem with ergodic constraints. With the aid of the parametrization features of deep neural networks (DNNs), proximal policy optimization (PPO) is adopted to train the policy and the corresponding actions that allocate power and bandwidth. The actor part generates the continuous power allocation, and the critic part handles the discrete bandwidth allocation. In the design of the deep reinforcement learning (DRL) framework, primal dual ascent is proposed to realize model-free training. Simulation results demonstrate the effectiveness of the primal dual PPO learning algorithm in different settings.
AB - Terahertz communication is regarded as a promising technology owing to its higher bandwidth and narrower beamwidths, which can improve capacity and coverage for indoor wireless users. In this paper, the intelligent reflecting surface (IRS) technique and non-orthogonal multiple access (NOMA) are utilized to compensate for the drawbacks of indoor transmission mismatch in the terahertz band. Wireless resource allocation optimization in indoor terahertz IRS-aided systems is then transformed into a universal optimization problem with ergodic constraints. With the aid of the parametrization features of deep neural networks (DNNs), proximal policy optimization (PPO) is adopted to train the policy and the corresponding actions that allocate power and bandwidth. The actor part generates the continuous power allocation, and the critic part handles the discrete bandwidth allocation. In the design of the deep reinforcement learning (DRL) framework, primal dual ascent is proposed to realize model-free training. Simulation results demonstrate the effectiveness of the primal dual PPO learning algorithm in different settings.
UR - http://www.scopus.com/inward/record.url?scp=85127273896&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85127273896&partnerID=8YFLogxK
U2 - 10.1109/GLOBECOM46510.2021.9685203
DO - 10.1109/GLOBECOM46510.2021.9685203
M3 - Conference contribution
AN - SCOPUS:85127273896
T3 - 2021 IEEE Global Communications Conference, GLOBECOM 2021 - Proceedings
BT - 2021 IEEE Global Communications Conference, GLOBECOM 2021 - Proceedings
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2021 IEEE Global Communications Conference, GLOBECOM 2021
Y2 - 7 December 2021 through 11 December 2021
ER -