TY - JOUR
T1 - No-Pain No-Gain: DRL Assisted Optimization in Energy-Constrained CR-NOMA Networks
AU - Ding, Zhiguo
AU - Schober, Robert
AU - Poor, H. Vincent
N1 - Funding Information:
Manuscript received April 12, 2021; accepted June 2, 2021. Date of publication June 8, 2021; date of current version September 16, 2021. The work of Zhiguo Ding was supported by the UK EPSRC under grant number EP/P009719/2 and by H2020-MSCA-RISE-2020 under grant number 101006411. The work of H. Vincent Poor was supported by the U.S. National Science Foundation under Grant CCF-1908308. The associate editor coordinating the review of this article and approving it for publication was C. R. Murthy. (Corresponding author: Zhiguo Ding.) Zhiguo Ding is with the Department of Electrical and Computer Engineering, Princeton University, Princeton, NJ 08544 USA, and also with the School of Electrical and Electronic Engineering, The University of Manchester, Manchester M13 9PL, U.K. (e-mail: zhiguo.ding@manchester.ac.uk).
Publisher Copyright:
© 1972-2012 IEEE.
PY - 2021/9
Y1 - 2021/9
N2 - This paper applies machine learning to optimize the transmission policies of cognitive radio inspired non-orthogonal multiple access (CR-NOMA) networks, where time-division multiple access (TDMA) is used to serve multiple primary users and an energy-constrained secondary user is admitted to the primary users' time slots via NOMA. During each time slot, the secondary user performs two tasks: data transmission and energy harvesting based on the signals received from the primary users. The goal of the paper is to maximize the secondary user's long-term throughput by optimizing its transmit power and the time-sharing coefficient for its two tasks. The long-term throughput maximization problem is challenging because it requires decisions that yield long-term gains but may result in short-term losses. For example, when a primary user with large channel gains transmits in a given time slot, intuition suggests that the secondary user should not transmit data, due to the strong interference from the primary user, but should perform energy harvesting only; this results in a zero data rate for that time slot but yields potential long-term benefits. In this paper, a deep reinforcement learning (DRL) approach is applied to emulate this intuition, where the deep deterministic policy gradient (DDPG) algorithm is employed together with convex optimization. Our simulation results demonstrate that the proposed DRL-assisted NOMA transmission scheme yields significant performance gains over two benchmark schemes.
KW - Non-orthogonal multiple access
KW - energy harvesting
KW - cognitive radio communications
KW - deep reinforcement learning
UR - http://www.scopus.com/inward/record.url?scp=85111032523&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85111032523&partnerID=8YFLogxK
U2 - 10.1109/TCOMM.2021.3087624
DO - 10.1109/TCOMM.2021.3087624
M3 - Article
AN - SCOPUS:85111032523
SN - 0090-6778
VL - 69
SP - 5917
EP - 5932
JO - IEEE Transactions on Communications
JF - IEEE Transactions on Communications
IS - 9
ER -