TY - GEN
T1 - A reinforcement learning optimization framework for cognitive interference networks
AU - Levorato, Marco
AU - Firouzabadi, Sina
AU - Goldsmith, Andrea
PY - 2011
Y1 - 2011
N2 - A Reinforcement Learning (RL) algorithm for the optimization of a secondary user's transmission strategies in cognitive networks is presented. The secondary user minimizes a cost function while causing only a bounded performance loss to the primary users' network. The state of the primary users' network, defined as a collection of variables describing features of the network (e.g., buffer state, ARQ state), evolves over time according to a homogeneous Markov process. The statistics of the Markov process depend on the strategy of the secondary user, and thus the instantaneous idleness/transmission action of the secondary user has a long-term impact on the temporal evolution of the network. The proposed RL algorithm finds the optimal randomized past-independent policy from a sample path of state-cost observations, without any a priori knowledge of the statistics of the Markov process. The performance and structure of the policy resulting from the proposed RL algorithm are compared to those of the policy identified by the algorithm in [1].
AB - A Reinforcement Learning (RL) algorithm for the optimization of a secondary user's transmission strategies in cognitive networks is presented. The secondary user minimizes a cost function while causing only a bounded performance loss to the primary users' network. The state of the primary users' network, defined as a collection of variables describing features of the network (e.g., buffer state, ARQ state), evolves over time according to a homogeneous Markov process. The statistics of the Markov process depend on the strategy of the secondary user, and thus the instantaneous idleness/transmission action of the secondary user has a long-term impact on the temporal evolution of the network. The proposed RL algorithm finds the optimal randomized past-independent policy from a sample path of state-cost observations, without any a priori knowledge of the statistics of the Markov process. The performance and structure of the policy resulting from the proposed RL algorithm are compared to those of the policy identified by the algorithm in [1].
UR - http://www.scopus.com/inward/record.url?scp=84856112756&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84856112756&partnerID=8YFLogxK
U2 - 10.1109/Allerton.2011.6120364
DO - 10.1109/Allerton.2011.6120364
M3 - Conference contribution
AN - SCOPUS:84856112756
SN - 9781457718168
T3 - 2011 49th Annual Allerton Conference on Communication, Control, and Computing, Allerton 2011
SP - 1633
EP - 1640
BT - 2011 49th Annual Allerton Conference on Communication, Control, and Computing, Allerton 2011
T2 - 2011 49th Annual Allerton Conference on Communication, Control, and Computing, Allerton 2011
Y2 - 28 September 2011 through 30 September 2011
ER -