A Reinforcement Learning (RL) algorithm is presented for optimizing the transmission strategy of a secondary user in a cognitive network. The secondary user minimizes a cost function while incurring a bounded performance loss on the primary users' network. The state of the primary users' network, defined as a collection of variables describing features of the network (e.g., buffer state, ARQ state), evolves over time according to a homogeneous Markov process. The statistics of this Markov process depend on the strategy of the secondary user; thus, the instantaneous idle/transmit action of the secondary user has a long-term impact on the temporal evolution of the network. The proposed RL algorithm finds the optimal randomized past-independent policy from a sample path of state-cost observations, without any a priori knowledge of the statistics of the Markov process. The performance and structure of the policy resulting from the proposed RL algorithm are compared to those of the policy identified by the algorithm in .
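The core idea above, learning a transmit/idle policy purely from observed state-cost samples when the secondary user's actions influence the primary network's Markov dynamics, can be illustrated with a minimal model-free sketch. This is not the paper's algorithm: it is a toy tabular Q-learning example under assumed two-state dynamics and assumed costs, chosen only to show learning without a priori knowledge of the transition statistics.

```python
import random

# Toy model (assumption, not from the paper): primary network state in
# {0: idle, 1: busy}. Transition probabilities depend on the secondary
# user's action (0: stay idle, 1: transmit), since transmitting while the
# primary is busy makes the busy state more persistent.
P_BUSY = {  # P_BUSY[(state, action)] = probability next state is busy
    (0, 0): 0.2, (0, 1): 0.3,
    (1, 0): 0.6, (1, 1): 0.9,
}

def step(state, action, rng):
    """Sample one transition; return (next_state, observed_cost)."""
    # Assumed costs: idling forfeits throughput (cost 1.0); transmitting
    # is cheap (0.2) unless the primary is busy, adding a collision penalty.
    cost = 1.0 if action == 0 else 0.2 + (2.0 if state == 1 else 0.0)
    next_state = 1 if rng.random() < P_BUSY[(state, action)] else 0
    return next_state, cost

def q_learning(steps=20000, alpha=0.1, gamma=0.95, eps=0.1, seed=0):
    """Learn state-action costs from a single sample path, with no
    knowledge of P_BUSY (model-free, epsilon-greedy exploration)."""
    rng = random.Random(seed)
    Q = {(s, a): 0.0 for s in (0, 1) for a in (0, 1)}
    s = 0
    for _ in range(steps):
        if rng.random() < eps:
            a = rng.choice((0, 1))
        else:
            a = min((0, 1), key=lambda act: Q[(s, act)])
        s_next, c = step(s, a, rng)
        # Minimize discounted cost: bootstrap on the cheapest next action.
        target = c + gamma * min(Q[(s_next, 0)], Q[(s_next, 1)])
        Q[(s, a)] += alpha * (target - Q[(s, a)])
        s = s_next
    return Q

Q = q_learning()
policy = {s: min((0, 1), key=lambda a: Q[(s, a)]) for s in (0, 1)}
print(policy)  # learned rule: transmit when the primary is idle, back off when busy
```

Under these assumed numbers the learned deterministic rule is the intuitive one (transmit in state 0, idle in state 1); the paper's setting is richer, optimizing a randomized policy under an explicit bound on the loss inflicted on the primary network.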