Opportunistic spectrum access is a viable technique for addressing spectrum scarcity in cognitive radio (CR) networks, where spectrum sensing and resource allocation (SSRA) are both critical to system throughput. Previous works on SSRA often require complete network statistics, which may be unavailable given the time-varying nature of practical CR networks. In this paper, we propose a learning-based optimization framework for SSRA in multi-band, multi-user CR networks. We develop a dynamic cooperative spectrum sensing strategy that allows secondary users to detect available spectrum bands of the primary user, followed by flexible power allocation for efficient data transmission. To cope with the dynamics of channel and resource statistics, we propose an improved deep reinforcement learning scheme based on a maximum-entropy-enabled actor-critic algorithm. Numerical results demonstrate the superiority of our approach over existing schemes.
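
For orientation, maximum-entropy actor-critic methods typically optimize an objective of the following form. This is the standard form from the soft actor-critic literature and not necessarily the exact objective of the proposed scheme. The symbols $\pi$ (policy), $r$ (reward), $\gamma$ (discount factor), $\alpha$ (entropy temperature), and $\mathcal{H}$ (entropy) are notation assumed here for illustration.

\[
J(\pi) = \mathbb{E}_{\pi}\left[ \sum_{t=0}^{\infty} \gamma^{t} \Big( r(s_t, a_t) + \alpha \, \mathcal{H}\big(\pi(\cdot \mid s_t)\big) \Big) \right],
\qquad
\mathcal{H}\big(\pi(\cdot \mid s)\big) = -\mathbb{E}_{a \sim \pi(\cdot \mid s)}\big[ \log \pi(a \mid s) \big].
\]

Under this reading, the state $s_t$ would collect the sensing and channel observations, and the action $a_t$ the sensing and power-allocation decisions. The temperature $\alpha$ trades off reward against policy entropy, which encourages continued exploration when the channel and resource statistics change over time.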