Terahertz communication is regarded as a promising technology because its high bandwidth and narrow beamwidths can improve capacity and coverage for indoor wireless users. In this paper, the intelligent reflecting surface (IRS) technique and non-orthogonal multiple access (NOMA) are used to compensate for the drawbacks of indoor transmission mismatch in the terahertz band. Wireless resource allocation in indoor terahertz IRS-aided systems is then formulated as a universal optimization problem with ergodic constraints. Exploiting the parametrization capability of deep neural networks (DNNs), proximal policy optimization (PPO) is adopted to train a policy whose actions allocate power and bandwidth. The actor part generates the continuous power allocation, and the critic part takes charge of the discrete bandwidth allocation. Within the deep reinforcement learning (DRL) framework, primal-dual ascent is proposed to realize model-free training. Simulation results demonstrate the effectiveness of the primal-dual PPO learning algorithm in different settings.
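
To illustrate the primal-dual structure mentioned above, the following is a minimal sketch on a toy problem and not the algorithm of this paper. It assumes a deterministic sum-rate objective, known channel gains, and a single total-power constraint. In the paper's setting the primal step would instead be a PPO policy update and the constraint would be ergodic and estimated from samples. The function name, step sizes, and gain values are hypothetical.

```python
import math


def primal_dual_power_allocation(gains, p_max, steps=20000, lr_p=0.01, lr_lam=0.01):
    """Toy primal-dual iteration (illustrative assumption, not the paper's method).

    Maximizes sum_i log(1 + g_i * p_i) subject to sum_i p_i <= p_max, p_i >= 0,
    using the Lagrangian L(p, lam) = sum_i log(1 + g_i * p_i) - lam * (sum_i p_i - p_max).
    """
    n = len(gains)
    p = [p_max / n] * n  # start from an equal power split
    lam = 1.0            # dual variable (price of power)
    for _ in range(steps):
        # Primal step: gradient ascent on L with respect to p, projected onto p >= 0.
        p = [
            max(0.0, p_i + lr_p * (g / (1.0 + g * p_i) - lam))
            for p_i, g in zip(p, gains)
        ]
        # Dual step: raise the price when the power budget is exceeded,
        # lower it otherwise, projected onto lam >= 0.
        lam = max(0.0, lam + lr_lam * (sum(p) - p_max))
    return p, lam


if __name__ == "__main__":
    powers, price = primal_dual_power_allocation([1.0, 2.0, 4.0], p_max=3.0)
    rate = sum(math.log(1.0 + g * p) for g, p in zip([1.0, 2.0, 4.0], powers))
    print("powers:", [round(x, 3) for x in powers])
    print("dual variable:", round(price, 3))
    print("sum rate (nats):", round(rate, 3))
```

For this toy case the iteration approaches the classical water-filling solution, in which users with stronger channels receive more power. The dual variable plays the role that a learned constraint multiplier would play in a model-free primal-dual scheme.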