REGULARIZATION MATTERS IN POLICY OPTIMIZATION - AN EMPIRICAL STUDY ON CONTINUOUS CONTROL

Zhuang Liu, Xuanlin Li, Bingyi Kang, Trevor Darrell

Research output: Contribution to conference › Paper › peer-review

14 Scopus citations

Abstract

Deep Reinforcement Learning (Deep RL) has been receiving increasingly more attention thanks to its encouraging performance on a variety of control tasks. Yet, conventional regularization techniques in training neural networks (e.g., L2 regularization, dropout) have been largely ignored in RL methods, possibly because agents are typically trained and evaluated in the same environment, and because the deep RL community focuses more on high-level algorithm designs. In this work, we present the first comprehensive study of regularization techniques with multiple policy optimization algorithms on continuous control tasks. Interestingly, we find that conventional regularization techniques on the policy networks can often bring large improvements, especially on harder tasks. We also compare these techniques with the more widely used entropy regularization. Our findings are shown to be robust against training hyperparameter variations. In addition, we study regularizing different components and find that regularizing only the policy network is typically best. Finally, we discuss and analyze why regularization may help generalization in RL from four perspectives: sample complexity, return distribution, weight norm, and noise robustness. We hope our study provides guidance for future practices in regularizing policy optimization algorithms. Our code is available at https://github.com/xuanlinli17/iclr2021_rlreg.
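The abstract's central recipe — adding a conventional penalty such as L2 regularization to the policy network only, leaving the value network untouched — can be sketched as follows. This is an illustrative sketch, not the authors' code; the weight shapes and the stand-in policy loss are hypothetical, and the coefficient `coeff` is a placeholder hyperparameter.

```python
import numpy as np

def l2_penalty(params, coeff=1e-4):
    """Sum of squared weights, scaled by a regularization coefficient.

    Added to the policy loss only, mirroring the paper's finding that
    regularizing just the policy network typically works best.
    """
    return coeff * sum(np.sum(w ** 2) for w in params)

# Hypothetical policy- and value-network weights, for illustration only.
policy_weights = [np.ones((4, 8)), np.ones((8, 2))]   # 32 + 16 = 48 entries
value_weights = [np.ones((4, 8)), np.ones((8, 1))]    # deliberately unpenalized

base_policy_loss = 1.0  # stand-in for a surrogate objective (e.g., a PPO clip loss)
regularized_loss = base_policy_loss + l2_penalty(policy_weights)
```

In frameworks with automatic differentiation, the same effect is usually obtained by passing a weight-decay option to the optimizer that updates the policy parameters, while the value network's optimizer is left without it.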

Original language: English (US)
State: Published - 2021
Externally published: Yes
Event: 9th International Conference on Learning Representations, ICLR 2021 - Virtual, Online
Duration: May 3, 2021 - May 7, 2021

Conference

Conference: 9th International Conference on Learning Representations, ICLR 2021
City: Virtual, Online
Period: 5/3/21 - 5/7/21

All Science Journal Classification (ASJC) codes

  • Language and Linguistics
  • Computer Science Applications
  • Education
  • Linguistics and Language
