Distributionally Robust Policy Learning via Adversarial Environment Generation

Allen Z. Ren, Anirudha Majumdar

Research output: Contribution to journal › Article › peer-review

2 Scopus citations

Abstract

Our goal is to train control policies that generalize well to unseen environments. Inspired by the Distributionally Robust Optimization (DRO) framework, we propose DRAGEN - Distributionally Robust policy learning via Adversarial Generation of ENvironments - for iteratively improving robustness of policies to realistic distribution shifts by generating adversarial environments. The key idea is to learn a generative model for environments whose latent variables capture cost-predictive and realistic variations in environments. We perform DRO with respect to a Wasserstein ball around the empirical distribution of environments by generating realistic adversarial environments via gradient ascent on the latent space. We demonstrate strong Out-of-Distribution (OoD) generalization in simulation for (i) swinging up a pendulum with onboard vision and (ii) grasping realistic 3D objects. Grasping experiments on hardware demonstrate better sim2real performance compared to domain randomization.
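
The latent-space gradient ascent mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes one common relaxation of a Wasserstein-ball constraint: ascend a predicted cost while paying a quadratic penalty for moving away from the original latent code. The names `adversarial_latent`, `cost_grad`, `penalty`, `step_size`, and `num_steps` are hypothetical, and the linear `toy_cost` merely stands in for a learned cost predictor.

```python
from typing import Callable, List

Vector = List[float]


def adversarial_latent(
    z0: Vector,
    cost_grad: Callable[[Vector], Vector],
    penalty: float = 1.0,
    step_size: float = 0.1,
    num_steps: int = 50,
) -> Vector:
    """Gradient ascent on cost(z) - penalty * ||z - z0||^2, starting from z0.

    The quadratic term is an assumed stand-in for the distance constraint:
    a larger penalty keeps the perturbed latent closer to the original one.
    """
    z = list(z0)
    for _ in range(num_steps):
        g = cost_grad(z)  # gradient of the predicted cost at the current latent
        z = [
            zi + step_size * (gi - 2.0 * penalty * (zi - z0i))
            for zi, gi, z0i in zip(z, g, z0)
        ]
    return z


# Toy stand-in for a learned cost predictor: cost(z) = W . z (assumption).
W = [1.0, -2.0]


def toy_cost(z: Vector) -> float:
    return sum(wi * zi for wi, zi in zip(W, z))


def toy_cost_grad(z: Vector) -> Vector:
    return list(W)  # gradient of a linear cost is constant


z_start = [0.0, 0.0]
z_adv = adversarial_latent(z_start, toy_cost_grad, penalty=1.0)
print(z_adv, toy_cost(z_start), toy_cost(z_adv))
```

For this linear toy cost the penalized objective has a closed-form maximizer at `z0 + W / (2 * penalty)`, so the iterates should approach `[0.5, -1.0]`. In a setting like the one the abstract describes, the perturbed latent would then be decoded by the generative model into a new environment for further policy training.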

Original language: English (US)
Pages (from-to): 1379-1386
Number of pages: 8
Journal: IEEE Robotics and Automation Letters
Volume: 7
Issue number: 2
State: Published - Apr 1 2022

All Science Journal Classification (ASJC) codes

  • Control and Systems Engineering
  • Biomedical Engineering
  • Human-Computer Interaction
  • Mechanical Engineering
  • Computer Vision and Pattern Recognition
  • Computer Science Applications
  • Control and Optimization
  • Artificial Intelligence

Keywords

  • Continual learning
  • Data sets for robot learning
  • Generalization
  • Grasping
  • Reinforcement learning
