Abstract
Our goal is to train control policies that generalize well to unseen environments. Inspired by the Distributionally Robust Optimization (DRO) framework, we propose DRAGEN - Distributionally Robust policy learning via Adversarial Generation of ENvironments - for iteratively improving the robustness of policies to realistic distribution shifts by generating adversarial environments. The key idea is to learn a generative model for environments whose latent variables capture cost-predictive and realistic variations in environments. We perform DRO with respect to a Wasserstein ball around the empirical distribution of environments by generating realistic adversarial environments via gradient ascent on the latent space. We demonstrate strong Out-of-Distribution (OoD) generalization in simulation for (i) swinging up a pendulum with onboard vision and (ii) grasping realistic 3D objects. Grasping experiments on hardware demonstrate better sim2real performance compared to domain randomization.
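The core inner step described above - ascending the policy's cost in the latent space of an environment generator while staying near the training distribution - can be sketched as follows. This is a minimal illustrative toy, not the paper's implementation: the quadratic surrogate cost, the finite-difference gradient, and the L2 trust region (standing in here for the Wasserstein-ball constraint on generated environments) are all assumptions for the sake of a runnable example.

```python
import numpy as np

def policy_cost(z):
    """Toy surrogate for the policy's cost on the environment decoded
    from latent z (the paper's latents are learned to be cost-predictive;
    this quadratic is a hypothetical stand-in)."""
    return float(np.sum((z - 1.0) ** 2))

def cost_grad(z, eps=1e-5):
    """Central finite-difference gradient of the surrogate cost."""
    g = np.zeros_like(z)
    for i in range(z.size):
        dz = np.zeros_like(z)
        dz[i] = eps
        g[i] = (policy_cost(z + dz) - policy_cost(z - dz)) / (2 * eps)
    return g

def adversarial_latent(z0, radius=0.5, lr=0.1, steps=50):
    """Gradient ascent on the cost in latent space, projecting back onto
    an L2 ball of the given radius around the starting latent z0 so the
    generated environment stays close to the empirical distribution."""
    z = z0.copy()
    for _ in range(steps):
        z = z + lr * cost_grad(z)          # ascend the policy's cost
        delta = z - z0
        norm = np.linalg.norm(delta)
        if norm > radius:                  # project onto the ball
            z = z0 + delta * (radius / norm)
    return z
```

In the full method, the adversarial latents found this way would be decoded into environments and added to the training set, after which the policy is retrained - iterating the adversarial-generation and policy-update steps.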
| Original language | English (US) |
|---|---|
| Pages (from-to) | 1379-1386 |
| Number of pages | 8 |
| Journal | IEEE Robotics and Automation Letters |
| Volume | 7 |
| Issue number | 2 |
| DOIs | |
| State | Published - Apr 1 2022 |
All Science Journal Classification (ASJC) codes
- Control and Systems Engineering
- Biomedical Engineering
- Human-Computer Interaction
- Mechanical Engineering
- Computer Vision and Pattern Recognition
- Computer Science Applications
- Control and Optimization
- Artificial Intelligence
Keywords
- Continual learning
- Data sets for robot learning
- Generalization
- Grasping
- Reinforcement learning