DIFFUSION POLICY POLICY OPTIMIZATION

  • Allen Z. Ren
  • , Justin Lidard
  • , Lars L. Ankile
  • , Anthony Simeonov
  • , Pulkit Agrawal
  • , Anirudha Majumdar
  • , Benjamin Burchfiel
  • , Hongkai Dai
  • , Max Simchowitz

Research output: Chapter in Book/Report/Conference proceedingConference contribution

5 Scopus citations

Abstract

We introduce Diffusion Policy Policy Optimization, DPPO, an algorithmic framework including best practices for fine-tuning diffusion-based policies (e.g. Diffusion Policy (Chi et al., 2024b)) in continuous control and robot learning tasks using the policy gradient (PG) method from reinforcement learning (RL). PG methods are ubiquitous in training RL policies with other policy parameterizations; nevertheless, they had been conjectured to be less efficient for diffusion-based policies. Surprisingly, we show that DPPO achieves the strongest overall performance and efficiency for fine-tuning in common benchmarks compared to other RL methods for diffusion-based policies and also compared to PG fine-tuning of other policy parameterizations. Through experimental investigation, we find that DPPO takes advantage of unique synergies between RL fine-tuning and the diffusion parameterization, leading to structured and on-manifold exploration, stable training, and strong policy robustness. We further demonstrate the strengths of DPPO in a range of realistic settings, including simulated robotic tasks with pixel observations, and via zero-shot deployment of simulation-trained policies on robot hardware in a long-horizon, multi-stage manipulation task. Webpage with code: diffusion-ppo.github.io.

Original languageEnglish (US)
Title of host publication13th International Conference on Learning Representations, ICLR 2025
PublisherInternational Conference on Learning Representations, ICLR
Pages58018-58059
Number of pages42
ISBN (Electronic)9798331320850
StatePublished - 2025
Event13th International Conference on Learning Representations, ICLR 2025 - Singapore, Singapore
Duration: Apr 24 2025Apr 28 2025

Publication series

Name13th International Conference on Learning Representations, ICLR 2025

Conference

Conference13th International Conference on Learning Representations, ICLR 2025
Country/TerritorySingapore
CitySingapore
Period4/24/254/28/25

All Science Journal Classification (ASJC) codes

  • Language and Linguistics
  • Computer Science Applications
  • Education
  • Linguistics and Language

Fingerprint

Dive into the research topics of 'DIFFUSION POLICY POLICY OPTIMIZATION'. Together they form a unique fingerprint.

Cite this