GHIL-Glue: Hierarchical Control with Filtered Subgoal Images

Kyle B. Hatch, Ashwin Balakrishna, Oier Mees, Suraj Nair, Seohong Park, Blake Wulfe, Masha Itkina, Benjamin Eysenbach, Sergey Levine, Thomas Kollar, Benjamin Burchfiel

Research output: Chapter in Book/Report/Conference proceedingConference contribution

Abstract

Image and video generative models that are pretrained on Internet-scale data can greatly increase the generalization capacity of robot learning systems. These models can function as high-level planners, generating intermediate sub-goals for low-level goal-conditioned policies to reach. However, the performance of these systems can be greatly bottlenecked by the interface between generative models and low-level controllers. For example, generative models may predict photo-realistic yet physically infeasible frames that confuse low-level policies. Low-level policies may also be sensitive to subtle visual artifacts in generated goal images. This paper addresses these two facets of generalization, providing an interface to effectively 'glue together' language-conditioned image or video prediction models with low-level goal-conditioned policies. Our method, Generative Hierarchical Imitation Learning-Glue (GHIL-Glue), filters out subgoals that do not lead to task progress and improves the robustness of goal-conditioned policies to generated subgoals with harmful visual artifacts. We find in extensive experiments in both simulated and real environments that GHIL-Glue achieves a 25% improvement across several hierarchical models that leverage generative subgoals, achieving a new state-of-the-art on the CALVIN simulation benchmark for policies using observations from a single RGB camera. GHIL-Glue also outperforms other generalist robot policies across 3/4 language-conditioned manipulation tasks testing zero-shot generalization in physical experiments. Code, model checkpoints, videos, and supplementary materials can be found at https://ghil-glue.github.io.

Original languageEnglish (US)
Title of host publication2025 IEEE International Conference on Robotics and Automation, ICRA 2025
EditorsChristian Ott, Henny Admoni, Sven Behnke, Stjepan Bogdan, Aude Bolopion, Youngjin Choi, Fanny Ficuciello, Nicholas Gans, Clement Gosselin, Kensuke Harada, Erdal Kayacan, H. Jin Kim, Stefan Leutenegger, Zhe Liu, Perla Maiolino, Lino Marques, Takamitsu Matsubara, Anastasia Mavromatti, Mark Minor, Jason O'Kane, Hae Won Park, Hae-Won Park, Ioannis Rekleitis, Federico Renda, Elisa Ricci, Laurel D. Riek, Lorenzo Sabattini, Shaojie Shen, Yu Sun, Pierre-Brice Wieber, Katsu Yamane, Jingjin Yu
PublisherInstitute of Electrical and Electronics Engineers Inc.
Pages9516-9524
Number of pages9
ISBN (Electronic)9798331541392
DOIs
StatePublished - 2025
Event2025 IEEE International Conference on Robotics and Automation, ICRA 2025 - Atlanta, United States
Duration: May 19 2025May 23 2025

Publication series

NameProceedings - IEEE International Conference on Robotics and Automation
ISSN (Print)1050-4729

Conference

Conference2025 IEEE International Conference on Robotics and Automation, ICRA 2025
Country/TerritoryUnited States
CityAtlanta
Period5/19/255/23/25

All Science Journal Classification (ASJC) codes

  • Software
  • Control and Systems Engineering
  • Artificial Intelligence
  • Electrical and Electronic Engineering

Fingerprint

Dive into the research topics of 'GHIL-Glue: Hierarchical Control with Filtered Subgoal Images'. Together they form a unique fingerprint.

Cite this