Abstract
Object rearrangement is a challenge for embodied agents because solving these tasks requires generalizing across a combinatorially large set of configurations of entities and their locations.Worse, the representations of these entities are unknown and must be inferred from sensory percepts.We present a hierarchical abstraction approach to uncover these underlying entities and achieve combinatorial generalization from unstructured visual inputs.By constructing a factorized transition graph over clusters of entity representations inferred from pixels, we show how to learn a correspondence between intervening on states of entities in the agent's model and acting on objects in the environment.We use this correspondence to develop a method for control that generalizes to different numbers and configurations of objects, which outperforms current offline deep RL methods when evaluated on simulated rearrangement tasks.
Original language | English (US) |
---|---|
State | Published - 2023 |
Event | 11th International Conference on Learning Representations, ICLR 2023 - Kigali, Rwanda Duration: May 1 2023 → May 5 2023 |
Conference
Conference | 11th International Conference on Learning Representations, ICLR 2023 |
---|---|
Country/Territory | Rwanda |
City | Kigali |
Period | 5/1/23 → 5/5/23 |
All Science Journal Classification (ASJC) codes
- Language and Linguistics
- Computer Science Applications
- Education
- Linguistics and Language