Abstract
Object rearrangement is a challenge for embodied agents because solving these tasks requires generalizing across a combinatorially large set of configurations of entities and their locations.Worse, the representations of these entities are unknown and must be inferred from sensory percepts.We present a hierarchical abstraction approach to uncover these underlying entities and achieve combinatorial generalization from unstructured visual inputs.By constructing a factorized transition graph over clusters of entity representations inferred from pixels, we show how to learn a correspondence between intervening on states of entities in the agent's model and acting on objects in the environment.We use this correspondence to develop a method for control that generalizes to different numbers and configurations of objects, which outperforms current offline deep RL methods when evaluated on simulated rearrangement tasks.
| Original language | English (US) |
|---|---|
| State | Published - 2023 |
| Event | 11th International Conference on Learning Representations, ICLR 2023 - Kigali, Rwanda Duration: May 1 2023 → May 5 2023 |
Conference
| Conference | 11th International Conference on Learning Representations, ICLR 2023 |
|---|---|
| Country/Territory | Rwanda |
| City | Kigali |
| Period | 5/1/23 → 5/5/23 |
All Science Journal Classification (ASJC) codes
- Language and Linguistics
- Computer Science Applications
- Education
- Linguistics and Language