TY - CONF
T1 - MODEL-BASED VISUAL PLANNING WITH SELF-SUPERVISED FUNCTIONAL DISTANCES
AU - Tian, Stephen
AU - Nair, Suraj
AU - Ebert, Frederik
AU - Dasari, Sudeep
AU - Eysenbach, Benjamin
AU - Finn, Chelsea
AU - Levine, Sergey
N1 - Funding Information:
Acknowledgements. We thank students from the Robotic AI and Learning Lab for insightful feedback on earlier drafts of this paper and Aurick Zhou and Danijar Hafner for helpful discussions. This work was supported in part by Schmidt Futures, the Fannie and John Hertz Foundation, the Office of Naval Research (grants N00014-20-1-2675, N00014-16-1-2420, & N00014-19-1-2042), and the National Science Foundation (DGE-1745016 and through an NSF GRFP (GRFP 2018259676)). This research used the Savio computational cluster resource provided by the Berkeley Research Computing program at the University of California, Berkeley.
Publisher Copyright:
© 2021 ICLR 2021 - 9th International Conference on Learning Representations. All rights reserved.
PY - 2021
Y1 - 2021
N2 - A generalist robot must be able to complete a variety of tasks in its environment. One appealing way to specify each task is in terms of a goal observation. However, learning goal-reaching policies with reinforcement learning remains a challenging problem, particularly when hand-engineered reward functions are not available. Learned dynamics models are a promising approach for learning about the environment without rewards or task-directed data, but planning to reach goals with such a model requires a notion of functional similarity between observations and goal states. We present a self-supervised method for model-based visual goal reaching, which uses both a visual dynamics model as well as a dynamical distance function learned using model-free reinforcement learning. Our approach learns entirely using offline, unlabeled data, making it practical to scale to large and diverse datasets. In our experiments, we find that our method can successfully learn models that perform a variety of tasks at test-time, moving objects amid distractors with a simulated robotic arm and even learning to open and close a drawer using a real-world robot. In comparisons, we find that this approach substantially outperforms both model-free and model-based prior methods. Videos and visualizations are available here: https://sites.google.com/berkeley.edu/mbold.
AB - A generalist robot must be able to complete a variety of tasks in its environment. One appealing way to specify each task is in terms of a goal observation. However, learning goal-reaching policies with reinforcement learning remains a challenging problem, particularly when hand-engineered reward functions are not available. Learned dynamics models are a promising approach for learning about the environment without rewards or task-directed data, but planning to reach goals with such a model requires a notion of functional similarity between observations and goal states. We present a self-supervised method for model-based visual goal reaching, which uses both a visual dynamics model as well as a dynamical distance function learned using model-free reinforcement learning. Our approach learns entirely using offline, unlabeled data, making it practical to scale to large and diverse datasets. In our experiments, we find that our method can successfully learn models that perform a variety of tasks at test-time, moving objects amid distractors with a simulated robotic arm and even learning to open and close a drawer using a real-world robot. In comparisons, we find that this approach substantially outperforms both model-free and model-based prior methods. Videos and visualizations are available here: https://sites.google.com/berkeley.edu/mbold.
UR - http://www.scopus.com/inward/record.url?scp=85110962098&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85110962098&partnerID=8YFLogxK
M3 - Paper
AN - SCOPUS:85110962098
T2 - 9th International Conference on Learning Representations, ICLR 2021
Y2 - 3 May 2021 through 7 May 2021
ER -