TY - GEN
T1 - Self-consistent trajectory autoencoder
T2 - 35th International Conference on Machine Learning, ICML 2018
AU - Co-Reyes, John D.
AU - Liu, Yu Xuan
AU - Gupta, Abhishek
AU - Eysenbach, Benjamin
AU - Abbeel, Pieter
AU - Levine, Sergey
N1 - Publisher Copyright: © 2018 35th International Conference on Machine Learning, ICML 2018. All rights reserved.
PY - 2018
Y1 - 2018
N2 - In this work, we take a representation learning perspective on hierarchical reinforcement learning, where the problem of learning lower layers in a hierarchy is transformed into the problem of learning trajectory-level generative models. We show that we can learn continuous latent representations of trajectories, which are effective in solving temporally extended and multi-stage problems. Our proposed model, SeCTAR, draws inspiration from variational autoencoders, and learns latent representations of trajectories. A key component of this method is to learn both a latent-conditioned policy and a latent-conditioned model which are consistent with each other. Given the same latent, the policy generates a trajectory which should match the trajectory predicted by the model. This model provides a built-in prediction mechanism, by predicting the outcome of closed loop policy behavior. We propose a novel algorithm for performing hierarchical RL with this model, combining model-based planning in the learned latent space with an unsupervised exploration objective. We show that our model is effective at reasoning over long horizons with sparse rewards for several simulated tasks, outperforming standard reinforcement learning methods and prior methods for hierarchical reasoning, model-based planning, and exploration.
AB - In this work, we take a representation learning perspective on hierarchical reinforcement learning, where the problem of learning lower layers in a hierarchy is transformed into the problem of learning trajectory-level generative models. We show that we can learn continuous latent representations of trajectories, which are effective in solving temporally extended and multi-stage problems. Our proposed model, SeCTAR, draws inspiration from variational autoencoders, and learns latent representations of trajectories. A key component of this method is to learn both a latent-conditioned policy and a latent-conditioned model which are consistent with each other. Given the same latent, the policy generates a trajectory which should match the trajectory predicted by the model. This model provides a built-in prediction mechanism, by predicting the outcome of closed loop policy behavior. We propose a novel algorithm for performing hierarchical RL with this model, combining model-based planning in the learned latent space with an unsupervised exploration objective. We show that our model is effective at reasoning over long horizons with sparse rewards for several simulated tasks, outperforming standard reinforcement learning methods and prior methods for hierarchical reasoning, model-based planning, and exploration.
UR - http://www.scopus.com/inward/record.url?scp=85057249782&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85057249782&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:85057249782
T3 - 35th International Conference on Machine Learning, ICML 2018
SP - 1637
EP - 1647
BT - 35th International Conference on Machine Learning, ICML 2018
A2 - Krause, Andreas
A2 - Dy, Jennifer
PB - International Machine Learning Society (IMLS)
Y2 - 10 July 2018 through 15 July 2018
ER -