TY - GEN
T1 - Grounding Language to Entities and Dynamics for Generalization in Reinforcement Learning
AU - Hanjie, Austin W.
AU - Zhong, Victor
AU - Narasimhan, Karthik
N1 - Publisher Copyright:
Copyright © 2021 by the author(s)
PY - 2021
Y1 - 2021
AB - We investigate the use of natural language to drive the generalization of control policies and introduce the new multi-task environment MESSENGER with free-form text manuals describing the environment dynamics. Unlike previous work, MESSENGER does not assume prior knowledge connecting text and state observations; the control policy must simultaneously ground the game manual to entity symbols and dynamics in the environment. We develop a new model, EMMA (Entity Mapper with Multi-modal Attention), which uses an entity-conditioned attention module that allows for selective focus over relevant descriptions in the manual for each entity in the environment. EMMA is end-to-end differentiable and learns a latent grounding of entities and dynamics from text to observations using only environment rewards. EMMA achieves successful zero-shot generalization to unseen games with new dynamics, obtaining a 40% higher win rate compared to multiple baselines. However, the win rate on the hardest stage of MESSENGER remains low (10%), demonstrating the need for additional work in this direction.
UR - http://www.scopus.com/inward/record.url?scp=85161306522&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85161306522&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:85161306522
T3 - Proceedings of Machine Learning Research
SP - 4051
EP - 4062
BT - Proceedings of the 38th International Conference on Machine Learning, ICML 2021
PB - ML Research Press
T2 - 38th International Conference on Machine Learning, ICML 2021
Y2 - 18 July 2021 through 24 July 2021
ER -