TY - GEN
T1 - Keep CALM and Explore: Language Models for Action Generation in Text-based Games
T2 - 2020 Conference on Empirical Methods in Natural Language Processing, EMNLP 2020
AU - Yao, Shunyu
AU - Rao, Rohan
AU - Hausknecht, Matthew
AU - Narasimhan, Karthik
N1 - Funding Information:
Gracious thanks to Jacqueline Ashwell for running ClubFloyd and agreeing to our use of the collected transcripts. We thank Danqi Chen, Jimmy Yang, Jens Tuyls, and other colleagues from the Princeton NLP group for proofreading and discussion. We also thank the reviewers for constructive feedback. This research was partially funded by the Center for Statistics and Machine Learning at Princeton University through support from Microsoft.
Publisher Copyright:
© 2020 Association for Computational Linguistics.
PY - 2020
Y1 - 2020
N2 - Text-based games present a unique challenge for autonomous agents to operate in natural language and handle enormous action spaces. In this paper, we propose the Contextual Action Language Model (CALM) to generate a compact set of action candidates at each game state. Our key insight is to train language models on human gameplay, where people demonstrate linguistic priors and a general game sense for promising actions conditioned on game history. We combine CALM with a reinforcement learning agent which re-ranks the generated action candidates to maximize in-game rewards. We evaluate our approach using the Jericho benchmark (Hausknecht et al., 2019a), on games unseen by CALM during training. Our method obtains a 69% relative improvement in average game score over the previous state-of-the-art model. Surprisingly, on half of these games, CALM is competitive with or better than other models that have access to ground truth admissible actions.
AB - Text-based games present a unique challenge for autonomous agents to operate in natural language and handle enormous action spaces. In this paper, we propose the Contextual Action Language Model (CALM) to generate a compact set of action candidates at each game state. Our key insight is to train language models on human gameplay, where people demonstrate linguistic priors and a general game sense for promising actions conditioned on game history. We combine CALM with a reinforcement learning agent which re-ranks the generated action candidates to maximize in-game rewards. We evaluate our approach using the Jericho benchmark (Hausknecht et al., 2019a), on games unseen by CALM during training. Our method obtains a 69% relative improvement in average game score over the previous state-of-the-art model. Surprisingly, on half of these games, CALM is competitive with or better than other models that have access to ground truth admissible actions.
UR - http://www.scopus.com/inward/record.url?scp=85108687879&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85108687879&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:85108687879
T3 - EMNLP 2020 - 2020 Conference on Empirical Methods in Natural Language Processing, Proceedings of the Conference
SP - 8736
EP - 8754
BT - EMNLP 2020 - 2020 Conference on Empirical Methods in Natural Language Processing, Proceedings of the Conference
PB - Association for Computational Linguistics (ACL)
Y2 - 16 November 2020 through 20 November 2020
ER -