TY - JOUR
T1 - Scaling Laws for Imitation Learning in Single-Agent Games
AU - Tuyls, Jens
AU - Madeka, Dhruv
AU - Torkkola, Kari
AU - Foster, Dean P.
AU - Narasimhan, Karthik
AU - Kakade, Sham
N1 - Publisher Copyright:
© 2024, Transactions on Machine Learning Research. All rights reserved.
PY - 2024
Y1 - 2024
AB - Imitation Learning (IL) is one of the most widely used methods in machine learning. Yet, many works find it is often unable to fully recover the underlying expert behavior (Wen et al., 2020; Jacob et al., 2022), even in constrained environments like single-agent games (De Haan et al., 2019; Hambro et al., 2022b). However, none of these works deeply investigates the role of scaling up model and data size. Inspired by recent work in Natural Language Processing (NLP) (Kaplan et al., 2020; Hoffmann et al., 2022), where “scaling up” has resulted in increasingly more capable LLMs, we investigate whether carefully scaling up model and data size can bring similar improvements in the imitation learning setting for single-agent games. We first demonstrate our findings on a variety of Atari games, and thereafter focus on the extremely challenging game of NetHack. In all games, we find that IL loss and mean return scale smoothly with the compute budget (FLOPs) and are strongly correlated, resulting in power laws (and variations of them) for training compute-optimal IL agents. Finally, we forecast and train several NetHack agents with IL and find our best agent outperforms the prior state-of-the-art by 1.7x in the offline setting. Our work both demonstrates the scaling behavior of imitation learning in a variety of single-agent games and helps narrow the gap between the learner and the expert in NetHack, a game that remains elusively hard for current AI systems.
UR - http://www.scopus.com/inward/record.url?scp=85219573954&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85219573954&partnerID=8YFLogxK
M3 - Article
AN - SCOPUS:85219573954
SN - 2835-8856
VL - 2024
JO - Transactions on Machine Learning Research
JF - Transactions on Machine Learning Research
ER -