TY - CONF
T1 - THE INFORMATION GEOMETRY OF UNSUPERVISED REINFORCEMENT LEARNING
AU - Eysenbach, Benjamin
AU - Salakhutdinov, Ruslan
AU - Levine, Sergey
N1 - Funding Information:
Acknowledgements. We thank Abhishek Gupta, Alex Alemi, Archit Sharma, Jonathan Ho, Michael Janner, Ruosong Wang, Shane Gu and anonymous reviewers for discussions and feedback on the paper. This material is supported by the Fannie and John Hertz Foundation and the NSF GRFP (DGE1745016).
Publisher Copyright:
© 2022 ICLR 2022 - 10th International Conference on Learning Representations. All rights reserved.
PY - 2022
Y1 - 2022
N2 - How can a reinforcement learning (RL) agent prepare to solve downstream tasks if those tasks are not known a priori? One approach is unsupervised skill discovery, a class of algorithms that learn a set of policies without access to a reward function. Such algorithms bear a close resemblance to representation learning algorithms (e.g., contrastive learning) in supervised learning, in that both are pretraining algorithms that maximize some approximation to a mutual information objective. While prior work has shown that the set of skills learned by such methods can accelerate downstream RL tasks, prior work offers little analysis into whether these skill learning algorithms are optimal, or even what notion of optimality would be appropriate to apply to them. In this work, we show that unsupervised skill discovery algorithms based on mutual information maximization do not learn skills that are optimal for every possible reward function. However, we show that the distribution over skills provides an optimal initialization minimizing regret against adversarially-chosen reward functions, assuming a certain type of adaptation procedure. Our analysis also provides a geometric perspective on these skill learning methods.
AB - How can a reinforcement learning (RL) agent prepare to solve downstream tasks if those tasks are not known a priori? One approach is unsupervised skill discovery, a class of algorithms that learn a set of policies without access to a reward function. Such algorithms bear a close resemblance to representation learning algorithms (e.g., contrastive learning) in supervised learning, in that both are pretraining algorithms that maximize some approximation to a mutual information objective. While prior work has shown that the set of skills learned by such methods can accelerate downstream RL tasks, prior work offers little analysis into whether these skill learning algorithms are optimal, or even what notion of optimality would be appropriate to apply to them. In this work, we show that unsupervised skill discovery algorithms based on mutual information maximization do not learn skills that are optimal for every possible reward function. However, we show that the distribution over skills provides an optimal initialization minimizing regret against adversarially-chosen reward functions, assuming a certain type of adaptation procedure. Our analysis also provides a geometric perspective on these skill learning methods.
UR - http://www.scopus.com/inward/record.url?scp=85131033577&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85131033577&partnerID=8YFLogxK
M3 - Paper
AN - SCOPUS:85131033577
T2 - 10th International Conference on Learning Representations, ICLR 2022
Y2 - 25 April 2022 through 29 April 2022
ER -