A reduction from apprenticeship learning to classification

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

44 Scopus citations

Abstract

We provide new theoretical results for apprenticeship learning, a variant of reinforcement learning in which the true reward function is unknown, and the goal is to perform well relative to an observed expert. We study a common approach to learning from expert demonstrations: using a classification algorithm to learn to imitate the expert's behavior. Although this straightforward learning strategy is widely used in practice, it has been subject to very little formal analysis. We prove that, if the learned classifier has error rate ε, the difference between the value of the apprentice's policy and the expert's policy is O(√ε). Further, we prove that this difference is only O(ε) when the expert's policy is close to optimal. This latter result has an important practical consequence: not only does imitating a near-optimal expert result in a better policy, but far fewer demonstrations are required to successfully imitate such an expert. This suggests an opportunity for substantial savings whenever the expert is known to be good, but demonstrations are expensive or difficult to obtain.
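The reduction described in the abstract, imitating an expert by training a classifier on its demonstrations, can be illustrated with a minimal sketch. Everything below is a hypothetical toy setup and not the paper's construction: a five-state chain, an expert that always moves right, a tabular majority-vote classifier, and value measured as total reward over a fixed horizon (the paper's definitions of value and error rate may differ). It only shows the two quantities the abstract relates, the classifier's error rate on the expert's visited states and the resulting gap in value.

```python
import random
from collections import Counter, defaultdict

# Hypothetical toy environment (an assumption, not taken from the paper).
N_STATES = 5        # states 0..4 arranged on a line
HORIZON = 10        # number of decisions per episode


def step(state, action):
    """Deterministic transition: move by action (-1 or +1), clipped to the chain."""
    return min(max(state + action, 0), N_STATES - 1)


def reward(state):
    """Reward used only for evaluation; the apprentice never observes it."""
    return 1.0 if state == N_STATES - 1 else 0.0


def expert_policy(state):
    """The expert always moves right, which is optimal in this toy chain."""
    return +1


def rollout(policy, start=0):
    """Run one episode; return the (state, action) pairs and the total reward."""
    state, total, pairs = start, 0.0, []
    for _ in range(HORIZON):
        action = policy(state)
        pairs.append((state, action))
        state = step(state, action)
        total += reward(state)
    return pairs, total


def fit_classifier(demos, default_action=-1):
    """Tabular majority-vote classifier mapping each seen state to an action.

    States absent from the demonstrations fall back to default_action.
    """
    votes = defaultdict(Counter)
    for state, action in demos:
        votes[state][action] += 1
    table = {s: c.most_common(1)[0][0] for s, c in votes.items()}
    return lambda state: table.get(state, default_action)


def error_rate(policy, expert_pairs):
    """Fraction of the expert's visited states where the policy disagrees."""
    wrong = sum(1 for s, a in expert_pairs if policy(s) != a)
    return wrong / len(expert_pairs)


expert_pairs, expert_value = rollout(expert_policy)

# Apprentice trained on the full demonstration versus a truncated one.
for n_demos in (len(expert_pairs), 2):
    apprentice = fit_classifier(expert_pairs[:n_demos])
    _, apprentice_value = rollout(apprentice)
    print(
        f"demos={n_demos:2d}  "
        f"error_rate={error_rate(apprentice, expert_pairs):.2f}  "
        f"value_gap={expert_value - apprentice_value:.1f}"
    )
```

With the full demonstration the apprentice reproduces the expert exactly (error rate 0, value gap 0). With only the first two state-action pairs it mislabels every later state and never reaches the rewarding end of the chain. The sketch does not verify the O(√ε) or O(ε) bounds; it only makes the quantities in those statements concrete.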

Original language: English (US)
Title of host publication: Advances in Neural Information Processing Systems 23
Subtitle of host publication: 24th Annual Conference on Neural Information Processing Systems 2010, NIPS 2010
Publisher: Neural Information Processing Systems
ISBN (Print): 9781617823800
State: Published - 2010
Externally published: Yes
Event: 24th Annual Conference on Neural Information Processing Systems 2010, NIPS 2010 - Vancouver, BC, Canada
Duration: Dec 6 2010 → Dec 9 2010

Publication series

Name: Advances in Neural Information Processing Systems 23: 24th Annual Conference on Neural Information Processing Systems 2010, NIPS 2010

Other

Other: 24th Annual Conference on Neural Information Processing Systems 2010, NIPS 2010
Country/Territory: Canada
City: Vancouver, BC
Period: 12/6/10 → 12/9/10

All Science Journal Classification (ASJC) codes

  • Information Systems
