Apprenticeship learning using linear programming

Umar Syed, Michael Bowling, Robert E. Schapire

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

180 Scopus citations

Abstract

In apprenticeship learning, the goal is to learn a policy in a Markov decision process (MDP) that is at least as good as a policy demonstrated by an expert. The difficulty is that the MDP's true reward function is assumed to be unknown. We show how to frame apprenticeship learning as a linear programming problem, and we show that using an off-the-shelf LP solver to solve this problem yields a substantial improvement in running time over existing methods, up to two orders of magnitude faster in our experiments. Additionally, our approach produces stationary policies, whereas all existing methods for apprenticeship learning output "mixed" policies, i.e., randomized combinations of stationary policies. The technique is general enough to convert any mixed policy to a stationary policy.
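
The abstract does not spell out the LP itself, but a natural way to realize the formulation it describes is a linear program over state-action occupancy measures x(s, a) together with a scalar margin v: maximize the worst-case margin by which the learner's feature expectations exceed the expert's, subject to the Bellman flow constraints, then read a stationary policy directly off the occupancy measure. The sketch below is a minimal illustration under that reading, not the authors' reference implementation: the small random MDP (P, F, alpha), the expert feature expectations V_E, and all variable names are assumptions made for the example.

# Minimal sketch of an occupancy-measure LP for apprenticeship learning.
# Assumptions (not from the paper): random MDP data, uniform initial
# distribution, and expert feature expectations drawn at random.
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(0)
S, A, K, gamma = 5, 3, 4, 0.95               # states, actions, features, discount

P = rng.dirichlet(np.ones(S), size=(S, A))   # P[s, a, s'] transition kernel
F = rng.random((S, A, K))                    # F[s, a, k] feature values
alpha = np.full(S, 1.0 / S)                  # initial state distribution
V_E = rng.random(K)                          # assumed expert feature expectations

n = S * A                                    # x flattened row-major as (s, a)

# Objective: maximize v  <=>  minimize -v over z = [x, v].
c = np.zeros(n + 1)
c[-1] = -1.0

# Bellman flow constraints, one per state s:
#   sum_a x(s, a) - gamma * sum_{s', a'} P[s', a', s] x(s', a') = alpha(s)
A_eq = np.zeros((S, n + 1))
for s in range(S):
    for a in range(A):
        A_eq[s, s * A + a] += 1.0
    for sp in range(S):
        for ap in range(A):
            A_eq[s, sp * A + ap] -= gamma * P[sp, ap, s]
b_eq = alpha

# Margin constraints, one per feature k:
#   v - sum_{s,a} x(s, a) F[s, a, k] <= -V_E[k]
# i.e. the learner's k-th feature expectation exceeds the expert's by >= v.
A_ub = np.zeros((K, n + 1))
for k in range(K):
    A_ub[k, :n] = -F[:, :, k].reshape(n)
    A_ub[k, -1] = 1.0
b_ub = -V_E

bounds = [(0, None)] * n + [(None, None)]    # x >= 0, v unrestricted
res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
assert res.success

# Recover a stationary policy from the occupancy measure.
x = res.x[:n].reshape(S, A)
pi = x / x.sum(axis=1, keepdims=True)
print("margin v =", res.x[-1])
print("stationary policy:\n", pi)

The final normalization, pi(a|s) = x(s, a) / sum_a' x(s, a'), also suggests how the mixed-to-stationary conversion mentioned in the abstract can work: averaging the occupancy measures of a mixed policy's components and renormalizing yields a single stationary policy with the same feature expectations.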

Original language: English (US)
Title of host publication: Proceedings of the 25th International Conference on Machine Learning
Publisher: Association for Computing Machinery (ACM)
Pages: 1032-1039
Number of pages: 8
ISBN (Print): 9781605582054
DOIs
State: Published - 2008
Externally published: Yes
Event: 25th International Conference on Machine Learning - Helsinki, Finland
Duration: Jul 5 2008 - Jul 9 2008

Publication series

Name: Proceedings of the 25th International Conference on Machine Learning

Other

Other: 25th International Conference on Machine Learning
Country/Territory: Finland
City: Helsinki
Period: 7/5/08 - 7/9/08

All Science Journal Classification (ASJC) codes

  • Artificial Intelligence
  • Human-Computer Interaction
  • Software
