TY - GEN
T1 - Apprenticeship learning using linear programming
AU - Syed, Umar
AU - Bowling, Michael
AU - Schapire, Robert E.
N1 - Copyright 2020 Elsevier B.V., All rights reserved.
PY - 2008
Y1 - 2008
AB - In apprenticeship learning, the goal is to learn a policy in a Markov decision process that is at least as good as a policy demonstrated by an expert. The difficulty arises in that the MDP's true reward function is assumed to be unknown. We show how to frame apprenticeship learning as a linear programming problem, and show that using an off-the-shelf LP solver to solve this problem results in a substantial improvement in running time over existing methods: up to two orders of magnitude faster in our experiments. Additionally, our approach produces stationary policies, while all existing methods for apprenticeship learning output policies that are "mixed", i.e., randomized combinations of stationary policies. The technique used is general enough to convert any mixed policy to a stationary policy.
UR - http://www.scopus.com/inward/record.url?scp=56449119102&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=56449119102&partnerID=8YFLogxK
U2 - 10.1145/1390156.1390286
DO - 10.1145/1390156.1390286
M3 - Conference contribution
AN - SCOPUS:56449119102
SN - 9781605582054
T3 - Proceedings of the 25th International Conference on Machine Learning
SP - 1032
EP - 1039
BT - Proceedings of the 25th International Conference on Machine Learning
PB - Association for Computing Machinery (ACM)
T2 - 25th International Conference on Machine Learning
Y2 - 5 July 2008 through 9 July 2008
ER -