TY - GEN

T1 - A game-theoretic approach to apprenticeship learning

AU - Syed, Umar

AU - Schapire, Robert E.

PY - 2009/12/1

Y1 - 2009/12/1

N2 - We study the problem of an apprentice learning to behave in an environment with an unknown reward function by observing the behavior of an expert. We follow on the work of Abbeel and Ng [1] who considered a framework in which the true reward function is assumed to be a linear combination of a set of known and observable features. We give a new algorithm that, like theirs, is guaranteed to learn a policy that is nearly as good as the expert's, given enough examples. However, unlike their algorithm, we show that ours may produce a policy that is substantially better than the expert's. Moreover, our algorithm is computationally faster, is easier to implement, and can be applied even in the absence of an expert. The method is based on a game-theoretic view of the problem, which leads naturally to a direct application of the multiplicative-weights algorithm of Freund and Schapire [2] for playing repeated matrix games. In addition to our formal presentation and analysis of the new algorithm, we sketch how the method can be applied when the transition function itself is unknown, and we provide an experimental demonstration of the algorithm on a toy video-game environment.

AB - We study the problem of an apprentice learning to behave in an environment with an unknown reward function by observing the behavior of an expert. We follow on the work of Abbeel and Ng [1] who considered a framework in which the true reward function is assumed to be a linear combination of a set of known and observable features. We give a new algorithm that, like theirs, is guaranteed to learn a policy that is nearly as good as the expert's, given enough examples. However, unlike their algorithm, we show that ours may produce a policy that is substantially better than the expert's. Moreover, our algorithm is computationally faster, is easier to implement, and can be applied even in the absence of an expert. The method is based on a game-theoretic view of the problem, which leads naturally to a direct application of the multiplicative-weights algorithm of Freund and Schapire [2] for playing repeated matrix games. In addition to our formal presentation and analysis of the new algorithm, we sketch how the method can be applied when the transition function itself is unknown, and we provide an experimental demonstration of the algorithm on a toy video-game environment.

UR - http://www.scopus.com/inward/record.url?scp=84858758690&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84858758690&partnerID=8YFLogxK

M3 - Conference contribution

AN - SCOPUS:84858758690

SN - 160560352X

SN - 9781605603520

T3 - Advances in Neural Information Processing Systems 20 - Proceedings of the 2007 Conference

BT - Advances in Neural Information Processing Systems 20 - Proceedings of the 2007 Conference

T2 - 21st Annual Conference on Neural Information Processing Systems, NIPS 2007

Y2 - 3 December 2007 through 6 December 2007

ER -