On the Worst-case Analysis of Temporal-difference Learning Algorithms

Robert E. Schapire, Manfred K. Warmuth

Research output: Chapter in Book/Report/Conference proceedingConference contribution

4 Scopus citations

Abstract

We study the worst-case behavior of a family of learning algorithms based on Sutton's [7] method of temporal differences. In our on-line learning framework, learning takes place in a sequence of trials, and the goal of the learning algorithm is to estimate a discounted sum of all the reinforcements that will be received in the future. In this setting, we are able to prove general upper bounds on the performance of a slightly modified version of Sutton's so-called TD(A) algorithm. These bounds are stated in terms of the performance of the best linear predictor on the given training sequence, and are proved without making any statistical assumptions of any kind about the process producing the learner's observed training sequence. We also prove lower bounds on the performance of any algorithm for this learning problem, and give a similar analysis of the closely related problem of learning to predict in a model in which the learner must produce predictions for a whole batch of observations before receiving reinforcement.

Original languageEnglish (US)
Title of host publicationProceedings of the 11th International Conference on Machine Learning, ICML 1994
EditorsWilliam W. Cohen, Haym Hirsh
PublisherMorgan Kaufmann Publishers, Inc.
Pages266-274
Number of pages9
ISBN (Electronic)1558603352, 9781558603356
DOIs
StatePublished - 1994
Externally publishedYes
Event11th International Conference on Machine Learning, ICML 1994 - New Brunswick, United States
Duration: Jul 10 1994Jul 13 1994

Publication series

NameProceedings of the 11th International Conference on Machine Learning, ICML 1994

Conference

Conference11th International Conference on Machine Learning, ICML 1994
Country/TerritoryUnited States
CityNew Brunswick
Period7/10/947/13/94

All Science Journal Classification (ASJC) codes

  • Human-Computer Interaction
  • Software
  • Theoretical Computer Science
  • Artificial Intelligence

Fingerprint

Dive into the research topics of 'On the Worst-case Analysis of Temporal-difference Learning Algorithms'. Together they form a unique fingerprint.

Cite this