Abstract
We address the problem of determining optimal stepsizes for estimating parameters in the context of approximate dynamic programming. The sufficient conditions for convergence of the stepsize rules have been known for 50 years, but practical computational work tends to use formulas with parameters that have to be tuned for specific applications. The problem is that in most applications in dynamic programming, observations for estimating a value function typically come from a data series that can be initially highly transient. The degree of transience affects the choice of stepsize parameters that produce the fastest convergence. In addition, the degree of initial transience can vary widely among the value function parameters for the same dynamic program. This paper reviews the literature on deterministic and stochastic stepsize rules, and derives formulas for optimal stepsizes for minimizing estimation error. This formula assumes certain parameters are known, and an approximation is proposed for the case where the parameters are unknown. Experimental work shows that the approximation provides faster convergence than other popular formulas.
Original language | English (US) |
---|---|
Pages (from-to) | 167-198 |
Number of pages | 32 |
Journal | Machine Learning |
Volume | 65 |
Issue number | 1 |
DOIs | |
State | Published - Oct 2006 |
All Science Journal Classification (ASJC) codes
- Software
- Artificial Intelligence
Keywords
- Adaptive learning
- Approximate dynamic programming
- Kalman filter
- Stochastic stepsize