Finite-time analysis for the knowledge-gradient policy

Yingfei Wang, Warren B. Powell

Research output: Contribution to journalArticlepeer-review

Abstract

We consider sequential decision problems in which we adaptively choose one of finitely many alternatives and observe a stochastic reward. We offer a new perspective on interpreting Bayesian ranking and selection problems as adaptive stochastic multiset maximization problems and derive the first finite-time bound of the knowledge-gradient policy for adaptive submodular objective functions. In addition, we introduce the concept of prior-optimality and provide another insight into the performance of the knowledge-gradient policy based on the submodular assumption on the value of information. We demonstrate submodularity for the two-alternative case and provide other conditions for more general problems, bringing out the issue and importance of submodularity in learning problems. Empirical experiments are conducted to further illustrate the finite-time behavior of the knowledge-gradient policy.

Original languageEnglish (US)
Pages (from-to)1105-1129
Number of pages25
JournalSIAM Journal on Control and Optimization
Volume56
Issue number2
DOIs
StatePublished - Jan 1 2018

All Science Journal Classification (ASJC) codes

  • Control and Optimization
  • Applied Mathematics

Keywords

  • Ranking and selection
  • Sequential decision analysis
  • Stochastic control

Fingerprint Dive into the research topics of 'Finite-time analysis for the knowledge-gradient policy'. Together they form a unique fingerprint.

Cite this