Optimal learning for sequential sampling with non-parametric beliefs

Emre Barut, Warren B. Powell

Research output: Contribution to journal, Article, peer-review

6 Scopus citations

Abstract

We propose a sequential learning policy for ranking and selection problems, where we use a non-parametric procedure for estimating the value of a policy. Our estimation approach aggregates over a set of kernel functions in order to achieve a more consistent estimator. Each element in the kernel estimation set uses a different bandwidth to achieve better aggregation. The final estimate uses a weighting scheme with the inverse mean square errors of the kernel estimators as weights. This weighting scheme is shown to be optimal under independent kernel estimators. For choosing the measurement, we employ the knowledge gradient policy that relies on predictive distributions to calculate the optimal sampling point. Our method allows a setting where the beliefs are expected to be correlated but the correlation structure is unknown beforehand. Moreover, the proposed policy is shown to be asymptotically optimal.
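The inverse-MSE aggregation described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the Gaussian kernel, the Nadaraya-Watson form of each estimator, the leave-one-out error used as a stand-in for each estimator's mean square error, and all function names (`nw_estimate`, `loo_mse`, `aggregate`) are assumptions made for illustration only.

```python
import numpy as np

def nw_estimate(x_obs, y_obs, x_query, bandwidth):
    """Nadaraya-Watson estimate with a Gaussian kernel (assumed kernel choice)."""
    d = (x_query[:, None] - x_obs[None, :]) / bandwidth
    w = np.exp(-0.5 * d ** 2)
    return (w @ y_obs) / w.sum(axis=1)

def loo_mse(x_obs, y_obs, bandwidth):
    """Leave-one-out squared error, used here as a proxy for the estimator's MSE."""
    d = (x_obs[:, None] - x_obs[None, :]) / bandwidth
    w = np.exp(-0.5 * d ** 2)
    np.fill_diagonal(w, 0.0)  # exclude each point from its own prediction
    denom = np.maximum(w.sum(axis=1), 1e-300)  # guard against division by zero
    pred = (w @ y_obs) / denom
    return np.mean((y_obs - pred) ** 2)

def aggregate(x_obs, y_obs, x_query, bandwidths):
    """Combine kernel estimates, one per bandwidth, with inverse-MSE weights."""
    mses = np.array([loo_mse(x_obs, y_obs, h) for h in bandwidths])
    inv = 1.0 / mses
    weights = inv / inv.sum()  # normalized so the weights sum to one
    estimates = np.stack([nw_estimate(x_obs, y_obs, x_query, h) for h in bandwidths])
    return weights @ estimates, weights
```

Calling `aggregate(x, y, q, [0.05, 0.2, 1.0])` on noisy samples `(x, y)` returns the combined estimate at the query points `q` together with the weight given to each bandwidth; bandwidths whose estimators have lower error receive proportionally more weight.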

Original language: English (US)
Pages (from-to): 517-543
Number of pages: 27
Journal: Journal of Global Optimization
Volume: 58
Issue number: 3
State: Published - Mar 2014

All Science Journal Classification (ASJC) codes

  • Computer Science Applications
  • Management Science and Operations Research
  • Control and Optimization
  • Applied Mathematics

Keywords

  • Bayesian global optimization
  • Knowledge gradient
  • Non-parametric estimation
