TY - GEN

T1 - The knowledge gradient for sequential decision making with stochastic binary feedbacks

AU - Wang, Yingfei

AU - Wang, Chu

AU - Powell, Warren Buckler

N1 - Funding Information:
This research was supported in part by AFOSR grant contract FA9550-12-1-0200 for Natural Materials, Systems and Extremophiles and the program in Optimization and Discrete Mathematics.

PY - 2016

Y1 - 2016

N2 - We consider the problem of sequentially making decisions that are rewarded by "successes" and "failures," which can be predicted through an unknown relationship that depends on a partially controllable vector of attributes for each instance. The learner takes an active role in selecting samples from the instance pool. The goal is either to maximize the probability of success after the offline training phase or to minimize regret in online learning. Our problem is motivated by real-world applications where observations are time-consuming and/or expensive. By adapting an online Bayesian linear classifier, we develop a knowledge-gradient-type policy that guides the experiment by maximizing the expected value of information from labeling each alternative, in order to reduce the number of expensive physical experiments. We provide a finite-time analysis of the estimation error and demonstrate the performance of the proposed algorithm on both synthetic problems and benchmark UCI datasets.

AB - We consider the problem of sequentially making decisions that are rewarded by "successes" and "failures," which can be predicted through an unknown relationship that depends on a partially controllable vector of attributes for each instance. The learner takes an active role in selecting samples from the instance pool. The goal is either to maximize the probability of success after the offline training phase or to minimize regret in online learning. Our problem is motivated by real-world applications where observations are time-consuming and/or expensive. By adapting an online Bayesian linear classifier, we develop a knowledge-gradient-type policy that guides the experiment by maximizing the expected value of information from labeling each alternative, in order to reduce the number of expensive physical experiments. We provide a finite-time analysis of the estimation error and demonstrate the performance of the proposed algorithm on both synthetic problems and benchmark UCI datasets.

UR - http://www.scopus.com/inward/record.url?scp=84999029650&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84999029650&partnerID=8YFLogxK

M3 - Conference contribution

AN - SCOPUS:84999029650

T3 - 33rd International Conference on Machine Learning, ICML 2016

SP - 1761

EP - 1773

BT - 33rd International Conference on Machine Learning, ICML 2016

A2 - Weinberger, Kilian Q.

A2 - Balcan, Maria Florina

PB - International Machine Learning Society (IMLS)

T2 - 33rd International Conference on Machine Learning, ICML 2016

Y2 - 19 June 2016 through 24 June 2016

ER -