TY - GEN

T1 - The Markovian Price of Information

AU - Gupta, Anupam

AU - Jiang, Haotian

AU - Scully, Ziv

AU - Singla, Sahil

N1 - Funding Information:
Acknowledgements. A. Gupta and S. Singla were supported in part by NSF awards CCF-1536002, CCF-1540541, and CCF-1617790, and the Indo-US Joint Center for Algorithms Under Uncertainty. H. Jiang was supported in part by CCF-1740551, CCF-1749609, and DMS-1839116. Z. Scully was supported by an ARCS Foundation scholarship and the NSF GRFP under Grant Nos. DGE-1745016 and DGE-125222.
Publisher Copyright:
© 2019, Springer Nature Switzerland AG.

PY - 2019

Y1 - 2019

N2 - Suppose there are n Markov chains and we need to pay a per-step price to advance them. The “destination” states of the Markov chains contain rewards; however, we can only collect rewards for a subset of them that satisfies a combinatorial constraint, e.g., at most k of them, or those forming an acyclic subgraph of an underlying graph. What strategy should we choose to advance the Markov chains if our goal is to maximize the total reward minus the total price that we pay? In this paper we introduce a Markovian price of information model to capture settings such as the above, where the input parameters of a combinatorial optimization problem are given via Markov chains. We design optimal/approximation algorithms that jointly optimize the value of the combinatorial problem and the total price paid. We also study the robustness of our algorithms to the distribution parameters and how to handle the commitment constraint. Our work brings together two classical lines of investigation: obtaining optimal strategies for Markovian multi-armed bandits, and obtaining exact and approximation algorithms for discrete optimization problems using combinatorial as well as linear-programming relaxation ideas.

AB - Suppose there are n Markov chains and we need to pay a per-step price to advance them. The “destination” states of the Markov chains contain rewards; however, we can only collect rewards for a subset of them that satisfies a combinatorial constraint, e.g., at most k of them, or those forming an acyclic subgraph of an underlying graph. What strategy should we choose to advance the Markov chains if our goal is to maximize the total reward minus the total price that we pay? In this paper we introduce a Markovian price of information model to capture settings such as the above, where the input parameters of a combinatorial optimization problem are given via Markov chains. We design optimal/approximation algorithms that jointly optimize the value of the combinatorial problem and the total price paid. We also study the robustness of our algorithms to the distribution parameters and how to handle the commitment constraint. Our work brings together two classical lines of investigation: obtaining optimal strategies for Markovian multi-armed bandits, and obtaining exact and approximation algorithms for discrete optimization problems using combinatorial as well as linear-programming relaxation ideas.

KW - Gittins index

KW - Multi-armed bandits

KW - Probing algorithms

UR - http://www.scopus.com/inward/record.url?scp=85065875455&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85065875455&partnerID=8YFLogxK

U2 - 10.1007/978-3-030-17953-3_18

DO - 10.1007/978-3-030-17953-3_18

M3 - Conference contribution

AN - SCOPUS:85065875455

SN - 9783030179526

T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

SP - 233

EP - 246

BT - Integer Programming and Combinatorial Optimization - 20th International Conference, IPCO 2019, Proceedings

A2 - Lodi, Andrea

A2 - Nagarajan, Viswanath

PB - Springer Verlag

T2 - 20th International Conference on Integer Programming and Combinatorial Optimization, IPCO 2019

Y2 - 22 May 2019 through 24 May 2019

ER -