TY - GEN

T1 - The Markovian Price of Information

AU - Gupta, Anupam

AU - Jiang, Haotian

AU - Scully, Ziv

AU - Singla, Sahil

N1 - Funding Information:
Acknowledgements. A. Gupta and S. Singla were supported in part by NSF awards CCF-1536002, CCF-1540541, and CCF-1617790, and the Indo-US Joint Center for Algorithms Under Uncertainty. H. Jiang was supported in part by CCF-1740551, CCF-1749609, and DMS-1839116. Z. Scully was supported by an ARCS Foundation scholarship and the NSF GRFP under Grant Nos. DGE-1745016 and DGE-125222.
Publisher Copyright:
© 2019, Springer Nature Switzerland AG.

PY - 2019

Y1 - 2019

N2 - Suppose there are n Markov chains and we need to pay a per-step price to advance them. The “destination” states of the Markov chains contain rewards; however, we can only collect rewards for a subset of them that satisfies a combinatorial constraint, e.g., at most k of them, or those forming an acyclic subgraph of an underlying graph. What strategy should we choose to advance the Markov chains if our goal is to maximize the total reward minus the total price that we pay? In this paper we introduce a Markovian price of information model to capture settings such as the above, where the input parameters of a combinatorial optimization problem are given via Markov chains. We design optimal/approximation algorithms that jointly optimize the value of the combinatorial problem and the total price paid. We also study the robustness of our algorithms to the distribution parameters and how to handle the commitment constraint. Our work brings together two classical lines of investigation: obtaining optimal strategies for Markovian multi-armed bandits, and obtaining exact and approximation algorithms for discrete optimization problems using combinatorial as well as linear-programming relaxation ideas.

AB - Suppose there are n Markov chains and we need to pay a per-step price to advance them. The “destination” states of the Markov chains contain rewards; however, we can only collect rewards for a subset of them that satisfies a combinatorial constraint, e.g., at most k of them, or those forming an acyclic subgraph of an underlying graph. What strategy should we choose to advance the Markov chains if our goal is to maximize the total reward minus the total price that we pay? In this paper we introduce a Markovian price of information model to capture settings such as the above, where the input parameters of a combinatorial optimization problem are given via Markov chains. We design optimal/approximation algorithms that jointly optimize the value of the combinatorial problem and the total price paid. We also study the robustness of our algorithms to the distribution parameters and how to handle the commitment constraint. Our work brings together two classical lines of investigation: obtaining optimal strategies for Markovian multi-armed bandits, and obtaining exact and approximation algorithms for discrete optimization problems using combinatorial as well as linear-programming relaxation ideas.

KW - Gittins index

KW - Multi-armed bandits

KW - Probing algorithms

UR - http://www.scopus.com/inward/record.url?scp=85065875455&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85065875455&partnerID=8YFLogxK

U2 - 10.1007/978-3-030-17953-3_18

DO - 10.1007/978-3-030-17953-3_18

M3 - Conference contribution

AN - SCOPUS:85065875455

SN - 9783030179526

T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

SP - 233

EP - 246

BT - Integer Programming and Combinatorial Optimization - 20th International Conference, IPCO 2019, Proceedings

A2 - Lodi, Andrea

A2 - Nagarajan, Viswanath

PB - Springer Verlag

T2 - 20th International Conference on Integer Programming and Combinatorial Optimization, IPCO 2019

Y2 - 22 May 2019 through 24 May 2019

ER -