TY - GEN
T1 - Learning Adversarial Markov Decision Processes with Bandit Feedback and Unknown Transition
AU - Jin, Chi
AU - Jin, Tiancheng
AU - Luo, Haipeng
AU - Sra, Suvrit
AU - Yu, Tiancheng
N1 - Publisher Copyright: © 2020 by the Authors.
PY - 2020
Y1 - 2020
N2 - We consider the task of learning in episodic finite-horizon Markov decision processes with an unknown transition function, bandit feedback, and adversarial losses. We propose an efficient algorithm that achieves Õ(L|X|√(|A|T)) regret with high probability, where L is the horizon, |X| the number of states, |A| the number of actions, and T the number of episodes. To our knowledge, our algorithm is the first to ensure Õ(√T) regret in this challenging setting; in fact it achieves the same regret as (Rosenberg and Mansour, 2019a) who consider the easier setting with full-information. Our key contributions are two-fold: A tighter confidence set for the transition function; and an optimistic loss estimator that is inversely weighted by an upper occupancy bound.
AB - We consider the task of learning in episodic finite-horizon Markov decision processes with an unknown transition function, bandit feedback, and adversarial losses. We propose an efficient algorithm that achieves Õ(L|X|√(|A|T)) regret with high probability, where L is the horizon, |X| the number of states, |A| the number of actions, and T the number of episodes. To our knowledge, our algorithm is the first to ensure Õ(√T) regret in this challenging setting; in fact it achieves the same regret as (Rosenberg and Mansour, 2019a) who consider the easier setting with full-information. Our key contributions are two-fold: A tighter confidence set for the transition function; and an optimistic loss estimator that is inversely weighted by an upper occupancy bound.
UR - http://www.scopus.com/inward/record.url?scp=85105193467&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85105193467&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:85105193467
T3 - 37th International Conference on Machine Learning, ICML 2020
SP - 4810
EP - 4819
BT - 37th International Conference on Machine Learning, ICML 2020
A2 - Daumé, Hal
A2 - Singh, Aarti
PB - International Machine Learning Society (IMLS)
T2 - 37th International Conference on Machine Learning, ICML 2020
Y2 - 13 July 2020 through 18 July 2020
ER -