Learning to Control in Metric Space with Optimal Regret

Chengzhuo Ni, Lin F. Yang, Mengdi Wang

Research output: Chapter in Book/Report/Conference proceedingConference contribution

11 Scopus citations

Abstract

We study online reinforcement learning for finite-horizon deterministic control systems with arbitrary state and action spaces. Suppose the transition dynamics and reward function is unknown, but the state and action space is endowed with a metric that characterizes the proximity between different states and actions. We provide a surprisingly simple upper-confidence reinforcement learning algorithm that uses a function approximation oracle to estimate optimistic Q functions from experiences. We show that the regret of the algorithm after K episodes is o(DLK)^{\frac{d}{d+1}}H where D is the diameter of the state-action space, L is a smoothness parameter, and d is the doubling dimension of the state-action space with respect to the given metric. We also establish a near-matching regret lower bound. The proposed method can be adapted to work for more structured transition systems, including the finite-state case and the case where value functions are linear combinations of features, where the method also achieve the optimal regret.

Original languageEnglish (US)
Title of host publication2019 57th Annual Allerton Conference on Communication, Control, and Computing, Allerton 2019
PublisherInstitute of Electrical and Electronics Engineers Inc.
Pages726-733
Number of pages8
ISBN (Electronic)9781728131511
DOIs
StatePublished - Sep 2019
Externally publishedYes
Event57th Annual Allerton Conference on Communication, Control, and Computing, Allerton 2019 - Monticello, United States
Duration: Sep 24 2019Sep 27 2019

Publication series

Name2019 57th Annual Allerton Conference on Communication, Control, and Computing, Allerton 2019

Conference

Conference57th Annual Allerton Conference on Communication, Control, and Computing, Allerton 2019
Country/TerritoryUnited States
CityMonticello
Period9/24/199/27/19

All Science Journal Classification (ASJC) codes

  • Artificial Intelligence
  • Computer Networks and Communications
  • Hardware and Architecture
  • Safety, Risk, Reliability and Quality
  • Control and Optimization

Fingerprint

Dive into the research topics of 'Learning to Control in Metric Space with Optimal Regret'. Together they form a unique fingerprint.

Cite this