Safety and Liveness Guarantees through Reach-Avoid Reinforcement Learning

Kai Chieh Hsu, Vicenç Rubies-Royo, Claire J. Tomlin, Jaime F. Fisac

Research output: Chapter in Book/Report/Conference proceedingConference contribution

1 Scopus citations

Abstract

Reach-avoid optimal control problems, in which the system must reach certain goal conditions while staying clear of unacceptable failure modes, are central to safety and liveness assurance for autonomous robotic systems, but their exact solutions are intractable for complex dynamics and environments. Recent successes in the use of reinforcement learning methods to approximately solve optimal control problems with performance objectives make their application to certification problems attractive; however, the Lagrange-type objective (cumulative costs or rewards over time) used in reinforcement learning is not suitable to encode temporal logic requirements. Recent work has shown promise in extending the reinforcement learning machinery to safety-type problems, whose objective is not a sum, but a minimum (or maximum) over time. In this work, we generalize the reinforcement learning formulation to handle all optimal control problems in the reach-avoid category. We derive a time-discounted reach-avoid Bellman backup with contraction mapping properties and prove that the resulting reach-avoid Q-learning algorithm converges under analogous conditions to the traditional Lagrange-type problem, yielding an arbitrarily tight conservative approximation to the reach-avoid set. We further demonstrate the use of this formulation with deep reinforcement learning methods, retaining zero-violation guarantees by treating the approximate solutions as untrusted oracles in a model-predictive supervisory control framework. We evaluate our proposed framework on a range of nonlinear systems, validating the results against analytic and numerical solutions, and through Monte Carlo simulation in previously intractable problems. Our results open the door to a range of learning-based methods for safe-and-live autonomous behavior, with applications across robotics and automation.

Original languageEnglish (US)
Title of host publicationRobotics
Subtitle of host publicationScience and Systems XVII
EditorsDylan A. Shell, Marc Toussaint, M. Ani Hsieh
PublisherMIT Press Journals
ISBN (Print)9780992374778
DOIs
StatePublished - 2021
Event17th Robotics: Science and Systems, RSS 2021 - Virtual, Online
Duration: Jul 12 2021Jul 16 2021

Publication series

NameRobotics: Science and Systems
ISSN (Electronic)2330-765X

Conference

Conference17th Robotics: Science and Systems, RSS 2021
CityVirtual, Online
Period7/12/217/16/21

All Science Journal Classification (ASJC) codes

  • Artificial Intelligence
  • Control and Systems Engineering
  • Electrical and Electronic Engineering

Fingerprint

Dive into the research topics of 'Safety and Liveness Guarantees through Reach-Avoid Reinforcement Learning'. Together they form a unique fingerprint.

Cite this