Reinforcement Learning for Classical Planning: Viewing Heuristics as Dense Reward Generators

  • Clement Gehring
  • Masataro Asai
  • Rohan Chitnis
  • Tom Silver
  • Leslie Kaelbling
  • Shirin Sohrabi
  • Michael Katz

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

28 Scopus citations

Abstract

Recent advances in reinforcement learning (RL) have led to a growing interest in applying RL to classical planning domains or applying classical planning methods to some complex RL domains. However, the long-horizon goal-based problems found in classical planning lead to sparse rewards for RL, making direct application inefficient. In this paper, we propose to leverage domain-independent heuristic functions commonly used in the classical planning literature to improve the sample efficiency of RL. These classical heuristics act as dense reward generators to alleviate the sparse-rewards issue and enable our RL agent to learn domain-specific value functions as residuals on these heuristics, making learning easier. Correct application of this technique requires reconciling the discounted metric used in RL with the non-discounted metric used by heuristics. We implement the value functions using Neural Logic Machines, a neural network architecture designed for grounded first-order logic inputs. We demonstrate on several classical planning domains that using classical heuristics for RL yields better sample efficiency than sparse-reward RL. We further show that our learned value functions generalize to novel problem instances in the same domain. The source code and the appendix are available at github.com/ibm/pddlrl and arxiv.org/abs/2109.14830.
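The abstract's core idea (a heuristic as a dense reward generator, plus a discount-aware conversion of the heuristic) can be illustrated with a small sketch. It assumes unit action costs, a reward of -1 per step, standard potential-based reward shaping with the heuristic as the potential, and a conversion of the undiscounted heuristic value h into the discounted return of paying -1 for h steps. All function names are hypothetical; this is not the authors' implementation, and the linked source code and appendix give their exact formulation.

```python
"""Hypothetical sketch: a classical heuristic used as a shaping potential in RL.

Assumptions (not taken from the paper's code): unit action costs, a reward of
-1 per step, and the discounted conversion of h used in discounted_heuristic().
"""


def discounted_heuristic(h: float, gamma: float) -> float:
    """Discounted return of paying -1 for h steps: -(1 - gamma**h) / (1 - gamma)."""
    if gamma == 1.0:
        return -h  # undiscounted case: simply the negated step count
    return -(1.0 - gamma ** h) / (1.0 - gamma)


def shaped_reward(r: float, h_s: float, h_next: float, gamma: float, terminal: bool) -> float:
    """Potential-based shaping r + gamma * phi(s') - phi(s), with phi = discounted heuristic."""
    phi_s = discounted_heuristic(h_s, gamma)
    # The potential of a terminal (goal) state is fixed to 0.
    phi_next = 0.0 if terminal else discounted_heuristic(h_next, gamma)
    return r + gamma * phi_next - phi_s


def residual_value(h_s: float, delta: float, gamma: float) -> float:
    """Value estimate = heuristic-based baseline + a learned residual correction delta."""
    return discounted_heuristic(h_s, gamma) + delta
```

Under these assumptions, a step that reduces a perfect heuristic by one (for example `shaped_reward(-1.0, 3, 2, 0.9, False)`) receives a shaped reward of 0, while a step that increases h receives a negative reward, so the agent gets an informative signal on every transition even though the unshaped reward is the same -1 everywhere. Converting h into a discounted quantity keeps the potential on the same scale as discounted RL returns, which is one way to read the abstract's point about reconciling the two metrics.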

Original language: English (US)
Title of host publication: Proceedings of the 32nd International Conference on Automated Planning and Scheduling, ICAPS 2022
Editors: Akshat Kumar, Sylvie Thiebaux, Pradeep Varakantham, William Yeoh
Publisher: Association for the Advancement of Artificial Intelligence
Pages: 588-596
Number of pages: 9
ISBN (Electronic): 9781577358749
State: Published - Jun 13 2022
Externally published: Yes
Event: 32nd International Conference on Automated Planning and Scheduling, ICAPS 2022 - Virtual, Online, Singapore
Duration: Jun 13 2022 to Jun 24 2022

Publication series

Name: Proceedings International Conference on Automated Planning and Scheduling, ICAPS
Volume: 32
ISSN (Print): 2334-0835
ISSN (Electronic): 2334-0843

Conference

Conference: 32nd International Conference on Automated Planning and Scheduling, ICAPS 2022
Country/Territory: Singapore
City: Virtual, Online
Period: 6/13/22 to 6/24/22

All Science Journal Classification (ASJC) codes

  • Artificial Intelligence
  • Computer Science Applications
  • Information Systems and Management
