TY - CPAPER
T1 - Reinforcement Learning for Classical Planning: Viewing Heuristics as Dense Reward Generators
T2 - 32nd International Conference on Automated Planning and Scheduling, ICAPS 2022
AU - Gehring, Clement
AU - Asai, Masataro
AU - Chitnis, Rohan
AU - Silver, Tom
AU - Kaelbling, Leslie Pack
AU - Sohrabi, Shirin
AU - Katz, Michael
N1 - Publisher Copyright:
© 2022, Association for the Advancement of Artificial Intelligence.
PY - 2022/6/13
AB - Recent advances in reinforcement learning (RL) have led to a growing interest in applying RL to classical planning domains or applying classical planning methods to some complex RL domains. However, the long-horizon goal-based problems found in classical planning lead to sparse rewards for RL, making direct application inefficient. In this paper, we propose to leverage domain-independent heuristic functions commonly used in the classical planning literature to improve the sample efficiency of RL. These classical heuristics act as dense reward generators to alleviate the sparse-rewards issue and enable our RL agent to learn domain-specific value functions as residuals on these heuristics, making learning easier. Correct application of this technique requires consolidating the discounted metric used in RL and the non-discounted metric used in heuristics. We implement the value functions using Neural Logic Machines, a neural network architecture designed for grounded first-order logic inputs. We demonstrate on several classical planning domains that using classical heuristics for RL allows for good sample efficiency compared to sparse-reward RL. We further show that our learned value functions generalize to novel problem instances in the same domain. The source code and the appendix are available at github.com/ibm/pddlrl and arxiv.org/abs/2109.14830.
UR - https://www.scopus.com/pages/publications/85136603171
DO - 10.1609/icaps.v32i1.19846
M3 - Conference contribution
AN - SCOPUS:85136603171
T3 - Proceedings of the International Conference on Automated Planning and Scheduling, ICAPS
SP - 588
EP - 596
BT - Proceedings of the 32nd International Conference on Automated Planning and Scheduling, ICAPS 2022
A2 - Kumar, Akshat
A2 - Thiébaux, Sylvie
A2 - Varakantham, Pradeep
A2 - Yeoh, William
PB - Association for the Advancement of Artificial Intelligence
Y2 - 13 June 2022 through 24 June 2022
ER -