TY - JOUR
T1 - Risk-sensitive inverse reinforcement learning via semi- and non-parametric methods
AU - Singh, Sumeet
AU - Lacotte, Jonathan
AU - Majumdar, Anirudha
AU - Pavone, Marco
N1 - Funding Information:
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: The authors were partially supported by the Office of Naval Research (ONR), Science of Autonomy Program (contract number N00014-15-1-2673) and by the Toyota Research Institute (TRI). This article solely reflects the opinions and conclusions of its authors and not ONR, TRI, or any other Toyota entity.
Publisher Copyright:
© The Author(s) 2018.
PY - 2018/12/1
Y1 - 2018/12/1
AB - The literature on inverse reinforcement learning (IRL) typically assumes that humans take actions to minimize the expected value of a cost function, i.e., that humans are risk neutral. Yet, in practice, humans are often far from being risk neutral. To fill this gap, the objective of this paper is to devise a framework for risk-sensitive (RS) IRL to explicitly account for a human’s risk sensitivity. To this end, we propose a flexible class of models based on coherent risk measures, which allow us to capture an entire spectrum of risk preferences from risk neutral to worst case. We propose efficient non-parametric algorithms based on linear programming and semi-parametric algorithms based on maximum likelihood for inferring a human’s underlying risk measure and cost function for a rich class of static and dynamic decision-making settings. The resulting approach is demonstrated on a simulated driving game with 10 human participants. Our method is able to infer and mimic a wide range of qualitatively different driving styles from highly risk averse to risk neutral in a data-efficient manner. Moreover, comparisons of the RS-IRL approach with a risk-neutral model show that the RS-IRL framework more accurately captures observed participant behavior both qualitatively and quantitatively, especially in scenarios where catastrophic outcomes such as collisions can occur.
KW - coherent risk measures
KW - non-parametric method
KW - risk-sensitive inverse reinforcement learning
KW - semi-parametric method
UR - http://www.scopus.com/inward/record.url?scp=85047389334&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85047389334&partnerID=8YFLogxK
U2 - 10.1177/0278364918772017
DO - 10.1177/0278364918772017
M3 - Article
AN - SCOPUS:85047389334
SN - 0278-3649
VL - 37
SP - 1713
EP - 1740
JO - International Journal of Robotics Research
JF - International Journal of Robotics Research
IS - 13-14
ER -