TY - GEN
T1 - Risk-sensitive inverse reinforcement learning via coherent risk models
AU - Majumdar, Anirudha
AU - Singh, Sumeet
AU - Mandlekar, Ajay
AU - Pavone, Marco
N1 - Publisher Copyright: © 2017 MIT Press Journals. All rights reserved.
PY - 2017
Y1 - 2017
N2 - The literature on Inverse Reinforcement Learning (IRL) typically assumes that humans take actions in order to minimize the expected value of a cost function, i.e., that humans are risk neutral. Yet, in practice, humans are often far from being risk neutral. To fill this gap, the objective of this paper is to devise a framework for risk-sensitive IRL in order to explicitly account for an expert's risk sensitivity. To this end, we propose a flexible class of models based on coherent risk metrics, which allow us to capture an entire spectrum of risk preferences from risk-neutral to worst-case. We propose efficient algorithms based on Linear Programming for inferring an expert's underlying risk metric and cost function for a rich class of static and dynamic decision-making settings. The resulting approach is demonstrated on a simulated driving game with ten human participants. Our method is able to infer and mimic a wide range of qualitatively different driving styles from highly risk-averse to risk-neutral in a data-efficient manner. Moreover, comparisons of the Risk-Sensitive (RS) IRL approach with a risk-neutral model show that the RS-IRL framework more accurately captures observed participant behavior both qualitatively and quantitatively.
UR - http://www.scopus.com/inward/record.url?scp=85048804859&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85048804859&partnerID=8YFLogxK
DO - 10.15607/rss.2017.xiii.069
M3 - Conference contribution
AN - SCOPUS:85048804859
T3 - Robotics: Science and Systems
BT - Robotics
A2 - Amato, Nancy
A2 - Srinivasa, Siddhartha
A2 - Ayanian, Nora
A2 - Kuindersma, Scott
PB - MIT Press Journals
T2 - 2017 Robotics: Science and Systems, RSS 2017
Y2 - 12 July 2017 through 16 July 2017
ER -