TY - JOUR
T1 - Value-free reinforcement learning
T2 - policy optimization as a minimal model of operant behavior
AU - Bennett, Daniel
AU - Niv, Yael
AU - Langdon, Angela J.
N1 - Publisher Copyright: © 2021 Elsevier Ltd
PY - 2021/10
Y1 - 2021/10
N2 - Reinforcement learning is a powerful framework for modelling the cognitive and neural substrates of learning and decision making. Contemporary research in cognitive neuroscience and neuroeconomics typically uses value-based reinforcement-learning models, which assume that decision-makers choose by comparing learned values for different actions. However, another possibility is suggested by a simpler family of models, called policy-gradient reinforcement learning. Policy-gradient models learn by optimizing a behavioral policy directly, without the intermediate step of value-learning. Here we review recent behavioral and neural findings that are more parsimoniously explained by policy-gradient models than by value-based models. We conclude that, despite the ubiquity of ‘value’ in reinforcement-learning models of decision making, policy-gradient models provide a lightweight and compelling alternative model of operant behavior.
UR - http://www.scopus.com/inward/record.url?scp=85107060032&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85107060032&partnerID=8YFLogxK
U2 - 10.1016/j.cobeha.2021.04.020
DO - 10.1016/j.cobeha.2021.04.020
M3 - Review article
C2 - 36341023
AN - SCOPUS:85107060032
SN - 2352-1546
VL - 41
SP - 114
EP - 121
JO - Current Opinion in Behavioral Sciences
JF - Current Opinion in Behavioral Sciences
ER -