Value-free reinforcement learning: policy optimization as a minimal model of operant behavior

Daniel Bennett, Yael Niv, Angela J. Langdon

Research output: Contribution to journal › Review article › peer-review


Abstract

Reinforcement learning is a powerful framework for modelling the cognitive and neural substrates of learning and decision making. Contemporary research in cognitive neuroscience and neuroeconomics typically uses value-based reinforcement-learning models, which assume that decision-makers choose by comparing learned values for different actions. However, another possibility is suggested by a simpler family of models, called policy-gradient reinforcement learning. Policy-gradient models learn by optimizing a behavioral policy directly, without the intermediate step of value-learning. Here we review recent behavioral and neural findings that are more parsimoniously explained by policy-gradient models than by value-based models. We conclude that, despite the ubiquity of ‘value’ in reinforcement-learning models of decision making, policy-gradient models provide a lightweight and compelling alternative model of operant behavior.
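
The distinction drawn in the abstract, between comparing learned action values and adjusting a behavioral policy directly from reward, can be made concrete with a small simulation. The sketch below is not from the article; all names and parameter values are illustrative assumptions. It implements a REINFORCE-style policy-gradient learner on a two-armed bandit: choice preferences are updated along the gradient of the log-policy, scaled by the obtained reward relative to a running baseline, and no action values are ever estimated.

```python
# Minimal illustrative sketch: value-free (policy-gradient) learning on a
# two-armed bandit. Reward probabilities, learning rate, and baseline step
# size are assumptions for the example, not values from the article.
import numpy as np

rng = np.random.default_rng(0)
reward_probs = [0.3, 0.7]   # assumed reward probability of each arm
theta = np.zeros(2)         # policy parameters (action preferences)
alpha = 0.1                 # policy learning rate (assumed)
baseline = 0.0              # running-average reward used as a baseline

for trial in range(1000):
    # Softmax policy: choice probabilities come from preferences alone,
    # with no learned value estimates anywhere in the model.
    probs = np.exp(theta - theta.max())
    probs /= probs.sum()
    action = rng.choice(2, p=probs)
    reward = float(rng.random() < reward_probs[action])

    # REINFORCE update: move preferences along the log-policy gradient,
    # weighted by how much better the reward was than the baseline.
    grad = -probs
    grad[action] += 1.0
    theta += alpha * (reward - baseline) * grad
    baseline += 0.05 * (reward - baseline)

print("final choice probabilities:", probs)
```

Run as-is, the learner shifts its choice probability toward the richer arm without ever representing the expected value of either action, which is the behavioral signature the review contrasts with value-based accounts.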

Original language: English (US)
Pages (from-to): 114-121
Number of pages: 8
Journal: Current Opinion in Behavioral Sciences
Volume: 41
DOIs
State: Published - Oct 2021

All Science Journal Classification (ASJC) codes

  • Psychiatry and Mental health
  • Cognitive Neuroscience
  • Behavioral Neuroscience
