Signals in human striatum are appropriate for policy update rather than value prediction

Research output: Contribution to journal (Article)

74 Scopus citations

Abstract

Influential reinforcement learning theories propose that prediction error signals in the brain's nigrostriatal system guide learning for trial-and-error decision-making. However, since different decision variables can be learned from quantitatively similar error signals, a critical question is: what is the content of decision representations trained by the error signals? We used fMRI to monitor neural activity in a two-armed bandit counterfactual decision task that provided human subjects with information about forgone and obtained monetary outcomes so as to dissociate teaching signals that update expected values for each action, versus signals that train relative preferences between actions (a policy). The reward probabilities of both choices varied independently from each other. This specific design allowed us to test whether subjects' choice behavior was guided by policy-based methods, which directly map states to advantageous actions, or value-based methods such as Q-learning, where choice policies are instead generated by learning an intermediate representation (reward expectancy). Behaviorally, we found human participants' choices were significantly influenced by obtained as well as forgone rewards from the previous trial. We also found subjects' blood oxygen level-dependent responses in striatum were modulated in opposite directions by the experienced and forgone rewards but not by reward expectancy. This neural pattern, as well as subjects' choice behavior, is consistent with a teaching signal for developing habits or relative action preferences, rather than prediction errors for updating separate action values.
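The distinction the abstract draws between value-based and policy-based learning can be illustrated with a minimal sketch. This is not the authors' fitted model. The function names, the learning rate `alpha`, the softmax temperature `beta`, and the exact update rules are assumptions chosen for illustration. The value-based rule moves each action's expected value toward its own outcome, so its error terms depend on reward expectancy. The policy-based rule adjusts a single relative preference using the obtained reward minus the forgone reward, so the two outcomes push it in opposite directions and expectancy does not enter.

```python
import math


def choice_prob_first(score_first, score_second, beta=3.0):
    """Logistic (two-option softmax) probability of choosing the first action.

    beta is an assumed inverse temperature, not a value from the paper.
    """
    return 1.0 / (1.0 + math.exp(-beta * (score_first - score_second)))


def value_update(q, chosen, r_obtained, r_forgone, alpha=0.2):
    """Value-based sketch (Q-learning style with counterfactual feedback).

    Each action keeps its own expected value. The chosen action's value moves
    toward the obtained reward and the unchosen action's value moves toward
    the forgone reward. Both errors subtract the current expectancy.
    """
    unchosen = 1 - chosen
    new_q = list(q)
    new_q[chosen] += alpha * (r_obtained - q[chosen])
    new_q[unchosen] += alpha * (r_forgone - q[unchosen])
    return new_q


def policy_update(pref, chosen, r_obtained, r_forgone, alpha=0.2):
    """Policy-based sketch: one relative preference for action 0 over action 1.

    The teaching signal is obtained minus forgone reward, with no expectancy
    term. It strengthens the chosen action when the obtained outcome was
    better and weakens it when the forgone outcome was better.
    """
    direction = 1.0 if chosen == 0 else -1.0
    return pref + alpha * direction * (r_obtained - r_forgone)


if __name__ == "__main__":
    # One hypothetical trial: action 0 chosen, it paid off, the other did not.
    q = value_update([0.5, 0.5], chosen=0, r_obtained=1.0, r_forgone=0.0)
    pref = policy_update(0.0, chosen=0, r_obtained=1.0, r_forgone=0.0)
    print("action values:", q)
    print("relative preference:", pref)
    print("P(choose 0), value-based:", choice_prob_first(q[0], q[1]))
    print("P(choose 0), policy-based:", choice_prob_first(pref, 0.0))
```

When both options pay the same amount on a trial, the policy-based signal in this sketch is zero while the value-based errors are generally nonzero, which is the kind of dissociation a counterfactual design makes observable.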

Original language: English (US)
Pages (from-to): 5504-5511
Number of pages: 8
Journal: Journal of Neuroscience
Volume: 31
Issue number: 14
State: Published - Apr 6 2011

All Science Journal Classification (ASJC) codes

  • Neuroscience(all)
