Timing and Partial Observability in the Dopamine System

Nathaniel D. Daw, Aaron C. Courville, David S. Touretzky

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

4 Scopus citations

Abstract

According to a series of influential models, dopamine (DA) neurons signal reward prediction error using a temporal-difference (TD) algorithm. We address a problem not convincingly solved in these accounts: how to maintain a representation of cues that predict delayed consequences. Our new model uses a TD rule grounded in partially observable semi-Markov processes, a formalism that captures two largely neglected features of DA experiments: hidden state and temporal variability. Previous models predicted rewards using a tapped delay line representation of sensory inputs; we replace this with a more active process of inference about the underlying state of the world. The DA system can then learn to map these inferred states to reward predictions using TD. The new model can explain previously vexing data on the responses of DA neurons in the face of temporal variability. By combining statistical model-based learning with a physiologically grounded TD theory, it also brings into contact with physiology some insights about behavior that had previously been confined to more abstract psychological models.
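The two-stage idea in the abstract (infer the hidden state, then learn reward predictions over the inferred state with TD) can be illustrated with a minimal sketch. This is not the authors' model: it assumes a discrete-time hidden Markov simplification rather than the partially observable semi-Markov formalism the paper uses, and a value function that is linear in the belief vector. The toy states, the matrices `T` and `O`, and the function names `belief_update` and `td_step` are all hypothetical.

```python
import numpy as np

# Hypothetical toy world (not from the paper):
# hidden states 0 = inter-trial interval, 1 = delay, 2 = reward.
T = np.array([[0.9, 0.1, 0.0],   # T[i, j] = P(next state j | current state i)
              [0.0, 0.5, 0.5],
              [1.0, 0.0, 0.0]])
# Observations: 0 = nothing, 1 = cue, 2 = reward.
O = np.array([[1.0, 0.0, 0.0],   # O[j, k] = P(observation k | state j)
              [0.5, 0.5, 0.0],
              [0.0, 0.0, 1.0]])

def belief_update(belief, T, O, obs):
    """One step of Bayesian filtering over hidden states.

    Assumes obs has nonzero probability under the predicted belief.
    """
    predicted = belief @ T              # propagate the belief through the dynamics
    posterior = predicted * O[:, obs]   # weight by the likelihood of the observation
    return posterior / posterior.sum()  # renormalize to a probability vector

def td_step(w, b, b_next, reward, alpha=0.1, gamma=0.95):
    """TD(0) update with V(b) = w . b (linear value is an assumption of this sketch)."""
    delta = reward + gamma * (w @ b_next) - (w @ b)  # reward prediction error
    return w + alpha * delta * b, delta

# Short run: cue, nothing, reward, nothing, nothing.
belief = np.array([1.0, 0.0, 0.0])
w = np.zeros(3)
for obs in [1, 0, 2, 0, 0]:
    next_belief = belief_update(belief, T, O, obs)
    reward = 1.0 if obs == 2 else 0.0
    w, delta = td_step(w, belief, next_belief, reward)
    belief = next_belief
```

In this sketch the prediction error `delta` stands in for the DA signal described in the abstract, and the belief vector replaces a tapped delay line as the input to learning. Capturing temporal variability as the paper does would additionally require modeling state dwell times, which this simplification omits.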

Original language: English (US)
Title of host publication: NIPS 2002
Subtitle of host publication: Proceedings of the 15th International Conference on Neural Information Processing Systems
Editors: Suzanna Becker, Sebastian Thrun, Klaus Obermayer
Publisher: MIT Press Journals
Pages: 83-90
Number of pages: 8
ISBN (Electronic): 0262025507, 9780262025508
State: Published - 2002
Externally published: Yes
Event: 15th International Conference on Neural Information Processing Systems, NIPS 2002 - Vancouver, Canada
Duration: Dec 9 2002 to Dec 14 2002

Publication series

Name: NIPS 2002: Proceedings of the 15th International Conference on Neural Information Processing Systems

Conference

Conference: 15th International Conference on Neural Information Processing Systems, NIPS 2002
Country/Territory: Canada
City: Vancouver
Period: 12/9/02 to 12/14/02

All Science Journal Classification (ASJC) codes

  • Signal Processing
  • Computer Networks and Communications
  • Information Systems
