TY - GEN
T1 - Learning to use working memory in partially observable environments through dopaminergic reinforcement
AU - Todd, Michael T.
AU - Niv, Yael
AU - Cohen, Jonathan D.
PY - 2009
Y1 - 2009
N2 - Working memory is a central topic of cognitive neuroscience because it is critical for solving real-world problems in which information from multiple temporally distant sources must be combined to generate appropriate behavior. However, an often neglected fact is that learning to use working memory effectively is itself a difficult problem. The Gating framework [1-4] is a collection of psychological models that show how dopamine can train the basal ganglia and prefrontal cortex to form useful working memory representations in certain types of problems. We unite Gating with machine learning theory concerning the general problem of memory-based optimal control [5-6]. We present a normative model that learns, by online temporal difference methods, to use working memory to maximize discounted future reward in partially observable settings. The model successfully solves a benchmark working memory problem and exhibits limitations similar to those observed in humans. Our purpose is to introduce a concise, normative definition of high-level cognitive concepts such as working memory and cognitive control in terms of maximizing discounted future rewards.
AB - Working memory is a central topic of cognitive neuroscience because it is critical for solving real-world problems in which information from multiple temporally distant sources must be combined to generate appropriate behavior. However, an often neglected fact is that learning to use working memory effectively is itself a difficult problem. The Gating framework [1-4] is a collection of psychological models that show how dopamine can train the basal ganglia and prefrontal cortex to form useful working memory representations in certain types of problems. We unite Gating with machine learning theory concerning the general problem of memory-based optimal control [5-6]. We present a normative model that learns, by online temporal difference methods, to use working memory to maximize discounted future reward in partially observable settings. The model successfully solves a benchmark working memory problem and exhibits limitations similar to those observed in humans. Our purpose is to introduce a concise, normative definition of high-level cognitive concepts such as working memory and cognitive control in terms of maximizing discounted future rewards.
UR - http://www.scopus.com/inward/record.url?scp=77549088095&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=77549088095&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:77549088095
SN - 9781605609492
T3 - Advances in Neural Information Processing Systems 21 - Proceedings of the 2008 Conference
SP - 1689
EP - 1696
BT - Advances in Neural Information Processing Systems 21 - Proceedings of the 2008 Conference
PB - Neural Information Processing Systems
T2 - 22nd Annual Conference on Neural Information Processing Systems, NIPS 2008
Y2 - 8 December 2008 through 11 December 2008
ER -