TY - JOUR
T1 - Linear reinforcement learning in planning, grid fields, and cognitive control
AU - Piray, Payam
AU - Daw, Nathaniel D.
N1 - Funding Information:
We thank Tim Behrens and Jon Cohen for helpful discussions. This work was supported by grants IIS-1822571 from the National Science Foundation, part of the CRCNS program, and 61454 from the John Templeton Foundation.
Publisher Copyright:
© 2021, The Author(s).
PY - 2021/12/1
Y1 - 2021/12/1
AB - It is thought that the brain’s judicious reuse of previous computation underlies our ability to plan flexibly, but also that inappropriate reuse gives rise to inflexibilities like habits and compulsion. Yet we lack a complete, realistic account of either. Building on control engineering, here we introduce a model for decision making in the brain that reuses a temporally abstracted map of future events to enable biologically realistic, flexible choice at the expense of specific, quantifiable biases. It replaces the classic nonlinear, model-based optimization with a linear approximation that softly maximizes around (and is weakly biased toward) a default policy. This solution demonstrates connections between seemingly disparate phenomena across behavioral neuroscience, notably flexible replanning with biases and cognitive control. It also provides insight into how the brain can represent maps of long-distance contingencies stably and componentially, as in entorhinal response fields, and exploit them to guide choice even under changing goals.
UR - http://www.scopus.com/inward/record.url?scp=85112735523&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85112735523&partnerID=8YFLogxK
DO - 10.1038/s41467-021-25123-3
M3 - Article
C2 - 34400622
AN - SCOPUS:85112735523
SN - 2041-1723
VL - 12
JO - Nature Communications
JF - Nature Communications
IS - 1
M1 - 4942
ER -