Optimal composition of multiple value functions for dopamine-mediated efficient, safe and stable learning

This paper is a preprint and has not been certified by peer review.


Authors

Mahajan, P.; Seymour, B.

Abstract

The seminal reward prediction error theory of dopamine function faces several key challenges. Most notable are the difficulty of learning multiple rewards simultaneously, the inefficiency of on-policy learning, and the problem of accounting for heterogeneous striatal responses in the tail of the striatum. We propose a normative framework, based on linear reinforcement learning, that redefines dopamine's computational objective: dopamine optimises not just cumulative reward, but a reward value function augmented by a penalty for deviating from a default behavioural policy, which effectively confers value on controllability. Our simulations show that this single modification enables optimal value composition, fast and robust adaptation to changing priorities, safer exploration in the presence of threats, and stable learning amid uncertainty. Critically, it unifies disparate striatal observations, parsimoniously reconciling threat and action prediction error signals within the striatal tail. Our framework refines the core principle governing striatal dopamine, bridging theory with neural data and offering testable predictions.
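The "optimal value composition" property of linear reinforcement learning can be illustrated with a toy example. In linearly solvable MDPs (the basis of linear RL), adding a KL penalty for deviating from a default policy makes the Bellman equation linear in the desirability z(s) = exp(v(s)/λ), so value functions for different reward schemes compose by simple weighted sums in z-space. The sketch below is purely illustrative; the MDP, transition matrices, and reward values are hypothetical and not taken from the paper.

```python
import numpy as np

# Hypothetical toy linearly solvable MDP: 3 non-terminal states and
# 2 terminal (goal) states, with a uniform-ish default policy.
lam = 1.0                            # temperature weighting the KL control cost
r_nt = np.array([-1.0, -1.0, -1.0])  # per-step reward at non-terminal states

# Default-policy transitions: non-terminal -> non-terminal (P_nn)
# and non-terminal -> terminal (P_nt); each full row sums to 1.
P_nn = np.array([[0.00, 0.50, 0.00],
                 [0.25, 0.00, 0.25],
                 [0.00, 0.50, 0.00]])
P_nt = np.array([[0.50, 0.00],
                 [0.25, 0.25],
                 [0.00, 0.50]])

def solve_z(r_term):
    """Desirability z = exp(v/lam) at non-terminal states via one linear solve."""
    G = np.diag(np.exp(r_nt / lam))
    z_term = np.exp(r_term / lam)
    # Linear Bellman equation: z = G (P_nn z + P_nt z_term)
    return np.linalg.solve(np.eye(3) - G @ P_nn, G @ (P_nt @ z_term))

zA = solve_z(np.array([10.0, 0.0]))  # task A: reward only at terminal state 1
zB = solve_z(np.array([0.0, 10.0]))  # task B: reward only at terminal state 2

# Composition: a task whose exponentiated terminal reward is a weighted sum
# of A's and B's has desirability w1*zA + w2*zB, with no re-learning needed.
w1, w2 = 0.7, 0.3
z_mix = w1 * zA + w2 * zB
r_mix = lam * np.log(w1 * np.exp(np.array([10.0, 0.0]) / lam)
                     + w2 * np.exp(np.array([0.0, 10.0]) / lam))
assert np.allclose(z_mix, solve_z(r_mix))  # composed values match direct solve
```

The key design point is that the deviation penalty turns the otherwise nonlinear Bellman backup into a linear system, which is what permits the fast recombination of previously learned value functions when priorities change.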
