TY - CHAP
T1 - The Next Generation of Optimization
T2 - A Unified Framework for Dynamic Resource Allocation Problems
AU - Powell, Warren B.
N1 - Publisher Copyright: © Springer Nature Switzerland AG 2019.
PY - 2019
Y1 - 2019
AB - Sequential decision problems arise in a vast range of applications where decisions are followed by new information that was not known when earlier decisions were made. Applications arise in energy, transportation, health, finance, engineering and the sciences. Problem settings may involve managing resources (inventories for vaccines, financial investments, people and equipment), pure learning problems (laboratory testing, computer simulations, field tests) and combinations of the two. The range of problems is so wide that they have been studied by over a dozen distinct academic communities using names such as dynamic programming, reinforcement learning, stochastic control, stochastic programming, active learning, and multiarmed bandit problems. We bring these fields together into a single framework that involves searching for policies, which are functions for making decisions. We then identify four classes of policies that span all the approaches used in the academic literature or in practice. We claim that these four classes of policies are universal – any solution of a sequential decision problem will consist of one of these four classes, or a hybrid of several.
UR - http://www.scopus.com/inward/record.url?scp=85075869042&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85075869042&partnerID=8YFLogxK
U2 - 10.1007/978-3-030-28565-4_9
DO - 10.1007/978-3-030-28565-4_9
M3 - Chapter
AN - SCOPUS:85075869042
T3 - Springer Optimization and Its Applications
SP - 47
EP - 52
BT - Springer Optimization and Its Applications
PB - Springer
ER -