Sequential decision problems arise in a vast range of applications where decisions are followed by new information that was not known when earlier decisions were made. Applications arise in energy, transportation, health, finance, engineering, and the sciences. Problem settings may involve managing resources (inventories for vaccines, financial investments, people and equipment), pure learning problems (laboratory testing, computer simulations, field tests), and combinations of the two. The range of problems is so wide that they have been studied by over a dozen distinct academic communities under names such as dynamic programming, reinforcement learning, stochastic control, stochastic programming, active learning, and multi-armed bandit problems. We bring these fields together into a single framework centered on searching for policies, which are functions for making decisions. We then identify four classes of policies that span all the approaches used in the academic literature or in practice. We claim that these four classes of policies are universal: any solution to a sequential decision problem will be drawn from one of these four classes, or from a hybrid of several.
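To make the central idea concrete, the sketch below illustrates a policy as a function mapping a state to a decision, embedded in a loop where each decision is followed by new information (random demand) that was unknown when the decision was made. The inventory setting, the order-up-to rule, and all names and parameters (`order_up_to_policy`, `theta`, the cost coefficients) are illustrative assumptions, not taken from the source.

```python
import random

def order_up_to_policy(state, theta=10):
    """A policy is a function mapping the current state to a decision.
    This illustrative rule orders enough to raise inventory to theta."""
    return max(0, theta - state)

def simulate(policy, T=20, seed=0):
    """Run a sequential decision loop: decide, then observe new
    information (random demand) that arrives after the decision."""
    rng = random.Random(seed)
    inventory, total_cost = 5, 0.0
    for _ in range(T):
        order = policy(inventory)       # decision made with current knowledge
        demand = rng.randint(0, 8)      # new information, revealed afterward
        inventory = max(0, inventory + order - demand)
        total_cost += 2.0 * order + 1.0 * inventory  # ordering + holding cost
    return total_cost

cost = simulate(order_up_to_policy)
```

Searching for a good policy then amounts to searching over functions, for example by tuning `theta` to minimize the simulated cost.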