TY - GEN

T1 - A Bayesian network approach to control of networked Markov decision processes

AU - Adlakha, Sachin

AU - Lall, Sanjay

AU - Goldsmith, Andrea

PY - 2008

Y1 - 2008

N2 - We consider the problem of finding an optimal feedback controller for a networked Markov decision process. Specifically, we consider a network of interconnected subsystems, where each subsystem evolves as a Markov decision process (MDP). A subsystem is connected to its neighbors via links over which signals are delayed. We consider centralized control of such networked MDPs. The controller receives delayed state information from each of the subsystems, and it chooses control actions for all subsystems. Such networked MDPs can be represented as partially observed Markov decision processes (POMDPs). We model such a POMDP as a Bayesian network and show that an optimal controller requires only a finite history of past states and control actions. The result is based on the idea that given certain past states and actions, the current state of the networked MDP is independent of the earlier states and actions. This dependence on only the finite past states and actions makes the computation of controllers for networked MDPs tractable.

AB - We consider the problem of finding an optimal feedback controller for a networked Markov decision process. Specifically, we consider a network of interconnected subsystems, where each subsystem evolves as a Markov decision process (MDP). A subsystem is connected to its neighbors via links over which signals are delayed. We consider centralized control of such networked MDPs. The controller receives delayed state information from each of the subsystems, and it chooses control actions for all subsystems. Such networked MDPs can be represented as partially observed Markov decision processes (POMDPs). We model such a POMDP as a Bayesian network and show that an optimal controller requires only a finite history of past states and control actions. The result is based on the idea that given certain past states and actions, the current state of the networked MDP is independent of the earlier states and actions. This dependence on only the finite past states and actions makes the computation of controllers for networked MDPs tractable.

UR - http://www.scopus.com/inward/record.url?scp=64549128207&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=64549128207&partnerID=8YFLogxK

U2 - 10.1109/ALLERTON.2008.4797592

DO - 10.1109/ALLERTON.2008.4797592

M3 - Conference contribution

AN - SCOPUS:64549128207

SN - 9781424429264

T3 - 46th Annual Allerton Conference on Communication, Control, and Computing

SP - 446

EP - 451

BT - 46th Annual Allerton Conference on Communication, Control, and Computing

T2 - 46th Annual Allerton Conference on Communication, Control, and Computing

Y2 - 24 September 2008 through 26 September 2008

ER -