Information state for Markov decision processes with network delays

Sachin Adlakha, Sanjay Lall, Andrea Goldsmith

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

7 Scopus citations

Abstract

We consider a networked control system, where each subsystem evolves as a Markov decision process (MDP). Each subsystem is coupled to its neighbors via communication links over which the signals are delayed, but are otherwise transmitted noise-free. A controller receives delayed state information from each subsystem. Such a networked Markov decision process with delays can be represented as a partially observed Markov decision process (POMDP). We show that this POMDP has a sufficient information state that depends only on a finite history of measurements and control actions. Thus, the POMDP can be converted into an information state MDP, whose state does not grow with time. The optimal controller for networked Markov decision processes can thus be computed using dynamic programming over a finite state space. This result generalizes the previous results on Markov decision processes with delayed state information.
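
To illustrate the finite-history idea, consider the simpler single-system case that the abstract says this result generalizes: one MDP whose state reaches the controller with a fixed delay of d steps. In that setting, the most recently received state together with the d actions applied since then determines the distribution of the current state, so the pair can serve as an information state of fixed size. The sketch below is not the paper's construction for the networked case; the function names, the array layout `P[a, s, s_next]`, and the example numbers are all assumptions made for illustration.

```python
import numpy as np


def belief_from_information_state(P, delayed_state, recent_actions):
    """Distribution of the current state given a delayed observation.

    P              -- hypothetical array of shape (n_actions, n_states, n_states);
                      P[a] is the row-stochastic transition matrix under action a.
    delayed_state  -- the state observed d steps ago (the latest one received).
    recent_actions -- the d actions applied since then, oldest first.
    """
    n_states = P.shape[1]
    belief = np.zeros(n_states)
    belief[delayed_state] = 1.0  # the delayed state is known exactly
    for a in recent_actions:
        belief = belief @ P[a]   # propagate one step under the applied action
    return belief


def num_information_states(n_states, n_actions, delay):
    """Count of (delayed state, last `delay` actions) pairs; constant in time."""
    return n_states * n_actions ** delay


# Illustrative two-state, two-action example (numbers are made up).
P = np.array([
    [[0.9, 0.1], [0.2, 0.8]],  # transitions under action 0
    [[0.5, 0.5], [0.5, 0.5]],  # transitions under action 1
])
belief = belief_from_information_state(P, delayed_state=0, recent_actions=(0, 0))
size = num_information_states(n_states=2, n_actions=2, delay=2)
```

Because the number of such pairs does not depend on the time index, dynamic programming can in principle run over this fixed, finite set, which is the property the abstract establishes for the networked setting.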

Original language: English (US)
Title of host publication: Proceedings of the 47th IEEE Conference on Decision and Control, CDC 2008
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 3840-3847
Number of pages: 8
ISBN (Print): 9781424431243
State: Published - 2008
Externally published: Yes
Event: 47th IEEE Conference on Decision and Control, CDC 2008 - Cancun, Mexico
Duration: Dec 9 2008 to Dec 11 2008

Publication series

Name: Proceedings of the IEEE Conference on Decision and Control
ISSN (Print): 0743-1546
ISSN (Electronic): 2576-2370

Other

Other: 47th IEEE Conference on Decision and Control, CDC 2008
Country/Territory: Mexico
City: Cancun
Period: 12/9/08 to 12/11/08

All Science Journal Classification (ASJC) codes

  • Control and Systems Engineering
  • Modeling and Simulation
  • Control and Optimization
