A two-state partially observable Markov decision process with three actions

Research output: Contribution to journal › Article › peer-review


Abstract

A process alternates between a stable and an unstable state. The true state is unobservable and can only be inferred from observations. Three actions are available: continue operating the process (CON), repair the process for a fixed fee, which returns it to the stable state (REP), or inspect the process at a cost to learn its true state (INS). The objective is to maximize the expected discounted value of total future profits. We formulate the problem as a discrete-time Partially Observable Markov Decision Process (POMDP). We show that the expected profit function is convex and strictly increasing, and that the optimal policy has either one or two control limits. We also show that "dominance in expectation" (the expected revenue being larger in the stable state than in the unstable state) suffices for a control-limit structure.
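To make the belief-state formulation concrete, the following is a minimal value-iteration sketch for a two-state POMDP of this kind. Every parameter value and modeling choice below (the binary good/bad signal, the degradation probability, instantaneous repair, a one-period inspection) is an illustrative assumption, not the paper's model or notation; the sketch only shows how the three actions compete on the belief axis.

```python
import numpy as np

# Illustrative parameters only -- not taken from the paper.
beta = 0.95            # discount factor
R_s, R_u = 10.0, 2.0   # per-period revenue in the stable / unstable state
c_R, c_I = 30.0, 3.0   # repair fee, inspection cost
q = 0.9                # P(stay stable); the unstable state persists until repair
g_s, g_u = 0.8, 0.3    # P("good" signal | stable), P("good" signal | unstable)

grid = np.linspace(0.0, 1.0, 501)  # belief p = P(process is in the stable state)

def backup(V):
    """One Bellman backup: value of CON, REP, INS at every belief on the grid."""
    p_next = q * grid                               # belief after degradation
    pr_good = p_next * g_s + (1 - p_next) * g_u     # P(signal = "good")
    post_good = p_next * g_s / pr_good              # Bayes update, good signal
    post_bad = p_next * (1 - g_s) / (1 - pr_good)   # Bayes update, bad signal
    v_con = (grid * R_s + (1 - grid) * R_u          # expected current revenue
             + beta * (pr_good * np.interp(post_good, grid, V)
                       + (1 - pr_good) * np.interp(post_bad, grid, V)))
    v_rep = -c_R + v_con[-1]              # repair: jump to p = 1, then continue
    v_ins = -c_I + beta * (grid * V[-1] + (1 - grid) * V[0])  # state revealed
    return v_con, np.full_like(grid, v_rep), v_ins

V = np.zeros_like(grid)
for _ in range(5000):                     # value iteration to a fixed point
    v_con, v_rep, v_ins = backup(V)
    V_new = np.maximum(v_con, np.maximum(v_rep, v_ins))
    if np.max(np.abs(V_new - V)) < 1e-9:
        V = V_new
        break
    V = V_new

policy = np.argmax(np.vstack([v_con, v_rep, v_ins]), axis=0)  # 0=CON, 1=REP, 2=INS
```

Printing where `policy` changes value along `grid` shows the belief axis partitioned into action regions at one or two thresholds, the control-limit structure the paper proves; the convexity of the converged value function in the belief, a standard POMDP property that the paper also establishes for its model, can likewise be checked numerically.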

Original language: English
Pages (from-to): 957-967
Number of pages: 11
Journal: European Journal of Operational Research
Volume: 254
Issue number: 3
DOIs
State: Published - 1 Nov 2016

Bibliographical note

Publisher Copyright:
© 2016 Elsevier B.V. All rights reserved.

Keywords

  • Control limits
  • Decision processes
  • Markov chains
  • POMDP
