Abstract
A process alternates between a stable and an unstable state. The true state is unobservable and can only be inferred from observations. Three actions are available: continue operating the process (CON), repair the process for a fixed fee, which returns it to the stable state (REP), or inspect the process for a cost to learn its true state (INS). The objective is to maximize the expected discounted value of total future profits. We formulate the problem as a discrete-time Partially Observable Markov Decision Process (POMDP). We show that the expected profit function is convex and strictly increasing, and that the optimal policy has either one or two control limits. We also show that "dominance in expectation" (the expected revenue in the stable state exceeds that in the unstable state) suffices for a control-limit structure.
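For readers who want to see how a control-limit structure can emerge from this formulation, here is a minimal value-iteration sketch over a discretized belief space, where the belief b is the probability that the process is unstable. All parameter values, and the timing conventions for REP (repair takes effect immediately and the period's revenue is earned in the stable state) and INS (inspection reveals the state before the next decision), are illustrative assumptions and are not taken from the paper.

```python
import numpy as np

# Illustrative two-state POMDP in belief space. Every number below is an
# assumption for demonstration purposes, not a parameter from the paper.
beta = 0.95            # discount factor (assumed)
q = 0.10               # per-period prob. of moving from stable to unstable (assumed)
r_s, r_u = 10.0, 2.0   # expected one-period revenue per state; r_s > r_u
                       # encodes "dominance in expectation"
c_rep, c_ins = 30.0, 1.0   # repair fee and inspection cost (assumed)

grid = np.linspace(0.0, 1.0, 1001)   # belief b = P(process is unstable)
V = np.zeros_like(grid)

def interp(V, b):
    """Linear interpolation of the value function on the belief grid."""
    return np.interp(b, grid, V)

def tau(b):
    """Belief after one period of operation (unstable assumed absorbing)."""
    return b + (1.0 - b) * q

for _ in range(2000):
    # CON: run the process, collect expected revenue, belief deteriorates.
    v_con = (1 - grid) * r_s + grid * r_u + beta * interp(V, tau(grid))
    # REP: pay the fee; the process restarts in the stable state (belief 0).
    v_rep = -c_rep + r_s + beta * interp(V, tau(0.0))
    # INS: pay to observe the true state; the belief jumps to 0 or 1.
    v_ins = -c_ins + (1 - grid) * interp(V, 0.0) + grid * interp(V, 1.0)
    V_new = np.maximum.reduce([v_con, np.full_like(grid, v_rep), v_ins])
    if np.max(np.abs(V_new - V)) < 1e-9:
        V = V_new
        break
    V = V_new

# Report where each action is greedy-optimal on the belief grid.
policy = np.argmax(np.stack([v_con, np.full_like(grid, v_rep), v_ins]), axis=0)
for a, name in enumerate(["CON", "REP", "INS"]):
    region = grid[policy == a]
    if region.size:
        print(f"{name}: beliefs in [{region.min():.3f}, {region.max():.3f}]")
```

On instances like this, where dominance in expectation holds (r_s > r_u), the printed action regions are typically contiguous belief intervals separated by one or two thresholds, consistent with the one- or two-control-limit structure established in the paper.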
| Original language | English |
|---|---|
| Pages (from-to) | 957-967 |
| Number of pages | 11 |
| Journal | European Journal of Operational Research |
| Volume | 254 |
| Issue number | 3 |
| DOIs | |
| State | Published - 1 Nov 2016 |
Bibliographical note
Publisher Copyright: © 2016 Elsevier B.V. All rights reserved.
Keywords
- Control limits
- Decision processes
- Markov chains
- POMDP