Percentile performance criteria for limiting average Markov decision processes
Abstract
In this paper we address the following basic feasibility
problem for infinite-horizon Markov decision processes
(MDPs): can a policy be found that achieves a specified value
(target) of the long-run limiting average reward at a specified
probability level (percentile)? Related optimization problems of
maximizing the target for a specified percentile and vice versa
are also considered. We present a complete (and discrete) classification
of both the maximal achievable target levels and
their corresponding percentiles. We also provide an algorithm for
computing a deterministic policy corresponding to any feasible
target-percentile pair.
Next, we consider similar problems for an MDP with multiple
rewards and/or constraints. This case presents some difficulties
and leads to several open problems. An LP-based formulation
provides constructive solutions for most cases.