3.II.28I

Optimization and Control | Part II, 2007

Let $P$ be a discrete-time controllable dynamical system (or Markov decision process) with countable state-space $S$ and action-space $A$. Consider the $n$-horizon dynamic optimization problem with instantaneous costs $c(k, x, a)$, on choosing action $a$ in state $x$ at time $k \leqslant n-1$, with terminal cost $C(x)$, in state $x$ at time $n$. Explain what is meant by a Markov control and how the choice of a control gives rise to a time-inhomogeneous Markov chain.
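Concretely (a restatement for orientation, not part of the question text): a Markov control is a function $u$ of the current time and state alone, prescribing the action $u(k, x)$ to use in state $x$ at time $k$. Writing $p(y \mid x, a)$ for the transition law of the controlled system, a Markov control $u$ turns the state process into a Markov chain with time-dependent transition probabilities

$$\mathbb{P}\left(X_{k+1} = y \mid X_k = x\right) = p\left(y \mid x, u(k, x)\right),$$

so the transition matrix varies with $k$, which is precisely a time-inhomogeneous Markov chain.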

Suppose we can find a bounded function $V$ and a Markov control $u^{*}$ such that

$$V(k, x) \leqslant (c + PV)(k, x, a), \quad 0 \leqslant k \leqslant n-1, \quad x \in S, \quad a \in A$$

with equality when $a = u^{*}(k, x)$, and such that $V(n, x) = C(x)$ for all $x$. Here $PV(k, x, a)$ denotes the expected value of $V\left(k+1, X_{k+1}\right)$, given that we choose action $a$ in state $x$ at time $k$. Show that $u^{*}$ is an optimal Markov control.
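The displayed condition is the finite-horizon dynamic-programming (Bellman) equation, and such a pair $(V, u^{*})$ can be computed by backward induction. The following Python sketch illustrates this on a small finite model; the horizon, costs, and transition kernel are made-up placeholders, not data from the question.

```python
# Backward induction on a small finite-horizon MDP.
# All numerical data below are illustrative placeholders.
import numpy as np

n = 5                       # horizon
num_states, num_actions = 3, 2

rng = np.random.default_rng(0)
c = rng.random((n, num_states, num_actions))           # costs c(k, x, a)
p = rng.random((num_states, num_actions, num_states))  # kernel p(y | x, a)
p /= p.sum(axis=2, keepdims=True)                      # normalise each row
C = rng.random(num_states)                             # terminal cost C(x)

V = np.zeros((n + 1, num_states))
u_star = np.zeros((n, num_states), dtype=int)
V[n] = C                                     # V(n, x) = C(x)
for k in range(n - 1, -1, -1):
    # Q[x, a] = c(k, x, a) + (P V)(k, x, a) = c + E[V(k+1, X_{k+1}) | x, a]
    Q = c[k] + p @ V[k + 1]
    V[k] = Q.min(axis=1)                     # V(k, x) = min_a Q[x, a]
    u_star[k] = Q.argmin(axis=1)             # equality holds at a = u*(k, x)

print(V[0])       # minimal expected total cost from each starting state
print(u_star[0])  # optimal first action in each state
```

By construction $V(k, x) \leqslant (c + PV)(k, x, a)$ for every $a$, with equality at $a = u^{*}(k, x)$, which is exactly the hypothesis of the verification result above.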

A well-shuffled pack of cards is placed face-down on the table. The cards are turned over one by one until none are left. Exactly once you may place a bet of £1000 on the event that the next two cards will be red. How should you choose the moment to bet? Justify your answer.
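A natural conjecture here is that the choice of moment is irrelevant: with $R$ red cards among $N$ remaining, the quantity $R(R-1)/\big(N(N-1)\big)$ is a martingale, so every betting rule should win with probability $26 \cdot 25 / (52 \cdot 51) = 25/102$. The Python sketch below checks this conjecture by simulation; both strategies compared are illustrative assumptions, not prescribed by the question.

```python
# Monte Carlo check: does the betting rule change P(next two cards red)?
import random

def win_prob(strategy, trials=200_000):
    wins = 0
    for _ in range(trials):
        deck = ['R'] * 26 + ['B'] * 26
        random.shuffle(deck)
        k = strategy(deck)                   # position at which we bet
        wins += deck[k] == 'R' and deck[k + 1] == 'R'
    return wins / trials

def bet_immediately(deck):
    return 0                                 # bet before any card is shown

def bet_after_first_black(deck):
    # wait until the first black card appears, then bet on the next two
    for i in range(len(deck) - 2):
        if deck[i] == 'B':
            return i + 1
    return len(deck) - 2                     # fallback (cannot occur here)

print(win_prob(bet_immediately))             # ≈ 25/102 ≈ 0.2451
print(win_prob(bet_after_first_black))       # ≈ 25/102 as well
```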
