# Paper 2, Section II, 30K

(a) A ball may be in one of $n$ boxes. A search of the $i^{\text{th}}$ box costs $c_{i}>0$ and finds the ball with probability $\alpha_{i}>0$ if the ball is in that box. We are given initial probabilities $\left(P_{i}^{1}, i=1,2, \ldots, n\right)$ that the ball is in the $i^{\text{th}}$ box.

Show that the policy which at time $t=1,2, \ldots$ searches the box with the maximal value of $\alpha_{i} P_{i}^{t} / c_{i}$ minimises the expected searching cost until the ball is found, where $P_{i}^{t}$ is the probability (given everything that has occurred up to time $t$ ) that the ball is in box $i$.
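As an illustration (not part of the question), note that after an unsuccessful search of box $j$ the posterior updates by Bayes' rule to $P_{j}^{t+1}=P_{j}^{t}(1-\alpha_{j})/(1-\alpha_{j}P_{j}^{t})$ for the searched box and $P_{i}^{t+1}=P_{i}^{t}/(1-\alpha_{j}P_{j}^{t})$ otherwise. A minimal Python sketch (all names hypothetical) computes the expected cost of any posterior-based policy by tracking the unfound probability mass, and can be used to compare the index policy against alternatives:

```python
def expected_cost(p, alpha, c, policy, max_steps=10_000):
    """Hypothetical helper: expected search cost of a policy that maps
    the current posterior p to the index of the box to search next.
    Computed analytically by tracking probability mass, not by sampling."""
    p = list(p)
    total = 0.0
    remaining = 1.0  # probability the ball is still unfound
    for _ in range(max_steps):
        j = policy(p)
        total += remaining * c[j]      # cost is paid whenever we are still searching
        hit = alpha[j] * p[j]          # prob. this search succeeds (given unfound so far)
        remaining *= (1.0 - hit)
        # Bayes update of the posterior after an unsuccessful search of box j
        norm = 1.0 - hit
        p = [pi * (1.0 - alpha[j]) / norm if i == j else pi / norm
             for i, pi in enumerate(p)]
        if remaining < 1e-12:
            break
    return total

# Example instance (illustrative numbers only)
alpha = [0.9, 0.5, 0.3]
c     = [2.0, 1.0, 1.0]
p0    = [0.2, 0.5, 0.3]

# The index policy from the question: search the box maximising alpha_i * P_i / c_i
index_policy = lambda p: max(range(len(p)), key=lambda i: alpha[i] * p[i] / c[i])
print(expected_cost(p0, alpha, c, index_policy))
```

By the result to be proved, the index policy's expected cost should be no larger than that of any other posterior-based rule, e.g. always searching the most likely box.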

(b) Next suppose that a reward $R_{i}>0$ is earned if the ball is found in the $i^{\text{th}}$ box. Suppose also that we may decide to stop at any time. Develop the dynamic programming equation for the value function starting from the probability distribution $\left(P_{i}^{t}, i=1,2, \ldots, n\right)$.

Show that if $\sum_{i} c_{i} /\left(\alpha_{i} R_{i}\right)<1$ then it is never optimal to stop searching until the ball is found. In this case, is the policy defined in part (a) optimal?

# Paper 3, Section II, K

The scalars $x_{t}, y_{t}, u_{t}$ are related by the equations

$x_{t}=x_{t-1}+u_{t-1}, \quad y_{t}=x_{t-1}+\eta_{t-1}, \quad t=1,2, \ldots, T,$

where the initial state $x_{0}$ is normally distributed with mean $\hat{x}_{0}$ and variance 1, and $\left\{\eta_{t}\right\}$ is a sequence of independent random variables, each normally distributed with mean 0 and variance 1. The control variable $u_{t}$ is to be chosen at time $t$ on the basis of information $W_{t}$, where $W_{0}=\left(\hat{x}_{0}\right)$ and

$W_{t}=\left(\hat{x}_{0}, u_{0}, \ldots, u_{t-1}, y_{1}, \ldots, y_{t}\right), \quad t=1,2, \ldots, T$

(a) Let $\hat{x}_{1}, \hat{x}_{2}, \ldots, \hat{x}_{T}$ be the Kalman filter estimates of $x_{1}, x_{2}, \ldots, x_{T}$, i.e.

$\hat{x}_{t}=\hat{x}_{t-1}+u_{t-1}+h_{t}\left(y_{t}-\hat{x}_{t-1}\right)$

where $h_{t}$ is chosen to minimise $\mathbb{E}\left(\left(\hat{x}_{t}-x_{t}\right)^{2} \mid W_{t}\right)$. Calculate $h_{t}$ and show that, conditional on $W_{t}, x_{t}$ is normally distributed with mean $\hat{x}_{t}$ and variance $V_{t}=1 /(1+t)$.
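As a quick numerical sanity check (an illustration, not part of the question: it assumes the standard scalar recursions $h_{t}=V_{t-1}/(1+V_{t-1})$ and $V_{t}=V_{t-1}(1-h_{t})$, which follow from minimising the posterior variance with observation-noise variance 1), the claimed closed form $V_{t}=1/(1+t)$ can be verified directly:

```python
# Verify that the scalar variance recursion V_t = V_{t-1}/(1 + V_{t-1}),
# with gain h_t = V_{t-1}/(1 + V_{t-1}), reproduces V_t = 1/(1+t) from V_0 = 1.
T = 20
V = 1.0  # V_0 = 1, the prior variance of x_0
for t in range(1, T + 1):
    h = V / (1.0 + V)      # gain minimising the posterior variance at step t
    V = V * (1.0 - h)      # posterior variance after incorporating y_t
    assert abs(V - 1.0 / (1.0 + t)) < 1e-12, (t, V)
    assert abs(h - 1.0 / (1.0 + t)) < 1e-12, (t, h)
print("V_t = h_t = 1/(1+t) verified for t = 1 ..", T)
```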

(b) Define

$$\begin{aligned} &F\left(W_{T}\right)=\mathbb{E}\left(x_{T}^{2} \mid W_{T}\right), \quad \text { and } \\ &F\left(W_{t}\right)=\inf _{u_{t}, \ldots, u_{T-1}} \mathbb{E}\left(x_{T}^{2}+\sum_{j=t}^{T-1} u_{j}^{2} \,\Big|\, W_{t}\right), \quad t=0, \ldots, T-1 . \end{aligned}$$

Show that $F\left(W_{t}\right)=\hat{x}_{t}^{2} P_{t}+d_{t}$, where $P_{t}=1 /(T-t+1), d_{T}=1 /(1+T)$ and $d_{t-1}=V_{t-1} V_{t} P_{t}+d_{t}$.
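A sketch of the inductive step (a hedged outline, assuming the claimed form already holds at time $t$): since, conditional on $W_{t-1}$, $\hat{x}_{t}=\hat{x}_{t-1}+u_{t-1}+h_{t}\left(y_{t}-\hat{x}_{t-1}\right)$ and the innovation term has variance $h_{t}^{2}\left(V_{t-1}+1\right)=V_{t-1} V_{t}$, the dynamic programming step reads

$$F\left(W_{t-1}\right)=\inf _{u}\left[u^{2}+\mathbb{E}\left(P_{t} \hat{x}_{t}^{2}+d_{t} \mid W_{t-1}, u\right)\right]=\inf _{u}\left[u^{2}+P_{t}\left(\hat{x}_{t-1}+u\right)^{2}\right]+P_{t} V_{t-1} V_{t}+d_{t} .$$

Minimising over $u$ gives $u=-P_{t} \hat{x}_{t-1} /\left(1+P_{t}\right)$ and hence the coefficient recursion $P_{t-1}=P_{t} /\left(1+P_{t}\right)$, which is consistent with $P_{t}=1 /(T-t+1)$ and with $d_{t-1}=V_{t-1} V_{t} P_{t}+d_{t}$.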

(c) Show that the minimising control $u_{t}$ can be expressed in the form $u_{t}=-K_{t} \hat{x}_{t}$ and find $K_{t}$. How would the expression for $K_{t}$ be altered if $x_{0}$ or $\left\{\eta_{t}\right\}$ had variances other than 1?
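A numerical sketch (assuming the scalar Riccati recursion $P_{t}=P_{t+1}/(1+P_{t+1})$ with $P_{T}=1$, obtained from the DP step $\inf_{u}[u^{2}+P_{t+1}(\hat{x}_{t}+u)^{2}]$, and the resulting gain $K_{t}=P_{t+1}/(1+P_{t+1})$) confirms the closed forms $P_{t}=K_{t}=1/(T-t+1)$:

```python
# Backward Riccati recursion for the scalar LQ problem with terminal cost x_T^2
# and control cost u^2: P_T = 1, P_t = P_{t+1}/(1 + P_{t+1}), gain K_t = P_t.
T = 15
P = 1.0  # P_T
for t in range(T - 1, -1, -1):
    K = P / (1.0 + P)        # gain at time t: u_t = -K_t * xhat_t
    P = P / (1.0 + P)        # Riccati step: P_t = P_{t+1}/(1 + P_{t+1})
    assert abs(P - 1.0 / (T - t + 1)) < 1e-12, (t, P)
    assert abs(K - 1.0 / (T - t + 1)) < 1e-12, (t, K)
print("P_t = K_t = 1/(T-t+1) verified for t = 0 ..", T - 1)
```

Note that only the ratio of control-cost to noise variances enters the recursion, which is the key to the final part of the question.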

# Paper 4, Section II, 30K

Consider the deterministic system

$\dot{x}_{t}=u_{t}$

where $x_{t}$ and $u_{t}$ are scalars. Here $x_{t}$ is the state variable and the control variable $u_{t}$ is to be chosen to minimise, for a fixed $h>0$, the cost

$x_{h}^{2}+\int_{0}^{h} c_{t} u_{t}^{2} \,\mathrm{d} t$

where $c_{t}$ is known and $c_{t}>c>0$ for all $t$. Let $F(x, t)$ be the minimal cost from state $x$ and time $t$.

(a) By writing the dynamic programming equation in infinitesimal form and taking the appropriate limit show that $F(x, t)$ satisfies

$\frac{\partial F}{\partial t}=-\inf _{u}\left[c_{t} u^{2}+\frac{\partial F}{\partial x} u\right], \quad t<h,$

with boundary condition $F(x, h)=x^{2}$.

(b) Determine the form of the optimal control in the special case where $c_{t}$ is constant, and also in general.
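A numerical sketch of the route one would expect (assumptions stated here, not given in the question: the infimum in the PDE is attained at $u^{*}=-\frac{1}{2 c_{t}} \frac{\partial F}{\partial x}$, and the trial solution $F(x, t)=x^{2} g(t)$ reduces the PDE to $g^{\prime}=g^{2} / c_{t}$ with $g(h)=1$; for constant $c_{t}=c$ this integrates to $g(t)=c /(c+h-t)$):

```python
# Integrate g' = g^2 / c backwards from g(h) = 1 by Euler steps and compare
# with the closed form g(t) = c/(c + h - t) at t = 0 (constant-cost case).
h, c, N = 1.0, 2.0, 200_000
dt = h / N
g = 1.0                      # terminal condition g(h) = 1
for _ in range(N):           # step backwards from t = h to t = 0
    g -= dt * g * g / c      # Euler step for g' = g^2 / c
assert abs(g - c / (c + h)) < 1e-4
print("g(0) ~ c/(c+h) verified:", g)
```

The corresponding control $u_{t}=-x_{t}\, g(t) / c_{t}$ is linear in the state, with a gain that grows as $t$ approaches the horizon $h$.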
