Paper 2, Section II, 30K

Optimisation and Control | Part II, 2018

(a) A ball may be in one of $n$ boxes. A search of the $i^{\text{th}}$ box costs $c_i > 0$ and finds the ball with probability $\alpha_i > 0$ if the ball is in that box. We are given initial probabilities $(P_i^1,\ i=1,2,\ldots,n)$ that the ball is in the $i^{\text{th}}$ box.

Show that the policy which at time $t=1,2,\ldots$ searches the box with the maximal value of $\alpha_i P_i^t / c_i$ minimises the expected searching cost until the ball is found, where $P_i^t$ is the probability (given everything that has occurred up to time $t$) that the ball is in box $i$.

(b) Next suppose that a reward $R_i > 0$ is earned if the ball is found in the $i^{\text{th}}$ box. Suppose also that we may decide to stop at any time. Develop the dynamic programming equation for the value function starting from the probability distribution $(P_i^t,\ i=1,2,\ldots,n)$.
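One natural form of the equation being asked for, sketched under the convention that stopping yields reward $0$ and writing $F(P)$ for the maximal expected net reward from state $P = (P_1, \ldots, P_n)$:
$$
F(P) \;=\; \max\Big\{\, 0,\ \max_i \big[\, -c_i + \alpha_i P_i R_i + (1 - \alpha_i P_i)\, F(P^{(i)}) \,\big] \Big\},
$$
where $P^{(i)}$ denotes the Bayesian posterior after an unsuccessful search of box $i$, i.e. $P^{(i)}_i = P_i(1-\alpha_i)/(1-\alpha_i P_i)$ and $P^{(i)}_j = P_j/(1-\alpha_i P_i)$ for $j \neq i$.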

Show that if $\sum_i c_i / (\alpha_i R_i) < 1$ then it is never optimal to stop searching until the ball is found. In this case, is the policy defined in part (a) optimal?
