• # Paper 2, Section II, K

During each of $N$ time periods a venture capitalist, Vicky, is presented with an investment opportunity for which the rate of return for that period is a random variable; the rates of return in successive periods are independent identically distributed random variables with distributions concentrated on $[-1, \infty)$. Thus, if $x_{n}$ is Vicky's capital at period $n$, then $x_{n+1}=\left(1-p_{n}\right) x_{n}+p_{n} x_{n}\left(1+R_{n}\right)$, where $p_{n} \in[0,1]$ is the proportion of her capital she chooses to invest at period $n$, and $R_{n}$ is the rate of return for period $n$. Vicky desires to maximize her expected yield over $N$ periods, where the yield is defined as $\left(\frac{x_{N}}{x_{0}}\right)^{\frac{1}{N}}-1$, and $x_{0}$ and $x_{N}$ are respectively her initial and final capital.

(a) Express the problem of finding an optimal policy in a dynamic programming framework.

(b) Show that in each time period, the optimal strategy can be expressed in terms of the quantity $p^{*}$ which solves the optimization problem $\max _{p} \mathbb{E}\left(1+p R_{1}\right)^{1 / N}$. Show that $p^{*}>0$ if $\mathbb{E} R_{1}>0$. [Do not calculate $p^{*}$ explicitly.]

(c) Compare her optimal policy with the policy which maximizes her expected final capital $x_{N}$.
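A numerical sketch of part (b) (not part of the question; the two-point return distribution is an illustrative assumption): grid-search $p^{*}=\arg\max_{p} \mathbb{E}\left(1+p R_{1}\right)^{1/N}$ and check that $p^{*}>0$ when $\mathbb{E} R_{1}>0$.

```python
# Hypothetical return distribution: R = +0.5 w.p. 0.6, R = -0.5 w.p. 0.4,
# so E[R] = 0.1 > 0.  Grid-search p* = argmax_p E[(1 + p R)^(1/N)].
N = 10

def yield_factor(p):
    # E[(1 + p R)^(1/N)] for the two-point distribution above
    return 0.6 * (1 + 0.5 * p) ** (1 / N) + 0.4 * (1 - 0.5 * p) ** (1 / N)

p_star = max((i / 1000 for i in range(1001)), key=yield_factor)
assert p_star > 0        # consistent with E[R] > 0
```

Here the maximiser is interior ($p^{*} \approx 0.44$), and $p^{*}>0$ is as expected since the derivative of the objective at $p=0$ is $\mathbb{E} R_{1}/N>0$.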

• # Paper 3, Section II, K

A particle follows a discrete-time trajectory on $\mathbb{R}$ given by

$x_{t+1}=\left(A x_{t}+u_{t}\right) \xi_{t}+\epsilon_{t}$

for $t=1,2, \ldots, T$. Here $T \geqslant 2$ is a fixed integer, $A$ is a real constant, $x_{t}$ and $u_{t}$ are the position of the particle and control action at time $t$, respectively, and $\left(\xi_{t}, \epsilon_{t}\right)_{t=1}^{T}$ is a sequence of independent random vectors with

$\mathbb{E} \xi_{t}=\mathbb{E} \epsilon_{t}=0, \operatorname{var}\left(\xi_{t}\right)=V_{\xi}>0, \operatorname{var}\left(\epsilon_{t}\right)=V_{\epsilon}>0 \text { and } \operatorname{cov}\left(\xi_{t}, \epsilon_{t}\right)=0$

Find the optimal control, i.e. the control action $u_{t}$, defined as a function of $\left(x_{1}, \ldots, x_{t} ; u_{1}, \ldots, u_{t-1}\right)$, that minimizes

$\sum_{t=1}^{T} x_{t}^{2}+c \sum_{t=1}^{T-1} u_{t}^{2}$

where $c>0$ is given.

On which of $V_{\epsilon}$ and $V_{\xi}$ does the optimal control depend?

Find the limiting form of the optimal control as $T \rightarrow \infty$, and the minimal average cost per unit time.
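A sketch of the backward recursion behind the answer (the parameter values are illustrative assumptions, not from the question). Writing $F_{t}(x)=P_{t} x^{2}+\gamma_{t}$, the fact that $\mathbb{E} \xi_{t}=0$ gives the gain $K_{t}=-A P_{t+1} V_{\xi} /\left(c+P_{t+1} V_{\xi}\right)$, which involves $V_{\xi}$ but not $V_{\epsilon}$; $V_{\epsilon}$ enters only the constant $\gamma_{t}$.

```python
# Illustrative parameters (not from the question)
A, c, V_xi, V_eps, T = 0.9, 1.0, 2.0, 3.0, 50

P = 1.0                  # P_T = 1, from the terminal cost x_T^2
gains = []
for t in range(T - 1, 0, -1):
    gains.append(-A * P * V_xi / (c + P * V_xi))       # gain at time t
    P = 1 + A ** 2 * c * P * V_xi / (c + P * V_xi)     # Riccati-type step

K_limit = gains[-1]      # limiting control as T -> infinity: u_t = K_limit * x_t
avg_cost = P * V_eps     # minimal average cost per unit time, P_bar * V_eps
```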

• # Paper 4, Section II, 29K

A file of $X$ gigabytes (GB) is to be transmitted over a communications link. At each time $t$ the sender can choose a transmission rate $u(t)$ within the range $[0,1]$ GB per second. The charge for transmitting at rate $u(t)$ at time $t$ is $u(t) p(t)$. The function $p$ is fully known at time $t=0$. If it takes a total time $T$ to transmit the file then there is a delay cost of $\gamma T^{2}, \gamma>0$. Thus $u$ and $T$ are to be chosen to minimize

$\int_{0}^{T} u(t) p(t) d t+\gamma T^{2}$

where $u(t) \in[0,1], d x(t) / d t=-u(t), x(0)=X$ and $x(T)=0$. Using Pontryagin's maximum principle, or otherwise, show that a property of the optimal policy is that there exists $p^{*}$ such that $u(t)=1$ if $p(t)<p^{*}$ and $u(t)=0$ if $p(t)>p^{*}$.

Show that the optimal $p^{*}$ and $T$ are related by $p^{*}=p(T)+2 \gamma T$.

Suppose $p(t)=t+1 / t$ and $X=1$. Show that it is optimal to transmit at a constant rate $u(t)=1$ between times $T-1 \leqslant t \leqslant T$, where $T$ is the unique positive solution to the equation

$\frac{1}{(T-1) T}=2 \gamma T+1$
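A quick root-finding check ($\gamma$ is an illustrative choice, not given in the question): on $(1, \infty)$ the left side is decreasing and the right side increasing, so bisection locates the unique root $T>1$.

```python
gamma = 1.0                    # illustrative value

def g(T):
    # difference between the two sides of 1/((T-1)T) = 2*gamma*T + 1
    return 1.0 / ((T - 1.0) * T) - (2.0 * gamma * T + 1.0)

lo, hi = 1.0 + 1e-9, 10.0      # g(lo) > 0, g(hi) < 0, g decreasing on (1, oo)
for _ in range(100):
    mid = 0.5 * (lo + hi)
    if g(mid) > 0:
        lo = mid
    else:
        hi = mid
T_opt = 0.5 * (lo + hi)        # transmission at rate 1 over [T_opt - 1, T_opt]
```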


• # Paper 2, Section II, K

Consider a Markov decision problem with finite state space $X$, value function $F$ and dynamic programming equation $F=\mathcal{L} F$, where

$(\mathcal{L} \phi)(i)=\min _{a \in\{0,1\}}\left\{c(i, a)+\beta \sum_{j \in X} P_{i j}(a) \phi(j)\right\} .$

Suppose $0<\beta<1$, and $|c(i, a)| \leqslant B$ for all $i \in X, a \in\{0,1\}$. Prove there exists a *deterministic stationary Markov* policy that is optimal, explaining what the italicised words mean.

Let $F_{n}=\mathcal{L}^{n} F_{0}$, where $F_{0}=0$, and $M_{n}=\max _{i \in X}\left|F(i)-F_{n}(i)\right|$. Prove that

$M_{n} \leqslant \beta M_{n-1} \leqslant \beta^{n} B /(1-\beta) .$

Deduce that the value iteration algorithm converges to an optimal policy in a finite number of iterations.
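A sketch illustrating the two bounds on a small random MDP (the instance is arbitrary: three states, two actions and the costs are made up for the demonstration):

```python
import random

random.seed(0)
beta, B = 0.9, 1.0
S, A = range(3), range(2)
c = {(i, a): random.uniform(-B, B) for i in S for a in A}   # |c| <= B
P = {}
for i in S:
    for a in A:
        w = [random.random() for _ in S]
        P[i, a] = [x / sum(w) for x in w]                   # transition rows

def L(phi):
    # one step of the dynamic programming operator
    return [min(c[i, a] + beta * sum(P[i, a][j] * phi[j] for j in S) for a in A)
            for i in S]

F = [0.0] * 3
for _ in range(2000):          # approximate the fixed point F = L F
    F = L(F)

Fn, M_prev = [0.0] * 3, None
for n in range(1, 30):
    Fn = L(Fn)
    M = max(abs(F[i] - Fn[i]) for i in S)
    assert M <= beta ** n * B / (1 - beta) + 1e-9   # M_n <= beta^n B/(1-beta)
    if M_prev is not None:
        assert M <= beta * M_prev + 1e-9            # M_n <= beta M_{n-1}
    M_prev = M
```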

• # Paper 3, Section II, K

Consider the system in scalar variables, for $t=1,2, \ldots, h$ :

\begin{aligned} x_{t} &=x_{t-1}+u_{t-1} \\ y_{t} &=x_{t-1}+\eta_{t} \\ \hat{x}_{0} &=x_{0}+\eta_{0} \end{aligned}

where $\hat{x}_{0}$ is given, $y_{t}, u_{t}$ are observed at $t$, but $x_{0}, x_{1}, \ldots$ and $\eta_{0}, \eta_{1}, \ldots$ are unobservable, and $\eta_{0}, \eta_{1}, \ldots$ are independent random variables with mean 0 and variance $v$. Define $\hat{x}_{t-1}$ to be the estimator of $x_{t-1}$ with minimum variance amongst all estimators that are unbiased and linear functions of $W_{t-1}=\left(\hat{x}_{0}, y_{1}, \ldots, y_{t-1}, u_{0}, \ldots, u_{t-2}\right)$. Suppose $\hat{x}_{t-1}=a^{T} W_{t-1}$ and its variance is $V_{t-1}$. After observation at $t$ of $\left(y_{t}, u_{t-1}\right)$, a new unbiased estimator of $x_{t-1}$, linear in $W_{t}$, is expressed

$x_{t-1}^{*}=(1-H) b^{T} W_{t-1}+H y_{t}$

Find $b$ and $H$ to minimize the variance of $x_{t-1}^{*}$. Hence find $\hat{x}_{t}$ in terms of $\hat{x}_{t-1}, y_{t}, u_{t-1}$, $V_{t-1}$ and $v$. Calculate $V_{h}$.

Suppose $\eta_{0}, \eta_{1}, \ldots$ are Gaussian and thus $\hat{x}_{t}=E\left[x_{t} \mid W_{t}\right]$. Consider minimizing $E\left[x_{h}^{2}+\sum_{t=0}^{h-1} u_{t}^{2}\right]$, under the constraint that the control $u_{t}$ can only depend on $W_{t}$. Show that the value function of dynamic programming for this problem can be expressed

$F\left(W_{t}\right)=\Pi_{t} \hat{x}_{t}^{2}+\cdots$

where $F\left(W_{h}\right)=\hat{x}_{h}^{2}+V_{h}$ and $+\cdots$ is independent of $W_{t}$ and linear in $v$.

• # Paper 4, Section II, K

State transversality conditions that can be used with Pontryagin's maximum principle and say when they are helpful.

Given $T$, it is desired to maximize $c_{1} x_{1}(T)+c_{2} x_{2}(T)$, where

\begin{aligned} &\dot{x}_{1}=u_{1}\left(a_{1} x_{1}+a_{2} x_{2}\right), \\ &\dot{x}_{2}=u_{2}\left(a_{1} x_{1}+a_{2} x_{2}\right), \end{aligned}

and $u=\left(u_{1}, u_{2}\right)$ is a time-varying control such that $u_{1} \geqslant 0, u_{2} \geqslant 0$ and $u_{1}+u_{2}=1$. Suppose that $x_{1}(0)$ and $x_{2}(0)$ are positive, and that $0<a_{2}<a_{1}$ and $0<c_{1}<c_{2}$. Find the optimal control at times close to $T$. Show that over $[0, T]$ the optimal control is constant, or makes exactly one switch, the latter happening if and only if

$c_{2} e^{a_{2} T}<c_{1}+\frac{a_{1}}{a_{2}} c_{2}\left(e^{a_{2} T}-1\right)$


• # Paper 2, Section II, 26K

As a function of policy $\pi$ and initial state $x$, let

$F(\pi, x)=E_{\pi}\left[\sum_{t=0}^{\infty} \beta^{t} r\left(x_{t}, u_{t}\right) \mid x_{0}=x\right]$

where $\beta \geqslant 1$ and $r(x, u) \geqslant 0$ for all $x, u$. Suppose that for a specific policy $\pi$, and all $x$,

$F(\pi, x)=\sup _{u}\left\{r(x, u)+\beta E\left[F\left(\pi, x_{1}\right) \mid x_{0}=x, u_{0}=u\right]\right\} .$

Prove that $F(\pi, x) \geqslant F\left(\pi^{\prime}, x\right)$ for all $\pi^{\prime}$ and $x$.

A gambler plays games in which he may bet 1 or 2 pounds, but no more than his present wealth. Suppose he has $x_{t}$ pounds after $t$ games. If he bets $i$ pounds then $x_{t+1}=x_{t}+i$, or $x_{t+1}=x_{t}-i$, with probabilities $p_{i}$ and $1-p_{i}$ respectively. Gambling terminates at the first $\tau$ such that $x_{\tau}=0$ or $x_{\tau}=100$. His final reward is $(9 / 8)^{\tau / 2} x_{\tau}$. Let $\pi$ be the policy of always betting 1 pound. Given $p_{1}=1 / 3$, show that $F(\pi, x) \propto x 2^{x / 2}$.

Is $\pi$ optimal when $p_{2}=1 / 4$ ?
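A numerical check of the claimed form (the recursion below is the one satisfied by $F(\pi, \cdot)$, since one further game multiplies the discount by $(9/8)^{1/2}$ and moves the wealth by $\pm 1$ with probabilities $1/3$ and $2/3$):

```python
# Verify that f(x) = x * 2^(x/2) satisfies
# f(x) = (9/8)^(1/2) * ((1/3) f(x+1) + (2/3) f(x-1)),  with f(0) = 0.
f = lambda x: x * 2 ** (x / 2)
rhs = lambda x: (9 / 8) ** 0.5 * ((1 / 3) * f(x + 1) + (2 / 3) * f(x - 1))
for x in range(1, 100):
    assert abs(f(x) - rhs(x)) <= 1e-9 * f(x)
```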

• # Paper 3, Section II, K

A burglar having wealth $x$ may retire, or go burgling another night, in either of towns 1 or 2 . If he burgles in town $i$ then with probability $p_{i}=1-q_{i}$ he will, independently of previous nights, be caught, imprisoned and lose all his wealth. If he is not caught then his wealth increases by 0 or $2 a_{i}$, each with probability $1 / 2$ and independently of what happens on other nights. Values of $p_{i}$ and $a_{i}$ are the same every night. He wishes to maximize his expected wealth at the point he retires, is imprisoned, or $s$ nights have elapsed.

Using the dynamic programming equation

$F_{s}(x)=\max \left\{x, q_{1} E F_{s-1}\left(x+R_{1}\right), q_{2} E F_{s-1}\left(x+R_{2}\right)\right\}$

with $R_{j}, F_{0}(x)$ appropriately defined, prove that there exists an optimal policy under which he burgles another night if and only if his wealth is less than $x^{*}=\max _{i}\left\{a_{i} q_{i} / p_{i}\right\}$.

Suppose $q_{1}>q_{2}$ and $q_{1} a_{1}>q_{2} a_{2}$. Prove that he should never burgle in town 2 .

[Hint: Suppose $x<x^{*}$, there are $s$ nights to go, and it has been shown that he ought not burgle in town 2 if less than $s$ nights remain. For the case $a_{2}>a_{1}$, separately consider subcases $x+2 a_{2} \geqslant x^{*}$ and $x+2 a_{2}<x^{*}$. An interchange argument may help.]

• # Paper 4, Section II, 25K

Consider the scalar system evolving as

$x_{t}=x_{t-1}+u_{t-1}+\epsilon_{t}, \quad t=1,2, \ldots,$

where $\left\{\epsilon_{t}\right\}_{t=1}^{\infty}$ is a white noise sequence with $E \epsilon_{t}=0$ and $E \epsilon_{t}^{2}=v$. It is desired to choose controls $\left\{u_{t}\right\}_{t=0}^{h-1}$ to minimize $E\left[\sum_{t=0}^{h-1}\left(\frac{1}{2} x_{t}^{2}+u_{t}^{2}\right)+x_{h}^{2}\right]$. Show that for $h=6$ the minimal cost is $x_{0}^{2}+6 v$.
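A sketch of the backward recursion that yields this, in exact arithmetic: with $F_{t}(x)=P_{t} x^{2}+g_{t} v$, minimising $u^{2}+P(x+u)^{2}$ over $u$ gives value $P x^{2}/(1+P)$, so $P_{t}=\frac{1}{2}+P_{t+1}/(1+P_{t+1})$ and $g_{t}=g_{t+1}+P_{t+1}$; the recursion fixes $P_{t} \equiv 1$, and six steps give $g_{0}=6$.

```python
from fractions import Fraction as Fr

P, g = Fr(1), Fr(0)                # terminal values: P_6 = 1, g_6 = 0
for t in range(5, -1, -1):
    g = g + P                      # coefficient of v accumulates P_{t+1}
    P = Fr(1, 2) + P / (1 + P)     # P_t = 1/2 + P_{t+1} / (1 + P_{t+1})
assert P == 1 and g == 6           # minimal cost = x_0^2 + 6 v
```

The same calculation suggests $\phi(x)=x^{2}$ and $\lambda=v$ for the average-cost equation in the next part.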

Find a constant $\lambda$ and a function $\phi$ which solve

$\phi(x)+\lambda=\min _{u}\left[\frac{1}{2} x^{2}+u^{2}+E \phi\left(x+u+\epsilon_{1}\right)\right]$

Let $P$ be the class of those policies for which every $u_{t}$ obeys the constraint $\left(x_{t}+u_{t}\right)^{2} \leqslant(0.9) x_{t}^{2}$. Show that $E_{\pi} \phi\left(x_{t}\right) \leqslant x_{0}^{2}+10 v$, for all $\pi \in P$. Find, and prove optimal, a policy which over all $\pi \in P$ minimizes

$\lim _{h \rightarrow \infty} \frac{1}{h} E_{\pi}\left[\sum_{t=0}^{h-1}\left(\frac{1}{2} x_{t}^{2}+u_{t}^{2}\right)\right]$


• # Paper 2, Section II, J

Describe the elements of a discrete-time stochastic dynamic programming equation for the problem of maximizing the expected sum of non-negative rewards over an infinite horizon. Give an example to show that there may not exist an optimal policy. Prove that if a policy has a value function that satisfies the dynamic programming equation then the policy is optimal.

A squirrel collects nuts for the coming winter. There are plenty of nuts lying around, but each time the squirrel leaves its lair it risks being caught by a predator. Assume that the outcomes of the squirrel's journeys are independent, that it is caught with probability $p$, and that it returns safely with a random weight of nuts, exponentially distributed with parameter $\lambda$. By solving the dynamic programming equation for the value function $F(x)$, find a policy maximizing the expected weight of nuts collected for the winter. Here the state variable $x$ takes values in $\mathbb{R}_{+}$(the weight of nuts so far collected) or $-1$ (a no-return state when the squirrel is caught).

• # Paper 3, Section II, J

A particle follows a discrete-time trajectory on $\mathbb{R}$ given by

$x_{t+1}=A x_{t}+\xi_{t} u_{t}+\epsilon_{t}$

for $t=1,2, \ldots, T$, where $T \geqslant 2$ is a fixed integer, $A$ is a real constant, $x_{t}$ is the position of the particle and $u_{t}$ is the control action at time $t$, and $\left(\xi_{t}, \epsilon_{t}\right)_{t=1}^{T}$ is a sequence of independent random vectors with $\mathbb{E} \xi_{t}=\mathbb{E} \epsilon_{t}=0, \operatorname{var}\left(\xi_{t}\right)=V_{\xi}>0, \operatorname{var}\left(\epsilon_{t}\right)=V_{\epsilon}>0$ and $\operatorname{cov}\left(\xi_{t}, \epsilon_{t}\right)=0$.

Find the closed-loop control, i.e. the control action $u_{t}$ defined as a function of $\left(x_{1}, \ldots, x_{t} ; u_{1}, \ldots, u_{t-1}\right)$, that minimizes

$\sum_{t=1}^{T} x_{t}^{2}+c \sum_{t=1}^{T-1} u_{t}$

where $c>0$ is given. [Note that this function is quadratic in $x$, but linear in $u$.]

Does the closed-loop control depend on $V_{\epsilon}$ or on $V_{\xi}$ ? Deduce the form of the optimal open-loop control.

• # Paper 4, Section II, J

A girl begins swimming from a point $(0,0)$ on the bank of a straight river. She swims at a constant speed $v$ relative to the water. The speed of the downstream current at a distance $y$ from the shore is $c(y)$. Hence her trajectory is described by

$\dot{x}=v \cos \theta+c(y), \quad \dot{y}=v \sin \theta$

where $\theta$ is the angle at which she swims relative to the direction of the current.

She desires to reach a downstream point $(1,0)$ on the same bank as she starts, as quickly as possible. Construct the Hamiltonian for this problem, and describe how Pontryagin's maximum principle can be used to give necessary conditions that must hold on an optimal trajectory. Given that $c(y)$ is positive, increasing and differentiable in $y$, show that on an optimal trajectory

$\frac{d}{d t} \tan (\theta(t))=-c^{\prime}(y(t))$


• # Paper 2, Section II, K

Suppose $\left\{x_{t}\right\}_{t \geqslant 0}$ is a Markov chain. Consider the dynamic programming equation

$F_{s}(x)=\max \left\{r(x), \beta E\left[F_{s-1}\left(x_{1}\right) \mid x_{0}=x\right]\right\}, \quad s=1,2, \ldots,$

with $r(x)>0, \beta \in(0,1)$, and $F_{0}(x)=0$. Prove that:

(i) $F_{s}(x)$ is nondecreasing in $s$;

(ii) $F_{s}(x) \leqslant F(x)$, where $F(x)$ is the value function of an infinite-horizon problem that you should describe;

(iii) $F_{\infty}(x)=\lim _{s \rightarrow \infty} F_{s}(x)=F(x)$.

A coin lands heads with probability $p$. A statistician wishes to choose between: $H_{0}: p=1 / 3$ and $H_{1}: p=2 / 3$, one of which is true. Prior probabilities of $H_{1}$ and $H_{0}$ in the ratio $x: 1$ change after one toss of the coin to ratio $2 x: 1$ (if the toss was a head) or to ratio $x: 2$ (if the toss was a tail). What problem is being addressed by the following dynamic programming equation?

$F(x)=\max \left\{\frac{1}{1+x}, \frac{x}{1+x}, \beta\left[\left(\frac{1}{1+x} \frac{2}{3}+\frac{x}{1+x} \frac{1}{3}\right) F(x / 2)+\left(\frac{1}{1+x} \frac{1}{3}+\frac{x}{1+x} \frac{2}{3}\right) F(2 x)\right]\right\}$

Prove that $G(x)=(1+x) F(x)$ is a convex function of $x$.

By sketching a graph of $G$, describe the form of the optimal policy.

• # Paper 3, Section II, K

A particle follows a discrete-time trajectory in $\mathbb{R}^{2}$ given by

$\left(\begin{array}{l} x_{t+1} \\ y_{t+1} \end{array}\right)=\left(\begin{array}{ll} 1 & 1 \\ 0 & 1 \end{array}\right)\left(\begin{array}{l} x_{t} \\ y_{t} \end{array}\right)+\left(\begin{array}{c} t \\ 1 \end{array}\right) u_{t}+\left(\begin{array}{c} \epsilon_{t} \\ 0 \end{array}\right)$

where $\left\{\epsilon_{t}\right\}$ is a white noise sequence with $E \epsilon_{t}=0$ and $E \epsilon_{t}^{2}=v$. Given $\left(x_{0}, y_{0}\right)$, we wish to choose $\left\{u_{t}\right\}_{t=0}^{9}$ to minimize $C=E\left[x_{10}^{2}+\sum_{t=0}^{9} u_{t}^{2}\right]$.

Show that for some $\left\{a_{t}\right\}$ this problem can be reduced to one of controlling a scalar state $\xi_{t}=x_{t}+a_{t} y_{t}$.

Find, in terms of $x_{0}, y_{0}$, the optimal $u_{0}$. What is the change in minimum $C$ achievable when the system starts in $\left(x_{0}, y_{0}\right)$ as compared to when it starts in $(0,0)$ ?

Consider now a trajectory starting at $\left(x_{-1}, y_{-1}\right)=(11,-1)$. What value of $u_{-1}$ is optimal if we wish to minimize $5 u_{-1}^{2}+C$ ?

• # Paper 4, Section II, K

Given $r, \rho, \mu, T$, all positive, it is desired to choose $u(t)>0$ to maximize

$\mu x(T)+\int_{0}^{T} e^{-\rho t} \log u(t) d t$

subject to $\dot{x}(t)=r x(t)-u(t), x(0)=10$.

Explain what Pontryagin's maximum principle guarantees about a solution to this problem.

Show that no matter whether $x(T)$ is constrained or unconstrained there is a constant $\alpha$ such that the optimal control is of the form $u(t)=\alpha e^{-(\rho-r) t}$. Find an expression for $\alpha$ under the constraint $x(T)=5$.

Show that if $x(T)$ is unconstrained then $\alpha=(1 / \mu) e^{-r T}$.


• # Paper 2, Section II, J

Describe the elements of a generic stochastic dynamic programming equation for the problem of maximizing the expected sum of discounted rewards accrued at times $0,1, \ldots$ What is meant by the *positive case*? What is specially true in this case that is not true in general?

An investor owns a single asset which he may sell once, on any of the days $t=0,1, \ldots$. On day $t$ he will be offered a price $X_{t}$. This value is unknown until day $t$, is independent of all other offers, and a priori it is uniformly distributed on $[0,1]$. Offers remain open, so that on day $t$ he may sell the asset for the best of the offers made on days $0, \ldots, t$. If he sells for $x$ on day $t$ then the reward is $x \beta^{t}$. Show from first principles that if $0<\beta<1$ then there exists $\bar{x}$ such that the expected reward is maximized by selling the first day the offer is at least $\bar{x}$.

For $\beta=4 / 5$, find both $\bar{x}$ and the expected reward under the optimal policy.
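A grid value-iteration sketch for $\beta=4/5$ (grid size and iteration count are arbitrary choices); it recovers $\bar{x}=1/2$, the root in $(0,1)$ of $\bar{x}=\beta \mathbb{E} \max (\bar{x}, U)=\beta\left(1+\bar{x}^{2}\right)/2$, and expected reward $5/8$.

```python
beta, n = 0.8, 1000
xs = [(i + 0.5) / n for i in range(n)]    # midpoint grid for U ~ Uniform[0,1]
F = [0.0] * n
for _ in range(300):
    suf = [0.0] * (n + 1)                 # suffix sums of F
    for j in range(n - 1, -1, -1):
        suf[j] = suf[j + 1] + F[j]
    # E[F(max(x_i, U))] ~ (i * F[i] + sum_{j >= i} F[j]) / n
    F = [max(x, beta * (i * F[i] + suf[i]) / n) for i, x in enumerate(xs)]

x_bar = next(x for i, x in enumerate(xs) if F[i] <= x + 1e-9)  # sell iff offer >= x_bar
expected_reward = sum(F) / n                                   # E[F(X_0)]
```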

Explain what is special about the case $\beta=1$.

• # Paper 3, Section II, J

A state variable $x=\left(x_{1}, x_{2}\right) \in \mathbb{R}^{2}$ is subject to dynamics

\begin{aligned} &\dot{x}_{1}(t)=x_{2}(t) \\ &\dot{x}_{2}(t)=u(t) \end{aligned}

where $u=u(t)$ is a scalar control variable constrained to the interval $[-1,1]$. Given an initial value $x(0)=\left(x_{1}, x_{2}\right)$, let $F\left(x_{1}, x_{2}\right)$ denote the minimal time required to bring the state to $(0,0)$. Prove that

$\max _{u \in[-1,1]}\left\{-x_{2} \frac{\partial F}{\partial x_{1}}-u \frac{\partial F}{\partial x_{2}}-1\right\}=0$

Explain how this equation figures in Pontryagin's maximum principle.

Use Pontryagin's maximum principle to show that, on an optimal trajectory, $u(t)$ only takes the values 1 and $-1$, and that it makes at most one switch between them.

Show that $u(t)=1,0 \leqslant t \leqslant 2$ is optimal when $x(0)=(2,-2)$.

Find the optimal control when $x(0)=(7,-2)$.

• # Paper 4, Section II, J

A factory has a tank of capacity $3 \mathrm{~m}^{3}$ in which it stores chemical waste. Each week the factory produces, independently of other weeks, an amount of waste that is equally likely to be 0,1 , or $2 \mathrm{~m}^{3}$. If the amount of waste exceeds the remaining space in the tank then the excess must be specially handled at a cost of $£ C$ per $\mathrm{m}^{3}$. The tank may be emptied or not at the end of each week. Emptying costs $£ D$, plus a variable cost of $£ \alpha$ for each $\mathrm{m}^{3}$ of its content. It is always emptied when it ends the week full.

It is desired to minimize the average cost per week. Write down equations from which one can determine when it is optimal to empty the tank.

Find the average cost per week of a policy $\pi$, which empties the tank if and only if its content at the end of the week is 2 or $3 \mathrm{~m}^{3}$.

Describe the policy improvement algorithm. Explain why, starting from $\pi$, this algorithm will find an optimal policy in at most three iterations.

Prove that $\pi$ is optimal if and only if $C \geqslant \alpha+(4 / 3) D$.


• # Paper 2, Section II, K

Consider an optimal stopping problem in which the optimality equation takes the form

$F_{t}(x)=\max \left\{r(x), E\left[F_{t+1}\left(x_{t+1}\right)\right]\right\}, \quad t=1, \ldots, N-1,$

$F_{N}(x)=r(x)$, and where $r(x)>0$ for all $x$. Let $S$ denote the stopping set of the one-step-look-ahead rule. Show that if $S$ is closed (in a sense you should explain) then the one-step-look-ahead rule is optimal.

$N$ biased coins are to be tossed successively. The probability that the $i$ th coin toss will show a head is known to be $p_{i}\left(0<p_{i}<1\right)$. At most once, after observing a head, and before tossing the next coin, you may guess that you have just seen the last head (i.e. that all subsequent tosses will show tails). If your guess turns out to be correct then you win $£ 1$.

Suppose that you have not yet guessed 'last head', and the $i$ th toss is a head. Show that it cannot be optimal to guess that this is the last head if

$\frac{p_{i+1}}{q_{i+1}}+\cdots+\frac{p_{N}}{q_{N}}>1$

where $q_{j}=1-p_{j}$.

Suppose that $p_{i}=1 / i$. Show that it is optimal to guess that the last head is the first head (if any) to occur after having tossed at least $i^{*}$ coins, where $i^{*} \approx N / e$ when $N$ is large.
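A quick check of the asymptotics: with $p_{i}=1/i$ we have $p_{j}/q_{j}=1/(j-1)$, so the rule guesses at the first head occurring at a toss $i$ with $\sum_{j=i+1}^{N} 1/(j-1) \leqslant 1$, i.e. $i \gtrsim N/e$.

```python
import math

N = 10_000
tail, i_star = 0.0, None
for i in range(N - 1, 0, -1):    # S(i) = sum_{j=i+1}^{N} p_j/q_j = S(i+1) + 1/i
    tail += 1.0 / i
    if tail > 1.0:
        i_star = i + 1           # smallest i with S(i) <= 1
        break
assert abs(i_star / N - 1 / math.e) < 1e-3    # i* is close to N/e
```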

• # Paper 3, Section II, 28K

An observable scalar state variable evolves as $x_{t+1}=x_{t}+u_{t}, t=0,1, \ldots$ Let controls $u_{0}, u_{1}, \ldots$ be determined by a policy $\pi$ and define

$C_{s}\left(\pi, x_{0}\right)=\sum_{t=0}^{s-1}\left(x_{t}^{2}+2 x_{t} u_{t}+7 u_{t}^{2}\right) \quad \text { and } \quad C_{s}\left(x_{0}\right)=\inf _{\pi} C_{s}\left(\pi, x_{0}\right)$

Show that it is possible to express $C_{s}\left(x_{0}\right)$ in terms of $\Pi_{s}$, which satisfies the recurrence

$\Pi_{s}=\frac{6\left(1+\Pi_{s-1}\right)}{7+\Pi_{s-1}}, \quad s=1,2, \ldots$

with $\Pi_{0}=0$.

Deduce that $C_{\infty}\left(x_{0}\right) \geqslant 2 x_{0}^{2}$. [Here $C_{\infty}\left(x_{0}\right)$ is defined as $\lim _{s \rightarrow \infty} C_{s}\left(x_{0}\right)$.]

By considering the policy $\pi^{*}$ which takes $u_{t}=-(1 / 3)(2 / 3)^{t} x_{0}, t=0,1, \ldots$, show that $C_{\infty}\left(x_{0}\right)=2 x_{0}^{2}$.

Give an alternative description of $\pi^{*}$ in closed-loop form.
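The recurrence can be checked numerically: the fixed point of $\Pi=6(1+\Pi)/(7+\Pi)$ satisfies $\Pi^{2}+\Pi-6=0$, whose positive root is $\Pi=2$, consistent with $C_{\infty}\left(x_{0}\right)=2 x_{0}^{2}$.

```python
# Iterate Pi_s = 6 (1 + Pi_{s-1}) / (7 + Pi_{s-1}) from Pi_0 = 0.
Pi = 0.0
for s in range(1, 200):
    new = 6 * (1 + Pi) / (7 + Pi)
    assert new >= Pi              # sequence increases towards the fixed point
    Pi = new
assert abs(Pi - 2.0) < 1e-9       # positive root of Pi^2 + Pi - 6 = 0
```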

• # Paper 4, Section II, K

Describe the type of optimal control problem that is amenable to analysis using Pontryagin's Maximum Principle.

A firm has the right to extract oil from a well over the interval $[0, T]$. The oil can be sold at price $£ p$ per unit. To extract oil at rate $u$ when the remaining quantity of oil in the well is $x$ incurs cost at rate $£ u^{2} / x$. Thus the problem is one of maximizing

$\int_{0}^{T}\left[p u(t)-\frac{u(t)^{2}}{x(t)}\right] d t$

subject to $d x(t) / d t=-u(t), u(t) \geqslant 0, x(t) \geqslant 0$. Formulate the Hamiltonian for this problem.

Explain why $\lambda(t)$, the adjoint variable, has a boundary condition $\lambda(T)=0$.

Use Pontryagin's Maximum Principle to show that under optimal control

$\lambda(t)=p-\frac{1}{1 / p+(T-t) / 4}$

and

$\frac{d x(t)}{d t}=-\frac{2 p x(t)}{4+p(T-t)}$

Find the oil remaining in the well at time $T$, as a function of $x(0), p$, and $T$.
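Separating variables in the last displayed equation gives $x(T)=x(0)\left(\frac{4}{4+p T}\right)^{2}$; a quick RK4 check (the values of $p, T, x(0)$ are illustrative):

```python
p, T, x0, n = 2.0, 3.0, 10.0, 10_000
h = T / n

def f(t, x):
    # right-hand side of dx/dt = -2 p x / (4 + p (T - t))
    return -2.0 * p * x / (4.0 + p * (T - t))

x, t = x0, 0.0
for _ in range(n):                      # classical 4th-order Runge-Kutta
    k1 = f(t, x)
    k2 = f(t + h / 2, x + h * k1 / 2)
    k3 = f(t + h / 2, x + h * k2 / 2)
    k4 = f(t + h, x + h * k3)
    x += h * (k1 + 2 * k2 + 2 * k3 + k4) / 6
    t += h

closed_form = x0 * (4.0 / (4.0 + p * T)) ** 2
assert abs(x - closed_form) < 1e-6      # numerical and analytic answers agree
```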


• # Paper 2, Section II, J

(a) Suppose that

$\left(\begin{array}{l} X \\ Y \end{array}\right) \sim N\left(\left(\begin{array}{l} \mu_{X} \\ \mu_{Y} \end{array}\right),\left(\begin{array}{ll} V_{X X} & V_{X Y} \\ V_{Y X} & V_{Y Y} \end{array}\right)\right)$

Prove that conditional on $Y=y$, the distribution of $X$ is again multivariate normal, with mean $\mu_{X}+V_{X Y} V_{Y Y}^{-1}\left(y-\mu_{Y}\right)$ and covariance $V_{X X}-V_{X Y} V_{Y Y}^{-1} V_{Y X}$.

(b) The $\mathbb{R}^{d}$-valued process $X$ evolves in discrete time according to the dynamics

$X_{t+1}=A X_{t}+\varepsilon_{t+1}$

where $A$ is a constant $d \times d$ matrix, and $\varepsilon_{t}$ are independent, with common $N\left(0, \Sigma_{\varepsilon}\right)$ distribution. The process $X$ is not observed directly; instead, all that is seen is the process $Y$ defined as

$Y_{t}=C X_{t}+\eta_{t},$

where $\eta_{t}$ are independent of each other and of the $\varepsilon_{t}$, with common $N\left(0, \Sigma_{\eta}\right)$ distribution.

If the observer has the prior distribution $X_{0} \sim N\left(\hat{X}_{0}, V_{0}\right)$ for $X_{0}$, prove that at all later times the distribution of $X_{t}$ conditional on $\mathcal{Y}_{t} \equiv\left(Y_{1}, \ldots, Y_{t}\right)$ is again normally distributed, with mean $\hat{X}_{t}$ and covariance $V_{t}$ which evolve as

\begin{aligned} \hat{X}_{t+1} &=A \hat{X}_{t}+M_{t} C^{T}\left(\Sigma_{\eta}+C M_{t} C^{T}\right)^{-1}\left(Y_{t+1}-C A \hat{X}_{t}\right) \\ V_{t+1} &=M_{t}-M_{t} C^{T}\left(\Sigma_{\eta}+C M_{t} C^{T}\right)^{-1} C M_{t} \end{aligned}

where

$M_{t}=A V_{t} A^{T}+\Sigma_{\varepsilon}$

(c) In the special case where both $X$ and $Y$ are one-dimensional, and $A=C=1$, $\Sigma_{\varepsilon}=0$, find the form of the updating recursion. Show in particular that

$\frac{1}{V_{t+1}}=\frac{1}{V_{t}}+\frac{1}{\Sigma_{\eta}}$

and that

$\frac{\hat{X}_{t+1}}{V_{t+1}}=\frac{\hat{X}_{t}}{V_{t}}+\frac{Y_{t+1}}{\Sigma_{\eta}}$

Hence deduce that, with probability one,

$\lim _{t \rightarrow \infty} \hat{X}_{t}=\lim _{t \rightarrow \infty} t^{-1} \sum_{j=1}^{t} Y_{j}$
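A simulation sketch of part (c) (parameter values are illustrative): the recursions reduce to $1/V_{t+1}=1/V_{t}+1/\Sigma_{\eta}$ and $\hat{X}_{t+1}/V_{t+1}=\hat{X}_{t}/V_{t}+Y_{t+1}/\Sigma_{\eta}$, so $\hat{X}_{t}$ approaches the running mean of the observations.

```python
import random

random.seed(1)
Sigma_eta, X, t_max = 2.0, 1.7, 20_000    # illustrative values
Xhat, V = 0.0, 100.0                      # prior X_0 ~ N(0, 100)
ys = []
for _ in range(t_max):
    y = X + random.gauss(0.0, Sigma_eta ** 0.5)   # Y_t = X + eta_t (A = C = 1)
    ys.append(y)
    V_new = 1.0 / (1.0 / V + 1.0 / Sigma_eta)     # precisions add
    Xhat = V_new * (Xhat / V + y / Sigma_eta)     # precision-weighted update
    V = V_new
assert abs(Xhat - sum(ys) / len(ys)) < 1e-3       # close to the sample mean
```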

• # Paper 3, Section II, J

Consider an infinite-horizon controlled Markov process having per-period costs $c(x, u) \geqslant 0$, where $x \in \mathcal{X}$ is the state of the system, and $u \in \mathcal{U}$ is the control. Costs are discounted at rate $\beta \in(0,1]$, so that the objective to be minimized is

$\mathbb{E}\left[\sum_{t \geqslant 0} \beta^{t} c\left(X_{t}, u_{t}\right) \mid X_{0}=x\right]$

What is meant by a policy $\pi$ for this problem?

Let $\mathcal{L}$ denote the dynamic programming operator

$\mathcal{L} f(x) \equiv \inf _{u \in \mathcal{U}}\left\{c(x, u)+\beta \mathbb{E}\left[f\left(X_{1}\right) \mid X_{0}=x, u_{0}=u\right]\right\}$

Further, let $F$ denote the value of the optimal control problem:

$F(x)=\inf _{\pi} \mathbb{E}^{\pi}\left[\sum_{t \geqslant 0} \beta^{t} c\left(X_{t}, u_{t}\right) \mid X_{0}=x\right]$

where the infimum is taken over all policies $\pi$, and $\mathbb{E}^{\pi}$ denotes expectation under policy $\pi$. Show that the functions $F_{t}$ defined by

$F_{t+1}=\mathcal{L} F_{t} \quad(t \geqslant 0), \quad F_{0} \equiv 0$

increase to a limit $F_{\infty} \in[0, \infty]$. Prove that $F_{\infty} \leqslant F$. Prove that $F=\mathcal{L} F$.

Suppose that $\Phi=\mathcal{L} \Phi \geqslant 0$. Prove that $\Phi \geqslant F$.

[You may assume that there is a function $u_{*}: \mathcal{X} \rightarrow \mathcal{U}$ such that

$\mathcal{L} \Phi(x)=c\left(x, u_{*}(x)\right)+\beta \mathbb{E}\left[\Phi\left(X_{1}\right) \mid X_{0}=x, u_{0}=u_{*}(x)\right]$

though the result remains true without this simplifying assumption.]

• # Paper 4, Section II, J

Dr Seuss' wealth $x_{t}$ at time $t$ evolves as

$\frac{d x}{d t}=r x_{t}+\ell_{t}-c_{t}$

where $r>0$ is the rate of interest earned, $\ell_{t}$ is his intensity of working $(0 \leqslant \ell \leqslant 1)$, and $c_{t}$ is his rate of consumption. His initial wealth $x_{0}>0$ is given, and his objective is to maximize

$\int_{0}^{T} U\left(c_{t}, \ell_{t}\right) d t$

where $U(c, \ell)=c^{\alpha}(1-\ell)^{\beta}$, and $T$ is the (fixed) time his contract expires. The constants $\alpha$ and $\beta$ satisfy the inequalities $0<\alpha<1,0<\beta<1$, and $\alpha+\beta>1$. At all times, $c_{t}$ must be non-negative, and his final wealth $x_{T}$ must be non-negative. Establish the following properties of the optimal solution $\left(x^{*}, c^{*}, \ell^{*}\right)$ :

(i) $\beta c_{t}^{*}=\alpha\left(1-\ell_{t}^{*}\right)$;

(ii) $c_{t}^{*} \propto e^{-\gamma r t}$, where $\gamma \equiv(\beta-1+\alpha)^{-1}$;

(iii) $x_{t}^{*}=A e^{r t}+B e^{-\gamma r t}-r^{-1}$ for some constants $A$ and $B$.

Hence deduce that the optimal wealth is

$x_{t}^{*}=\frac{\left(1-e^{-\gamma r T}\left(1+r x_{0}\right)\right) e^{r t}+\left(\left(1+r x_{0}\right) e^{r T}-1\right) e^{-\gamma r t}}{r\left(e^{r T}-e^{-\gamma r T}\right)}-\frac{1}{r}$


• # Paper 2, Section II, I

In the context of stochastic dynamic programming, explain what is meant by an average-reward optimal policy.

A player has a fair coin and a six-sided die. At each epoch he may choose either to toss the coin or to roll the die. If he tosses the coin and it shows heads then he adds 1 to his total score, and if it shows tails then he adds 0 . If he rolls the die then he adds the number showing. He wins a reward of $£ 1$ whenever his total score is divisible by 3 .

Suppose the player always tosses the coin. Find his average reward per toss.

Still using the above policy, and given that he starts with a total score of $x$, let $F_{s}(x)$ be the expected total reward over the next $s$ epochs. Find the value of

$\lim _{s \rightarrow \infty}\left[F_{s}(x)-F_{s}(0)\right] .$

Use the policy improvement algorithm to find a policy that produces a greater average reward than the policy of only tossing the coin.

Find the average-reward optimal policy.

• # Paper 3, Section II, I

Two scalar systems have dynamics

$x_{t+1}=x_{t}+u_{t}+\epsilon_{t}, \quad y_{t+1}=y_{t}+w_{t}+\eta_{t},$

where $\left\{\epsilon_{t}\right\}$ and $\left\{\eta_{t}\right\}$ are independent sequences of independent and identically distributed random variables of mean 0 and variance 1 . Let

$F(x)=\inf _{\pi} \mathbb{E}\left[\sum_{t=0}^{\infty}\left(x_{t}^{2}+u_{t}^{2}\right)(2 / 3)^{t} \mid x_{0}=x\right]$

where $\pi$ is a policy in which $u_{t}$ depends on only $x_{0}, \ldots, x_{t}$.

Show that $G(x)=P x^{2}+d$ is a solution to the optimality equation satisfied by $F(x)$, for some $P$ and $d$ which you should find.

Find the optimal controls.

State a theorem that justifies $F(x)=G(x)$.

For each of the two cases (a) $\lambda=0$ and (b) $\lambda=1$, find controls $\left\{u_{t}, w_{t}\right\}$ which minimize

$\mathbb{E}\left[\sum_{t=0}^{\infty}\left(x_{t}^{2}+2 \lambda x_{t} y_{t}+y_{t}^{2}+u_{t}^{2}+w_{t}^{2}\right)(2 / 3+\lambda / 12)^{t} \mid x_{0}=x, y_{0}=y\right]$

• # Paper 4, Section II, I

Explain how transversality conditions can be helpful when employing Pontryagin's Maximum Principle to solve an optimal control problem.

A particle in $\mathbb{R}^{2}$ starts at $(0,0.5)$ and follows the dynamics

$\dot{x}=u \sqrt{|y|}, \quad \dot{y}=v \sqrt{|y|}, \quad t \in[0, T],$

where controls $u(t)$ and $v(t)$ are to be chosen subject to $u^{2}(t)+v^{2}(t)=1$.

Using Pontryagin's maximum principle do the following:

(a) Find controls that minimize $-y(1)$;

(b) Suppose we wish to choose $T$ and the controls $u, v$ to minimize $-y(T)+T$ under a constraint $(x(T), y(T))=(1,1)$. By expressing both $d y / d x$ and $d^{2} y / d x^{2}$ in terms of the adjoint variables, show that on an optimal trajectory,

$1+\left(\frac{d y}{d x}\right)^{2}+2 y \frac{d^{2} y}{d x^{2}}=0$


• # 2.II.29I

Consider a stochastic controllable dynamical system $P$ with action-space $A$ and countable state-space $S$. Thus $P=\left(p_{x y}(a): x, y \in S, a \in A\right)$ and $p_{x y}(a)$ denotes the transition probability from $x$ to $y$ when taking action $a$. Suppose that a cost $c(x, a)$ is incurred each time that action $a$ is taken in state $x$, and that this cost is uniformly bounded. Write down the dynamic optimality equation for the problem of minimizing the expected long-run average cost.

State in terms of this equation a general result, which can be used to identify an optimal control and the minimal long-run average cost.

A particle moves randomly on the integers, taking steps of size 1 . Suppose we can choose at each step a control parameter $u \in[\alpha, 1-\alpha]$, where $\alpha \in(0,1 / 2)$ is fixed, which has the effect that the particle moves in the positive direction with probability $u$ and in the negative direction with probability $1-u$. It is desired to maximize the long-run proportion of time $\pi$ spent by the particle at 0 . Show that there is a solution to the optimality equation for this example in which the relative cost function takes the form $\theta(x)=\mu|x|$, for some constant $\mu$.

Determine an optimal control and show that the maximal long-run proportion of time spent at 0 is given by

$\pi=\frac{1-2 \alpha}{2(1-\alpha)} .$

You may assume that it is valid to use an unbounded function $\theta$ in the optimality equation in this example.
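A numerical sketch (the truncation level and the tie-breaking control at 0 are arbitrary choices): under the drift-towards-0 control ($u=\alpha$ for $x \geqslant 0$, $u=1-\alpha$ for $x<0$), the stationary probability of state 0 matches $(1-2 \alpha) /(2(1-\alpha))$.

```python
# Power-iterate the truncated chain on {-K, ..., K}; boundary self-loops make
# it aperiodic and the geometric tails make the truncation error negligible.
alpha, K, steps = 0.3, 60, 5000
states = list(range(-K, K + 1))
pi = {x: 1.0 / len(states) for x in states}
for _ in range(steps):
    new = {x: 0.0 for x in states}
    for x, mass in pi.items():
        up = alpha if x >= 0 else 1.0 - alpha   # P(move +1); at 0 take u = alpha
        for y, q in ((min(x + 1, K), up), (max(x - 1, -K), 1.0 - up)):
            new[y] += mass * q
    pi = new

target = (1 - 2 * alpha) / (2 * (1 - alpha))    # claimed proportion at 0
assert abs(pi[0] - target) < 1e-3
```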

• # 3.II.28I

Let $Q$ be a positive-definite symmetric $m \times m$ matrix. Show that a non-negative quadratic form on $\mathbb{R}^{d} \times \mathbb{R}^{m}$ of the form

$c(x, a)=x^{T} R x+x^{T} S^{T} a+a^{T} S x+a^{T} Q a, \quad x \in \mathbb{R}^{d}, \quad a \in \mathbb{R}^{m}$

is minimized over $a$, for each $x$, with value $x^{T}\left(R-S^{T} Q^{-1} S\right) x$, by taking $a=K x$, where $K=-Q^{-1} S$.

Consider for $k \leqslant n$ the controllable stochastic linear system in $\mathbb{R}^{d}$

$X_{j+1}=A X_{j}+B U_{j}+\varepsilon_{j+1}, \quad j=k, k+1, \ldots, n-1,$

starting from $X_{k}=x$ at time $k$, where the control variables $U_{j}$ take values in $\mathbb{R}^{m}$, and where $\varepsilon_{k+1}, \ldots, \varepsilon_{n}$ are independent, zero-mean random variables, with $\operatorname{var}\left(\varepsilon_{j}\right)=N_{j}$. Here, $A, B$ and $N_{j}$ are, respectively, $d \times d, d \times m$ and $d \times d$ matrices. Assume that a cost $c\left(X_{j}, U_{j}\right)$ is incurred at each time $j=k, \ldots, n-1$ and that a final cost $C\left(X_{n}\right)=X_{n}^{T} \Pi_{0} X_{n}$ is incurred at time $n$. Here, $\Pi_{0}$ is a given non-negative-definite symmetric matrix. It is desired to minimize, over the set of all controls $u$, the total expected cost $V^{u}(k, x)$. Write down the optimality equation for the infimal cost function $V(k, x)$.

Hence, show that $V(k, x)$ has the form

$V(k, x)=x^{T} \Pi_{n-k} x+\gamma_{k}$

for some non-negative-definite symmetric matrix $\Pi_{n-k}$ and some real constant $\gamma_{k}$. Show how to compute the matrix $\Pi_{n-k}$ and constant $\gamma_{k}$ and how to determine an optimal control.

• # 4.II.29I

State Pontryagin's maximum principle for the controllable dynamical system with state-space $\mathbb{R}^{+}$, given by

$\dot{x}_{t}=b\left(t, x_{t}, u_{t}\right), \quad t \geqslant 0,$

where the running costs are given by $c\left(t, x_{t}, u_{t}\right)$, up to an unconstrained terminal time $\tau$ when the state first reaches 0 , and there is a terminal cost $C(\tau)$.

A company pays a variable price $p(t)$ per unit time for electrical power, agreed in advance, which depends on the time of day. The company takes on a job at time $t=0$, which requires a total amount $E$ of electrical energy, but can be processed at a variable level of power consumption $u(t) \in[0,1]$. If the job is completed by time $\tau$, then the company will receive a reward $R(\tau)$. Thus, it is desired to minimize

$\int_{0}^{\tau} u(t) p(t) d t-R(\tau)$

subject to

$\int_{0}^{\tau} u(t) d t=E, \quad u(t) \in[0,1],$