• # Paper 2, Section II, K

During each of $N$ time periods a venture capitalist, Vicky, is presented with an investment opportunity for which the rate of return for that period is a random variable; the rates of return in successive periods are independent identically distributed random variables with distributions concentrated on $[-1, \infty)$. Thus, if $x_{n}$ is Vicky's capital at period $n$, then $x_{n+1}=\left(1-p_{n}\right) x_{n}+p_{n} x_{n}\left(1+R_{n}\right)$, where $p_{n} \in[0,1]$ is the proportion of her capital she chooses to invest at period $n$, and $R_{n}$ is the rate of return for period $n$. Vicky desires to maximize her expected yield over $N$ periods, where the yield is defined as $\left(\frac{x_{N}}{x_{0}}\right)^{\frac{1}{N}}-1$, and $x_{0}$ and $x_{N}$ are respectively her initial and final capital.

(a) Express the problem of finding an optimal policy in a dynamic programming framework.

(b) Show that in each time period, the optimal strategy can be expressed in terms of the quantity $p^{*}$ which solves the optimization problem $\max _{p} \mathbb{E}\left(1+p R_{1}\right)^{1 / N}$. Show that $p^{*}>0$ if $\mathbb{E} R_{1}>0$. [Do not calculate $p^{*}$ explicitly.]

(c) Compare her optimal policy with the policy which maximizes her expected final capital $x_{N}$.
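A numerical sketch of part (b) (not part of the question; the two-point return distribution is an illustrative assumption): grid-search $p^{*}=\arg\max_{p} \mathbb{E}\left(1+p R_{1}\right)^{1/N}$ and check that $p^{*}>0$ when $\mathbb{E} R_{1}>0$.

```python
# Hypothetical return distribution: R = +0.5 w.p. 0.6, R = -0.5 w.p. 0.4,
# so E[R] = 0.1 > 0.  Grid-search p* = argmax_p E[(1 + p R)^(1/N)].
N = 10

def yield_factor(p):
    # E[(1 + p R)^(1/N)] for the two-point distribution above
    return 0.6 * (1 + 0.5 * p) ** (1 / N) + 0.4 * (1 - 0.5 * p) ** (1 / N)

p_star = max((i / 1000 for i in range(1001)), key=yield_factor)
assert p_star > 0        # consistent with E[R] > 0
```

Here the maximiser is interior ($p^{*} \approx 0.44$), and $p^{*}>0$ is as expected since the derivative of the objective at $p=0$ is $\mathbb{E} R_{1}/N>0$.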

• # Paper 3, Section II, K

A particle follows a discrete-time trajectory on $\mathbb{R}$ given by

$x_{t+1}=\left(A x_{t}+u_{t}\right) \xi_{t}+\epsilon_{t}$

for $t=1,2, \ldots, T$. Here $T \geqslant 2$ is a fixed integer, $A$ is a real constant, $x_{t}$ and $u_{t}$ are the position of the particle and control action at time $t$, respectively, and $\left(\xi_{t}, \epsilon_{t}\right)_{t=1}^{T}$ is a sequence of independent random vectors with

$\mathbb{E} \xi_{t}=\mathbb{E} \epsilon_{t}=0, \operatorname{var}\left(\xi_{t}\right)=V_{\xi}>0, \operatorname{var}\left(\epsilon_{t}\right)=V_{\epsilon}>0 \text { and } \operatorname{cov}\left(\xi_{t}, \epsilon_{t}\right)=0$

Find the optimal control, i.e. the control action $u_{t}$, defined as a function of $\left(x_{1}, \ldots, x_{t} ; u_{1}, \ldots, u_{t-1}\right)$, that minimizes

$\sum_{t=1}^{T} x_{t}^{2}+c \sum_{t=1}^{T-1} u_{t}^{2}$

where $c>0$ is given.

On which of $V_{\epsilon}$ and $V_{\xi}$ does the optimal control depend?

Find the limiting form of the optimal control as $T \rightarrow \infty$, and the minimal average cost per unit time.
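A sketch of the backward recursion behind the answer (the parameter values are illustrative assumptions, not from the question). Writing $F_{t}(x)=P_{t} x^{2}+\gamma_{t}$, the fact that $\mathbb{E} \xi_{t}=0$ gives the gain $K_{t}=-A P_{t+1} V_{\xi} /\left(c+P_{t+1} V_{\xi}\right)$, which involves $V_{\xi}$ but not $V_{\epsilon}$; $V_{\epsilon}$ enters only the constant $\gamma_{t}$.

```python
# Illustrative parameters (not from the question)
A, c, V_xi, V_eps, T = 0.9, 1.0, 2.0, 3.0, 50

P = 1.0                  # P_T = 1, from the terminal cost x_T^2
gains = []
for t in range(T - 1, 0, -1):
    gains.append(-A * P * V_xi / (c + P * V_xi))       # gain at time t
    P = 1 + A ** 2 * c * P * V_xi / (c + P * V_xi)     # Riccati-type step

K_limit = gains[-1]      # limiting control as T -> infinity: u_t = K_limit * x_t
avg_cost = P * V_eps     # minimal average cost per unit time, P_bar * V_eps
```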

• # Paper 4, Section II, 29K

A file of $X$ gigabytes (GB) is to be transmitted over a communications link. At each time $t$ the sender can choose a transmission rate $u(t)$ within the range $[0,1]$ GB per second. The charge for transmitting at rate $u(t)$ at time $t$ is $u(t) p(t)$. The function $p$ is fully known at time $t=0$. If it takes a total time $T$ to transmit the file then there is a delay cost of $\gamma T^{2}, \gamma>0$. Thus $u$ and $T$ are to be chosen to minimize

$\int_{0}^{T} u(t) p(t) d t+\gamma T^{2}$

where $u(t) \in[0,1], d x(t) / d t=-u(t), x(0)=X$ and $x(T)=0$. Using Pontryagin's maximum principle, or otherwise, show that a property of the optimal policy is that there exists $p^{*}$ such that $u(t)=1$ if $p(t)<p^{*}$ and $u(t)=0$ if $p(t)>p^{*}$.

Show that the optimal $p^{*}$ and $T$ are related by $p^{*}=p(T)+2 \gamma T$.

Suppose $p(t)=t+1 / t$ and $X=1$. Show that it is optimal to transmit at a constant rate $u(t)=1$ between times $T-1 \leqslant t \leqslant T$, where $T$ is the unique positive solution to the equation

$\frac{1}{(T-1) T}=2 \gamma T+1$
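A quick root-finding check ($\gamma$ is an illustrative choice, not given in the question): on $(1, \infty)$ the left side is decreasing and the right side increasing, so bisection locates the unique root $T>1$.

```python
gamma = 1.0                    # illustrative value

def g(T):
    # difference between the two sides of 1/((T-1)T) = 2*gamma*T + 1
    return 1.0 / ((T - 1.0) * T) - (2.0 * gamma * T + 1.0)

lo, hi = 1.0 + 1e-9, 10.0      # g(lo) > 0, g(hi) < 0, g decreasing on (1, oo)
for _ in range(100):
    mid = 0.5 * (lo + hi)
    if g(mid) > 0:
        lo = mid
    else:
        hi = mid
T_opt = 0.5 * (lo + hi)        # transmission at rate 1 over [T_opt - 1, T_opt]
```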


• # Paper 2, Section II, K

Consider a Markov decision problem with finite state space $X$, value function $F$ and dynamic programming equation $F=\mathcal{L} F$, where

$(\mathcal{L} \phi)(i)=\min _{a \in\{0,1\}}\left\{c(i, a)+\beta \sum_{j \in X} P_{i j}(a) \phi(j)\right\} .$

Suppose $0<\beta<1$, and $|c(i, a)| \leqslant B$ for all $i \in X, a \in\{0,1\}$. Prove there exists a *deterministic stationary Markov* policy that is optimal, explaining what the italicised words mean.

Let $F_{n}=\mathcal{L}^{n} F_{0}$, where $F_{0}=0$, and $M_{n}=\max _{i \in X}\left|F(i)-F_{n}(i)\right|$. Prove that

$M_{n} \leqslant \beta M_{n-1} \leqslant \beta^{n} B /(1-\beta) .$

Deduce that the value iteration algorithm converges to an optimal policy in a finite number of iterations.
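A sketch illustrating the two bounds on a small random MDP (the instance is arbitrary: three states, two actions and the costs are made up for the demonstration):

```python
import random

random.seed(0)
beta, B = 0.9, 1.0
S, A = range(3), range(2)
c = {(i, a): random.uniform(-B, B) for i in S for a in A}   # |c| <= B
P = {}
for i in S:
    for a in A:
        w = [random.random() for _ in S]
        P[i, a] = [x / sum(w) for x in w]                   # transition rows

def L(phi):
    # one step of the dynamic programming operator
    return [min(c[i, a] + beta * sum(P[i, a][j] * phi[j] for j in S) for a in A)
            for i in S]

F = [0.0] * 3
for _ in range(2000):          # approximate the fixed point F = L F
    F = L(F)

Fn, M_prev = [0.0] * 3, None
for n in range(1, 30):
    Fn = L(Fn)
    M = max(abs(F[i] - Fn[i]) for i in S)
    assert M <= beta ** n * B / (1 - beta) + 1e-9   # M_n <= beta^n B/(1-beta)
    if M_prev is not None:
        assert M <= beta * M_prev + 1e-9            # M_n <= beta M_{n-1}
    M_prev = M
```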

• # Paper 3, Section II, K

Consider the system in scalar variables, for $t=1,2, \ldots, h$ :

\begin{aligned} x_{t} &=x_{t-1}+u_{t-1} \\ y_{t} &=x_{t-1}+\eta_{t} \\ \hat{x}_{0} &=x_{0}+\eta_{0} \end{aligned}

where $\hat{x}_{0}$ is given, $y_{t}, u_{t}$ are observed at $t$, but $x_{0}, x_{1}, \ldots$ and $\eta_{0}, \eta_{1}, \ldots$ are unobservable, and $\eta_{0}, \eta_{1}, \ldots$ are independent random variables with mean 0 and variance $v$. Define $\hat{x}_{t-1}$ to be the estimator of $x_{t-1}$ with minimum variance amongst all estimators that are unbiased and linear functions of $W_{t-1}=\left(\hat{x}_{0}, y_{1}, \ldots, y_{t-1}, u_{0}, \ldots, u_{t-2}\right)$. Suppose $\hat{x}_{t-1}=a^{T} W_{t-1}$ and its variance is $V_{t-1}$. After observation at $t$ of $\left(y_{t}, u_{t-1}\right)$, a new unbiased estimator of $x_{t-1}$, linear in $W_{t}$, is expressed

$x_{t-1}^{*}=(1-H) b^{T} W_{t-1}+H y_{t}$

Find $b$ and $H$ to minimize the variance of $x_{t-1}^{*}$. Hence find $\hat{x}_{t}$ in terms of $\hat{x}_{t-1}, y_{t}, u_{t-1}$, $V_{t-1}$ and $v$. Calculate $V_{h}$.

Suppose $\eta_{0}, \eta_{1}, \ldots$ are Gaussian and thus $\hat{x}_{t}=E\left[x_{t} \mid W_{t}\right]$. Consider minimizing $E\left[x_{h}^{2}+\sum_{t=0}^{h-1} u_{t}^{2}\right]$, under the constraint that the control $u_{t}$ can only depend on $W_{t}$. Show that the value function of dynamic programming for this problem can be expressed

$F\left(W_{t}\right)=\Pi_{t} \hat{x}_{t}^{2}+\cdots$

where $F\left(W_{h}\right)=\hat{x}_{h}^{2}+V_{h}$ and $+\cdots$ is independent of $W_{t}$ and linear in $v$.

• # Paper 4, Section II, K

State transversality conditions that can be used with Pontryagin's maximum principle and say when they are helpful.

Given $T$, it is desired to maximize $c_{1} x_{1}(T)+c_{2} x_{2}(T)$, where

\begin{aligned} &\dot{x}_{1}=u_{1}\left(a_{1} x_{1}+a_{2} x_{2}\right), \\ &\dot{x}_{2}=u_{2}\left(a_{1} x_{1}+a_{2} x_{2}\right), \end{aligned}

and $u=\left(u_{1}, u_{2}\right)$ is a time-varying control such that $u_{1} \geqslant 0, u_{2} \geqslant 0$ and $u_{1}+u_{2}=1$. Suppose that $x_{1}(0)$ and $x_{2}(0)$ are positive, and that $0<a_{2}<a_{1}$ and $0<c_{1}<c_{2}$. Find the optimal control at times close to $T$. Show that over $[0, T]$ the optimal control is constant, or makes exactly one switch, the latter happening if and only if

$c_{2} e^{a_{2} T}<c_{1}+\frac{a_{1}}{a_{2}} c_{2}\left(e^{a_{2} T}-1\right)$


• # Paper 2, Section II, 26K

As a function of policy $\pi$ and initial state $x$, let

$F(\pi, x)=E_{\pi}\left[\sum_{t=0}^{\infty} \beta^{t} r\left(x_{t}, u_{t}\right) \mid x_{0}=x\right]$

where $\beta \geqslant 1$ and $r(x, u) \geqslant 0$ for all $x, u$. Suppose that for a specific policy $\pi$, and all $x$,

$F(\pi, x)=\sup _{u}\left\{r(x, u)+\beta E\left[F\left(\pi, x_{1}\right) \mid x_{0}=x, u_{0}=u\right]\right\} .$

Prove that $F(\pi, x) \geqslant F\left(\pi^{\prime}, x\right)$ for all $\pi^{\prime}$ and $x$.

A gambler plays games in which he may bet 1 or 2 pounds, but no more than his present wealth. Suppose he has $x_{t}$ pounds after $t$ games. If he bets $i$ pounds then $x_{t+1}=x_{t}+i$, or $x_{t+1}=x_{t}-i$, with probabilities $p_{i}$ and $1-p_{i}$ respectively. Gambling terminates at the first $\tau$ such that $x_{\tau}=0$ or $x_{\tau}=100$. His final reward is $(9 / 8)^{\tau / 2} x_{\tau}$. Let $\pi$ be the policy of always betting 1 pound. Given $p_{1}=1 / 3$, show that $F(\pi, x) \propto x 2^{x / 2}$.

Is $\pi$ optimal when $p_{2}=1 / 4$ ?
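A numerical check of the claimed form (the recursion below is the one satisfied by $F(\pi, \cdot)$, since one further game multiplies the discount by $(9/8)^{1/2}$ and moves the wealth by $\pm 1$ with probabilities $1/3$ and $2/3$):

```python
# Verify that f(x) = x * 2^(x/2) satisfies
# f(x) = (9/8)^(1/2) * ((1/3) f(x+1) + (2/3) f(x-1)),  with f(0) = 0.
f = lambda x: x * 2 ** (x / 2)
rhs = lambda x: (9 / 8) ** 0.5 * ((1 / 3) * f(x + 1) + (2 / 3) * f(x - 1))
for x in range(1, 100):
    assert abs(f(x) - rhs(x)) <= 1e-9 * f(x)
```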

• # Paper 3, Section II, K

A burglar having wealth $x$ may retire, or go burgling another night, in either of towns 1 or 2 . If he burgles in town $i$ then with probability $p_{i}=1-q_{i}$ he will, independently of previous nights, be caught, imprisoned and lose all his wealth. If he is not caught then his wealth increases by 0 or $2 a_{i}$, each with probability $1 / 2$ and independently of what happens on other nights. Values of $p_{i}$ and $a_{i}$ are the same every night. He wishes to maximize his expected wealth at the point he retires, is imprisoned, or $s$ nights have elapsed.

Using the dynamic programming equation

$F_{s}(x)=\max \left\{x, q_{1} E F_{s-1}\left(x+R_{1}\right), q_{2} E F_{s-1}\left(x+R_{2}\right)\right\}$

with $R_{j}, F_{0}(x)$ appropriately defined, prove that there exists an optimal policy under which he burgles another night if and only if his wealth is less than $x^{*}=\max _{i}\left\{a_{i} q_{i} / p_{i}\right\}$.

Suppose $q_{1}>q_{2}$ and $q_{1} a_{1}>q_{2} a_{2}$. Prove that he should never burgle in town 2 .

[Hint: Suppose $x<x^{*}$, there are $s$ nights to go, and it has been shown that he ought not burgle in town 2 if less than $s$ nights remain. For the case $a_{2}>a_{1}$, separately consider subcases $x+2 a_{2} \geqslant x^{*}$ and $x+2 a_{2}<x^{*}$. An interchange argument may help.]

• # Paper 4, Section II, 25K

Consider the scalar system evolving as

$x_{t}=x_{t-1}+u_{t-1}+\epsilon_{t}, \quad t=1,2, \ldots,$

where $\left\{\epsilon_{t}\right\}_{t=1}^{\infty}$ is a white noise sequence with $E \epsilon_{t}=0$ and $E \epsilon_{t}^{2}=v$. It is desired to choose controls $\left\{u_{t}\right\}_{t=0}^{h-1}$ to minimize $E\left[\sum_{t=0}^{h-1}\left(\frac{1}{2} x_{t}^{2}+u_{t}^{2}\right)+x_{h}^{2}\right]$. Show that for $h=6$ the minimal cost is $x_{0}^{2}+6 v$.
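A sketch of the backward recursion that yields this, in exact arithmetic: with $F_{t}(x)=P_{t} x^{2}+g_{t} v$, minimising $u^{2}+P(x+u)^{2}$ over $u$ gives value $P x^{2}/(1+P)$, so $P_{t}=\frac{1}{2}+P_{t+1}/(1+P_{t+1})$ and $g_{t}=g_{t+1}+P_{t+1}$; the recursion fixes $P_{t} \equiv 1$, and six steps give $g_{0}=6$.

```python
from fractions import Fraction as Fr

P, g = Fr(1), Fr(0)                # terminal values: P_6 = 1, g_6 = 0
for t in range(5, -1, -1):
    g = g + P                      # coefficient of v accumulates P_{t+1}
    P = Fr(1, 2) + P / (1 + P)     # P_t = 1/2 + P_{t+1} / (1 + P_{t+1})
assert P == 1 and g == 6           # minimal cost = x_0^2 + 6 v
```

The same calculation suggests $\phi(x)=x^{2}$ and $\lambda=v$ for the average-cost equation in the next part.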

Find a constant $\lambda$ and a function $\phi$ which solve

$\phi(x)+\lambda=\min _{u}\left[\frac{1}{2} x^{2}+u^{2}+E \phi\left(x+u+\epsilon_{1}\right)\right]$

Let $P$ be the class of those policies for which every $u_{t}$ obeys the constraint $\left(x_{t}+u_{t}\right)^{2} \leqslant(0.9) x_{t}^{2}$. Show that $E_{\pi} \phi\left(x_{t}\right) \leqslant x_{0}^{2}+10 v$, for all $\pi \in P$. Find, and prove optimal, a policy which over all $\pi \in P$ minimizes

$\lim _{h \rightarrow \infty} \frac{1}{h} E_{\pi}\left[\sum_{t=0}^{h-1}\left(\frac{1}{2} x_{t}^{2}+u_{t}^{2}\right)\right]$


• # Paper 2, Section II, J

Describe the elements of a discrete-time stochastic dynamic programming equation for the problem of maximizing the expected sum of non-negative rewards over an infinite horizon. Give an example to show that there may not exist an optimal policy. Prove that if a policy has a value function that satisfies the dynamic programming equation then the policy is optimal.

A squirrel collects nuts for the coming winter. There are plenty of nuts lying around, but each time the squirrel leaves its lair it risks being caught by a predator. Assume that the outcomes of the squirrel's journeys are independent, that it is caught with probability $p$, and that it returns safely with a random weight of nuts, exponentially distributed with parameter $\lambda$. By solving the dynamic programming equation for the value function $F(x)$, find a policy maximizing the expected weight of nuts collected for the winter. Here the state variable $x$ takes values in $\mathbb{R}_{+}$(the weight of nuts so far collected) or $-1$ (a no-return state when the squirrel is caught).

• # Paper 3, Section II, J

A particle follows a discrete-time trajectory on $\mathbb{R}$ given by

$x_{t+1}=A x_{t}+\xi_{t} u_{t}+\epsilon_{t}$

for $t=1,2, \ldots, T$, where $T \geqslant 2$ is a fixed integer, $A$ is a real constant, $x_{t}$ is the position of the particle and $u_{t}$ is the control action at time $t$, and $\left(\xi_{t}, \epsilon_{t}\right)_{t=1}^{T}$ is a sequence of independent random vectors with $\mathbb{E} \xi_{t}=\mathbb{E} \epsilon_{t}=0, \operatorname{var}\left(\xi_{t}\right)=V_{\xi}>0, \operatorname{var}\left(\epsilon_{t}\right)=V_{\epsilon}>0$ and $\operatorname{cov}\left(\xi_{t}, \epsilon_{t}\right)=0$.

Find the closed-loop control, i.e. the control action $u_{t}$ defined as a function of $\left(x_{1}, \ldots, x_{t} ; u_{1}, \ldots, u_{t-1}\right)$, that minimizes

$\sum_{t=1}^{T} x_{t}^{2}+c \sum_{t=1}^{T-1} u_{t}$

where $c>0$ is given. [Note that this function is quadratic in $x$, but linear in $u$.]

Does the closed-loop control depend on $V_{\epsilon}$ or on $V_{\xi}$ ? Deduce the form of the optimal open-loop control.

• # Paper 4, Section II, J

A girl begins swimming from a point $(0,0)$ on the bank of a straight river. She swims at a constant speed $v$ relative to the water. The speed of the downstream current at a distance $y$ from the shore is $c(y)$. Hence her trajectory is described by

$\dot{x}=v \cos \theta+c(y), \quad \dot{y}=v \sin \theta$

where $\theta$ is the angle at which she swims relative to the direction of the current.

She desires to reach a downstream point $(1,0)$ on the same bank as she starts, as quickly as possible. Construct the Hamiltonian for this problem, and describe how Pontryagin's maximum principle can be used to give necessary conditions that must hold on an optimal trajectory. Given that $c(y)$ is positive, increasing and differentiable in $y$, show that on an optimal trajectory

$\frac{d}{d t} \tan (\theta(t))=-c^{\prime}(y(t))$


• # Paper 2, Section II, K

Suppose $\left\{x_{t}\right\}_{t \geqslant 0}$ is a Markov chain. Consider the dynamic programming equation

$F_{s}(x)=\max \left\{r(x), \beta E\left[F_{s-1}\left(x_{1}\right) \mid x_{0}=x\right]\right\}, \quad s=1,2, \ldots,$

with $r(x)>0, \beta \in(0,1)$, and $F_{0}(x)=0$. Prove that:

(i) $F_{s}(x)$ is nondecreasing in $s$;

(ii) $F_{s}(x) \leqslant F(x)$, where $F(x)$ is the value function of an infinite-horizon problem that you should describe;

(iii) $F_{\infty}(x)=\lim _{s \rightarrow \infty} F_{s}(x)=F(x)$.

A coin lands heads with probability $p$. A statistician wishes to choose between: $H_{0}: p=1 / 3$ and $H_{1}: p=2 / 3$, one of which is true. Prior probabilities of $H_{1}$ and $H_{0}$ in the ratio $x: 1$ change after one toss of the coin to ratio $2 x: 1$ (if the toss was a head) or to ratio $x: 2$ (if the toss was a tail). What problem is being addressed by the following dynamic programming equation?

$F(x)=\max \left\{\frac{1}{1+x}, \frac{x}{1+x}, \beta\left[\left(\frac{1}{1+x} \frac{2}{3}+\frac{x}{1+x} \frac{1}{3}\right) F(x / 2)+\left(\frac{1}{1+x} \frac{1}{3}+\frac{x}{1+x} \frac{2}{3}\right) F(2 x)\right]\right\}$

Prove that $G(x)=(1+x) F(x)$ is a convex function of $x$.

By sketching a graph of $G$, describe the form of the optimal policy.

• # Paper 3, Section II, K

A particle follows a discrete-time trajectory in $\mathbb{R}^{2}$ given by

$\left(\begin{array}{l} x_{t+1} \\ y_{t+1} \end{array}\right)=\left(\begin{array}{ll} 1 & 1 \\ 0 & 1 \end{array}\right)\left(\begin{array}{l} x_{t} \\ y_{t} \end{array}\right)+\left(\begin{array}{c} t \\ 1 \end{array}\right) u_{t}+\left(\begin{array}{c} \epsilon_{t} \\ 0 \end{array}\right)$

where $\left\{\epsilon_{t}\right\}$ is a white noise sequence with $E \epsilon_{t}=0$ and $E \epsilon_{t}^{2}=v$. Given $\left(x_{0}, y_{0}\right)$, we wish to choose $\left\{u_{t}\right\}_{t=0}^{9}$ to minimize $C=E\left[x_{10}^{2}+\sum_{t=0}^{9} u_{t}^{2}\right]$.

Show that for some $\left\{a_{t}\right\}$ this problem can be reduced to one of controlling a scalar state $\xi_{t}=x_{t}+a_{t} y_{t}$.

Find, in terms of $x_{0}, y_{0}$, the optimal $u_{0}$. What is the change in minimum $C$ achievable when the system starts in $\left(x_{0}, y_{0}\right)$ as compared to when it starts in $(0,0)$ ?

Consider now a trajectory starting at $\left(x_{-1}, y_{-1}\right)=(11,-1)$. What value of $u_{-1}$ is optimal if we wish to minimize $5 u_{-1}^{2}+C$ ?

• # Paper 4, Section II, K

Given $r, \rho, \mu, T$, all positive, it is desired to choose $u(t)>0$ to maximize

$\mu x(T)+\int_{0}^{T} e^{-\rho t} \log u(t) d t$

subject to $\dot{x}(t)=r x(t)-u(t), x(0)=10$.

Explain what Pontryagin's maximum principle guarantees about a solution to this problem.

Show that no matter whether $x(T)$ is constrained or unconstrained there is a constant $\alpha$ such that the optimal control is of the form $u(t)=\alpha e^{-(\rho-r) t}$. Find an expression for $\alpha$ under the constraint $x(T)=5$.

Show that if $x(T)$ is unconstrained then $\alpha=(1 / \mu) e^{-r T}$.


• # Paper 2, Section II, J

Describe the elements of a generic stochastic dynamic programming equation for the problem of maximizing the expected sum of discounted rewards accrued at times $0,1, \ldots$ What is meant by the *positive case*? What is specially true in this case that is not true in general?

An investor owns a single asset which he may sell once, on any of the days $t=0,1, \ldots$. On day $t$ he will be offered a price $X_{t}$. This value is unknown until day $t$, is independent of all other offers, and a priori it is uniformly distributed on $[0,1]$. Offers remain open, so that on day $t$ he may sell the asset for the best of the offers made on days $0, \ldots, t$. If he sells for $x$ on day $t$ then the reward is $x \beta^{t}$. Show from first principles that if $0<\beta<1$ then there exists $\bar{x}$ such that the expected reward is maximized by selling the first day the offer is at least $\bar{x}$.

For $\beta=4 / 5$, find both $\bar{x}$ and the expected reward under the optimal policy.
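A grid value-iteration sketch for $\beta=4/5$ (grid size and iteration count are arbitrary choices); it recovers $\bar{x}=1/2$, the root in $(0,1)$ of $\bar{x}=\beta \mathbb{E} \max (\bar{x}, U)=\beta\left(1+\bar{x}^{2}\right)/2$, and expected reward $5/8$.

```python
beta, n = 0.8, 1000
xs = [(i + 0.5) / n for i in range(n)]    # midpoint grid for U ~ Uniform[0,1]
F = [0.0] * n
for _ in range(300):
    suf = [0.0] * (n + 1)                 # suffix sums of F
    for j in range(n - 1, -1, -1):
        suf[j] = suf[j + 1] + F[j]
    # E[F(max(x_i, U))] ~ (i * F[i] + sum_{j >= i} F[j]) / n
    F = [max(x, beta * (i * F[i] + suf[i]) / n) for i, x in enumerate(xs)]

x_bar = next(x for i, x in enumerate(xs) if F[i] <= x + 1e-9)  # sell iff offer >= x_bar
expected_reward = sum(F) / n                                   # E[F(X_0)]
```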

Explain what is special about the case $\beta=1$.

• # Paper 3, Section II, J

A state variable $x=\left(x_{1}, x_{2}\right) \in \mathbb{R}^{2}$ is subject to dynamics

\begin{aligned} &\dot{x}_{1}(t)=x_{2}(t) \\ &\dot{x}_{2}(t)=u(t) \end{aligned}

where $u=u(t)$ is a scalar control variable constrained to the interval $[-1,1]$. Given an initial value $x(0)=\left(x_{1}, x_{2}\right)$, let $F\left(x_{1}, x_{2}\right)$ denote the minimal time required to bring the state to $(0,0)$. Prove that

$\max _{u \in[-1,1]}\left\{-x_{2} \frac{\partial F}{\partial x_{1}}-u \frac{\partial F}{\partial x_{2}}-1\right\}=0$

Explain how this equation figures in Pontryagin's maximum principle.

Use Pontryagin's maximum principle to show that, on an optimal trajectory, $u(t)$ only takes the values 1 and $-1$, and that it makes at most one switch between them.

Show that $u(t)=1,0 \leqslant t \leqslant 2$ is optimal when $x(0)=(2,-2)$.

Find the optimal control when $x(0)=(7,-2)$.

• # Paper 4, Section II, J

A factory has a tank of capacity $3 \mathrm{~m}^{3}$ in which it stores chemical waste. Each week the factory produces, independently of other weeks, an amount of waste that is equally likely to be 0,1 , or $2 \mathrm{~m}^{3}$. If the amount of waste exceeds the remaining space in the tank then the excess must be specially handled at a cost of $£ C$ per $\mathrm{m}^{3}$. The tank may be emptied or not at the end of each week. Emptying costs $£ D$, plus a variable cost of $£ \alpha$ for each $\mathrm{m}^{3}$ of its content. It is always emptied when it ends the week full.

It is desired to minimize the average cost per week. Write down equations from which one can determine when it is optimal to empty the tank.

Find the average cost per week of a policy $\pi$, which empties the tank if and only if its content at the end of the week is 2 or $3 \mathrm{~m}^{3}$.

Describe the policy improvement algorithm. Explain why, starting from $\pi$, this algorithm will find an optimal policy in at most three iterations.

Prove that $\pi$ is optimal if and only if $C \geqslant \alpha+(4 / 3) D$.


• # Paper 2, Section II, K

Consider an optimal stopping problem in which the optimality equation takes the form

$F_{t}(x)=\max \left\{r(x), E\left[F_{t+1}\left(x_{t+1}\right)\right]\right\}, \quad t=1, \ldots, N-1,$

$F_{N}(x)=r(x)$, and where $r(x)>0$ for all $x$. Let $S$ denote the stopping set of the one-step-look-ahead rule. Show that if $S$ is closed (in a sense you should explain) then the one-step-look-ahead rule is optimal.

$N$ biased coins are to be tossed successively. The probability that the $i$ th coin toss will show a head is known to be $p_{i}\left(0<p_{i}<1\right)$. At most once, after observing a head, and before tossing the next coin, you may guess that you have just seen the last head (i.e. that all subsequent tosses will show tails). If your guess turns out to be correct then you win $£ 1$.

Suppose that you have not yet guessed 'last head', and the $i$ th toss is a head. Show that it cannot be optimal to guess that this is the last head if

$\frac{p_{i+1}}{q_{i+1}}+\cdots+\frac{p_{N}}{q_{N}}>1$

where $q_{j}=1-p_{j}$.

Suppose that $p_{i}=1 / i$. Show that it is optimal to guess that the last head is the first head (if any) to occur after having tossed at least $i^{*}$ coins, where $i^{*} \approx N / e$ when $N$ is large.
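A quick check of the asymptotics: with $p_{i}=1/i$ we have $p_{j}/q_{j}=1/(j-1)$, so the rule guesses at the first head occurring at a toss $i$ with $\sum_{j=i+1}^{N} 1/(j-1) \leqslant 1$, i.e. $i \gtrsim N/e$.

```python
import math

N = 10_000
tail, i_star = 0.0, None
for i in range(N - 1, 0, -1):    # S(i) = sum_{j=i+1}^{N} p_j/q_j = S(i+1) + 1/i
    tail += 1.0 / i
    if tail > 1.0:
        i_star = i + 1           # smallest i with S(i) <= 1
        break
assert abs(i_star / N - 1 / math.e) < 1e-3    # i* is close to N/e
```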

• # Paper 3, Section II, 28K

An observable scalar state variable evolves as $x_{t+1}=x_{t}+u_{t}, t=0,1, \ldots$ Let controls $u_{0}, u_{1}, \ldots$ be determined by a policy $\pi$ and define

$C_{s}\left(\pi, x_{0}\right)=\sum_{t=0}^{s-1}\left(x_{t}^{2}+2 x_{t} u_{t}+7 u_{t}^{2}\right) \quad \text { and } \quad C_{s}\left(x_{0}\right)=\inf _{\pi} C_{s}\left(\pi, x_{0}\right)$

Show that it is possible to express $C_{s}\left(x_{0}\right)$ in terms of $\Pi_{s}$, which satisfies the recurrence

$\Pi_{s}=\frac{6\left(1+\Pi_{s-1}\right)}{7+\Pi_{s-1}}, \quad s=1,2, \ldots$

with $\Pi_{0}=0$.

Deduce that $C_{\infty}\left(x_{0}\right) \geqslant 2 x_{0}^{2}$. [Here $C_{\infty}\left(x_{0}\right)$ is defined as $\lim _{s \rightarrow \infty} C_{s}\left(x_{0}\right)$.]

By considering the policy $\pi^{*}$ which takes $u_{t}=-(1 / 3)(2 / 3)^{t} x_{0}, t=0,1, \ldots$, show that $C_{\infty}\left(x_{0}\right)=2 x_{0}^{2}$.

Give an alternative description of $\pi^{*}$ in closed-loop form.
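The recurrence can be checked numerically: the fixed point of $\Pi=6(1+\Pi)/(7+\Pi)$ satisfies $\Pi^{2}+\Pi-6=0$, whose positive root is $\Pi=2$, consistent with $C_{\infty}\left(x_{0}\right)=2 x_{0}^{2}$.

```python
# Iterate Pi_s = 6 (1 + Pi_{s-1}) / (7 + Pi_{s-1}) from Pi_0 = 0.
Pi = 0.0
for s in range(1, 200):
    new = 6 * (1 + Pi) / (7 + Pi)
    assert new >= Pi              # sequence increases towards the fixed point
    Pi = new
assert abs(Pi - 2.0) < 1e-9       # positive root of Pi^2 + Pi - 6 = 0
```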

• # Paper 4, Section II, K

Describe the type of optimal control problem that is amenable to analysis using Pontryagin's Maximum Principle.

A firm has the right to extract oil from a well over the interval $[0, T]$. The oil can be sold at price $£ p$ per unit. To extract oil at rate $u$ when the remaining quantity of oil in the well is $x$ incurs cost at rate $£ u^{2} / x$. Thus the problem is one of maximizing

$\int_{0}^{T}\left[p u(t)-\frac{u(t)^{2}}{x(t)}\right] d t$

subject to $d x(t) / d t=-u(t), u(t) \geqslant 0, x(t) \geqslant 0$. Formulate the Hamiltonian for this problem.

Explain why $\lambda(t)$, the adjoint variable, has a boundary condition $\lambda(T)=0$.

Use Pontryagin's Maximum Principle to show that under optimal control

$\lambda(t)=p-\frac{1}{1 / p+(T-t) / 4}$

and

$\frac{d x(t)}{d t}=-\frac{2 p x(t)}{4+p(T-t)}$

Find the oil remaining in the well at time $T$, as a function of $x(0), p$, and $T$.
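Separating variables in the last displayed equation gives $x(T)=x(0)\left(\frac{4}{4+p T}\right)^{2}$; a quick RK4 check (the values of $p, T, x(0)$ are illustrative):

```python
p, T, x0, n = 2.0, 3.0, 10.0, 10_000
h = T / n

def f(t, x):
    # right-hand side of dx/dt = -2 p x / (4 + p (T - t))
    return -2.0 * p * x / (4.0 + p * (T - t))

x, t = x0, 0.0
for _ in range(n):                      # classical 4th-order Runge-Kutta
    k1 = f(t, x)
    k2 = f(t + h / 2, x + h * k1 / 2)
    k3 = f(t + h / 2, x + h * k2 / 2)
    k4 = f(t + h, x + h * k3)
    x += h * (k1 + 2 * k2 + 2 * k3 + k4) / 6
    t += h

closed_form = x0 * (4.0 / (4.0 + p * T)) ** 2
assert abs(x - closed_form) < 1e-6      # numerical and analytic answers agree
```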


• # Paper 2, Section II, J

(a) Suppose that

$\left(\begin{array}{l} X \\ Y \end{array}\right) \sim N\left(\left(\begin{array}{l} \mu_{X} \\ \mu_{Y} \end{array}\right),\left(\begin{array}{ll} V_{X X} & V_{X Y} \\ V_{Y X} & V_{Y Y} \end{array}\right)\right)$

Prove that conditional on $Y=y$, the distribution of $X$ is again multivariate normal, with mean $\mu_{X}+V_{X Y} V_{Y Y}^{-1}\left(y-\mu_{Y}\right)$ and covariance $V_{X X}-V_{X Y} V_{Y Y}^{-1} V_{Y X}$.

(b) The $\mathbb{R}^{d}$-valued process $X$ evolves in discrete time according to the dynamics

$X_{t+1}=A X_{t}+\varepsilon_{t+1}$

where $A$ is a constant $d \times d$ matrix, and $\varepsilon_{t}$ are independent, with common $N\left(0, \Sigma_{\varepsilon}\right)$ distribution. The process $X$ is not observed directly; instead, all that is seen is the process $Y$ defined as

$Y_{t}=C X_{t}+\eta_{t},$

where $\eta_{t}$ are independent of each other and of the $\varepsilon_{t}$, with common $N\left(0, \Sigma_{\eta}\right)$ distribution.

If the observer has the prior distribution $X_{0} \sim N\left(\hat{X}_{0}, V_{0}\right)$ for $X_{0}$, prove that at all later times the distribution of $X_{t}$ conditional on $\mathcal{Y}_{t} \equiv\left(Y_{1}, \ldots, Y_{t}\right)$ is again normally distributed, with mean $\hat{X}_{t}$ and covariance $V_{t}$ which evolve as

\begin{aligned} \hat{X}_{t+1} &=A \hat{X}_{t}+M_{t} C^{T}\left(\Sigma_{\eta}+C M_{t} C^{T}\right)^{-1}\left(Y_{t+1}-C A \hat{X}_{t}\right) \\ V_{t+1} &=M_{t}-M_{t} C^{T}\left(\Sigma_{\eta}+C M_{t} C^{T}\right)^{-1} C M_{t} \end{aligned}

where

$M_{t}=A V_{t} A^{T}+\Sigma_{\varepsilon}$

(c) In the special case where both $X$ and $Y$ are one-dimensional, and $A=C=1$, $\Sigma_{\varepsilon}=0$, find the form of the updating recursion. Show in particular that

$\frac{1}{V_{t+1}}=\frac{1}{V_{t}}+\frac{1}{\Sigma_{\eta}}$

and that

$\frac{\hat{X}_{t+1}}{V_{t+1}}=\frac{\hat{X}_{t}}{V_{t}}+\frac{Y_{t+1}}{\Sigma_{\eta}}$

Hence deduce that, with probability one,

$\lim _{t \rightarrow \infty} \hat{X}_{t}=\lim _{t \rightarrow \infty} t^{-1} \sum_{j=1}^{t} Y_{j}$
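A simulation sketch of part (c) (parameter values are illustrative): the recursions reduce to $1/V_{t+1}=1/V_{t}+1/\Sigma_{\eta}$ and $\hat{X}_{t+1}/V_{t+1}=\hat{X}_{t}/V_{t}+Y_{t+1}/\Sigma_{\eta}$, so $\hat{X}_{t}$ approaches the running mean of the observations.

```python
import random

random.seed(1)
Sigma_eta, X, t_max = 2.0, 1.7, 20_000    # illustrative values
Xhat, V = 0.0, 100.0                      # prior X_0 ~ N(0, 100)
ys = []
for _ in range(t_max):
    y = X + random.gauss(0.0, Sigma_eta ** 0.5)   # Y_t = X + eta_t (A = C = 1)
    ys.append(y)
    V_new = 1.0 / (1.0 / V + 1.0 / Sigma_eta)     # precisions add
    Xhat = V_new * (Xhat / V + y / Sigma_eta)     # precision-weighted update
    V = V_new
assert abs(Xhat - sum(ys) / len(ys)) < 1e-3       # close to the sample mean
```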

• # Paper 3, Section II, J

Consider an infinite-horizon controlled Markov process having per-period costs $c(x, u) \geqslant 0$, where $x \in \mathcal{X}$ is the state of the system, and $u \in \mathcal{U}$ is the control. Costs are discounted at rate $\beta \in(0,1]$, so that the objective to be minimized is

$\mathbb{E}\left[\sum_{t \geqslant 0} \beta^{t} c\left(X_{t}, u_{t}\right) \mid X_{0}=x\right]$

What is meant by a policy $\pi$ for this problem?

Let $\mathcal{L}$ denote the dynamic programming operator

$\mathcal{L} f(x) \equiv \inf _{u \in \mathcal{U}}\left\{c(x, u)+\beta \mathbb{E}\left[f\left(X_{1}\right) \mid X_{0}=x, u_{0}=u\right]\right\}$

Further, let $F$ denote the value of the optimal control problem:

$F(x)=\inf _{\pi} \mathbb{E}^{\pi}\left[\sum_{t \geqslant 0} \beta^{t} c\left(X_{t}, u_{t}\right) \mid X_{0}=x\right]$

where the infimum is taken over all policies $\pi$, and $\mathbb{E}^{\pi}$ denotes expectation under policy $\pi$. Show that the functions $F_{t}$ defined by

$F_{t+1}=\mathcal{L} F_{t} \quad(t \geqslant 0), \quad F_{0} \equiv 0$

increase to a limit $F_{\infty} \in[0, \infty]$. Prove that $F_{\infty} \leqslant F$. Prove that $F=\mathcal{L} F$.

Suppose that $\Phi=\mathcal{L} \Phi \geqslant 0$. Prove that $\Phi \geqslant F$.

[You may assume that there is a function $u_{*}: \mathcal{X} \rightarrow \mathcal{U}$ such that

$\mathcal{L} \Phi(x)=c\left(x, u_{*}(x)\right)+\beta \mathbb{E}\left[\Phi\left(X_{1}\right) \mid X_{0}=x, u_{0}=u_{*}(x)\right]$

though the result remains true without this simplifying assumption.]

• # Paper 4, Section II, J

Dr Seuss' wealth $x_{t}$ at time $t$ evolves as

$\frac{d x}{d t}=r x_{t}+\ell_{t}-c_{t}$

where $r>0$ is the rate of interest earned, $\ell_{t}$ is his intensity of working $(0 \leqslant \ell \leqslant 1)$, and $c_{t}$ is his rate of consumption. His initial wealth $x_{0}>0$ is given, and his objective is to maximize

$\int_{0}^{T} U\left(c_{t}, \ell_{t}\right) d t$

where $U(c, \ell)=c^{\alpha}(1-\ell)^{\beta}$, and $T$ is the (fixed) time his contract expires. The constants $\alpha$ and $\beta$ satisfy the inequalities $0<\alpha<1,0<\beta<1$, and $\alpha+\beta>1$. At all times, $c_{t}$ must be non-negative, and his final wealth $x_{T}$ must be non-negative. Establish the following properties of the optimal solution $\left(x^{*}, c^{*}, \ell^{*}\right)$ :

(i) $\beta c_{t}^{*}=\alpha\left(1-\ell_{t}^{*}\right)$;

(ii) $c_{t}^{*} \propto e^{-\gamma r t}$, where $\gamma \equiv(\beta-1+\alpha)^{-1}$;

(iii) $x_{t}^{*}=A e^{r t}+B e^{-\gamma r t}-r^{-1}$ for some constants $A$ and $B$.

Hence deduce that the optimal wealth is

$x_{t}^{*}=\frac{\left(1-e^{-\gamma r T}\left(1+r x_{0}\right)\right) e^{r t}+\left(\left(1+r x_{0}\right) e^{r T}-1\right) e^{-\gamma r t}}{r\left(e^{r T}-e^{-\gamma r T}\right)}-\frac{1}{r}$


• # Paper 2, Section II, I

In the context of stochastic dynamic programming, explain what is meant by an average-reward optimal policy.

A player has a fair coin and a six-sided die. At each epoch he may choose either to toss the coin or to roll the die. If he tosses the coin and it shows heads then he adds 1 to his total score, and if it shows tails then he adds 0 . If he rolls the die then he adds the number showing. He wins a reward of $£ 1$ whenever his total score is divisible by 3 .

Suppose the player always tosses the coin. Find his average reward per toss.

Still using the above policy, and given that he starts with a total score of $x$, let $F_{s}(x)$ be the expected total reward over the next $s$ epochs. Find the value of

$\lim _{s \rightarrow \infty}\left[F_{s}(x)-F_{s}(0)\right] .$

Use the policy improvement algorithm to find a policy that produces a greater average reward than the policy of only tossing the coin.

Find the average-reward optimal policy.

• # Paper 3, Section II, I

Two scalar systems have dynamics

$x_{t+1}=x_{t}+u_{t}+\epsilon_{t}, \quad y_{t+1}=y_{t}+w_{t}+\eta_{t},$

where $\left\{\epsilon_{t}\right\}$ and $\left\{\eta_{t}\right\}$ are independent sequences of independent and identically distributed random variables of mean 0 and variance 1 . Let

$F(x)=\inf _{\pi} \mathbb{E}\left[\sum_{t=0}^{\infty}\left(x_{t}^{2}+u_{t}^{2}\right)(2 / 3)^{t} \mid x_{0}=x\right]$

where $\pi$ is a policy in which $u_{t}$ depends on only $x_{0}, \ldots, x_{t}$.

Show that $G(x)=P x^{2}+d$ is a solution to the optimality equation satisfied by $F(x)$, for some $P$ and $d$ which you should find.

Find the optimal controls.

State a theorem that justifies $F(x)=G(x)$.

For each of the two cases (a) $\lambda=0$ and (b) $\lambda=1$, find controls $\left\{u_{t}, w_{t}\right\}$ which minimize

$\mathbb{E}\left[\sum_{t=0}^{\infty}\left(x_{t}^{2}+2 \lambda x_{t} y_{t}+y_{t}^{2}+u_{t}^{2}+w_{t}^{2}\right)(2 / 3+\lambda / 12)^{t} \mid x_{0}=x, y_{0}=y\right]$

• # Paper 4, Section II, I

Explain how transversality conditions can be helpful when employing Pontryagin's Maximum Principle to solve an optimal control problem.

A particle in $\mathbb{R}^{2}$ starts at $(0,0.5)$ and follows the dynamics

$\dot{x}=u \sqrt{|y|}, \quad \dot{y}=v \sqrt{|y|}, \quad t \in[0, T],$

where controls $u(t)$ and $v(t)$ are to be chosen subject to $u^{2}(t)+v^{2}(t)=1$.

Using Pontryagin's maximum principle do the following:

(a) Find controls that minimize $-y(1)$;

(b) Suppose we wish to choose $T$ and the controls $u, v$ to minimize $-y(T)+T$ under a constraint $(x(T), y(T))=(1,1)$. By expressing both $d y / d x$ and $d^{2} y / d x^{2}$ in terms of the adjoint variables, show that on an optimal trajectory,

$1+\left(\frac{d y}{d x}\right)^{2}+2 y \frac{d^{2} y}{d x^{2}}=0$


• # 2.II.29I

Consider a stochastic controllable dynamical system $P$ with action-space $A$ and countable state-space $S$. Thus $P=\left(p_{x y}(a): x, y \in S, a \in A\right)$ and $p_{x y}(a)$ denotes the transition probability from $x$ to $y$ when taking action $a$. Suppose that a cost $c(x, a)$ is incurred each time that action $a$ is taken in state $x$, and that this cost is uniformly bounded. Write down the dynamic optimality equation for the problem of minimizing the expected long-run average cost.

State in terms of this equation a general result, which can be used to identify an optimal control and the minimal long-run average cost.

A particle moves randomly on the integers, taking steps of size 1 . Suppose we can choose at each step a control parameter $u \in[\alpha, 1-\alpha]$, where $\alpha \in(0,1 / 2)$ is fixed, which has the effect that the particle moves in the positive direction with probability $u$ and in the negative direction with probability $1-u$. It is desired to maximize the long-run proportion of time $\pi$ spent by the particle at 0 . Show that there is a solution to the optimality equation for this example in which the relative cost function takes the form $\theta(x)=\mu|x|$, for some constant $\mu$.

Determine an optimal control and show that the maximal long-run proportion of time spent at 0 is given by

$\pi=\frac{1-2 \alpha}{2(1-\alpha)} .$

You may assume that it is valid to use an unbounded function $\theta$ in the optimality equation in this example.
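A numerical sketch (the truncation level and the tie-breaking control at 0 are arbitrary choices): under the drift-towards-0 control ($u=\alpha$ for $x \geqslant 0$, $u=1-\alpha$ for $x<0$), the stationary probability of state 0 matches $(1-2 \alpha) /(2(1-\alpha))$.

```python
# Power-iterate the truncated chain on {-K, ..., K}; boundary self-loops make
# it aperiodic and the geometric tails make the truncation error negligible.
alpha, K, steps = 0.3, 60, 5000
states = list(range(-K, K + 1))
pi = {x: 1.0 / len(states) for x in states}
for _ in range(steps):
    new = {x: 0.0 for x in states}
    for x, mass in pi.items():
        up = alpha if x >= 0 else 1.0 - alpha   # P(move +1); at 0 take u = alpha
        for y, q in ((min(x + 1, K), up), (max(x - 1, -K), 1.0 - up)):
            new[y] += mass * q
    pi = new

target = (1 - 2 * alpha) / (2 * (1 - alpha))    # claimed proportion at 0
assert abs(pi[0] - target) < 1e-3
```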

• # 3.II.28I

Let $Q$ be a positive-definite symmetric $m \times m$ matrix. Show that a non-negative quadratic form on $\mathbb{R}^{d} \times \mathbb{R}^{m}$ of the form

$c(x, a)=x^{T} R x+x^{T} S^{T} a+a^{T} S x+a^{T} Q a, \quad x \in \mathbb{R}^{d}, \quad a \in \mathbb{R}^{m}$

is minimized over $a$, for each $x$, with value $x^{T}\left(R-S^{T} Q^{-1} S\right) x$, by taking $a=K x$, where $K=-Q^{-1} S$.

Consider for $k \leqslant n$ the controllable stochastic linear system in $\mathbb{R}^{d}$

$X_{j+1}=A X_{j}+B U_{j}+\varepsilon_{j+1}, \quad j=k, k+1, \ldots, n-1,$

starting from $X_{k}=x$ at time $k$, where the control variables $U_{j}$ take values in $\mathbb{R}^{m}$, and where $\varepsilon_{k+1}, \ldots, \varepsilon_{n}$ are independent, zero-mean random variables, with $\operatorname{var}\left(\varepsilon_{j}\right)=N_{j}$. Here, $A, B$ and $N_{j}$ are, respectively, $d \times d, d \times m$ and $d \times d$ matrices. Assume that a cost $c\left(X_{j}, U_{j}\right)$ is incurred at each time $j=k, \ldots, n-1$ and that a final cost $C\left(X_{n}\right)=X_{n}^{T} \Pi_{0} X_{n}$ is incurred at time $n$. Here, $\Pi_{0}$ is a given non-negative-definite symmetric matrix. It is desired to minimize, over the set of all controls $u$, the total expected cost $V^{u}(k, x)$. Write down the optimality equation for the infimal cost function $V(k, x)$.

Hence, show that $V(k, x)$ has the form

$V(k, x)=x^{T} \Pi_{n-k} x+\gamma_{k}$

for some non-negative-definite symmetric matrix $\Pi_{n-k}$ and some real constant $\gamma_{k}$. Show how to compute the matrix $\Pi_{n-k}$ and constant $\gamma_{k}$ and how to determine an optimal control.

• # 4.II.29I

State Pontryagin's maximum principle for the controllable dynamical system with state-space $\mathbb{R}^{+}$, given by

$\dot{x}_{t}=b\left(t, x_{t}, u_{t}\right), \quad t \geqslant 0,$

where the running costs are given by $c\left(t, x_{t}, u_{t}\right)$, up to an unconstrained terminal time $\tau$ when the state first reaches 0 , and there is a terminal cost $C(\tau)$.

A company pays a variable price $p(t)$ per unit time for electrical power, agreed in advance, which depends on the time of day. The company takes on a job at time $t=0$, which requires a total amount $E$ of electrical energy, but can be processed at a variable level of power consumption $u(t) \in[0,1]$. If the job is completed by time $\tau$, then the company will receive a reward $R(\tau)$. Thus, it is desired to minimize

$\int_{0}^{\tau} u(t) p(t) d t-R(\tau)$

subject to

$\int_{0}^{\tau} u(t) d t=E, \quad u(t) \in[0,1],$