Optimization And Control
Paper 2, Section II, K
During each of time periods a venture capitalist, Vicky, is presented with an investment opportunity for which the rate of return for that period is a random variable; the rates of return in successive periods are independent identically distributed random variables with distributions concentrated on . Thus, if is Vicky's capital at period , then , where is the proportion of her capital she chooses to invest at period , and is the rate of return for period . Vicky desires to maximize her expected yield over periods, where the yield is defined as , and and are respectively her initial and final capital.
(a) Express the problem of finding an optimal policy in a dynamic programming framework.
(b) Show that in each time period, the optimal strategy can be expressed in terms of the quantity which solves the optimization problem . Show that if . [Do not calculate explicitly.]
(c) Compare her optimal policy with the policy which maximizes her expected final capital .
Paper 3, Section II, K
A particle follows a discrete-time trajectory on given by
for . Here is a fixed integer, is a real constant, and are the position of the particle and control action at time , respectively, and is a sequence of independent random vectors with
Find the optimal control, i.e. the control action , defined as a function of , that minimizes
where is given.
On which of and does the optimal control depend?
Find the limiting form of the optimal control as , and the minimal average cost per unit time.
Paper 4, Section II,
A file of gigabytes (GB) is to be transmitted over a communications link. At each time the sender can choose a transmission rate within the range GB per second. The charge for transmitting at rate at time is . The function is fully known at time . If it takes a total time to transmit the file then there is a delay cost of . Thus and are to be chosen to minimize
where and . Using Pontryagin's maximum principle, or otherwise, show that a property of the optimal policy is that there exists such that if and if .
Show that the optimal and are related by .
Suppose and . Show that it is optimal to transmit at a constant rate between times , where is the unique positive solution to the equation
Paper 2, Section II, K
Consider a Markov decision problem with finite state space , value function and dynamic programming equation , where
Suppose , and for all . Prove there exists a deterministic stationary Markov policy that is optimal, explaining what the italicised words mean.
Let , where , and . Prove that
Deduce that the value iteration algorithm converges to an optimal policy in a finite number of iterations.
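As a concrete illustration of the last part, the following Python sketch runs value iteration on a small finite-state, finite-action Markov decision problem with hypothetical (randomly generated) costs and transition probabilities, and reads off the greedy deterministic stationary Markov policy from the converged values.

```python
import numpy as np

# Minimal sketch of value iteration for a finite MDP with hypothetical data
# (the problem's own costs and transition law are not reproduced here).
# We minimise expected discounted cost with discount factor beta.
n_states, n_actions, beta = 3, 2, 0.9
rng = np.random.default_rng(0)
cost = rng.uniform(0, 1, size=(n_states, n_actions))               # c(x, a)
P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))   # P(y | x, a)

F = np.zeros(n_states)            # F_0 = 0
for s in range(1000):             # value iteration: F_{s+1} = L F_s
    Q = cost + beta * P @ F       # Q[x, a] = c(x, a) + beta * sum_y P(y|x,a) F(y)
    F_new = Q.min(axis=1)
    if np.max(np.abs(F_new - F)) < 1e-12:
        break
    F = F_new

policy = Q.argmin(axis=1)         # greedy deterministic stationary Markov policy
print("values:", F_new, "policy:", policy)
```

Since a finite problem has only finitely many deterministic stationary Markov policies, the greedy policy produced by the iteration is eventually optimal, which echoes the finite-convergence claim above.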
Paper 3, Section II, K
Consider the system in scalar variables, for :
where is given, are observed at , but and are unobservable, and are independent random variables with mean 0 and variance . Define to be the estimator of with minimum variance amongst all estimators that are unbiased and linear functions of . Suppose and its variance is . After observation at of , a new unbiased estimator of , linear in , is expressed
Find and to minimize the variance of . Hence find in terms of , and . Calculate .
Suppose are Gaussian and thus . Consider minimizing , under the constraint that the control can only depend on . Show that the value function of dynamic programming for this problem can be expressed
where and is independent of and linear in .
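The first part asks for the minimum-variance unbiased linear combination of the current estimator and the new observation; for two independent unbiased estimators this is the usual inverse-variance weighting. A minimal sketch with hypothetical variances:

```python
import numpy as np

# Combine two independent unbiased estimators of the same quantity by a
# linear rule a*est1 + b*est2 with a + b = 1 (so the result stays unbiased),
# choosing a to minimise the variance.  Variances below are illustrative.
def combine(est1, var1, est2, var2):
    a = var2 / (var1 + var2)               # optimal weight on est1
    est = a * est1 + (1 - a) * est2
    var = var1 * var2 / (var1 + var2)      # = (1/var1 + 1/var2)^(-1)
    return est, var

rng = np.random.default_rng(1)
theta = 2.0                                 # true value, used only to simulate data
x1 = theta + rng.normal(0, np.sqrt(4.0))    # unbiased, variance 4
x2 = theta + rng.normal(0, np.sqrt(1.0))    # unbiased, variance 1
print(combine(x1, 4.0, x2, 1.0))            # combined variance 4*1/(4+1) = 0.8
```

When the second estimator is built from a new observation, as in the question, this combination is essentially the measurement-update step of a Kalman filter.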
Paper 4, Section II, K
State transversality conditions that can be used with Pontryagin's maximum principle and say when they are helpful.
Given , it is desired to maximize , where
and is a time-varying control such that and . Suppose that and are positive, and that and . Find the optimal control at times close to . Show that over the optimal control is constant, or makes exactly one switch, the latter happening if and only if
Paper 2, Section II,
As a function of policy and initial state , let
where and for all . Suppose that for a specific policy , and all ,
Prove that for all and .
A gambler plays games in which he may bet 1 or 2 pounds, but no more than his present wealth. Suppose he has pounds after games. If he bets pounds then , or , with probabilities and respectively. Gambling terminates at the first such that or . His final reward is . Let be the policy of always betting 1 pound. Given , show that .
Is optimal when ?
Paper 3, Section II, K
A burglar having wealth may retire, or go burgling another night, in either of towns 1 or 2. If he burgles in town then with probability he will, independently of previous nights, be caught, imprisoned and lose all his wealth. If he is not caught then his wealth increases by 0 or , each with probability and independently of what happens on other nights. Values of and are the same every night. He wishes to maximize his expected wealth at the point he retires, is imprisoned, or nights have elapsed.
Using the dynamic programming equation
with appropriately defined, prove that there exists an optimal policy under which he burgles another night if and only if his wealth is less than .
Suppose and . Prove that he should never burgle in town 2.
[Hint: Suppose , there are nights to go, and it has been shown that he ought not burgle in town 2 if less than nights remain. For the case , separately consider subcases and . An interchange argument may help.]
Paper 4, Section II,
Consider the scalar system evolving as
where is a white noise sequence with and . It is desired to choose controls to minimize . Show that for the minimal cost is .
Find a constant and a function which solve
Let be the class of those policies for which every obeys the constraint . Show that , for all . Find, and prove optimal, a policy which over all minimizes
Paper 2, Section II, J
Describe the elements of a discrete-time stochastic dynamic programming equation for the problem of maximizing the expected sum of non-negative rewards over an infinite horizon. Give an example to show that there may not exist an optimal policy. Prove that if a policy has a value function that satisfies the dynamic programming equation then the policy is optimal.
A squirrel collects nuts for the coming winter. There are plenty of nuts lying around, but each time the squirrel leaves its lair it risks being caught by a predator. Assume that the outcomes of the squirrel's journeys are independent, that it is caught with probability , and that it returns safely with a random weight of nuts, exponentially distributed with parameter . By solving the dynamic programming equation for the value function , find a policy maximizing the expected weight of nuts collected for the winter. Here the state variable takes values in (the weight of nuts so far collected) or (a no-return state when the squirrel is caught).
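A hedged numerical check, assuming (one reading of the problem) that a caught squirrel ends with nothing: compare threshold policies "go out again while the current stock is below c" by Monte Carlo, with hypothetical values of the catch probability p and the exponential parameter lam. A one-step-look-ahead calculation under the same assumption suggests the threshold c* = (1 - p)/(p*lam).

```python
import numpy as np

# Monte Carlo comparison of threshold policies for the squirrel problem.
# Assumption (for illustration): if caught, the final reward is 0; otherwise
# each trip adds an Exp(lam) weight of nuts.  Parameters are hypothetical.
p, lam = 0.1, 2.0
rng = np.random.default_rng(2)

def expected_final_weight(c, n_runs=50_000):
    total = 0.0
    for _ in range(n_runs):
        x = 0.0
        while x < c:
            if rng.random() < p:           # caught: the winter store is lost
                x = 0.0
                break
            x += rng.exponential(1 / lam)  # safe return with a random weight
        total += x
    return total / n_runs

c_star = (1 - p) / (p * lam)               # one-step-look-ahead threshold
for c in [0.5 * c_star, c_star, 2 * c_star]:
    print(f"threshold {c:.2f}: expected weight {expected_final_weight(c):.3f}")
```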
Paper 3, Section II, J
A particle follows a discrete-time trajectory on given by
for , where is a fixed integer, is a real constant, is the position of the particle and is the control action at time , and is a sequence of independent random vectors with and .
Find the closed-loop control, i.e. the control action defined as a function of , that minimizes
where is given. [Note that this function is quadratic in , but linear in .]
Does the closed-loop control depend on or on ? Deduce the form of the optimal open-loop control.
Paper 4, Section II, J
A girl begins swimming from a point on the bank of a straight river. She swims at a constant speed relative to the water. The speed of the downstream current at a distance from the shore is . Hence her trajectory is described by
where is the angle at which she swims relative to the direction of the current.
She desires to reach a downstream point on the same bank as she starts, as quickly as possible. Construct the Hamiltonian for this problem, and describe how Pontryagin's maximum principle can be used to give necessary conditions that must hold on an optimal trajectory. Given that is positive, increasing and differentiable in , show that on an optimal trajectory
Paper 2, Section II, K
Suppose is a Markov chain. Consider the dynamic programming equation
with , and . Prove that:
(i) is nondecreasing in ;
(ii) , where is the value function of an infinite-horizon problem that you should describe;
(iii) .
A coin lands heads with probability . A statistician wishes to choose between: and , one of which is true. Prior probabilities of and in the ratio change after one toss of the coin to ratio (if the toss was a head) or to ratio (if the toss was a tail). What problem is being addressed by the following dynamic programming equation?
Prove that is a convex function of .
By sketching a graph of , describe the form of the optimal policy.
Paper 3, Section II, K
A particle follows a discrete-time trajectory in given by
where is a white noise sequence with and . Given , we wish to choose to minimize .
Show that for some this problem can be reduced to one of controlling a scalar state .
Find, in terms of , the optimal . What is the change in minimum achievable when the system starts in as compared to when it starts in ?
Consider now a trajectory starting at . What value of is optimal if we wish to minimize ?
Paper 4, Section II, K
Given , all positive, it is desired to choose to maximize
subject to .
Explain what Pontryagin's maximum principle guarantees about a solution to this problem.
Show that no matter whether is constrained or unconstrained there is a constant such that the optimal control is of the form . Find an expression for under the constraint .
Show that if is unconstrained then .
Paper 2, Section II, J
Describe the elements of a generic stochastic dynamic programming equation for the problem of maximizing the expected sum of discounted rewards accrued at times . What is meant by the positive case? What is specially true in this case that is not true in general?
An investor owns a single asset which he may sell once, on any of the days . On day he will be offered a price . This value is unknown until day , is independent of all other offers, and a priori it is uniformly distributed on . Offers remain open, so that on day he may sell the asset for the best of the offers made on days . If he sells for on day then the reward is . Show from first principles that if then there exists such that the expected reward is maximized by selling the first day the offer is at least .
For , find both and the expected reward under the optimal policy.
Explain what is special about the case .
Paper 3, Section II, J
A state variable is subject to dynamics
where is a scalar control variable constrained to the interval . Given an initial value , let denote the minimal time required to bring the state to . Prove that
Explain how this equation figures in Pontryagin's maximum principle.
Use Pontryagin's maximum principle to show that, on an optimal trajectory, only takes the values 1 and , and that it makes at most one switch between them.
Show that is optimal when .
Find the optimal control when .
Paper 4, Section II, J
A factory has a tank of capacity in which it stores chemical waste. Each week the factory produces, independently of other weeks, an amount of waste that is equally likely to be 0, 1, or . If the amount of waste exceeds the remaining space in the tank then the excess must be specially handled at a cost of per . The tank may be emptied or not at the end of each week. Emptying costs , plus a variable cost of for each of its content. It is always emptied when it ends the week full.
It is desired to minimize the average cost per week. Write down equations from which one can determine when it is optimal to empty the tank.
Find the average cost per week of a policy , which empties the tank if and only if its content at the end of the week is 2 or .
Describe the policy improvement algorithm. Explain why, starting from , this algorithm will find an optimal policy in at most three iterations.
Prove that is optimal if and only if .
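The policy-improvement algorithm mentioned above can be sketched generically for a finite average-cost problem (hypothetical costs and transition probabilities, not the tank data): solve the value-determination equations for the current policy, then improve it greedily; for a finite problem this terminates at an optimal policy.

```python
import numpy as np

# Policy improvement for a finite average-cost MDP with hypothetical data.
# Value determination: lam + phi(x) = c(x, f(x)) + sum_y P(y | x, f(x)) phi(y),
# with the normalisation phi(0) = 0; then improve the policy greedily.
n_states, n_actions = 4, 2
rng = np.random.default_rng(3)
cost = rng.uniform(0, 1, size=(n_states, n_actions))
P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))

def evaluate(f):
    # unknowns: lam, phi(1), ..., phi(n-1), with phi(0) fixed at 0
    A = np.zeros((n_states, n_states))
    b = np.zeros(n_states)
    for x in range(n_states):
        A[x, 0] = 1.0                        # coefficient of lam
        b[x] = cost[x, f[x]]
        for y in range(1, n_states):
            A[x, y] = (1.0 if y == x else 0.0) - P[x, f[x], y]
    sol = np.linalg.solve(A, b)
    return sol[0], np.concatenate(([0.0], sol[1:]))   # lam, phi

f = np.zeros(n_states, dtype=int)            # initial policy
for _ in range(20):
    lam, phi = evaluate(f)
    f_new = (cost + P @ phi).argmin(axis=1)  # improvement step
    if np.array_equal(f_new, f):
        break
    f = f_new
print("average cost:", lam, "policy:", f)
```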
Paper 2, Section II, K
Consider an optimal stopping problem in which the optimality equation takes the form
, and where for all . Let denote the stopping set of the one-step-look-ahead rule. Show that if is closed (in a sense you should explain) then the one-step-look-ahead rule is optimal.
biased coins are to be tossed successively. The probability that the th coin toss will show a head is known to be . At most once, after observing a head, and before tossing the next coin, you may guess that you have just seen the last head (i.e. that all subsequent tosses will show tails). If your guess turns out to be correct then you win .
Suppose that you have not yet guessed 'last head', and the th toss is a head. Show that it cannot be optimal to guess that this is the last head if
where .
Suppose that . Show that it is optimal to guess that the last head is the first head (if any) to occur after having tossed at least coins, where when is large.
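A numerical sketch of the last part, under the assumption that the relevant comparison is the sum of the odds p_j/(1 - p_j) over the tosses still to come, and taking the hypothetical choice p_i = 1/i: the first index from which guessing is worthwhile is where that remaining sum first drops to 1 or below, and for large n this is close to n/e.

```python
import math

# With p_j = 1/j the odds are p_j/(1 - p_j) = 1/(j - 1) for j >= 2.
# Find the smallest i such that the odds of tosses i+1, ..., n sum to <= 1.
def first_safe_index(n):
    r = [(1.0 / j) / (1.0 - 1.0 / j) for j in range(2, n + 1)]   # = 1/(j-1)
    for i in range(1, n + 1):
        if sum(r[i - 1:]) <= 1.0:    # entries of r for j = i+1, ..., n
            return i
    return n

for n in [10, 100, 1000]:
    print(n, first_safe_index(n), round(n / math.e, 1))
```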
Paper 3, Section II, 28K
An observable scalar state variable evolves as . Let controls be determined by a policy and define
Show that it is possible to express in terms of , which satisfies the recurrence
with .
Deduce that is defined as
By considering the policy which takes , show that .
Give an alternative description of in closed-loop form.
Paper 4, Section II, K
Describe the type of optimal control problem that is amenable to analysis using Pontryagin's Maximum Principle.
A firm has the right to extract oil from a well over the interval . The oil can be sold at price per unit. To extract oil at rate when the remaining quantity of oil in the well is incurs cost at rate . Thus the problem is one of maximizing
subject to . Formulate the Hamiltonian for this problem.
Explain why , the adjoint variable, has a boundary condition .
Use Pontryagin's Maximum Principle to show that under optimal control
and
Find the oil remaining in the well at time , as a function of , and .
Paper 2, Section II, J
(a) Suppose that
Prove that conditional on , the distribution of is again multivariate normal, with mean and covariance .
(b) The -valued process evolves in discrete time according to the dynamics
where is a constant matrix, and are independent, with common distribution. The process is not observed directly; instead, all that is seen is the process defined as
where are independent of each other and of the , with common distribution.
If the observer has the prior distribution for , prove that at all later times the distribution of conditional on is again normally distributed, with mean and covariance which evolve as
where
(c) In the special case where both and are one-dimensional, and , , find the form of the updating recursion. Show in particular that
and that
Hence deduce that, with probability one,
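The updating recursion of parts (b) and (c) is the Kalman filter. A minimal scalar sketch, with hypothetical coefficients a, c and noise variances q, r, tracking the conditional mean and variance of the hidden state:

```python
import numpy as np

# Scalar Kalman filter for hypothetical dynamics
#   X_{t+1} = a X_t + w_t,   Y_t = c X_t + v_t,
# with w_t ~ N(0, q), v_t ~ N(0, r) independent.  The conditional law of X_t
# given Y_1, ..., Y_t stays Gaussian, so it suffices to update mean and variance.
a, c, q, r = 0.9, 1.0, 0.5, 1.0
rng = np.random.default_rng(4)

x, mean, var = 0.0, 0.0, 1.0                  # true state, prior mean and variance
for t in range(20):
    x = a * x + rng.normal(0, np.sqrt(q))     # state transition
    y = c * x + rng.normal(0, np.sqrt(r))     # noisy observation
    mean_pred, var_pred = a * mean, a * a * var + q          # predict
    gain = var_pred * c / (c * c * var_pred + r)             # Kalman gain
    mean = mean_pred + gain * (y - c * mean_pred)            # measurement update
    var = (1 - gain * c) * var_pred
print("posterior mean", round(mean, 3), "posterior variance", round(var, 3))
```

The variance recursion does not involve the observations, so it can be iterated on its own; for these hypothetical parameter values it settles quickly to a fixed point.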
Paper 3, Section II, J
Consider an infinite-horizon controlled Markov process having per-period costs , where is the state of the system, and is the control. Costs are discounted at rate , so that the objective to be minimized is
What is meant by a policy for this problem?
Let denote the dynamic programming operator
Further, let denote the value of the optimal control problem:
where the infimum is taken over all policies , and denotes expectation under policy . Show that the functions defined by
increase to a limit . Prove that . Prove that
Suppose that . Prove that .
[You may assume that there is a function such that
though the result remains true without this simplifying assumption.]
Paper 4, Section II, J
Dr Seuss' wealth at time evolves as
where is the rate of interest earned, is his intensity of working , and is his rate of consumption. His initial wealth is given, and his objective is to maximize
where , and is the (fixed) time his contract expires. The constants and satisfy the inequalities , and . At all times, must be non-negative, and his final wealth must be non-negative. Establish the following properties of the optimal solution :
(i) ;
(ii) , where ;
(iii) for some constants and .
Hence deduce that the optimal wealth is
Paper 2, Section II, I
In the context of stochastic dynamic programming, explain what is meant by an average-reward optimal policy.
A player has a fair coin and a six-sided die. At each epoch he may choose either to toss the coin or to roll the die. If he tosses the coin and it shows heads then he adds 1 to his total score, and if it shows tails then he adds 0. If he rolls the die then he adds the number showing. He wins a reward of whenever his total score is divisible by 3.
Suppose the player always tosses the coin. Find his average reward per toss.
Still using the above policy, and given that he starts with a total score of , let be the expected total reward over the next epochs. Find the value of
Use the policy improvement algorithm to find a policy that produces a greater average reward than the policy of only tossing the coin.
Find the average-reward optimal policy.
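A short sketch of the "always toss the coin" calculation, under the assumption (for illustration only) that a reward of 1 is paid each time the total reaches a new multiple of 3: the score modulo 3 is a three-state Markov chain, and the long-run reward rate is the stationary probability of being at 2 times the probability of then tossing a head.

```python
import numpy as np

# Score modulo 3 under "always toss": stay with probability 1/2 (tails),
# advance by 1 with probability 1/2 (heads).
P = np.array([[0.5, 0.5, 0.0],
              [0.0, 0.5, 0.5],
              [0.5, 0.0, 0.5]])
# stationary distribution: solve pi P = pi together with sum(pi) = 1
A = np.vstack([P.T - np.eye(3), np.ones(3)])
pi = np.linalg.lstsq(A, np.array([0.0, 0.0, 0.0, 1.0]), rcond=None)[0]
rate = pi[2] * 0.5          # a new multiple of 3 needs a head from state 2
print("stationary distribution:", pi.round(4), "reward per toss:", rate)
```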
Paper 3, Section II, I
Two scalar systems have dynamics
where and are independent sequences of independent and identically distributed random variables of mean 0 and variance 1 . Let
where is a policy in which depends on only .
Show that is a solution to the optimality equation satisfied by , for some and which you should find.
Find the optimal controls.
State a theorem that justifies .
For each of the two cases (a) and (b) , find controls which minimize
Paper 4, Section II, I
Explain how transversality conditions can be helpful when employing Pontryagin's Maximum Principle to solve an optimal control problem.
A particle in starts at and follows the dynamics
where controls and are to be chosen subject to .
Using Pontryagin's maximum principle do the following:
(a) Find controls that minimize ;
(b) Suppose we wish to choose and the controls to minimize under a constraint . By expressing both and in terms of the adjoint variables, show that on an optimal trajectory,
2.II.29I
Consider a stochastic controllable dynamical system with action-space and countable state-space . Thus and denotes the transition probability from to when taking action . Suppose that a cost is incurred each time that action is taken in state , and that this cost is uniformly bounded. Write down the dynamic optimality equation for the problem of minimizing the expected long-run average cost.
State in terms of this equation a general result, which can be used to identify an optimal control and the minimal long-run average cost.
A particle moves randomly on the integers, taking steps of size 1 . Suppose we can choose at each step a control parameter , where is fixed, which has the effect that the particle moves in the positive direction with probability and in the negative direction with probability . It is desired to maximize the long-run proportion of time spent by the particle at 0 . Show that there is a solution to the optimality equation for this example in which the relative cost function takes the form , for some constant .
Determine an optimal control and show that the maximal long-run proportion of time spent at 0 is given by
You may assume that it is valid to use an unbounded function in the optimality equation in this example.
3.II.28I
Let be a positive-definite symmetric matrix. Show that a non-negative quadratic form on of the form
is minimized over , for each , with value , by taking , where .
Consider for the controllable stochastic linear system in
starting from at time , where the control variables take values in , and where are independent, zero-mean random variables, with . Here, and are, respectively, and matrices. Assume that a cost is incurred at each time and that a final cost is incurred at time . Here, is a given non-negative-definite symmetric matrix. It is desired to minimize, over the set of all controls , the total expected cost . Write down the optimality equation for the infimal cost function .
Hence, show that has the form
for some non-negative-definite symmetric matrix and some real constant . Show how to compute the matrix and constant and how to determine an optimal control.
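The matrix in the quadratic part of the value function and the optimal linear control are produced by a backward Riccati recursion; the additive noise contributes only an additive constant. A sketch with hypothetical system and cost matrices:

```python
import numpy as np

# Backward Riccati recursion for a finite-horizon LQ problem with
# hypothetical data: dynamics x_{t+1} = A x_t + B u_t + eps_t and stage cost
# x' R x + u' Q u.  V_t(x) = x' Pi_t x + const, and the optimal control is
# the linear feedback u_t = K_t x_t.
A = np.array([[1.0, 1.0],
              [0.0, 1.0]])
B = np.array([[0.0],
              [1.0]])
R = np.eye(2)             # state cost
Q = np.eye(1)             # control cost
Pi = np.zeros((2, 2))     # terminal cost matrix (hypothetical)

gains = []
for _ in range(10):       # iterate backwards from the terminal time
    K = -np.linalg.solve(Q + B.T @ Pi @ B, B.T @ Pi @ A)      # optimal gain
    Pi = R + K.T @ Q @ K + (A + B @ K).T @ Pi @ (A + B @ K)   # Riccati step
    gains.append(K)
print("gain used at the initial time:", gains[-1])
```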
4.II.29I
State Pontryagin's maximum principle for the controllable dynamical system with state-space , given by
where the running costs are given by , up to an unconstrained terminal time when the state first reaches 0 , and there is a terminal cost .
A company pays a variable price per unit time for electrical power, agreed in advance, which depends on the time of day. The company takes on a job at time , which requires a total amount of electrical energy, but can be processed at a variable level of power consumption . If the job is completed by time , then the company will receive a reward . Thus, it is desired to minimize
subject to
with unconstrained. Take as state variable the energy still needed at time to complete the job. Use Pontryagin's maximum principle to show that the optimal control is to process the job on full power or not at all, according as the price lies below or above a certain threshold value .
Show further that, if is the completion time for the optimal control, then
Consider a case in which is periodic, with period one day, where day 1 corresponds to the time interval , and during day 1. Suppose also that and . Determine the total energy cost and the reward associated with the threshold .
Hence, show that any threshold low enough to carry processing over into day 2 is suboptimal.
Show carefully that the optimal price threshold is given by .
2.II.29I
State Pontryagin's maximum principle in the case where both the terminal time and the terminal state are given.
Show that is the minimum value taken by the integral
subject to the constraints and , where
[You may find it useful to note the fact that the problem is rotationally symmetric about the -axis, so that the angle made by the initial velocity with the positive -axis may be chosen arbitrarily.]
3.II.28I
Let be a discrete-time controllable dynamical system (or Markov decision process) with countable state-space and action-space . Consider the -horizon dynamic optimization problem with instantaneous costs , on choosing action in state at time , with terminal cost , in state at time . Explain what is meant by a Markov control and how the choice of a control gives rise to a time-inhomogeneous Markov chain.
Suppose we can find a bounded function and a Markov control such that
with equality when , and such that for all . Here denotes the expected value of , given that we choose action in state at time . Show that is an optimal Markov control.
A well-shuffled pack of cards is placed face-down on the table. The cards are turned over one by one until none are left. Exactly once you may place a bet of on the event that the next two cards will be red. How should you choose the moment to bet? Justify your answer.
4.II.29I
Consider the scalar controllable linear system, whose state evolves by
with observations given by
Here, is the control variable, which is to be determined on the basis of the observations up to time , and are independent random variables. You wish to minimize the long-run average expected cost, where the instantaneous cost at time is . You may assume that the optimal control in equilibrium has the form , where is given by a recursion of the form
and where is chosen so that is independent of the observations up to time . Show that , and determine the minimal long-run average expected cost. You are not expected to simplify the arithmetic form of your answer but should show clearly how you have obtained it.
2.II.29I
A policy is to be chosen to maximize
where . Assuming that , prove that is optimal if satisfies the optimality equation.
An investor receives at time an income of of which he spends , subject to . The reward is , and his income evolves as
where is a sequence of independent random variables with common mean . If , show that the optimal policy is to take for all .
What can you say about the problem if
3.II.28I
A discrete-time controlled Markov process evolves according to
where the are independent zero-mean random variables with common variance , and is a known constant.
Consider the problem of minimizing
where and . Show that the optimal control at time takes the form for certain constants . Show also that the minimized value for is of the form
for certain constants . Explain how these constants are to be calculated. Prove that the equation
has a unique positive solution , and that the sequence converges monotonically to .
Prove that the sequence converges, to the limit
Finally, prove that .
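The monotone convergence of the Riccati sequence to the unique positive root can be seen numerically. A sketch for hypothetical scalar dynamics x_{t+1} = a x_t + u_t + eps_t and stage cost x^2 + q u^2, for which the Riccati map is f(P) = 1 + q a^2 P / (q + P):

```python
import numpy as np

# Iterate the scalar Riccati map from P_0 = 0; the sequence is increasing and
# converges to the unique positive root of P^2 + (q - 1 - q a^2) P - q = 0.
a, q = 1.5, 2.0
P = 0.0
for n in range(40):
    P = 1.0 + q * a * a * P / (q + P)
print("limit of the Riccati iteration:", round(P, 6))

roots = np.roots([1.0, q - 1.0 - q * a * a, -q]).real
print("positive root of the fixed-point equation:", round(max(roots), 6))
```

With these particular values the positive root is exactly 4, which the iteration reaches quickly.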
4.II.29I
An investor has a (possibly negative) bank balance at time . For given positive and , he wishes to choose his spending rate so as to maximize
where . Find the investor's optimal choice of control .
Let denote the optimally-controlled bank balance. By considering next how depends on , show that there is a unique positive such that . If the original problem is modified by setting , but requiring that , show that the optimal control for this modified problem is .
2.II.29I
Explain what is meant by a time-homogeneous discrete-time Markov decision problem.
What is the positive programming case?
A discrete-time Markov decision problem has state space . In state , two actions are possible. We may either stop and obtain a terminal reward , or may continue, in which case the subsequent state is equally likely to be or . In states 0 and stopping is automatic (with terminal rewards and respectively). Starting in state , denote by and the maximal expected terminal reward that can be obtained over the first steps and over the infinite horizon, respectively. Prove that .
Prove that is the smallest concave function such that for all .
Describe an optimal policy.
Suppose are distinct numbers. Show that the optimal policy is unique, or give a counter-example.
3.II.28I
Consider the problem
where for ,
is the control variable, and is Gaussian white noise. Show that the problem can be rewritten as one of controlling the scalar variable , where
By guessing the form of the optimal value function and ensuring it satisfies an appropriate optimality equation, show that the optimal control is
Is this certainty equivalence control?
4.II.29I
A continuous-time control problem is defined in terms of state variable and control . We desire to minimize , where is fixed and is unconstrained. Given and , describe further boundary conditions that can be used in conjunction with Pontryagin's maximum principle to find and the adjoint variables .
Company 1 wishes to steal customers from Company 2 and maximize the profit it obtains over an interval . Denoting by the number of customers of Company , and by the advertising effort of Company 1, this leads to a problem
where , and is constrained to the interval . Assuming , use Pontryagin's maximum principle to show that the optimal advertising policy is bang-bang, and that there is just one change in advertising effort, at a time , where
B2.15
A gambler is presented with a sequence of random numbers, , one at a time. The distribution of is
where . The gambler must choose exactly one of the numbers, just after it has been presented and before any further numbers are presented, but must wait until all the numbers are presented before his payback can be decided. It costs to play the game. The gambler receives payback as follows: nothing if he chooses the smallest of all the numbers, if he chooses the largest of all the numbers, and otherwise.
Show that there is an optimal strategy of the form "Choose the first number such that either (i) and , or (ii) ", where you should determine the constant as explicitly as you can.
B3.14
The strength of the economy evolves according to the equation
where and is the effort that the government puts into reform at time . The government wishes to maximize its chance of re-election at a given future time , where this chance is some monotone increasing function of
Use Pontryagin's maximum principle to determine the government's optimal reform policy, and show that the optimal trajectory of is
B4.14
Consider the deterministic dynamical system
where and are constant matrices, , and is the control variable, . What does it mean to say that the system is controllable?
Let . Show that if is the set of possible values for as the control is allowed to vary, then is a vector space.
Show that each of the following three conditions is equivalent to controllability of the system.
(i) The set for all .
(ii) The matrix is (strictly) positive definite.
(iii) The matrix has rank .
Consider the scalar system
where . Show that this system is controllable.
B2.15
The owner of a put option may exercise it on any one of the days , or not at all. If he exercises it on day , when the share price is , his profit will be . Suppose the share price obeys , where are i.i.d. random variables for which . Let be the maximal expected profit the owner can obtain when there are further days to go and the share price is . Show that
(a) is non-decreasing in ,
(b) is non-decreasing in , and
(c) is continuous in .
Deduce that there exists a non-decreasing sequence, , such that expected profit is maximized by exercising the option the first day that .
Now suppose that the option never expires, so effectively . Show by examples that there may or may not exist an optimal policy of the form 'exercise the option the first day that '.
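A backward-induction sketch of the threshold structure, under assumed dynamics (a price that moves up or down by 1 each day with probability 1/2) and with discounted continuation so that early exercise can be strictly worthwhile; the parameters and the price model are hypothetical, chosen only to show how exercise thresholds fall out of the recursion.

```python
import numpy as np

# American-style put by backward induction on a truncated price grid.
# V holds the best expected profit with n days to go; we record the largest
# in-the-money price at which immediate exercise is (weakly) optimal.
K, N, beta = 10, 15, 0.9
prices = np.arange(0, 41)
payoff = np.maximum(K - prices, 0.0)

V = payoff.copy()                          # n = 0: exercise now or never
thresholds = []
for n in range(1, N + 1):
    cont = np.empty_like(V)
    cont[1:-1] = 0.5 * (V[:-2] + V[2:])    # E[V_{n-1}(price +/- 1)]
    cont[0] = 0.5 * (V[0] + V[1])          # crude handling of the grid edges
    cont[-1] = V[-1]
    cont *= beta                           # discounted continuation value
    V = np.maximum(payoff, cont)
    exercise = (payoff >= cont) & (payoff > 0)
    thresholds.append(int(prices[exercise].max()) if exercise.any() else None)
print(thresholds)
```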
B3.14
State Pontryagin's Maximum Principle (PMP).
In a given lake the tonnage of fish, , obeys
where is the rate at which fish are extracted. It is desired to maximize
choosing under the constraints , and if . Assume the PMP with an appropriate Hamiltonian . Now define and . Show that there exists such that on the optimal trajectory maximizes
and
Suppose that and that under an optimal policy it is not optimal to extract all the fish. Argue that is impossible and describe qualitatively what must happen under the optimal policy.
B4.14
The scalars , are related by the equations
where is a sequence of uncorrelated random variables with means of 0 and variances of 1. Given that is an unbiased estimate of of variance 1, the control variable is to be chosen at time on the basis of the information , where and . Let be the Kalman filter estimates of computed from
by appropriate choices of . Show that the variance of is .
Define and
Show that , where and
How would the expression for differ if had a variance different from ?
B2.15
State Pontryagin's maximum principle (PMP) for the problem of minimizing
where ; here, and are given, and is unconstrained.
Consider the two-dimensional problem in which , and . Show that, by use of a variable , one can rewrite this problem as an equivalent one-dimensional problem.
Use PMP to solve this one-dimensional problem, showing that the optimal control can be expressed as , where .
Express in a feedback form of for some .
Suppose that the initial state is perturbed by a small amount to . Give an expression (in terms of and ) for the increase in minimal cost.
B3.14
Consider a scalar system with , where is a sequence of independent random variables, uniform on the interval , with . We wish to choose to minimize the expected value of
where is chosen knowing but not . Prove that the minimal expected cost can be written and derive a recurrence for calculating .
How does your answer change if is constrained to lie in the set
Consider a stopping problem for which there are two options in state :
(1) stop: paying a terminal cost ; no further costs are incurred;
(2) continue: choosing , paying , and moving to state
Consider the problem of minimizing total expected cost subject to the constraint that no more than continuation steps are allowed. Suppose . Show that an optimal policy stops if and only if either continuation steps have already been taken or .
[Hint: Use induction on to show that a one-step-look-ahead rule is optimal. You should not need to find the optimal for the continuation steps.]
B4.14
A discrete-time decision process is defined on a finite set of states as follows. Upon entry to state at time the decision-maker observes a variable . He then chooses the next state freely within , at a cost of . Here is a sequence of integer-valued, identically distributed random variables. Suppose there exist and such that for all
Let denote a policy. Show that
At the start of each month a boat manufacturer receives orders for 1, 2 or 3 boats. These numbers are equally likely and independent from month to month. He can produce boats in a month at a cost of units. All orders are filled at the end of the month in which they are ordered. It is possible to make extra boats, ending the month with a stock of unsold boats, but cannot be more than 2, and a holding cost of is incurred during any month that starts with unsold boats in stock. Write down an optimality equation that can be used to find the long-run expected average cost.
Let be the policy of only ever producing sufficient boats to fill the present month's orders. Show that it is optimal if and only if .
Suppose . Starting from , what policy is obtained after applying one step of the policy-improvement algorithm?
B2.15
A street trader wishes to dispose of counterfeit Swiss watches. If he offers one for sale at price he will sell it with probability . Here is known and less than 1. Subsequent to each attempted sale (successful or not) there is a probability that he will be arrested and can make no more sales. His aim is to choose the prices at which he offers the watches so as to maximize the expected value of his sales up until the time he is arrested or has sold all watches.
Let be the maximum expected amount he can obtain when he has watches remaining and has not yet been arrested. Explain why is the solution to
Denote the optimal price by and show that
and that
Show inductively that is a nondecreasing and concave function of .
B3.14
A file of is to be transmitted over a communications link. At each time the sender can choose a transmission rate, , within the range Mb per second. The charge for transmitting at rate at time is . The function is fully known at time 0. If it takes a total time to transmit the file then there is a delay cost of , . Thus and are to be chosen to minimize
where and . Quoting and applying appropriate results of Pontryagin's maximum principle show that a property of the optimal policy is that there exists such that if and if .
Show that the optimal and are related by .
Suppose and . For what value of is it optimal to transmit at a constant rate 1 between times and ?
B4.14
Consider the scalar system with plant equation and cost
Show from first principles that , where and for
Show that as .
Prove that is minimized by the stationary control, for all .
Consider the stationary policy that has for all . What is the value of under this policy?
Consider the following algorithm, in which steps 1 and 2 are repeated as many times as desired.
- For a given stationary policy , for which for all , determine the value of under this policy as by solving for in
- Now find as the minimizer of
and define as the policy for which for all .
Explain why is guaranteed to be a better policy than .
Let be the stationary policy with . Determine and verify that it minimizes to within of its optimum.
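One round of the algorithm can be illustrated on a hypothetical scalar discounted LQ problem with dynamics x_{t+1} = a x_t + u_t + eps_t and stage cost x^2 + q u^2: evaluate the quadratic value of the current stationary policy u = k x, then minimise the right-hand side of the evaluation equation to obtain the improved gain.

```python
# Policy improvement for a hypothetical scalar discounted LQ problem.
# A stabilising stationary policy u = k x has value p x^2 + const, where
#   p = 1 + q k^2 + beta (a + k)^2 p.
a, q, beta = 1.0, 1.0, 0.9

def evaluate(k):
    closed = beta * (a + k) ** 2
    assert closed < 1, "policy must be stabilising for the value to be finite"
    return (1 + q * k ** 2) / (1 - closed)

def improve(p):
    # minimise q u^2 + beta p (a x + u)^2 over u, giving u = k x
    return -beta * p * a / (q + beta * p)

k = -a                    # initial policy: cancel the drift completely
for step in range(6):
    p = evaluate(k)
    k = improve(p)
    print(f"step {step}: value coefficient {p:.4f}, improved gain {k:.4f}")
```

Each evaluation-improvement round cannot increase the value coefficient, which is why a few iterations suffice to come within any prescribed tolerance of the optimum.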