







The major topics of the course Methods of Dynamic Analysis and Control are dynamic programming, the linear-quadratic model, continuous time dynamics, the Maximum Principle, event uncertainty, time consistency, discounting, and others. The major points of this lecture are: Dynamic Programming, Linear Quadratic, Dynamic Programming Equation, Familiar Static Problem, Control Rule, Arbitrary Time, Environmental State, Linear-Quadratic Control, International Economic, Infinite Horizon Autonomous Problem








V. Dynamic Programming
Choose a sequence of controls $\{u_t\}_{t=1}^{T}$ to maximize $-\sum_{i=1}^{T} (x_i^2 + u_i^2)$ subject to $x_i = x_{i-1} + u_i$, $x_0$ given.
x is the state variable and u is the control. We solve this problem by "working backwards" from the final time, T. At i = T the problem is simply:
choose $u_T$ to max $-[(x_{T-1} + u_T)^2 + u_T^2]$,
obtained by substituting in the constraint. We can solve this problem (easily!) for arbitrary values of $x_{T-1}$ and obtain the optimal control $u^*(x_{T-1}, T)$. This function (the solution to the optimization problem) is called the control rule. It is in "feedback form", i.e., it is expressed as a function of the state. Substituting this control rule into the maximand, we obtain the value function $J(x_{T-1}, T)$. Both the control rule and value function have two arguments, the state variable and calendar time.
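As a worked illustration of this last-period step (derived here from the stated problem, assuming an interior optimum), the first order condition and the resulting value function are:

```latex
\begin{aligned}
\frac{\partial}{\partial u_T}\Big\{-\big[(x_{T-1}+u_T)^2+u_T^2\big]\Big\}
  &= -2(x_{T-1}+u_T) - 2u_T = 0
  \;\Rightarrow\; u^*(x_{T-1},T) = -\tfrac{1}{2}\,x_{T-1}, \\
J(x_{T-1},T)
  &= -\Big[\big(\tfrac{1}{2}x_{T-1}\big)^2 + \big(\tfrac{1}{2}x_{T-1}\big)^2\Big]
   = -\tfrac{1}{2}\,x_{T-1}^2 .
\end{aligned}
```

The control rule is linear in the state and the value function is quadratic in the state.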
Now we step back a period, to time T-1. At that time we know the previous value of the state, $x_{T-2}$, and we know how we will behave in the future, conditional on the state. We can therefore write the problem at T-1 as
choose $u_{T-1}$ to max $-[(x_{T-2} + u_{T-1})^2 + u_{T-1}^2] + J(x_{T-2} + u_{T-1}, T)$.
Note that this is a familiar static problem. At time T-1 I have a 2 period (i.e. dynamic) problem, but I have converted it to a static problem. Notice also that I substituted the constraint into the period T-1 payoff and the value function at T.
I can solve the problem at T-1 and obtain the control rule, $u^*(x_{T-2}, T-1)$. Again, note that this control rule is conditional upon $x_{T-2}$. I need to know the value of $x_{T-2}$ to find the value of $u_{T-1}$, but I do not need the value of $x_{T-2}$ to find the control rule $u^*(x_{T-2}, T-1)$. Note the distinction between a function (the control rule) and the value of the function. Substitute the control rule into the maximand to obtain the value function at T-1:
$J(x_{T-2}, T-1) = \max_{u_{T-1}} \{ -[(x_{T-2} + u_{T-1})^2 + u_{T-1}^2] + J(x_{T-2} + u_{T-1}, T) \} = -[(x_{T-2} + u^*_{T-1})^2 + (u^*_{T-1})^2] + J(x_{T-2} + u^*_{T-1}, T)$.
I got rid of the max operator by substituting in the optimal control. The reason that I can actually solve this problem, of course, is that I know the function $J(x_{T-1}, T)$.
I can keep "stepping back" in time in this manner. At arbitrary time t, I can write the problem as
$J(x_{t-1}, t) = \max_{u_t} \{ -[x_t^2 + u_t^2] + J(x_t, t+1) \}$, subject to $x_t = x_{t-1} + u_t$, $x_{t-1}$ given.
The last equation is the Dynamic Programming Equation for the original optimal control problem. You should be able to actually solve this problem. Solving the problem means finding the control rules and value functions. You will find that the control rule is linear in the state and the value function is linear-quadratic in the state. The coefficients in these functions depend on the number of "periods to go", T-t.
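The backward recursion can be sketched numerically. The sketch below assumes the conjecture $J(x_{t-1}, t) = -p_t x_{t-1}^2$ with $p_{T+1} = 0$; substituting it into the Dynamic Programming Equation gives $p_t = (1 + p_{t+1})/(2 + p_{t+1})$ and the control rule $u_t = -p_t x_{t-1}$. This recursion is derived here, not quoted from the notes, and the function names are hypothetical.

```python
from fractions import Fraction


def backward_recursion(T):
    """Return {t: p_t}, where J(x_{t-1}, t) = -p_t * x_{t-1}**2
    and the control rule is u_t = -p_t * x_{t-1}."""
    p_next = Fraction(0)  # J(., T+1) = 0: nothing is earned after period T
    coeffs = {}
    for t in range(T, 0, -1):  # work backwards from the final period
        p_t = (1 + p_next) / (2 + p_next)
        coeffs[t] = p_t
        p_next = p_t
    return coeffs


def simulate(x0, T):
    """Apply the feedback rules forward from x0 and return the realized payoff."""
    coeffs = backward_recursion(T)
    x, payoff = Fraction(x0), Fraction(0)
    for t in range(1, T + 1):
        u = -coeffs[t] * x      # control rule in feedback form
        x = x + u               # state equation x_t = x_{t-1} + u_t
        payoff -= x**2 + u**2   # period payoff -(x_t^2 + u_t^2)
    return payoff
```

Exact fractions are used so that the realized payoff from `simulate` can be compared with the value function $-p_1 x_0^2$ without rounding error. The coefficients depend only on the number of periods to go.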
You should also be able to write down the dynamic programming equation for any discrete time control problem. (Exam question.)
Exercises:
Solve the control problem described above for T = 3
An environmental state can take two values. The flow of damages in a good state is $D_g = 0$ and the flow of damages in a bad state is $D_b > 0$. The amount of abatement in period t is $x_t$ and the cost of abatement is $c(x_t)$, where c is increasing and convex. The probability that the state changes from good to bad is $p(x_t)$. If the state is bad, it never changes back to good. The discount factor is $\beta = 1/(1+r)$, and the objective is to minimize the expected present discounted value of total costs (abatement costs and environmental damages). Write down the dynamic programming equation and the first order condition for optimality of x in a good state. How does the optimal level of emissions (in a good state) depend on $\beta$?
Suppose that emissions costs are $c(x, \theta) = (a + \theta)x + bx^2/2$, where $\theta$ is iid with mean 0 and standard deviation $\sigma$. The regulator knows $\sigma$ but does not observe the value of $\theta$ in any period. The regulator can choose (i) a quota, i.e., the level of $x_t$ directly, or (ii) a subsidy $\tau_t$. If he chooses a subsidy, firms choose abatement to minimize $c(x, \theta) - \tau x$. (In each period, the government announces $\tau$ and firms choose abatement after observing the realization of $\theta$.) The subsidy is a pure transfer. Which policy gives the government a higher expected payoff? Make any assumptions you want.
(3′) $J(y_{T-1}, T) = \max_{x_T} \left\{ (A_T y_{T-1} + C_T x_T)' H_T (A_T y_{T-1} + C_T x_T) \right\} + E_{u_T}\, u_T' H_T u_T$
where $H_T \equiv K_T$
terms that involve $x_T$ are:
$x_T' C_T' H_T C_T x_T + 2 x_T' C_T' H_T A_T y_{T-1}$
max w.r.t. $x_T$. The F.O.C. is
$2 C_T' H_T (A_T y_{T-1} + C_T x_T) = 0 \Rightarrow$
$x_T = -(C_T' H_T C_T)^{-1} C_T' H_T A_T y_{T-1}$
(4) $x_T = G_T y_{T-1}$, where $G_T \equiv -(C_T' H_T C_T)^{-1} C_T' H_T A_T$
This is the control rule. Sub (4) into (3′) to get
(5) $J(y_{T-1}, T) = [(A_T + C_T G_T) y_{T-1}]' H_T [(A_T + C_T G_T) y_{T-1}] + \text{constant}$
constant $= E\, u_T' H_T u_T = E\, \mathrm{tr}(H_T u_T u_T') = \mathrm{tr}(H_T V_T)$
conclude: $J(y_{T-1}, T)$ is quadratic in $y_{T-1}$
$x_T$ is linear in $y_{T-1}$
"Guess" that $J(y_t, t+1)$ is of form
(6) $J(y_t, t+1) = [(A_{t+1} + C_{t+1} G_{t+1}) y_t]' H_{t+1} [(A_{t+1} + C_{t+1} G_{t+1}) y_t] + q_{t+1}$
and $x_{t+1} = G_{t+1} y_t$
If the guess is correct at time t+1 we can show by direct substitution that it is also correct at time t. (This means that we use the "guess" to solve the problem at time t, and we show that the solution is of the conjectured form.) We know that this guess is correct at time T. By induction, then, the guess is correct at all t.
Use DPE and state equation
$J(y_{t-1}, t) = \max_{x_t} E_{u_t} \Big\{ (A_t y_{t-1} + C_t x_t + u_t)' K_t (A_t y_{t-1} + C_t x_t + u_t)$
$\quad + [(A_{t+1} + C_{t+1} G_{t+1})(A_t y_{t-1} + C_t x_t + u_t)]' H_{t+1} [(A_{t+1} + C_{t+1} G_{t+1})(A_t y_{t-1} + C_t x_t + u_t)] + q_{t+1} \Big\}$
take expectations
(7) $J(y_{t-1}, t) = \max_{x_t} (A_t y_{t-1} + C_t x_t)' \left[ K_t + (A_{t+1} + C_{t+1} G_{t+1})' H_{t+1} (A_{t+1} + C_{t+1} G_{t+1}) \right] (A_t y_{t-1} + C_t x_t) + \text{constant}$.
(the "constant" does not involve x or y but changes over time)
define:
$H_t \equiv K_t + (A_{t+1} + C_{t+1} G_{t+1})' H_{t+1} (A_{t+1} + C_{t+1} G_{t+1})$
so (7) can be written
(9) $J(y_{t-1}, t) = \max_{x_t} (A_t y_{t-1} + C_t x_t)' H_t (A_t y_{t-1} + C_t x_t) + \text{constant}$
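Maximizing (9) with respect to $x_t$ repeats the period-T steps, so the control rule is $x_t = G_t y_{t-1}$ with $G_t = -(C_t' H_t C_t)^{-1} C_t' H_t A_t$. The sketch below iterates this backward pass. It assumes time-invariant A, C and K and a single control variable (so that $C'HC$ is a scalar); these restrictions and all function names are illustrative assumptions, not part of the notes.

```python
def transpose(M):
    return [list(row) for row in zip(*M)]


def matmul(X, Y):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*Y)] for row in X]


def add(X, Y):
    return [[a + b for a, b in zip(rx, ry)] for rx, ry in zip(X, Y)]


def scale(s, M):
    return [[s * a for a in row] for row in M]


def feedback_gain(A, C, H):
    """G = -(C'HC)^{-1} C'HA, assuming one control so that C'HC is 1x1."""
    CtH = matmul(transpose(C), H)
    denom = matmul(CtH, C)[0][0]
    return scale(-1.0 / denom, matmul(CtH, A))


def backward_pass(A, C, K, T):
    """Return ({t: G_t}, {t: H_t}) for t = T, ..., 1 with H_T = K."""
    H = K
    gains, hs = {}, {}
    for t in range(T, 0, -1):
        G = feedback_gain(A, C, H)
        gains[t], hs[t] = G, H
        M = add(A, matmul(C, G))  # closed-loop matrix A + C G_t
        # H_{t-1} = K + (A + C G_t)' H_t (A + C G_t)
        H = add(K, matmul(matmul(transpose(M), H), M))
    return gains, hs
```

For a maximization problem K should be negative definite so that the first order condition identifies a maximum. Each gain can be checked by confirming that $C' H_t (A + C G_t)$ is zero, which is the first order condition.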
$\pi\left(\{y_\tau\}_{\tau=t}^{T}\right) = \sum_{\tau=t}^{T} \beta^{\tau - t}\, y_\tau' Q y_\tau$
This is the present value of the stream of future profits. Suppose the decision maker has a CARA utility function, with risk parameter k > 0.
Risk aversion can be modeled using the linear-exponential-Gaussian (LEG) framework. The linear feedback law depends on the variance of the random variable. (Certainty equivalence does not hold.) One can also solve a problem in which the control appears linearly. Example: dynamic hedging with uncertain production; see Karp, Larry S. "Dynamic Hedging with Uncertain Production." International Economic Review, Vol. 29, No. 4 (1988) pp. 621-637.
(ii) Uncertain parameters, e.g. you know the mean and variance of matrices A and B (multiplicative rather than additive noise). You can solve this problem to get a linear feedback rule. Certainty equivalence does not hold.
Derivation of Dynamic Programming Equation (DPE), continuous time
Statement of the problem:
(1) $J(x, t) = \max_{\{u\}} \int_t^T f(x, u, \tau)\, d\tau + K(x, T)$
s.t. $\dot{x} = g(x, u, \tau)$, $x_t$ given
$J(\cdot)$ is the value function: the maximized value of the objective.
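$J(x, t) = \max_{\{u\}} \left\{ \int_t^{t+dt} f(\cdot)\, d\tau + \int_{t+dt}^{T} f(\cdot)\, d\tau + K(x, T) \right\}$
$\quad = \max_{\{u\}} \left\{ \int_t^{t+dt} f(\cdot)\, d\tau + J(x + dx, t + dt) \right\}$
(by Principle of Optimality)
expand around dt = 0
$J(\cdot) = \max_u \left\{ f(\cdot)\, dt + J(x, t) + \frac{\partial J}{\partial x} dx + \frac{\partial J}{\partial t} dt + \text{h.o.t.} \right\}$
cancel $J(x, t)$ from both sides, divide by dt and let dt go to 0, and recognize that $J_t$ is independent of the control u
(2) $-J_t = \max_u \left\{ f + \frac{\partial J}{\partial x}\, g(\cdot) \right\}$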
Equation (2) is the DPE.
(3) $H \equiv \max_u \left\{ f + \lambda g(\cdot) \right\}$
If the partial derivatives exist,
(4) $-J_t = H(t)$
(5) $J_x = \lambda$
Use (4) and (5) to develop heuristic argument for transversality conditions of Maximum Principle. (The point of this exercise is to help you develop intuition for the transversality conditions - to see why they make sense - and to help you remember them.)
Using (1), we have as a definition
(6) J (x, T) ≡ K (x, T)
Question: When is J differentiable in x and t at T? Suppose it is differentiable in x and T. Then (6) implies
(7) $J_x(x, T) = \dfrac{\partial K(x, T)}{\partial x}$
and (5) and (7) imply
$f(x, u, t) = e^{-rt} L(x, u)$
$\dot{x} = g(x, u)$
$T = \infty$
"guess" $J(x, t) = e^{-rt} V(x)$ ⇒
$J_t = -r e^{-rt} V(x)$
$J_x = e^{-rt} V_x(x)$
Note: V(x) does not depend on t
Sub above three equalities into (2), multiply by $e^{rt}$, ⇒
(11) $rV(x) = \max_u \left\{ L(x, u) + V_x(x)\, g(x, u) \right\}$
Here $V_x$ equals the current value costate variable. V(x) is the current value of the program, given x.
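A small numerical check of the stationary equation $rV(x) = \max_u \{L(x,u) + V_x(x) g(x,u)\}$ can make it concrete. The sketch below assumes an illustrative problem that is not taken from the notes: $L(x,u) = -(x^2 + u^2)$ and $g(x,u) = u$. Guessing $V(x) = -p x^2$, the first order condition gives $u = -p x$, and the equation holds when p is the positive root of $p^2 + rp - 1 = 0$. The function names are hypothetical.

```python
import math


def value_coefficient(r):
    """Positive root p of p**2 + r*p - 1 = 0, so that V(x) = -p * x**2."""
    return (-r + math.sqrt(r * r + 4.0)) / 2.0


def hjb_residual(x, r, u=None):
    """r*V(x) - [L(x,u) + V_x(x)*g(x,u)] for L = -(x^2 + u^2), g = u.

    With u=None the candidate control rule u = -p*x is used.
    """
    p = value_coefficient(r)
    if u is None:
        u = -p * x
    V = -p * x * x
    Vx = -2.0 * p * x
    return r * V - (-(x * x + u * u) + Vx * u)
```

At the candidate rule the residual is zero up to rounding, and for any other control it is positive, because the bracketed term is concave in u and is maximized at $u = -p x$.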
Remember that in section 3 we showed how to use the FOC from the Maximum Principle to write the optimal control as an ODE in the state variable. We are going to achieve the same result using DP.
Steps: (i) Write the FOC for (11) to obtain the optimal u as an implicit function of x and $V_x$. (ii) Differentiate the FOC to obtain an ODE for the optimal control u in x; this equation contains the terms $V_x$ and $V_{xx}$, which we want to eliminate. (iii) Differentiate the DPE, using the envelope theorem, to obtain an equation involving $V_x$ and $V_{xx}$. (iv) Use this equation, together with the FOC, to eliminate $V_x$ and $V_{xx}$ from the ODE for the control u.
Details are:
(12) $L_u(x, u) + V_x(x)\, g_u(x, u) = 0$
Differentiate (12)
(13) $L_{uu} \dfrac{du}{dx} + L_{ux} + V_{xx}\, g_u(x, u) + V_x\, g_{ux} + V_x\, g_{uu} \dfrac{du}{dx} = 0$
Differentiate (11) using envelope theorem
(14) $rV_x = L_x(x, u) + V_{xx}\, g(x, u) + V_x\, g_x(x, u)$
Equations (12) and (14) can be used to eliminate $V_x$ and $V_{xx}$ from (13), resulting in a first order ODE for u in x.
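The elimination can be written out explicitly. The following is a sketch under the assumptions that $g \neq 0$ (away from the steady state), $g_u \neq 0$, and $L_{uu} + V_x g_{uu} \neq 0$; it uses only (12), (13) and (14).

```latex
\begin{aligned}
V_x &= -\frac{L_u}{g_u}
  && \text{from (12)}, \\
V_{xx} &= \frac{(r - g_x)\,V_x - L_x}{g}
  && \text{from (14)}, \\
\frac{du}{dx} &= -\,\frac{L_{ux} + V_x\, g_{ux}
  + \dfrac{g_u}{g}\big[(r - g_x)\,V_x - L_x\big]}{L_{uu} + V_x\, g_{uu}}
  && \text{from (13), with } V_x = -\frac{L_u}{g_u}.
\end{aligned}
```

The right-hand side depends only on x and u, so this is a first order ODE for u in x.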
In order to complete the solution, we need a boundary condition. This is given by the (optimal) steady state. (Remember we were able to obtain the same ODE for u in x using a slightly different approach, with the Maximum Principle.) At the steady state $g(\cdot) = 0$ and (14) imply $[r - g_x] V_x = L_x$, and (12) implies $V_x = -L_u / g_u$. Using these two equations gives $L_x = -(r - g_x) L_u / g_u$. This equation and g = 0 comprise two equations in two unknowns, the steady state values of x and u.
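As an illustration of the two steady state equations, consider a hypothetical example that is not from the notes: $L(x, u) = \ln u$ and $g(x, u) = x(1 - x) - u$, with $0 < r < 1$.

```latex
\begin{aligned}
& L_x = 0, \quad L_u = \frac{1}{u}, \quad g_u = -1, \quad g_x = 1 - 2x, \\
& L_x = -\frac{(r - g_x)\,L_u}{g_u}
  \;\Rightarrow\; 0 = \frac{r - (1 - 2x)}{u}
  \;\Rightarrow\; x^* = \frac{1 - r}{2}, \\
& g(x^*, u^*) = 0
  \;\Rightarrow\; u^* = x^*(1 - x^*) = \frac{1 - r^2}{4}.
\end{aligned}
```

In this example the steady state stock is the one at which the marginal growth rate $g_x$ equals the discount rate r.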
The problem:
$\max_{x_t} \int_0^T \dfrac{y'Qy + x'Rx}{2}\, dt + \dfrac{y'(T)\, Q(T)\, y(T)}{2}$
(Q, R, A, and B may depend on time)
(1) s.t. $\dot{y} = Ay + Bx$, $y(0) = y_0$
Note that this problem has no inequality constraints. The introduction of such constraints means that the problem is no longer linear-quadratic. We will solve this problem using the Maximum Principle and then DP.
Solution using Maximum Principle: derivation of the Riccati matrix equation.
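$H = \tfrac{1}{2}\left( y'Qy + x'Rx \right) + \lambda'(Ay + Bx)$
$\dfrac{\partial H}{\partial x} = Rx + B'\lambda = 0 \Rightarrow x = -R^{-1} B' \lambda$
$\dfrac{\partial H}{\partial y} = Qy + A'\lambda = -\dot{\lambda}$
(4a) Try $\lambda = Sy$ ⇒ (4b) $\dot{\lambda} = \dot{S} y + S \dot{y}$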