Dynamic Programming - Methods of Dynamic Analysis and Control - Lecture Notes, Study notes of Dynamics

Cochin University of Science and Technology Dynamics

The major topics of the Methods of Dynamic Analysis and Control course are dynamic programming, the linear-quadratic model, continuous time dynamics, the Maximum Principle, event uncertainty, time consistency, discounting, and others. The major points of this lecture are: Dynamic Programming, Linear Quadratic, Dynamic Programming Equation, Familiar Static Problem, Control Rule, Arbitrary Time, Environmental State, Linear-Quadratic Control, International Economic, Infinite Horizon Autonomous Problem

Typology: Study notes

2012/2013

Uploaded on 08/30/2013

faroq 🇮🇳


V. Dynamic Programming

1) The basic idea of dynamic programming (discrete time).

2) The linear quadratic (LQ) discrete time control problem with additive errors.

3) Two problems related to the LQ problem.

4) Derivation of continuous time dynamic programming equation (DPE).

5) Relation between DP and maximum principle.

6) DPE for autonomous problem.

7) Writing optimal control rule as an ODE in the state.

8) LQ continuous time problem.



The basic idea of using dynamic programming is to convert a dynamic problem into a succession of static problems. We can illustrate this using the following problem:

Choose a sequence of $u_t$, i.e. $\{u_t\}_{t=1}^{T}$, to max $-\sum_{i=1}^{T}(x_i^2+u_i^2)$ subject to $x_i = x_{i-1}+u_i$, $x_0$ given.

x is the state variable and u is the control. We solve this problem by "working backwards" from the final time, T. At i = T the problem is simply:

choose $u_T$ to max $-[(x_{T-1}+u_T)^2+u_T^2]$,

obtained by substituting in the constraint. We can solve this problem (easily!) for arbitrary values of $x_{T-1}$ and obtain the optimal control $u^*(x_{T-1},T)$. This function (the solution to the optimization problem) is called the control rule. It is in "feedback form", i.e., it is expressed as a function of the state. Substituting this control rule into the maximand, we obtain the value function $J(x_{T-1},T)$. Both the control rule and value function have two arguments, the state variable and calendar time.
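
As a worked illustration of this final-period step (a sketch that only uses the problem stated above), the first-order condition gives the control rule and, after substitution, the value function:

```latex
\begin{align*}
\frac{\partial}{\partial u_T}\Bigl\{-\bigl[(x_{T-1}+u_T)^2+u_T^2\bigr]\Bigr\}
  &= -2(x_{T-1}+u_T)-2u_T = 0
  \;\Longrightarrow\; u^*(x_{T-1},T) = -\tfrac{1}{2}\,x_{T-1},\\
J(x_{T-1},T)
  &= -\Bigl[\bigl(\tfrac{1}{2}x_{T-1}\bigr)^2+\bigl(\tfrac{1}{2}x_{T-1}\bigr)^2\Bigr]
   = -\tfrac{1}{2}\,x_{T-1}^2 .
\end{align*}
```

The control rule is linear in the state and the value function is quadratic in the state, which is the pattern that persists in earlier periods.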

Now we step back a period, to time T-1. At that time we know the previous value of the state, $x_{T-2}$, and we know how we will behave in the future, conditional on the state. We can therefore write the problem at T-1 as

choose $u_{T-1}$ to max $-[(x_{T-2}+u_{T-1})^2+u_{T-1}^2]+J(x_{T-2}+u_{T-1},\,T)$.

Note that this is a familiar static problem. At time T-1 I have a 2 period (i.e. dynamic) problem, but I have converted it to a static problem. Notice also that I substituted the constraint into the period T-1 payoff and the value function at T.

I can solve the problem at T-1 and obtain the control rule, $u^*(x_{T-2},T-1)$. Again, note that this control rule is conditional upon $x_{T-2}$. I need to know the value of $x_{T-2}$ to find the value of $u_{T-1}$, but I do not need the value of $x_{T-2}$ to find the control rule $u^*(x_{T-2},T-1)$. Note the distinction between a function (the control rule) and the value of the function. Substitute the control rule into the maximand to obtain the value function at T-1:

$$J(x_{T-2},T-1)=\max_{u_{T-1}}\bigl\{-[(x_{T-2}+u_{T-1})^2+u_{T-1}^2]+J(x_{T-2}+u_{T-1},T)\bigr\}=-[(x_{T-2}+u^*_{T-1})^2+u^{*2}_{T-1}]+J(x_{T-2}+u^*_{T-1},T).$$

I got rid of the max operator by substituting in the optimal control. The reason that I can actually solve this problem, of course, is that I know the function $J(x_{T-1},T)$.
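
To make the T-1 step concrete, here is a sketch of the calculation. It takes as given that the final-period problem yields $J(x_{T-1},T) = -\tfrac{1}{2}x_{T-1}^2$ (this follows from the final-period first-order condition, $u^*_T = -\tfrac{1}{2}x_{T-1}$):

```latex
\begin{align*}
\max_{u_{T-1}}\;& -\bigl[(x_{T-2}+u_{T-1})^2+u_{T-1}^2\bigr]-\tfrac{1}{2}(x_{T-2}+u_{T-1})^2
  \;=\; \max_{u_{T-1}}\; -\bigl[\tfrac{3}{2}(x_{T-2}+u_{T-1})^2+u_{T-1}^2\bigr],\\
\text{FOC: }\;& 3(x_{T-2}+u_{T-1})+2u_{T-1}=0
  \;\Longrightarrow\; u^*(x_{T-2},T-1) = -\tfrac{3}{5}\,x_{T-2},\\
J(x_{T-2},T-1) &= -\Bigl[\tfrac{3}{2}\bigl(\tfrac{2}{5}\bigr)^2+\bigl(\tfrac{3}{5}\bigr)^2\Bigr]x_{T-2}^2
  = -\tfrac{3}{5}\,x_{T-2}^2 .
\end{align*}
```

The coefficient changes from 1/2 with one period to go to 3/5 with two periods to go, which illustrates how the control rule and value function depend on calendar time.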

I can keep "stepping back" in time in this manner. At arbitrary time t, I can write the problem as

$$J(x_{t-1},t)=\max_{u_t}\bigl\{-[x_t^2+u_t^2]+J(x_t,\,t+1)\bigr\},\quad\text{subject to } x_t=x_{t-1}+u_t,\; x_{t-1}\text{ given.}$$

The last equation is the Dynamic Programming Equation for the original optimal control problem. You should be able to actually solve this problem. Solving the problem means finding the control rules and value functions. You will find that the control rule is linear in the state and the value function is linear-quadratic in the state. The coefficients in these functions depend on the number of "periods to go", T-t.
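
The backward recursion can also be sketched in a few lines of Python. This is an illustration under stated assumptions, not part of the original notes: it conjectures $J(x_{t-1},t) = -p_t x_{t-1}^2$ (for this particular problem there is no linear or constant term), and the names `backward_induction` and `simulate` are hypothetical. Exact fractions are used so the coefficients are easy to read.

```python
from fractions import Fraction


def backward_induction(T):
    """Return {t: p_t} for t = 1..T, where J(x_{t-1}, t) = -p_t * x_{t-1}**2.

    For this problem the control rule has the same coefficient:
    u*_t = -p_t * x_{t-1}.
    """
    p = {T + 1: Fraction(0)}  # no payoff after the final period
    for t in range(T, 0, -1):
        # Weight on x_t**2: current payoff (1) plus continuation value (p_{t+1}).
        a = 1 + p[t + 1]
        # Static problem at t: max over u of -[a*(x + u)**2 + u**2].
        # FOC: a*(x + u) + u = 0  =>  u = -a/(1 + a) * x,
        # and the maximized value is -a/(1 + a) * x**2.
        p[t] = a / (1 + a)
    del p[T + 1]
    return p


def simulate(x0, T):
    """Apply the feedback rules forward from x0; return the path and total payoff."""
    p = backward_induction(T)
    x, payoff, path = Fraction(x0), Fraction(0), []
    for t in range(1, T + 1):
        u = -p[t] * x          # control rule evaluated at the current state
        x = x + u              # state equation x_t = x_{t-1} + u_t
        payoff -= x**2 + u**2  # period payoff -(x_t^2 + u_t^2)
        path.append((t, u, x))
    return path, payoff
```

For a two-period problem, `backward_induction(2)` gives coefficient 1/2 in the last period and 3/5 one period earlier. Simulating the rules forward from `x0` reproduces the value function at the initial state, `-p[1] * x0**2`, which is a useful consistency check.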

You should also be able to write down the dynamic programming equation for any discrete time control problem. (Exam question.)

Exercises:

1) Solve the control problem described above for T = 3.
2) An environmental state can take two values. The flow of damages in a good state is $D_g = 0$ and the flow of damages in a bad state is $D_b > 0$. The amount of abatement in period t is $x_t$ and the cost of abatement is $c(x_t)$, where c is increasing and convex. The probability that the state changes from good to bad is $p(x_t)$. If the state is bad, it never changes back to good. The discount factor is $\beta = 1/(1+r)$, and the objective is to minimize the expected present discounted value of total costs (abatement costs and environmental damages). Write down the dynamic programming equation and the first order condition for optimality of x in a good state. How does the optimal level of emissions (in a good state) depend on $\beta$?
3) Suppose that emissions costs are $c(x,\theta) = (a+\theta)x + bx^2/2$, where $\theta$ is iid with mean 0 and standard deviation $\sigma$. The regulator knows $\sigma$ but does not observe the value of $\theta$ in any period. The regulator can choose (i) a quota, i.e., the level of $x_t$ directly, or (ii) a subsidy $\tau_t$. If he chooses a subsidy, firms choose abatement to minimize $c(x,\theta) - \tau x$. (In each period, the government announces $\tau$ and firms choose abatement after observing the realization of $\theta$.) The subsidy is a pure transfer. Which policy gives the government a higher expected payoff? Make any assumptions you want.

(3′) $J(y_{T-1},T)=\max_{x_T}\,(A_Ty_{T-1}+C_Tx_T)'H_T(A_Ty_{T-1}+C_Tx_T)+E_{u_T}\,u_T'H_Tu_T$

where $H_T \equiv K_T$

terms that involve $x_T$ are:

$$x_T'C_T'H_TC_Tx_T+2x_T'C_T'H_TA_Ty_{T-1}$$

max w.r.t. $x_T$. The F.O.C. is

$$2C_T'H_T(A_Ty_{T-1}+C_Tx_T)=0\;\Rightarrow\; x_T=-(C_T'H_TC_T)^{-1}C_T'H_TA_Ty_{T-1}$$

Define $G_T \equiv -(C_T'H_TC_T)^{-1}C_T'H_TA_T$, so that

(4) $x_T=G_Ty_{T-1}$

This is the control rule. Sub (4) into (3′) to get

(5) $J(y_{T-1},T)=[(A_T+C_TG_T)y_{T-1}]'H_T[(A_T+C_TG_T)y_{T-1}]+\text{constant}$

$$\text{constant}=E\,u_T'H_Tu_T=E\,\mathrm{tr}(H_Tu_Tu_T')=\mathrm{tr}(H_TV_T)$$
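
The chain of equalities for the constant uses the standard trace argument. A short derivation follows, assuming (the definition is not in the visible text) that $V_T$ denotes $E[u_Tu_T']$, the covariance matrix of the zero-mean error $u_T$:

```latex
\begin{align*}
E\,[u_T' H_T u_T]
  &= E\,[\operatorname{tr}(u_T' H_T u_T)] && \text{(a scalar equals its own trace)}\\
  &= E\,[\operatorname{tr}(H_T u_T u_T')] && \text{(cyclic property of the trace)}\\
  &= \operatorname{tr}\bigl(H_T\,E[u_T u_T']\bigr) = \operatorname{tr}(H_T V_T) && \text{(linearity of } E \text{ and } \operatorname{tr}\text{)}
\end{align*}
```

The constant depends on the error variance but not on $x_T$ or $y_{T-1}$, so it does not affect the control rule.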

conclude: $J(y_{T-1},T)$ is quadratic in $y_{T-1}$

$x_T$ is linear in $y_{T-1}$

"Guess" that $J(y_t,t+1)$ is of form

(6) $J(y_t,t+1)=[(A_{t+1}+C_{t+1}G_{t+1})y_t]'H_{t+1}[(A_{t+1}+C_{t+1}G_{t+1})y_t]+q_{t+1}$

and $x_{t+1}=G_{t+1}y_t$

If the guess is correct at time t+1 we can show by direct substitution that it is also correct at time t. (This means that we use the "guess" to solve the problem at time t, and we show that the solution is of the conjectured form.) We know that this guess is correct at time T. By induction, then, the guess is correct at all t.

Use DPE and state equation

$$J(y_{t-1},t)=\max_{x_t}E_{u_t}\Bigl\{(A_ty_{t-1}+C_tx_t+u_t)'K_t(A_ty_{t-1}+C_tx_t+u_t)+[(A_{t+1}+C_{t+1}G_{t+1})(A_ty_{t-1}+C_tx_t+u_t)]'H_{t+1}[(A_{t+1}+C_{t+1}G_{t+1})(A_ty_{t-1}+C_tx_t+u_t)]+q_{t+1}\Bigr\}$$

take expectations

(7) $J(y_{t-1},t)=\max_{x_t}\,(A_ty_{t-1}+C_tx_t)'\bigl[K_t+(A_{t+1}+C_{t+1}G_{t+1})'H_{t+1}(A_{t+1}+C_{t+1}G_{t+1})\bigr](A_ty_{t-1}+C_tx_t)+\text{constant}$

(the "constant" does not involve x or y but changes over time)

define:

$$H_t\equiv K_t+(A_{t+1}+C_{t+1}G_{t+1})'H_{t+1}(A_{t+1}+C_{t+1}G_{t+1})$$

so (7) can be written

(9) $J(y_{t-1},t)=\max_{x_t}\,(A_ty_{t-1}+C_tx_t)'H_t(A_ty_{t-1}+C_tx_t)+\text{constant}$
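
Because (9) has the same form as the final-period problem, the feedback matrices can be computed by a backward recursion. The following is a minimal sketch under assumptions that are not in the original notes: the matrices A, C, K are time-invariant, the control is a scalar (so $C'H_tC$ is a number and no matrix inverse routine is needed), and K is negative definite so that the maximization is well defined. The function names are hypothetical. It iterates $G_t = -(C'H_tC)^{-1}C'H_tA$ and $H_{t-1} = K + (A+CG_t)'H_t(A+CG_t)$, starting from $H_T = K$.

```python
def matmul(X, Y):
    """Product of two matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]


def transpose(X):
    return [list(row) for row in zip(*X)]


def add(X, Y):
    return [[a + b for a, b in zip(r, s)] for r, s in zip(X, Y)]


def lq_backward(A, C, K, T):
    """Backward recursion for the discrete LQ problem (scalar control).

    A: n x n, C: n x 1, K: n x n (all assumed time-invariant here).
    Returns dicts G and H indexed by t = 1..T, with control rule
    x_t = G[t] y_{t-1} and value function quadratic form H[t].
    """
    G, H = {}, {T: K}                   # H_T = K_T
    for t in range(T, 0, -1):
        CtH = matmul(transpose(C), H[t])      # C' H_t, a 1 x n row
        chc = matmul(CtH, C)[0][0]            # scalar C' H_t C
        cha = matmul(CtH, A)[0]               # row vector C' H_t A
        G[t] = [[-v / chc for v in cha]]      # G_t = -(C'H_tC)^{-1} C'H_tA
        if t > 1:
            M = add(A, matmul(C, G[t]))       # closed-loop matrix A + C G_t
            H[t - 1] = add(K, matmul(matmul(transpose(M), H[t]), M))
    return G, H
```

With, for example, `A = [[1.0, 0.5], [0.0, 1.0]]`, `C = [[1.0], [0.0]]` and `K = [[-1.0, 0.0], [0.0, -1.0]]`, the control only affects the first state component, and the rule sets that component's expected value to zero each period. Note that the additive error does not enter the recursion for G, which is the certainty equivalence property of this problem; the error variance only affects the constant.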

$$\pi\bigl(\{y_\tau\}_{\tau=t}^{T}\bigr)=\sum_{\tau=t}^{T}\beta^{\tau-t}\,y_\tau'Qy_\tau$$

This is the present value of the stream of future profits. Suppose the decision maker has a CARA utility function, with risk parameter k > 0.

We can model risk aversion using LEG (linear-exponential-Gaussian) control. The linear feedback law depends on the variance of the random variable. (Certainty equivalence does not hold.) We can also solve a problem in which the control appears linearly. Example: dynamic hedging with uncertain production; see Karp, Larry S. "Dynamic Hedging with Uncertain Production." International Economic Review, Vol. 29, No. 4 (1988) pp. 621-637.

(ii) Uncertain parameters, e.g. you know the mean and variance of matrices A and B (multiplicative rather than additive noise). You can solve this problem to get a linear feedback rule. Certainty equivalence does not hold.

Derivation of Dynamic Programming Equation (DPE), continuous time

Statement of the problem:

(1) $J(x,t)=\max_{\{u\}}\displaystyle\int_t^T f(x,u,\tau)\,d\tau+K(x,T)$

s.t. $\dot{x}=g(x,u,\tau)$, $x_t$ given

$J(\cdot)$ is the value function: the maximized value of the objective.

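$$J(x,t)=\max_{\{u\}}\left[\int_t^{t+dt}f(\cdot)\,d\tau+\int_{t+dt}^{T}f(\cdot)\,d\tau+K(x,T)\right]$$

$$=\max_{\{u\}}\left[\int_t^{t+dt}f(\cdot)\,d\tau+J(x+dx,\,t+dt)\right]$$

(by Principle of Optimality)

expand around dt = 0

$$J(\cdot)=\max_u\left[f(\cdot)\,dt+J(x,t)+\frac{\partial J}{\partial x}dx+\frac{\partial J}{\partial t}dt+\text{h.o.t.}\right]$$

cancel $J(x,t)$ from both sides, divide by dt (using $dx/dt=g(\cdot)$), let dt go to 0, and recognize that $J_t$ is independent of control u

(2) $-J_t=\max_u\left[f+\dfrac{\partial J}{\partial x}\,g(\cdot)\right]$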

Equation (2) is the DPE.

Relation between D.P. and Maximum Principle

(3) $H=\max_u\,[f+\lambda g(\cdot)]$

If the partial derivatives exist,

(4) $-J_t=H(t)$

(5) $J_x=\lambda$

Use (4) and (5) to develop heuristic argument for transversality conditions of Maximum Principle. (The point of this exercise is to help you develop intuition for the transversality conditions - to see why they make sense - and to help you remember them.)

Using (1), we have as a definition

(6) J (x, T) ≡ K (x, T)

Question: When is J differentiable in x and t at T? Suppose it is differentiable in x and T. Then (6) implies

(7) $J_x(x,T)=\dfrac{\partial K(x,T)}{\partial x}$

and (5) and (7) imply

Infinite horizon autonomous problem

$f(x,u,t)=e^{-rt}L(x,u)$

$\dot{x}=g(x,u)$

$T=\infty$

"guess" $J(x,t)=e^{-rt}V(x)$ ⇒

$J_t=-re^{-rt}V(x)$

$J_x=e^{-rt}V_x(x)$

Note: V(x) does not depend on t

Sub above three equalities into (2), multiply by $e^{rt}$ ⇒

(11) $rV(x)=\max_u\,[L(x,u)+V_x(x)\,g(x,u)]$

Here $V_x$ equals the current value costate variable. V(x) is the current value of the program, given x.
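
A quick numerical check of the autonomous DPE may help. The example below is an assumption chosen for illustration, not taken from the notes: $L(x,u) = -(x^2+u^2)$ and $g(x,u) = u$, the continuous-time analogue of the discrete example. Guessing $V(x) = -px^2$, the first-order condition gives $u = -px$ and the DPE reduces to $p^2 + rp - 1 = 0$. The hypothetical function `hjb_residual` evaluates $rV(x) - \max_u[L + V_x g]$ using a grid search over u, so the residual is close to zero only up to grid error.

```python
import math


def hjb_residual(x, r, n_grid=20001, u_max=5.0):
    """Return r*V(x) - max_u [L(x,u) + V_x(x) g(x,u)] for the assumed example.

    Assumed primitives: L(x, u) = -(x**2 + u**2), g(x, u) = u.
    Conjectured value function: V(x) = -p * x**2 with p the positive
    root of p**2 + r*p - 1 = 0.
    """
    p = (-r + math.sqrt(r * r + 4.0)) / 2.0
    V = -p * x * x
    Vx = -2.0 * p * x
    # Grid of candidate controls on [-u_max, u_max].
    grid = (-u_max + 2.0 * u_max * i / (n_grid - 1) for i in range(n_grid))
    best = max(-(x * x + u * u) + Vx * u for u in grid)
    return r * V - best
```

Calling `hjb_residual(1.0, 0.05)` should return a number that is zero up to the grid resolution, provided the maximizer $u = -px$ lies inside the search interval. With r = 0 the root is p = 1, so the control rule is $u = -x$.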

Writing the optimal control as an ODE in x

Remember that in section 3 we showed how to use the FOC from the Maximum Principle to write the optimal control as an ODE in the state variable. We are going to achieve the same result using DP.

Steps: (i) Write the FOC for (11) to obtain the optimal u as an implicit function of x and $V_x$. (ii) Differentiate the FOC to obtain an ODE for the optimal control u in x; this equation contains the terms $V_x$ and $V_{xx}$, which we want to eliminate. (iii) Differentiate the DPE, using the envelope theorem, to obtain an equation involving $V_x$ and $V_{xx}$. (iv) Use this equation, together with the FOC, to eliminate $V_x$ and $V_{xx}$ from the ODE for the control u.

Details are:

(12) $L_u(x,u)+V_x(x)\,g_u(x,u)=0$

Differentiate (12) with respect to x

(13) $L_{uu}\dfrac{du}{dx}+L_{ux}+V_{xx}\,g_u(x,u)+V_x\,g_{ux}+V_x\,g_{uu}\dfrac{du}{dx}=0$

Differentiate (11) using envelope theorem

(14) $rV_x=L_x(x,u)+V_{xx}\,g(x,u)+V_x\,g_x(x,u)$

Equations (12) and (14) can be used to eliminate $V_x$ and $V_{xx}$ from (13), resulting in a first order ODE for u in x.

In order to complete the solution, we need a boundary condition. This is given by the (optimal) steady state. (Remember we were able to obtain the same ODE for u in x using a slightly different approach, with the Maximum Principle.) At the steady state, $g(\cdot)=0$, so (14) implies $[r-g_x]V_x=L_x$, and (12) implies $V_x=-L_u/g_u$. Combining these two equations gives $L_x=-(r-g_x)L_u/g_u$. This equation and $g=0$ comprise two equations in two unknowns, the steady state values of x and u.
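
As a worked illustration of steps (i) to (iv) and of the steady state condition, consider an example that is an assumption for illustration only (it is not in the notes): $L(x,u) = -(x^2+u^2)$ and $g(x,u) = u$, so that $L_u=-2u$, $L_{uu}=-2$, $L_x=-2x$, $L_{ux}=0$, $g_u=1$, and $g_x=g_{uu}=g_{ux}=0$.

```latex
\begin{align*}
\text{(12):}\quad & -2u + V_x = 0 \;\Longrightarrow\; V_x = 2u,\\
\text{(13):}\quad & -2\,\frac{du}{dx} + V_{xx} = 0,\\
\text{(14):}\quad & rV_x = -2x + V_{xx}\,u \;\Longrightarrow\; V_{xx} = \frac{2ru+2x}{u},\\
\text{ODE:}\quad & \frac{du}{dx} = \frac{ru+x}{u},\\
\text{steady state:}\quad & L_x = -(r-g_x)\frac{L_u}{g_u} \;\Longrightarrow\; -2x = 2ru,
  \qquad g = u = 0 \;\Longrightarrow\; (x,u)=(0,0).
\end{align*}
```

Trying a linear rule $u=-px$ in the ODE gives $p^2+rp-1=0$. The positive root, $p = \bigl(-r+\sqrt{r^2+4}\bigr)/2$, gives the trajectory that converges to the steady state, since then $\dot{x} = u = -px$.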

LQ continuous time problem

The problem:

$$\max_{x_t}\;-\frac{1}{2}\int_0^T\bigl(y'Qy+x'Rx\bigr)\,dt-\frac{1}{2}\,y'(T)Q(T)y(T)$$

(Q, R, A, and B may depend on time)

(1) s.t. $\dot{y}=Ay+Bx$, $y(0)=y_0$

Note that this problem has no inequality constraints. The introduction of such constraints means that the problem is no longer linear-quadratic. We will solve this problem using the Maximum Principle and then DP.

Solution using Maximum Principle: derivation of the Riccati matrix equation.

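$$H=-\frac{1}{2}\bigl(y'Qy+x'Rx\bigr)+\lambda'(Ay+Bx)$$

(2) $\dfrac{\partial H}{\partial x}=-Rx+B'\lambda=0\;\Rightarrow\;x=R^{-1}B'\lambda$

(3) $\dfrac{\partial H}{\partial y}=-Qy+A'\lambda=-\dot{\lambda}$

(4a) Try $\lambda=Sy$ ⇒ (4b) $\dot{\lambda}=\dot{S}y+S\dot{y}$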
