Dynamic Programming: Techniques for Maximizing Utility with Finite & Infinite Horizons, Assignments of Introduction to Business Management

An introduction to dynamic programming, a technique for solving optimization problems that involve decisions over time. The notes cover both finite and infinite horizon problems, focusing on maximizing expected utility. They explain Bellman equations and the principle of optimality and work through examples of their application. Students of economics, operations research, or related fields will find this document useful for understanding dynamic programming concepts.

Typology: Assignments

Pre 2010

Uploaded on 09/02/2009

koofers-user-o2g


8 Dynamic Programming

We are going to discuss multiperiod models that are more general than the CAPM, the APT, and the arbitrage results that we have already studied. In order to study these models, we need to understand a technique called dynamic programming. Dynamic programming is used to solve problems that involve optimization over time. It has been used extensively by economists in all fields. Our optimization problem involves choosing consumption through time to maximize expected utility, so we will use dynamic programming to solve it.

You can read Kreps' appendix (distributed in class) to gain some intuition for the dynamic programming technique. Kreps gives several references to more advanced texts on dynamic methods. If you are interested in working with these types of models, you will probably want to consult a more complete text. There are several alternatives to dynamic programming for solving multiperiod problems. For example, there is a technique known as optimal control that is roughly a continuous-time counterpart to dynamic programming.

Dynamic programming problems can always be categorized as either finite or infinite horizon problems. While the techniques for solving these two types of problems are somewhat different, the inferences obtained from the two types are usually the same. For simplicity, we will discuss the finite horizon case first. After understanding finite horizon problems, we will examine the implications of allowing the horizon to go to infinity.

Dynamic programming works well for problems in which agents make their decisions based on just a few variables that we will call state variables. In statistical terms, it works well when just a few state variables are sufficient statistics for predicting the future. It is fairly common in dynamic programming models to assume that all state variables are current values, such as current wealth or current prices. When only current state variables are assumed to matter, we say that the model is a Markovian model. A Markov process is a stochastic process that satisfies

$$\text{Markov process:}\quad f(x_t \mid x_{t-1}, x_{t-2}, \ldots) = f(x_t \mid x_{t-1}). \tag{137}$$
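To make the Markov property concrete, here is a minimal sketch in Python. The two-state chain, its state names, and its transition probabilities are hypothetical and chosen only for illustration. The point is that the forecast of the next state uses only the most recent state, so two histories that end in the same state give the same forecast.

```python
# Hypothetical two-state chain; the transition probabilities are made up.
P = {
    "low":  {"low": 0.9, "high": 0.1},
    "high": {"low": 0.3, "high": 0.7},
}

def next_state_distribution(history):
    """Distribution of x_t given the history (oldest state first).

    Under the Markov property only the most recent state matters,
    so the rest of the history is ignored.
    """
    current = history[-1]
    return P[current]

# Two different histories that end in the same state give the same forecast.
d1 = next_state_distribution(["low", "low", "high"])
d2 = next_state_distribution(["high", "low", "high"])
```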

8.1 Dynamic Programming with a Finite Horizon

We need to set up some notation before proceeding with our discussion of dynamic programming. Let $I = \{1, 2, 3, \ldots\}$ be the set of possible future states and let $A$ be a finite set of feasible actions that you can take. Define $R(i, a)$ as the expected current reward when the state is $i \in I$ and the action chosen is $a \in A$. Define the value function, $V_n(i)$, as the maximum attainable sum of expected current and future rewards when $n$ periods remain and the current state is $i \in I$.

In finite horizon problems, we always start with optimization in the last period and then work backwards to get to the present decision. We begin by thinking about what the value function will be when we have just one period left,

$$V_1(i) = \max_{a \in A} R(i, a). \tag{138}$$

The optimal policy in the last period is to just maximize your reward given the state. Now let $p_{ij}(a)$ equal the probability that state $j$ occurs next period given that state $i$ describes today and that you choose action $a$. We can express your value function with two periods left as a function of your final value function,

$$V_2(i) = \max_{a \in A} \Big\{ R(i, a) + \sum_j p_{ij}(a)\, V_1(j) \Big\}. \tag{139}$$

If we define $a_2(i)$ as your optimal policy when the current state is $i$ and you have two periods to go, then we want to find a function, $a_2(i)$, that solves (139). We can do this by first finding the optimal policy with one period left, $a_1(i)$, by solving (138). Second, we plug our values for $a_1(i)$ into (139), and in a third step we solve (139). This is what

for all future points in time then the particular strategy is optimal. Kreps motivates this with a little math. He says we can easily convert the problem

$$\max_{x, y} f(x, y) \tag{143}$$

into the equivalent problem

$$\max_x \left[ \max_y f(x, y) \right]. \tag{144}$$
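The equivalence of (143) and (144) can be checked numerically on a grid. The sketch below is only an illustration: the objective `f` and the grid are hypothetical choices, not taken from Kreps. It computes the joint maximum and the nested maximum over the same set of points.

```python
def f(x, y):
    # Hypothetical objective chosen only for illustration.
    return -(x - 1) ** 2 - (y + 2) ** 2 + x * y

# Hypothetical grid of candidate values for both x and y.
grid = [i / 10 for i in range(-50, 51)]

# Problem (143): maximize over x and y jointly.
joint_max = max(f(x, y) for x in grid for y in grid)

# Problem (144): for each x, maximize over y first, then maximize over x.
nested_max = max(max(f(x, y) for y in grid) for x in grid)
```

Both computations search the same set of function values, so they return the same number. The nested form is the one that carries over to the Bellman equation, where the inner maximization is the future value function.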

This mathematical operation is essentially what we have done above in converting the value equation, (141), into the Bellman equation, (142). We use the Bellman equation to learn about the optimal policy function, $a_n(i)$. Once again, the optimal policy function is a rule that describes the optimal choice of action when the state is $i$ and there are $n$ periods left. There are three principal ways to use Bellman equations:

  1. We can sometimes use Bellman equations to obtain explicit analytic solutions for $V_n(i)$ and $a_n(i)$.
  2. We often use Bellman equations to characterize the properties of $V_n(i)$ and $a_n(i)$.
  3. We can sometimes solve for $V_n(i)$ and $a_n(i)$ numerically. We can always do this in principle, but some problems become too large to be tractable.
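The third use, numerical solution, can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not a definitive implementation. It assumes the recursion generalizes (138) and (139) to $V_n(i) = \max_{a \in A} \{ R(i,a) + \sum_j p_{ij}(a) V_{n-1}(j) \}$ with $V_0 = 0$, and the function name, data layout, and the small two-state problem are all hypothetical.

```python
def backward_induction(R, P, n_periods):
    """Return value functions and policies for 1..n_periods periods to go.

    R[i][a]    : expected current reward in state i under action a
    P[a][i][j] : probability of moving from state i to state j under action a
    """
    n_states = len(R)
    n_actions = len(R[0])
    V = [[0.0] * n_states]   # V[0]: value with no periods left (assumed zero)
    policy = [None]          # no decision is made with zero periods left
    for n in range(1, n_periods + 1):
        V_n, a_n = [], []
        for i in range(n_states):
            # Current reward plus expected value of next period's state.
            candidates = [
                R[i][a] + sum(P[a][i][j] * V[n - 1][j] for j in range(n_states))
                for a in range(n_actions)
            ]
            best = max(range(n_actions), key=lambda a: candidates[a])
            V_n.append(candidates[best])
            a_n.append(best)
        V.append(V_n)
        policy.append(a_n)
    return V, policy

# Hypothetical two-state, two-action problem (all numbers are made up).
R = [[1.0, 0.5],
     [0.0, 2.0]]
P = [
    [[1.0, 0.0], [0.0, 1.0]],   # action 0: stay in the current state
    [[0.0, 1.0], [1.0, 0.0]],   # action 1: switch states
]
V, policy = backward_induction(R, P, n_periods=2)
```

Here `V[1]` reproduces (138) and `V[2]` reproduces (139). The work grows with the number of states times the number of actions times the number of periods, which is why large problems become intractable.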

The Bellman equation is a fundamental building block in a dynamic programming model. Deriving the appropriate Bellman equation is usually the first step in analyzing a dynamic, finite horizon model. Once we have a Bellman equation, we typically look at the first order conditions that solve the equation's maximization problem. These first order conditions are often rich enough to provide us with the elements of an interesting model. We also use a condition called an envelope condition at times. Envelope conditions are derived by applying the envelope theorem. The envelope theorem can be understood as follows.

Suppose that we want to maximize $f(x, a)$ over $x$. We can think of $a$ as being a state variable and $x$ as being a choice variable. For every value of $a$ in this problem there will be a maximizing value of $x$. In what Varian calls "sufficiently regular" cases, we can think of defining a function, $x(a)$, that gives the optimal $x$ value for each value of $a$. We can also think of the value function in these terms as $V(a) = f(x(a), a)$. If we take the derivative of the value function with respect to the state variable, we obtain

$$\frac{\partial V(a)}{\partial a} = \frac{\partial f(x(a), a)}{\partial x} \frac{\partial x(a)}{\partial a} + \frac{\partial f(x(a), a)}{\partial a}. \tag{145}$$

But we know that x(a) is the value of x that maximizes f, so

$$\frac{\partial f(x(a), a)}{\partial x} \frac{\partial x(a)}{\partial a} = 0, \tag{146}$$

and

$$\frac{\partial V(a)}{\partial a} = \left. \frac{\partial f(x(a), a)}{\partial a} \right|_{x = x(a)}. \tag{147}$$

This is a very simple statement of the envelope theorem. In the dynamic programming context, if we take the derivative of the value function with respect to the state variables and if we hold the choice variables (the actions) at their optimal levels, then we can consider the derivatives of the value function with respect to the choice variables to be equal to zero.
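A short worked example may help. The quadratic objective below is a hypothetical choice made only because it can be solved by hand; it is not from Varian or Kreps. It shows the direct derivative of the value function and the envelope shortcut giving the same answer.

```latex
% Hypothetical objective: f(x, a) = a x - x^2 / 2
\begin{align*}
\frac{\partial f}{\partial x} = a - x = 0
  &\;\Longrightarrow\; x(a) = a, \\
V(a) = f(x(a), a) = a \cdot a - \tfrac{1}{2} a^2 = \tfrac{1}{2} a^2
  &\;\Longrightarrow\; \frac{\partial V(a)}{\partial a} = a, \\
\left. \frac{\partial f(x, a)}{\partial a} \right|_{x = x(a)}
  = \left. x \right|_{x = a} &= a .
\end{align*}
```

The last line differentiates $f$ with respect to $a$ while holding $x$ fixed at its optimal level, and it matches the derivative of $V(a)$ computed directly in the second line.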

8.2 Example: The Gambler's Problem

Let's work through a simple example to illustrate the method. Suppose that in each of $T$ periods a gambler can bet up to his entire wealth. With probability $p$ the gambler wins and the size of his reward is equal to the size of his bet. With probability $(1 - p)$ the gambler loses the amount of his bet. The gambler's objective is to maximize $E[\ln(\text{final wealth})]$. Let $x$ equal the gambler's current wealth and $\alpha \in [0, 1]$ equal the

  1. Induction step - under the supposition, prove the hypothesis for n.

If we follow these steps then our proof by induction will be complete. We showed above that $V_1(x) = \ln(x)$ because it will never be optimal for the gambler to gamble in the last period. We will now hypothesize that $V_{n-1}(x) = \ln(x)$. We need to show that $V_n(x) = \ln(x)$ under this supposition. This must be true since we can express our value function as,

\begin{align}
V_n(x) &= \max_{\alpha \in [0,1]} \{ p \ln(x + \alpha x) + (1 - p) \ln(x - \alpha x) \} \tag{153} \\
&= \max_{\alpha \in [0,1]} \ln(x) + \{ p \ln(1 + \alpha) + (1 - p) \ln(1 - \alpha) \}, \tag{154}
\end{align}

and because the maximum value of the term in brackets is zero. Thus, the value function with n periods to go is

$$V_n(x) = \ln(x). \tag{155}$$

The problem in every period is the same and our proof by induction is finished. What about the case when $p > \tfrac{1}{2}$? When $p > \tfrac{1}{2}$ the optimal $\alpha$ with one period remaining will still be given by $2p - 1$. If we substitute this value of $\alpha$ into the value function with one period left, we obtain

\begin{align}
V_1(x) &= p \ln[x + (2p - 1)x] + (1 - p) \ln[x - (2p - 1)x] \tag{156} \\
&= \ln[x] + \ln[2] + p \ln[p] + (1 - p) \ln[1 - p] = \ln[x] + C. \tag{157}
\end{align}
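Both claims used above follow from one first order condition. That is, the term in brackets in (154) has a maximum of zero in the first case (presumably $p \le \tfrac{1}{2}$, the complement of the case just considered), and the optimal fraction is $2p - 1$ when $p > \tfrac{1}{2}$. The derivation below is a sketch, and the name $g(\alpha)$ for the bracketed term is introduced here only for convenience.

```latex
% g(alpha) is shorthand (introduced here) for the bracketed term in (154).
\begin{align*}
g(\alpha) &= p \ln(1 + \alpha) + (1 - p) \ln(1 - \alpha), \qquad g(0) = 0, \\
g'(\alpha) &= \frac{p}{1 + \alpha} - \frac{1 - p}{1 - \alpha}
            = \frac{2p - 1 - \alpha}{1 - \alpha^2} .
\end{align*}
% If p <= 1/2, then g'(alpha) < 0 for all alpha in (0, 1),
% so the maximum is at alpha = 0, where g(0) = 0.
% If p > 1/2, then g'(alpha) = 0 at alpha = 2p - 1, and g is concave,
% so this interior point is the maximum.
```

Substituting $\alpha = 2p - 1$ gives $1 + \alpha = 2p$ and $1 - \alpha = 2(1 - p)$, which is how the constant $C = \ln 2 + p \ln p + (1 - p) \ln(1 - p)$ in (157) arises.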

We will leave as a homework problem the task of deriving the value function in this problem and showing that the optimal policy function is always

$$a_n(x) = 2p - 1 \tag{158}$$

for each period, n. Note that this policy rule is a stationary rule - it does not change with time. This is a nice property for optimal policy functions to have.
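The one-period results can be checked numerically without solving the homework problem. The sketch below is a rough check, not a proof: it runs a grid search over the betting fraction, and the function names, the grid size, and the test values of $p$ are hypothetical choices.

```python
import math

def one_period_objective(alpha, p):
    """Expected log growth p*ln(1+alpha) + (1-p)*ln(1-alpha) from betting alpha."""
    return p * math.log(1 + alpha) + (1 - p) * math.log(1 - alpha)

def best_fraction(p, grid_size=10_000):
    """Grid search over alpha in [0, 1).

    alpha = 1 is excluded because ln(0) is undefined.
    """
    grid = [k / grid_size for k in range(grid_size)]
    return max(grid, key=lambda alpha: one_period_objective(alpha, p))

p = 0.75                       # hypothetical win probability above 1/2
alpha_star = best_fraction(p)  # should be close to 2p - 1
C = math.log(2) + p * math.log(p) + (1 - p) * math.log(1 - p)
```

With $p = 0.75$ the search lands on the grid point nearest $2p - 1 = 0.5$, and the maximized objective agrees with the constant $C$ in (157). Calling `best_fraction(0.4)` returns zero, matching the earlier result that the gambler does not bet in that case.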

8.3 Dynamic Programming with an Infinite Horizon

The models that we will examine all have finite horizons. It is useful, however, to know what happens to dynamic programming when the horizon is not assumed to be finite. There are two big changes that occur when going from finite problems to infinite problems.

First, in finite problems, we can always start at the last period and then work backward to derive our answer. In infinite horizon problems, there is no last period to begin at. Thus, we cannot usually just write down a Bellman equation and derive an optimal policy. Rather, we have to conjecture a form for the value function and an optimal policy rule and then determine whether these conjectures are correct. We usually make conjectures that seem reasonable; people often "tweak" results that others have found in the past. We validate our conjectures by showing that it is not possible to improve upon our policy functions. Something very similar to the induction proof outlined above is implemented for this purpose.

Second, in infinite horizon problems we usually need some sort of convergence result that is commonly referred to as a transversality condition. What do transversality conditions look like? They can look something like

$$\lim_{t \to \infty} E_t \left[ \beta^t \frac{\partial V(x_t)}{\partial x_t} x_t \right] = 0. \tag{159}$$

Intuitively, they can involve restrictions like the restriction that the discounted terminal value of a stock goes to zero as the time to liquidation goes to infinity. If you want to know more about infinite horizon methods you can consult one of the references in Kreps.