AN INTRODUCTION TO STOCHASTIC
DIFFERENTIAL EQUATIONS
VERSION 1.2
Lawrence C. Evans
Department of Mathematics
UC Berkeley
Chapter 1: Introduction
Chapter 2: A crash course in basic probability theory
Chapter 3: Brownian motion and “white noise”
Chapter 4: Stochastic integrals, Itô’s formula
Chapter 5: Stochastic differential equations
Chapter 6: Applications
Exercises
Appendices
References

PREFACE

These are an evolving set of notes for Mathematics 195 at UC Berkeley. This course is for advanced undergraduate math majors and surveys without too many precise details random differential equations and some applications.

Stochastic differential equations is usually, and justly, regarded as a graduate level subject. A really careful treatment assumes the students’ familiarity with probability theory, measure theory, ordinary differential equations, and perhaps partial differential equations as well. This is all too much to expect of undergrads.

But white noise, Brownian motion and the random calculus are wonderful topics, too good for undergraduates to miss out on.

Therefore as an experiment I tried to design these lectures so that strong students could follow most of the theory, at the cost of some omission of detail and precision. I for instance downplayed most measure theoretic issues, but did emphasize the intuitive idea of σ–algebras as “containing information”. Similarly, I “prove” many formulas by confirming them in easy cases (for simple random variables or for step functions), and then just stating that by approximation these rules hold in general. I also did not reproduce in class some of the more complicated proofs provided in these notes, although I did try to explain the guiding ideas.

My thanks especially to Lisa Goldberg, who several years ago presented the class with several lectures on financial applications, and to Fraydoun Rezakhanlou, who has taught from these notes and added several improvements. I am also grateful to Jonathan Weare for several computer simulations illustrating the text.

  • Define what it means for X(·) to solve (1).
  • Show (1) has a solution, discuss uniqueness, asymptotic behavior, dependence upon $x_0$, $b$, $B$, etc.

B. SOME HEURISTICS

Let us first study (1) in the case $m = n$, $x_0 = 0$, $b \equiv 0$, and $B \equiv I$. The solution of (1) in this setting turns out to be the n-dimensional Wiener process, or Brownian motion, denoted $W(\cdot)$. Thus we may symbolically write

$$
\dot{W}(\cdot) = \xi(\cdot),
$$

thereby asserting that “white noise” is the time derivative of the Wiener process.

Now return to the general case of the equation (1), write $\frac{d}{dt}$ instead of the dot:

$$
\frac{dX(t)}{dt} = b(X(t)) + B(X(t))\,\frac{dW(t)}{dt},
$$

and finally multiply by “dt”:

$$
\begin{cases}
dX(t) = b(X(t))\,dt + B(X(t))\,dW(t) \\
X(0) = x_0.
\end{cases}
\tag{SDE}
$$

This expression, properly interpreted, is a stochastic differential equation. We say that $X(\cdot)$ solves (SDE) provided

$$
X(t) = x_0 + \int_0^t b(X(s))\,ds + \int_0^t B(X(s))\,dW \quad \text{for all times } t > 0.
\tag{2}
$$
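The integral identity (2) also suggests how a solution might be approximated on a computer. The following is a minimal sketch, not part of the original notes, of the Euler–Maruyama scheme in the scalar case $n = m = 1$: each time step adds $b(X)\,\Delta t$ plus $B(X)$ times a Gaussian increment of variance $\Delta t$. The name `euler_maruyama` and all of its parameters are illustrative assumptions.

```python
import math
import random


def euler_maruyama(b, B, x0, T=1.0, n_steps=1000, rng=None):
    """Approximate one sample path of dX = b(X) dt + B(X) dW on [0, T].

    Scalar case only. Returns [X_0, X_1, ..., X_n] on a uniform time grid.
    """
    rng = rng or random.Random()
    dt = T / n_steps
    x = x0
    path = [x]
    for _ in range(n_steps):
        # Increment of the Wiener process over one step: Gaussian, mean 0, variance dt.
        dW = rng.gauss(0.0, math.sqrt(dt))
        x = x + b(x) * dt + B(x) * dW
        path.append(x)
    return path


# With b = 0 and B = 1 the scheme just sums the increments,
# giving a discretized Brownian path started at x0 = 0.
brownian_path = euler_maruyama(lambda x: 0.0, lambda x: 1.0, x0=0.0,
                               rng=random.Random(0))
```

Setting `B` to zero reduces the loop to the ordinary Euler method for $\dot{X} = b(X)$, which is a convenient sanity check for the sketch.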

Now we must:

  • Construct W(·): See Chapter 3.
  • Define the stochastic integral $\int_0^t \cdots \, dW$: See Chapter 4.

  • Show (2) has a solution, etc.: See Chapter 5.

And once all this is accomplished, there will still remain these modeling problems:

  • Does (SDE) truly model the physical situation?
  • Is the term ξ(·) in (1) “really” white noise, or is it rather some ensemble of smooth, but highly oscillatory functions? See Chapter 6.

As we will see later these questions are subtle, and different answers can yield completely different solutions of (SDE). Part of the trouble is the strange form of the chain rule in the stochastic calculus:

C. ITÔ’S FORMULA

Assume $n = 1$ and $X(\cdot)$ solves the SDE

$$
dX = b(X)\,dt + dW.
\tag{3}
$$

Suppose next that $u : \mathbb{R} \to \mathbb{R}$ is a given smooth function. We ask: what stochastic differential equation does

$$
Y(t) := u(X(t)) \quad (t \ge 0)
$$

solve? Offhand, we would guess from (3) that

$$
dY = u'\,dX = u'b\,dt + u'\,dW,
$$

according to the usual chain rule, where $' = \frac{d}{dx}$. This is wrong, however! In fact, as we will see,

$$
dW \approx (dt)^{1/2}
\tag{4}
$$

in some sense. Consequently if we compute $dY$ and keep all terms of order $dt$ or $(dt)^{1/2}$, we obtain

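$$
\begin{aligned}
dY &= u'\,dX + \tfrac{1}{2}u''(dX)^2 + \dots \\
   &= u'(\underbrace{b\,dt + dW}_{\text{from (3)}}) + \tfrac{1}{2}u''(b\,dt + dW)^2 + \dots \\
   &= \left(u'b + \tfrac{1}{2}u''\right)dt + u'\,dW + \{\text{terms of order } (dt)^{3/2} \text{ and higher}\}.
\end{aligned}
$$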

Here we used the “fact” that $(dW)^2 = dt$, which follows from (4). Hence

$$
dY = \left(u'b + \tfrac{1}{2}u''\right)dt + u'\,dW,
$$

with the extra term “$\tfrac{1}{2}u''\,dt$” not present in ordinary calculus.
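The heuristic $(dW)^2 = dt$ can be probed numerically before any rigorous theory is available. The sketch below, which is an illustration and not part of the original notes, draws independent Gaussian increments of variance $dt$ over a uniform partition of $[0, T]$ and sums their squares; the sum should be close to $T$. The function name, the seed and the step count are assumptions chosen for the illustration.

```python
import math
import random


def sum_of_squared_increments(T=1.0, n_steps=100_000, seed=0):
    """Return the sum of (dW)^2 over a uniform partition of [0, T]."""
    rng = random.Random(seed)
    dt = T / n_steps
    total = 0.0
    for _ in range(n_steps):
        dW = rng.gauss(0.0, math.sqrt(dt))  # increment with mean 0, variance dt
        total += dW * dW
    return total
```

For a smooth function the analogous sum of squared increments would shrink to zero as the partition is refined, which is why no second-order term appears in the ordinary chain rule.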

A major goal of these notes is to provide a rigorous interpretation for calculations like these, involving stochastic differentials.

Example 1. According to Itô’s formula, the solution of the stochastic differential equation

$$
\begin{cases}
dY = Y\,dW, \\
Y(0) = 1
\end{cases}
$$

is

$$
Y(t) := e^{W(t) - \frac{t}{2}},
$$

and not what might seem the obvious guess, namely $\hat{Y}(t) := e^{W(t)}$.
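A simulation makes the difference between the two candidates in Example 1 visible. The sketch below is an illustration, not part of the original notes: it applies the Euler–Maruyama step $Y \leftarrow Y + Y\,dW$ along one simulated Brownian path and compares the result at time $T$ with $e^{W(T) - T/2}$ and with the naive guess $e^{W(T)}$ evaluated on the same path. The function name and the default parameters are assumptions.

```python
import math
import random


def compare_at_time_T(T=1.0, n_steps=20_000, seed=1):
    """Simulate dY = Y dW, Y(0) = 1, and evaluate both closed-form candidates.

    Returns (numerical Y(T), exp(W(T) - T/2), exp(W(T))) for one Brownian path.
    """
    rng = random.Random(seed)
    dt = T / n_steps
    y, w = 1.0, 0.0
    for _ in range(n_steps):
        dW = rng.gauss(0.0, math.sqrt(dt))
        y += y * dW   # Euler-Maruyama step for dY = Y dW
        w += dW       # running value of the Brownian path W
    return y, math.exp(w - T / 2), math.exp(w)
```

On a fine grid the first two returned values should agree to within a few percent, while the third differs from them by roughly the factor $e^{T/2}$.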

CHAPTER 2: A CRASH COURSE IN BASIC PROBABILITY THEORY.

A. Basic definitions
B. Expected value, variance
C. Distribution functions
D. Independence
E. Borel–Cantelli Lemma
F. Characteristic functions
G. Strong Law of Large Numbers, Central Limit Theorem
H. Conditional expectation
I. Martingales

This chapter is a very rapid introduction to the measure theoretic foundations of probability theory. More details can be found in any good introductory text, for instance Bremaud [Br], Chung [C] or Lamperti [L1].

A. BASIC DEFINITIONS.

Let us begin with a puzzle:

Bertrand’s paradox. Take a circle of radius 2 inches in the plane and choose a chord of this circle at random. What is the probability this chord intersects the concentric circle of radius 1 inch?

Solution #1 Any such chord (provided it does not hit the center) is uniquely determined by the location of its midpoint. Thus

$$
\text{probability of hitting inner circle} = \frac{\text{area of inner circle}}{\text{area of larger circle}} = \frac{1}{4}.
$$

Solution #2 By symmetry under rotation we may assume the chord is vertical. The diameter of the large circle is 4 inches and the chord will hit the small circle if it falls within its 2-inch diameter. Hence

$$
\text{probability of hitting inner circle} = \frac{2 \text{ inches}}{4 \text{ inches}} = \frac{1}{2}.
$$

Solution #3 By symmetry we may assume one end of the chord is at the far left point of the larger circle. The angle $\theta$ the chord makes with the horizontal lies between $\pm\frac{\pi}{2}$ and the chord hits the inner circle if $\theta$ lies between $\pm\frac{\pi}{6}$. Therefore

$$
\text{probability of hitting inner circle} = \frac{2\pi/6}{2\pi/2} = \frac{1}{3}.
$$
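The three answers can be reproduced by simulation, which makes it plain that each solution samples the chord in a different way. The following Monte Carlo sketch is an illustration rather than part of the original notes; the function name, sample size and seed are assumptions.

```python
import math
import random


def bertrand_estimates(n=100_000, seed=0):
    """Estimate P(chord hits the inner circle) under the three sampling models."""
    rng = random.Random(seed)
    hits = [0, 0, 0]
    for _ in range(n):
        # Model 1: midpoint uniform in the disc of radius 2 (rejection sampling).
        while True:
            x, y = rng.uniform(-2.0, 2.0), rng.uniform(-2.0, 2.0)
            if x * x + y * y <= 4.0:
                break
        if x * x + y * y < 1.0:
            hits[0] += 1
        # Model 2: vertical chord whose horizontal position is uniform in [-2, 2].
        if abs(rng.uniform(-2.0, 2.0)) < 1.0:
            hits[1] += 1
        # Model 3: one endpoint fixed, angle uniform in (-pi/2, pi/2).
        if abs(rng.uniform(-math.pi / 2, math.pi / 2)) < math.pi / 6:
            hits[2] += 1
    return [h / n for h in hits]
```

The three returned frequencies should land near 1/4, 1/2 and 1/3 respectively, one for each interpretation of “at random”.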

PROBABILITY SPACES. This example shows that we must carefully define what we mean by the term “random”. The correct way to do so is by introducing as follows the precise mathematical structure of a probability space.

We start with a set, denoted Ω, certain subsets of which we will in a moment interpret as being “events”.

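DEFINITION. A σ-algebra is a collection $\mathcal{U}$ of subsets of Ω with these properties:

(i) $\emptyset, \Omega \in \mathcal{U}$.

(ii) If $A \in \mathcal{U}$, then $A^c \in \mathcal{U}$.

(iii) If $A_1, A_2, \dots \in \mathcal{U}$, then

$$
\bigcup_{k=1}^{\infty} A_k, \ \bigcap_{k=1}^{\infty} A_k \in \mathcal{U}.
$$

Here $A^c := \Omega - A$ is the complement of $A$.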

for sets $B \in \mathcal{B}$. Then $(\mathbb{R}^n, \mathcal{B}, P)$ is a probability space. We call $P$ the Dirac mass concentrated at the point $z$, and write $P = \delta_z$.

A probability space is the proper setting for mathematical probability theory. This means that we must first of all carefully identify an appropriate $(\Omega, \mathcal{U}, P)$ when we try to solve problems. The reader should convince himself or herself that the three “solutions” to Bertrand’s paradox discussed above represent three distinct interpretations of the phrase “at random”, that is, three distinct models of $(\Omega, \mathcal{U}, P)$.

Here is another example.

Example 4 (Buffon’s needle problem). The plane is ruled by parallel lines 2 inches apart and a 1-inch long needle is dropped at random on the plane. What is the probability that it hits one of the parallel lines?

The first issue is to find some appropriate probability space $(\Omega, \mathcal{U}, P)$. For this, let

$$
\begin{cases}
h = \text{distance from the center of needle to nearest line}, \\
\theta = \text{angle } (\le \tfrac{\pi}{2}) \text{ that the needle makes with the horizontal.}
\end{cases}
$$

These fully determine the position of the needle, up to translations and reflection. Let us next take

$$
\begin{cases}
\Omega = \underbrace{[0, \tfrac{\pi}{2})}_{\text{values of } \theta} \times \underbrace{[0, 1]}_{\text{values of } h}, \\
\mathcal{U} = \text{Borel subsets of } \Omega, \\
P(B) = \dfrac{2 \cdot \text{area of } B}{\pi} \text{ for each } B \in \mathcal{U}.
\end{cases}
$$

We denote by $A$ the event that the needle hits a horizontal line. We can now check that this happens provided $\frac{h}{\sin\theta} \le \frac{1}{2}$. Consequently $A = \{(\theta, h) \in \Omega \mid h \le \frac{\sin\theta}{2}\}$, and so

$$
P(A) = \frac{2(\text{area of } A)}{\pi} = \frac{2}{\pi} \int_0^{\pi/2} \frac{1}{2}\sin\theta \, d\theta = \frac{1}{\pi}.
$$
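The probability space chosen for Buffon’s needle translates directly into a sampling procedure: draw $\theta$ uniformly from $[0, \frac{\pi}{2})$, draw $h$ uniformly from $[0, 1]$, and record whether $h \le \frac{\sin\theta}{2}$. The sketch below is an illustration, not part of the original notes; the function name, sample size and seed are assumptions.

```python
import math
import random


def buffon_estimate(n=200_000, seed=0):
    """Estimate P(A) for A = {(theta, h) : h <= sin(theta) / 2}."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n):
        theta = rng.uniform(0.0, math.pi / 2)  # angle with the horizontal
        h = rng.uniform(0.0, 1.0)              # distance from center to nearest line
        if h <= math.sin(theta) / 2:
            hits += 1
    return hits / n
```

The returned frequency should be close to $1/\pi \approx 0.318$, in agreement with the integral above.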

RANDOM VARIABLES. We can think of the probability space as being an essential mathematical construct, which is nevertheless not “directly observable”. We are therefore interested in introducing mappings $X$ from Ω to $\mathbb{R}^n$, the values of which we can observe.

Remember from Example 2 above that $\mathcal{B}$ denotes the collection of Borel subsets of $\mathbb{R}^n$, which is the smallest σ-algebra of subsets of $\mathbb{R}^n$ containing all open sets.

We may henceforth informally just think of $\mathcal{B}$ as containing all the “nice, well-behaved” subsets of $\mathbb{R}^n$.

DEFINITION. Let $(\Omega, \mathcal{U}, P)$ be a probability space. A mapping

$$
X : \Omega \to \mathbb{R}^n
$$

is called an n-dimensional random variable if for each $B \in \mathcal{B}$, we have

$$
X^{-1}(B) \in \mathcal{U}.
$$

We equivalently say that $X$ is $\mathcal{U}$-measurable.

Notation, comments. We usually write “$X$” and not “$X(\omega)$”. This follows the custom within probability theory of mostly not displaying the dependence of random variables on the sample point $\omega \in \Omega$. We also denote $P(X^{-1}(B))$ as “$P(X \in B)$”, the probability that $X$ is in $B$.

In these notes we will usually use capital letters to denote random variables. Boldface usually means a vector-valued mapping.

We will also use without further comment various standard facts from measure theory, for instance that sums and products of random variables are random variables.

Example 1. Let $A \in \mathcal{U}$. Then the indicator function of $A$,

$$
\chi_A(\omega) :=
\begin{cases}
1 & \text{if } \omega \in A \\
0 & \text{if } \omega \notin A,
\end{cases}
$$

is a random variable.

Example 2. More generally, if $A_1, A_2, \dots, A_m \in \mathcal{U}$, with $\Omega = \cup_{i=1}^m A_i$, and $a_1, a_2, \dots, a_m$ are real numbers, then

$$
X = \sum_{i=1}^m a_i \chi_{A_i}
$$

is a random variable, called a simple function.

[Figure: Two sample paths of a stochastic process]

B. EXPECTED VALUE, VARIANCE.

Integration with respect to a measure. If $(\Omega, \mathcal{U}, P)$ is a probability space and $X = \sum_{i=1}^k a_i \chi_{A_i}$ is a real-valued simple random variable, we define the integral of $X$ by

$$
\int_\Omega X \, dP := \sum_{i=1}^k a_i P(A_i).
$$

If next $X$ is a nonnegative random variable, we define

$$
\int_\Omega X \, dP := \sup_{Y \le X, \ Y \text{ simple}} \int_\Omega Y \, dP.
$$

Finally if $X : \Omega \to \mathbb{R}$ is a random variable, we write

$$
\int_\Omega X \, dP := \int_\Omega X^+ \, dP - \int_\Omega X^- \, dP,
$$

provided at least one of the integrals on the right is finite. Here $X^+ = \max(X, 0)$ and $X^- = \max(-X, 0)$; so that $X = X^+ - X^-$.

Next, suppose $X : \Omega \to \mathbb{R}^n$ is a vector-valued random variable, $X = (X^1, X^2, \dots, X^n)$. Then we write

$$
\int_\Omega X \, dP = \left( \int_\Omega X^1 \, dP, \int_\Omega X^2 \, dP, \cdots, \int_\Omega X^n \, dP \right).
$$

We will assume without further comment the usual rules for these integrals.

DEFINITION. We call

$$
E(X) := \int_\Omega X \, dP
$$

the expected value (or mean value) of $X$.

DEFINITION. We call

$$
V(X) := \int_\Omega |X - E(X)|^2 \, dP
$$

the variance of $X$.

Observe that

$$
V(X) = E(|X - E(X)|^2) = E(|X|^2) - |E(X)|^2.
$$
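On a finite probability space every random variable is simple, so these definitions can be evaluated exactly. The sketch below is an illustration, not part of the original notes: it takes one roll of a fair die as a hypothetical probability space and checks both the formula for the integral of a simple function and the variance identity. All names (`OMEGA`, `prob`, `integral_simple`, and so on) are assumptions made for the example.

```python
from fractions import Fraction

# Hypothetical finite probability space: one roll of a fair die.
OMEGA = range(1, 7)
P = {w: Fraction(1, 6) for w in OMEGA}


def prob(A):
    """P(A) for a set A of sample points."""
    return sum(P[w] for w in A)


def integral_simple(terms):
    """Integral of X = sum_i a_i * chi_{A_i}, given terms = [(a_i, A_i), ...]."""
    return sum(a * prob(A) for a, A in terms)


def expectation(X):
    """E(X) for a function X on OMEGA, summing X(w) * P({w}) over sample points."""
    return sum(X(w) * P[w] for w in OMEGA)


def variance(X):
    """V(X) = E(|X - E(X)|^2)."""
    m = expectation(X)
    return expectation(lambda w: (X(w) - m) ** 2)


# The simple function X = 10 * chi_{1,2,3} + 20 * chi_{4,5,6}, written two ways.
terms = [(10, {1, 2, 3}), (20, {4, 5, 6})]
X = lambda w: 10 if w <= 3 else 20
```

Exact rational arithmetic is used so that the identity $V(X) = E(|X|^2) - |E(X)|^2$ can be compared with `==` rather than up to rounding.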

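LEMMA (Chebyshev’s inequality). If $X$ is a random variable and $1 \le p < \infty$, then

$$
P(|X| \ge \lambda) \le \frac{1}{\lambda^p} E(|X|^p) \quad \text{for all } \lambda > 0.
$$

Proof. We have

$$
E(|X|^p) = \int_\Omega |X|^p \, dP \ge \int_{\{|X| \ge \lambda\}} |X|^p \, dP \ge \lambda^p P(|X| \ge \lambda).
$$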
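Chebyshev’s inequality can be checked exactly on a small example. The sketch below is an illustration, not part of the original notes: it uses one roll of a fair die as a hypothetical probability space, takes $X$ to be the centered outcome, and returns both sides of the inequality for given $\lambda$ and $p$. The names used are assumptions.

```python
from fractions import Fraction

# Hypothetical finite probability space: one roll of a fair die.
OMEGA = range(1, 7)
P = {w: Fraction(1, 6) for w in OMEGA}


def prob(event):
    """P of the set of sample points where the predicate `event` holds."""
    return sum(P[w] for w in OMEGA if event(w))


def expectation(X):
    return sum(X(w) * P[w] for w in OMEGA)


def chebyshev_sides(X, lam, p):
    """Return (P(|X| >= lam), E(|X|^p) / lam^p) as exact fractions."""
    lhs = prob(lambda w: abs(X(w)) >= lam)
    rhs = expectation(lambda w: abs(X(w)) ** p) / Fraction(lam) ** p
    return lhs, rhs


# Centered die roll: X takes the values -5/2, -3/2, ..., 5/2.
X = lambda w: w - Fraction(7, 2)
```

For instance `chebyshev_sides(X, 2, 2)` compares $P(|X| \ge 2)$ with $E(|X|^2)/4$; the bound is typically far from sharp.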



C. DISTRIBUTION FUNCTIONS.

Let $(\Omega, \mathcal{U}, P)$ be a probability space and suppose $X : \Omega \to \mathbb{R}^n$ is a random variable.

Notation. Let $x = (x_1, \dots, x_n) \in \mathbb{R}^n$, $y = (y_1, \dots, y_n) \in \mathbb{R}^n$. Then

$$
x \le y
$$

means $x_i \le y_i$ for $i = 1, \dots, n$.

DEFINITIONS. (i) The distribution function of $X$ is the function $F_X : \mathbb{R}^n \to [0, 1]$ defined by

$$
F_X(x) := P(X \le x) \quad \text{for all } x \in \mathbb{R}^n.
$$

(ii) If $X_1, \dots, X_m : \Omega \to \mathbb{R}^n$ are random variables, their joint distribution function is $F_{X_1, \dots, X_m} : (\mathbb{R}^n)^m \to [0, 1]$,

$$
F_{X_1, \dots, X_m}(x_1, \dots, x_m) := P(X_1 \le x_1, \dots, X_m \le x_m) \quad \text{for all } x_i \in \mathbb{R}^n, \ i = 1, \dots, m.
$$

DEFINITION. Suppose $X : \Omega \to \mathbb{R}^n$ is a random variable and $F = F_X$ its distribution function. If there exists a nonnegative, integrable function $f : \mathbb{R}^n \to \mathbb{R}$ such that

$$
F(x) = F(x_1, \dots, x_n) = \int_{-\infty}^{x_1} \cdots \int_{-\infty}^{x_n} f(y_1, \dots, y_n) \, dy_n \dots dy_1,
$$

then $f$ is called the density function for $X$.

It follows then that

$$
P(X \in B) = \int_B f(x) \, dx \quad \text{for all } B \in \mathcal{B}.
\tag{1}
$$

This formula is important as the expression on the right hand side is an ordinary integral, and can often be explicitly calculated.

In particular,

$$
E(X) = \int_{\mathbb{R}^n} x f(x) \, dx \quad \text{and} \quad V(X) = \int_{\mathbb{R}^n} |x - E(X)|^2 f(x) \, dx.
$$

Remark. Hence we can compute $E(X)$, $V(X)$, etc. in terms of integrals over $\mathbb{R}^n$. This is an important observation, since as mentioned before the probability space $(\Omega, \mathcal{U}, P)$ is “unobservable”: All that we “see” are the values $X$ takes on in $\mathbb{R}^n$. Indeed, all quantities of interest in probability theory can be computed in $\mathbb{R}^n$ in terms of the density $f$.

Proof. Suppose first $g$ is a simple function on $\mathbb{R}^n$:

$$
g = \sum_{i=1}^m b_i \chi_{B_i} \quad (B_i \in \mathcal{B}).
$$

Then

$$
E(g(X)) = \sum_{i=1}^m b_i \int_\Omega \chi_{B_i}(X) \, dP = \sum_{i=1}^m b_i P(X \in B_i).
$$

But also

$$
\int_{\mathbb{R}^n} g(x) f(x) \, dx = \sum_{i=1}^m b_i \int_{B_i} f(x) \, dx = \sum_{i=1}^m b_i P(X \in B_i) \quad \text{by (1).}
$$

Consequently the formula holds for all simple functions $g$ and, by approximation, it holds therefore for general functions $g$.

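Example. If $X$ is $N(m, \sigma^2)$, then

$$
E(X) = \frac{1}{\sqrt{2\pi\sigma^2}} \int_{-\infty}^{\infty} x \, e^{-\frac{(x-m)^2}{2\sigma^2}} \, dx = m
$$

and

$$
V(X) = \frac{1}{\sqrt{2\pi\sigma^2}} \int_{-\infty}^{\infty} (x - m)^2 e^{-\frac{(x-m)^2}{2\sigma^2}} \, dx = \sigma^2.
$$

Therefore $m$ is indeed the mean, and $\sigma^2$ the variance.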
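Because the right-hand sides are ordinary integrals, they can be checked by elementary numerical quadrature. The sketch below is an illustration, not part of the original notes: it applies a composite midpoint rule to the two integrals for the $N(m, \sigma^2)$ density, truncating the real line at 12 standard deviations from $m$. The function names, the truncation width and the number of subintervals are assumptions.

```python
import math


def normal_density(x, m, sigma2):
    """Density of N(m, sigma2) at the point x."""
    return math.exp(-(x - m) ** 2 / (2 * sigma2)) / math.sqrt(2 * math.pi * sigma2)


def integrate(g, a, b, n=20_000):
    """Composite midpoint rule for the integral of g over [a, b]."""
    h = (b - a) / n
    return h * sum(g(a + (k + 0.5) * h) for k in range(n))


def mean_and_variance(m, sigma2):
    """Approximate E(X) and V(X) for X ~ N(m, sigma2) from the density."""
    s = math.sqrt(sigma2)
    a, b = m - 12 * s, m + 12 * s  # the mass outside this window is negligible
    mean = integrate(lambda x: x * normal_density(x, m, sigma2), a, b)
    var = integrate(lambda x: (x - m) ** 2 * normal_density(x, m, sigma2), a, b)
    return mean, var
```

Calling `mean_and_variance(1.5, 4.0)` should return values very close to the parameters 1.5 and 4.0 themselves.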




D. INDEPENDENCE.

MOTIVATION. Let $(\Omega, \mathcal{U}, P)$ be a probability space, and let $A, B \in \mathcal{U}$ be two events, with $P(B) > 0$. We want to find a reasonable definition of

$$
P(A \mid B), \text{ the probability of } A, \text{ given } B.
$$

Think this way. Suppose some point $\omega \in \Omega$ is selected “at random” and we are told $\omega \in B$. What then is the probability that $\omega \in A$ also?

Since we know $\omega \in B$, we can regard $B$ as being a new probability space. Therefore we can define $\tilde{\Omega} := B$, $\tilde{\mathcal{U}} := \{C \cap B \mid C \in \mathcal{U}\}$ and $\tilde{P} := \frac{P}{P(B)}$; so that $\tilde{P}(\tilde{\Omega}) = 1$. Then the probability that $\omega$ lies in $A$ is $\tilde{P}(A \cap B) = \frac{P(A \cap B)}{P(B)}$.

This observation motivates the following

DEFINITION. We write

$$
P(A \mid B) := \frac{P(A \cap B)}{P(B)}
$$

if $P(B) > 0$.

Now what should it mean to say “$A$ and $B$ are independent”? This should mean $P(A \mid B) = P(A)$, since presumably any information that the event $B$ has occurred is irrelevant in determining the probability that $A$ has occurred. Thus

$$
P(A) = P(A \mid B) = \frac{P(A \cap B)}{P(B)}
$$

and so

$$
P(A \cap B) = P(A)P(B)
$$

if $P(B) > 0$. We take this for the definition, even if $P(B) = 0$:

DEFINITION. Two events $A$ and $B$ are called independent if

$$
P(A \cap B) = P(A)P(B).
$$

This concept and its ramifications are the hallmarks of probability theory.

To gain some insight, the reader may wish to check that if $A$ and $B$ are independent events, then so are $A^c$ and $B$. Likewise, $A^c$ and $B^c$ are independent.
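The suggested check can be carried out by brute force on a small example. The sketch below is an illustration, not part of the original notes: the probability space is two fair dice with all 36 outcomes equally likely, and the particular events `A`, `B` and `C` are hypothetical choices made for the example.

```python
from fractions import Fraction
from itertools import product

# Hypothetical probability space: two fair dice, all 36 outcomes equally likely.
OMEGA = list(product(range(1, 7), repeat=2))


def prob(event):
    """P of the set of outcomes where the predicate `event` holds."""
    return Fraction(sum(1 for w in OMEGA if event(w)), len(OMEGA))


def conditional(E, F):
    """P(E | F) = P(E and F) / P(F), assuming P(F) > 0."""
    return prob(lambda w: E(w) and F(w)) / prob(F)


def independent(E, F):
    """Check the defining identity P(E and F) = P(E) P(F)."""
    return prob(lambda w: E(w) and F(w)) == prob(E) * prob(F)


def complement(E):
    return lambda w: not E(w)


A = lambda w: w[0] % 2 == 0       # first die shows an even number
B = lambda w: w[0] + w[1] == 7    # the two dice sum to 7
C = lambda w: w[0] + w[1] >= 10   # the two dice sum to at least 10
```

Here `A` and `B` satisfy the product rule, and so do their complements, whereas `A` and `C` do not: knowing the sum is large makes an even first die more likely.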

THEOREM. The random variables $X_1, \cdots, X_m : \Omega \to \mathbb{R}^n$ are independent if and only if

$$
F_{X_1, \cdots, X_m}(x_1, \dots, x_m) = F_{X_1}(x_1) \cdots F_{X_m}(x_m) \quad \text{for all } x_i \in \mathbb{R}^n, \ i = 1, \dots, m.
\tag{2}
$$

If the random variables have densities, (2) is equivalent to

$$
f_{X_1, \cdots, X_m}(x_1, \dots, x_m) = f_{X_1}(x_1) \cdots f_{X_m}(x_m) \quad \text{for all } x_i \in \mathbb{R}^n, \ i = 1, \dots, m,
\tag{3}
$$

where the functions $f$ are the appropriate densities.

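Proof. 1. Assume first that $\{X_k\}_{k=1}^m$ are independent. Then

$$
\begin{aligned}
F_{X_1 \cdots X_m}(x_1, \dots, x_m) &= P(X_1 \le x_1, \dots, X_m \le x_m) \\
&= P(X_1 \le x_1) \cdots P(X_m \le x_m) \\
&= F_{X_1}(x_1) \cdots F_{X_m}(x_m).
\end{aligned}
$$

2. We prove the converse statement for the case that all the random variables have densities. Select $A_i \in \mathcal{U}(X_i)$, $i = 1, \dots, m$. Then $A_i = X_i^{-1}(B_i)$ for some $B_i \in \mathcal{B}$. Hence

$$
\begin{aligned}
P(A_1 \cap \cdots \cap A_m) &= P(X_1 \in B_1, \dots, X_m \in B_m) \\
&= \int_{B_1 \times \dots \times B_m} f_{X_1 \cdots X_m}(x_1, \dots, x_m) \, dx_1 \cdots dx_m \\
&= \left( \int_{B_1} f_{X_1}(x_1) \, dx_1 \right) \cdots \left( \int_{B_m} f_{X_m}(x_m) \, dx_m \right) \quad \text{by (3)} \\
&= P(X_1 \in B_1) \cdots P(X_m \in B_m) \\
&= P(A_1) \cdots P(A_m).
\end{aligned}
$$

Therefore $\mathcal{U}(X_1), \cdots, \mathcal{U}(X_m)$ are independent σ-algebras.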

One of the most important properties of independent random variables is this:

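THEOREM. If $X_1, \dots, X_m$ are independent, real-valued random variables, with

$$
E(|X_i|) < \infty \quad (i = 1, \dots, m),
$$

then $E(|X_1 \cdots X_m|) < \infty$ and

$$
E(X_1 \cdots X_m) = E(X_1) \cdots E(X_m).
$$

Proof. Suppose that each $X_i$ is bounded and has a density. Then

$$
\begin{aligned}
E(X_1 \cdots X_m) &= \int_{\mathbb{R}^m} x_1 \cdots x_m \, f_{X_1 \cdots X_m}(x_1, \dots, x_m) \, dx_1 \dots dx_m \\
&= \left( \int_{\mathbb{R}} x_1 f_{X_1}(x_1) \, dx_1 \right) \cdots \left( \int_{\mathbb{R}} x_m f_{X_m}(x_m) \, dx_m \right) \quad \text{by (3)} \\
&= E(X_1) \cdots E(X_m).
\end{aligned}
$$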
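The product rule for expectations can be verified exactly on a finite example, and the same computation shows that it fails without independence. The sketch below is an illustration, not part of the original notes: the probability space is two fair dice, and `X1`, `X2` are the two faces, which are independent in this model. All names are assumptions.

```python
from fractions import Fraction
from itertools import product

# Hypothetical probability space: two fair dice, all 36 outcomes equally likely.
OMEGA = list(product(range(1, 7), repeat=2))


def expectation(X):
    """E(X) as an exact fraction, averaging X over the equally likely outcomes."""
    return sum(Fraction(X(w)) for w in OMEGA) / len(OMEGA)


X1 = lambda w: w[0]  # value shown by the first die
X2 = lambda w: w[1]  # value shown by the second die

product_of_means = expectation(X1) * expectation(X2)
mean_of_product = expectation(lambda w: X1(w) * X2(w))
mean_of_square = expectation(lambda w: X1(w) * X1(w))  # X1 paired with itself
```

Here `mean_of_product` equals `product_of_means`, while `mean_of_square` does not, since a random variable is not independent of itself unless it is constant.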

THEOREM. If $X_1, \dots, X_m$ are independent, real-valued random variables, with

$$
V(X_i) < \infty \quad (i = 1, \dots, m),
$$

then

$$
V(X_1 + \cdots + X_m) = V(X_1) + \cdots + V(X_m).
$$

Proof. Use induction, the case $m = 2$ holding as follows. Let $m_1 := E(X_1)$, $m_2 := E(X_2)$. Then $E(X_1 + X_2) = m_1 + m_2$ and

$$
\begin{aligned}
V(X_1 + X_2) &= \int_\Omega (X_1 + X_2 - (m_1 + m_2))^2 \, dP \\
&= \int_\Omega (X_1 - m_1)^2 \, dP + \int_\Omega (X_2 - m_2)^2 \, dP + 2 \int_\Omega (X_1 - m_1)(X_2 - m_2) \, dP \\
&= V(X_1) + V(X_2) + 2 \underbrace{E(X_1 - m_1)}_{=0} \underbrace{E(X_2 - m_2)}_{=0} \\
&= V(X_1) + V(X_2),
\end{aligned}
$$

where we used independence in the next-to-last step.
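The role of the cross term in this proof can be seen in an exact computation. The sketch below is an illustration, not part of the original notes: on a hypothetical space of two fair dice it evaluates the variance of a sum and the term $2\int_\Omega (X - m_X)(Y - m_Y)\,dP$ separately, once for the two independent faces and once for a variable paired with itself. All names are assumptions.

```python
from fractions import Fraction
from itertools import product

# Hypothetical probability space: two fair dice, all 36 outcomes equally likely.
OMEGA = list(product(range(1, 7), repeat=2))


def expectation(X):
    return sum(Fraction(X(w)) for w in OMEGA) / len(OMEGA)


def variance(X):
    m = expectation(X)
    return expectation(lambda w: (X(w) - m) ** 2)


def cross_term(X, Y):
    """The term 2 * E((X - E X)(Y - E Y)) from the expansion of V(X + Y)."""
    mx, my = expectation(X), expectation(Y)
    return 2 * expectation(lambda w: (X(w) - mx) * (Y(w) - my))


X1 = lambda w: w[0]
X2 = lambda w: w[1]
total = lambda w: X1(w) + X2(w)     # sum of two independent variables
doubled = lambda w: X1(w) + X1(w)   # sum of a variable with itself
```

For `total` the cross term vanishes and the variances add; for `doubled` the cross term equals $2V(X_1)$, so the variance is $4V(X_1)$ rather than $2V(X_1)$.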

E. BOREL–CANTELLI LEMMA.

We introduce next a simple and very useful way to check if some sequence $A_1, \dots, A_n, \dots$ of events “occurs infinitely often”.

DEFINITION. Let $A_1, \dots, A_n, \dots$ be events in a probability space. Then the event

$$
\bigcap_{n=1}^{\infty} \bigcup_{m=n}^{\infty} A_m = \{\omega \in \Omega \mid \omega \text{ belongs to infinitely many of the } A_n\},
$$

is called “$A_n$ infinitely often”, abbreviated “$A_n$ i.o.”.

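BOREL–CANTELLI LEMMA. If $\sum_{n=1}^{\infty} P(A_n) < \infty$, then $P(A_n \text{ i.o.}) = 0$.

Proof. By definition $A_n \text{ i.o.} = \bigcap_{n=1}^{\infty} \bigcup_{m=n}^{\infty} A_m$, and so for each $n$

$$
P(A_n \text{ i.o.}) \le P\left( \bigcup_{m=n}^{\infty} A_m \right) \le \sum_{m=n}^{\infty} P(A_m).
$$

The right-hand side tends to zero as $n \to \infty$ because $\sum_{m=1}^{\infty} P(A_m) < \infty$. Since the left-hand side does not depend on $n$, it must equal zero.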
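Both ingredients of the proof can be illustrated numerically. The sketch below is an illustration, not part of the original notes: it takes the hypothetical probabilities $P(A_m) = 1/m^2$, computes truncated tail sums $\sum_{m=n}^{N} P(A_m)$, and simulates which events occur. The simulation additionally assumes the events are independent, which the lemma itself does not require; the names, the truncation level `N` and the seed are also assumptions.

```python
import random


def tail_sum(n, N=100_000):
    """Truncated tail sum of P(A_m) = 1/m^2 for m = n, ..., N."""
    return sum(1.0 / (m * m) for m in range(n, N + 1))


def occurring_indices(N=100_000, seed=0):
    """Simulate independent events A_1, ..., A_N with P(A_m) = 1/m^2.

    Returns the list of indices m for which A_m occurred in this one sample.
    """
    rng = random.Random(seed)
    return [m for m in range(1, N + 1) if rng.random() < 1.0 / (m * m)]
```

The tail sums shrink roughly like $1/n$, and a typical sample contains only a handful of occurring indices, all of them small, consistent with $P(A_n \text{ i.o.}) = 0$.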

APPLICATION. We illustrate a typical use of the Borel–Cantelli Lemma. A sequence of random variables $\{X_k\}_{k=1}^{\infty}$ defined on some probability space converges in probability to a random variable $X$, provided

$$
\lim_{k \to \infty} P(|X_k - X| > \epsilon) = 0 \quad \text{for each } \epsilon > 0.
$$