

























































































































































































Lawrence C. Evans Department of Mathematics UC Berkeley
Chapter 1: Introduction
Chapter 2: A crash course in basic probability theory
Chapter 3: Brownian motion and “white noise”
Chapter 4: Stochastic integrals, Itô’s formula
Chapter 5: Stochastic differential equations
Chapter 6: Applications
Exercises
Appendices
References
These are an evolving set of notes for Mathematics 195 at UC Berkeley. This course is for advanced undergraduate math majors and surveys, without too many precise details, random differential equations and some applications.
Stochastic differential equations is usually, and justly, regarded as a graduate level subject. A really careful treatment assumes the students’ familiarity with probability theory, measure theory, ordinary differential equations, and perhaps partial differential equations as well. This is all too much to expect of undergrads.
But white noise, Brownian motion and the random calculus are wonderful topics, too good for undergraduates to miss out on.
Therefore as an experiment I tried to design these lectures so that strong students could follow most of the theory, at the cost of some omission of detail and precision. I for instance downplayed most measure theoretic issues, but did emphasize the intuitive idea of σ–algebras as “containing information”. Similarly, I “prove” many formulas by confirming them in easy cases (for simple random variables or for step functions), and then just stating that by approximation these rules hold in general. I also did not reproduce in class some of the more complicated proofs provided in these notes, although I did try to explain the guiding ideas.
My thanks especially to Lisa Goldberg, who several years ago presented the class with several lectures on financial applications, and to Fraydoun Rezakhanlou, who has taught from these notes and added several improvements. I am also grateful to Jonathan Weare for several computer simulations illustrating the text.
B. SOME HEURISTICS
Let us first study (1) in the case $m = n$, $x_0 = 0$, $\mathbf{b} \equiv 0$, and $\mathbf{B} \equiv I$. The solution of (1) in this setting turns out to be the n-dimensional Wiener process, or Brownian motion, denoted $\mathbf{W}(\cdot)$. Thus we may symbolically write
$$\dot{\mathbf{W}}(\cdot) = \boldsymbol{\xi}(\cdot),$$
thereby asserting that “white noise” is the time derivative of the Wiener process. Now return to the general case of the equation (1), write $\frac{d}{dt}$ instead of the dot:
$$\frac{d\mathbf{X}(t)}{dt} = \mathbf{b}(\mathbf{X}(t)) + \mathbf{B}(\mathbf{X}(t))\,\frac{d\mathbf{W}(t)}{dt},$$
and finally multiply by “dt”:
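$$\text{(SDE)} \qquad \begin{cases} d\mathbf{X}(t) = \mathbf{b}(\mathbf{X}(t))\,dt + \mathbf{B}(\mathbf{X}(t))\,d\mathbf{W}(t) \\ \mathbf{X}(0) = x_0. \end{cases}$$
This expression, properly interpreted, is a stochastic differential equation. We say that $\mathbf{X}(\cdot)$ solves (SDE) provided
$$\text{(2)} \qquad \mathbf{X}(t) = x_0 + \int_0^t \mathbf{b}(\mathbf{X}(s))\,ds + \int_0^t \mathbf{B}(\mathbf{X}(s))\,d\mathbf{W} \quad \text{for all times } t > 0.$$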
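To make the integral form (2) concrete, here is a minimal numerical sketch in the scalar case $m = n = 1$. It uses the Euler-Maruyama time-stepping scheme, which is not introduced in these notes and is an assumption of this illustration, as are the name `euler_maruyama`, the step count, and the random seed. Each step replaces $dt$ by a small time step and $dW$ by an independent $N(0, dt)$ increment.

```python
import math
import random

def euler_maruyama(b, B, x0, t_end, n_steps, rng):
    """Approximate one sample path of dX = b(X) dt + B(X) dW on [0, t_end]."""
    dt = t_end / n_steps
    x = x0
    path = [x]
    for _ in range(n_steps):
        dW = rng.gauss(0.0, math.sqrt(dt))  # Wiener increment, distributed N(0, dt)
        x = x + b(x) * dt + B(x) * dW
        path.append(x)
    return path

rng = random.Random(0)
# Special case from the text: b = 0, B = 1, x0 = 0 should reproduce W itself.
endpoints = [euler_maruyama(lambda x: 0.0, lambda x: 1.0, 0.0, 1.0, 50, rng)[-1]
             for _ in range(4000)]
mean_W1 = sum(endpoints) / len(endpoints)
var_W1 = sum(w * w for w in endpoints) / len(endpoints) - mean_W1 ** 2
print(mean_W1, var_W1)
```

With $b \equiv 0$ and $B \equiv 1$ the scheme simply sums the Gaussian increments, so the sample mean and variance of the endpoint should come out near 0 and 1.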
Now we must:
$\int_0^t \cdots \, d\mathbf{W}$: See Chapter 4.
And once all this is accomplished, there will still remain these modeling problems:
C. ITÔ’S FORMULA
Assume $n = 1$ and $X(\cdot)$ solves the SDE
$$\text{(3)} \qquad dX = b(X)\,dt + dW.$$
Suppose next that $u : \mathbb{R} \to \mathbb{R}$ is a given smooth function. We ask: what stochastic differential equation does
$$Y(t) := u(X(t)) \quad (t \ge 0)$$
solve? Offhand, we would guess from (3) that
$$dY = u'\,dX = u'b\,dt + u'\,dW,$$
according to the usual chain rule, where $' = \frac{d}{dx}$. This is wrong, however! In fact, as we will see,
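$$\text{(4)} \qquad dW \approx (dt)^{1/2}$$
in some sense. Consequently if we compute $dY$ and keep all terms of order $dt$ or $(dt)^{1/2}$, we obtain
$$\begin{aligned} dY &= u'\,dX + \tfrac{1}{2} u''\,(dX)^2 + \dots \\ &= u'\underbrace{(b\,dt + dW)}_{\text{from (3)}} + \tfrac{1}{2} u''\,(b\,dt + dW)^2 + \dots \\ &= \left( u'b + \tfrac{1}{2} u'' \right) dt + u'\,dW + \{\text{terms of order } (dt)^{3/2} \text{ and higher}\}. \end{aligned}$$
Here we used the “fact” that $(dW)^2 = dt$, which follows from (4). Hence
$$dY = \left( u'b + \tfrac{1}{2} u'' \right) dt + u'\,dW,$$
with the extra term “$\tfrac{1}{2} u''\,dt$” not present in ordinary calculus.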
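The heuristic $(dW)^2 = dt$ can be probed numerically. The sketch below is an illustration only, under the assumption that the increments of $W$ over a time step $dt$ are independent $N(0, dt)$ random variables (a property established later in the notes); the function name and parameters are hypothetical. It sums the squared increments over $[0, t]$ and compares the total with $t$.

```python
import math
import random

def sum_of_squared_increments(t, n_steps, rng):
    """Return the sum of (dW)^2 over n_steps equal subintervals of [0, t]."""
    dt = t / n_steps
    return sum(rng.gauss(0.0, math.sqrt(dt)) ** 2 for _ in range(n_steps))

rng = random.Random(1)
t = 2.0
qv = sum_of_squared_increments(t, 20000, rng)
print(qv)  # should be close to t = 2.0
```

An ordinary smooth function would give a sum of squared increments that shrinks to zero as the partition is refined, which is why the second-order term survives here and not in the usual chain rule.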
A major goal of these notes is to provide a rigorous interpretation for calculations like these, involving stochastic differentials.
Example 1. According to Itô’s formula, the solution of the stochastic differential equation
$$\begin{cases} dY = Y\,dW, \\ Y(0) = 1 \end{cases}$$
is
$$Y(t) := e^{W(t) - \frac{t}{2}},$$
and not what might seem the obvious guess, namely $\hat{Y}(t) := e^{W(t)}$.
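Example 1 can be checked along a single simulated path. The following is a rough sketch, assuming the Euler-type update $Y \leftarrow Y + Y\,dW$ as a discretization of $dY = Y\,dW$ (this scheme, the step count, and the seed are assumptions of the illustration, not part of the notes). The same increments are used to build $W$, so the discretized $Y$ can be compared with both candidate formulas.

```python
import math
import random

rng = random.Random(2)
t_end, n_steps = 1.0, 20000
dt = t_end / n_steps

W, Y = 0.0, 1.0
for _ in range(n_steps):
    dW = rng.gauss(0.0, math.sqrt(dt))
    Y += Y * dW   # discretized dY = Y dW
    W += dW       # the same increment drives W

ito_solution = math.exp(W - t_end / 2)   # Y(t) = exp(W(t) - t/2)
naive_guess = math.exp(W)                # the "obvious" but wrong guess
print(Y, ito_solution, naive_guess)
```

On a fine grid the simulated value should track `ito_solution` closely, while `naive_guess` is off by a factor of about $e^{1/2}$ at $t = 1$.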
A. Basic definitions
B. Expected value, variance
C. Distribution functions
D. Independence
E. Borel–Cantelli Lemma
F. Characteristic functions
G. Strong Law of Large Numbers, Central Limit Theorem
H. Conditional expectation
I. Martingales
This chapter is a very rapid introduction to the measure theoretic foundations of probability theory. More details can be found in any good introductory text, for instance Bremaud [Br], Chung [C] or Lamperti [L1].
Let us begin with a puzzle:
Bertrand’s paradox. Take a circle of radius 2 inches in the plane and choose a chord of this circle at random. What is the probability this chord intersects the concentric circle of radius 1 inch?
Solution #1 Any such chord (provided it does not hit the center) is uniquely determined by the location of its midpoint. Thus
$$\text{probability of hitting inner circle} = \frac{\text{area of inner circle}}{\text{area of larger circle}} = \frac{1}{4}.$$
Solution #2 By symmetry under rotation we may assume the chord is vertical. The diameter of the large circle is 4 inches and the chord will hit the small circle if it falls within its 2-inch diameter.
Hence
$$\text{probability of hitting inner circle} = \frac{2 \text{ inches}}{4 \text{ inches}} = \frac{1}{2}.$$
Solution #3 By symmetry we may assume one end of the chord is at the far left point of the larger circle. The angle $\theta$ the chord makes with the horizontal lies between $\pm\frac{\pi}{2}$ and the chord hits the inner circle if $\theta$ lies between $\pm\frac{\pi}{6}$. Therefore
$$\text{probability of hitting inner circle} = \frac{2\pi/6}{2\pi/2} = \frac{1}{3}.$$
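The three answers can be reproduced by simulation, one sampling rule per "solution". This is only a sketch: the function names, trial count, and seed are hypothetical, and each sampler returns the distance from the chord to the center, using the fact that a chord meets the inner circle exactly when that distance is less than 1 inch.

```python
import math
import random

R_OUTER, R_INNER = 2.0, 1.0  # radii in inches, as in the puzzle

def distance_via_midpoint(rng):
    """Solution #1: chord midpoint uniform in the large disc."""
    while True:  # rejection sampling from the bounding square
        x, y = rng.uniform(-R_OUTER, R_OUTER), rng.uniform(-R_OUTER, R_OUTER)
        if x * x + y * y <= R_OUTER ** 2:
            return math.hypot(x, y)

def distance_via_vertical_chord(rng):
    """Solution #2: vertical chord with uniform horizontal position."""
    return abs(rng.uniform(-R_OUTER, R_OUTER))

def distance_via_endpoint_angle(rng):
    """Solution #3: chord from the far left point at a uniform angle."""
    theta = rng.uniform(-math.pi / 2, math.pi / 2)
    return R_OUTER * abs(math.sin(theta))

def hit_probability(sample_distance, n_trials, rng):
    hits = sum(sample_distance(rng) < R_INNER for _ in range(n_trials))
    return hits / n_trials

rng = random.Random(3)
estimates = [hit_probability(f, 20000, rng)
             for f in (distance_via_midpoint,
                       distance_via_vertical_chord,
                       distance_via_endpoint_angle)]
print(estimates)  # roughly 1/4, 1/2, 1/3
```

The code makes the point of the paradox explicit: "at random" is not one experiment but three different sampling procedures.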
PROBABILITY SPACES. This example shows that we must carefully define what we mean by the term “random”. The correct way to do so is by introducing the precise mathematical structure of a probability space, as follows.
We start with a set, denoted Ω, certain subsets of which we will in a moment interpret as being “events”.
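DEFINITION. A σ-algebra is a collection $\mathcal{U}$ of subsets of Ω with these properties:
(i) $\emptyset, \Omega \in \mathcal{U}$.
(ii) If $A \in \mathcal{U}$, then $A^c \in \mathcal{U}$.
(iii) If $A_1, A_2, \dots \in \mathcal{U}$, then
$$\bigcup_{k=1}^{\infty} A_k, \ \bigcap_{k=1}^{\infty} A_k \in \mathcal{U}.$$
Here $A^c := \Omega - A$ is the complement of $A$.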
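On a finite set Ω the three properties can be tested mechanically, since countable unions and intersections reduce to finite ones, and closure under pairwise unions and intersections then suffices by induction. The checker below is a small sketch under that finiteness assumption; the name `is_sigma_algebra` and the sample collections are hypothetical.

```python
from itertools import combinations

def is_sigma_algebra(omega, collection):
    """Check properties (i)-(iii) for a collection of subsets of a FINITE omega."""
    omega = frozenset(omega)
    sets = {frozenset(s) for s in collection}
    # (i) the empty set and omega belong to the collection
    if frozenset() not in sets or omega not in sets:
        return False
    # (ii) closed under complements
    if any(omega - a not in sets for a in sets):
        return False
    # (iii) closed under pairwise (hence finite) unions and intersections
    return all((a | b) in sets and (a & b) in sets
               for a, b in combinations(sets, 2))

omega = {1, 2, 3, 4}
U = [set(), {1, 2}, {3, 4}, omega]   # a sigma-algebra
not_U = [set(), {1}, {2}, omega]     # fails: complement of {1} is missing

print(is_sigma_algebra(omega, U), is_sigma_algebra(omega, not_U))
```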
for sets $B \in \mathcal{B}$. Then $(\mathbb{R}^n, \mathcal{B}, P)$ is a probability space. We call $P$ the Dirac mass concentrated at the point $z$, and write $P = \delta_z$.
A probability space is the proper setting for mathematical probability theory. This means that we must first of all carefully identify an appropriate $(\Omega, \mathcal{U}, P)$ when we try to solve problems. The reader should convince himself or herself that the three “solutions” to Bertrand’s paradox discussed above represent three distinct interpretations of the phrase “at random”, that is, three distinct models of $(\Omega, \mathcal{U}, P)$.
Here is another example.
Example 4 (Buffon’s needle problem). The plane is ruled by parallel lines 2 inches apart and a 1-inch long needle is dropped at random on the plane. What is the probability that it hits one of the parallel lines?
The first issue is to find some appropriate probability space $(\Omega, \mathcal{U}, P)$. For this, let
$$\begin{cases} h = \text{distance from the center of needle to nearest line}, \\ \theta = \text{angle } (\le \tfrac{\pi}{2}) \text{ that the needle makes with the horizontal}. \end{cases}$$
These fully determine the position of the needle, up to translations and reflection. Let us next take
$$\begin{cases} \Omega = \underbrace{[0, \tfrac{\pi}{2})}_{\text{values of } \theta} \times \underbrace{[0, 1]}_{\text{values of } h}, \\ \mathcal{U} = \text{Borel subsets of } \Omega, \\ P(B) = \dfrac{2 \cdot \text{area of } B}{\pi} \text{ for each } B \in \mathcal{U}. \end{cases}$$
We denote by $A$ the event that the needle hits a horizontal line. We can now check that this happens provided $\frac{h}{\sin\theta} \le \frac{1}{2}$. Consequently $A = \{(\theta, h) \in \Omega \mid h \le \frac{\sin\theta}{2}\}$, and so
$$P(A) = \frac{2(\text{area of } A)}{\pi} = \frac{2}{\pi} \int_0^{\pi/2} \frac{1}{2} \sin\theta \, d\theta = \frac{1}{\pi}.$$
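The value $1/\pi$ can be checked by sampling directly from the probability space chosen above: drawing $\theta$ uniformly from $[0, \pi/2)$ and $h$ uniformly from $[0, 1]$ corresponds to the normalized area measure $P$. This is a Monte Carlo sketch; the function name, sample size, and seed are assumptions of the illustration.

```python
import math
import random

def buffon_estimate(n_drops, rng):
    """Estimate P(A) for A = {(theta, h) : h <= sin(theta) / 2}."""
    hits = 0
    for _ in range(n_drops):
        theta = rng.uniform(0.0, math.pi / 2)  # angle with the horizontal
        h = rng.uniform(0.0, 1.0)              # distance from center to nearest line
        if h <= math.sin(theta) / 2:
            hits += 1
    return hits / n_drops

rng = random.Random(4)
p_hat = buffon_estimate(40000, rng)
print(p_hat, 1 / math.pi)
```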
RANDOM VARIABLES. We can think of the probability space as being an essential mathematical construct, which is nevertheless not “directly observable”. We are therefore interested in introducing mappings $X$ from Ω to $\mathbb{R}^n$, the values of which we can observe.
Remember from Example 2 above that $\mathcal{B}$ denotes the collection of Borel subsets of $\mathbb{R}^n$, which is the smallest σ-algebra of subsets of $\mathbb{R}^n$ containing all open sets.
We may henceforth informally just think of $\mathcal{B}$ as containing all the “nice, well-behaved” subsets of $\mathbb{R}^n$.
DEFINITION. Let $(\Omega, \mathcal{U}, P)$ be a probability space. A mapping
$$X : \Omega \to \mathbb{R}^n$$
is called an n-dimensional random variable if for each $B \in \mathcal{B}$, we have
$$X^{-1}(B) \in \mathcal{U}.$$
We equivalently say that $X$ is $\mathcal{U}$-measurable.
Notation, comments. We usually write “$X$” and not “$X(\omega)$”. This follows the custom within probability theory of mostly not displaying the dependence of random variables on the sample point $\omega \in \Omega$. We also denote $P(X^{-1}(B))$ as “$P(X \in B)$”, the probability that $X$ is in $B$. In these notes we will usually use capital letters to denote random variables. Boldface usually means a vector-valued mapping. We will also use without further comment various standard facts from measure theory, for instance that sums and products of random variables are random variables.
Example 1. Let $A \in \mathcal{U}$. Then the indicator function of $A$,
$$\chi_A(\omega) := \begin{cases} 1 & \text{if } \omega \in A \\ 0 & \text{if } \omega \notin A, \end{cases}$$
is a random variable.
Example 2. More generally, if $A_1, A_2, \dots, A_m \in \mathcal{U}$, with $\Omega = \cup_{i=1}^m A_i$, and $a_1, a_2, \dots, a_m$ are real numbers, then
$$X = \sum_{i=1}^m a_i \chi_{A_i}$$
is a random variable, called a simple function.
Figure: Two sample paths of a stochastic process.
B. EXPECTED VALUE, VARIANCE.
Integration with respect to a measure. If $(\Omega, \mathcal{U}, P)$ is a probability space and $X = \sum_{i=1}^k a_i \chi_{A_i}$ is a real-valued simple random variable, we define the integral of $X$ by
$$\int_\Omega X \, dP := \sum_{i=1}^k a_i P(A_i).$$
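For a finite probability space the definition can be evaluated exactly. The sketch below assumes, purely for illustration, a fair six-sided die with the uniform measure and a hypothetical payoff written as a simple function; it computes $\sum_i a_i P(A_i)$ and compares it with the pointwise average of $X(\omega)$.

```python
from fractions import Fraction

omega = [1, 2, 3, 4, 5, 6]  # outcomes of one fair die (illustrative assumption)

def prob(event):
    """Uniform probability of an event, i.e. a subset of omega."""
    return Fraction(len(set(event) & set(omega)), len(omega))

def integral_of_simple(terms):
    """Integral of X = sum_i a_i * chi_{A_i}, given terms = [(a_i, A_i), ...]."""
    return sum(a * prob(A) for a, A in terms)

# Hypothetical payoff: 0 on {1,2,3}, 5 on {4,5}, 20 on {6}.
terms = [(0, {1, 2, 3}), (5, {4, 5}), (20, {6})]
integral = integral_of_simple(terms)

# Cross-check: average X(w) over the equally likely outcomes.
def X(w):
    return sum(a for a, A in terms if w in A)

pointwise_average = Fraction(sum(X(w) for w in omega), len(omega))
print(integral, pointwise_average)
```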
If next $X$ is a nonnegative random variable, we define
$$\int_\Omega X \, dP := \sup_{Y \le X,\ Y \text{ simple}} \int_\Omega Y \, dP.$$
Finally if $X : \Omega \to \mathbb{R}$ is a random variable, we write
$$\int_\Omega X \, dP := \int_\Omega X^+ \, dP - \int_\Omega X^- \, dP,$$
provided at least one of the integrals on the right is finite. Here $X^+ = \max(X, 0)$ and $X^- = \max(-X, 0)$; so that $X = X^+ - X^-$.
Next, suppose $\mathbf{X} : \Omega \to \mathbb{R}^n$ is a vector-valued random variable, $\mathbf{X} = (X^1, X^2, \dots, X^n)$. Then we write
$$\int_\Omega \mathbf{X} \, dP = \left( \int_\Omega X^1 \, dP, \int_\Omega X^2 \, dP, \cdots, \int_\Omega X^n \, dP \right).$$
We will assume without further comment the usual rules for these integrals.
DEFINITION. We call
$$E(X) := \int_\Omega X \, dP$$
the expected value (or mean value) of $X$.
DEFINITION. We call
$$V(X) := \int_\Omega |X - E(X)|^2 \, dP$$
the variance of $X$.
Observe that
$$V(X) = E(|X - E(X)|^2) = E(|X|^2) - |E(X)|^2.$$
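LEMMA (Chebyshev’s inequality). If $X$ is a random variable and $1 \le p < \infty$, then
$$P(|X| \ge \lambda) \le \frac{1}{\lambda^p} E(|X|^p) \quad \text{for all } \lambda > 0.$$
Proof. We have
$$E(|X|^p) = \int_\Omega |X|^p \, dP \ge \int_{\{|X| \ge \lambda\}} |X|^p \, dP \ge \lambda^p P(|X| \ge \lambda).$$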
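As a sanity check, the inequality can be verified exactly for a random variable taking finitely many equally likely values. The values below are an arbitrary, hypothetical choice; the sketch compares both sides for several $\lambda$ and $p$ using exact rational arithmetic.

```python
from fractions import Fraction

values = [-3, -1, 0, 1, 2, 6]  # equally likely values of X (illustrative)

def prob_abs_at_least(lam):
    """P(|X| >= lam) under the uniform measure on `values`."""
    return Fraction(sum(1 for x in values if abs(x) >= lam), len(values))

def abs_moment(p):
    """E(|X|^p) under the uniform measure on `values`."""
    return Fraction(sum(abs(x) ** p for x in values), len(values))

# (lambda, p, left-hand side, right-hand side) for a few choices
checks = [(lam, p, prob_abs_at_least(lam), abs_moment(p) / lam ** p)
          for lam in (1, 2, 3, 5) for p in (1, 2, 3)]
for lam, p, lhs, rhs in checks:
    print(lam, p, lhs, rhs, lhs <= rhs)
```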
Let $(\Omega, \mathcal{U}, P)$ be a probability space and suppose $X : \Omega \to \mathbb{R}^n$ is a random variable.
Notation. Let $x = (x_1, \dots, x_n) \in \mathbb{R}^n$, $y = (y_1, \dots, y_n) \in \mathbb{R}^n$. Then
$$x \le y$$
means $x_i \le y_i$ for $i = 1, \dots, n$.
DEFINITIONS. (i) The distribution function of $X$ is the function $F_X : \mathbb{R}^n \to [0, 1]$ defined by
$$F_X(x) := P(X \le x) \quad \text{for all } x \in \mathbb{R}^n.$$
(ii) If $X_1, \dots, X_m : \Omega \to \mathbb{R}^n$ are random variables, their joint distribution function is $F_{X_1, \dots, X_m} : (\mathbb{R}^n)^m \to [0, 1]$,
$$F_{X_1, \dots, X_m}(x_1, \dots, x_m) := P(X_1 \le x_1, \dots, X_m \le x_m) \quad \text{for all } x_i \in \mathbb{R}^n,\ i = 1, \dots, m.$$
DEFINITION. Suppose $X : \Omega \to \mathbb{R}^n$ is a random variable and $F = F_X$ its distribution function. If there exists a nonnegative, integrable function $f : \mathbb{R}^n \to \mathbb{R}$ such that
$$F(x) = F(x_1, \dots, x_n) = \int_{-\infty}^{x_1} \cdots \int_{-\infty}^{x_n} f(y_1, \dots, y_n) \, dy_n \dots dy_1,$$
then $f$ is called the density function for $X$.
It follows then that
$$\text{(1)} \qquad P(X \in B) = \int_B f(x) \, dx \quad \text{for all } B \in \mathcal{B}.$$
This formula is important as the expression on the right hand side is an ordinary integral, and can often be explicitly calculated.
In particular,
$$E(X) = \int_{\mathbb{R}^n} x f(x) \, dx \quad \text{and} \quad V(X) = \int_{\mathbb{R}^n} |x - E(X)|^2 f(x) \, dx.$$
Remark. Hence we can compute $E(X)$, $V(X)$, etc. in terms of integrals over $\mathbb{R}^n$. This is an important observation, since as mentioned before the probability space $(\Omega, \mathcal{U}, P)$ is “unobservable”: All that we “see” are the values $X$ takes on in $\mathbb{R}^n$. Indeed, all quantities of interest in probability theory can be computed in $\mathbb{R}^n$ in terms of the density $f$.
Proof. Suppose first $g$ is a simple function on $\mathbb{R}^n$:
$$g = \sum_{i=1}^m b_i \chi_{B_i} \quad (B_i \in \mathcal{B}).$$
Then
$$E(g(X)) = \sum_{i=1}^m b_i \int_\Omega \chi_{B_i}(X) \, dP = \sum_{i=1}^m b_i P(X \in B_i).$$
But also
$$\int_{\mathbb{R}^n} g(x) f(x) \, dx = \sum_{i=1}^m b_i \int_{B_i} f(x) \, dx = \sum_{i=1}^m b_i P(X \in B_i) \quad \text{by (1)}.$$
Consequently the formula holds for all simple functions $g$ and, by approximation, it holds therefore for general functions $g$.
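Example. If $X$ is $N(m, \sigma^2)$, then
$$E(X) = \frac{1}{\sqrt{2\pi\sigma^2}} \int_{-\infty}^{\infty} x \, e^{-\frac{(x-m)^2}{2\sigma^2}} \, dx = m$$
and
$$V(X) = \frac{1}{\sqrt{2\pi\sigma^2}} \int_{-\infty}^{\infty} (x - m)^2 \, e^{-\frac{(x-m)^2}{2\sigma^2}} \, dx = \sigma^2.$$
Therefore $m$ is indeed the mean, and $\sigma^2$ the variance.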
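Both integrals can be confirmed numerically. The sketch below approximates them with a midpoint rule on a window of twelve standard deviations around $m$; the quadrature rule, the truncation of the infinite range, and the particular values of $m$ and $\sigma^2$ are all assumptions of this illustration.

```python
import math

def normal_density(x, m, sigma2):
    """Density of N(m, sigma2) evaluated at x."""
    return math.exp(-(x - m) ** 2 / (2 * sigma2)) / math.sqrt(2 * math.pi * sigma2)

def integrate(g, a, b, n=20000):
    """Midpoint-rule approximation of the integral of g over [a, b]."""
    h = (b - a) / n
    return h * sum(g(a + (i + 0.5) * h) for i in range(n))

m, sigma2 = 1.5, 4.0
sigma = math.sqrt(sigma2)
lo, hi = m - 12 * sigma, m + 12 * sigma  # the tails beyond this are negligible

mean = integrate(lambda x: x * normal_density(x, m, sigma2), lo, hi)
var = integrate(lambda x: (x - m) ** 2 * normal_density(x, m, sigma2), lo, hi)
print(mean, var)  # close to m and sigma2
```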
MOTIVATION. Let $(\Omega, \mathcal{U}, P)$ be a probability space, and let $A, B \in \mathcal{U}$ be two events, with $P(B) > 0$. We want to find a reasonable definition of
$$P(A \mid B), \text{ the probability of } A, \text{ given } B.$$
Think this way. Suppose some point $\omega \in \Omega$ is selected “at random” and we are told $\omega \in B$. What then is the probability that $\omega \in A$ also?
Since we know $\omega \in B$, we can regard $B$ as being a new probability space. Therefore we can define $\tilde{\Omega} := B$, $\tilde{\mathcal{U}} := \{C \cap B \mid C \in \mathcal{U}\}$ and $\tilde{P} := \frac{P}{P(B)}$; so that $\tilde{P}(\tilde{\Omega}) = 1$. Then the probability that $\omega$ lies in $A$ is $\tilde{P}(A \cap B) = \frac{P(A \cap B)}{P(B)}$.
This observation motivates the following
DEFINITION. We write
$$P(A \mid B) := \frac{P(A \cap B)}{P(B)}$$
if $P(B) > 0$.
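A small finite example may help fix the definition. The sketch below assumes two fair dice with all 36 outcomes equally likely; the events `A` and `B` and the helper names are hypothetical choices for illustration.

```python
from fractions import Fraction
from itertools import product

omega = list(product(range(1, 7), repeat=2))  # two fair dice, 36 outcomes

def P(event):
    """Uniform probability of the set of outcomes where event(w) is true."""
    return Fraction(sum(1 for w in omega if event(w)), len(omega))

def P_given(A, B):
    """P(A | B) := P(A and B) / P(B), defined only when P(B) > 0."""
    if P(B) == 0:
        raise ValueError("P(B) must be positive")
    return P(lambda w: A(w) and B(w)) / P(B)

A = lambda w: w[0] + w[1] == 8   # the sum of the dice is 8
B = lambda w: w[0] == 3          # the first die shows 3

p_A = P(A)
p_A_given_B = P_given(A, B)
print(p_A, p_A_given_B)
```

Here learning that $B$ occurred changes the probability of $A$, so these two particular events are not independent in the sense defined next.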
Now what should it mean to say “$A$ and $B$ are independent”? This should mean $P(A \mid B) = P(A)$, since presumably any information that the event $B$ has occurred is irrelevant in determining the probability that $A$ has occurred. Thus
$$P(A) = P(A \mid B) = \frac{P(A \cap B)}{P(B)}$$
and so
$$P(A \cap B) = P(A) P(B)$$
if $P(B) > 0$. We take this for the definition, even if $P(B) = 0$:
DEFINITION. Two events A and B are called independent if
P (A ∩ B) = P (A)P (B).
This concept and its ramifications are the hallmarks of probability theory.
To gain some insight, the reader may wish to check that if $A$ and $B$ are independent events, then so are $A^c$ and $B$. Likewise, $A^c$ and $B^c$ are independent.
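This exercise can be tried out on a concrete pair of events before proving it. The sketch again assumes two fair dice; the events chosen (first die even, sum equal to 7) are a hypothetical independent pair, and the check is exact because it uses rational arithmetic.

```python
from fractions import Fraction
from itertools import product

omega = list(product(range(1, 7), repeat=2))  # two fair dice

def P(event):
    return Fraction(sum(1 for w in omega if event(w)), len(omega))

def independent(E1, E2):
    """Test the defining identity P(E1 and E2) = P(E1) P(E2)."""
    return P(lambda w: E1(w) and E2(w)) == P(E1) * P(E2)

def complement(E):
    return lambda w: not E(w)

A = lambda w: w[0] % 2 == 0      # first die is even
B = lambda w: w[0] + w[1] == 7   # the sum is 7

results = {
    "A, B": independent(A, B),
    "A^c, B": independent(complement(A), B),
    "A^c, B^c": independent(complement(A), complement(B)),
}
print(results)
```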
THEOREM. The random variables $X_1, \cdots, X_m : \Omega \to \mathbb{R}^n$ are independent if and only if
$$\text{(2)} \qquad F_{X_1, \cdots, X_m}(x_1, \dots, x_m) = F_{X_1}(x_1) \cdots F_{X_m}(x_m) \quad \text{for all } x_i \in \mathbb{R}^n,\ i = 1, \dots, m.$$
If the random variables have densities, (2) is equivalent to
$$\text{(3)} \qquad f_{X_1, \cdots, X_m}(x_1, \dots, x_m) = f_{X_1}(x_1) \cdots f_{X_m}(x_m) \quad \text{for all } x_i \in \mathbb{R}^n,\ i = 1, \dots, m,$$
where the functions $f$ are the appropriate densities.
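Proof. 1. Assume first that $\{X_k\}_{k=1}^m$ are independent. Then
$$\begin{aligned} F_{X_1 \cdots X_m}(x_1, \dots, x_m) &= P(X_1 \le x_1, \dots, X_m \le x_m) \\ &= P(X_1 \le x_1) \cdots P(X_m \le x_m) \\ &= F_{X_1}(x_1) \cdots F_{X_m}(x_m). \end{aligned}$$
$$\begin{aligned} \int_{B_1 \times \dots \times B_m} f_{X_1 \cdots X_m}(x_1, \dots, x_m) \, dx_1 \cdots dx_m &= \left( \int_{B_1} f_{X_1}(x_1) \, dx_1 \right) \cdots \left( \int_{B_m} f_{X_m}(x_m) \, dx_m \right) \quad \text{by (3)} \\ &= P(X_1 \in B_1) \cdots P(X_m \in B_m) \\ &= P(A_1) \cdots P(A_m). \end{aligned}$$
Therefore $\mathcal{U}(X_1), \cdots, \mathcal{U}(X_m)$ are independent σ-algebras.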
One of the most important properties of independent random variables is this:
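THEOREM. If $X_1, \dots, X_m$ are independent, real-valued random variables, with
$$E(|X_i|) < \infty \quad (i = 1, \dots, m),$$
then $E(|X_1 \cdots X_m|) < \infty$ and
$$E(X_1 \cdots X_m) = E(X_1) \cdots E(X_m).$$
Proof. Suppose that each $X_i$ is bounded and has a density. Then
$$\begin{aligned} E(X_1 \cdots X_m) &= \int_{\mathbb{R}^m} x_1 \cdots x_m \, f_{X_1 \cdots X_m}(x_1, \dots, x_m) \, dx_1 \dots dx_m \\ &= \left( \int_{\mathbb{R}} x_1 f_{X_1}(x_1) \, dx_1 \right) \cdots \left( \int_{\mathbb{R}} x_m f_{X_m}(x_m) \, dx_m \right) \quad \text{by (3)} \\ &= E(X_1) \cdots E(X_m). \end{aligned}$$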
THEOREM. If $X_1, \dots, X_m$ are independent, real-valued random variables, with
$$V(X_i) < \infty \quad (i = 1, \dots, m),$$
then
$$V(X_1 + \cdots + X_m) = V(X_1) + \cdots + V(X_m).$$
Proof. Use induction, the case $m = 2$ holding as follows. Let $m_1 := E(X_1)$, $m_2 := E(X_2)$. Then $E(X_1 + X_2) = m_1 + m_2$ and
$$\begin{aligned} V(X_1 + X_2) &= \int_\Omega (X_1 + X_2 - (m_1 + m_2))^2 \, dP \\ &= \int_\Omega (X_1 - m_1)^2 \, dP + \int_\Omega (X_2 - m_2)^2 \, dP + 2 \int_\Omega (X_1 - m_1)(X_2 - m_2) \, dP \\ &= V(X_1) + V(X_2) + 2 \underbrace{E(X_1 - m_1) E(X_2 - m_2)}_{=0} \\ &= V(X_1) + V(X_2), \end{aligned}$$
where we used independence in the next-to-last step.
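Both theorems on independent random variables can be checked exactly in a small case. The sketch below assumes $X_1$ and $X_2$ are the faces of two independent fair dice, so every pair of faces has probability $1/36$; the helper names `E` and `V` are chosen to mirror the notation of the text.

```python
from fractions import Fraction
from itertools import product

faces = range(1, 7)

def E(f):
    """Expected value of f(X1, X2) for two independent fair dice."""
    return Fraction(1, 36) * sum(f(x1, x2) for x1, x2 in product(faces, repeat=2))

def V(f):
    """Variance of f(X1, X2), computed as E(|f - E(f)|^2)."""
    mean = E(f)
    return E(lambda x1, x2: (f(x1, x2) - mean) ** 2)

X1 = lambda x1, x2: x1
X2 = lambda x1, x2: x2

product_rule_holds = E(lambda a, b: a * b) == E(X1) * E(X2)
variance_rule_holds = V(lambda a, b: a + b) == V(X1) + V(X2)
print(E(X1), V(X1), V(lambda a, b: a + b))
```

For comparison, the sum $X_1 + X_1$ of a die with itself has variance $4V(X_1)$ rather than $2V(X_1)$, which shows that independence cannot simply be dropped.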
We introduce next a simple and very useful way to check if some sequence $A_1, \dots, A_n, \dots$ of events “occurs infinitely often”.
DEFINITION. Let $A_1, \dots, A_n, \dots$ be events in a probability space. Then the event
$$\bigcap_{n=1}^{\infty} \bigcup_{m=n}^{\infty} A_m = \{\omega \in \Omega \mid \omega \text{ belongs to infinitely many of the } A_n\}$$
is called “$A_n$ infinitely often”, abbreviated “$A_n$ i.o.”.
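BOREL–CANTELLI LEMMA. If $\sum_{n=1}^{\infty} P(A_n) < \infty$, then $P(A_n \text{ i.o.}) = 0$.
Proof. By definition $A_n \text{ i.o.} = \bigcap_{n=1}^{\infty} \bigcup_{m=n}^{\infty} A_m$, and so for each $n$
$$P(A_n \text{ i.o.}) \le P\left( \bigcup_{m=n}^{\infty} A_m \right) \le \sum_{m=n}^{\infty} P(A_m).$$
The right-hand side tends to zero as $n \to \infty$ because $\sum_{m=1}^{\infty} P(A_m) < \infty$, and the left-hand side does not depend on $n$; hence $P(A_n \text{ i.o.}) = 0$.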
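The key estimate in the proof can be watched at work in a simulation. The sketch below makes several assumptions that are not part of the lemma: the events are taken to be independent with $P(A_m) = 1/m^2$, and the infinite union is truncated at a finite index `N_MAX`. It estimates the probability that some $A_m$ with $m \ge n$ occurs and compares it with the tail sum.

```python
import random

def p(m):
    return 1.0 / m ** 2  # hypothetical summable choice P(A_m) = 1/m^2

def tail_sum(n, n_max):
    """Truncated tail sum of P(A_m) for n <= m <= n_max."""
    return sum(p(m) for m in range(n, n_max + 1))

def some_event_occurs(n, n_max, rng):
    """One trial: does any A_m with n <= m <= n_max occur?"""
    return any(rng.random() < p(m) for m in range(n, n_max + 1))

rng = random.Random(5)
N_MAX, TRIALS = 500, 2000
results = []
for n in (2, 5, 20):
    freq = sum(some_event_occurs(n, N_MAX, rng) for _ in range(TRIALS)) / TRIALS
    results.append((n, freq, tail_sum(n, N_MAX)))
print(results)
```

As $n$ grows, both the observed frequency and the bounding tail sum shrink, which is the mechanism that forces $P(A_n \text{ i.o.}) = 0$.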
APPLICATION. We illustrate a typical use of the Borel–Cantelli Lemma. A sequence of random variables $\{X_k\}_{k=1}^{\infty}$ defined on some probability space converges in probability to a random variable $X$, provided
$$\lim_{k \to \infty} P(|X_k - X| > \epsilon) = 0 \quad \text{for each } \epsilon > 0.$$