










































A study guide for Probability Theory, covering topics such as preliminaries, central limit theorems, conditioning, martingales, and Markov chains. It includes key terms and references to Durrett's Probability: Theory and Examples 5th Ed. and Probability with Martingales by David Williams. The guide also covers probability spaces, distributions, and random variables. It is likely useful as study notes or a summary for university students studying Probability Theory.
References: Durrett, Probability: Theory and Examples 5th Ed., Chapters 1–
Additional References:
28 What is the Weak Law of Large Numbers? Sketch a proof. What if we don’t have finite variance?
2.3 Borel-Cantelli Lemmas
29 What are the Borel-Cantelli Lemmas? How do they relate?
30 What is the second Borel-Cantelli Lemma? What happens if we remove the independence condition?
31 Assume Xk → X in probability and g is a continuous function. Is it true that g(Xk) → g(X)?
32 Use the Borel-Cantelli Lemmas to construct a sequence of random variables that converges in probability but not almost surely.
2.4 Strong Law of Large Numbers
33 State the Strong Law of Large Numbers. Sketch a proof (you may assume $E X_i^4 < \infty$). Can we weaken the assumptions? What happens if E|Xi| = ∞?
34 Let X1, X2, ... be i.i.d. and non-negative with EXi = ∞. What can we say about Sn/n?
2.5 Convergence of Random Series
35 Consider a sequence of i.i.d. variables X1, X2, .... How can we express Xn → 0 a.s. in terms of a convergence of something in probability?
36 What is Kolmogorov’s 0-1 Law? What is the definition of a tail σ-algebra? What about tail random variables?
37 State Kolmogorov’s Maximal Inequality. How does it compare to Chebyshev’s Inequality?
Chapter 3 - Central Limit Theorems
3.1 The De Moivre-Laplace Theorem
38 Give a concrete example of the Central Limit Theorem. How could you prove this directly?
3.2 Weak Convergence
39 What is weak convergence? How does it relate to convergence in probability and a.s. convergence?
40 What is an example of a r.v. that converges weakly but not in probability?
41 Why do we only get convergence at continuity points for weak convergence?
3.3 Characteristic Functions
42 What is the significance of the inversion formula for characteristic functions?
43 Give an example where the characteristic functions ϕn of Xn converge to ϕ(t) discontinuous at t = 0. What is the limit of the distribution function of the Xn?
44 Suppose you have X1, X2, ... and corresponding characteristic functions ϕ1, ϕ2, ... converging point-wise to ϕ(t). What can you say? What is tightness? How do continuity and tightness relate?
3.4 Central Limit Theorems
45 State the i.i.d. central limit theorem. Prove it (possibly with added assumptions).
46 The type of convergence in the Central Limit Theorem is convergence in distribution. Why isn’t the convergence almost sure?
47 Suppose X1, X2, ... are bounded but $\sum_n \operatorname{var}(X_n) = \infty$. What can you say about the limiting behavior of Sn?
48 What is the Lindeberg-Feller Central Limit Theorem? How does it relate to the i.i.d. Central Limit Theorem?
3.6 Poisson Convergence
49 What is Poisson Convergence and why is it called the “law of rare events”?
66 Apply optional stopping to get a formula for the probability that the simple symmetric
Memorization (– key terms –)
Chapter 1 - Preliminaries
1 probability space, measure space
probability space: (Ω, F, P ) - Ω outcomes, F events, and P : F → [0, 1] assigns probabilities to events
measurable space: (Ω, F) - Ω outcomes, F events; adding a measure μ on F gives the measure space (Ω, F, μ)
2 σ-field/algebra, σ-field generated by A
σ-field: F a non-empty collection of subsets of Ω satisfying:
i A ∈ F ⇒ A^c ∈ F
ii if Ai ∈ F is a countable sequence of sets, then ∪iAi ∈ F
σ-field generated by A: smallest σ-field containing the collection A, denoted σ(A)
3 measure, probability measure
measure: “non-negative countably additive set function”, i.e. μ : F → [0, ∞] such that
i μ(A) ≥ μ(∅) = 0 for all A ∈ F
ii if Ai ∈ F is a countable sequence of disjoint sets, then $\mu(\cup_i A_i) = \sum_i \mu(A_i)$
probability measure: μ(Ω) = 1, usually denoted P
4 monotonicity, subadditivity
μ a measure on (Ω, F)
monotonicity: A ⊆ B ⇒ μ(A) ≤ μ(B)
subadditivity: A ⊂ ∪iAi ⇒ $\mu(A) \le \sum_i \mu(A_i)$
5 continuity from below/above
μ a measure on (Ω, F)
if Ai ↑ A (A1 ⊂ A2 ⊂ · · · and ∪iAi = A) then μ(Ai) ↑ μ(A)
if Ai ↓ A (A1 ⊃ A2 ⊃ · · · and ∩iAi = A) and μ(A1) < ∞ then μ(Ai) ↓ μ(A)
6 discrete probability spaces
Ω a countable set, F all subsets of Ω
$P(A) = \sum_{\omega \in A} p(\omega)$ where p(ω) ≥ 0 and $\sum_{\omega \in \Omega} p(\omega) = 1$
[i.e. each ω gets assigned its own point probability, and the probability of a set is simply the sum of its point probabilities]
Discrete uniform probability - Ω finite and p(ω) = 1/|Ω| for all ω ∈ Ω.
7 Borel sets
the smallest σ-algebra containing the open sets in R^d (with the usual Euclidean topology)
8 Stieltjes measure function
A function F : R → R such that F is (i) nondecreasing and (ii) right continuous ($\lim_{y \downarrow x} F(y) = F(x)$)
9 Lebesgue measures on R and Rd
R: The unique measure on the Borel sets of R such that μ((a, b]) = b − a.
R^d: The unique measure on the Borel sets of R^d such that μ(A) = volume of A (the product of its side lengths) for all finite rectangles A.
10 semi-algebra, algebra (field), algebra generated by S
semi-algebra: a collection S such that (i) S is closed under finite intersections, (ii) S ∈ S implies S^c is a finite disjoint union of sets in S
algebra: a collection A such that (i) A is closed under finite intersections, (ii) A is closed under complements (it follows that A is closed under finite unions)
algebra generated by S: $\bar{\mathcal{S}}$, the collection of finite disjoint unions of sets in S (this is an algebra)
11 measure on an algebra
given an algebra A, a measure on A is a set function μ : A → [0, ∞] such that
(i) μ(A) ≥ μ(∅) = 0 for all A ∈ A and
(ii) if Ai ∈ A are disjoint and their union is in A, then $\mu(\cup_i A_i) = \sum_i \mu(A_i)$.
12 σ-finite
a measure μ on an algebra A is σ-finite if there is a sequence of sets An ∈ A such that μ(An) < ∞ for all n and ∪nAn = Ω (could also assume that An ↑ Ω or the An are disjoint)
13 countably generated σ-field/algebra
F, a σ-field is countably generated if there is a countable collection C ⊂ F such that σ(C) = F
14 random variable (F-measurable)
a real-valued function X : Ω → R such that $X^{-1}(B) \in \mathcal{F}$ for every Borel set B ⊂ R, where F is the specified σ-field on Ω (when the σ-field needs to be named, X is said to be F-measurable)
15 indicator function of a set
example of a random variable, where A ∈ F:
$1_A(\omega) = 1$ if ω ∈ A and $1_A(\omega) = 0$ if ω ∉ A
16 distribution (function) of a random variable
When X is a random variable on a probability space (Ω, F, P), its distribution is the probability measure μ on R given by $\mu(A) = P(X \in A) = P(X^{-1}(A))$
the associated distribution function is given by $F(x) = P(X \le x) = P(X^{-1}((-\infty, x]))$
17 equal in distribution
two random variables whose distributions (measures) on R are the same; this occurs exactly when they have the same distribution function. Denoted by $X \overset{d}{=} Y$
28 extended real line
R* = [−∞, ∞] with Borel sets generated by [−∞, a), (a, b), (b, ∞]
29 simple function
$\varphi = \sum_{i=1}^{n} a_i 1_{A_i}$ where the Ai are disjoint sets with μ(Ai) < ∞
30 integration of simple functions
$\int \varphi \, d\mu = \int \sum_{i=1}^{n} a_i 1_{A_i} \, d\mu = \sum_{i=1}^{n} a_i \mu(A_i)$
31 φ ≥ ψ almost everywhere
φ ≥ ψ almost everywhere ⇐⇒ μ({ω : φ(ω) < ψ(ω)}) = 0
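32 integral of bounded functions
$\int f \, d\mu = \sup_{\varphi \le f} \int \varphi \, d\mu = \inf_{\psi \ge f} \int \psi \, d\mu$ (sup and inf taken over simple functions ϕ, ψ)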
33 integral of non-negative functions
$\int f \, d\mu = \sup\left\{ \int h \, d\mu : 0 \le h \le f,\ h \text{ bounded},\ \mu(\{x : h(x) > 0\}) < \infty \right\}$
34 integrable functions
f is integrable if $\int |f| \, d\mu < \infty$ (well defined since |f| is a non-negative function)
35 integral of (integrable) functions
$f^+ = \max(f, 0)$ and $f^- = \max(-f, 0)$
so that $f = f^+ - f^-$ and $f^+, f^-$ are both non-negative functions
$\int f \, d\mu = \int f^+ \, d\mu - \int f^- \, d\mu$
36 basic properties of integrals
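If f, g are both integrable/non-negative/bounded/simple:
(i) If f ≥ 0 a.e. then $\int f \, d\mu \ge 0$
(ii) For all a ∈ R, $\int a f \, d\mu = a \int f \, d\mu$
(iii) $\int (f + g) \, d\mu = \int f \, d\mu + \int g \, d\mu$
(iv) If g ≤ f a.e. then $\int g \, d\mu \le \int f \, d\mu$
(v) If g = f a.e. then $\int g \, d\mu = \int f \, d\mu$
(vi) $\left| \int f \, d\mu \right| \le \int |f| \, d\mu$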
37 Jensen’s Inequality (integral)
If ϕ is convex (technical: λϕ(x) + (1 − λ)ϕ(y) ≥ ϕ(λx + (1 − λ)y) for all λ ∈ (0, 1) and all x, y),
μ is a probability measure, and f and ϕ(f) are integrable, then
$\varphi\left( \int f \, d\mu \right) \le \int \varphi(f) \, d\mu$
38 ||f ||p
$\|f\|_p = \left( \int |f|^p \, d\mu \right)^{1/p}$ for 1 ≤ p < ∞
39 Hölder’s Inequality (integral)
If p, q ∈ (1, ∞) and $\frac{1}{p} + \frac{1}{q} = 1$ then
$\int |fg| \, d\mu \le \|f\|_p \|g\|_q$
40 Cauchy-Schwarz Inequality
$\int |fg| \, d\mu \le \|f\|_2 \|g\|_2 = \left( \int f^2 \, d\mu \right)^{1/2} \left( \int g^2 \, d\mu \right)^{1/2}$
41 Bounded Convergence Theorem
E a set of finite measure (‘bounded’: μ(E) < ∞) and fn supported on E (fn vanishes on E^c)
fn uniformly bounded (i.e. |fn| ≤ M for all n, for some M)
fn → f in measure (for every ε > 0, μ({ω : |fn(ω) − f(ω)| > ε}) → 0 as n → ∞)
Then $\int f \, d\mu = \lim_{n \to \infty} \int f_n \, d\mu$
42 Fatou’s Lemma
fn ≥ 0 ⇒ $\liminf_{n \to \infty} \int f_n \, d\mu \ge \int \left( \liminf_{n \to \infty} f_n \right) d\mu$
43 Monotone Convergence Theorem
fn ≥ 0 and fn ↑ f ⇒ $\int f_n \, d\mu \uparrow \int f \, d\mu$
44 Dominated Convergence Theorem
If fn → f a.e. and |fn| ≤ g for all n, where g is integrable, then $\int f_n \, d\mu \to \int f \, d\mu$
55 computing EX integrals (change of variable formula)
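X a random element of $(S, \mathcal{S})$, i.e. a measurable map from the probability space (Ω, F, P) into $(S, \mathcal{S})$, with distribution $\mu(A) = P(X \in A) = P(X^{-1}(A))$
If f is a measurable function $(S, \mathcal{S}) \to (\mathbb{R}, \mathcal{R})$ with f ≥ 0 or E|f(X)| < ∞ then
$E f(X) = \int_S f(y) \, \mu(dy) = \int f \, d\mu$
If X is real-valued with density g, i.e. distribution function $F(x) = \int_{-\infty}^{x} g(y) \, dy$, then
$E f(X) = \int_{-\infty}^{\infty} f(x) g(x) \, dx$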
56 Bernoulli Distribution
this is a discrete distribution
p some parameter, P (X = 1) = p and P (X = 0) = 1 − p
57 Poisson Distribution
this is a discrete distribution
λ > 0 some parameter, $P(X = k) = e^{-\lambda} \lambda^k / k!$ for k = 0, 1, 2, ...
58 Other formulas for Expected Values
If X ≥ 0 is integer-valued, $EX = \sum_{i=1}^{\infty} P(X \ge i)$; for general X ≥ 0, $EX = \int_0^{\infty} P(X \ge x) \, dx$ (useful when there is a nice formula for P(X ≥ x))
so for example $E|X| = \int_0^{\infty} P(|X| > x) \, dx$ (can be derived by Fubini’s Theorem)
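A quick numerical check of the tail-sum formula can help it stick. The sketch below assumes X ~ Poisson(λ) with λ = 3 and truncates both sums at a hypothetical cutoff `kmax`; the function names are illustrative and not from the source.

```python
import math

def poisson_pmf(k: int, lam: float) -> float:
    """P(X = k) for X ~ Poisson(lam)."""
    return math.exp(-lam) * lam**k / math.factorial(k)

def mean_direct(lam: float, kmax: int = 100) -> float:
    """EX as sum_k k * P(X = k), truncated at kmax."""
    return sum(k * poisson_pmf(k, lam) for k in range(kmax + 1))

def mean_tail_sum(lam: float, kmax: int = 100) -> float:
    """EX as sum_{i >= 1} P(X >= i), truncated at kmax."""
    pmf = [poisson_pmf(k, lam) for k in range(kmax + 1)]
    tail = 0.0   # running value of P(i <= X <= kmax)
    total = 0.0
    for i in range(kmax, 0, -1):
        tail += pmf[i]
        total += tail
    return total

if __name__ == "__main__":
    print(mean_direct(3.0), mean_tail_sum(3.0))
```

Both functions should agree with the known mean λ up to truncation and rounding error. Starting the tail sum at i = 0 instead would add P(X ≥ 0) = 1 to the result.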
59 product measure
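Take $(X, \mathcal{A}, \mu_1)$ and $(Y, \mathcal{B}, \mu_2)$ with σ-finite measures; then $\mu = \mu_1 \times \mu_2$ is the unique measure on the product σ-field $\mathcal{A} \times \mathcal{B}$ of X × Y such that $\mu(A \times B) = \mu_1(A) \mu_2(B)$ for all $A \in \mathcal{A}$, $B \in \mathcal{B}$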
60 Fubini’s Theorem
Fubini’s Theorem gives conditions for switching the order of iterated integrals on product spaces
Fubini’s Theorem: Let μ1, μ2 be σ-finite with μ = μ1 × μ2. If f ≥ 0 or $\int |f| \, d\mu < \infty$ then
$\int_X \int_Y f \, d\mu_2 \, d\mu_1 = \int_{X \times Y} f \, d\mu = \int_Y \int_X f \, d\mu_1 \, d\mu_2$
Typical application to summation/sum+integral combinations
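To see why Fubini’s hypotheses matter for double sums (counting measure on the positive integers), here is a minimal sketch of a standard counterexample: f(m, m) = 1, f(m, m + 1) = −1, and f = 0 otherwise. Here f takes negative values and ∑|f| = ∞, so neither condition holds. The truncation level `N` and the function names are assumptions made for illustration; every row and column has at most two non-zero entries, so the inner sums are exact.

```python
def f(m: int, n: int) -> int:
    """+1 on the diagonal, -1 immediately to the right of it, 0 elsewhere."""
    if n == m:
        return 1
    if n == m + 1:
        return -1
    return 0

def rows_then_columns(N: int) -> int:
    """Sum over rows m <= N of the full row sum (inner sum over n)."""
    return sum(sum(f(m, n) for n in range(1, N + 2)) for m in range(1, N + 1))

def columns_then_rows(N: int) -> int:
    """Sum over columns n <= N of the full column sum (inner sum over m)."""
    return sum(sum(f(m, n) for m in range(1, N + 2)) for n in range(1, N + 1))

if __name__ == "__main__":
    print(rows_then_columns(50), columns_then_rows(50))
```

Every row sums to 0, while the first column sums to 1 and all other columns sum to 0, so the two iterated sums disagree for every N.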
Chapter 2 - Law of Large Numbers
61 independence of σ-fields, random variables
independence for σ-fields: (finite version) σ-fields F1, ..., Fn (all contained in some larger σ-field carrying the probability measure P) are independent if for any choice of Ai ∈ Fi, i = 1, ..., n,
$P(\cap_{i=1}^{n} A_i) = \prod_{i=1}^{n} P(A_i)$
(infinite collections are independent if all finite sub-collections are independent)
independence for random variables: random variables X1, ..., Xn from (Ω, F, P) to $(\mathbb{R}, \mathcal{R})$ are independent if the σ-fields σ(Xi) are independent, which is equivalent to
$P(X_1 \in C_1, \ldots, X_n \in C_n) = P(\cap_{i=1}^{n} \{X_i \in C_i\}) = \prod_{i=1}^{n} P(X_i \in C_i)$
for any choice of Borel sets $C_i \in \mathcal{R}$
62 Independence of events and arbitrary collections of events
Most simply, two events A and B are independent if P(A ∩ B) = P(A)P(B)
Independence of Events: Generally, A1, ..., An are independent events in (Ω, F, P) if for every sub-collection of indices I ⊆ {1, 2, ..., n},
$P(\cap_{i \in I} A_i) = \prod_{i \in I} P(A_i)$
Independence of Collections of Sets: Given collections of sets $\mathcal{A}_1, \ldots, \mathcal{A}_n$, these are independent if for any I ⊆ {1, ..., n} and any choice of $A_i \in \mathcal{A}_i$, i ∈ I, the events Ai are independent. One can always assume $\Omega \in \mathcal{A}_i$ for each i and then take the full index set every time.
63 Pairwise Independent, how it differs from Independent
Pairwise independent: every pair is independent (P(A ∩ B) = P(A)P(B) for each pair A, B in the collection)
Independent ⇒ Pairwise Independent, but pairwise independence is strictly weaker
Example:
X1, X2, X3 independent with P(Xi = 0) = P(Xi = 1) = 1/2; A1 = {X2 = X3}, A2 = {X1 = X3}, and A3 = {X1 = X2}
Each pair is independent: P(Ai) = 1/2 and P(Ai ∩ Aj) = P(X1 = X2 = X3) = 1/4 = P(Ai)P(Aj) for i ≠ j. But A1 ∩ A2 = A1 ∩ A2 ∩ A3, so P(A1 ∩ A2 ∩ A3) = 1/4, while adding A3 changes the RHS of the independence equation to P(A1)P(A2)P(A3) = 1/8.
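The example can be verified by brute force. The sketch below enumerates the eight equally likely outcomes of three fair bits, which assumes the Xi are independent; the helper `prob` and the event names are illustrative.

```python
from fractions import Fraction
from itertools import product

# All equally likely outcomes (x1, x2, x3) of three independent fair bits.
outcomes = list(product([0, 1], repeat=3))

def prob(event) -> Fraction:
    """Exact probability of an event given as a predicate on outcomes."""
    return Fraction(sum(1 for w in outcomes if event(w)), len(outcomes))

A1 = lambda w: w[1] == w[2]   # {X2 = X3}
A2 = lambda w: w[0] == w[2]   # {X1 = X3}
A3 = lambda w: w[0] == w[1]   # {X1 = X2}

pair = prob(lambda w: A1(w) and A2(w))
triple = prob(lambda w: A1(w) and A2(w) and A3(w))

if __name__ == "__main__":
    print(pair, prob(A1) * prob(A2))                 # pairwise: equal
    print(triple, prob(A1) * prob(A2) * prob(A3))    # all three: not equal
```

Using `Fraction` keeps the comparison exact, so the failure of the product rule for the triple is not a rounding artifact.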
64 π-system, λ-system, relationship to σ-fields
π-system - closed under intersection
λ-system - Ω ∈ L, closed under countable unions of increasing sets, and closed under set subtraction (A ⊆ B ⇒ B ∩ A^c ∈ L)
a collection is a σ-algebra ⇐⇒ it is both a π-system and a λ-system, so the π and λ conditions split up the σ-algebra properties
65 Dynkin’s π-λ Theorem
π-λ Theorem: If P is a π-system and L is a λ-system with P ⊆ L, then σ(P) ⊆ L.
π-system - closed under intersection
λ-system - Ω ∈ L, closed under countable unions of increasing sets, and closed under set subtraction (A ⊆ B ⇒ B ∩ A^c ∈ L)
66 distribution of collections of independent variables
Y = (X1, ..., Xn) for independent random variables Xi, each with distribution μi(Ai) = P(Xi ∈ Ai) (so Y is a random vector); then the distribution of Y is the product measure μ = μ1 × · · · × μn, where $\mu(A_1 \times \cdots \times A_n) = \prod_{i=1}^{n} \mu_i(A_i)$.
78 Weak Law of Large Numbers
Weak Law of Large Numbers: Let X1, X2, ... be i.i.d. with finite variance (can weaken to E|Xi| < ∞). Let Sn = X1 + X2 + · · · + Xn and μ = EX1. Then Sn/n → μ in probability.
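In the finite-variance case the usual proof sketch is Chebyshev’s inequality: P(|Sn/n − μ| > ε) ≤ var(X1)/(nε²) → 0. The sketch below compares that bound with the exact deviation probability, assuming Bernoulli(p) summands so that Sn is Binomial(n, p); the function names and the parameter choices are illustrative.

```python
import math

def deviation_prob(n: int, p: float, eps: float) -> float:
    """Exact P(|S_n/n - p| > eps) for S_n ~ Binomial(n, p)."""
    return sum(
        math.comb(n, k) * p**k * (1 - p) ** (n - k)
        for k in range(n + 1)
        if abs(k / n - p) > eps
    )

def chebyshev_bound(n: int, p: float, eps: float) -> float:
    """var(X_1) / (n * eps^2) with var(X_1) = p(1 - p)."""
    return p * (1 - p) / (n * eps**2)

if __name__ == "__main__":
    for n in (10, 100, 1000):
        print(n, deviation_prob(n, 0.5, 0.1), chebyshev_bound(n, 0.5, 0.1))
```

The exact probability shrinks much faster than the Chebyshev bound, which is all the weak law needs: any bound tending to 0 gives convergence in probability.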
79 Borel-Cantelli Lemma
Borel-Cantelli Lemma:
$\sum_{n=1}^{\infty} P(A_n) < \infty \Rightarrow P(A_n \text{ i.o.}) = P(\limsup_{n \to \infty} A_n) = P\left( \lim_{n \to \infty} \cup_{m=n}^{\infty} A_m \right) = 0$
80 Second Borel-Cantelli Lemma
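Second Borel-Cantelli Lemma: If the An are independent and $\sum_{n=1}^{\infty} P(A_n) = \infty$ then
$P(\{\omega : \omega \in A_n \text{ infinitely often}\}) = P(\limsup_{n \to \infty} A_n) = P\left( \lim_{n \to \infty} \cup_{m=n}^{\infty} A_m \right) = P(A_n \text{ i.o.}) = 1$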
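The key step in the second lemma is that, for independent events, P(no Am occurs for n ≤ m ≤ N) = ∏(1 − P(Am)), and this product tends to 0 as N → ∞ exactly when ∑ P(Am) = ∞. A minimal sketch, assuming the hypothetical choices P(Am) = 1/m (divergent sum) and P(Am) = 1/m² (convergent sum):

```python
from fractions import Fraction

def prob_none_occur(p, n: int, N: int) -> Fraction:
    """P(no A_m occurs for n <= m <= N), assuming independent A_m with P(A_m) = p(m)."""
    result = Fraction(1)
    for m in range(n, N + 1):
        result *= 1 - p(m)
    return result

divergent = lambda m: Fraction(1, m)       # sum of P(A_m) is infinite
summable = lambda m: Fraction(1, m * m)    # sum of P(A_m) is finite

if __name__ == "__main__":
    for N in (10, 100, 1000):
        print(N, float(prob_none_occur(divergent, 2, N)),
              float(prob_none_occur(summable, 2, N)))
```

With P(Am) = 1/m the product telescopes to 1/N, so with probability 1 some Am with m ≥ n occurs, for every n. With P(Am) = 1/m² the product equals (N + 1)/(2N) and stays near 1/2, consistent with the first lemma: only finitely many Am occur.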
81 Strong Law of Large Numbers
SLLN: Let X1, X2, ... be i.i.d. with EXi = μ and $E X_i^4 < \infty$ (can weaken to E|Xi| < ∞). Then
$\frac{S_n}{n} = \frac{X_1 + \cdots + X_n}{n} \xrightarrow{\text{a.s.}} \mu$
Generalizations: Let X1, X2, ... be pairwise independent and identically distributed with EXi = μ and E|Xi| < ∞. Then
$\frac{S_n}{n} = \frac{X_1 + \cdots + X_n}{n} \xrightarrow{\text{a.s.}} \mu$
Let X1, X2, ... be i.i.d. with $E X_i^+ = \infty$ and $E X_i^- < \infty$ (hence EXi = ∞). Then
$\frac{S_n}{n} = \frac{X_1 + \cdots + X_n}{n} \xrightarrow{\text{a.s.}} E X_i = \infty$
82 tail σ-field
the tail σ-field T only depends on tail behavior, i.e. changing finitely many of the Xn does not affect whether an event in T occurs
Formally, Fn = σ(Xn, Xn+1, ...) and T = ∩nFn
Examples: {limn→∞ Sn exists} ∈ T; if the Bn are Borel sets, then {Xn ∈ Bn i.o.} ∈ T
83 Kolmogorov’s 0-1 Law
If X1, X2, ... are independent, then A ∈ T implies P(A) ∈ {0, 1} (A occurs almost surely or almost never)
84 exchangeable σ-field and Hewitt-Savage 0-1 Law
exchangeable σ-field E: the events that are invariant under finite permutations of the variables X1, X2, ... (contains the tail σ-field)
Hewitt-Savage 0-1 Law: If X1, X2, ... are i.i.d. then A ∈ E implies P(A) ∈ {0, 1}
85 Kolmogorov’s Maximal Inequality
X1, X2, ..., Xn independent with EXi = 0 and var(Xi) < ∞, Sn = X1 + · · · + Xn as usual
$P\left( \max_{1 \le k \le n} |S_k| \ge x \right) \le x^{-2} \operatorname{var}(S_n)$
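The inequality has the same right-hand side as Chebyshev’s bound on P(|Sn| ≥ x), but it controls the whole path up to time n. It can be checked exactly on a small case; the sketch below assumes a simple symmetric random walk (steps ±1 with probability 1/2, so var(Sn) = n) and enumerates all 2^n paths. The function names are illustrative.

```python
from fractions import Fraction
from itertools import product

def max_exceed_prob(n: int, x: int) -> Fraction:
    """Exact P(max_{1<=k<=n} |S_k| >= x) for a simple symmetric random walk."""
    hits = 0
    for steps in product([-1, 1], repeat=n):
        s, running_max = 0, 0
        for step in steps:
            s += step
            running_max = max(running_max, abs(s))
        if running_max >= x:
            hits += 1
    return Fraction(hits, 2**n)

def kolmogorov_bound(n: int, x: int) -> Fraction:
    """x^{-2} var(S_n), using var(S_n) = n for mean-0, variance-1 steps."""
    return Fraction(n, x * x)

if __name__ == "__main__":
    for x in (3, 4, 5, 6):
        print(x, max_exceed_prob(12, x), kolmogorov_bound(12, x))
```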
86 General Idea of Kolmogorov’s Three Series Theorem
gives three conditions on independent Xi and their truncations that together are necessary and sufficient for the series ∑ Xi to converge a.s.
Chapter 3 - Central Limit Theorems
87 Stirling’s Formula
$n! \sim n^n e^{-n} \sqrt{2 \pi n}$
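Here ∼ means the ratio of the two sides tends to 1. A small numerical sketch of that ratio, computed on the log scale via `math.lgamma` to avoid overflow for large n (the function name `stirling_ratio` is illustrative):

```python
import math

def stirling_ratio(n: int) -> float:
    """n! divided by n^n e^{-n} sqrt(2 pi n), computed via logarithms."""
    log_factorial = math.lgamma(n + 1)  # log(n!)
    log_stirling = n * math.log(n) - n + 0.5 * math.log(2 * math.pi * n)
    return math.exp(log_factorial - log_stirling)

if __name__ == "__main__":
    for n in (1, 10, 100, 1000):
        print(n, stirling_ratio(n))
```

The ratio is slightly above 1 and decreases toward 1 as n grows.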
88 weak convergence
Fn ⇒ F∞ when limn Fn(y) = F∞(y) for all points y where F∞ is continuous.
Equivalently, Xn ⇒ X∞ or μn ⇒ μ∞ where μi is the probability measure for Xi.
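The restriction to continuity points is needed even in the simplest case. A minimal sketch, assuming the constant random variables Xn = 1/n with limit X∞ = 0 (a standard illustration, not taken from the source):

```python
def F_n(y: float, n: int) -> float:
    """Distribution function of the constant random variable X_n = 1/n."""
    return 1.0 if y >= 1.0 / n else 0.0

def F_inf(y: float) -> float:
    """Distribution function of the limit X = 0."""
    return 1.0 if y >= 0 else 0.0

if __name__ == "__main__":
    for y in (-0.01, 0.0, 0.01):
        print(y, [F_n(y, n) for n in (1, 10, 100, 1000)], F_inf(y))
```

At every y ≠ 0 the values F_n(y) eventually equal F_inf(y), but at the jump point y = 0 we have F_n(0) = 0 for all n while F_inf(0) = 1. Requiring convergence at all points would wrongly rule out Xn ⇒ 0.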
89 (equivalent) properties of weak convergence
Xn ⇒ X∞ if and only if Eg(Xn) → Eg(X∞) for all bounded continuous functions g.
Xn ⇒ X∞ if and only if there exist random variables Yn, 1 ≤ n ≤ ∞, with each Yn having the same distribution as Xn, such that Yn → Y∞ a.s.
Also some equivalent conditions in terms of open/closed/Borel sets
90 Helly’s Selection Theorem/vague convergence
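Given a sequence Fn of distribution functions, there is a subsequence Fn(k) and a right-continuous, nondecreasing function G such that Fn(k)(y) → G(y) at all continuity points y of G (vague convergence). G need not be a distribution function: its limits at −∞ and +∞ need not be 0 and 1.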
91 tight
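the sequence Fn is tight if for every ε > 0 there is an Mε so that
$\limsup_{n \to \infty} \left[ 1 - F_n(M_\varepsilon) + F_n(-M_\varepsilon) \right] \le \varepsilon \iff 1 - \varepsilon \le \liminf_{n \to \infty} \left[ F_n(M_\varepsilon) - F_n(-M_\varepsilon) \right]$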
92 tightness criteria
Every subsequential (vague) limit G of the Fn is a distribution function if and only if the sequence Fn is tight.