Probability Theory Study Guide, Study notes of Probability and Statistics

A study guide for Probability Theory covering preliminaries, central limit theorems, conditioning, martingales, and Markov chains. It includes key terms and references Durrett's Probability: Theory and Examples, 5th Ed., and Probability with Martingales by David Williams. The guide also covers probability spaces, distributions, and random variables, and is likely useful as study notes or a summary for university students studying Probability Theory.

Typology: Study notes

2022/2023

Uploaded on 05/11/2023

UC Berkeley Qualifying Exam
Anya Michaelsen, October 2021
Probability Theory Study Guide
Major topic: Probability Theory (Probability)
References: Durrett, Probability: Theory and Examples 5th Ed., Chapters 1–5
  • Preliminaries: σ-algebras, Dynkin’s π-λ theorem, independence, Borel–Cantelli lemmas, Kolmogorov’s 0-1 law, Kolmogorov’s maximal inequality, strong and weak laws of large numbers
  • Central limit theorems: weak convergence, characteristic functions, tightness, i.i.d. central limit theorem, Lindeberg–Feller central limit theorem
  • Conditioning: conditional probability and expectation, regular conditional probabilities
  • Martingales: stopping times, upcrossing inequality, uniform integrability, a.s. convergence, Doob’s decomposition, Doob’s inequality, $L^p$ convergence, $L^1$ convergence, reverse martingale convergence, optional stopping theorem, Wald’s identity
  • Markov chains: countable state space, stationary measures, convergence theorems, recurrence and transience, asymptotic behavior
Additional References:
  • Probability with Martingales by David Williams


Contents

28 What is the Weak Law of Large Numbers? Sketch a proof. What if we don’t have finite variance?

2.3 Borel-Cantelli Lemmas
29 What are the Borel-Cantelli Lemmas? How do they relate?
30 What is the second Borel-Cantelli Lemma? What happens if we remove the independence condition?
31 Assume $X_k \to X$ in probability and $g$ is a continuous function. Is it true that $g(X_k) \to g(X)$?
32 Use the Borel-Cantelli Lemmas to construct a sequence of random variables that converges in probability but not almost surely.

2.4 Strong Law of Large Numbers
33 State the Strong Law of Large Numbers. Sketch a proof (you may assume $E X_i^4 < \infty$). Can we weaken the assumptions? What happens if $E|X_i| = \infty$?
34 Let $X_1, X_2, \dots$ be i.i.d. and non-negative with $E X_i = \infty$. What can we say about $S_n/n$?

2.5 Convergence of Random Series
35 Consider a sequence of i.i.d. variables $X_1, X_2, \dots$. How can we express $X_n \to 0$ a.s. in terms of a convergence of something in probability?
36 What is Kolmogorov’s 0-1 Law? What is the definition of a tail σ-algebra? What about tail random variables?
37 State Kolmogorov’s Maximal Inequality. How does it compare to Chebyshev’s Inequality?

Chapter 3 - Central Limit Theorems

3.1 The De Moivre-Laplace Theorem
38 Give a concrete example of the Central Limit Theorem. How could you prove this directly?

3.2 Weak Convergence
39 What is weak convergence? How does it relate to convergence in probability and a.s. convergence?
40 What is an example of a r.v. that converges weakly but not in probability?
41 Why do we only get convergence at continuity points for weak convergence?

3.3 Characteristic Functions
42 What is the significance of the inversion formula for characteristic functions?
43 Give an example where the characteristic functions $\varphi_n$ of $X_n$ converge to $\varphi(t)$ discontinuous at $t = 0$. What is the limit of the distribution function of the $X_n$?
44 Suppose you have $X_1, X_2, \dots$ and corresponding characteristic functions $\varphi_1, \varphi_2, \dots$ converging point-wise to $\varphi(t)$. What can you say? What is tightness? How do continuity and tightness relate?

3.4 Central Limit Theorems
45 State the i.i.d. central limit theorem. Prove it (possibly with added assumptions).
46 The type of convergence in the Central Limit Theorem is convergence in distribution. Why isn’t the convergence almost sure?
47 Suppose $X_1, X_2, \dots$ are bounded but $\sum_n \operatorname{var}(X_n) = \infty$, what can you say about the limiting behavior of $S_n$?
48 What is the Lindeberg-Feller Central Limit Theorem? How does it relate to the i.i.d. Central Limit Theorem?

3.6 Poisson Convergence
49 What is Poisson Convergence and why is it called the “law of rare events”?

50 State and prove the Central Limit Theorem. Can you state a version of the Central

66 Apply optional stopping to get a formula for the probability that the simple symmetric

Memorization (– key terms –)

Chapter 1 - Preliminaries

1.1 Probability Spaces

1 probability space, measurable space

probability space: (Ω, F, P) - Ω outcomes, F events, and P : F → [0, 1] assigns probabilities to events

measurable space: (Ω, F) - Ω outcomes, F events (no measure specified yet; adding a measure μ gives a measure space (Ω, F, μ))

2 σ-field/algebra, σ-field generated by A

σ-field: F a non-empty collection of subsets of Ω satisfying:

(i) $A \in \mathcal{F} \implies A^c \in \mathcal{F}$

(ii) if $A_i \in \mathcal{F}$ is a countable sequence, then $\cup_i A_i \in \mathcal{F}$

σ-field generated by A: smallest σ-field containing the collection A, denoted σ(A)
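On a finite Ω the generated σ-field can be computed by brute force, which may help make the "smallest σ-field containing A" definition concrete. The following is a minimal sketch, assuming a finite outcome set; the function name `generated_sigma_field` and the example sets are illustrative and not part of the guide.

```python
from itertools import combinations

def generated_sigma_field(omega, collection):
    """Smallest sigma-field on the finite set `omega` containing `collection`."""
    omega = frozenset(omega)
    sets = {frozenset(), omega} | {frozenset(a) for a in collection}
    changed = True
    while changed:
        changed = False
        current = list(sets)
        # Close under complements.
        for a in current:
            c = omega - a
            if c not in sets:
                sets.add(c)
                changed = True
        # Close under pairwise unions (sufficient on a finite space).
        for a, b in combinations(current, 2):
            u = a | b
            if u not in sets:
                sets.add(u)
                changed = True
    return sets

# Hypothetical example: Omega = {1, 2, 3, 4}, generators {1} and {1, 2}.
sigma = generated_sigma_field({1, 2, 3, 4}, [{1}, {1, 2}])
print(sorted(sorted(s) for s in sigma))
```

In this example the atoms are {1}, {2}, and {3, 4}, so the generated σ-field has 2^3 = 8 members and cannot separate 3 from 4.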

3 measure, probability measure

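measure: “non-negative countably additive set function”, i.e. $\mu : \mathcal{F} \to \mathbb{R}$ such that

(i) $\mu(A) \ge \mu(\emptyset) = 0$ for all $A \in \mathcal{F}$

(ii) if $A_i \in \mathcal{F}$ is a countable sequence of disjoint sets, then $\mu(\cup_i A_i) = \sum_i \mu(A_i)$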

probability measure: μ(Ω) = 1, usually denoted P

4 monotonicity, subadditivity

μ a measure on (Ω, F)

monotonicity: A ⊆ B =⇒ μ(A) ≤ μ(B)

subadditivity: $A \subset \cup_i A_i \implies \mu(A) \le \sum_i \mu(A_i)$

5 continuity from below/above

μ a measure on (Ω, F)

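if $A_i \uparrow A$ ($A_1 \subset A_2 \subset \cdots$ and $\cup_i A_i = A$) then $\mu(A_i) \uparrow \mu(A)$

if $A_i \downarrow A$ ($A_1 \supset A_2 \supset \cdots$ and $\cap_i A_i = A$) and $\mu(A_1) < \infty$, then $\mu(A_i) \downarrow \mu(A)$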

6 discrete probability spaces

Ω a countable set, F all subsets of Ω, and

$$P(A) = \sum_{\omega \in A} p(\omega), \quad \text{where } p(\omega) \ge 0 \text{ and } \sum_{\omega \in \Omega} p(\omega) = 1$$

[i.e. each ω gets assigned its own point probability and the probability of a set is the sum of its point probabilities]

Discrete uniform probability - Ω finite and p(ω) = 1/|Ω| for all ω ∈ Ω.
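A discrete uniform space is easy to write down explicitly. The sketch below assumes a fair six-sided die as the example (the die, the events `even` and `low`, and the helper `prob` are illustrative choices, not from the guide) and uses exact fractions so the identities hold without rounding.

```python
from fractions import Fraction

# Discrete uniform space: Omega = {1, ..., 6}, p(w) = 1/|Omega|.
omega = range(1, 7)
p = {w: Fraction(1, len(omega)) for w in omega}

def prob(event):
    """P(A) = sum of the point probabilities p(w) over w in A."""
    return sum((p[w] for w in event), Fraction(0))

even = {2, 4, 6}
low = {1, 2}

print(prob(omega))        # total mass
print(prob(even | low))   # P(A union B)
print(prob(even) + prob(low) - prob(even & low))  # inclusion-exclusion
```

The last two printed values agree, and monotonicity can be checked the same way by comparing `prob(low)` with `prob(even | low)`.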

7 Borel sets

the smallest σ-algebra containing the open sets in $\mathbb{R}^d$ (with the usual Euclidean topology)

8 Stieltjes measure function

A function $F : \mathbb{R} \to \mathbb{R}$ such that $F$ is (i) nondecreasing and (ii) right continuous ($\lim_{y \downarrow x} F(y) = F(x)$)

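9 Lebesgue measures on $\mathbb{R}$ and $\mathbb{R}^d$

$\mathbb{R}$: The unique measure on $(\mathbb{R}, \mathcal{R})$ such that $\mu((a, b]) = b - a$.

$\mathbb{R}^d$: The unique measure on $(\mathbb{R}^d, \mathcal{R}^d)$ such that $\mu(A)$ = volume of $A$ for all finite rectangles $A$.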

10 semi-algebra, algebra (field), algebra generated by S

semi-algebra: $\mathcal{S}$ such that (i) closed under finite intersection, (ii) $S \in \mathcal{S}$ implies $S^c$ is a finite disjoint union of sets in $\mathcal{S}$

algebra: $\mathcal{A}$ such that (i) closed under finite intersections, (ii) closed under complements (it follows that it is closed under finite unions)

algebra generated by $\mathcal{S}$: $\bar{\mathcal{S}}$, the collection of finite disjoint unions of sets in $\mathcal{S}$ (is an algebra)

11 measure on an algebra

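given an algebra $\mathcal{A}$, a measure on $\mathcal{A}$ is a set function $\mu : \mathcal{A} \to \mathbb{R}$ such that

(i) $\mu(A) \ge \mu(\emptyset) = 0$ for all $A \in \mathcal{A}$ and

(ii) if $A_i \in \mathcal{A}$ are disjoint and their union is in $\mathcal{A}$, then $\mu(\cup_i A_i) = \sum_i \mu(A_i)$.

12 σ-finite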

a measure μ on an algebra A is σ-finite if there is a sequence of sets An ∈ A such that μ(An) < ∞ for all n and ∪nAn = Ω (could also assume that An ↑ Ω or the An are disjoint)

13 countably generated σ-field/algebra

F, a σ-field is countably generated if there is a countable collection C ⊂ F such that σ(C) = F

1.2 Distributions

14 random variable (F-measurable)

a real valued function $X : \Omega \to \mathbb{R}$ such that for every Borel set $B \subset \mathbb{R}$, $X^{-1}(B) \in \mathcal{F}$, the specific σ-field on Ω (if specification needed, $X$ is $\mathcal{F}$-measurable)

15 indicator function of a set

example of a random variable, where $A \in \mathcal{F}$:

$$1_A(\omega) = \begin{cases} 1 & \omega \in A \\ 0 & \omega \notin A \end{cases}$$

16 distribution (function) of a random variable

When $X$ is a random variable on a probability space $(\Omega, \mathcal{F}, P)$ then its distribution is a probability measure, $\mu$, on $\mathbb{R}$ given by $\mu(A) = P(X \in A) = P(X^{-1}(A))$

the associated distribution function is given by $F(x) = P(X \le x) = P(X^{-1}((-\infty, x]))$

17 equal in distribution

two random variables whose resulting distributions (measures) on $\mathbb{R}$ are the same; this occurs exactly when they have the same distribution function; denoted by $X \overset{d}{=} Y$

28 extended real line

$\mathbb{R}^* = [-\infty, \infty]$ with Borel sets generated by $[-\infty, a)$, $(a, b)$, $(b, \infty]$

1.4 Integration

29 simple function

$\varphi = \sum_{i=0}^{n} a_i 1_{A_i}$ where the $A_i$ are disjoint sets with $\mu(A_i) < \infty$

30 integration of simple functions

$$\int \varphi \, d\mu = \int \sum_{i=0}^{n} a_i 1_{A_i} \, d\mu = \sum_{i=0}^{n} a_i \mu(A_i)$$

31 φ ≥ ψ almost everywhere

φ ≥ ψ almost everywhere ⇐⇒ μ({ω : φ(ω) < ψ(ω)}) = 0

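32 integral of bounded functions

$$\int f \, d\mu = \sup_{\varphi \le f} \int \varphi \, d\mu = \inf_{\psi \ge f} \int \psi \, d\mu \quad (\varphi, \psi \text{ simple})$$

33 integral of non-negative functions

$$\int f \, d\mu = \sup\left\{ \int h \, d\mu : 0 \le h \le f,\ h \text{ bounded},\ \mu(\{x : h(x) > 0\}) < \infty \right\}$$

34 integrable functions

$\int |f| \, d\mu < \infty$ (well defined since $|f|$ is a non-negative function)

35 integral of (integrable) functions

$f^+ = \max(f, 0)$ and $f^- = \max(-f, 0)$

so that $f = f^+ - f^-$ and $f^+, f^-$ are both non-negative functions

$$\int f \, d\mu = \int f^+ \, d\mu - \int f^- \, d\mu$$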

36 basic properties of integrals

If f, g are both integrable/non-negative/bounded/simple:

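(i) If $f \ge 0$ a.e. then $\int f \, d\mu \ge 0$

(ii) For all $a \in \mathbb{R}$, $\int a f \, d\mu = a \int f \, d\mu$

(iii) $\int (f + g) \, d\mu = \int f \, d\mu + \int g \, d\mu$

(iv) If $g \le f$ a.e. then $\int g \, d\mu \le \int f \, d\mu$

(v) If $g = f$ a.e. then $\int g \, d\mu = \int f \, d\mu$

(vi) $\left| \int f \, d\mu \right| \le \int |f| \, d\mu$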

1.5 Properties of the Integral

37 Jensen’s Inequality (integral)

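If $\varphi$ is convex (technical: $\lambda \varphi(x) + (1 - \lambda)\varphi(y) \ge \varphi(\lambda x + (1 - \lambda) y)$, $\lambda \in (0, 1)$), $\mu$ is a probability measure, and $f$ and $\varphi(f)$ are integrable, then

$$\varphi\left( \int f \, d\mu \right) \le \int \varphi(f) \, d\mu$$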

38 ||f ||p

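$$\|f\|_p = \left( \int |f|^p \, d\mu \right)^{1/p} \quad \text{for } 1 \le p < \infty$$

39 Hölder’s Inequality (integral)

If $p, q \in (1, \infty)$ and $\frac{1}{p} + \frac{1}{q} = 1$ then

$$\int |fg| \, d\mu \le \|f\|_p \|g\|_q$$

40 Cauchy-Schwarz Inequality

$$\int |fg| \, d\mu \le \|f\|_2 \|g\|_2 = \sqrt{\int f^2 \, d\mu} \, \sqrt{\int g^2 \, d\mu}$$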
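Both inequalities can be sanity-checked numerically on a small weighted counting measure. The sketch below assumes a three-point space with hypothetical weights and functions (`weights`, `f`, `g`, and the helper `lp_norm` are illustrative, not from the guide); it checks one instance rather than proving anything.

```python
def lp_norm(values, weights, p):
    """||f||_p = (integral of |f|^p d mu)^(1/p) for a weighted counting measure."""
    return sum(w * abs(v) ** p for v, w in zip(values, weights)) ** (1 / p)

# Hypothetical measure on three points and two functions on it.
weights = [0.5, 1.0, 2.0]
f = [1.0, -2.0, 3.0]
g = [4.0, 0.5, -1.0]

# Left-hand side: integral of |f g| d mu.
lhs = sum(w * abs(a * b) for a, b, w in zip(f, g, weights))

# Hoelder with conjugate exponents p = 3, q = 3/2 (1/3 + 2/3 = 1).
holder = lp_norm(f, weights, 3) * lp_norm(g, weights, 1.5)
# Cauchy-Schwarz is the case p = q = 2.
cauchy = lp_norm(f, weights, 2) * lp_norm(g, weights, 2)

print(lhs, holder, cauchy)
```

Here the left-hand side is 9, and both right-hand sides come out strictly larger because `f` and `g` are not proportional in the relevant sense.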

41 Bounded Convergence Theorem

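$E$ a set of finite measure (‘bounded’: $\mu(E) < \infty$) and $f_n$ supported on $E$ (vanishes on $E^c$)

$f_n$ uniformly bounded (i.e. $|f_n| \le M$ for some $M$ and all $n$)

$f_n \to f$ in measure (for every $\varepsilon > 0$, $\mu(\{\omega : |f_n(\omega) - f(\omega)| > \varepsilon\}) \to 0$ as $n \to \infty$)

Then

$$\int f \, d\mu = \lim_{n \to \infty} \int f_n \, d\mu$$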

42 Fatou’s Lemma

$$f_n \ge 0 \implies \liminf_{n \to \infty} \int f_n \, d\mu \ge \int \liminf_{n \to \infty} f_n \, d\mu$$
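A standard example shows that the inequality in Fatou's Lemma can be strict. The setting below (Lebesgue measure $\mu$ on $(0,1)$ and the particular sequence $f_n$) is an assumed illustration and does not appear in the guide.

```latex
% Assumed setting: \mu = Lebesgue measure on (0,1).
\[
\begin{aligned}
f_n &= n \, 1_{(0,\,1/n)}, \qquad f_n(x) \to 0 \ \text{for every } x \in (0,1),
\qquad \int f_n \, d\mu = n \cdot \tfrac{1}{n} = 1 \ \text{for all } n, \\
\liminf_{n\to\infty} \int f_n \, d\mu &= 1 \;>\; 0 = \int \liminf_{n\to\infty} f_n \, d\mu .
\end{aligned}
\]
```

The same sequence is not dominated by any integrable function, since $\sup_n f_n(x) \ge 1/(2x)$ on $(0,1)$, which is not integrable. This is consistent with the integrals failing to converge to the integral of the pointwise limit.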

43 Monotone Convergence Theorem

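$$f_n \ge 0 \text{ and } f_n \uparrow f \implies \int f_n \, d\mu \uparrow \int f \, d\mu$$

44 Dominated Convergence Theorem

If $f_n \to f$ a.e., $|f_n| \le g$ for all $n$ where $g$ is integrable, then

$$\int f_n \, d\mu \to \int f \, d\mu$$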

55 computing EX integrals (change of variable formula)

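$X$ a random element of $(S, \mathcal{S})$, defined on a probability space with measure $P$, with distribution $\mu(A) = P(X \in A) = P(X^{-1}(A))$

If $f$ is a measurable function $(S, \mathcal{S}) \to (\mathbb{R}, \mathcal{R})$ with $f \ge 0$ or $E|f(X)| < \infty$ then

$$E f(X) = \int_S f(y) \, \mu(dy) = \int f \, d\mu$$

If $X$ has density $g$, i.e. distribution function $F(x) = \int_{-\infty}^{x} g(y) \, dy$, then

$$E f(X) = \int_{-\infty}^{\infty} f(x) g(x) \, dx$$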
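In a discrete setting the change of variables formula says that $E f(X)$ can be computed either by summing over outcomes in Ω or by summing over values of $X$ against its distribution $\mu$. The sketch below assumes two fair dice, $X$ = their sum, and $f(y) = y^2$; all of these are illustrative choices, not from the guide.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import product

# Assumed example: Omega = pairs of fair dice, uniform P.
omega = list(product(range(1, 7), repeat=2))
P = Fraction(1, len(omega))

def X(w):
    return w[0] + w[1]  # the random variable: sum of the two dice

def f(y):
    return y * y

# E f(X) computed on the original space: sum over outcomes w in Omega.
on_omega = sum(f(X(w)) * P for w in omega)

# Distribution of X: mu({y}) = P(X = y).
mu = defaultdict(Fraction)
for w in omega:
    mu[X(w)] += P

# E f(X) computed against the distribution: sum over values y of f(y) mu({y}).
on_values = sum(f(y) * m for y, m in mu.items())

print(on_omega, on_values)
```

Both computations give 329/6, which matches $\operatorname{var}(X) + (EX)^2 = 35/6 + 49$.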

56 Bernoulli Distribution

this is a discrete distribution

p some parameter, P (X = 1) = p and P (X = 0) = 1 − p

57 Poisson Distribution

this is a discrete distribution

$\lambda$ some parameter, $P(X = k) = e^{-\lambda} \lambda^k / k!$ for $k = 0, 1, 2, \dots$

58 Other formulas for Expected Values

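If $X \ge 0$ takes integer values, $EX = \sum_{i=1}^{\infty} P(X \ge i)$; for general $X \ge 0$, $EX = \int_0^{\infty} P(X \ge x) \, dx$ (useful when there is a nice formula for $P(X \ge x)$)

so for example $E|X| = \int_0^{\infty} P(|X| > x) \, dx$ (can be derived by Fubini’s Theorem)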
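For a non-negative integer-valued variable, the tail-sum identity in the form $EX = \sum_{i \ge 1} P(X \ge i)$ can be checked exactly. The sketch below assumes a fair die as the example distribution (`pmf` and `tail` are illustrative names) and also shows what happens if the sum of $P(X \ge i)$ is started at $i = 0$ instead.

```python
from fractions import Fraction

# Assumed example: X uniform on {1, ..., 6}.
pmf = {k: Fraction(1, 6) for k in range(1, 7)}

mean = sum(k * q for k, q in pmf.items())

def tail(i):
    """P(X >= i)."""
    return sum(q for k, q in pmf.items() if k >= i)

# Sum of P(X >= i) over i = 1, 2, ..., max value (later terms vanish).
tail_sum = sum(tail(i) for i in range(1, max(pmf) + 1))

# Starting at i = 0 adds the extra term P(X >= 0) = 1.
with_zero_term = tail(0) + tail_sum

print(mean, tail_sum, with_zero_term)
```

The first two values agree (7/2), while including the $i = 0$ term overshoots by exactly 1, so the starting index matters when the inequality is non-strict.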

1.7 Product Measures, Fubini’s Theorem

59 product measure

Take $(X, \mathcal{A}, \mu_1)$ and $(Y, \mathcal{B}, \mu_2)$ with σ-finite measures, then $\mu = \mu_1 \times \mu_2$ is the unique measure on $X \times Y$ such that $\mu(A \times B) = \mu_1(A)\mu_2(B)$

60 Fubini’s Theorem

Fubini’s gives conditions for switching multiple integrals using product spaces

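Fubini’s Theorem: Let $\mu_1, \mu_2$ be σ-finite with $\mu = \mu_1 \times \mu_2$. If $f \ge 0$ or $\int |f| \, d\mu < \infty$ then

$$\int_X \int_Y f \, d\mu_2 \, d\mu_1 = \int_{X \times Y} f \, d\mu = \int_Y \int_X f \, d\mu_1 \, d\mu_2$$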

Typical application to summation/sum+integral combinations

Chapter 2 - Law of Large Numbers

2.1 Independence

61 independence of σ-fields, random variables

independence for σ-fields: (finite version) $\mathcal{F}_1, \dots, \mathcal{F}_n$ σ-fields (all contained in some larger σ-field with $P$ a probability measure) are independent if for any choice of $A_i \in \mathcal{F}_i$ for all $i = 1, \dots, n$

$$P(\cap_i A_i) = \prod_i P(A_i)$$

(infinite collections are independent if all finite sub-collections are independent)

Independence for random variables: $X_1, \dots, X_n$ random variables from $(\Omega, \mathcal{F}, P) \to (\mathbb{R}, \mathcal{R})$ are independent if the $\sigma(X_i)$ are all independent, which is equivalent to

$$P(X_1 \in C_1, \dots, X_n \in C_n) = P(\cap_i \{X_i \in C_i\}) = \prod_i P(X_i \in C_i)$$

for any collection of $C_i \in \mathcal{R}$

62 Independence of events and arbitrary collections of events

Most simply, P (A ∩ B) = P (A)P (B)

Independence of Events: Generally, $A_1, \dots, A_n$ are independent events in $(\Omega, \mathcal{F}, P)$ if for any sub-collection of sets (i.e. $I \subseteq \{1, 2, \dots, n\}$)

$$P(\cap_{i \in I} A_i) = \prod_{i \in I} P(A_i)$$

Independence of Collections of Sets: Given $\mathcal{A}_1, \dots, \mathcal{A}_n$ collections of sets, these are independent if for any sub-collection $I \subseteq \{1, \dots, n\}$ and any choice of $A_i \in \mathcal{A}_i$, $i \in I$, the events $A_i$ are independent. Can always assume $\Omega \in \mathcal{A}_i$ and take the full collection every time.

63 Pairwise Independent, how it differs from Independent

Any pairs are independent ($P(A \cap B) = P(A)P(B)$)

Independent $\implies$ Pairwise Independent, but pairwise is strictly weaker

Example:

$X_1, X_2, X_3$ independent with $P(X_i = 0) = P(X_i = 1) = 1/2$; $A_1 = \{X_2 = X_3\}$, $A_2 = \{X_1 = X_3\}$, and $A_3 = \{X_1 = X_2\}$

Each pair is independent: $P(A_i) = 1/2$ and $P(A_i \cap A_j) = 1/4$ for $i \ne j$. But $A_1 \cap A_2 = A_1 \cap A_2 \cap A_3$, so $P(A_1 \cap A_2 \cap A_3) = 1/4 \ne 1/8 = P(A_1)P(A_2)P(A_3)$: adding $A_3$ changes the RHS of the independence equation but not the LHS.
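The example above can be verified by enumerating the eight equally likely outcomes of three fair coin flips. This is a minimal sketch; it assumes the three $X_i$ are independent fair bits, and the helper name `prob` is illustrative.

```python
from fractions import Fraction
from itertools import product

# All outcomes (x1, x2, x3) of three independent fair bits, equally likely.
outcomes = list(product([0, 1], repeat=3))

def prob(event):
    """Probability of the set of outcomes w for which event(w) is True."""
    return Fraction(sum(1 for w in outcomes if event(w)), len(outcomes))

A1 = lambda w: w[1] == w[2]   # {X2 = X3}
A2 = lambda w: w[0] == w[2]   # {X1 = X3}
A3 = lambda w: w[0] == w[1]   # {X1 = X2}

pair = prob(lambda w: A1(w) and A2(w))
triple = prob(lambda w: A1(w) and A2(w) and A3(w))

print(prob(A1), prob(A2), prob(A3))  # each 1/2
print(pair)     # equals P(A1) P(A2), so the pair is independent
print(triple)   # equals 1/4, not 1/8, so the triple is not independent
```

The other two pairs can be checked the same way by swapping the lambdas.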

64 π-system, λ-system, relationship to σ-fields

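π-system - closed under intersection

λ-system - $\Omega \in \mathcal{L}$, countable unions of increasing sets contained, and set subtraction contained ($A \subseteq B \Rightarrow B \cap A^c \in \mathcal{L}$)

a collection that is both a π-system and a λ-system is a σ-algebra (and conversely), so the π and λ conditions split up the σ-algebra properties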

65 Dynkin’s π-λ Theorem

π-λ Theorem: If P is a π-system and L is a λ-system with P ⊆ L then σ(P) ⊂ L.

π-system - closed under intersection

λ-system - $\Omega \in \mathcal{L}$, countable unions of increasing sets contained, and set subtraction contained ($A \subseteq B \Rightarrow B \cap A^c \in \mathcal{L}$)

66 distribution of collections of independent variables

$Y = (X_1, \dots, X_n)$ for independent random variables $X_i$ each with distribution $\mu_i(A_i) = P(X_i \in A_i)$ (so $Y$ is a random vector); then the distribution measure for $Y$ is $\mu = \mu_1 \times \cdots \times \mu_n$ where $\mu(A_1 \times \cdots \times A_n) = \prod_i \mu_i(A_i)$.

78 Weak Law of Large Numbers

Weak Law of Large Numbers: Let $X_1, X_2, \dots$ be i.i.d. with finite variance (can weaken to $E|X_i| < \infty$). Let $S_n = X_1 + X_2 + \cdots + X_n$ and $\mu = E X_1$. Then $S_n/n \to \mu$ in probability.
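For fair coin flips the deviation probability $P(|S_n/n - 1/2| \ge \varepsilon)$ can be computed exactly from the binomial distribution and compared with the Chebyshev bound $\operatorname{var}(X_1)/(n\varepsilon^2)$ that drives the finite-variance proof. The sketch below assumes Bernoulli(1/2) variables and $\varepsilon = 1/10$; the function names are illustrative.

```python
from fractions import Fraction
from math import comb

def deviation_prob(n, eps):
    """Exact P(|S_n/n - 1/2| >= eps) for S_n ~ Binomial(n, 1/2)."""
    half = Fraction(1, 2)
    count = sum(comb(n, k) for k in range(n + 1)
                if abs(Fraction(k, n) - half) >= eps)
    return Fraction(count, 2 ** n)

def chebyshev_bound(n, eps):
    """var(X_1) / (n eps^2) with var(X_1) = 1/4 for a fair coin."""
    return Fraction(1, 4) / (n * eps ** 2)

eps = Fraction(1, 10)
probs = [deviation_prob(n, eps) for n in (10, 100, 1000)]
bounds = [chebyshev_bound(n, eps) for n in (10, 100, 1000)]

for n, p, b in zip((10, 100, 1000), probs, bounds):
    print(n, float(p), float(b))
```

The exact probabilities shrink toward 0 as $n$ grows and stay below the Chebyshev bound, which is itself far from tight for large $n$.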

2.3 Borel-Cantelli Lemmas

79 Borel-Cantelli Lemma

Borel-Cantelli Lemma:

$$\sum_{n=1}^{\infty} P(A_n) < \infty \implies P(A_n \text{ i.o.}) = P\left(\limsup_{n \to \infty} A_n\right) = P\left(\lim_{n \to \infty} \cup_{m=n}^{\infty} A_m\right) = 0$$

80 Second Borel-Cantelli Lemma

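Second Borel-Cantelli Lemma: If the $A_n$ are independent and $\sum_n P(A_n) = \infty$ then

$$P(\{x : x \in A_n \text{ infinitely often}\}) = P\left(\limsup_{n \to \infty} A_n\right) = P\left(\lim_{n \to \infty} \cup_{m=n}^{\infty} A_m\right) = P(A_n \text{ infinitely often}) = 1$$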
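For independent events, the probability that none of $A_n, \dots, A_N$ occurs is $\prod_{m=n}^{N} (1 - P(A_m))$, and the two lemmas correspond to whether this product can stay positive as $N \to \infty$. The sketch below assumes independent events with $P(A_m) = 1/m$ (divergent sum) or $P(A_m) = 1/m^2$ (convergent sum); these choices and the function names are illustrative.

```python
from fractions import Fraction

def prob_none_occur(p, n, N):
    """P(no A_m occurs for n <= m <= N), assuming the A_m are independent."""
    result = Fraction(1)
    for m in range(n, N + 1):
        result *= 1 - p(m)
    return result

def harmonic(m):
    return Fraction(1, m)        # sum over m diverges

def square(m):
    return Fraction(1, m * m)    # sum over m converges

# Divergent case: the product telescopes to (n - 1) / N, which tends to 0.
print(prob_none_occur(harmonic, 2, 1000))
# Convergent case: the product equals ((n - 1) / n) * ((N + 1) / N),
# which stays bounded away from 0.
print(prob_none_occur(square, 2, 1000))
```

In the divergent case the probability of "no more occurrences after time $n$" is 0 for every $n$, matching $P(A_n \text{ i.o.}) = 1$; in the convergent case it is positive, consistent with only finitely many events occurring.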

2.4 Strong Law of Large Numbers

81 Strong Law of Large Numbers

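SLLN: Let $X_1, X_2, \dots$ be i.i.d. with $E X_i = \mu$ and $E X_i^4 < \infty$ (can weaken to $E|X_i| < \infty$). Then

$$\frac{S_n}{n} = \frac{X_1 + \cdots + X_n}{n} \xrightarrow{\text{a.s.}} \mu$$

Generalizations: Let $X_1, X_2, \dots$ be pairwise independent and identically distributed with $E X_i = \mu$ and $E|X_i| < \infty$. Then

$$\frac{S_n}{n} = \frac{X_1 + \cdots + X_n}{n} \xrightarrow{\text{a.s.}} \mu$$

Let $X_1, X_2, \dots$ be i.i.d. with $E X_i^+ = \infty$ and $E X_i^- < \infty$ (hence $E X_i = \infty$). Then

$$\frac{S_n}{n} = \frac{X_1 + \cdots + X_n}{n} \xrightarrow{\text{a.s.}} E X_i = \infty$$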

2.5 Convergence of Random Series

82 tail σ-field

T only depends on tail behavior, i.e. changing finitely many values does not affect it

Formally, $\mathcal{F}_n = \sigma(X_n, X_{n+1}, \dots)$ and $\mathcal{T} = \cap_n \mathcal{F}_n$

Examples: {limn→∞ Sn exists} ∈ T , Bn ∈ R then {Xn ∈ Bn i.o.} ∈ T

83 Kolmogorov’s 0-1 Law

X 1 , X 2 ,... are independent, then A ∈ T implies P (A) ∈ { 0 , 1 } (almost always or almost never)

84 exchangeable σ-field and Hewitt-Savage 0-1 Law

exchangeable σ-field $\mathcal{E}$: the events that are invariant under finite permutations of the variables $X_1, X_2, \dots$ (contains the tail σ-field)

Hewitt-Savage 0-1 Law: If X 1 , X 2 ,... i.i.d then A ∈ E implies P (A) ∈ { 0 , 1 }

85 Kolmogorov’s Maximal Inequality

$X_1, X_2, \dots, X_n$ independent with $E X_i = 0$ and $\operatorname{var}(X_i) < \infty$, $S_n = X_1 + \cdots + X_n$ as usual

$$P\left( \max_{1 \le k \le n} |S_k| \ge x \right) \le x^{-2} \operatorname{var}(S_n)$$
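Chebyshev's inequality gives the same bound $x^{-2}\operatorname{var}(S_n)$ for the endpoint probability $P(|S_n| \ge x)$ alone, so the maximal inequality controls the whole path at no extra cost. The sketch below checks this by enumerating all paths of a simple symmetric random walk; the walk, the values $n = 10$ and $x = 4$, and the function name are assumed for illustration.

```python
from fractions import Fraction
from itertools import product

def walk_probs(n, x):
    """Exact P(max_k |S_k| >= x) and P(|S_n| >= x) for +-1 fair steps."""
    hits_max = 0
    hits_end = 0
    for steps in product([-1, 1], repeat=n):
        s = 0
        running_max = 0
        for step in steps:
            s += step
            running_max = max(running_max, abs(s))
        hits_max += running_max >= x   # path ever reaches distance x
        hits_end += abs(s) >= x        # path ends at distance >= x
    total = 2 ** n
    return Fraction(hits_max, total), Fraction(hits_end, total)

n, x = 10, 4
p_max, p_end = walk_probs(n, x)
bound = Fraction(n, x * x)  # var(S_n) = n for +-1 steps

print(float(p_end), float(p_max), float(bound))
```

The output orders as endpoint probability, then maximal probability, then the common bound, which is the comparison with Chebyshev that the statement invites.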

86 General Idea of Kolmogorov’s Three Series Theorem

gives three conditions on the (independent) $X_i$ and their truncations that together are necessary and sufficient for the series $\sum_n X_n$ to converge a.s.

Chapter 3 - Central Limit Theorems

3.1 The De Moivre-Laplace Theorem

87 Stirling’s Formula

$$n! \sim n^n e^{-n} \sqrt{2 \pi n}$$
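The symbol $\sim$ means that the ratio of the two sides tends to 1. A quick numerical check of that ratio, as a sketch with illustrative values of $n$:

```python
import math

def stirling(n):
    """Stirling's approximation n^n e^{-n} sqrt(2 pi n)."""
    return n ** n * math.exp(-n) * math.sqrt(2 * math.pi * n)

# Ratio n! / stirling(n) for a few values of n.
ratios = {n: math.factorial(n) / stirling(n) for n in (1, 10, 100)}

for n, r in ratios.items():
    print(n, r)
```

The ratio is slightly above 1 and decreases toward 1 as $n$ grows, roughly like $1 + 1/(12n)$.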

3.2 Weak Convergence

88 weak convergence

Fn ⇒ F∞ when limn Fn(y) = F∞(y) for all points y where F∞ is continuous.

Equivalently, Xn ⇒ X∞ or μn ⇒ μ∞ where μi is the probability measure for Xi.

89 (equivalent) properties of weak convergence

Xn ⇒ X∞ if and only if Eg(Xn) → Eg(X∞) for all bounded continuous functions g.

Xn ⇒ X∞ if and only if there exists Yn with the same distribution such that Yn → Y∞ a.s.

Also some equivalent conditions in terms of open/closed/Borel sets

90 Helly’s Selection Theorem/vague convergence

Given a sequence $F_n$ of distribution functions, there is a subsequence $F_{n(k)}$ that ‘weakly’ (vaguely) converges to a function $G$ that is right continuous and nondecreasing (but may not go to 0 and 1 in the limits at $-\infty$ and $+\infty$).

91 tight

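the sequence $F_n$ is tight if for every $\varepsilon > 0$ there is an $M_\varepsilon$ so that

$$\limsup_{n \to \infty} \left[ 1 - F_n(M_\varepsilon) + F_n(-M_\varepsilon) \right] \le \varepsilon \iff 1 - \varepsilon \le \liminf_{n \to \infty} \left[ F_n(M_\varepsilon) - F_n(-M_\varepsilon) \right]$$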

92 tightness criteria

Every subsequential (vague) limit $G$ of the sequence $F_n$ is a distribution function if and only if the sequence $F_n$ is tight.
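A standard example of a sequence that is not tight is mass escaping to infinity. The specific sequence below (point masses at $n$) is an assumed illustration and is not taken from the guide.

```latex
% Assumed example: F_n is the distribution function of the point mass at n.
\[
\begin{aligned}
F_n(x) &= 1_{[n,\infty)}(x), \\
F_n(M) - F_n(-M) &= 0 \quad \text{for all } n > M,
\ \text{so} \ \limsup_{n\to\infty} \bigl[ 1 - F_n(M) + F_n(-M) \bigr] = 1 \ \text{for every } M, \\
F_n(x) &\to G(x) \equiv 0 \quad \text{for every } x .
\end{aligned}
\]
```

The limit $G$ is right continuous and nondecreasing, as Helly's theorem guarantees, but it is not a distribution function because $G(x)$ does not tend to 1 as $x \to \infty$. This matches the failure of tightness.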