

























































































































































































ii My grandparents’ generation
Chapters 1–5 and Appendix B provide the mathematical foundation for the rest of the text. Then Chapters 6–7 hone some tools geared to probability theory. Appendix A provides a brief introduction to elementary probability theory, which could be useful for some mathematics students. (The appendices begin on page 425.)
The classical weak law of large numbers (WLLN) and strong law of large numbers (SLLN) as presented in Sections 8.2–8.4 are particularly complete, and they also emphasize the important role played by the behavior of the maximal summand. Presentation of good inequalities is emphasized in the entire text, and this chapter is a good example. Also, there is an (optional) extension of the WLLN in Appendix C that focuses on the behavior of the sample variance, even in very general situations. It will be appealed to in the optional Section 10.5 and Chapter 11.
The classical central limit theorem (CLT) and its Lindeberg, Liapunov, and Berry–Esseen generalizations are presented in Chapter 10 using the characteristic function (chf) methods introduced in Chapter 9. Conditions for both the weak bootstrap and the strong bootstrap are also developed in Chapter 10, as is a universal bootstrap CLT based on light trimming of the sample. This approach emphasizes a statistical perspective. Gamma and Edgeworth approximations appear at the end of Chapter 11.
Both distribution functions (dfs $F(\cdot)$) and quantile functions (qfs $K(\cdot) \equiv F^{-1}(\cdot)$) are emphasized throughout (quantile functions are important to statisticians). In Chapter 6 much general information about both dfs and qfs and the Winsorized variance is developed. The text includes presentations showing how to exploit the inverse transformation $X \equiv K(\xi)$ with $\xi \cong \mathrm{Uniform}(0, 1)$.
In particular, Appendix C inequalities relating the qf and the Winsorized variance to some empirical process results of Chapter 12 were used in the first edition to treat trimmed means and L-statistics, rank and permutation tests, and sampling from finite populations.
I have learned much through my association with David Mason, and I would like to acknowledge that here. Especially (in the context of this text), Theorem 12.4. is a beautiful improvement on Theorem 12.10.3, in that it still has the potential for necessary and sufficient results. I really admire the work of Mason and his colleagues. It was while working with David that some of my present interests developed. In particular, a useful companion to Theorem 12.10.3 is knowledge of quantile functions. Section 7.6 and Sections C.2–C.X present what I have compiled and produced on that topic while working on various applications, partially with David. Jon Wellner has taught from several versions of this text. In particular, he typed an earlier version and thus gave me a major critical boost. That head start is what turned my thoughts to writing a text for publication. Sections 14.2, and the Hoffman–Jorgensen inequalities came from him. He has also formulated a number of exercises, suggested various improvements, offered good suggestions and references regarding predictable processes, and pointed out some difficulties. My thanks to Jon for all of these contributions. (Obviously, whatever problems may remain lie with me.)
Preface iii Use of This Text ix Definition of Symbols xiii
Chapter 1. Measures
Chapter 2. Measurable Functions and Convergence
Chapter 3. Integration
Chapter 4. Derivatives via Signed Measures
Chapter 5. Measures and Processes on Products
viii CONTENTS
Chapter 14. Convergence in Law on Metric Spaces
Chapter 15. Asymptotics Via Empirical Processes
Appendix A. Special Distributions
Appendix B. General Topology and Hilbert Space
Appendix C. More on the WLLN and CLT
Appendix D. LLN and CLT Summaries
References 451 Index 521
The University of Washington is on the quarter system, so my description will reflect this fact. My thoughts are offered as a potential guide to an instructor. They certainly do not comprise an essential recipe. The reader will note that the exercises are interspersed with the text. It is important to read all of the exercises as they are encountered, as most of them contain some worthwhile contribution to the story.
Chapters 1–5 provide the measure-theoretic background that is necessary for the rest of the text. Many of our students have had at least some kind of an undergraduate exposure to part of this subject. Still, it is important that I present the key parts of this material rather carefully. I feel it is useful for all of them.
Chapter 1 (measures; 5 lectures) Emphasized in my presentation are generators, the monotone property of measures, the Carathéodory extension theorem, completions, the approximation lemma, and the correspondence theorem. Presenting the correspondence theorem carefully is important, as this allows one the luxury of merely highlighting some proofs in Chapter 5. [The minimal monotone class theorem of Section 1.1, claim 8 of the Carathéodory extension theorem proof, and most of what follows the approximation lemma in Section 1.2 would never be presented in my lectures.] {I always assign Exercises 1.1.1 (generators), 1.2.1 (completions), and 1.2.3 (the approximation lemma). Other exercises are assigned, but they vary each time.}
Chapter 2 (measurable functions and convergence; 4 lectures) I present most of Sections 2.1, 2.2, and 2.3. Highlights are preservation of σ-fields, measurability of both common functions and limits of simple functions, induced measures, convergence and divergence sets (especially), and relating $\to_\mu$ to $\to_{a.s.}$ (especially, reducing the first to the second by going to subsequences). I then assign Section 2.4 as outside reading and Section 2.5 for exploring. [I never lecture on either Section 2.4 or 2.5.] {I always assign Exercises 2.2.1 (specific σ-fields), 2.3. (concerning $\to_{a.e.}$), 2.3.3 (a substantial proof), and 2.4.1 (Slutsky’s theorem).}
Chapter 3 (integration; 7 lectures) This is an important chapter. I present all of Sections 3.1 and 3.2 carefully, but Section 3.3 is left as reading, and some of the Section 3.4 inequalities ($C_r$, Hölder, Liapunov, Markov, and Jensen) are done carefully. I do Section 3.5 carefully as far as Vitali’s theorem, and then assign the rest as outside reading. {I always assign Exercises 3.2.1–3.2.2 (only the zero function), 3.3.3 (differentiating under the integral sign), 3.5.1 (substantial theory), and 3.5.7 (the Scheffé theorem).}
Chapter 4 (Radon–Nikodym; 2 lectures) I present ideas from Section 4.1, sketch the Jordan–Hahn decomposition proof, and then give the proofs of the Lebesgue decomposition, the Radon–Nikodym theorem, and the change of variable theorem. These final two topics are highlighted. The fundamental theorem of calculus of Section 4.4 is briefly discussed. [I would never present any of Section 4.3.] {I always assign Exercises 4.2.1 (manipulating Radon–Nikodym derivatives), 4.2.7 (mathematically substantial), and 4.4.1, 4.4.2, and 4.4. (so that the students must do some outside reading in Section 4.4 on their own).}
USE OF THIS TEXT xi
practices the important $O_p(\cdot)$ and $o_p(\cdot)$ notation), 8.4.4 (the substantial result of Marcinkiewicz and Zygmund), 8.4.7 (random sample size), and at least one of the alternative SLLN proofs contained in 8.4.8, 8.4.9, and 8.4.10.}
At this point at the beginning of the winter quarter the instructor will have his/her own opinions about what to cover. I devote the winter quarter to the weak law of large numbers (WLLN), an introduction to the law of the iterated logarithm (LIL), and various central limit theorems (CLTs). That is, the second term treats material from Chapters 8–10, with others optional. I will now outline my choices.
Chapter 8 (LLNs, inequalities, LIL, and series; 6 lectures) My lectures cover Section 8.3 (symmetrization inequalities and Lévy’s inequality for the WLLN, and the Ottaviani–Skorokhod inequality for series), Feller’s WLLN from Section 8.4, the Glivenko–Cantelli theorem from Section 8.5, the LIL for normal rvs in Proposition 8.6.1, the strong Markov property of Theorem 8.7.1, and the two series Theorem 8.8.2. [I do not lecture from any of Sections 8.9, 8.10, or 8.11 at this time.] {I always assign Exercise 8.6.1 (Mills’ ratio).}
Chapter 9 (characteristic functions (chfs); 8 lectures) Sections 9.1 and 9.2 contain classic results that relate to deriving convergence in distribution from convergence of various classes of integrals. I also cover Sections 9.3–9.8. {I always assign Exercises 9.3.1 and 9.3.3(a) (deriving specific chfs) and 9.6.1 (Taylor series expansions of the chf).}
Chapter 10 (CLTs via chfs; 6 lectures) The classical CLT, the Poisson limit theorem, and the multivariate CLT make a nice lecture. The chisquare goodness of fit example and/or the median example (of Section 10.3) make a lecture of illustrations. Chf proofs of the usual CLTs are given in Section 10.2 (Section 9.5 on Esseen’s lemma could have been left until now). Other examples from Section 10.2 or 10.3 could now be chosen, and Example 10.3. (weighted sums of iid rvs) is my first choice. [The chisquare goodness of fit example could motivate a student to read from Sections A.3 and A.4.]
At this stage I still have at least 7 optional lectures at the end of the winter quarter and about 12 more at the start of the spring quarter. In my final 16 lectures of the spring quarter I feel it appropriate to consider Brownian motion in Chapter 12 and then martingales in Chapter 13 (in a fashion to be described below). Let me first describe some possibilities for the optional lectures, assuming that the above core was covered.
Chapter 10 (bootstrap) Both Sections 10.8 and 10.9 on the bootstrap require only a discussion of Section 10.??.
Chapter 19 (convergence in distribution) Convergence in distribution on the line is presented in Chapter 10. [This is extended to metric spaces in Chapter 14, but I do not lecture from it.]
Chapter 10 (domain of normal attraction of the normal df) The converse of the CLT in Theorem 10.6.1 requires the Giné–Zinn symmetrization inequality and the Khinchine inequality of Section 8.3 and the Paley–Zygmund inequality of Section 3.4.
Chapters 7, 10 and 11 (domain of attraction of the normal df) Combining Sections 6.6, C.1–C.4, the Section 8.3 subsection on maximal inequalities of another ilk, and Sections 10.5–10.6 makes a nice unit. Lévy’s asymptotic normality
condition (ANC) of (10.6.3) for a rv $X$ has some prominence. In Section B. purely geometric methods plus Cauchy–Schwarz are used to derive a multitude of equivalent conditions. In the process, quantile functions are carefully studied. In Section 10.1 the ANC is seen to be equivalent to a result akin to a WLLN for the rv $X^2$, and so in this context many additional equivalent conditions are again derived. Thus when one comes to the general CLT in Sections 10.5 and 10.6, one already knows a great deal about the ANC.
Chapter 11 (infinitely divisible and stable laws) First, Section 11.1 (infinitely divisible laws) is independent of the rest, including Section 11.2 (stable laws). The theorem stated in Section 11.4 (domain of attraction of stable laws) would require methods of Section B.4 to prove, but the interesting exercises are accessible without this.
Chapter 11 (higher-order approximations) The local limit theorem in Section 10.4 can be done immediately for continuous dfs, but it also requires Section 9.8 for discrete dfs. The expansions given in Sections 11.5 (Gamma approximation) and 11.6 (Edgeworth approximation) also require Exercise 9.6.7.
Assorted topics suitable for individual reading Possibilities include Section 13.8 (counting process martingales), and Section 13. (martingale CLTs). Section 15.1 on trimmed means and Section 15.2 on R-statistics (including a finite sampling CLT) are both fun; both require some discussion of Section C.6.
The primary topics for the spring quarter are Chapter 12 (Brownian motion and elementary empirical processes) and Chapter 13 (martingales).
Chapter 12 (Brownian motion; 6 lectures) I discuss Section 12.1, sketch the proof of Section 12.2 and carefully apply that result in Section 12.3, and treat Section 12.4 carefully (as I believe that at some point a lecture should be devoted to a few of the more subtle difficulties regarding measurability). I am a bit cavalier regarding Section 12.5 (strong Markov property), but I apply it carefully in Sections 12.6, 12.7, and 12.8. I assign Section 12.9 as outside reading. [I do not lecture on Theorem 12.8.2.] {I always assign Exercises 12.1.2 (on $(C, \mathcal{C})$), 12.3.1 (various transforms of Brownian motion), 12.3.3 (integrals of normal processes), 12.4.1 (properties of stopping times), 12.7.3(a) (related to embedding a rv in Brownian motion), and 12.8.2 (the LIL via embedding).}
At this point let me describe three additional optional topics that could now be pursued, based on the previous lectures from Chapter 12.
Chapter 12 (elementary empirical processes) Uniform empirical and quantile processes are considered in Section 12.10. Straightforward applications to linear rank statistics and two-sample test of fit are included. One could either lecture from Section 12.12 (directly) or 12.11 (with a preliminary lecture from Sections 10.10–10.11), or leave these for assigned reading.
Chapter 13 (martingales; 10 lectures) I cover most of the first seven sections. {I always assign Exercises 13.1.4 (a counting process martingale), 13.3.2 (a proof for continuous time mgs), 13.3.7, and 13.3.9 (on $L_r$-convergence).}
xiv DEFINITION OF SYMBOLS
Motivation 1.1 (The Lebesgue integral) The Riemann integral of a continuous function $f$ (we will restrict attention to $f(x) \ge 0$ on $a \le x \le b$ for convenience) is formed by subdividing the domain of $f$, forming approximating sums, and passing to the limit. Thus the $m$th Riemann sum for $\int_a^b f(x)\,dx$ is defined as
$$RS_m \equiv \sum_{i=1}^{m} f(x^*_{mi})\,[x_{mi} - x_{m,i-1}], \tag{1}$$
where $a \equiv x_{m0} < x_{m1} < \cdots < x_{mm} \equiv b$ (with $x_{m,i-1} \le x^*_{mi} \le x_{mi}$ for all $i$) satisfy $\mathrm{mesh}_m \equiv \max [x_{mi} - x_{m,i-1}] \to 0$. Note that $x_{mi} - x_{m,i-1}$ is the measure (or length) of the interval $[x_{m,i-1}, x_{mi}]$, while $f(x^*_{mi})$ approximates the values of $f(x)$ for all $x_{m,i-1} \le x \le x_{mi}$ (at least it does if $f$ is continuous on $[a, b]$). Within the class $C^+$ of all nonnegative continuous functions, this definition works reasonably well. But it has one major shortcoming. The conclusion
$\int_a^b f_n(x)\,dx \to \int_a^b f(x)\,dx$ is one we often wish to make if $f_n$ “converges” to $f$. However, even when all $f_n$ are in $C^+$ and $f(x) \equiv \lim f_n(x)$ actually exists, it need not be that $f$ is in $C^+$ (and thus $\int_a^b f(x)\,dx$ may not even be well defined) or that $\int_a^b f_n(x)\,dx \to \int_a^b f(x)\,dx$ (even when it is well defined). A different approach is needed. (Note Figure 1.1.) The Lebesgue integral of a nonnegative function is formed by subdividing the range. Thus the $m$th Lebesgue sum for $\int_a^b f(x)\,dx$ is defined as
$$LS_m \equiv \sum_{k=1}^{m 2^m} \frac{k-1}{2^m} \times \mathrm{measure}\Big(\Big\{x : \frac{k-1}{2^m} \le f(x) < \frac{k}{2^m}\Big\}\Big),$$
and $\int_a^b f(x)\,dx$ is defined to be the limit of the $LS_m$ sums as $m \to \infty$. For what class $\mathcal{M}$ of functions $f$ can this approach succeed? The members $f$ of the class $\mathcal{M}$ will need to be such that the measure (or length) of all sets of the form
$$\Big\{x : \frac{k-1}{2^m} \le f(x) < \frac{k}{2^m}\Big\}$$
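As a numerical illustration of the two constructions in Motivation 1.1, the following Python sketch computes a Riemann sum of the form (1) and a Lebesgue sum that subdivides the range into levels of width $2^{-m}$. The function names, the equal-width partition with midpoints playing the role of the points $x^*_{mi}$, and the fine-grid approximation of "measure" are all assumptions made for illustration only; none of them comes from the text.

```python
import math


def riemann_sum(f, a, b, m):
    """Riemann sum (1) on an equal-width partition of [a, b].

    Assumption: the evaluation point x*_{mi} is the midpoint of the
    i-th subinterval; any point of the subinterval would be allowed.
    """
    h = (b - a) / m
    return sum(f(a + (i - 0.5) * h) * h for i in range(1, m + 1))


def lebesgue_sum(f, a, b, m, grid=20_000):
    """Approximate m-th Lebesgue sum for a nonnegative f on [a, b].

    The measure of {x : (k-1)/2^m <= f(x) < k/2^m} is approximated by
    counting midpoints of a fine grid (an illustrative shortcut, not an
    exact length computation).
    """
    cell = (b - a) / grid
    total = 0.0
    for j in range(grid):
        y = f(a + (j + 0.5) * cell)
        level = math.floor(y * 2**m)      # this is k - 1
        if level < m * 2**m:              # only k = 1, ..., m 2^m contribute
            total += (level / 2**m) * cell
    return total


if __name__ == "__main__":
    square = lambda x: x * x
    print(riemann_sum(square, 0.0, 1.0, 1000))
    print(lebesgue_sum(square, 0.0, 1.0, 10))
```

For a bounded continuous $f$ both quantities should approach the same integral; the Lebesgue sum approaches it from below, since each value $f(x)$ is rounded down to the nearest multiple of $2^{-m}$.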
Definition 1.1 (Set theory) Consider a nonvoid class $\mathcal{A}$ of subsets $A$ of a nonvoid set $\Omega$. (For us, $\Omega$ will be the sample space of an experiment.)
(a) Let $A^c$ denote the complement of $A$, let $A \cup B$ denote the union of $A$ and $B$, let $A \cap B$ and $AB$ both denote the intersection, let $A \setminus B \equiv AB^c$ denote the set difference, let $A \triangle B \equiv (A^c B \cup AB^c)$ denote the symmetric difference, and let $\emptyset$ denote the empty set. The class of all subsets of $\Omega$ will be denoted by $2^\Omega$. Sets $A$ and $B$ are called disjoint if $AB = \emptyset$, and sequences of sets $A_n$ or classes of sets $A_t$ are called disjoint if all pairs are disjoint. Writing $A + B$ or $\sum_1^\infty A_n$ will also denote a union, but will imply the disjointness of the sets in the union. As usual, $A \subset B$ denotes that $A$ is a subset of $B$. We call a sequence $A_n$ increasing (and we will nearly always denote this fact by writing $A_n \nearrow$) when $A_n \subset A_{n+1}$ for all $n \ge 1$. We call the sequence decreasing (denoted by $A_n \searrow$) when $A_n \supset A_{n+1}$ for all $n \ge 1$. We call the sequence monotone if it is either increasing or decreasing. Let $\omega$ denote a generic element of $\Omega$. We will use $1_A(\cdot)$ to denote the indicator function of $A$, which equals 1 or 0 at $\omega$ according as $\omega \in A$ or $\omega \notin A$.
(b) $\mathcal{A}$ will be called a field if it is closed under complements and unions. (That is, $A$ and $B$ in $\mathcal{A}$ requires that $A^c$ and $A \cup B$ be in $\mathcal{A}$.) [Note that both $\Omega$ and $\emptyset$ are necessarily in $\mathcal{A}$, as $\mathcal{A}$ was assumed to be nonvoid, with $\Omega = A \cup A^c$ and $\emptyset = \Omega^c$.]
(c) $\mathcal{A}$ will be called a σ-field if it is closed under complements and countable unions. (That is, $A, A_1, A_2, \ldots$ in $\mathcal{A}$ requires that $A^c$ and $\bigcup_1^\infty A_n$ be in $\mathcal{A}$.)
(d) $\mathcal{A}$ will be called a monotone class provided it contains $\bigcup_1^\infty A_n$ for all increasing sequences $A_n$ in $\mathcal{A}$ and contains $\bigcap_1^\infty A_n$ for all decreasing sequences $A_n$ in $\mathcal{A}$.
(e) $(\Omega, \mathcal{A})$ will be called a measurable space provided $\mathcal{A}$ is a σ-field of subsets of $\Omega$.
(f) $\mathcal{A}$ will be called a π-system provided $AB$ is in $\mathcal{A}$ for all $A$ and $B$ in $\mathcal{A}$; and $\mathcal{A}$ will be called a $\bar{\pi}$-system when $\Omega$ in $\mathcal{A}$ is also guaranteed.
If $\mathcal{A}$ is a field (or a σ-field), then it is closed under intersections (under countable intersections), since $AB = (A^c \cup B^c)^c$ (since $\bigcap_1^\infty A_n = (\bigcup_1^\infty A_n^c)^c$). Likewise, we could have used “intersection” instead of “union” in our definitions by making use of $A \cup B = (A^c \cap B^c)^c$ and $\bigcup_1^\infty A_n = (\bigcap_1^\infty A_n^c)^c$. (This used De Morgan’s laws.)
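On a finite set Ω the closure conditions in the definition of a field can be checked by brute force. The sketch below is only an illustration under that finiteness assumption; the helper names `is_field` and `closed_under_intersections` are hypothetical. It tests closure under complements and pairwise unions, and then confirms on an example that closure under intersections comes along for free, as De Morgan's laws predict.

```python
def is_field(omega, sets):
    """Check closure under complements and pairwise unions on a finite omega."""
    omega = frozenset(omega)
    cls = {frozenset(s) for s in sets}
    if not cls:
        return False  # the class is assumed to be nonvoid
    closed_under_complements = all(omega - a in cls for a in cls)
    closed_under_unions = all(a | b in cls for a in cls for b in cls)
    return closed_under_complements and closed_under_unions


def closed_under_intersections(sets):
    """Check that the intersection of any two members is again a member."""
    cls = {frozenset(s) for s in sets}
    return all(a & b in cls for a in cls for b in cls)


if __name__ == "__main__":
    omega = {1, 2, 3}
    candidate = [set(), {1}, {2, 3}, {1, 2, 3}]
    print(is_field(omega, candidate), closed_under_intersections(candidate))
```

When Ω is finite there are only finitely many distinct subsets, so every field on Ω is automatically a σ-field; the distinction matters only for infinite Ω.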
Proposition 1.1 (Closure under intersections)
(a) Arbitrary intersections of fields, σ-fields, or monotone classes are fields, σ-fields, or monotone classes, respectively. [For example, $\mathcal{F} \equiv \bigcap\{\mathcal{F}_\alpha : \mathcal{F}_\alpha \text{ is a field under consideration}\}$ is a field.]
(b) There is a minimal field, σ-field, or monotone class generated by (or, containing) any specified class $\mathcal{C}$ of subsets of $\Omega$. Call $\mathcal{C}$ the generators. For example,
$$\sigma[\mathcal{C}] \equiv \bigcap\{\mathcal{F}_\alpha : \mathcal{F}_\alpha \text{ is a σ-field of subsets of } \Omega \text{ for which } \mathcal{C} \subset \mathcal{F}_\alpha\} \tag{4}$$
is the minimal σ-field generated by $\mathcal{C}$ (that is, containing $\mathcal{C}$).
(c) A collection $\mathcal{A}$ of subsets of $\Omega$ is a σ-field if and only if it is both a field and a monotone class.
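Proof. (c) (⇐) $\bigcup_1^\infty A_n = \bigcup_{n=1}^\infty \big(\bigcup_{k=1}^n A_k\big) \equiv \bigcup_1^\infty B_n \in \mathcal{A}$, since the $B_n$ are in $\mathcal{A}$ and are $\nearrow$. Everything else is even more trivial. $\Box$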
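For a finite Ω the minimal σ-field of part (b) can be built constructively rather than as an intersection: start from the generators and keep adding complements and unions until nothing new appears. The sketch below takes that route. It is an illustrative shortcut that relies on Ω being finite, and the function name `generated_sigma_field` is hypothetical.

```python
from itertools import combinations


def generated_sigma_field(omega, generators):
    """Smallest class containing the generators that is closed under
    complements and (finite) unions; on a finite omega this is sigma[C]."""
    omega = frozenset(omega)
    field = {frozenset(), omega} | {frozenset(g) for g in generators}
    changed = True
    while changed:
        changed = False
        current = list(field)            # snapshot, so we can add while looping
        for a in current:
            complement = omega - a
            if complement not in field:
                field.add(complement)
                changed = True
        for a, b in combinations(current, 2):
            union = a | b
            if union not in field:
                field.add(union)
                changed = True
    return field


if __name__ == "__main__":
    omega = {1, 2, 3, 4}
    sigma = generated_sigma_field(omega, [{1}, {2}])
    for member in sorted(sigma, key=lambda s: (len(s), sorted(s))):
        print(set(member) or "{}")
```

The loop must terminate because the power set of a finite Ω is finite. Two different generating classes can produce the same σ-field, which can be checked directly by comparing the returned sets.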
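Exercise 1.1 (Generators) Let $\mathcal{C}_1$ and $\mathcal{C}_2$ denote two collections of subsets of the set $\Omega$. If $\mathcal{C}_2 \subset \sigma[\mathcal{C}_1]$ and $\mathcal{C}_1 \subset \sigma[\mathcal{C}_2]$, then $\sigma[\mathcal{C}_1] = \sigma[\mathcal{C}_2]$. Prove this fact.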
Definition 1.2 (Measures and events) Consider a measurable space $(\Omega, \mathcal{A})$ and a set function $\mu : \mathcal{A} \to [0, \infty]$ (that is, $\mu(A) \ge 0$ for each $A \in \mathcal{A}$) having $\mu(\emptyset) = 0$.
(a) Now $\mathcal{A}$ is a σ-field, and if $\mu$ is countably additive (abbreviated c.a.) in that
$$\mu\Big(\sum_{n=1}^\infty A_n\Big) = \sum_{n=1}^\infty \mu(A_n) \quad \text{for all disjoint sequences } A_n \text{ in } \mathcal{A}, \tag{5}$$
then $\mu$ is called a measure (or, equivalently, a countably additive measure) on $(\Omega, \mathcal{A})$. The triple $(\Omega, \mathcal{A}, \mu)$ is then called a measure space. We call $\mu$ finite if $\mu(\Omega) < \infty$. We call $\mu$ σ-finite if there exists a measurable decomposition of $\Omega$ as $\Omega = \sum_1^\infty \Omega_n$ with $\Omega_n \in \mathcal{A}$ and $\mu(\Omega_n) < \infty$ for all $n$. The sets $A$ in the σ-field $\mathcal{A}$ are called events.
[Even if $\mathcal{A}$ is not a σ-field, we will still call $\mu$ a measure on $(\Omega, \mathcal{A})$, when (5) holds for all sequences $A_n \in \mathcal{A}$ for which $\sum_1^\infty A_n$ is in $\mathcal{A}$. We will not, however, use the term “measure space” to describe such a triple. We will consider below measures on fields, on certain $\bar{\pi}$-systems, and on some other collections of sets. A useful property of a collection of sets is that along with any sets $A_1, \ldots, A_k$ it also includes all sets of the type $B_k \equiv A_k A_{k-1}^c \cdots A_2^c A_1^c$; then $\bigcup_1^n A_k = \sum_1^n B_k$ is easier to work with.]
(b) Of less interest, call $\mu$ a finitely additive measure (abbreviated f.a.) on $(\Omega, \mathcal{A})$ if
$$\mu\Big(\sum_1^n A_k\Big) = \sum_1^n \mu(A_k) \tag{6}$$
for all disjoint sequences $A_k$ in $\mathcal{A}$ for which $\sum_1^n A_k$ is also in $\mathcal{A}$.
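The disjointification $B_k \equiv A_k A_{k-1}^c \cdots A_1^c$ mentioned in the bracketed remark is easy to see in code. The following sketch uses counting measure on finite sets as a stand-in for μ; that choice, and the function names `disjointify` and `counting_measure`, are assumptions made purely for illustration. It shows that the pieces $B_k$ are disjoint, have the same union as the $A_k$, and let finite additivity (6) compute the measure of the union.

```python
def disjointify(sets):
    """Return B_1, ..., B_n with B_k = A_k minus (A_1 ∪ ... ∪ A_{k-1})."""
    seen = set()
    pieces = []
    for a in sets:
        a = set(a)
        pieces.append(a - seen)   # the part of A_k not covered earlier
        seen |= a
    return pieces


def counting_measure(a):
    """Counting measure on a finite set: the number of points in a."""
    return len(a)


if __name__ == "__main__":
    a_sets = [{1, 2}, {2, 3}, {3, 4, 5}]
    b_sets = disjointify(a_sets)
    union = set().union(*a_sets)
    # Additivity over the disjoint pieces recovers the measure of the union.
    print(b_sets)
    print(counting_measure(union), sum(counting_measure(b) for b in b_sets))
```

Summing μ over the original, overlapping sets would overcount the points that lie in more than one $A_k$, which is why the disjoint representation is the convenient one.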
Definition 1.3 (Outer measures) Consider a set function $\mu^* : 2^\Omega \to [0, \infty]$.
(a) Suppose that $\mu^*$ also satisfies the following three properties.
Null: $\mu^*(\emptyset) = 0$.
Monotone: $\mu^*(A) \le \mu^*(B)$ for all $A \subset B$.
Countable subadditivity: $\mu^*\big(\bigcup_1^\infty A_n\big) \le \sum_1^\infty \mu^*(A_n)$ for all sequences $A_n$.
Then $\mu^*$ is called an outer measure.
(b) An arbitrary subset $A$ of $\Omega$ is called $\mu^*$-measurable if
$$\mu^*(T) = \mu^*(TA) + \mu^*(TA^c) \quad \text{for all subsets } T \subset \Omega. \tag{7}$$
Sets $T$ used in this capacity are called test sets.
(c) We let $\mathcal{A}^*$ denote the class of all $\mu^*$-measurable sets, that is,
$$\mathcal{A}^* \equiv \{A \in 2^\Omega : A \text{ is } \mu^*\text{-measurable}\}. \tag{8}$$
[Note that $A \in \mathcal{A}^*$ if and only if $\mu^*(T) \ge \mu^*(TA) + \mu^*(TA^c)$ for all $T \subset \Omega$, since the other inequality is trivial by the subadditivity of $\mu^*$.]
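To see that condition (7) can genuinely fail, it helps to work through a small case by enumeration. The sketch below is a toy example on a three-point Ω; the particular outer measure and all function names are assumptions chosen for illustration and do not come from the text. It checks the three outer-measure properties and then lists the $\mu^*$-measurable sets by trying every test set $T$.

```python
from itertools import combinations

OMEGA = frozenset({1, 2, 3})


def powerset(omega):
    """All subsets of a finite set, as frozensets."""
    items = sorted(omega)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]


def example_mu_star(a):
    """Toy outer measure: 0 on the empty set, 2 on OMEGA, 1 otherwise."""
    if not a:
        return 0
    return 2 if a == OMEGA else 1


def is_outer_measure(mu_star, omega):
    """Null, monotone, and subadditive (pairwise suffices on a finite omega)."""
    subsets = powerset(omega)
    null = mu_star(frozenset()) == 0
    monotone = all(mu_star(a) <= mu_star(b)
                   for a in subsets for b in subsets if a <= b)
    subadditive = all(mu_star(a | b) <= mu_star(a) + mu_star(b)
                      for a in subsets for b in subsets)
    return null and monotone and subadditive


def measurable_sets(mu_star, omega):
    """Sets A satisfying (7) for every test set T."""
    subsets = powerset(omega)
    return [a for a in subsets
            if all(mu_star(t) == mu_star(t & a) + mu_star(t - a)
                   for t in subsets)]


if __name__ == "__main__":
    print(is_outer_measure(example_mu_star, OMEGA))
    print([set(a) for a in measurable_sets(example_mu_star, OMEGA)])
```

In this toy case a singleton such as {1} fails (7) with the test set T = {1, 2}, because the left side is 1 while the right side is 1 + 1. Counting measure, by contrast, is additive on all of $2^\Omega$, so every subset passes.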
Motivation 1.2 (Measure) In this paragraph we will consider only one possible measure $\mu$, namely the Lebesgue-measure generalization of length. Let $\mathcal{C}_I$ denote the set of all intervals of the types $(a, b]$, $(-\infty, b]$, and $(a, +\infty)$ on the real line $R$, and for each of these intervals $I$ we assign a measure value $\mu(I)$ equal to its length, thus $b - a$, $\infty$, $\infty$ in the three special cases. All is well until we manipulate the sets