









Probabilistic experiments, Sample space, Discrete probability spaces, σ-fields (σ-algebras) are discussed in this lecture










6.436J/15.085J Fall 2018 Lecture 1
Probability theory is a mathematical framework that allows us to reason about phenomena or experiments whose outcome is uncertain. A probabilistic model is a mathematical model of a probabilistic experiment that satisfies certain mathematical properties (the axioms of probability theory), and which allows us to calculate probabilities and to reason about the likely outcomes of the experiment. A probabilistic model is defined formally by a triple (Ω, F, P), called a probability space, comprised of the following three elements:
(a) Ω is the sample space, the set of possible outcomes of the experiment.
(b) F is a σ-field, a collection of subsets of Ω. (The term “σ-algebra” is also commonly used, as a synonym.)
(c) P is a probability measure, a function that assigns a nonnegative probability to every set in the σ-field F.
Our objective is to describe the three elements of a probability space, and explore some of their properties.
The sample space is a set Ω comprised of all the possible outcomes of the experiment. Typical elements of Ω are often denoted by ω, and are called elementary outcomes, or simply outcomes. The sample space can be finite, e.g., Ω = {ω_1, ..., ω_n}, countable, e.g., Ω = N, or uncountable, e.g., Ω = R or Ω = {0, 1}^∞. As a practical matter, the elements of Ω must be mutually exclusive and collectively exhaustive, in the sense that once the experiment is carried out, there is exactly one element of Ω that occurs.
Examples
(a) If the experiment consists of a single roll of an ordinary die, the natural sample space is the set Ω = {1, 2, ..., 6}, consisting of 6 elements. The outcome ω = 2 indicates that the result of the roll was 2.
(b) If the experiment consists of five consecutive rolls of an ordinary die, the natural sample space is the set Ω = {1, 2, ..., 6}^5. The element ω = (3, 1, 1, 2, 5) is an example of a possible outcome.
(c) If the experiment consists of an infinite number of consecutive rolls of an ordinary die, the natural sample space is the set Ω = {1, 2, ..., 6}^∞. In this case, an elementary outcome is an infinite sequence, e.g., ω = (3, 1, 1, 5, ...). Such a sample space would be appropriate if we intend to roll a die indefinitely and we are interested in studying, say, the number of rolls until a 4 is obtained for the first time.
(d) If the experiment consists of measuring the velocity of a vehicle with infinite precision, a natural sample space is the set R of real numbers.
Note that there is no discussion of probabilities so far. The set Ω simply specifies the possible outcomes.
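As a small illustration of examples (a) and (b), the finite sample spaces can be enumerated directly. This is only a sketch; the variable names are chosen here for illustration and do not come from the lecture.

```python
from itertools import product

# Faces of an ordinary die (example (a)): the sample space for one roll.
die = range(1, 7)
omega_one_roll = set(die)

# Example (b): five consecutive rolls, i.e. the Cartesian power {1,...,6}^5.
omega_five_rolls = list(product(die, repeat=5))

print(len(omega_one_roll))                  # number of outcomes for one roll
print(len(omega_five_rolls))                # 6**5 outcomes for five rolls
print((3, 1, 1, 2, 5) in omega_five_rolls)  # the outcome quoted in example (b)
```

The infinite-sequence space of example (c) and the real line of example (d) cannot be enumerated this way, which is part of the reason the later sections need σ-fields.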
Before continuing with the discussion of σ-fields and probability measures in their full generality, it is helpful to consider the simpler case where the sample space is finite or countable.
4 σ-FIELDS
When the sample space Ω is uncountable, the idea of defining the probability of a general subset of Ω in terms of the probabilities of elementary outcomes runs into difficulties. Suppose, for example, that the experiment consists of drawing a number from the interval [0, 1], and that we wish to model a situation where all elementary outcomes are “equally likely.” If we were to assign a probability of zero to every ω, this alone would not be of much help in determining the probability of a subset such as [1/2, 3/4]. If we were to assign the same positive value to every ω, we would obtain P({1, 1/2, 1/3, ...}) = ∞, which is undesirable. A way out of this difficulty is to work directly with the probabilities of more general subsets of Ω (not just subsets consisting of a single element).
Ideally, we would like to specify the probability P(A) of every subset of Ω. However, if we wish our probabilities to have certain intuitive mathematical properties, we run into some insurmountable mathematical difficulties. A solution is provided by the following compromise: assign probabilities to only a partial collection of subsets of Ω. The sets in this collection are to be thought of as the “nice” subsets of Ω, or, alternatively, as the subsets of Ω of interest. Mathematically, we will require this collection to be a σ-field, a term that we define next.
(g) We roll a die n times. We let Ω = {1, 2, ..., 6}^n, and if we believe that all elementary outcomes (n-long sequences) are equally likely, we let P(ω) = 1/6^n for every ω ∈ Ω.
Given the probabilities p_i, the problem of determining P(A) for some subset of Ω is conceptually straightforward. However, the calculations involved in determining the value of the sum ∑_{ω∈A} P(ω) can range from straightforward to daunting. Various methods that can simplify such calculations will be explored in future lectures.
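For small n, the sum ∑_{ω∈A} P(ω) in example (g) can be evaluated by brute-force enumeration. The sketch below assumes the uniform model of example (g); the function name and the particular event (three rolls summing to 10) are hypothetical choices made for illustration.

```python
from fractions import Fraction
from itertools import product

def uniform_probability(event, n):
    """P(A) = sum of P(omega) over omega in A, with P(omega) = 1/6**n."""
    omega = product(range(1, 7), repeat=n)
    p_omega = Fraction(1, 6 ** n)
    return sum((p_omega for w in omega if event(w)), Fraction(0))

# Hypothetical event: the three rolls sum to 10.
p_sum_10 = uniform_probability(lambda w: sum(w) == 10, n=3)
print(p_sum_10)  # 27 of the 216 outcomes qualify, so this prints 1/8
```

The enumeration visits 6^n outcomes, so it quickly becomes impractical as n grows. This is one concrete sense in which the calculation can become daunting.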
Definition 2. Given a sample space Ω, a σ-field is a collection F of subsets of Ω, with the following properties:
(a) Ø ∈ F.
(b) If A ∈ F, then A^c ∈ F.
(c) If A_i ∈ F for every i ∈ N, then ∪_{i=1}^∞ A_i ∈ F.
A set A that belongs to F is called an event, an F-measurable set, or simply a measurable set. The pair (Ω, F) is called a measurable space.
Remark. A σ-field is often called a σ-algebra, and these terms will be used interchangeably. If we relax condition (c) and require only finite unions to be in F, we get a definition of field (or algebra) of sets – see Def. 4 below.
The term “event” is to be understood as follows. Once the experiment is concluded, the realized outcome ω either belongs to A, in which case we say that the event A has occurred, or it doesn’t, in which case we say that the event did not occur.
It turns out that if A_i ∈ F for every i ∈ N, then ∩_{i=1}^∞ A_i ∈ F, i.e., a σ-field is closed under countable intersections as well.
Exercise 1.
(a) Let F be a σ-field. Prove that if A, B ∈ F, then A ∩ B ∈ F. More generally, given a countably infinite sequence of events A_i ∈ F, prove that ∩_{i=1}^∞ A_i ∈ F. Hint: Use De Morgan’s law.
(b) Prove that property (a) of σ-fields (that is, Ø ∈ F) can be derived from properties (b) and (c), assuming that the σ-field F is non-empty.
The following are some examples of σ-fields. (Check that this is indeed the case.)
Examples.
(a) The trivial σ-field, F = {Ø, Ω}.
(b) The collection F = {Ø, A, A^c, Ω}, where A is a fixed subset of Ω.
(c) The set of all subsets of Ω: F = 2^Ω = {A | A ⊂ Ω}.
(d) Let Ω = {1, 2, ..., 6}^n, the sample space associated with n rolls of a die. Let A = {ω = (ω_1, ..., ω_n) | ω_1 ≤ 2}, B = {ω = (ω_1, ..., ω_n) | 3 ≤ ω_1 ≤ 4}, and C = {ω = (ω_1, ..., ω_n) | ω_1 ≥ 5}, and F = {Ø, A, B, C, A∪B, A∪C, B∪C, Ω}.
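On a finite sample space, the three properties of Definition 2 can be checked mechanically. The following is a minimal sketch, assuming Ω is finite so that countable unions reduce to finite ones; the helper name `is_sigma_field` is hypothetical, and example (d) is checked for n = 1 only.

```python
from itertools import combinations

def is_sigma_field(omega, collection):
    """Check Definition 2 on a finite sample space.

    On a finite space, countable unions reduce to finite unions, so it is
    enough to check the empty set, complements, and pairwise unions.
    """
    omega = frozenset(omega)
    F = {frozenset(s) for s in collection}
    if frozenset() not in F:
        return False
    if any(omega - A not in F for A in F):
        return False
    return all(A | B in F for A, B in combinations(F, 2))

# Example (d) with n = 1: A = {1,2}, B = {3,4}, C = {5,6}.
omega = {1, 2, 3, 4, 5, 6}
A, B, C = {1, 2}, {3, 4}, {5, 6}
F = [set(), A, B, C, A | B, A | C, B | C, omega]

print(is_sigma_field(omega, F))                  # example (d)
print(is_sigma_field(omega, [set(), A, omega]))  # lacks the complement of A
```

The second call fails because the collection contains A but not A^c, which violates property (b).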
4.1 Other reasons for “small” σ-fields.
As we discussed earlier, one reason for using a σ-field which does not include all subsets of Ω is in order to avoid insurmountable mathematical difficulties. However, there is also another reason: we may want to capture the perspective of an observer who receives only partial information about the outcome of the experiment. In that case, it is convenient (loosely speaking) to let F be just the set of events for which the observer will be able to tell whether they occurred or not. With this perspective, a σ-field can be viewed as an abstract description of the information that an observer receives. In particular, if the information available to observers 1 and 2 is described by σ-fields F_1 and F_2, respectively, and if F_2 ⊂ F_1, we have a situation in which observer 2 has less information.
Example. We flip a coin twice, and each flip results in Heads (H) or Tails (T). In this context, Ω = {HH, HT, TH, TT}. The natural σ-field, F_1, is the collection of all subsets of Ω. Consider now an observer who sees only the result of the first coin flip. In this case, we describe the information available to that observer in terms of the smaller σ-field

F_2 = {Ø, Ω, {HH, HT}, {TH, TT}}.
In particular, this observer can tell whether the event {HH, HT } has occurred or not, but cannot tell whether the event {HH} has occurred.
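The observer's σ-field in this example consists of all unions of blocks of a partition of Ω, where each block groups the outcomes the observer cannot tell apart. The sketch below builds a σ-field this way for a finite partition; the function name is hypothetical, and the construction is an illustration rather than a definition taken from the lecture.

```python
from itertools import combinations

def sigma_field_from_partition(blocks):
    """All unions of blocks of a finite partition, including the empty union."""
    blocks = [frozenset(b) for b in blocks]
    events = set()
    for k in range(len(blocks) + 1):
        for chosen in combinations(blocks, k):
            events.add(frozenset().union(*chosen))
    return events

# The observer sees only the first flip, so outcomes sharing a first flip
# are indistinguishable and form one block of the partition.
F2 = sigma_field_from_partition([{"HH", "HT"}, {"TH", "TT"}])
# The full-information observer distinguishes every outcome.
F1 = sigma_field_from_partition([{"HH"}, {"HT"}, {"TH"}, {"TT"}])

print(len(F2), len(F1))               # 4 events versus all 16 subsets
print(frozenset({"HH", "HT"}) in F2)  # observable by the first-flip observer
print(frozenset({"HH"}) in F2)        # not observable by that observer
print(F2 <= F1)                       # every event of F2 is also in F1
```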
We will turn to this association of σ-fields to observers much later, when we consider conditional expectations given partial information.
We are now ready to discuss the assignment of probabilities to events. We have already seen that when the sample space Ω is countable, this can be accomplished by assigning probabilities to individual elements ω ∈ Ω. However, as discussed before, this does not work when Ω is uncountable. We are then led to assign probabilities to certain subsets of Ω, specifically to the elements of a σ-field F, and require that these probabilities have certain “natural” properties. Besides probability measures, it is also convenient to define the notion of a measure more generally.
We will be using the following terminology. We say that a collection of sets A_α ⊂ Ω, where α ranges over some index set, is mutually exclusive, or that the sets are disjoint, if A_α ∩ A_α′ = Ø whenever α ≠ α′. Also, the sets A_α ⊂ Ω are called collectively exhaustive if ∪_α A_α = Ω.
Definition 3. Let (Ω, F) be a measurable space. A measure is a function μ : F → [0, ∞], which assigns a nonnegative extended real number μ(A) to every set A in F, and which satisfies the following two conditions:
(a) μ(Ø) = 0;
(b) (Countable additivity, or σ-additivity) If {A_i} is a sequence of disjoint sets that belong to F, then μ(∪_{i=1}^∞ A_i) = ∑_{i=1}^∞ μ(A_i).
A probability measure is a measure P with the additional property P(Ω) = 1.
In short, a measure is a nonnegative extended real valued σ-additive set function with domain F.
For any A ∈ F, P(A) is called the probability of the event A. The assignment of unit probability to the event Ω expresses our certainty that the outcome of the experiment, no matter what it is, will be an element of Ω. Similarly, the outcome cannot be an element of the empty set; thus, the empty set cannot occur and is assigned zero probability.
If an event A ∈ F satisfies P(A) = 1, we say that A occurs almost surely. Note, however, that A happening almost surely is not the same as the condition A = Ω. For a trivial example, let Ω = {1, 2, 3}, p_1 = .5, p_2 = .5, p_3 = 0. Then the event A = {1, 2} occurs almost surely, since P(A) = .5 + .5 = 1, but A ≠ Ω. The outcome 3 has zero probability, but is still possible. We will study more interesting examples of almost sure events later on when we give examples of non-discrete probability spaces.
The countable additivity property is very important. Its intuitive meaning is the following. If we have several events A_1, A_2, ..., out of which at most one can occur, then the probability that “one of them will occur” is equal to the sum of their individual probabilities. In this sense, probabilities (and more generally, measures) behave like the familiar notions of area or volume: the area or volume of a countable union of disjoint sets is the sum of their individual areas or volumes. Indeed, a measure is to be understood as some generalized notion of a volume. In this light, allowing the measure μ(A) of a set to be infinite is natural, since one can easily think of sets with infinite volume.
The properties of probability measures that are required by Definition 3 are often called the axioms of probability theory. Starting from these axioms, many other properties can be derived, as in the next proposition.
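The three-point example above is small enough to check by hand or by machine. The sketch below encodes p_1 = p_2 = 1/2, p_3 = 0, verifies additivity over every pair of disjoint events (which is all that countable additivity amounts to on a finite space), and confirms that A = {1, 2} has probability one without being equal to Ω. The helper names are hypothetical.

```python
from fractions import Fraction
from itertools import chain, combinations

omega = frozenset({1, 2, 3})
p = {1: Fraction(1, 2), 2: Fraction(1, 2), 3: Fraction(0)}

def P(event):
    """Probability of an event as the sum of its elementary probabilities."""
    return sum((p[w] for w in event), Fraction(0))

def all_subsets(s):
    """Every subset of a finite set, as frozensets."""
    s = list(s)
    return [frozenset(c) for c in chain.from_iterable(
        combinations(s, k) for k in range(len(s) + 1))]

# Additivity over every pair of disjoint events (enough on a finite space).
additive = all(P(A | B) == P(A) + P(B)
               for A in all_subsets(omega)
               for B in all_subsets(omega) if not (A & B))

A = frozenset({1, 2})
print(P(frozenset()), P(omega), additive)
print(P(A) == 1, A == omega)  # almost sure, yet a proper subset of omega
```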
(e) Left as an exercise; a simple proof will be provided later, using random variables.
For the special case where n = 2, part (e) of Proposition 2 simplifies to
P(A ∪ B) = P(A) + P(B) − P(A ∩ B).
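A quick numerical check of this identity, assuming a single roll of a fair die with the uniform probability; the two events are hypothetical choices that happen to overlap.

```python
from fractions import Fraction

omega = frozenset(range(1, 7))

def P(event):
    """Uniform probability on one roll of a fair die (an assumed model)."""
    return Fraction(len(event), len(omega))

A = frozenset({2, 4, 6})   # hypothetical event: the roll is even
B = frozenset({1, 2, 3})   # hypothetical event: the roll is at most 3

lhs = P(A | B)
rhs = P(A) + P(B) - P(A & B)
print(lhs, rhs)  # both sides equal 5/6
```

Subtracting P(A ∩ B) corrects for the outcome 2, which would otherwise be counted once in P(A) and once in P(B).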
Let us note that properties (a), (c), and (d) in Proposition 2 are also valid for general measures (the proof is the same). Let us also note that for a probability measure, the property P(Ø) = 0 need not be assumed, but can be derived from the other properties. Indeed, consider a sequence of sets A_i, each of which is equal to the empty set. These sets are disjoint, since Ø ∩ Ø = Ø. Applying the countable additivity property, we obtain ∑_{i=1}^∞ P(Ø) = P(Ø) ≤ P(Ω) = 1, which can only hold if P(Ø) = 0.
Finite Additivity
Our definitions of σ-fields and of probability measures involve countable unions and a countable additivity property. A different mathematical structure is obtained if we replace countable unions and sums by finite ones. This leads us to the following definitions.
Definition 4. Let Ω be a sample space.
(a) A field is a collection F_0 of subsets of Ω, with the following properties:
(i) Ø ∈ F_0.
(ii) If A ∈ F_0, then A^c ∈ F_0.
(iii) If A ∈ F_0 and B ∈ F_0, then A ∪ B ∈ F_0.
(b) Let F_0 be a field of subsets of Ω. A function P : F_0 → [0, 1] is said to be finitely additive if
A, B ∈ F_0, A ∩ B = Ø ⇒ P(A ∪ B) = P(A) + P(B).
Remark. A field (of sets) is often called an algebra (of sets), and these terms will be used interchangeably.
We note that finite additivity, for the case of two events, easily implies finite additivity for a general finite number n of events, namely, the property in part (a) of Proposition 2. To see this, note that finite additivity for n = 2 allows us to write, for the case of three disjoint events,
P(A_1 ∪ A_2 ∪ A_3) = P(A_1) + P(A_2 ∪ A_3) = P(A_1) + P(A_2) + P(A_3),
and we can proceed inductively to generalize to the case of n events.
Finite additivity is strictly weaker than the countable additivity property of probability measures. In particular, finite additivity on a field, or even for the special case of a σ-field, does not, in general, imply countable additivity. The reason for introducing the stronger countable additivity property is that without it, we are severely limited in the types of probability calculations that are possible. On the other hand, finite additivity is often easier to verify.
Consider a probability space in which Ω = R. The sequence of events A_n = [1, n] converges to the event A = [1, ∞), and it is reasonable to expect that the probability of [1, n] converges to the probability of [1, ∞). Such a property is established in greater generality in the result that follows. This result provides us with a few alternative versions of such a continuity property, together with a converse which states that finite additivity together with continuity implies countable additivity. This last result is a useful tool that often simplifies the verification of the countable additivity property.
Theorem 1. (σ-additivity ⇐⇒ continuity) Let F be a field of subsets of Ω, and suppose that P : F → [0, 1] satisfies P(Ω) = 1 as well as the finite additivity property. Then, the following are equivalent:
(a) P is σ-additive on F. In other words, if the sets {A_j}_{j=1}^∞ are disjoint, A_j ∈ F, and A = ∪_{j=1}^∞ A_j ∈ F, then P(A) = ∑_{j=1}^∞ P(A_j).
(b) If {A_i} is an increasing sequence of sets in F (i.e., A_i ⊂ A_{i+1}, for all i), and A = ∪_{i=1}^∞ A_i belongs to F, then lim_{i→∞} P(A_i) = P(A).
(c) If {A_i} is a decreasing sequence of sets in F (i.e., A_i ⊃ A_{i+1}, for all i), and A = ∩_{i=1}^∞ A_i belongs to F, then lim_{i→∞} P(A_i) = P(A).
(d) If {A_i} is a decreasing sequence of sets in F (i.e., A_i ⊃ A_{i+1}, for all i) and ∩_{i=1}^∞ A_i is empty, then lim_{i→∞} P(A_i) = 0.
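Property (b) can be illustrated numerically on the increasing sequence A_n = [1, n] mentioned before the theorem. The lecture does not fix a probability measure on R at this point, so the sketch below assumes, purely for illustration, an exponential law with rate 1, for which P([a, b]) = e^{−a} − e^{−b} when 0 ≤ a ≤ b.

```python
import math

def prob_interval(a, b):
    """P([a, b]) under an Exponential(1) law on [0, inf); b may be math.inf.

    The exponential law is an assumption made only for this illustration.
    """
    upper = 0.0 if math.isinf(b) else math.exp(-b)
    return math.exp(-a) - upper

limit = prob_interval(1, math.inf)          # P([1, inf)) = exp(-1)
for n in (2, 5, 10, 20, 40):
    gap = limit - prob_interval(1, n)       # equals exp(-n) up to rounding
    print(n, prob_interval(1, n), gap)
```

The printed gap shrinks toward zero as n grows, which is the behavior that property (b) asserts for any σ-additive P.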
Notes:
Applying finite additivity to the n disjoint sets B_1, B_2, ..., B_{n−1}, ∪_{i=n}^∞ B_i, we have

$$P\Big(\bigcup_{i=1}^{\infty} B_i\Big) = \sum_{i=1}^{n-1} P(B_i) + P\Big(\bigcup_{i=n}^{\infty} B_i\Big).$$

This equality holds for any n, and we can take the limit as n → ∞. The first term on the right-hand side converges to ∑_{i=1}^∞ P(B_i). The second term is P(A_n), where A_n = ∪_{i=n}^∞ B_i, and, as observed before, it converges to zero. We conclude that

$$P\Big(\bigcup_{i=1}^{\infty} B_i\Big) = \sum_{i=1}^{\infty} P(B_i),$$

and property (a) holds.

6.1 Discrete probability spaces revisited

In Section 3, we defined P(A) for every A ⊂ Ω in terms of the probabilities of individual outcomes. We actually need to verify that this formula results in probabilities that satisfy countable additivity. To this effect, we can use Theorem 1. We only need to verify (i) finite additivity and (ii) the continuity property in part (d).

Regarding finite additivity, it suffices to consider the case of two sets; the general case is obtained by induction on the number of sets. Suppose that the sets A = {ω_1, ω_2, ...} and B = {ω′_1, ω′_2, ...} are disjoint. Let a_i = P(ω_i) and b_i = P(ω′_i). We then have A ∪ B = {ω_1, ω′_1, ω_2, ω′_2, ...} and

$$P(A \cup B) = a_1 + b_1 + a_2 + b_2 + \cdots = \sum_{i=1}^{\infty} (a_i + b_i) = \sum_{i=1}^{\infty} a_i + \sum_{i=1}^{\infty} b_i = P(A) + P(B).$$

The second and third equalities above are elementary properties of infinite series involving nonnegative numbers (more generally of absolutely convergent infinite series); namely, the order of summation or the grouping of the summands does not matter.

Regarding continuity, we need to show that

A_n ↓ Ø ⇒ P(A_n) → 0.

Indeed, without loss of generality, we may assume Ω = {1, 2, ...} is the set of natural numbers (to be denoted in this course by either N or Z+). Fix some ε > 0. Since ∑_{i=1}^∞ P(i) = ∑_{ω∈Ω} P(ω) = 1 is a convergent series, it follows that there exists some m ∈ N for which

$$\sum_{i \ge m} P(i) \le \epsilon.$$

On the other hand, since A_n ↓ Ø, it follows that for every i, there exists some n_i such that i ∉ A_n, for n ≥ n_i. By using this property for i = 1, ..., m − 1, we see that

A_n ⊆ {m, m + 1, ...},

when n is large enough. Thus, for all large enough n,

$$P(A_n) \le P(\{m, m+1, \ldots\}) = \sum_{i \ge m} P(i) \le \epsilon.$$

It follows that lim_{n→∞} P(A_n) ≤ ε. Since ε can be an arbitrarily small positive number, we conclude that lim_{n→∞} P(A_n) = 0.
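The key step of the continuity argument is the existence of an index m beyond which the tail sum is at most ε. The sketch below finds the smallest such m for a concrete distribution on {1, 2, ...}. The geometric pmf P(i) = 2^{−i} is a hypothetical choice made for illustration, as are the function names.

```python
from fractions import Fraction

def pmf(i):
    """Hypothetical pmf on {1, 2, ...}: P(i) = 2**(-i), which sums to 1."""
    return Fraction(1, 2 ** i)

def tail_index(eps):
    """Smallest m with sum_{i >= m} P(i) <= eps.

    The tail is computed exactly as 1 minus the partial sum over i < m.
    """
    m, partial = 1, Fraction(0)
    while 1 - partial > eps:
        partial += pmf(m)
        m += 1
    return m

eps = Fraction(1, 1000)
m = tail_index(eps)
# Any event contained in {m, m+1, ...} then has probability at most eps,
# which is the bound applied to A_n for large n in the argument above.
print(m, 1 - sum(pmf(i) for i in range(1, m)))
```

With ε = 1/1000 the loop stops at m = 11, since the tail starting at 11 equals 2^{−10} = 1/1024. The loop terminates for any ε > 0 because the partial sums converge to 1.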
We will soon find that one often needs to prove that a certain collection of sets is a σ-algebra. Such verifications are facilitated by the following theorem.
Definition 5. A collection of sets M is a monotone class if all increasing and decreasing sequences of sets from M have limits belonging to M. Formally, let A_n ∈ M for all n. Then

A_n ↗ A ⇒ A ∈ M, and A_n ↘ A ⇒ A ∈ M.
The minimal monotone class containing a collection C is denoted μ(C).
Note that μ(C) is well-defined by an analog of Proposition 1 for intersections of monotone classes.
Theorem 2. If A is an algebra (field) of sets, then
μ(A) = σ(A).
Proof. First, note that any σ-algebra is necessarily a monotone class. Thus
μ(A) ⊆ σ(A).
Second, any collection F of sets which is simultaneously a monotone class and
Remark (Caution: real analysis). The importance of the monotone class theorem is that it allows one to avoid the use of transfinite induction when proving properties of σ-algebras. However, if you understand transfinite induction, many of the tricky constructions involving monotone classes become much less mysterious. For example, constructing μ(A) involves taking A, then adding all the limits of increasing and decreasing sets (thus forming new sets, “tier 2”), then adding the limits of increasing and decreasing sets in tier 2 (forming “tier 3”), etc. Transfinite induction gives a rigorous sense to the definition, “let μ(A) be the first tier at which this procedure stabilizes”. Intuitively, then, μ(A) is closed under the operation of taking limits. Now if E is a set in any tier then E ∩ A is also a set in the same tier (assuming A ∈ A). Consequently, μ(A) is automatically closed under intersections with sets from A. Similarly, one may replace A ∈ A with any A in tier 2, 3, etc., eventually proving μ(A) is closed under intersections.