Probability Theory: Understanding Sample Spaces, Events, and Axioms - Prof. David E. Joyce, Study notes of Mathematics

An introduction to probability theory, focusing on the concept of a sample space, which consists of a set of outcomes, a collection of subsets called events, and a probability function. The author, D. Joyce of Clark University, outlines the axioms of probability theory, including the properties of probability functions and the principle of inclusion and exclusion. The document also introduces the concept of random variables and their relation to events and probability functions.

Typology: Study notes

Pre 2010

Uploaded on 08/07/2009

Summary of basic probability theory, part 1
D. Joyce, Clark University
Math 218, Mathematical Statistics, Jan 2008


Sample space. A sample space consists of an underlying set $S$, whose elements are called outcomes, a collection of subsets of $S$ called events, and a function $P$ on the set of events, called a probability function, satisfying the following axioms.

  1. The probability of any event is a number in the interval $[0, 1]$.
  2. The entire set $S$ is an event with probability $P(S) = 1$.
  3. The union and intersection of any finite or countably infinite set of events are events, and the complement of an event is an event.
  4. The probability of a disjoint union of a finite or countably infinite set of events is the sum of the probabilities of those events,

$$P\Bigl(\bigcup_i E_i\Bigr) = \sum_i P(E_i).$$
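As a rough illustration of the axioms, the following Python sketch builds a small finite sample space and checks axioms 1, 2, and 4 by brute force. The fair die, the names `S`, `weights`, `prob`, and `all_events`, and the choice to take every subset of $S$ as an event are assumptions made for this example, not part of the notes.

```python
from fractions import Fraction
from itertools import chain, combinations

# Hypothetical example: a fair six-sided die as the underlying set S.
S = frozenset(range(1, 7))
weights = {outcome: Fraction(1, 6) for outcome in S}

def prob(event):
    """P(E) for an event E, taken here to be any subset of S."""
    return sum((weights[x] for x in event), Fraction(0))

def all_events(s):
    """Every subset of s, so closure under union, intersection, and complement (axiom 3) holds."""
    items = sorted(s)
    subsets = chain.from_iterable(combinations(items, r) for r in range(len(items) + 1))
    return [frozenset(c) for c in subsets]

events = all_events(S)

# Axiom 1: every probability lies in [0, 1].
axiom1 = all(0 <= prob(e) <= 1 for e in events)
# Axiom 2: the whole set S has probability 1.
axiom2 = prob(S) == 1
# Axiom 4 (finite case only): additivity over every pair of disjoint events.
axiom4 = all(
    prob(e | f) == prob(e) + prob(f)
    for e in events for f in events if not (e & f)
)
print(axiom1, axiom2, axiom4)
```

Exact fractions are used instead of floating-point numbers so that the equality checks are not affected by rounding. The check covers only finite unions; countable additivity cannot be tested this way.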

From these axioms a number of other properties can be derived including these.

  5. The complement $\overline{E} = S - E$ of an event $E$ is an event, and

$$P(\overline{E}) = 1 - P(E).$$

  6. The empty set $\emptyset$ is an event with probability $P(\emptyset) = 0$.
  7. For any two events $E$ and $F$,

$$P(E \cup F) = P(E) + P(F) - P(E \cap F),$$

therefore

$$P(E \cup F) \le P(E) + P(F).$$

  8. For any two events $E$ and $F$,

$$P(E) = P(E \cap F) + P(E \cap \overline{F}).$$

  9. If event $E$ is a subset of event $F$, then $P(E) \le P(F)$.
  10. Statement 7 above is called the principle of inclusion and exclusion. It generalizes to more than two events.

$$P\Bigl(\bigcup_{r=1}^{n} E_r\Bigr) = \sum_{i=1}^{n} P(E_i) - \sum_{i<j} P(E_i \cap E_j) + \sum_{i<j<k} P(E_i \cap E_j \cap E_k) - \cdots + (-1)^{n-1} P(E_1 \cap E_2 \cap \cdots \cap E_n)$$
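The general inclusion and exclusion formula can be checked numerically on a small example. The sketch below assumes a uniform distribution on the integers 1 to 30 and three events (multiples of 2, 3, and 5). The setup and the helper names `prob` and `inclusion_exclusion` are hypothetical choices for illustration.

```python
from fractions import Fraction
from itertools import combinations

def prob(event, space):
    """Uniform probability of an event (a subset of a finite space)."""
    return Fraction(len(event), len(space))

def inclusion_exclusion(events, space):
    """Right-hand side of the formula: alternating sum over all k-way intersections."""
    total = Fraction(0)
    for k in range(1, len(events) + 1):
        sign = (-1) ** (k - 1)  # +1 for odd k, -1 for even k
        for group in combinations(events, k):
            total += sign * prob(frozenset.intersection(*group), space)
    return total

# Hypothetical setup: one draw from the integers 1..30, all equally likely.
space = frozenset(range(1, 31))
E1 = frozenset(x for x in space if x % 2 == 0)
E2 = frozenset(x for x in space if x % 3 == 0)
E3 = frozenset(x for x in space if x % 5 == 0)

lhs = prob(E1 | E2 | E3, space)                 # probability of the union, computed directly
rhs = inclusion_exclusion([E1, E2, E3], space)  # the alternating sum
print(lhs, rhs)
```

Both sides come out to 11/15 here: the counts are 15 + 10 + 6 for the single events, minus 5 + 3 + 2 for the pairwise intersections, plus 1 for the triple intersection, giving 22 of the 30 outcomes.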

In words, to find the probability of a union of $n$ events, first sum their individual probabilities, then subtract the sum of the probabilities of all their pairwise intersections, then add back the sum of the probabilities of all their 3-way intersections, then subtract the 4-way intersections, and continue adding and subtracting $k$-way intersections until you finally stop with the probability of the $n$-way intersection.

Random variables notation. In order to describe a sample space, we frequently introduce a symbol $X$ called a random variable for the sample space. With this notation, we can replace the probability of an event, $P(E)$, by the notation $P(X \in E)$, which, by itself, doesn’t do much. But many events are built from the set operations of complement, union, and intersection, and with the random variable notation, we can replace those by logical operations for ‘not’, ‘or’, and ‘and’. For instance, the probability $P(E - F)$ can be written as $P(X \in E \text{ but } X \notin F)$.

Also, probabilities of finite events can be written in terms of equality. For instance, the probability of a singleton, $P(\{a\})$, can be written as $P(X = a)$, and that for a doubleton, $P(\{a, b\}) = P(X = a \text{ or } X = b)$.

One of the main purposes of the random variable notation is to handle two uses of the same sample space. For instance, if you have a fair die, the sample space is $S = \{1, 2, 3, 4, 5, 6\}$ where the probability of any singleton is $1/6$. If you have two fair dice, you can use two random variables, $X$ and $Y$, to refer to the two dice, but each has the same sample space. (Soon, we’ll look at the joint distribution of $(X, Y)$, which has a sample space defined on $S \times S$.)

Random variables and cumulative distribution functions. A sample space can have any set as its underlying set, but usually it is related to numbers. Often the sample space is the set of real numbers $\mathbf{R}$, and sometimes a power of the real numbers $\mathbf{R}^n$.

The most common sample space only has two elements, that is, there are only two outcomes. For instance, flipping a coin has two outcomes: Heads and Tails; many experiments have two outcomes: Success and Failure; and polls often have two outcomes: For and Against. Even though these outcomes aren’t numbers, it’s useful to replace them by numbers, namely 0 and 1, so that Heads, Success, and For are identified with 1, and Tails, Failure, and Against are identified with 0. Then the sample space can have $\mathbf{R}$ as its underlying set.

When the sample space does have $\mathbf{R}$ as its underlying set, the random variable $X$ is called a real random variable. With it, the probability of an interval like $[a, b]$, which is $P([a, b])$, can then be described as $P(a \le X \le b)$. Unions of intervals can also be described, for instance $P((-\infty, 3) \cup [4, 5])$ can be written as $P(X < 3 \text{ or } 4 \le X \le 5)$.

When the sample space is $\mathbf{R}$, the probability function $P$ is determined by a cumulative distribution function (c.d.f.) $F$ as follows. The function $F : \mathbf{R} \to \mathbf{R}$ is defined by

$$F(x) = P(X \le x) = P((-\infty, x]).$$

Then, from $F$, the probability of a half-open interval can be found as

$$P((a, b]) = F(b) - F(a).$$

Also, the probability of a singleton $\{b\}$ can be found as a limit from the left, that is, as $a$ increases to $b$,

$$P(\{b\}) = \lim_{a \to b^-} \bigl(F(b) - F(a)\bigr).$$
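To make the two formulas concrete, here is a small Python sketch for a fair six-sided die treated as a real random variable. The die, the dictionary `pmf`, and the function names are assumptions for this illustration. The limit is approximated by a fixed small step `eps`, which happens to be exact for this step-shaped c.d.f. but would only be an approximation in general.

```python
from fractions import Fraction

# Hypothetical example: X is the roll of a fair six-sided die, viewed as a real random variable.
pmf = {k: Fraction(1, 6) for k in range(1, 7)}

def F(x):
    """c.d.f. F(x) = P(X <= x)."""
    return sum((p for k, p in pmf.items() if k <= x), Fraction(0))

def prob_half_open(a, b):
    """P((a, b]) = F(b) - F(a)."""
    return F(b) - F(a)

def prob_singleton(b, eps=Fraction(1, 1000)):
    """Stand-in for the limit of F(b) - F(a) as a increases to b, using a = b - eps."""
    return F(b) - F(b - eps)

print(prob_half_open(2, 5))             # P(2 < X <= 5)
print(prob_singleton(3))                # P(X = 3)
print(prob_singleton(Fraction(5, 2)))   # P(X = 5/2)
```

The three printed values are 1/2, 1/6, and 0: the c.d.f. jumps by 1/6 at each face of the die and is flat everywhere else, so only the faces carry positive probability.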

From these, probabilities of unions of intervals can be computed. Sometimes, the c.d.f. is simply called the distribution, and the sample space is identified with this distribution.

Discrete distributions. Many sample distributions are determined entirely by the probabilities of their outcomes, that is, the probability of an event $E$ is

$$P(E) = \sum_{x \in E} P(X = x) = \sum_{x \in E} P(\{x\}).$$

The sum here, of course, is either a finite or countably infinite sum. Such a distribution is called a discrete distribution, and when there are only finitely many outcomes $x$ with nonzero probabilities, it is called a finite distribution.

A discrete distribution is usually described in terms of a probability mass function (p.m.f.) $f$ defined by

$$f(x) = P(X = x) = P(\{x\}).$$

This p.m.f. is enough to determine this distribution since, by the definition of a discrete distribution, the probability of an event $E$ is

$$P(E) = \sum_{x \in E} f(x).$$
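A short Python sketch of this idea follows, using a finite distribution that is not uniform. The choice of the sum of two fair dice as the random variable, and the names `f` and `prob`, are assumptions for illustration only.

```python
from fractions import Fraction
from itertools import product

# Hypothetical example: X is the sum of two fair dice, so its outcomes are 2..12.
# Build the p.m.f. f by giving each of the 36 ordered rolls weight 1/36.
f = {}
for a, b in product(range(1, 7), repeat=2):
    f[a + b] = f.get(a + b, Fraction(0)) + Fraction(1, 36)

def prob(event):
    """P(E) = sum of f(x) over x in E; values outside the support contribute 0."""
    return sum((f.get(x, Fraction(0)) for x in event), Fraction(0))

print(f[7])            # P(X = 7)
print(prob({7, 11}))   # P(X = 7 or X = 11)
print(prob(f.keys()))  # total probability over all outcomes
```

These print 1/6, 2/9, and 1. The last value is a quick sanity check that the p.m.f. sums to 1 over its outcomes.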

In many applications, a finite distribution is uniform, that is, the probabilities of its outcomes are all the same, $1/n$, where $n$ is the number of outcomes with nonzero probabilities. When that is the case, the field of combinatorics is useful in finding probabilities of events. Combinatorics includes various principles of counting such as the multiplication principle, permutations, and combinations.
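For a uniform finite distribution, the probability of an event is the number of outcomes in the event divided by $n$, so finding probabilities reduces to counting. The sketch below shows two such counts in Python using the standard library functions `math.comb` and `math.perm` (Python 3.8 or later). The card and dice scenarios are hypothetical examples, not taken from the notes.

```python
from fractions import Fraction
from math import comb, perm

# Hypothetical example 1: a 5-card hand from a standard 52-card deck, all hands equally likely.
n_hands = comb(52, 5)

# Combinations plus the multiplication principle:
# choose one of the 4 suits, then 5 of that suit's 13 cards.
n_same_suit = 4 * comb(13, 5)
p_same_suit = Fraction(n_same_suit, n_hands)

# Hypothetical example 2: three rolls of a fair die, all 6**3 ordered results equally likely.
# Permutations: ordered choices of 3 distinct faces out of 6.
p_all_different = Fraction(perm(6, 3), 6 ** 3)

print(n_hands, p_same_suit, p_all_different)
```

This prints `2598960 33/16660 5/9`. In both cases the denominator is the total number of equally likely outcomes, and the numerator counts the outcomes in the event.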