





Probability Axioms and preliminaries for Markov Chains
Typology: Lecture notes






Probability Spaces
Consider a set or ensemble of items and a class of subsets chosen from the initial collection. The
ensemble could be a collection of identical experiments, and the class of subsets could be various
subcollections of the experiments.
Definition 1: Let $S$ be a set and $\mathcal{F}$ a collection of subsets of $S$ satisfying
(i) if $A \in \mathcal{F}$, then $A^c \in \mathcal{F}$,
(ii) if $A, B \in \mathcal{F}$, then $A \cup B \in \mathcal{F}$.
The collection $\mathcal{F}$ is called an algebra of sets or a field of sets, depending on the author. If $\mathcal{F}$ has the additional property
(iii) if $A_i \in \mathcal{F}$ for $i = 1, 2, \ldots$, then $\bigcup_{i=1}^{\infty} A_i \in \mathcal{F}$,
then $\mathcal{F}$ is called a $\sigma$-algebra or $\sigma$-field.
Another term encountered for $\mathcal{F}$ is Borel field. Property (i) states that the collection $\mathcal{F}$ is closed under the operation of set complement, and property (ii) is closure of $\mathcal{F}$ under set union. Property (iii) extends the closure of property (ii) to countable unions of sets. Additional properties of $\mathcal{F}$ may be derived from DeMorgan's Laws.
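Lemma 1: (a) if $A, B \in \mathcal{F}$, then $A \cap B \in \mathcal{F}$,
(b) if $A_i \in \mathcal{F}$ for $i = 1, 2, \ldots$, then $\bigcap_{i=1}^{\infty} A_i \in \mathcal{F}$,
(c) $\emptyset \in \mathcal{F}$,
(d) $S \in \mathcal{F}$.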
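Proof: (a) DeMorgan's Laws, which are easily proven, state that for two sets $A$ and $B$,
$$(A \cup B)^c = A^c \cap B^c \quad \text{and} \quad (A \cap B)^c = A^c \cup B^c.$$
If $A, B \in \mathcal{F}$, then $A^c, B^c \in \mathcal{F}$ from (i), and (ii) gives $A^c \cup B^c \in \mathcal{F}$. The second of the DeMorgan Laws yields $(A \cap B)^c \in \mathcal{F}$, and use of property (i) obtains (a).
(b) For a countable collection of sets, DeMorgan's Laws become
$$\Big( \bigcup_{i=1}^{\infty} A_i \Big)^c = \bigcap_{i=1}^{\infty} A_i^c \quad \text{and} \quad \Big( \bigcap_{i=1}^{\infty} A_i \Big)^c = \bigcup_{i=1}^{\infty} A_i^c.$$
Hence, if $A_i \in \mathcal{F}$ for $i = 1, 2, \ldots$, then $A_i^c \in \mathcal{F}$ for $i = 1, 2, \ldots$, and it follows from (iii) that
$$\Big( \bigcap_{i=1}^{\infty} A_i \Big)^c = \bigcup_{i=1}^{\infty} A_i^c \in \mathcal{F},$$
which yields (b) by complementing and invoking (i).
(c) The fact that $\emptyset \in \mathcal{F}$ is immediate from (a) since $\emptyset = A \cap A^c$, and (d) is equivalent to (c) since $\emptyset$ and $S$ are complementary sets.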
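The closure properties can be checked mechanically on a small finite example. The following is a minimal sketch, assuming a four-element set and two hand-picked collections; the helper name `is_algebra` is illustrative and not part of the notes.

```python
from itertools import combinations

def is_algebra(S, F):
    """Check properties (i) and (ii) of Definition 1 for a finite collection F."""
    S = frozenset(S)
    F = {frozenset(A) for A in F}
    closed_complement = all(S - A in F for A in F)
    closed_union = all(A | B in F for A, B in combinations(F, 2))
    return closed_complement and closed_union

S = {1, 2, 3, 4}
F = [set(), {1, 2}, {3, 4}, S]   # an algebra on S
G = [set(), {1}, S]              # not an algebra: the complement {2, 3, 4} is missing

print(is_algebra(S, F))
print(is_algebra(S, G))

# Lemma 1(a): an algebra is also closed under intersection.
F_sets = {frozenset(A) for A in F}
closed_intersection = all(A & B in F_sets for A in F_sets for B in F_sets)
print(closed_intersection)
```

For a finite collection, closure under countable unions reduces to closure under finite unions, so a finite algebra is automatically a $\sigma$-algebra.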
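Next consider a set function which maps the class of subsets $\mathcal{F}$ of $S$ to the real numbers.
Definition 2: Let $P : \mathcal{F} \to \mathbb{R}$ be a set function satisfying
(i) if $A \in \mathcal{F}$, then $P(A) \ge 0$,
(ii) if $A, B \in \mathcal{F}$ such that $A \cap B = \emptyset$, then $P(A \cup B) = P(A) + P(B)$.
The function $P$ is called a measure. If $P$ satisfies the additional property
(iii) $P(S) = 1$,
then $P$ is called a probability measure or probability. A measure $P$ which has the property
(iv) if $A_i \in \mathcal{F}$ for $i = 1, 2, \ldots$ and $A_i \cap A_j = \emptyset$ for $i \ne j$, then
$$P\Big( \bigcup_{i=1}^{\infty} A_i \Big) = \sum_{i=1}^{\infty} P(A_i),$$
is said to be countably additive.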
The collection $(S, \mathcal{F}, P)$ is called a measure space, or a probability space in the case that $P$ is a probability. The sets in $\mathcal{F}$ are called measurable, with $\mathcal{F}$ being the collection of measurable sets. When the triple $(S, \mathcal{F}, P)$ is a probability space, the triple is sometimes called an experiment, the elements of $S$ are called outcomes, and the sets $A \in \mathcal{F}$ are called events. The real number $P(A)$ for $A \in \mathcal{F}$ is called the probability of the event $A$.
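As a concrete illustration, a fair six-sided die gives a finite probability space on which the measure properties can be verified exhaustively. This is a sketch under the assumption that every subset is an event and all outcomes are equally likely; the names `powerset`, `P`, and `even` are illustrative.

```python
from fractions import Fraction
from itertools import chain, combinations

S = frozenset(range(1, 7))  # outcomes of one roll of a fair die

def powerset(s):
    """All subsets of s, used here as the collection of events."""
    items = list(s)
    subsets = chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1))
    return [frozenset(c) for c in subsets]

F = powerset(S)

def P(A):
    """Equally likely outcomes: probability is the fraction of outcomes in A."""
    return Fraction(len(A), len(S))

# Properties (i), (ii), and (iii) of a probability measure.
nonnegative = all(P(A) >= 0 for A in F)
additive = all(P(A | B) == P(A) + P(B)
               for A in F for B in F if not (A & B))
normalized = P(S) == 1

even = frozenset({2, 4, 6})
print(nonnegative, additive, normalized, P(even))
```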
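Lemma 2: (a) If $A, B \in \mathcal{F}$ and $A \subset B$, then $P(A) \le P(B)$,
(b) if $A_i \in \mathcal{F}$ for $i = 1, 2, \ldots$, and $A_i \subset A_{i+1}$, then
$$P\Big( \bigcup_{i=1}^{\infty} A_i \Big) = \lim_{n \to \infty} P(A_n).$$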
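Proof: (a) Observe that $B = A \cup (B - A)$, where $B - A = B \cap A^c$, and that $A \cap (B - A) = \emptyset$. Hence, by (ii) it follows that $P(B) = P(A) + P(B - A) \ge P(A)$ since $P(B - A) \ge 0$ by (i).
(b) Note that $P(A_i) \le 1$ for all $i$ by (a) and (iii). Let $B_i = A_{i+1} - A_i$ and $A = \bigcup_{i=1}^{\infty} A_i$. Observe that $B_i \cap B_j = \emptyset$ for $i \ne j$ and that $B_i \cap A_1 = \emptyset$ for all $i$. Since
$$A = A_1 \cup \bigcup_{i=1}^{\infty} B_i,$$
it follows from (iv) that
$$P(A) = P(A_1) + \sum_{i=1}^{\infty} P(B_i).$$
However, $A_{i+1} = B_i \cup A_i$ and $B_i \cap A_i = \emptyset$ give $P(A_{i+1}) = P(B_i) + P(A_i)$ by (ii), so
$$P(A) = P(A_1) + \lim_{n \to \infty} \sum_{i=1}^{n} \big[ P(A_{i+1}) - P(A_i) \big] = \lim_{n \to \infty} P(A_{n+1}),$$
and (b) follows.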
Further relations deal with dependency between two events.
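Definition 3: For $A, B \in \mathcal{F}$, the conditional probability of $A$ given $B$ is
$$P(A \mid B) = \frac{P(A \cap B)}{P(B)}, \quad \text{where } P(B) > 0.$$
Definition 4: Events $A, B \in \mathcal{F}$ are independent if $P(A \cap B) = P(A) P(B)$, in which case $P(A \mid B) = P(A)$.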
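A short worked example may help fix the two definitions. The sketch below assumes two fair dice with equally likely outcomes; the events `A`, `B`, `C` and the helper `cond` are chosen purely for illustration.

```python
from fractions import Fraction
from itertools import product

S = set(product(range(1, 7), repeat=2))  # ordered pairs (first die, second die)

def P(event):
    return Fraction(len(event), len(S))

def cond(A, B):
    """Conditional probability P(A | B), defined only when P(B) > 0."""
    if P(B) == 0:
        raise ValueError("conditioning event has probability zero")
    return P(A & B) / P(B)

A = {s for s in S if s[0] == 6}            # first die shows 6
B = {s for s in S if s[0] + s[1] == 7}     # the sum is 7
C = {s for s in S if s[0] + s[1] >= 10}    # the sum is at least 10

# A and B are independent: P(A and B) = P(A) P(B), so P(A | B) = P(A).
print(P(A & B) == P(A) * P(B), cond(A, B))

# A and C are dependent: knowing the sum is large makes a 6 more likely.
print(P(A & C) == P(A) * P(C), cond(A, C))
```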
Random Variables, Distributions, Densities
Given a probability space $(S, \mathcal{F}, P)$, a random variable is simply a mapping from the set of outcomes to the real numbers.
A random variable and its distribution have been defined in a general context with no distinction
for the range of the variable or the differentiability of the distribution.
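For reference, and as an assumption since the definitions themselves are not reproduced above, the standard forms of the distribution function and of the conditional distribution given an event $A$, consistent with the notation $F_X$ used in the rest of these notes, would be:

```latex
F_X(x) = P(X \le x) = P\big(\{ s \in S : X(s) \le x \}\big),
\qquad
F_X(x \mid A) = P(X \le x \mid A) = \frac{P(\{X \le x\} \cap A)}{P(A)},
\quad P(A) > 0 .
```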
Definition 7: If a random variable $X : S \to \mathbb{R}$ can take at most countably many values, then the variable is said to be a discrete random variable.
For a discrete random variable, the distribution function is a step function with jumps at the points where the variable takes its values. These jumps are the probabilities of the points and are given by $p(x_i) = F_X(x_i) - F_X(x_i^-)$, where the latter term indicates the limit from the left. The alternative to a discrete variable is one that can take on a continuum of values.
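The relation between jumps and point probabilities can be seen on a small example. This sketch assumes a hypothetical discrete variable taking the values 0, 1, and 3, and assumes the usual convention that the distribution function is $P(X \le x)$; the names `pmf`, `cdf`, and `cdf_left` are illustrative.

```python
from fractions import Fraction

# Hypothetical probability mass on three points.
pmf = {0: Fraction(1, 4), 1: Fraction(1, 2), 3: Fraction(1, 4)}

def cdf(x):
    """Distribution function P(X <= x): a right-continuous step function."""
    return sum(p for v, p in pmf.items() if v <= x)

def cdf_left(x):
    """Limit from the left, P(X < x)."""
    return sum(p for v, p in pmf.items() if v < x)

# The jump at each point equals the probability of that point.
jumps = {v: cdf(v) - cdf_left(v) for v in pmf}
print(jumps == pmf)

# Between the points the distribution function is flat.
print(cdf(2), cdf(2.9))
```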
Definition 8: A random variable $X$ with distribution function $F_X$ that is differentiable except at countably many or fewer points is a continuous random variable.
Definition 9: The density function of a continuous random variable $X$ is the derivative of the distribution function, $f_X(x) = dF_X/dx$, at those points where the derivative exists.
Observe that the distribution function of a continuous random variable may be written as
$$F_X(x) = \int_{-\infty}^{x} f_X(\xi)\, d\xi$$
and that the conditional distribution yields the conditional density
$$f_X(x \mid A) = \frac{dF_X(x \mid A)}{dx}.$$
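The derivative and integral relations between a distribution and its density can be checked numerically. The sketch below assumes an exponential variable with rate 1 as the example, which is not taken from the notes, and uses a central difference and the trapezoid rule as simple stand-ins for exact calculus.

```python
import math

def F(x):
    """Distribution function of an exponential variable with rate 1 (assumed example)."""
    return 1.0 - math.exp(-x) if x >= 0 else 0.0

def f(x):
    """Its density, the derivative of F wherever the derivative exists."""
    return math.exp(-x) if x >= 0 else 0.0

def derivative(G, x, h=1e-5):
    """Central-difference approximation to dG/dx."""
    return (G(x + h) - G(x - h)) / (2 * h)

def integrate(g, a, b, n=10_000):
    """Trapezoid-rule approximation to the integral of g over [a, b]."""
    h = (b - a) / n
    interior = sum(g(a + i * h) for i in range(1, n))
    return h * (0.5 * g(a) + interior + 0.5 * g(b))

x = 1.0
print(derivative(F, x), f(x))      # density as derivative of the distribution
print(integrate(f, 0.0, x), F(x))  # distribution as integral of the density
```

The lower limit of integration is 0 rather than $-\infty$ because this particular density vanishes for negative arguments.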
Integration, Expectation, Moments
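A characteristic function of a set $A$ is the function
$$\chi_A(s) = \begin{cases} 0 & \text{for } s \notin A \\ 1 & \text{for } s \in A, \end{cases}$$
and a simple function is a finite linear combination of characteristic functions of measurable sets, i.e.,
$$\varphi = \sum_{i=1}^{n} c_i \chi_{A_i},$$
where the $A_i$ are measurable sets in a probability space $(S, \mathcal{F}, P)$. The integral of $\varphi$ over a set of finite measure $A$ is given by
$$\int_A \varphi \, dP = \sum_{i=1}^{n} c_i P(A \cap A_i),$$
and the integral of a measurable function or random variable $X$ is defined by
$$\int_A X \, dP = \sup_{\varphi \le X} \int_A \varphi \, dP = \inf_{\psi \ge X} \int_A \psi \, dP,$$
where $\varphi$ and $\psi$ range over simple functions.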
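On a finite probability space the integral of a simple function can be computed directly from the formula and compared with an outcome-by-outcome sum. This is a sketch assuming a fair die and arbitrarily chosen coefficients and sets; `terms`, `phi`, and `integral` are illustrative names.

```python
from fractions import Fraction

S = set(range(1, 7))  # fair die, equally likely outcomes

def P(A):
    return Fraction(len(A), len(S))

# A simple function: sum of c_i times the characteristic function of A_i.
terms = [(2, {1, 2, 3}), (5, {3, 4}), (-1, {6})]

def phi(s):
    """Value of the simple function at outcome s."""
    return sum(c for c, A_i in terms if s in A_i)

def integral(A):
    """Integral of the simple function over A: sum of c_i * P(A intersect A_i)."""
    return sum(c * P(A & A_i) for c, A_i in terms)

A = {2, 3, 4, 6}
direct = sum(phi(s) * P({s}) for s in A)  # outcome-by-outcome sum

print(integral(A), direct)
```

The sets $A_i$ are allowed to overlap, as `{1, 2, 3}` and `{3, 4}` do here; the formula still agrees with the direct sum.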
Definition 10: If $A$ is the entire space $S$, then the expectation or mean of $X$ is
$$E[X] = \int_S X \, dP = \int X \, dF_X,$$
with the latter expression indicating the Lebesgue-Stieltjes integral with respect to the distribution.
This definition may be expanded to include expressions of the form
$$E[g(X)] = \int_S g(X) \, dP = \int g(X) \, dF_X$$
for a general class of functions having sufficiently well-behaved properties. Under fairly general conditions the Lebesgue-Stieltjes integral reduces to a Riemann-Stieltjes integral and the expectation becomes
$$E[g(X)] = \int_S g(X) \, dP = \int_{-\infty}^{\infty} g(x) \, dF_X(x).$$
Assuming that $X$ has a density given by the derivative of its distribution function, the expectations are
$$E[X] = \int_{-\infty}^{\infty} x \, dF_X(x) = \int_{-\infty}^{\infty} x f_X(x) \, dx$$
and
$$E[g(X)] = \int_{-\infty}^{\infty} g(x) f_X(x) \, dx,$$
where $f_X(x)$ is the density of $X$. Of course, for a discrete random variable the integrals become summations and the density is replaced by a probability mass.
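Definition 11: The $n$th moment of a random variable $X$ is
$$E[X^n] = \int_{-\infty}^{\infty} x^n f_X(x) \, dx,$$
with the obvious changes for a discrete variable. The first moment is simply the mean $\mu_X = E[X]$. The $n$th central moment of $X$ is the moment about the mean,
$$E[(X - \mu_X)^n] = \int_{-\infty}^{\infty} (x - \mu_X)^n f_X(x) \, dx,$$
and the second central moment is the variance $\sigma_X^2$, with its square root the standard deviation $\sigma_X$.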
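For a discrete variable the integrals become sums over the probability mass. The following sketch computes moments of a fair die exactly; the die is an assumed example and the function names are illustrative.

```python
from fractions import Fraction
import math

pmf = {x: Fraction(1, 6) for x in range(1, 7)}  # fair die

def moment(n):
    """n-th moment E[X^n] as a sum over the probability mass."""
    return sum(x ** n * p for x, p in pmf.items())

def central_moment(n):
    """n-th central moment E[(X - mean)^n]."""
    mu = moment(1)
    return sum((x - mu) ** n * p for x, p in pmf.items())

mean = moment(1)
variance = central_moment(2)
std_dev = math.sqrt(variance)

print(mean, variance, std_dev)

# The variance also equals the second moment minus the squared mean.
print(variance == moment(2) - mean ** 2)
```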
Sequences, Convergence, and Limits
A number of important results regarding bounds and limiting processes are very useful and require
recognition. Elaboration and proof can be found in any number of references.
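Markov Inequality: If $X$ is a nonnegative random variable, then for any $a > 0$,
$$P(X \ge a) \le \frac{E[X]}{a}.$$
Proof: Note that
$$E[X] = \int_0^{\infty} x f_X(x) \, dx = \int_0^{a} x f_X(x) \, dx + \int_a^{\infty} x f_X(x) \, dx \ge \int_a^{\infty} x f_X(x) \, dx$$
$$\ge \int_a^{\infty} a f_X(x) \, dx = a \int_a^{\infty} f_X(x) \, dx = a P(X \ge a).$$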
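Chebyshev Inequality: If $X$ is a random variable with mean $\mu$ and variance $\sigma^2$, then for any $k > 0$,
$$P(|X - \mu| \ge k) \le \frac{\sigma^2}{k^2}.$$
Proof: From the preceding result applied to the nonnegative variable $(X - \mu)^2$,
$$P\big( (X - \mu)^2 \ge k^2 \big) \le \frac{E[(X - \mu)^2]}{k^2} = \frac{\sigma^2}{k^2}.$$
But $(X - \mu)^2 \ge k^2$ if and only if $|X - \mu| \ge k$.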
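Both inequalities can be checked exactly on a small discrete example. This sketch assumes a fair die, for which sums replace the integrals in the proofs; `markov_holds` and `chebyshev_holds` are illustrative helpers, not part of the notes.

```python
from fractions import Fraction

pmf = {x: Fraction(1, 6) for x in range(1, 7)}  # fair die, a nonnegative variable
mean = sum(x * p for x, p in pmf.items())
var = sum((x - mean) ** 2 * p for x, p in pmf.items())

def prob(pred):
    """Probability that the outcome satisfies the predicate."""
    return sum(p for x, p in pmf.items() if pred(x))

def markov_holds(a):
    """Markov: P(X >= a) <= E[X] / a for a > 0."""
    return prob(lambda x: x >= a) <= mean / a

def chebyshev_holds(k):
    """Chebyshev: P(|X - mean| >= k) <= variance / k^2 for k > 0."""
    return prob(lambda x: abs(x - mean) >= k) <= var / k ** 2

print(all(markov_holds(a) for a in range(1, 8)))
print(all(chebyshev_holds(k) for k in (1, 2, Fraction(5, 2), 3)))

# The bounds are generally not tight: compare both sides for k = 2.
print(prob(lambda x: abs(x - mean) >= 2), var / 4)
```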
Law of Large Numbers: If $X_1, X_2, \ldots$ is a sequence of independent and identically distributed random variables with $E[X_i] = \mu$ and variance $\sigma^2$, then
$$\lim_{n \to \infty} \frac{X_1 + X_2 + \cdots + X_n}{n} = \mu$$
with probability 1.
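A simulation gives a feel for this statement, although it cannot prove it. The sketch below assumes fair-die rolls with mean 3.5 and a fixed seed for reproducibility; `sample_mean` is an illustrative helper.

```python
import random

def sample_mean(n, seed=0):
    """Average of n simulated fair-die rolls, using a seeded generator."""
    rng = random.Random(seed)
    total = sum(rng.randint(1, 6) for _ in range(n))
    return total / n

# The sample mean settles near the true mean 3.5 as n grows.
for n in (10, 1_000, 100_000):
    print(n, sample_mean(n))
```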
[The equations and accompanying text at this point, which precede the final form of convergence discussed next, are not legible in the source.]
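The final form of convergence is the weakest but deals directly with the distribution function. A sequence of random variables $X_n$ converges in distribution to $X$ if
$$\lim_{n \to \infty} F_{X_n}(x) = F_X(x)$$
for every $x \in \mathbb{R}$ at which $F_X$ is continuous.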
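Another lemma shows that convergence in probability yields convergence in distribution.
Lemma: If $X_n$ converges to $X$ in probability, then $X_n$ converges to $X$ in distribution.
Proof: Let $F_{X_n}$ and $F_X$ be the distribution functions of $X_n$ and $X$, let $\epsilon > 0$, and note that
$$\{X_n \le x\} = \{X_n \le x,\ X \le x + \epsilon\} \cup \{X_n \le x,\ X > x + \epsilon\}.$$
Next observe that
$$\{X_n \le x,\ X \le x + \epsilon\} \subset \{X \le x + \epsilon\} \quad \text{and} \quad \{X_n \le x,\ X > x + \epsilon\} \subset \{|X_n - X| > \epsilon\},$$
so that
$$F_{X_n}(x) \le F_X(x + \epsilon) + P(|X_n - X| > \epsilon).$$
A similar argument obtains
$$F_X(x - \epsilon) \le F_{X_n}(x) + P(|X_n - X| > \epsilon),$$
so that
$$F_X(x - \epsilon) - P(|X_n - X| > \epsilon) \le F_{X_n}(x) \le F_X(x + \epsilon) + P(|X_n - X| > \epsilon).$$
Since the sequence converges in probability, $\lim_{n \to \infty} P(|X_n - X| > \epsilon) = 0$ and
$$F_X(x - \epsilon) \le \liminf_{n \to \infty} F_{X_n}(x) \le \limsup_{n \to \infty} F_{X_n}(x) \le F_X(x + \epsilon)$$
for all $\epsilon > 0$. Therefore, letting $\epsilon \to 0$ at a point $x$ where $F_X$ is continuous, $\lim_{n \to \infty} F_{X_n}(x) = F_X(x)$.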
Jointly Distributed Variables
Given random variables $X$ and $Y$ on a probability space $(S, \mathcal{F}, P)$, one may ask about the joint behavior of the two variables and the connection between them, beyond their individual properties.
Definition 12: The joint distribution of $X$ and $Y$ on the space $(S, \mathcal{F}, P)$ is the function
$$F_{XY}(x, y) = P(X \le x,\ Y \le y).$$
Observe that the individual or marginal distributions of each variable are given by
$$F_X(x) = \lim_{y \to \infty} F_{XY}(x, y) \quad \text{and} \quad F_Y(y) = \lim_{x \to \infty} F_{XY}(x, y),$$
and if the distribution has second order derivatives, then the joint density function is
$$f_{XY}(x, y) = \frac{\partial^2 F_{XY}(x, y)}{\partial x \, \partial y},$$
from which the marginal densities may be obtained as
$$f_X(x) = \int_{-\infty}^{\infty} f_{XY}(x, y) \, dy \quad \text{and} \quad f_Y(y) = \int_{-\infty}^{\infty} f_{XY}(x, y) \, dx.$$
Recalling the definition of independent events, it follows immediately that random variables $X$ and $Y$ are independent if $F_{XY}(x, y) = F_X(x) F_Y(y)$, and this yields the same expression for the densities of independent variables, $f_{XY}(x, y) = f_X(x) f_Y(y)$.
Definition 13: The random variables $X$ and $Y$ are uncorrelated if $E[XY] = E[X] E[Y]$ and are called orthogonal if $E[XY] = 0$.
Note that if two random variables are independent then they are uncorrelated.
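The converse of this statement does not hold in general, and a small discrete example makes the distinction concrete. The sketch below assumes $X$ uniform on $\{-1, 0, 1\}$ and $Y = X^2$, an example chosen for illustration and not taken from the notes.

```python
from fractions import Fraction

# X is uniform on {-1, 0, 1} and Y = X^2, so Y is completely determined by X.
pX = {x: Fraction(1, 3) for x in (-1, 0, 1)}
joint = {(x, x * x): p for x, p in pX.items()}  # joint probability mass of (X, Y)

def E(g):
    """Expectation of g(X, Y) under the joint probability mass."""
    return sum(g(x, y) * p for (x, y), p in joint.items())

EX = E(lambda x, y: x)
EY = E(lambda x, y: y)
EXY = E(lambda x, y: x * y)

# Uncorrelated: E[XY] = E[X] E[Y] (both sides are zero here).
print(EXY == EX * EY)

# Not independent: the joint probability does not factor.
p_x0 = sum(p for (x, y), p in joint.items() if x == 0)
p_y0 = sum(p for (x, y), p in joint.items() if y == 0)
p_x0_y0 = joint.get((0, 0), 0)
print(p_x0_y0, p_x0 * p_y0)
```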
Definition 14: The covariance of two random variables $X$ and $Y$ with means $\mu_X$ and $\mu_Y$ is
$$\mathrm{Cov}(X, Y) = E[(X - \mu_X)(Y - \mu_Y)].$$
Stochastic Processes
Consider a random variable that evolves in time. From one perspective this could be regarded as a
collection of random variables that are indexed by time. However, the index need not be time.
Definition 15: Given a probability space $(S, \mathcal{F}, P)$, a stochastic process is a collection of random variables $\{X(t),\ t \in T\}$ with $T$ the index set of the process. Note that the dependence on $s \in S$ is understood, and the index is often written as a subscript, $X_t$. If the index set $T$ is finite or countable, the process is said to be discrete, and if $T$ is an interval of real numbers, the process is said to be continuous, with the index commonly referred to as time. The range of the process is a subset of the real numbers and is called the state space of the process.
Since a real-valued stochastic process $X(t)$ is a collection of random variables, information about the process is contained in the joint distributions or densities of the variables in the process. The $n$th order distribution of the process is
$$F(x_1, \ldots, x_n; t_1, \ldots, t_n) = P\big( X(t_1) \le x_1, \ldots, X(t_n) \le x_n \big).$$
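For a simple discrete process these distributions can be computed by enumerating sample paths. The sketch below assumes a symmetric random walk started at 0 with equally likely steps of $\pm 1$, an example chosen for illustration; `joint_cdf` is a hypothetical helper.

```python
from fractions import Fraction
from itertools import accumulate, product

def joint_cdf(xs, ts, n_steps):
    """P(X(t_1) <= x_1, ..., X(t_n) <= x_n) for a symmetric random walk,
    computed by enumerating all equally likely step sequences."""
    paths = list(product((-1, 1), repeat=n_steps))
    hits = 0
    for steps in paths:
        X = [0] + list(accumulate(steps))  # X[t] is the position at time t
        if all(X[t] <= x for x, t in zip(xs, ts)):
            hits += 1
    return Fraction(hits, len(paths))

# First-order distribution at time 2, and second-order at times 2 and 4.
first_order = joint_cdf([0], [2], n_steps=4)
second_order = joint_cdf([0, 0], [2, 4], n_steps=4)
print(first_order, second_order)
```

The second-order value is not the product of the two first-order values, which reflects the dependence between the positions of the walk at different times.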