Probability Theory: Foundations and Concepts, Lecture notes of Probability and Stochastic Processes

Probability Axioms and preliminaries for Markov Chains

Typology: Lecture notes

2017/2018

Uploaded on 04/21/2018

mathiskool

Probability Spaces

Consider a set or ensemble of items and a class of subsets chosen from the initial collection. The ensemble could be a collection of identical experiments, and the class of subsets could be various subcollections of the experiments.

Definition 1: Let $S$ be a set and $\mathcal{F}$ a collection of subsets of $S$ satisfying

(i) if $A \in \mathcal{F}$, then $A^c \in \mathcal{F}$,

(ii) if $A, B \in \mathcal{F}$, then $A \cup B \in \mathcal{F}$.

The collection $\mathcal{F}$ is called an algebra of sets or a field of sets, depending on the particular author. If $\mathcal{F}$ has the additional property

(iii) if $A_i \in \mathcal{F}$ for $i = 1, 2, \ldots$, then $\bigcup_{i=1}^{\infty} A_i \in \mathcal{F}$,

then $\mathcal{F}$ is called a σ-algebra or σ-field of sets.

Another term encountered for a σ-field $\mathcal{F}$ is a Borel field. Property (i) states that the collection $\mathcal{F}$ is closed under the operation of set complement, and property (ii) is closure of $\mathcal{F}$ under set union. Property (iii) extends the closure of property (ii) to countable unions of sets. Additional properties of $\mathcal{F}$ may be derived from De Morgan's Laws.
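For a finite set $S$, the closure properties of Definition 1 can be checked mechanically. The sketch below assumes a finite $S$ with sets represented as Python frozensets; the helper names `power_set` and `is_algebra` are hypothetical. Because any countable union of subsets of a finite $S$ is really a finite union, an algebra on a finite set is automatically a σ-algebra, so property (iii) needs no separate check here.

```python
from itertools import combinations


def power_set(S):
    """All subsets of the finite set S, as frozensets."""
    items = list(S)
    return {frozenset(c) for r in range(len(items) + 1) for c in combinations(items, r)}


def is_algebra(S, F):
    """Check properties (i) and (ii) of Definition 1 for a finite collection F of subsets of S."""
    S = frozenset(S)
    F = {frozenset(A) for A in F}
    if not all(A <= S for A in F):
        return False  # every member must be a subset of S
    closed_under_complement = all(S - A in F for A in F)        # property (i)
    closed_under_union = all(A | B in F for A in F for B in F)  # property (ii)
    return closed_under_complement and closed_under_union


S = {1, 2, 3}
print(is_algebra(S, power_set(S)))                      # True: the power set
print(is_algebra(S, [set(), {1}, {2, 3}, {1, 2, 3}]))  # True: generated by the partition {1}, {2, 3}
print(is_algebra(S, [set(), {1}, {1, 2, 3}]))          # False: {2, 3}, the complement of {1}, is missing
```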

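Lemma 1: (a) if $A, B \in \mathcal{F}$, then $A \cap B \in \mathcal{F}$,

(b) if $\mathcal{F}$ is a σ-field and $A_i \in \mathcal{F}$ for $i = 1, 2, \ldots$, then $\bigcap_{i=1}^{\infty} A_i \in \mathcal{F}$,

(c) if $\mathcal{F}$ is nonempty, then $\emptyset \in \mathcal{F}$,

(d) if $\mathcal{F}$ is nonempty, then $S \in \mathcal{F}$.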

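Proof: (a) De Morgan's Laws, which are easily proven, state that for two sets $A$ and $B$, $(A \cup B)^c = A^c \cap B^c$ and $(A \cap B)^c = A^c \cup B^c$. If $A, B \in \mathcal{F}$, then $A^c, B^c \in \mathcal{F}$ from (i), and (ii) gives $A^c \cup B^c \in \mathcal{F}$. The second of De Morgan's Laws yields $(A \cap B)^c \in \mathcal{F}$, and use of property (i) obtains (a).

(b) For a countable collection of sets, De Morgan's Laws become

$$\Big(\bigcup_{i=1}^{\infty} A_i\Big)^c = \bigcap_{i=1}^{\infty} A_i^c \quad\text{and}\quad \Big(\bigcap_{i=1}^{\infty} A_i\Big)^c = \bigcup_{i=1}^{\infty} A_i^c.$$

Hence, if $A_i \in \mathcal{F}$ for $i = 1, 2, \ldots$, then $A_i^c \in \mathcal{F}$ for $i = 1, 2, \ldots$ by (i), and it follows from (iii) that $\big(\bigcap_{i=1}^{\infty} A_i\big)^c = \bigcup_{i=1}^{\infty} A_i^c \in \mathcal{F}$, which yields (b) by complementing and invoking (i).

(c) Choosing any $A \in \mathcal{F}$, the fact that $\emptyset \in \mathcal{F}$ is immediate from (i) and (a) since $\emptyset = A \cap A^c$, and (d) is equivalent to (c) since $\emptyset$ and $S$ are complementary sets.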
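Lemma 1 can be spot-checked on a concrete algebra. A minimal sketch, assuming the finite set $S = \{1, \ldots, 6\}$ and the algebra of all unions of blocks of the partition $\{1,2\}, \{3\}, \{4,5,6\}$ (an example chosen here, not taken from the notes); `algebra_from_partition` is a hypothetical helper. On a finite set, part (b) reduces to repeated use of part (a).

```python
from itertools import combinations


def algebra_from_partition(blocks):
    """All unions of blocks of a finite partition, including the empty union."""
    blocks = [frozenset(b) for b in blocks]
    return {frozenset().union(*chosen)
            for r in range(len(blocks) + 1)
            for chosen in combinations(blocks, r)}


S = frozenset(range(1, 7))
F = algebra_from_partition([{1, 2}, {3}, {4, 5, 6}])

lemma_a = all(A & B in F for A in F for B in F)  # (a) closure under intersection
lemma_c = frozenset() in F                        # (c) the empty set is in F
lemma_d = S in F                                  # (d) S is in F
de_morgan = all(S - (A | B) == (S - A) & (S - B) and S - (A & B) == (S - A) | (S - B)
                for A in F for B in F)            # both laws used in the proof
print(len(F), lemma_a, lemma_c, lemma_d, de_morgan)  # 8 True True True True
```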

Next consider a set function which maps the class of subsets $\mathcal{F}$ of $S$ to the real numbers.

Definition 2: Let $P$ be a function from $\mathcal{F}$ to the real numbers $\mathbb{R}$ such that

(i) if $A \in \mathcal{F}$, then $P(A) \ge 0$,

(ii) if $A, B \in \mathcal{F}$ such that $A \cap B = \emptyset$, then $P(A \cup B) = P(A) + P(B)$.

The function $P$ is called a measure. In addition, if $P$ satisfies the property

(iii) $P(S) = 1$,

then $P$ is called a probability measure or probability. A measure $P$ which has the property

(iv) if $A_i \in \mathcal{F}$ for $i = 1, 2, \ldots$ and $A_i \cap A_j = \emptyset$ for $i \ne j$, then $P\big(\bigcup_{i=1}^{\infty} A_i\big) = \sum_{i=1}^{\infty} P(A_i)$

is called a σ-additive measure, or a σ-additive probability in the case with $P$ a probability.

The triple $(S, \mathcal{F}, P)$ is called a measure space, or a probability space in the case with $P$ a probability. The sets in $\mathcal{F}$ are called measurable, with $\mathcal{F}$ being the collection of measurable sets. When the triple $(S, \mathcal{F}, P)$ is a probability space, it is sometimes called an experiment, the elements of $S$ are called outcomes, and the sets $A \in \mathcal{F}$ are called events. The real number $P(A)$ for $A \in \mathcal{F}$ is called the probability of the event $A$.
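A finite probability space makes Definition 2 concrete. The sketch below assumes a fair die, $S = \{1, \ldots, 6\}$ with every subset an event and each outcome weighted $1/6$; `make_probability` is a hypothetical helper, and exact rational arithmetic (`fractions.Fraction`) is used so that additivity can be tested with `==`.

```python
from fractions import Fraction


def make_probability(weights):
    """Build P(A) = sum of outcome weights for a finite S whose every subset is an event."""
    if sum(weights.values()) != 1:
        raise ValueError("property (iii) requires P(S) = 1")
    if any(w < 0 for w in weights.values()):
        raise ValueError("property (i) requires nonnegative weights")

    def P(A):
        return sum((weights[s] for s in A), Fraction(0))

    return P


die = {k: Fraction(1, 6) for k in range(1, 7)}  # assumed example: a fair die
P = make_probability(die)
S = set(die)
A, B = {2, 4, 6}, {1, 3}  # two disjoint events
print(P(S), P(A), P(A | B) == P(A) + P(B))  # 1 1/2 True
```

Finite additivity (ii) is all that can fail on a finite space; σ-additivity (iv) follows automatically because only finitely many of the $A_i$ can be nonempty when they are disjoint.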

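Lemma 2: (a) If $A, B \in \mathcal{F}$ and $A \subset B$, then $P(A) \le P(B)$,

(b) if $\mathcal{F}$ is a σ-field, $P$ is a σ-additive probability, $A_i \in \mathcal{F}$ for $i = 1, 2, \ldots$, and $A_{i+1} \subset A_i$, then

$$P\Big(\bigcap_{i=1}^{\infty} A_i\Big) = \lim_{n \to \infty} P(A_n).$$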

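Proof: (a) Observe that $B = A \cup (B \setminus A)$, where $B \setminus A = B \cap A^c \in \mathcal{F}$ by (i) and Lemma 1(a), and that $A \cap (B \setminus A) = \emptyset$. Hence, by (ii) it follows that $P(B) = P(A) + P(B \setminus A) \ge P(A)$ since $P(B \setminus A) \ge 0$ by (i).

(b) Note that $P(A_i) \le 1$ for all $i$ by (a) and (iii), so every quantity below is finite. Let $B_i = A_i \setminus A_{i+1}$ and $A = \bigcap_{i=1}^{\infty} A_i$. Observe that $B_i \cap B_j = \emptyset$ for $i \ne j$ and that $B_i \cap A = \emptyset$ for all $i$. Since $A_1 = A \cup \bigcup_{i=1}^{\infty} B_i$, it follows from (iv) that $P(A_1) = P(A) + \sum_{i=1}^{\infty} P(B_i)$. However, $A_i = B_i \cup A_{i+1}$ and $B_i \cap A_{i+1} = \emptyset$ give $P(A_i) = P(B_i) + P(A_{i+1})$ by (ii), so

$$P(A_1) = P(A) + \sum_{i=1}^{\infty}\big[P(A_i) - P(A_{i+1})\big] = P(A) + \lim_{n\to\infty}\sum_{i=1}^{n}\big[P(A_i) - P(A_{i+1})\big] = P(A) + P(A_1) - \lim_{n\to\infty} P(A_{n+1}),$$

and (b) follows.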
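The telescoping step in the proof of Lemma 2(b) can be checked exactly. A sketch, assuming the countable space $S = \{1, 2, 3, \ldots\}$ with $P(\{k\}) = 2^{-k}$ and the decreasing sets $A_n = \{n, n+1, \ldots\}$, so that $B_i = A_i \setminus A_{i+1} = \{i\}$ and $\bigcap_n A_n = \emptyset$; the lemma then predicts $P(A_n) \to P(\emptyset) = 0$. The function names are illustrative.

```python
from fractions import Fraction


def p_point(k):
    """P({k}) = 2**-k on S = {1, 2, 3, ...}; these weights sum to 1."""
    return Fraction(1, 2 ** k)


def p_tail(n):
    """P(A_n) for A_n = {n, n+1, ...}; the geometric tail sums to 2**-(n-1)."""
    return Fraction(1, 2 ** (n - 1))


def telescoping_holds(n):
    """Check sum_{i=1}^{n} P(B_i) == P(A_1) - P(A_{n+1}), where B_i = {i}."""
    return sum(p_point(i) for i in range(1, n + 1)) == p_tail(1) - p_tail(n + 1)


probs = [p_tail(n) for n in range(1, 6)]  # 1, 1/2, 1/4, 1/8, 1/16: decreasing toward P(empty set) = 0
```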

Further relations deal with dependency between two events.

Definition 3: For $A, B \in \mathcal{F}$, the conditional probability of $A$ given $B$ is $P(A \mid B) = P(A \cap B)/P(B)$, where $P(B) > 0$.

Definition 4: Events $A, B \in \mathcal{F}$ are independent if $P(A \cap B) = P(A)P(B)$, in which case $P(A \mid B) = P(A)$ whenever $P(B) > 0$.
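Definitions 3 and 4 can be tried out on a fair die (an assumed example). With $A$ the even outcomes, $B = \{1,2,3,4\}$ and $C = \{1,2,3\}$, the events $A$ and $B$ turn out to be independent while $A$ and $C$ are not; `P`, `cond` and `independent` are hypothetical helper names.

```python
from fractions import Fraction

S = set(range(1, 7))  # assumed example: a fair die


def P(A):
    """Uniform probability on S."""
    return Fraction(len(A & S), len(S))


def cond(A, B):
    """Definition 3: P(A | B) = P(A ∩ B) / P(B), defined only when P(B) > 0."""
    if P(B) == 0:
        raise ValueError("P(B) must be positive")
    return P(A & B) / P(B)


def independent(A, B):
    """Definition 4: P(A ∩ B) == P(A) P(B)."""
    return P(A & B) == P(A) * P(B)


even, low4, low3 = {2, 4, 6}, {1, 2, 3, 4}, {1, 2, 3}
print(independent(even, low4), cond(even, low4))  # True 1/2, equal to P(even)
print(independent(even, low3), cond(even, low3))  # False 1/3
```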

Random Variables, Distributions, Densities

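Given a probability space $(S, \mathcal{F}, P)$, a random variable is simply a mapping $X : S \to \mathbb{R}$ from the set of outcomes to the real numbers, required to be measurable in the sense that $\{s \in S : X(s) \le x\} \in \mathcal{F}$ for every real $x$, so that its probabilities are defined.

A random variable and its distribution have been defined in a general context, with no distinction made for the range of the variable or the differentiability of the distribution.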

Definition 7: If a random variable $X : S \to \mathbb{R}$ can take at most countably many values, then the variable is said to be a discrete random variable.

For a discrete random variable, the distribution function is a step function with jumps at the values $x_i$ that the variable takes. These jumps are the probabilities of the points and are given by the probability mass function

$$p_X(x_i) = F_X(x_i) - F_X(x_i^-),$$

where the latter term indicates the limit from the left. The alternative to a discrete variable is one that can take on a continuum of values.
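The relation $p_X(x_i) = F_X(x_i) - F_X(x_i^-)$ can be illustrated with an assumed example, the number of heads in two fair coin tosses. The sketch uses the fact that the left limit $F_X(x^-)$ of a distribution function equals $P(X < x)$; the function names are illustrative.

```python
from fractions import Fraction

# Assumed example: X = number of heads in two fair coin tosses.
values = [0, 1, 2]
probs = [Fraction(1, 4), Fraction(1, 2), Fraction(1, 4)]


def F(x):
    """Distribution function F_X(x) = P(X <= x), a right-continuous step function."""
    return sum((p for v, p in zip(values, probs) if v <= x), Fraction(0))


def F_left(x):
    """Left limit F_X(x-), which equals P(X < x)."""
    return sum((p for v, p in zip(values, probs) if v < x), Fraction(0))


def pmf(x):
    """Size of the jump of F_X at x: p_X(x) = F_X(x) - F_X(x-)."""
    return F(x) - F_left(x)


jumps = [pmf(v) for v in values]  # recovers probs
no_jump = pmf(Fraction(1, 2))     # 0: F_X is flat between support points
```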

Definition 8: A random variable $X$ whose distribution function $F_X$ is continuous and differentiable at all but at most countably many points is a continuous random variable.

Definition 9: The density function of a continuous random variable $X$ is the derivative of the distribution function, $f_X(x) = dF_X(x)/dx$, at those points where the derivative exists.

Observe that the distribution function of a continuous random variable may be written as

$$F_X(x) = \int_{-\infty}^{x} f_X(\xi)\, d\xi,$$

and that the conditional distribution yields the conditional density

$$f_X(x \mid A) = \frac{dF_X(x \mid A)}{dx}.$$
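The two relations between $F_X$ and $f_X$ can be checked numerically. A sketch, assuming an exponential random variable with rate $\lambda = 2$, so $F_X(x) = 1 - e^{-\lambda x}$ and $f_X(x) = \lambda e^{-\lambda x}$ for $x \ge 0$ ($F_X$ fails to be differentiable only at $x = 0$). A central difference recovers the density and a trapezoid rule recovers the distribution function; because the density vanishes for $x < 0$, the lower limit $-\infty$ can be replaced by $0$. The step sizes are arbitrary choices.

```python
import math

LAM = 2.0  # assumed rate parameter


def F(x):
    """Distribution function F_X(x) = 1 - exp(-LAM x) for x >= 0, and 0 otherwise."""
    return 1.0 - math.exp(-LAM * x) if x >= 0 else 0.0


def f(x):
    """Density f_X(x) = LAM exp(-LAM x) for x >= 0, and 0 otherwise."""
    return LAM * math.exp(-LAM * x) if x >= 0 else 0.0


def derivative(g, x, h=1e-6):
    """Central-difference approximation of g'(x)."""
    return (g(x + h) - g(x - h)) / (2 * h)


def trapezoid(g, a, b, n=10_000):
    """Composite trapezoid rule for the integral of g over [a, b]."""
    step = (b - a) / n
    return step * (0.5 * (g(a) + g(b)) + sum(g(a + k * step) for k in range(1, n)))


print(derivative(F, 0.7), f(0.7))      # density as the derivative of F
print(trapezoid(f, 0.0, 1.3), F(1.3))  # F as the integral of the density
```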

Integration, Expectation, Moments

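A characteristic (indicator) function of a set $A$ is the function

$$\chi_A(s) = \begin{cases} 1 & \text{for } s \in A, \\ 0 & \text{for } s \notin A, \end{cases}$$

and a simple function is a finite linear combination of characteristic functions of measurable sets, i.e.,

$$\varphi = \sum_{i=1}^{n} c_i \chi_{A_i},$$

where the $A_i$ are measurable sets in a probability space $(S, \mathcal{F}, P)$. The integral of $\varphi$ over a set $A$ of finite measure is given by

$$\int_A \varphi\, dP = \sum_{i=1}^{n} c_i P(A_i \cap A),$$

and the integral of a bounded measurable function or random variable $X$ is defined by

$$\int_A X\, dP = \sup_{\varphi \le X} \int_A \varphi\, dP = \inf_{\psi \ge X} \int_A \psi\, dP,$$

where $\varphi$ and $\psi$ range over simple functions.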
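The sup/inf definition can be watched in action. A sketch, assuming $S = [0,1]$ with the uniform probability and $X(s) = s^2$, whose integral is $1/3$. Partitioning the range of $X$ into $m$ levels gives sets $A_k = \{s : k/m \le X(s) < (k+1)/m\}$ and simple functions $\varphi \le X \le \psi$ whose integrals differ by exactly $1/m$, so both approach $1/3$. `simple_integrals` is a hypothetical helper.

```python
import math


def simple_integrals(m):
    """Integrals of simple functions phi <= X <= psi bracketing X(s) = s**2 on [0, 1].

    A_k = {s : k/m <= s**2 < (k+1)/m} is the interval [sqrt(k/m), sqrt((k+1)/m)),
    whose uniform probability is its length. The single point s = 1 (where X = 1)
    lies in no A_k; it has probability zero, so it adds nothing to either integral.
    """
    lower = upper = 0.0
    for k in range(m):
        p_k = math.sqrt((k + 1) / m) - math.sqrt(k / m)
        lower += (k / m) * p_k        # phi = k/m on A_k, so phi <= X
        upper += ((k + 1) / m) * p_k  # psi = (k+1)/m on A_k, so psi >= X
    return lower, upper


for m in (10, 100, 1000):
    lo, hi = simple_integrals(m)
    print(m, lo, hi)  # lo <= 1/3 <= hi, and hi - lo = 1/m
```

Partitioning the range of $X$ rather than its domain is what distinguishes this construction from a Riemann sum.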

Definition 10: If $A$ is the entire space $S$, then the expectation or mean of $X$ is

$$E[X] = \int_S X\, dP = \int_{-\infty}^{\infty} x\, dF_X(x),$$

with the latter expression indicating the Lebesgue-Stieltjes integral with respect to the distribution.

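This definition may be expanded to include expressions of the form

$$E[g(X)] = \int_S g(X)\, dP = \int_{-\infty}^{\infty} g(x)\, dF_X(x)$$

for a general class of sufficiently well-behaved (for instance, measurable and integrable) functions $g$. Under fairly general conditions (for example, when $g$ is continuous and $E[|g(X)|] < \infty$), the Lebesgue-Stieltjes integral reduces to a Riemann-Stieltjes integral, and the expectation becomes

$$E[g(X)] = \int_S g(X)\, dP = \int_{-\infty}^{\infty} g(x)\, dF_X(x),$$

with the right-hand side now read as a Riemann-Stieltjes integral.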

Assuming that $X$ has a density given by the derivative of its distribution function, the expectations are

$$E[X] = \int_{-\infty}^{\infty} x\, dF_X(x) = \int_{-\infty}^{\infty} x f_X(x)\, dx$$

and

$$E[g(X)] = \int_{-\infty}^{\infty} g(x) f_X(x)\, dx,$$

where $f_X(x)$ is the density of $X$. Of course, for a discrete random variable the integrals become summations and the density is replaced by the probability mass function, e.g. $E[g(X)] = \sum_i g(x_i)\, p_X(x_i)$.
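Both forms of $E[g(X)]$ can be evaluated directly. A sketch, assuming an exponential density with rate $2$ (for which $E[X] = 1/2$ and $E[X^2] = 1/2$) for the continuous case and a fair die for the discrete case; truncating the integral to $[0, 40]$ and using the midpoint rule are illustrative choices, and the helper names are hypothetical.

```python
import math
from fractions import Fraction


def expect_continuous(g, f, a, b, n=100_000):
    """Approximate E[g(X)] = integral of g(x) f(x) dx over [a, b] with the midpoint rule."""
    h = (b - a) / n
    return h * sum(g(a + (k + 0.5) * h) * f(a + (k + 0.5) * h) for k in range(n))


def expect_discrete(g, pmf):
    """E[g(X)] = sum over x of g(x) p_X(x) for a finite probability mass function."""
    return sum(g(x) * p for x, p in pmf.items())


def exp_density(x, lam=2.0):
    """Assumed example: exponential density with rate lam (zero for x < 0)."""
    return lam * math.exp(-lam * x) if x >= 0 else 0.0


die = {k: Fraction(1, 6) for k in range(1, 7)}  # assumed example: a fair die

mean_exp = expect_continuous(lambda x: x, exp_density, 0.0, 40.0)         # close to 1/2
second_exp = expect_continuous(lambda x: x * x, exp_density, 0.0, 40.0)  # close to 1/2
mean_die = expect_discrete(lambda x: x, die)                              # exactly 7/2
```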

Definition 11: The $n$th moment of a random variable $X$ is

$$E[X^n] = \int_{-\infty}^{\infty} x^n f_X(x)\, dx,$$

with the obvious changes for a discrete variable. The first moment is simply the expectation, written as $\mu$. The $n$th central moment of $X$ is the moment about the mean,

$$E[(X - \mu)^n] = \int_{-\infty}^{\infty} (x - \mu)^n f_X(x)\, dx,$$

and the second central moment is the variance $\sigma_X^2 = \operatorname{Var}(X)$, with square root $\sigma_X$ the standard deviation.
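For a discrete variable, the moment definitions reduce to finite sums. A sketch, assuming a fair die: it recovers $\mu = 7/2$, $E[X^2] = 91/6$ and $\sigma^2 = 35/12$, confirms the identity $\operatorname{Var}(X) = E[X^2] - \mu^2$, and shows the third central moment vanishing by symmetry. The function names are illustrative.

```python
import math
from fractions import Fraction

die = {k: Fraction(1, 6) for k in range(1, 7)}  # assumed example: a fair die


def moment(pmf, n):
    """n-th moment E[X^n] of a discrete random variable."""
    return sum(x ** n * p for x, p in pmf.items())


def central_moment(pmf, n):
    """n-th central moment E[(X - mu)^n], with mu the first moment."""
    mu = moment(pmf, 1)
    return sum((x - mu) ** n * p for x, p in pmf.items())


mu = moment(die, 1)            # 7/2
var = central_moment(die, 2)   # 35/12
sd = math.sqrt(float(var))     # standard deviation, about 1.708
```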

Sequences, Convergence, and Limits

A number of important results regarding bounds and limiting processes are very useful and should be recognized. Elaboration and proofs can be found in many standard references.

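Markov Inequality: If $X$ is a nonnegative random variable, then for any $a > 0$,

$$P(X \ge a) \le \frac{E[X]}{a}.$$

Proof: For $X$ with a density (the discrete case is identical, with sums in place of integrals), note that

$$E[X] = \int_0^{\infty} x f_X(x)\, dx = \int_0^{a} x f_X(x)\, dx + \int_a^{\infty} x f_X(x)\, dx \ge \int_a^{\infty} x f_X(x)\, dx \ge \int_a^{\infty} a f_X(x)\, dx = a \int_a^{\infty} f_X(x)\, dx = a P(X \ge a).$$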

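The Markov bound can be compared with exact tail probabilities. A sketch, assuming a fair die (nonnegative, so the inequality applies) and also a two-point variable on $\{0, 4\}$, for which the bound at $a = 4$ is attained with equality. The helper names are hypothetical.

```python
from fractions import Fraction


def tail(pmf, a):
    """P(X >= a) for a finite probability mass function."""
    return sum((p for x, p in pmf.items() if x >= a), Fraction(0))


def markov_bound(pmf, a):
    """Right-hand side E[X] / a of the Markov inequality."""
    return sum(x * p for x, p in pmf.items()) / a


die = {k: Fraction(1, 6) for k in range(1, 7)}      # assumed example: a fair die
two_point = {0: Fraction(1, 2), 4: Fraction(1, 2)}  # assumed example: bound is tight at a = 4

for a in range(1, 8):
    print(a, tail(die, a), markov_bound(die, a))  # the tail never exceeds the bound
```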
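Chebyshev Inequality: If $X$ is a random variable with mean $\mu$ and variance $\sigma^2$, then for any $k > 0$,

$$P(|X - \mu| \ge k) \le \frac{\sigma^2}{k^2}.$$

Proof: Applying the preceding result to the nonnegative random variable $(X - \mu)^2$ with $a = k^2$ gives

$$P\big((X - \mu)^2 \ge k^2\big) \le \frac{E[(X - \mu)^2]}{k^2} = \frac{\sigma^2}{k^2}.$$

But $(X - \mu)^2 \ge k^2$ if and only if $|X - \mu| \ge k$.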
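Chebyshev's bound can be checked the same way. A sketch, assuming a fair die with $\mu = 7/2$ and $\sigma^2 = 35/12$, comparing $P(|X - \mu| \ge k)$ with $\sigma^2/k^2$ for several values of $k$; the helper names are illustrative.

```python
from fractions import Fraction

die = {k: Fraction(1, 6) for k in range(1, 7)}  # assumed example: a fair die

mu = sum(x * p for x, p in die.items())                 # 7/2
var = sum((x - mu) ** 2 * p for x, p in die.items())    # 35/12


def deviation_prob(k):
    """P(|X - mu| >= k)."""
    return sum((p for x, p in die.items() if abs(x - mu) >= k), Fraction(0))


def chebyshev_bound(k):
    """Right-hand side sigma^2 / k^2."""
    return var / k ** 2


ks = [Fraction(1, 2), 1, Fraction(3, 2), 2, Fraction(5, 2), 3]
for k in ks:
    print(k, deviation_prob(k), chebyshev_bound(k))
```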

Law of Large Numbers: If $X_1, X_2, \ldots$ is a sequence of independent and identically distributed random variables with $E[X_i] = \mu$ and variance $\sigma^2$, then

$$\lim_{n \to \infty} \frac{X_1 + X_2 + \cdots + X_n}{n} = \mu$$

with probability 1.
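A simulation can illustrate, though not prove, the law of large numbers. The sketch assumes i.i.d. Uniform(0, 1) draws ($\mu = 1/2$) generated by Python's `random` module with a fixed seed; `sample_mean` is a hypothetical name.

```python
import random


def sample_mean(n, seed=0):
    """Average of n i.i.d. Uniform(0, 1) draws; the law predicts a limit of mu = 1/2."""
    rng = random.Random(seed)
    return sum(rng.random() for _ in range(n)) / n


for n in (10, 1_000, 100_000):
    print(n, sample_mean(n))
```

Any single run only shows one sample path; the statement "with probability 1" concerns the whole collection of such paths.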

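Lemma 5: A sequence $\{X_n\}$ that converges almost everywhere to $X$ converges in probability.

Proof: Since $\{X_n\}$ converges almost everywhere to $X$, by the remarks following the definition,

$$P\big(\{s \in S : \lim_{n\to\infty} X_n(s) = X(s)\}\big) = P\Big(\bigcap_{m=1}^{\infty}\bigcup_{n=1}^{\infty}\bigcap_{i=n}^{\infty}\{s \in S : |X_i(s) - X(s)| < 1/m\}\Big) = 1.$$

But for $\varepsilon > 0$, taking an integer $k > 0$ with $1/k < \varepsilon$, it follows that

$$\bigcap_{m=1}^{\infty}\bigcup_{n=1}^{\infty}\bigcap_{i=n}^{\infty}\{s \in S : |X_i(s) - X(s)| < 1/m\} \subset \bigcup_{n=1}^{\infty}\bigcap_{i=n}^{\infty}\{s \in S : |X_i(s) - X(s)| < 1/k\} \subset \bigcup_{n=1}^{\infty}\bigcap_{i=n}^{\infty}\{s \in S : |X_i(s) - X(s)| < \varepsilon\},$$

and, by Lemma 2(a), $P\big(\bigcup_{n=1}^{\infty}\bigcap_{i=n}^{\infty}\{s \in S : |X_i(s) - X(s)| < \varepsilon\}\big) = 1$. This gives the complementary probability

$$P\Big(\bigcap_{n=1}^{\infty}\bigcup_{i=n}^{\infty}\{s \in S : |X_i(s) - X(s)| \ge \varepsilon\}\Big) = 0.$$

Observe that the sets $A_n = \bigcup_{i=n}^{\infty}\{s \in S : |X_i(s) - X(s)| \ge \varepsilon\}$ are a decreasing sequence of sets, $A_{n+1} \subset A_n$, so by Lemma 2(b),

$$\lim_{n\to\infty} P(A_n) = P\Big(\bigcap_{n=1}^{\infty}\bigcup_{i=n}^{\infty}\{s \in S : |X_i(s) - X(s)| \ge \varepsilon\}\Big) = 0.$$

Hence, since $\{s \in S : |X_n(s) - X(s)| \ge \varepsilon\} \subset A_n$, Lemma 2(a) gives

$$\lim_{n\to\infty} P\big(\{s \in S : |X_n(s) - X(s)| \ge \varepsilon\}\big) \le \lim_{n\to\infty} P(A_n) = 0.$$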

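An assumed example makes the sets in the proof of Lemma 5 concrete: take $S = [0,1]$ with the uniform probability, $X_n(s) = s^n$ and $X = 0$. Then $X_n(s) \to 0$ for every $s < 1$, so the convergence holds almost everywhere (it fails only at $s = 1$, a set of probability zero). For $0 < \varepsilon < 1$, because $s^i$ decreases in $i$, the set $A_n = \bigcup_{i \ge n}\{s : s^i \ge \varepsilon\}$ equals $\{s : s \ge \varepsilon^{1/n}\}$, so $P(|X_n - X| \ge \varepsilon) = P(A_n) = 1 - \varepsilon^{1/n}$, which decreases to $0$. The function name is illustrative.

```python
def prob_deviation(n, eps):
    """P(|X_n - X| >= eps) = P(A_n) = 1 - eps**(1/n) for X_n(s) = s**n, X = 0, s ~ Uniform(0, 1).

    Valid for 0 < eps < 1: s**n >= eps exactly when s >= eps**(1/n).
    """
    if not 0 < eps < 1:
        raise ValueError("eps must lie in (0, 1)")
    return 1.0 - eps ** (1.0 / n)


for n in (1, 10, 100, 1_000, 10_000):
    print(n, prob_deviation(n, 0.1))  # decreases toward 0
```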
The final form of convergence is the weakest but deals directly with the distribution function.

Convergence in Distribution: The sequence of random variables $\{X_n\}$ converges in distribution to $X$ if

$$\lim_{n \to \infty} F_{X_n}(x) = F_X(x)$$

for every $x \in \mathbb{R}$ at which $F_X$ is continuous.
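A small assumed example shows why the limit is normally required only at continuity points of $F_X$. Let $X_n = 1/n$ and $X = 0$ be constants. Then $F_{X_n}(x) \to F_X(x)$ at every $x \ne 0$, but $F_{X_n}(0) = 0$ for all $n$ while $F_X(0) = 1$; the point $x = 0$ is exactly the jump of $F_X$. The helper `F_point_mass` is hypothetical.

```python
from fractions import Fraction


def F_point_mass(c):
    """Distribution function of the constant random variable X = c."""
    return lambda x: 1 if x >= c else 0


F_X = F_point_mass(0)                                         # X = 0
F_n = [F_point_mass(Fraction(1, n)) for n in range(1, 1001)]  # X_n = 1/n for n = 1, ..., 1000

print(F_X(0), [F(0) for F in F_n[-3:]])                   # 1 [0, 0, 0]: no convergence at the jump
print(F_X(Fraction(1, 100)), F_n[-1](Fraction(1, 100)))  # 1 1: agreement at x = 1/100 once n >= 100
```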

Another lemma shows that convergence in probability yields convergence in distribution.

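Lemma 6: A sequence $\{X_n\}$ that converges in probability to $X$ converges in distribution.

Proof: Recall the definition of the distribution function, $F_{X_n}(x) = P(X_n \le x)$ and $F_X(x) = P(X \le x)$, and note that for $\varepsilon > 0$

$$P(X_n \le x) = P(X_n \le x,\ |X_n - X| < \varepsilon) + P(X_n \le x,\ |X_n - X| \ge \varepsilon).$$

Next observe

$$\{X_n \le x,\ |X_n - X| < \varepsilon\} \subset \{X_n \le x,\ X < X_n + \varepsilon\} \subset \{X < x + \varepsilon\}$$

and $\{X_n \le x,\ |X_n - X| \ge \varepsilon\} \subset \{|X_n - X| \ge \varepsilon\}$. Hence, $P(X_n \le x,\ |X_n - X| < \varepsilon) \le P(X \le x + \varepsilon) = F_X(x + \varepsilon)$,

$$P(X_n \le x,\ |X_n - X| \ge \varepsilon) \le P(|X_n - X| \ge \varepsilon),$$

and $F_{X_n}(x) = P(X_n \le x) \le F_X(x + \varepsilon) + P(|X_n - X| \ge \varepsilon)$. A similar argument, using $\{X \le x - \varepsilon,\ |X_n - X| < \varepsilon\} \subset \{X_n \le x\}$, obtains

$$F_X(x - \varepsilon) = P(X \le x - \varepsilon,\ |X_n - X| < \varepsilon) + P(X \le x - \varepsilon,\ |X_n - X| \ge \varepsilon) \le P(X_n \le x) + P(|X_n - X| \ge \varepsilon) = F_{X_n}(x) + P(|X_n - X| \ge \varepsilon),$$

so that

$$F_X(x - \varepsilon) - P(|X_n - X| \ge \varepsilon) \le F_{X_n}(x) \le F_X(x + \varepsilon) + P(|X_n - X| \ge \varepsilon).$$

Since the sequence converges in probability, $\lim_{n\to\infty} P(|X_n - X| \ge \varepsilon) = 0$ and

$$F_X(x - \varepsilon) \le \liminf_{n\to\infty} F_{X_n}(x) \le \limsup_{n\to\infty} F_{X_n}(x) \le F_X(x + \varepsilon)$$

for all $\varepsilon > 0$. Therefore, at every $x$ where $F_X$ is continuous, letting $\varepsilon \to 0$ gives $\lim_{n\to\infty} F_{X_n}(x) = F_X(x)$.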
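The two-sided bound derived in the proof of Lemma 6 can be checked on an assumed example: $X$ uniform on $[0,1]$ and $X_n = X + 1/n$, so that $|X_n - X| = 1/n$ for every outcome and $P(|X_n - X| \ge \varepsilon)$ is $1$ if $1/n \ge \varepsilon$ and $0$ otherwise. Here $F_X$ is continuous, so $F_{X_n}(x) = F_X(x - 1/n) \to F_X(x)$ at every $x$. The function names are illustrative.

```python
def F_X(x):
    """Distribution function of X ~ Uniform(0, 1)."""
    return min(max(x, 0.0), 1.0)


def F_Xn(n, x):
    """X_n = X + 1/n, so F_{X_n}(x) = P(X <= x - 1/n)."""
    return F_X(x - 1.0 / n)


def p_far(n, eps):
    """P(|X_n - X| >= eps); here |X_n - X| = 1/n for every outcome."""
    return 1.0 if 1.0 / n >= eps else 0.0


def sandwich_holds(n, x, eps):
    """Check F_X(x - eps) - p <= F_{X_n}(x) <= F_X(x + eps) + p with p = P(|X_n - X| >= eps)."""
    p = p_far(n, eps)
    return F_X(x - eps) - p <= F_Xn(n, x) <= F_X(x + eps) + p
```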

Jointly Distributed Variables

Given random variables $X$ and $Y$ on a probability space $(S, \mathcal{F}, P)$, one may ask about the joint behavior of, and connection between, these two variables beyond their individual properties.

Definition 12: The joint distribution of $X$ and $Y$ on the space $(S, \mathcal{F}, P)$ is the function

$$F_{XY}(x, y) = P(X \le x,\ Y \le y).$$

Observe that the individual or marginal distributions of each variable are given by

$$F_X(x) = \lim_{y \to \infty} F_{XY}(x, y) \quad\text{and}\quad F_Y(y) = \lim_{x \to \infty} F_{XY}(x, y),$$

and if the distribution has second-order derivatives, then the joint density function is

$$f_{XY}(x, y) = \frac{\partial^2 F_{XY}(x, y)}{\partial x\, \partial y},$$

from which the marginal densities may be obtained as

$$f_X(x) = \int_{-\infty}^{\infty} f_{XY}(x, y)\, dy \quad\text{and}\quad f_Y(y) = \int_{-\infty}^{\infty} f_{XY}(x, y)\, dx.$$

Recalling the definition of independent events, the random variables $X$ and $Y$ are independent if $F_{XY}(x, y) = F_X(x) F_Y(y)$ for all $x, y$, and this yields the same expression for the densities of independent variables, $f_{XY}(x, y) = f_X(x) f_Y(y)$.
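Marginalization and the independence criterion have direct discrete analogues, with sums over the other coordinate in place of integrals. A sketch, assuming two independently rolled fair dice $X$ and $Y$: it confirms $F_{XY}(x,y) = F_X(x)F_Y(y)$ on the grid of support points (enough for step distribution functions) and shows that the pair $(X, X + Y)$ fails the test. The helper names are hypothetical.

```python
from fractions import Fraction
from itertools import product

# Assumed example: two fair dice X and Y rolled independently.
joint = {(x, y): Fraction(1, 36) for x, y in product(range(1, 7), repeat=2)}


def marginal(joint, axis):
    """Marginal pmf: sum the joint pmf over the other coordinate."""
    out = {}
    for pair, p in joint.items():
        out[pair[axis]] = out.get(pair[axis], 0) + p
    return out


def cdf2(joint, x, y):
    """Joint distribution function F_XY(x, y) = P(X <= x, Y <= y)."""
    return sum((p for (a, b), p in joint.items() if a <= x and b <= y), Fraction(0))


def independent(joint):
    """Test F_XY(x, y) == F_X(x) F_Y(y) on the grid of support points."""
    xs = sorted({a for a, _ in joint})
    ys = sorted({b for _, b in joint})
    x_max, y_max = xs[-1], ys[-1]  # F_X(x) = F_XY(x, y_max) and F_Y(y) = F_XY(x_max, y)
    return all(cdf2(joint, x, y) == cdf2(joint, x, y_max) * cdf2(joint, x_max, y)
               for x in xs for y in ys)


# The pair (X, X + Y) built from the same dice is not independent.
joint_sum = {(x, x + y): p for (x, y), p in joint.items()}
```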

Definition 13: The random variables $X$ and $Y$ are uncorrelated if $E[XY] = E[X]E[Y]$ and are called orthogonal if $E[XY] = 0$.

Note that if two random variables are independent (and the expectations involved exist), then they are uncorrelated.

Definition 14: The covariance of two random variables is

$$\sigma_{XY} = \operatorname{Cov}(X, Y) = E[(X - \mu_X)(Y - \mu_Y)],$$

where $\mu_X = E[X]$ and $\mu_Y = E[Y]$.

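The converse of the note on independence fails: uncorrelated variables need not be independent. A standard example, assumed here, is $X$ uniform on $\{-1, 0, 1\}$ and $Y = X^2$, for which $E[XY] = E[X^3] = 0 = E[X]E[Y]$ and the covariance is $0$, yet $P(X = 0, Y = 0) = 1/3 \ne 1/9 = P(X = 0)P(Y = 0)$. The helper `E` is a hypothetical name.

```python
from fractions import Fraction

# Assumed example: X uniform on {-1, 0, 1} and Y = X**2, stored as a joint pmf.
joint = {(x, x * x): Fraction(1, 3) for x in (-1, 0, 1)}


def E(h):
    """Expectation of h(X, Y) under the joint pmf."""
    return sum(h(x, y) * p for (x, y), p in joint.items())


mean_x = E(lambda x, y: x)                               # 0
mean_y = E(lambda x, y: y)                               # 2/3
cov = E(lambda x, y: (x - mean_x) * (y - mean_y))        # Definition 14: 0
uncorrelated = E(lambda x, y: x * y) == mean_x * mean_y  # Definition 13: True

# Independence fails: P(X = 0, Y = 0) differs from P(X = 0) P(Y = 0).
p_joint = sum(p for (x, y), p in joint.items() if x == 0 and y == 0)  # 1/3
p_x0 = sum(p for (x, _), p in joint.items() if x == 0)                # 1/3
p_y0 = sum(p for (_, y), p in joint.items() if y == 0)                # 1/3
```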

Stochastic Processes

Consider a random variable that evolves in time. From one perspective this could be regarded as a collection of random variables indexed by time. However, the index need not be time.

Definition 14: Given a probability space $(S, \mathcal{F}, P)$, a stochastic process is a collection of random variables $\{X(t) : t \in T\}$ with $T$ the index set of the process. Note that the dependence on $s \in S$ is understood, and the index is often written as a subscript, $X_t$. If the index set is finite or countable, the process is said to be discrete, and if $T$ is an interval of real numbers, the process is said to be continuous, with the index commonly referred to as time. The range of the process is a subset of the real numbers and is called the state space of the process.

Since a real-valued stochastic process $X(t)$ is a collection of random variables, information about the process is contained in the joint distributions or densities of the variables in the process. The $n$th-order distribution of the process is

$$F(x_1, \ldots, x_n; t_1, \ldots, t_n) = P\big(X(t_1) \le x_1, \ldots, X(t_n) \le x_n\big).$$
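Finite-dimensional distributions can be computed exactly for a simple process. A sketch, assuming a simple symmetric random walk $X_t = \sum_{j=1}^{t} \xi_j$ with $X_0 = 0$ and i.i.d. steps $\xi_j = \pm 1$, each with probability $1/2$ (an example chosen here, not from the notes). It evaluates $F(x_1, \ldots, x_n; t_1, \ldots, t_n)$ by enumerating all $2^{\max t_i}$ equally likely step sequences; `nth_order_cdf` is a hypothetical name, and the enumeration is only practical for small times.

```python
from fractions import Fraction
from itertools import product


def nth_order_cdf(xs, ts):
    """F(x_1, ..., x_n; t_1, ..., t_n) = P(X_{t_1} <= x_1, ..., X_{t_n} <= x_n)
    for a simple symmetric random walk started at X_0 = 0, by exhaustive enumeration."""
    horizon = max(ts)
    hits = 0
    for steps in product((-1, 1), repeat=horizon):
        path = [0]
        for s in steps:
            path.append(path[-1] + s)  # path[t] is X_t along this step sequence
        if all(path[t] <= x for x, t in zip(xs, ts)):
            hits += 1
    return Fraction(hits, 2 ** horizon)


print(nth_order_cdf([0], [2]))          # first-order: P(X_2 <= 0) = 3/4
print(nth_order_cdf([0, 0], [1, 2]))    # second-order: P(X_1 <= 0, X_2 <= 0) = 1/2
```

Setting $x_1$ large enough in the second-order distribution recovers the first-order one, mirroring the marginal relation for joint distributions.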