Random Variables - Stochastic Processes and Applications | MATH 697, Study notes of Mathematics

Material Type: Notes; Professor: Rey-Bellet; Class: ST-Lie Groups; Subject: Mathematics; University: University of Massachusetts - Amherst; Term: Fall 2007;


Uploaded on 08/19/2009

Stochastic Processes
and
Monte-Carlo methods
University of Massachusetts: Fall 2007
Luc Rey-Bellet

Contents

  • 1 Random Variables
    • 1.1 Review of probability
    • 1.2 Some Common Distributions
    • 1.3 Simulating Random Variables
    • 1.4 Markov, Chebyshev, and Chernov
    • 1.5 Limit Theorems
    • 1.6 Monte-Carlo methods
    • 1.7 Problems

by the multiparameter analogue of the p.d.f. For example, if there is a function $f_X : \mathbb{R}^d \to [0, \infty)$ such that

$$P(X \in A) = \int \cdots \int_A f_X(x_1, \dots, x_d)\, dx_1 \cdots dx_d,$$

then X is a continuous random vector with p.d.f. $f_X$. Similarly, a discrete random vector X taking values $i = (i_1, \dots, i_d)$ is described by

$$p(i_1, \dots, i_d) = P(X_1 = i_1, \dots, X_d = i_d).$$

The random variables $X_1, \dots, X_d$ are independent if

$$f_X(x) = f_{X_1}(x_1) \cdots f_{X_d}(x_d) \quad \text{(continuous case)}, \qquad p_X(i) = p_{X_1}(i_1) \cdots p_{X_d}(i_d) \quad \text{(discrete case)}. \tag{1.1}$$

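If X is a random vector and $g : \mathbb{R}^d \to \mathbb{R}$ is a function, then $Y = g(X)$ is a real random variable. The mean or expectation of a real random variable X is defined by

$$E[X] = \begin{cases} \int_{-\infty}^{\infty} x f_X(x)\, dx & \text{if } X \text{ is continuous,} \\ \sum_{i \in S} i\, p_X(i) & \text{if } X \text{ is discrete.} \end{cases}$$

More generally, if $Y = g(X)$ then

$$E[Y] = E[g(X)] = \begin{cases} \int_{\mathbb{R}^d} g(x) f_X(x)\, dx & \text{if } X \text{ is continuous,} \\ \sum_{i} g(i)\, p_X(i) & \text{if } X \text{ is discrete.} \end{cases}$$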

The variance of a random variable X, denoted by var(X), is given by

$$\mathrm{var}(X) = E\big[(X - E[X])^2\big] = E[X^2] - E[X]^2.$$

The mean of a random variable X measures the average value of X, while its variance measures the spread of the distribution of X. Also commonly used is the standard deviation $\sqrt{\mathrm{var}(X)}$. If X and Y are two random variables, then

$$E[X + Y] = E[X] + E[Y].$$

For the variance, a simple computation shows that

$$\mathrm{var}(X + Y) = \mathrm{var}(X) + 2\,\mathrm{cov}(X, Y) + \mathrm{var}(Y),$$

where cov(X, Y) is the covariance of X and Y, defined by

$$\mathrm{cov}(X, Y) = E\big[(X - E[X])(Y - E[Y])\big].$$

In particular, if X and Y are independent then $E[XY] = E[X]E[Y]$, so $\mathrm{cov}(X, Y) = E[XY] - E[X]E[Y] = 0$ and thus $\mathrm{var}(X + Y) = \mathrm{var}(X) + \mathrm{var}(Y)$. Another important and useful object is the moment generating function (m.g.f.) of a random variable X, given by

$$M_X(t) = E\big[e^{tX}\big].$$

Whenever we use an m.g.f. we will always assume that $M_X(t)$ is finite at least in an interval around 0. Note that this is not always the case. If the moment generating function of X is known, then one can compute all moments $E[X^n]$ of X by repeatedly differentiating $M_X(t)$ with respect to t. The nth derivative of $M_X(t)$ is given by

$$M_X^{(n)}(t) = E\big[X^n e^{tX}\big],$$

and therefore $E[X^n] = M_X^{(n)}(0)$.

In particular $E[X] = M_X'(0)$ and $\mathrm{var}(X) = M_X''(0) - (M_X'(0))^2$. It is often very convenient to compute the mean and variance of X using these formulas (see the examples below). An important fact is the following (its proof is not that easy!)

Theorem 1.1.1 Let X and Y be two random variables and suppose that $M_X(t) = M_Y(t)$ for all $t \in (-\delta, \delta)$, for some $\delta > 0$. Then X and Y have the same distribution.

Another important property of the m.g.f. is the following.

Proposition 1.1.2 If X and Y are independent random variables, then the m.g.f. of X + Y satisfies $M_{X+Y}(t) = M_X(t) M_Y(t)$,

i.e., the m.g.f. of a sum of independent random variables is the product of their m.g.f.'s.

Proof: We have

$$E\big[e^{t(X+Y)}\big] = E\big[e^{tX} e^{tY}\big] = E\big[e^{tX}\big]\, E\big[e^{tY}\big],$$

since $e^{tX}$ and $e^{tY}$ are independent.
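The formulas $E[X] = M_X'(0)$ and $\mathrm{var}(X) = M_X''(0) - (M_X'(0))^2$ can be checked numerically. The sketch below approximates the derivatives at t = 0 by central finite differences. It assumes the exponential m.g.f. $\lambda/(\lambda - t)$ derived among the examples below; the helper names and the step size `h` are illustrative choices, not part of the notes.

```python
import math


def mgf_exponential(t, lam):
    """m.g.f. of Exp(lam): lam / (lam - t), finite only for t < lam."""
    if t >= lam:
        return math.inf
    return lam / (lam - t)


def moments_from_mgf(M, h=1e-4):
    """Approximate E[X], E[X^2] and var(X) from an m.g.f. M by central differences at t = 0."""
    m1 = (M(h) - M(-h)) / (2 * h)            # approximates M'(0)  = E[X]
    m2 = (M(h) - 2 * M(0.0) + M(-h)) / h**2  # approximates M''(0) = E[X^2]
    return m1, m2, m2 - m1**2                # last entry approximates var(X)


lam = 2.0
mean, second_moment, var = moments_from_mgf(lambda t: mgf_exponential(t, lam))
# mean is close to 1/lam = 0.5, var is close to 1/lam**2 = 0.25
```

The step `h` must stay small enough that $M_X$ is finite on $(-h, h)$, which is exactly the standing assumption on the m.g.f. made above.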

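The moment generating function is

$$E\big[e^{tX}\big] = \frac{1}{b - a} \int_a^b e^{tx}\, dx = \frac{e^{tb} - e^{ta}}{t(b - a)} \qquad (t \ne 0, \text{ with } M_X(0) = 1)$$

and the mean and variance are

$$E[X] = \frac{a + b}{2}, \qquad \mathrm{var}(X) = \frac{(b - a)^2}{12}.$$

We write X = U[a, b] to denote this random variable.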

  1. Normal Random Variable Let μ be a real number and σ be a positive number. The normal random variable with mean μ and variance $\sigma^2$ is the continuous random variable with p.d.f.

$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x - \mu)^2}{2\sigma^2}}.$$

The moment generating function is (see below for a proof)

$$E\big[e^{tX}\big] = \frac{1}{\sigma\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{tx} e^{-\frac{(x - \mu)^2}{2\sigma^2}}\, dx = e^{\mu t + \frac{\sigma^2 t^2}{2}}. \tag{1.2}$$

and the mean and variance are

E[X] = μ , var(X) = σ^2.

We write $X = N(\mu, \sigma^2)$ to denote this random variable. The standard normal random variable is the normal random variable with μ = 0 and σ = 1, i.e., N(0, 1). The normal random variable has the following property: X = N(0, 1) if and only if $\sigma X + \mu = N(\mu, \sigma^2)$.

To see this, one applies Proposition 1.2.1 (i) and (ii), which tells us that the density of $\sigma X + \mu$ is $\frac{1}{\sigma} f\big(\frac{x - \mu}{\sigma}\big)$. To prove the formula for the moment generating function we first consider X = N(0, 1). Then, by completing the square, we have

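$$\begin{aligned} M_X(t) &= \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{tx} e^{-\frac{x^2}{2}}\, dx = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{\frac{t^2}{2}} e^{-\frac{(x - t)^2}{2}}\, dx \\ &= e^{\frac{t^2}{2}} \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{-\frac{(x - t)^2}{2}}\, dx = e^{\frac{t^2}{2}} \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{-\frac{y^2}{2}}\, dy = e^{\frac{t^2}{2}}. \end{aligned} \tag{1.3}$$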

This proves the formula for N(0, 1). Since $N(\mu, \sigma^2) = \sigma N(0, 1) + \mu$, by Proposition 1.2.1 (iii) the moment generating function of $N(\mu, \sigma^2)$ is $e^{t\mu} e^{\frac{\sigma^2 t^2}{2}}$, as claimed.
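As a sanity check of (1.2), the integral defining $E[e^{tX}]$ can be approximated numerically and compared with $e^{\mu t + \sigma^2 t^2/2}$. This is a minimal sketch, assuming a plain trapezoidal rule truncated to $\mu \pm 12\sigma$ (an illustrative choice: the integrand is a shifted Gaussian, so the neglected tails are astronomically small for moderate t); the parameter values tested are arbitrary.

```python
import math


def normal_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2)."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma**2)) / (sigma * math.sqrt(2 * math.pi))


def mgf_by_quadrature(t, mu, sigma, half_width=12.0, steps=4000):
    """Trapezoidal approximation of E[e^{tX}], X = N(mu, sigma^2), on [mu - 12 sigma, mu + 12 sigma]."""
    a = mu - half_width * sigma
    b = mu + half_width * sigma
    dx = (b - a) / steps
    total = 0.5 * (math.exp(t * a) * normal_pdf(a, mu, sigma)
                   + math.exp(t * b) * normal_pdf(b, mu, sigma))
    for k in range(1, steps):
        x = a + k * dx
        total += math.exp(t * x) * normal_pdf(x, mu, sigma)
    return total * dx


def mgf_closed_form(t, mu, sigma):
    """Formula (1.2): exp(mu t + sigma^2 t^2 / 2)."""
    return math.exp(mu * t + sigma**2 * t**2 / 2)


checks = [(t, mu, sigma, mgf_by_quadrature(t, mu, sigma), mgf_closed_form(t, mu, sigma))
          for t, mu, sigma in [(0.7, 1.0, 0.5), (-1.2, 0.0, 1.0)]]
```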

  1. Exponential Random Variable Let λ be a positive number. The exponential random variable with parameter λ is the continuous random variable with p.d.f

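$$f(x) = \begin{cases} \lambda e^{-\lambda x} & \text{if } x > 0, \\ 0 & \text{otherwise.} \end{cases}$$

The moment generating function is

$$E\big[e^{tX}\big] = \lambda \int_0^{\infty} e^{tx} e^{-\lambda x}\, dx = \begin{cases} \frac{\lambda}{\lambda - t} & \text{if } t < \lambda, \\ +\infty & \text{otherwise,} \end{cases}$$

and the mean and variance are

$$E[X] = \frac{1}{\lambda}, \qquad \mathrm{var}(X) = \frac{1}{\lambda^2}.$$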

We write X = Exp(λ) to denote this random variable. This random variable will play an important role in the construction of continuous-time Markov chains. It often has the interpretation of a waiting time until the occurrence of an event.

  1. Gamma Random Variable Let n and λ be positive numbers. The gamma random variable with parameters n and λ is the continuous random variable with p.d.f

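$$f(x) = \begin{cases} \lambda e^{-\lambda x} \frac{(\lambda x)^{n-1}}{(n-1)!} & \text{if } x > 0, \\ 0 & \text{otherwise.} \end{cases}$$

The moment generating function is

$$E\big[e^{tX}\big] = \int_0^{\infty} e^{tx} \lambda e^{-\lambda x} \frac{(\lambda x)^{n-1}}{(n-1)!}\, dx = \begin{cases} \left(\frac{\lambda}{\lambda - t}\right)^n & \text{if } t < \lambda, \\ +\infty & \text{otherwise,} \end{cases}$$

and the mean and variance are

$$E[X] = \frac{n}{\lambda}, \qquad \mathrm{var}(X) = \frac{n}{\lambda^2}.$$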

We write X = Gamma(n, λ) to denote this random variable. To compute the m.g.f. note that for any α > 0

$$\int_0^{\infty} e^{-\alpha x}\, dx = \frac{1}{\alpha}$$

and the mean and the variance are

E[X] = np , var(X) = np(1 − p).

We write X = B(n, p) to denote this random variable. The formula for the m.g.f. can be obtained directly using the binomial theorem, or simply by noting that by construction B(n, p) is a sum of n independent Bernoulli random variables and applying Proposition 1.1.2.

  1. Geometric Random Variable Consider an experiment which has exactly two outcomes, 0 or 1, and which is repeated as many times as needed until a 1 occurs. The geometric random variable describes the probability that the first 1 occurs at exactly the nth trial. Let p be a number with 0 < p ≤ 1. The geometric random variable with parameter p is the random variable with p.d.f.

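$$p(n) = (1 - p)^{n-1} p, \qquad n = 1, 2, 3, \dots$$

The moment generating function is

$$E\big[e^{tX}\big] = \sum_{n=1}^{\infty} e^{tn} (1 - p)^{n-1} p = \begin{cases} \frac{p e^t}{1 - e^t (1 - p)} & \text{if } e^t (1 - p) < 1, \\ +\infty & \text{otherwise.} \end{cases}$$

The mean and the variance are

$$E[X] = \frac{1}{p}, \qquad \mathrm{var}(X) = \frac{1 - p}{p^2}.$$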

We write X = Geometric(p) to denote this random variable.

  1. Poisson Random Variable Let λ be a positive number. The Poisson random variable with parameter λ is the discrete random variable which takes values in {0, 1, 2, …} and has p.d.f.

$$p(n) = e^{-\lambda} \frac{\lambda^n}{n!}, \qquad n = 0, 1, 2, \dots$$

The moment generating function is

$$E\big[e^{tX}\big] = \sum_{n=0}^{\infty} e^{tn} \frac{\lambda^n}{n!} e^{-\lambda} = e^{\lambda(e^t - 1)}.$$

The mean and the variance are

$$E[X] = \lambda, \qquad \mathrm{var}(X) = \lambda.$$

We write X = Poisson(λ) to denote this random variable.

1.3 Simulating Random Variables

In this section we discuss a few techniques to simulate a given random variable on a computer. The first step, which is built into any computer, is the simulation of a random number, i.e., the simulation of a uniform random variable U([0, 1]) rounded off to the nearest $10^{-n}$. In principle this is not difficult: take ten slips of paper numbered 0, 1, …, 9, place them in a hat and successively select n slips, with replacement, from the hat. The sequence of digits obtained (with a decimal point in front) is the value of a uniform random variable rounded off to the nearest $10^{-n}$. In pre-computer times, tables of random numbers were produced in that way, and such tables can still be found. This is of course not the way an actual computer generates a random number. A computer usually generates a random number by using a deterministic algorithm which produces a pseudo-random number that "looks like" a random number. For example, choose positive integers a, c and m and set

$$X_{n+1} = (a X_n + c) \bmod m.$$

Each $X_n$ is one of the numbers 0, 1, …, m − 1, and the quantity $X_n/m$ is taken to be an approximation of a uniform random variable. One can show that for suitable a, c and m this is a good approximation. This algorithm is just one of many used in practice. Generating good random numbers is a nice, interesting, and classical problem in computer science. For our purpose we will simply assume that there is a "black box" in the computer which generates U([0, 1]) in a satisfactory manner. We start with a very easy example, namely simulating a discrete random variable X.
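Before that example, here is a minimal Python sketch of the linear congruential recursion $X_{n+1} = (a X_n + c) \bmod m$ described above. The default constants a = 1664525, c = 1013904223, m = 2^32 are an assumption (a commonly quoted choice), not values taken from these notes; the function name `lcg` is hypothetical.

```python
def lcg(seed, a=1664525, c=1013904223, m=2**32):
    """Yield X_n / m for X_{n+1} = (a * X_n + c) mod m.

    The default constants are an illustrative assumption, not from the notes.
    """
    x = seed % m
    while True:
        x = (a * x + c) % m
        yield x / m  # pseudo-random approximation of a U([0, 1]) number


gen = lcg(seed=2007)
sample = [next(gen) for _ in range(10_000)]
```

In practice one relies on a well-tested library generator (the "black box" above) rather than a hand-rolled recursion like this one.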

Algorithm 1.3.1 (Discrete random variable) Let X be a discrete random variable taking the values $x_1, x_2, \dots$ with p.d.f. $p(j) = P\{X = x_j\}$. To simulate X,

  • Generate a random number U = U([0, 1]).
  • Set

$$X = \begin{cases} x_1 & \text{if } U < p(1), \\ x_2 & \text{if } p(1) \le U < p(1) + p(2), \\ \ \vdots & \\ x_n & \text{if } p(1) + \cdots + p(n-1) \le U < p(1) + \cdots + p(n), \\ \ \vdots & \end{cases}$$

Then X has the desired distribution.
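A minimal Python sketch of Algorithm 1.3.1, assuming the values and their probabilities are given as two finite lists (the function name `sample_discrete` and the three-point example are hypothetical):

```python
import random


def sample_discrete(values, probs, rng=random):
    """Algorithm 1.3.1: return values[j] when p(1)+...+p(j-1) <= U < p(1)+...+p(j)."""
    u = rng.random()  # U = U([0, 1])
    cumulative = 0.0
    for x, p in zip(values, probs):
        cumulative += p
        if u < cumulative:
            return x
    return values[-1]  # guards against round-off when sum(probs) is just below 1


rng = random.Random(0)
draws = [sample_discrete(["a", "b", "c"], [0.2, 0.5, 0.3], rng) for _ in range(20_000)]
freq_b = draws.count("b") / len(draws)  # close to p(2) = 0.5
```

The loop scans the cumulative sums in order; when many values are involved, listing the most probable values first reduces the average number of comparisons.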

We next discuss two general methods for simulating continuous random variables. The first is called the inverse transformation method and is based on the following

Algorithm 1.3.5 (Rejection method for continuous random variables) Let X be a random variable with p.d.f. f(x) and let Y be a random variable with p.d.f. g(x). Furthermore, assume that there exists a constant C such that

$$\frac{f(y)}{g(y)} \le C \quad \text{for all } y.$$

To simulate X

  • Step 1 Simulate Y with density g.
  • Step 2 Simulate a random number U.
  • Step 3 If $U \le \frac{f(Y)}{C g(Y)}$, set X = Y. Otherwise return to Step 1.

That the algorithm does the job is the object of the following proposition.

Proposition 1.3.6 The random variable X generated by the rejection method has p.d.f f (x).

Proof: To obtain a value of X we will in general need to iterate the algorithm a random number of times: we generate random variables $Y_1, \dots, Y_N$ until $Y_N$ is accepted, and then set $X = Y_N$. We need to verify that the p.d.f. of X is actually f(x). We have

$$\begin{aligned} P(X \le x) &= P(Y_N \le x) = P\Big(Y \le x \,\Big|\, U \le \frac{f(Y)}{C g(Y)}\Big) = \frac{P\Big(Y \le x,\ U \le \frac{f(Y)}{C g(Y)}\Big)}{P\Big(U \le \frac{f(Y)}{C g(Y)}\Big)} \\ &= \frac{\int_{-\infty}^{\infty} P\Big(Y \le x,\ U \le \frac{f(Y)}{C g(Y)} \,\Big|\, Y = y\Big) g(y)\, dy}{P\Big(U \le \frac{f(Y)}{C g(Y)}\Big)} = \frac{\int_{-\infty}^{x} P\Big(U \le \frac{f(y)}{C g(y)}\Big) g(y)\, dy}{P\Big(U \le \frac{f(Y)}{C g(Y)}\Big)} \\ &= \frac{\int_{-\infty}^{x} \frac{f(y)}{C g(y)}\, g(y)\, dy}{P\Big(U \le \frac{f(Y)}{C g(Y)}\Big)} = \frac{\int_{-\infty}^{x} f(y)\, dy}{C\, P\Big(U \le \frac{f(Y)}{C g(Y)}\Big)}. \end{aligned}$$

If we let x → ∞ we obtain that $C\, P\big(U \le \frac{f(Y)}{C g(Y)}\big) = 1$ and thus

$$P(X \le x) = \int_{-\infty}^{x} f(y)\, dy,$$

and this shows that X has p.d.f. f(x).

In order to decide whether this method is efficient or not, we need to ensure that rejections occur with small probability. The above proof shows that at each iteration the probability that the result is accepted is

$$P\Big(U \le \frac{f(Y)}{C g(Y)}\Big) = \frac{1}{C},$$

independently of the other iterations. Therefore the number of iterations needed is Geom(1/C), with mean C. Hence the ability to choose a reasonably small C ensures that the method is efficient.
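The following Python sketch of Algorithm 1.3.5 also counts iterations, so that the claim "the number of iterations is Geom(1/C) with mean C" can be observed. The test density f(x) = 2x on (0, 1) with g = U([0, 1]) and C = 2 is an assumption chosen for illustration (it is not from the notes); the function name `rejection_sample` is hypothetical.

```python
import random


def rejection_sample(f, g, sample_g, C, rng=random):
    """Algorithm 1.3.5: return (X, N), X with density f and N the number of iterations.

    Assumes f(y) <= C * g(y) for all y and that sample_g(rng) draws Y with density g.
    """
    n_iter = 0
    while True:
        n_iter += 1
        y = sample_g(rng)           # Step 1: simulate Y with density g
        u = rng.random()            # Step 2: simulate a random number U
        if u <= f(y) / (C * g(y)):  # Step 3: accept with probability f(Y) / (C g(Y))
            return y, n_iter


# Illustration (an assumption): f(x) = 2x on (0, 1), g = U([0, 1]), C = 2.
rng = random.Random(1)
draws = [
    rejection_sample(lambda x: 2 * x, lambda x: 1.0, lambda r: r.random(), 2.0, rng)
    for _ in range(20_000)
]
mean_x = sum(x for x, _ in draws) / len(draws)     # close to E[X] = 2/3
mean_iter = sum(n for _, n in draws) / len(draws)  # close to C = 2
```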

Example 1.3.7 Let X be the random variable with p.d.f.

$$f(x) = 20x(1 - x)^3, \qquad 0 < x < 1.$$

Since the p.d.f. is concentrated on [0, 1], let us take

$$g(x) = 1, \qquad 0 < x < 1.$$

To determine C such that $f(x)/g(x) \le C$ we need to maximize the function $h(x) \equiv f(x)/g(x) = 20x(1 - x)^3$. Differentiating gives $h'(x) = 20\big((1 - x)^3 - 3x(1 - x)^2\big) = 20(1 - x)^2(1 - 4x)$, and thus the maximum is attained at x = 1/4. Thus

$$\frac{f(x)}{g(x)} \le 20 \cdot \frac{1}{4} \left(\frac{3}{4}\right)^3 = \frac{135}{64} \equiv C.$$

We obtain

$$\frac{f(x)}{C g(x)} = \frac{256}{27}\, x(1 - x)^3,$$

and the rejection method is

  • Step 1 Generate random numbers $U_1$ and $U_2$.
  • Step 2 If $U_2 \le \frac{256}{27} U_1 (1 - U_1)^3$, stop and set $X = U_1$. Otherwise return to Step 1.

The average number of iterations is C = 135/64 ≈ 2.11.
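A direct Python transcription of the two steps of Example 1.3.7 (the function name `sample_example_137` is hypothetical). As a check, a short computation gives $E[X] = \int_0^1 x \cdot 20x(1-x)^3\, dx = 1/3$, and the average number of iterations should be close to C = 135/64.

```python
import random

C = 135 / 64  # maximum of f(x) / g(x) = 20 x (1 - x)^3, attained at x = 1/4


def sample_example_137(rng=random):
    """Rejection method of Example 1.3.7; returns (X, number of iterations)."""
    iterations = 0
    while True:
        iterations += 1
        u1, u2 = rng.random(), rng.random()        # Step 1: two random numbers
        if u2 <= (256 / 27) * u1 * (1 - u1) ** 3:  # Step 2: accept with prob f(U1) / (C g(U1))
            return u1, iterations


rng = random.Random(11)
results = [sample_example_137(rng) for _ in range(20_000)]
mean_x = sum(x for x, _ in results) / len(results)     # close to E[X] = 1/3
mean_iter = sum(k for _, k in results) / len(results)  # close to C = 135/64
```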

Algorithm 1.3.10 (Geometric random variable)

  • Step 1 Generate a random number U.
  • Step 2 Set $X = \left\lceil \frac{\log(U)}{\log(1 - p)} \right\rceil$.

Then X = Geom(p).
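A Python sketch of Algorithm 1.3.10. The algorithm works because, for n ≥ 0, $P(X > n) = P\big(\log U < n \log(1-p)\big) = P\big(U < (1-p)^n\big) = (1-p)^n$. Two assumptions of this sketch: it uses 1 − U (again a random number) to avoid log(0), and it restricts to 0 < p < 1; the function name `sample_geometric` is hypothetical.

```python
import math
import random


def sample_geometric(p, rng=random):
    """Algorithm 1.3.10: X = ceil(log(U) / log(1 - p)), assuming 0 < p < 1."""
    if not 0.0 < p < 1.0:
        raise ValueError("this sketch assumes 0 < p < 1")
    u = 1.0 - rng.random()  # U in (0, 1], avoids log(0)
    # max(1, ...) only matters in the (essentially impossible) event U == 1
    return max(1, math.ceil(math.log(u) / math.log(1.0 - p)))


rng = random.Random(3)
p = 0.3
draws = [sample_geometric(p, rng) for _ in range(20_000)]
mean = sum(draws) / len(draws)  # close to E[X] = 1/p
```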

Example 1.3.11 (Simulating the Gamma random variable) Using the fact that Gamma(n, λ) is a sum of n independent Exp(λ) random variables, one immediately obtains

Algorithm 1.3.12 (Gamma random variable)

  • Step 1 Generate n random numbers $U_1, \dots, U_n$.
  • Step 2 Set $X_i = -\frac{1}{\lambda} \log(U_i)$ for i = 1, …, n.
  • Step 3 Set $X = X_1 + \cdots + X_n$.

Then X = Gamma(n, λ).
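A Python sketch of Algorithm 1.3.12. Each $-\frac{1}{\lambda}\log(U_i)$ is Exp(λ) since $P(-\log(U)/\lambda > x) = P(U < e^{-\lambda x}) = e^{-\lambda x}$. As an assumption of this sketch, 1 − U (again a random number) replaces U to avoid log(0); the function name and parameter values are hypothetical.

```python
import math
import random


def sample_gamma(n, lam, rng=random):
    """Algorithm 1.3.12: X = X_1 + ... + X_n with X_i = -log(U_i) / lam."""
    total = 0.0
    for _ in range(n):
        u = 1.0 - rng.random()  # U in (0, 1], avoids log(0)
        total += -math.log(u) / lam
    return total


rng = random.Random(7)
n, lam = 3, 2.0
draws = [sample_gamma(n, lam, rng) for _ in range(20_000)]
mean = sum(draws) / len(draws)                           # close to n / lam = 1.5
var = sum((x - mean) ** 2 for x in draws) / len(draws)   # close to n / lam**2 = 0.75
```

This approach requires n logarithms per sample, which is fine for small integer n as in these notes.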

Finally we give an elegant algorithm which generates 2 independent normal random variables.

Example 1.3.13 (Simulating a normal random variable: Box-Müller) We show a simple way to generate 2 independent standard normal random variables X and Y. The joint p.d.f. of X and Y is given by

$$f(x, y) = \frac{1}{2\pi}\, e^{-\frac{x^2 + y^2}{2}}.$$

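Let us change to polar coordinates (r, θ) with $r^2 = x^2 + y^2$ and $\tan(\theta) = y/x$. The change of variables formula gives

$$f(x, y)\, dx\, dy = r e^{-\frac{r^2}{2}}\, dr\, \frac{1}{2\pi}\, d\theta.$$

Consider further the change of variables $s = r^2$, so that

$$f(x, y)\, dx\, dy = \frac{1}{2} e^{-\frac{s}{2}}\, ds\, \frac{1}{2\pi}\, d\theta.$$

The right-hand side is easily seen to be the joint p.d.f. of the two independent random variables S = Exp(1/2) and Θ = U([0, 2π]). Moreover, if $U_1$ and $U_2$ are independent random numbers, then $-2\log(U_1) = \text{Exp}(1/2)$, since $P(-2\log(U_1) > s) = P(U_1 < e^{-s/2}) = e^{-s/2}$, and $2\pi U_2 = U([0, 2\pi])$. Since $X = \sqrt{S}\cos\Theta$ and $Y = \sqrt{S}\sin\Theta$, we obtain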

Algorithm 1.3.14 (Standard normal random variable)

  • Step 1 Generate two random numbers $U_1$ and $U_2$.
  • Step 2 Set

$$X = \sqrt{-2\log(U_1)}\, \cos(2\pi U_2), \tag{1.4}$$

$$Y = \sqrt{-2\log(U_1)}\, \sin(2\pi U_2). \tag{1.5}$$

Then X and Y are 2 independent N(0, 1).
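A Python sketch of Algorithm 1.3.14, with a rough empirical check of the mean, variance and covariance. As an assumption of this sketch, $U_1$ is drawn as 1 − U (again a random number) so that $\log(U_1)$ is always finite; the function name `box_muller` is hypothetical.

```python
import math
import random


def box_muller(rng=random):
    """Algorithm 1.3.14: return two independent N(0, 1) samples (X, Y)."""
    u1 = 1.0 - rng.random()                  # in (0, 1], so log(u1) is finite
    u2 = rng.random()
    radius = math.sqrt(-2.0 * math.log(u1))  # sqrt(S) with S = Exp(1/2)
    angle = 2.0 * math.pi * u2               # Theta = U([0, 2 pi])
    return radius * math.cos(angle), radius * math.sin(angle)


rng = random.Random(42)
pairs = [box_muller(rng) for _ in range(20_000)]
xs = [x for x, _ in pairs]
ys = [y for _, y in pairs]
mean_x = sum(xs) / len(xs)
mean_y = sum(ys) / len(ys)
var_x = sum(x * x for x in xs) / len(xs) - mean_x**2
cov_xy = sum(x * y for x, y in pairs) / len(pairs) - mean_x * mean_y
```

Zero sample covariance is of course only consistent with independence, not a proof of it; the proof is the change of variables above.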

1.4 Markov, Chebyshev, and Chernov

We start by deriving simple techniques for bounding the tail distribution of a random variable, i.e., bounding the probability that the random variable takes values far from its mean. Our first inequality, called Markov's inequality, simply assumes that we know the mean of X.

Proposition 1.4.1 (Markov's Inequality) Let X be a random variable which assumes only nonnegative values, i.e. P(X ≥ 0) = 1. Then for any a > 0 we have

$$P(X \ge a) \le \frac{E[X]}{a}.$$

Proof: For a > 0 let us define the random variable

$$I_a = \begin{cases} 1 & \text{if } X \ge a, \\ 0 & \text{otherwise.} \end{cases}$$

Note that, since X ≥ 0, we have

$$I_a \le \frac{X}{a}, \tag{1.6}$$

and that, since $I_a$ is a Bernoulli random variable,

$$E[I_a] = P(X \ge a).$$

Taking expectations in the inequality (1.6) gives

$$P(X \ge a) = E[I_a] \le E\Big[\frac{X}{a}\Big] = \frac{E[X]}{a}.$$
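To see how crude Markov's inequality can be, one can compare it with an exact tail. The sketch below assumes X = Exp(λ), for which $P(X \ge a) = e^{-\lambda a}$ and $E[X] = 1/\lambda$ (both from the examples in Section 1.2); the choice λ = 1 and the list of thresholds are arbitrary.

```python
import math


def markov_vs_exact(lam, thresholds):
    """Rows (a, exact P(X >= a), Markov bound E[X]/a) for X = Exp(lam)."""
    rows = []
    for a in thresholds:
        exact = math.exp(-lam * a)  # tail of Exp(lam)
        markov = (1.0 / lam) / a    # E[X] / a
        rows.append((a, exact, markov))
    return rows


rows = markov_vs_exact(1.0, [0.5, 1.0, 2.0, 5.0, 10.0])
# At a = 10 the exact tail is about 4.5e-5, while Markov only gives 0.1.
```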

This is significantly better than the bound provided by Markov's inequality! Note also that we can do a bit better by noting that the distribution of $S_n$ is symmetric around its mean, and thus we can replace 4/n by 2/n.

We can do better if we know all moments of the random variable X, for example if we know its moment generating function $M_X(t)$. We have

Proposition 1.4.5 (Chernov's bounds) Let X be a random variable with moment generating function $M_X(t) = E[e^{tX}]$.

  • For any a we have

$$P(X \ge a) \le \min_{t \ge 0} \frac{E[e^{tX}]}{e^{ta}}.$$

  • For any a we have

$$P(X \le a) \le \min_{t \le 0} \frac{E[e^{tX}]}{e^{ta}}.$$

Proof: This follows from Markov's inequality. For t > 0 we have

$$P(X \ge a) = P(e^{tX} \ge e^{ta}) \le \frac{E[e^{tX}]}{e^{ta}}.$$

Since t > 0 is arbitrary (and for t = 0 the bound equals 1), we obtain

$$P(X \ge a) \le \min_{t \ge 0} \frac{E[e^{tX}]}{e^{ta}}.$$

Similarly, for t < 0 we have

$$P(X \le a) = P(e^{tX} \ge e^{ta}) \le \frac{E[e^{tX}]}{e^{ta}},$$

and thus

$$P(X \le a) \le \min_{t \le 0} \frac{E[e^{tX}]}{e^{ta}}.$$

Let us consider again our coin-flipping example.

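Example 1.4.6 (Flipping coins, cont'd) Since $S_n$ is a binomial B(n, 1/2) random variable, its moment generating function is given by $M_{S_n}(t) = \big(\frac{1}{2} + \frac{1}{2}e^t\big)^n$. To estimate $P(S_n \ge 3n/4)$ we apply the Chernov bound with t > 0 and obtain

$$P\Big(S_n \ge \frac{3n}{4}\Big) \le \frac{\big(\frac{1}{2} + \frac{1}{2}e^t\big)^n}{e^{\frac{3nt}{4}}} = \Big(\frac{1}{2}e^{-\frac{3t}{4}} + \frac{1}{2}e^{\frac{t}{4}}\Big)^n.$$

To find the optimal bound we minimize the function $f(t) = \frac{1}{2}e^{-\frac{3t}{4}} + \frac{1}{2}e^{\frac{t}{4}}$. Setting $f'(t) = -\frac{3}{8}e^{-\frac{3t}{4}} + \frac{1}{8}e^{\frac{t}{4}} = 0$ gives $e^t = 3$, so the minimum is at t = log 3 and

$$f(\log 3) = \frac{1}{2}\Big(e^{-\frac{3}{4}\log 3} + e^{\frac{1}{4}\log 3}\Big) = \frac{1}{2} e^{\frac{1}{4}\log 3}\big(e^{-\log 3} + 1\big) = \frac{2}{3}\, 3^{\frac{1}{4}} \simeq 0.877,$$

and thus we obtain

$$P\Big(S_n \ge \frac{3n}{4}\Big) \le 0.877^n.$$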

This is of course much better than 2/n. For n = 100, Chebyshev's inequality tells us that the probability of obtaining at least 75 heads is not bigger than 0.02, while the Chernov bound tells us that it is actually not greater than $2.08 \times 10^{-6}$.
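The numbers in Example 1.4.6 can be reproduced, and compared with the exact binomial tail, with a few lines of Python. This is a minimal sketch: the helper names are hypothetical, and the exact tail is computed with `math.comb` (Python 3.8+).

```python
import math


def f(t):
    """The function minimized in Example 1.4.6."""
    return 0.5 * math.exp(-0.75 * t) + 0.5 * math.exp(0.25 * t)


def chernov_coin_bound(n):
    """Optimized Chernov bound f(log 3)^n = ((2/3) * 3^(1/4))^n for P(S_n >= 3n/4)."""
    return ((2.0 / 3.0) * 3.0**0.25) ** n


def exact_coin_tail(n):
    """Exact P(S_n >= 3n/4) for S_n = B(n, 1/2)."""
    k_min = math.ceil(3 * n / 4)
    return sum(math.comb(n, k) for k in range(k_min, n + 1)) / 2**n


n = 100
chebyshev = 2 / n              # symmetric Chebyshev bound from the text
chernov = chernov_coin_bound(n)
exact = exact_coin_tail(n)
```

The exact tail is smaller still than the Chernov bound; the point of the bound is that it decays exponentially in n while requiring only the m.g.f.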

1.5 Limit Theorems

In this section we study the behavior, for large n, of a sum of independent identically distributed random variables. That is, let $X_1, X_2, \dots$ be a sequence of independent random variables which all have the same distribution. We denote by $S_n$ the sum

$$S_n = X_1 + \cdots + X_n.$$

Under suitable conditions $S_n$ exhibits a universal behavior which does not depend on all the details of the distribution of the $X_i$'s but only on a few of its characteristics, such as the mean or the variance. The first result is the weak law of large numbers. It tells us that if we perform a large number of independent trials, the average value of our trials is close to the mean with probability close to 1. The proof is not very difficult, but it is a very important result!

Theorem 1.5.1 (The weak Law of Large Numbers) Let $X_1, X_2, \dots$ be a sequence of independent identically distributed random variables with mean μ and variance $\sigma^2$. Let $S_n = X_1 + \cdots + X_n$. Then for any ε > 0

$$\lim_{n \to \infty} P\Big(\Big|\frac{S_n}{n} - \mu\Big| \ge \varepsilon\Big) = 0.$$
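The statement can be illustrated by simulation: estimate $P(|S_n/n - \mu| \ge \varepsilon)$ by the fraction of repeated experiments in which the deviation occurs, for increasing n. This sketch assumes $X_i = U([0, 1])$ (so μ = 1/2), ε = 0.05 and 500 repetitions; all of these, and the helper name, are illustrative choices rather than anything fixed by the notes.

```python
import random


def deviation_frequency(n, eps, trials, rng):
    """Fraction of trials with |S_n / n - mu| >= eps, for X_i = U([0, 1]) (mu = 1/2)."""
    mu = 0.5
    hits = 0
    for _ in range(trials):
        s_n = sum(rng.random() for _ in range(n))
        if abs(s_n / n - mu) >= eps:
            hits += 1
    return hits / trials


rng = random.Random(5)
freqs = {n: deviation_frequency(n, eps=0.05, trials=500, rng=rng) for n in (10, 100, 1000)}
# The estimated probabilities shrink as n grows, as the theorem predicts.
```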

Proof: By the linearity of expectation we have

$$E\Big[\frac{S_n}{n}\Big] = \frac{1}{n} E[X_1 + \cdots + X_n] = \frac{n\mu}{n} = \mu.$$