Stochastic Processes and Simulation - Lecture Notes | M 362K, Study notes of Probability and Statistics

Material Type: Notes; Class: PROBABILITY I; Subject: Mathematics; University: University of Texas - Austin; Term: Spring 2008;

Typology: Study notes

Pre 2010

Uploaded on 08/30/2009

koofers-user-itm
koofers-user-itm 🇺🇸

9 documents

1 / 7

Toggle sidebar

This page cannot be seen from the preview

Don't miss anything!

bg1
Lecture 3: STOCHASTIC PROCESSES AND SIMULATION 1of 7
Course: M362K Stochastic Processes I
Term: Spring 2008
Instructor: Gordan Zitkovic
Lecture 3
STOCHASTIC PROCESSES AND SIMULATION
3.1. Stochastic Processes
Definition 3.1. Let Tbe a subset of [0,). A family of random variables {Xt}t∈T , indexed by T, is called
astochastic (or random) process. When T=N(or T=N0), {Xt}t∈T is said to be a discrete-time
process, and when T= [0,), it is called a continuous-time process.
When Tis a singleton (say T={1}), the process {Xt}t∈T X1is really just a single random variable.
When Tis finite (e.g., T={1,2, . . . , n}), we get a random vector. Therefore, stochastic processes are
generalizations of random vectors. The interpretation is, however, somewhat different. While the compo-
nents of a random vector usually (not always) stand for different spatial coordinates, the index tTis
more often than not interpreted as time. Stochastic processes usually model the evolution of a random
system in time. When T= [0,)(continuous-time processes), the value of the process can change every
instant. When T=N(discrete-time processes), the changes occur discretely.
In contrast to the case of random vectors or random variables, it is not easy to define a notion of a
density (or a probability mass function) for a stochastic process. Without going into details why exactly
this is a problem, let me just mention that the main culprit is the infinity. One usually considers a family of
(discrete, continuous, etc.) finite-dimensional distributions, i.e., the joint distributions of random vectors
(Xt1, Xt2, . . . , Xtn),
for all nNand all choices t1, . . . , tn T .
The notion of a stochastic processes is very important both in mathematical theory and its applications
in science, engineering, economics, etc. It is used to model a large number of various phenomena where
the quantity of interest varies discretely or continuously through time in a non-predictable fashion.
Every stochastic process can be viewed as a function of two variables - tand ω. For each fixed t,
ω7→ Xt(ω)is a random variable, as postulated in the definition. However, if we change our point of view
and keep ωfixed, we see that the stochastic process is a function mapping ωto the real-valued function
t7→ Xt(ω). These functions are called the trajectories of the stochastic process X.
æ
æ
æ
æ
æ
æ
æ
æ
æ
æ
æ
æ
æ
æ
æ
æ
æ
æ
æ
æ
æ
5
10
15
20
-6
-4
-2
2
4
6
æ
æ
æ
æ
æ
æ
æ
æ
æ
æ
æ
æ
æ
æ
æ
æ
æ
æ
æ
æ
æ
5
10
15
20
-6
-4
-2
2
4
6
Figures on the left show two different
trajectories of a simple random walka,
i.e., each one corresponds to a (different)
frozen ω, but tgoes from 0to 30.
aWe will define the simple random walk later. For
now, let us just say that is behaves as follows. It starts
at x= 0 for t= 0. After that a fair coin is tossed and
we move up (to x= 1) if heads is observed and down
to x=1is we see tails. The procedure is repeated
at t= 1,2,... and the position at t+1 is determined
in the same way, independently of all the coin tosses
before (note that the position at t=kcan be any of
the following x=k,x=k+ 2, ..., x=k2,
x=k).
Last Updated: September 8, 2008
pf3
pf4
pf5

Partial preview of the text

Download Stochastic Processes and Simulation - Lecture Notes | M 362K and more Study notes Probability and Statistics in PDF only on Docsity!

Course: M362K Stochastic Processes I Term: Spring 2008 Instructor: Gordan Zitkovic

Lecture 3

Stochastic processes and simulation

3.1. Stochastic Processes

Definition 3.1. Let T be a subset of [0, ∞). A family of random variables {Xt}t∈T , indexed by T , is called a stochastic (or random) process. When T = N (or T = N 0 ), {Xt}t∈T is said to be a discrete-time process, and when T = [0, ∞), it is called a continuous-time process.

When T is a singleton (say T = { 1 }), the process {Xt}t∈T ≡ X 1 is really just a single random variable. When T is finite (e.g., T = { 1 , 2 ,... , n}), we get a random vector. Therefore, stochastic processes are generalizations of random vectors. The interpretation is, however, somewhat different. While the compo- nents of a random vector usually (not always) stand for different spatial coordinates, the index t ∈ T is more often than not interpreted as time. Stochastic processes usually model the evolution of a random system in time. When T = [0, ∞) (continuous-time processes), the value of the process can change every instant. When T = N (discrete-time processes), the changes occur discretely. In contrast to the case of random vectors or random variables, it is not easy to define a notion of a density (or a probability mass function) for a stochastic process. Without going into details why exactly this is a problem, let me just mention that the main culprit is the infinity. One usually considers a family of (discrete, continuous, etc.) finite-dimensional distributions, i.e., the joint distributions of random vectors

(Xt 1 , Xt 2 ,... , Xtn ),

for all n ∈ N and all choices t 1 ,... , tn ∈ T. The notion of a stochastic processes is very important both in mathematical theory and its applications in science, engineering, economics, etc. It is used to model a large number of various phenomena where the quantity of interest varies discretely or continuously through time in a non-predictable fashion.

Every stochastic process can be viewed as a function of two variables - t and ω. For each fixed t, ω 7 → Xt(ω) is a random variable, as postulated in the definition. However, if we change our point of view and keep ω fixed, we see that the stochastic process is a function mapping ω to the real-valued function t 7 → Xt(ω). These functions are called the trajectories of the stochastic process X.

Ê

Ê Ê

Ê

Ê Ê

Ê Ê Ê Ê

Ê

Ê Ê Ê

Ê Ê

Ê

Ê

Ê

Ê

Ê

5 10 15 20

  • 6
  • 4
  • 2

2

4

6

Ê Ê

Ê Ê

Ê

Ê

Ê

Ê

Ê

Ê Ê Ê Ê

Ê Ê Ê Ê Ê Ê

Ê 5 10 15 Ê 20

  • 6
  • 4
  • 2

2

4

6

Figures on the left show two different trajectories of a simple random walk a , i.e., each one corresponds to a (different) frozen ω ∈ Ω, but t goes from 0 to 30. a We will define the simple random walk later. For now, let us just say that is behaves as follows. It starts at x = 0 for t = 0. After that a fair coin is tossed and we move up (to x = 1) if heads is observed and down to x = − 1 is we see tails. The procedure is repeated at t = 1, 2 ,... and the position at t + 1 is determined in the same way, independently of all the coin tosses before (note that the position at t = k can be any of the following x = −k, x = −k + 2,... , x = k − 2 , x = k).

Unlike with the figures above, the two pictures on the right show two time- slices of the same random process; in each graph, the time t is fixed (t = 15 vs. t = 25) but the various values random variables X 15 and X 25 can take are presented through the probability mass functions.

0.00- 20 - 10 0 10 20

0.00- 20 - 10 0 10 20

3.2. The canonical probability space

When one deals with infinite-index (#T = +∞) stochastic processes, the construction of the probability space (Ω, F, P) to support a given model is usually quite a technical matter. This course does not suffer from that problem because all our models can be implemented on a special probability space. We start with the sample-space Ω: Ω = [0, 1] × [0, 1] × · · · = [0, 1]∞,

and any generic element of Ω will be a sequence ω = (ω 0 , ω 1 , ω 2 ,... ) of real numbers in [0, 1]. For n ∈ N 0 we define the mapping γn : Ω → [0, 1] which simply chooses the n-th coordinate :

γn(ω) = ωn.

The proof of the following theorem can be found in advanced probability books:

Theorem 3.2. There exists a σ -algebra F and a probability P on Ω such that

(1) each γn , n ∈ N 0 is a random variable with the uniform distribution on [0, 1] , and (2) the sequence {γn}n∈N 0 is independent.

Remark 3.3_._ One should think of the sample space Ω as a source of all the randomness in the system: the elementary event ω ∈ Ω is chosen by a process beyond out control and the exact value of ω is assumed to be unknown. All the other parts of the system are possibly complicated, but deterministic, functions of ω (random variables). When a coin is tossed, only a single drop of randomness is needed - the outcome of a coin-toss. When several coins are tossed, more randomness is involved and the sample space must be bigger. When a system involves an infinite number of random variables (like a stochastic process with infinite T ), a large sample space Ω is needed.

3.3. Constructing the Random Walk

Let us show how to construct the simple random walk on the canonical probability space (Ω, F, P) from Theorem 3.2. First of all, we need a definition of the simple random walk:

Definition 3.4. A stochastic process {Xn}n∈N 0 is called a simple random walk if

(1) X 0 = 0, (2) the increment Xn+1 − Xn is independent of (X 0 , X 1 ,... , Xn) for each n ∈ N 0 , and (3) the increment Xn+1 − Xn has the coin-toss distribution, i.e. P[Xn+1 − Xn = 1] = P[Xn+1 − Xn = −1] = 12. For the sequence {γn}n∈N, given by Theorem 3.2, define the following, new, sequence {ξn}n∈N of random variables:

ξn =

1 , γn ≥ (^12) − 1 , otherwise.

Such an implementation of rand will, however, hardly qualify for an RNG since the values it spits out come in a predictable order. We should, therefore, require any candidate for a random number generator to produce a sequence of numbers which is as unpredictable as possible. This is, admittedly, a hard task for a computer having only deterministic functions in its arsenal, and that is why the random generator design is such a difficult field. The state of the affairs is that we speak of good or less good random number generators, based on some statistical properties of the produced sequences of numbers. One of the most important requirements is that our RNG produce uniformly distributed numbers in [0, 1] - namely - the sequence of numbers produced by rand will have to cover the interval [0, 1] evenly, and, in the long run, the number of random numbers in each subinterval [a, b] of [0, 1] should be proportional to the length of the interval b − a. This requirement if hardly enough, because the sequence

0 , 0. 1 , 0. 2 ,... , 0. 8 , 0. 9 , 1 , 0. 05 , 0. 15 , 0. 25 ,... , 0. 85 , 0. 95 , 0. 025 , 0. 075 , 0. 125 , 0. 175 ,...

will do the trick while being perfectly predictable. To remedy the inadequacy of the RNGs satisfying only the requirement of uniform distribution, we might require rand to have the property that the pairs of produced numbers cover the square [0, 1] × [0, 1] uniformly. That means that, in the long run, the proportion of pairs falling in a patch A of the square [0, 1] × [0, 1] will be proportional to its area. Of course, one could continue with such requirements and ask for triples, quadruples,... of random numbers to be uniform in [0, 1]^3 , [0, 1]^4.... The highest dimension n such that the RNG produces uniformly distributed numbers in [0, 1]n^ is called the order of the RNG. A widely-used RNG called the Mersenne Twister , has the order of 623. Another problem with RNGs is that the numbers produced will start to repeat after a while (this is a fact of life and finiteness of your computer’s memory). The number of calls it takes for a RNG to start repeating its output is called the period of a RNG. You might have wondered how is it that an RNG produces a different number each time it is called, since, after all, it is only a function written in some programming language. Most often, RNGs use a hidden variable called the random seed which stores the last output of rand and is used as an (invisible) input to the function rand the next time it is called. If we use the same seed twice, the RNG will produce the same number, and so the period of the RNG is limited by the number of possible seeds. It is worth remarking that the actual random number generators usually produce a “random” integer between 0 and some large number RAND_MAX, and report the result normalized (divided) by RAND_MAX to get a number in [0, 1).

3.4.2. Simulation of Random Variables. Having found a random number generator good enough for our purposes (the one used by Mathematica is just fine), we might want to use it to simulate random variables with distributions different from the uniform on [0, 1] (coin-tosses, normal, exponential,... ). This is almost always achieved through transformations of the output of a RNG, and we will present several methods for dealing with this problem. A typical procedure (see the Box-Muller method below for an exception) works as follows: a real (deterministic) function f : [0, 1] → R - called the transformation function - is applied to rand. The result is a random variable whose distribution depends on the choice of f. Note that the transformation function is by no means unique. In fact, γ ∼ U [0, 1], then f (γ) and fˆ (γ), where f^ ˆ (x) = f (1 − x), have the same distribution (why?). What follows is a list of procedures commonly used to simulate popular random variables: (1) Discrete Random Variables Let X have a discrete distribution given by

X ∼

x 1 x 2... xn p 1 p 2... pn

For discrete distributions taking an infinite number of values we can always truncate at a very large n and approximate it with a distribution similar to the one of X. We know that the probabilities p 1 , p 2 ,... , pn add-up to 1, so we define the numbers 0 = q 0 < q 1 < · · · < qn = 1 by q 0 = 0, q 1 = p 1 , q 2 = p 1 + p 2 ,... qn = p 1 + p 2 + · · · + pn = 1.

To simulate our discrete random variable X, we call rand and then return x 1 if 0 ≤ rand < q 1 , return x 2 if q 1 ≤ rand < q 2 , and so on. It is quite obvious that this procedure indeed simulates a random variable X. The transformation function f is in this case given by

f (x) =

x 1 , 0 ≤ x < q 1 x 2 , q 1 ≤ x < q 2

... xn, qn− 1 ≤ x ≤ 1

(2) The Method of Inverse Functions The basic observation in this method is that , for any continuous random variable X with the distribution function FX , the random variable Y = FX (X) is uniformly distributed on [0, 1]. By inverting the distribution function FX and applying it to Y , we recover X. Therefore, if we wish to simulate a random variable with an invertible distribution function F , we first simulate a uniform random variable on [0, 1] (using rand) and then apply the function F −^1 to the result. In other words, use f = F −^1 as the transformation function. Of course, this method fails if we cannot write F −^1 in closed form. Example 3.6. (Exponential Distribution) Let us apply the method of inverse functions to the simulation of an exponentially distributed random variable X with parameter λ. Remember that the density fX of X is given by fX (x) = λ exp(−λx), x > 0 , and so FX (x) = 1 − exp(−λx), x > 0 , and so F (^) X− 1 (y) = − (^1) λ log(1−y). Since, 1 −rand has the same U[0, 1]-distribution as rand, we conclude that f (x) = − (^1) λ log(x) works as a transformation function in this case, i.e., that

log( rand ) λ has the required Exp(λ)-distribution. Example 3.7. (Cauchy Distribution) The Cauchy distribution is defined through its density func- tion fX (x) =

π

(1 + x^2 )

The distribution function FX can be determined explicitly in this example:

FX (x) =

π

∫ (^) x

−∞

(1 + x^2 )

dx =

π

( (^) π 2

  • arctan(x)

, and so F (^) X− 1 (y) = tan

π

y −

yielding that f (x) = tan(π(x − 12 )) is a transformation function for the Cauchy random variable, i.e., tan(π(rand − 0 .5)) will simulate a Cauchy random variable for you.

(3) The Box-Muller method This method is useful for simulating normal random variables, since for them the method of inverse function fails (there is no closed-form expression for the distribution function of a standard normal). Note that this method does not fall under that category of transfor- mation function methods as described above. You will see, though, that it is very similar in spirit. It is based on a clever trick, but the complete proof is a bit technical, so we omit it. Proposition 3.8. Let γ 1 and γ 2 be independent U [0, 1] -distributed random variables. Then the random variables X 1 =

−2 log(γ 1 ) cos(2πγ 2 ), X 2 =

−2 log(γ 1 ) sin(2πγ 2 ) are independent and standard normal (N(0,1)).

Theorem 3.10. (Law of Large Numbers) Let X 1 , X 2 ,... be a sequence of identically distributed random variables, and let g : R → R be function such that μ = E[g(X 1 )] (= E[g(X 2 )] =... ) exists. Then

g(X 1 ) + g(X 2 ) + · · · + g(Xn) n

→ μ =

−∞

g(x)fX 1 (x) dx, as n → ∞.

The key idea of Monte Carlo integration is the following Suppose that the quantity y we are interested in can be written as y =

−∞ g(x)fX^ (x)^ dx for some random variable X with density fX and come function g , and that x 1 , x 2 ,... are random numbers distributed according to the distribution with density fX_. Then the average_ 1 n

(g(x 1 ) + g(x 2 ) + · · · + g(xn)), will approximate y_._ It can be shown that the accuracy of the approximation behaves like 1 /

n, so that you have to quadruple the number of simulations if you want to double the precision of you approximation.

Example 3.11.

(1) (numerical integration) Let g be a function on [0, 1]. To approximate the integral

0 g(x)^ dx^ we can take a sequence of n (U[0,1]) random numbers x 1 , x 2 ,... , ∫ (^1)

0

g(x) dx ≈

g(x 1 ) + g(x 2 ) + · · · + g(xn) n

because the density of X ∼ U [0, 1] is given by

fX (x) =

1 , 0 ≤ x ≤ 1 0 , otherwise

(2) (estimating probabilities) Let Y be a random variable with the density function fY. If we are interested in the probability P[Y ∈ [a, b]] for some a < b, we simulate n draws y 1 , y 2 ,... , yn from the distribution FY and the required approximation is

P[Y ∈ [a, b]] ≈

number of yn’s falling in the interval [a, b] n

One of the nicest things about the Monte-Carlo method is that even if the density of the random variable is not available, but you can simulate draws from it, you can still preform the calculation above and get the desired approximation. Of course, everything works in the same way for probabilities involving random vectors in any number of dimensions. (3) (approximating π) We can devise a simple procedure for approximating π ≈ 3. 141592 by using the Monte-Carlo method. All we have to do is remember that π is the area of the unit disk. Therefore, π/ 4 equals to the portion of the area of the unit disk lying in the positive quadrant, and we can write π 4

0

0

g(x, y) dxdy,

where

g(x, y) =

1 , x^2 + y^2 ≤ 1 0 , otherwise. So, simulate n pairs (xi, yi), i = 1... n of uniformly distributed random numbers and count how many of them fall in the upper quarter of the unit circle, i.e. how many satisfy x^2 i + y^2 i ≤ 1 , and divide by n. Multiply your result by 4 , and you should be close to π. How close? Well, that is another story... Experiment!