21 The Exponential Distribution, Lecture notes of Stochastic Processes


Typology: Lecture notes

2022/2023

Uploaded on 03/01/2023

ekanga
21. The Exponential Distribution

From Discrete-Time to Continuous-Time:

In Chapter 6 of the text we will be considering Markov processes in continuous time. In a sense, we already have a very good understanding of continuous-time Markov chains based on our theory for discrete-time Markov chains. For example, one way to describe a continuous-time Markov chain is to say that it is a discrete-time Markov chain, except that we explicitly model the times between transitions with continuous, positive-valued random variables and we explicitly consider the process at any time t, not just at transition times.

The single most important continuous distribution for building and understanding continuous-time Markov chains is the exponential distribution, for reasons which we shall explore in this lecture.


The Exponential Distribution:

A continuous random variable X is said to have an Exponential(λ) distribution if it has probability density function

$$
f_X(x \mid \lambda) =
\begin{cases}
\lambda e^{-\lambda x} & \text{for } x > 0, \\
0 & \text{for } x \le 0,
\end{cases}
$$

where λ > 0 is called the rate of the distribution.

In the study of continuous-time stochastic processes, the exponential distribution is usually used to model the time until something happens in the process. The mean of the Exponential(λ) distribution is calculated using integration by parts as

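$$
\begin{aligned}
E[X] &= \int_0^\infty x \lambda e^{-\lambda x}\,dx \\
&= \lambda \left[ \left. \frac{-x e^{-\lambda x}}{\lambda} \right|_0^\infty + \frac{1}{\lambda} \int_0^\infty e^{-\lambda x}\,dx \right] \\
&= \lambda \left[ 0 + \frac{1}{\lambda} \left. \frac{-e^{-\lambda x}}{\lambda} \right|_0^\infty \right] \\
&= \lambda \cdot \frac{1}{\lambda^2} = \frac{1}{\lambda}.
\end{aligned}
$$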

So one can see that as λ gets larger, the mean waiting time 1/λ gets smaller: the event we are waiting for tends to happen sooner. This is why we think of λ as a rate.

As an exercise, you may wish to verify that by applying integration by parts twice, the second moment of the Exponential(λ) distribution is given by

$$
E[X^2] = \int_0^\infty x^2 \lambda e^{-\lambda x}\,dx = \cdots = \frac{2}{\lambda^2}.
$$
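As a quick sanity check on the two moment formulas, the integrals can be approximated numerically. The sketch below is only illustrative: the helper name `exponential_moment`, the midpoint rule, and the truncation point `50 / rate` are assumptions made here, not part of the notes.

```python
import math

def exponential_moment(rate, power, upper=None, steps=200_000):
    """Approximate E[X**power] for X ~ Exponential(rate) with the midpoint rule."""
    # Assumption: truncate the infinite integral where the density is negligible.
    if upper is None:
        upper = 50.0 / rate
    width = upper / steps
    total = 0.0
    for i in range(steps):
        x = (i + 0.5) * width  # midpoint of the i-th subinterval
        total += (x ** power) * rate * math.exp(-rate * x) * width
    return total

rate = 2.0
mean = exponential_moment(rate, 1)           # should be close to 1 / rate
second_moment = exponential_moment(rate, 2)  # should be close to 2 / rate**2
print(mean, 1 / rate)
print(second_moment, 2 / rate ** 2)
```

Both printed pairs should agree to several decimal places, which is consistent with E[X] = 1/λ and E[X^2] = 2/λ^2.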


To see how this works, imagine that at time 0 we start an alarm clock which will ring after a time X that is exponentially distributed with rate λ. Let us call X the lifetime of the clock. For any t > 0 , we have that

$$
P(X > t) = \int_t^\infty \lambda e^{-\lambda x}\,dx = \lambda \left. \frac{-e^{-\lambda x}}{\lambda} \right|_t^\infty = e^{-\lambda t}.
$$

Now we go away and come back at time s to discover that the alarm has not yet gone off. That is, we have observed the event {X > s}. If we let Y denote the remaining lifetime of the clock given that {X > s}, then

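$$
\begin{aligned}
P(Y > t \mid X > s) &= P(X > s + t \mid X > s) \\
&= \frac{P(X > s + t,\, X > s)}{P(X > s)} = \frac{P(X > s + t)}{P(X > s)} \\
&= \frac{e^{-\lambda(s+t)}}{e^{-\lambda s}} = e^{-\lambda t}.
\end{aligned}
$$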

But this implies that the remaining lifetime after we observe the alarm has not yet gone off at time s has the same distribution as the original lifetime X. The really important thing to note, though, is that this implies that the distribution of the remaining lifetime does not depend on s. In fact, if you try setting X to have any other continuous distribution, then ask what would be the distribution of the remaining lifetime after you observe {X > s}, the distribution will depend on s.
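The dependence on s (or lack of it) can be checked directly from survival functions. In this minimal sketch the function names and the Uniform(0, 5) comparison distribution are illustrative choices, not taken from the notes.

```python
import math

def exp_survival(t, rate):
    """P(X > t) for X ~ Exponential(rate)."""
    return math.exp(-rate * t) if t > 0 else 1.0

def uniform_survival(t, upper):
    """P(U > t) for U ~ Uniform(0, upper), used as a non-memoryless contrast."""
    return min(1.0, max(0.0, 1.0 - t / upper))

def remaining_survival(survival, s, t):
    """P(X > s + t | X > s), computed from a survival function."""
    return survival(s + t) / survival(s)

rate, t = 1.5, 0.4
observation_times = (0.5, 1.0, 3.0)
exp_cond = [remaining_survival(lambda x: exp_survival(x, rate), s, t)
            for s in observation_times]
unif_cond = [remaining_survival(lambda x: uniform_survival(x, 5.0), s, t)
             for s in observation_times]
print(exp_cond)   # the same value for every s, equal to exp(-rate * t)
print(unif_cond)  # changes with s
```

For the exponential lifetime the conditional probability is the same for every observation time s, while for the uniform lifetime it shrinks as s grows.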


This property is called the memoryless property of the exponential distribution because I don’t need to remember when I started the clock. If the distribution of the lifetime X is Exponential(λ), then if I come back to the clock at any time and observe that the clock has not yet gone off, regardless of when the clock started I can assert that the distribution of the time till it goes off, starting at the time I start observing it again, is Exponential(λ). Put another way, given that the clock has currently not yet gone off, I can forget the past and still know the distribution of the time from my current time to the time the alarm will go off. The resemblance of this property to the Markov property should not be lost on you.

It is a rather amazing, and perhaps unfortunate, fact that the exponential distribution is the only one for which this works. The memoryless property is like enabling technology for the construction of continuous-time Markov chains. We will see this more clearly in Chapter 6. But the exponential distribution is even more special than just the memoryless property because it has a second enabling type of property.

Another Important Property of the Exponential: Let $X_1, \ldots, X_n$ be independent random variables, with $X_i$ having an Exponential($\lambda_i$) distribution. Then the distribution of $\min(X_1, \ldots, X_n)$ is Exponential($\lambda_1 + \cdots + \lambda_n$), and the probability that the minimum is $X_i$ is $\lambda_i/(\lambda_1 + \cdots + \lambda_n)$.

Proof:

$$
\begin{aligned}
P(\min(X_1, \ldots, X_n) > t) &= P(X_1 > t, \ldots, X_n > t) \\
&= P(X_1 > t) \cdots P(X_n > t) \\
&= e^{-\lambda_1 t} \cdots e^{-\lambda_n t} \\
&= e^{-(\lambda_1 + \cdots + \lambda_n)t}.
\end{aligned}
$$
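Both claims about the minimum can be checked by simulation. The following is a rough Monte Carlo sketch using the standard library's `random.expovariate`; the function name `simulate_minimum`, the rates, the seed, and the number of trials are arbitrary illustrative choices.

```python
import random

def simulate_minimum(rates, trials=100_000, seed=0):
    """Estimate E[min(X_1..X_n)] and P(minimum is X_i) for independent exponentials."""
    rng = random.Random(seed)
    total_min = 0.0
    wins = [0] * len(rates)
    for _ in range(trials):
        draws = [rng.expovariate(r) for r in rates]
        smallest = min(draws)
        total_min += smallest
        wins[draws.index(smallest)] += 1  # record which variable was the minimum
    return total_min / trials, [w / trials for w in wins]

rates = [1.0, 2.0, 3.0]
mean_min, win_fractions = simulate_minimum(rates)
print(mean_min, 1 / sum(rates))                         # mean of Exponential(sum of rates)
print(win_fractions, [r / sum(rates) for r in rates])   # lambda_i / (lambda_1 + ... + lambda_n)
```

The estimates should be close to, but not exactly equal to, the theoretical values, since they carry sampling error.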


Example: (Ross, p.332 #20). Consider a two-server system in which a customer is served first by server 1, then by server 2, and then departs. The service times at server i are exponential random variables with rates $\mu_i$, i = 1, 2. When you arrive, you find server 1 free and two customers at server 2: customer A in service and customer B waiting in line.

(a) Find $P_A$, the probability that A is still in service when you move over to server 2.

(b) Find $P_B$, the probability that B is still in the system when you move over to server 2.

(c) Find E[T], where T is the time that you spend in the system.

Solution:

(a) A will still be in service when you move to server 2 if your service at server 1 ends before A’s remaining service at server 2 ends. Now A is currently in service at server 2 when you arrive, but because of memorylessness, A’s remaining service is Exponential($\mu_2$), and you start service at server 1 that is Exponential($\mu_1$). Therefore, $P_A$ is the probability that an Exponential($\mu_1$) random variable is less than an Exponential($\mu_2$) random variable, which is

$$
P_A = \frac{\mu_1}{\mu_1 + \mu_2}.
$$

(b) B will still be in the system when you move over to server 2 if your service time is less than the sum of A’s remaining service time and B’s service time. Let us condition on the first thing to happen, either A finishes service or you finish service:

$$
\begin{aligned}
P(B \text{ in system}) ={}& P(B \text{ in system} \mid A \text{ finishes before you}) \, \frac{\mu_2}{\mu_1 + \mu_2} \\
&+ P(B \text{ in system} \mid \text{you finish before } A) \, \frac{\mu_1}{\mu_1 + \mu_2}.
\end{aligned}
$$

Now P(B in system | you finish before A) = 1 since B will still be waiting in line when you move to server 2. On the other hand, if the first thing to happen is that A finishes service, then at that point, by memorylessness, your remaining service at server 1 is Exponential($\mu_1$), and B will still be in the system if your remaining service at server 1 is less than B’s service at server 2, and the probability of this is $\mu_1/(\mu_1 + \mu_2)$. That is,

$$
P(B \text{ in system} \mid A \text{ finishes before you}) = \frac{\mu_1}{\mu_1 + \mu_2}.
$$

Therefore,

$$
P(B \text{ in system}) = \frac{\mu_1 \mu_2}{(\mu_1 + \mu_2)^2} + \frac{\mu_1}{\mu_1 + \mu_2}.
$$
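The answers to parts (a) and (b) can be cross-checked by simulating the three service times involved. This is only a sketch: the function names, the rates μ1 = 2 and μ2 = 3, the seed, and the trial count are assumptions chosen for illustration.

```python
import random

def p_a(mu1, mu2):
    """Part (a): P(your service at server 1 ends before A's remaining service)."""
    return mu1 / (mu1 + mu2)

def p_b(mu1, mu2):
    """Part (b): formula obtained by conditioning on the first event."""
    return mu1 * mu2 / (mu1 + mu2) ** 2 + mu1 / (mu1 + mu2)

def simulate_p_b(mu1, mu2, trials=100_000, seed=1):
    """Estimate P(your service < A's remaining service + B's service)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        your_service = rng.expovariate(mu1)
        a_remaining = rng.expovariate(mu2)  # memorylessness: still Exponential(mu2)
        b_service = rng.expovariate(mu2)
        if your_service < a_remaining + b_service:
            hits += 1
    return hits / trials

mu1, mu2 = 2.0, 3.0
exact_b = p_b(mu1, mu2)
estimate_b = simulate_p_b(mu1, mu2)
print(p_a(mu1, mu2))
print(exact_b, estimate_b)
```

With these rates the formula gives 0.24 + 0.4 = 0.64, and the simulated fraction should land near that value.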

(c) To compute the expected time you are in the system, we first divide up your time in the system into $T = T_1 + R$, where $T_1$ is the time until the first thing that happens, and R is the rest of the time. The time until the first thing happens is Exponential($\mu_1 + \mu_2$), so that

$$
E[T_1] = \frac{1}{\mu_1 + \mu_2}.
$$

To compute E[R], we condition on what was the first thing to happen, either A finished service at server 2 or you finished service

22. The Poisson Process: Introduction

We now begin studying our first continuous-time process: the Poisson process. Its relative simplicity and significant practical usefulness make it a good introduction to more general continuous-time processes. Today we will look at several equivalent definitions of the Poisson process that, each in their own way, give some insight into the structure and properties of the Poisson process.

Definition 1 of a Poisson Process:

A continuous-time stochastic process {N(t) : t ≥ 0} is a Poisson process with rate λ > 0 if

(i) N(0) = 0.

(ii) It has stationary and independent increments.

(iii) The distribution of N(t) is Poisson with mean λt, i.e.,

$$
P(N(t) = k) = \frac{(\lambda t)^k}{k!} e^{-\lambda t} \quad \text{for } k = 0, 1, 2, \ldots
$$

This definition tells us some of the structure of a Poisson process immediately:

  • By stationary increments the distribution of N(t) − N(s), for s < t, is the same as the distribution of N(t − s) − N(0) = N(t − s), which is a Poisson distribution with mean λ(t − s).
  • The process is nondecreasing, for N(t) − N(s) ≥ 0 with probability 1 for any s < t since N(t) − N(s) has a Poisson distribution.
  • The state space of the process is clearly S = {0, 1, 2, ...}.

We can think of the Poisson process as counting events as it progresses: N(t) is the number of events that have occurred up to time t and at time t + s, N(t + s) − N(t) more events will have been counted, with N(t + s) − N(t) being Poisson distributed with mean λs.

For this reason the Poisson process is called a counting process. Counting processes are a more general class of processes of which the Poisson process is a special case. One common modeling use of the Poisson process is to interpret N(t) as the number of arrivals of tasks/jobs/customers to a system by time t.
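To make condition (iii) and the stationary-increments remark concrete, the Poisson probabilities can be evaluated directly. In this small sketch the helper names `poisson_pmf` and `increment_pmf`, the rate, and the truncation of the sum at k = 99 are illustrative assumptions.

```python
import math

def poisson_pmf(k, mean):
    """P(K = k) for K ~ Poisson(mean)."""
    return mean ** k * math.exp(-mean) / math.factorial(k)

def increment_pmf(k, rate, s, t):
    """P(N(t) - N(s) = k) for s < t, using stationary increments."""
    return poisson_pmf(k, rate * (t - s))

rate, t = 3.0, 2.0
probs = [poisson_pmf(k, rate * t) for k in range(100)]  # truncated at k = 99
total = sum(probs)
mean_count = sum(k * p for k, p in enumerate(probs))
print(total)       # close to 1
print(mean_count)  # close to rate * t
print(increment_pmf(0, rate, 1.0, 2.0), math.exp(-rate * 1.0))
```

The probabilities sum to 1 up to truncation error, the mean count matches λt, and the chance of no events in an interval of length t − s is exp(−λ(t − s)).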


Note that N(t) → ∞ as t → ∞, so that N(t) itself is by no means stationary, even though it has stationary increments. Also note that, in the customer arrival interpretation, as λ increases customers will tend to arrive faster, giving one justification for calling λ the rate of the process.

We can see where this definition comes from, and in the process try to see some more low level structure in a Poisson process, by considering a discrete-time analogue of the Poisson process, called a Bernoulli process, described as follows.

The Bernoulli Process: A Discrete-Time “Poisson Process”:

Suppose we divide up the positive half-line [0, ∞) into disjoint intervals, each of length h, where h is small. Thus we have the intervals [0, h), [h, 2h), [2h, 3h), and so on. Suppose further that each interval corresponds to an independent Bernoulli trial, such that in each interval, independently of every other interval, there is a successful event (such as an arrival) with probability λh. Define the Bernoulli process to be {B(t) : t = 0, h, 2h, 3h, ...}, where B(t) is the number of successful trials up to time t.

The above definition of the Bernoulli process clearly corresponds to the notion of a process in which events occur randomly in time, with an intensity, or rate, that increases as λ increases, so we can think of the Poisson process in this way too, assuming the Bernoulli process is a close approximation to the Poisson process. The way we have defined it, the Bernoulli process {B(t)} clearly has stationary and independent increments. As well, B(0) = 0. Thus the Bernoulli process is a discrete-time approximation to the Poisson process with rate λ if the distribution of B(t) is approximately Poisson(λt).
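Since B(t) counts successes in t/h independent trials, it is Binomial(t/h, λh), and the closeness to Poisson(λt) can be checked numerically. The sketch below assumes t is a multiple of h; the function names and the values λ = 2, t = 1 are illustrative choices.

```python
import math

def bernoulli_process_pmf(k, rate, t, h):
    """P(B(t) = k): Binomial(n, rate * h) with n = t / h trials (t a multiple of h)."""
    n = round(t / h)
    p = rate * h
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)

def poisson_pmf(k, mean):
    """P(K = k) for K ~ Poisson(mean)."""
    return mean ** k * math.exp(-mean) / math.factorial(k)

def max_pmf_gap(rate, t, h, k_max=20):
    """Largest pointwise difference between the two pmfs over k = 0..k_max."""
    return max(abs(bernoulli_process_pmf(k, rate, t, h) - poisson_pmf(k, rate * t))
               for k in range(k_max + 1))

rate, t = 2.0, 1.0
gaps = [max_pmf_gap(rate, t, h) for h in (0.1, 0.01, 0.001)]
print(gaps)  # shrinks as h gets smaller
```

The gap falls by roughly a factor of ten each time h does, in line with the claim that the Bernoulli process approximates the Poisson process for small h.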


Thinking intuitively about how the Poisson process can be expected to behave can be done by thinking about the conceptually simpler Bernoulli process. For example, given that there are n events in the interval [0, t) (i.e. N (t) = n), the times of those n events should be uniformly distributed in the interval [0, t) because that is what we would expect in the Bernoulli process. This intuition is true, and we’ll prove it more carefully later.

Thinking in terms of the Bernoulli process also leads to a more low-level (in some sense better) way to define the Poisson process. This way of thinking about the Poisson process will also be useful later when we consider continuous-time Markov chains. In the Bernoulli process the probability of a success in any given interval is λh and the probability of two or more successes is 0 (that is, P(B(h) = 1) = λh and P(B(h) ≥ 2) = 0). Therefore, in the Poisson process we have the approximation that P(N(h) = 1) ≈ λh and P(N(h) ≥ 2) ≈ 0.

We write this approximation in a more precise way by saying that

P(N(h) = 1) = λh + o(h) and P(N(h) ≥ 2) = o(h).

The notation “o(h)” is called Landau’s o(h) notation, read “little o of h”, and it means any function of h that is of smaller order than h. This means that if f(h) is o(h) then f(h)/h → 0 as h → 0 (f(h) goes to 0 faster than h goes to 0). Notationally, o(h) is a very clever and useful quantity because it lets us avoid writing out long, complicated, or simply unknown expressions when the only crucial property of the expression that we care about is how fast it goes to 0. We will make extensive use of this notation in this and the next chapter, so it is worthwhile to pause and make sure you understand the properties of o(h).


Landau’s “Little o of h” Notation:

Note that o(h) doesn’t refer to any specific function. It denotes any quantity that goes to 0 at a faster rate than h, as h → 0 :

$$
\frac{o(h)}{h} \to 0 \quad \text{as } h \to 0.
$$

Since the sum of two such quantities retains this rate property, we get the potentially disconcerting property that

o(h) + o(h) = o(h)

as well as

$$
o(h)\,o(h) = o(h), \qquad c \times o(h) = o(h),
$$

where c is any constant (note that c can be a function of other variables as long as it remains constant as h varies).

Example: The function $h^k$ is o(h) for any k > 1 since

$$
\frac{h^k}{h} = h^{k-1} \to 0 \quad \text{as } h \to 0.
$$

h however is not o(h). The infinite series $\sum_{k=2}^{\infty} c_k h^k$, where $|c_k| < 1$, is o(h) since

$$
\lim_{h \to 0} \frac{\sum_{k=2}^{\infty} c_k h^k}{h}
= \lim_{h \to 0} \sum_{k=2}^{\infty} c_k h^{k-1}
= \sum_{k=2}^{\infty} c_k \lim_{h \to 0} h^{k-1} = 0,
$$

where taking the limit inside the summation is justified because the sum is bounded by 1/(1 − h) for h < 1.
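A numerical table of f(h)/h makes the definition tangible. This sketch uses illustrative choices: the helper `ratio_table`, the coefficients c_k = 0.5, and a series truncated at k = 19 as a stand-in for the infinite sum.

```python
def ratio_table(f, hs):
    """Return f(h) / h for each h; the ratios tend to 0 exactly when f is o(h)."""
    return [f(h) / h for h in hs]

hs = [10.0 ** (-k) for k in range(1, 7)]  # h = 0.1, 0.01, ..., 1e-6

square_ratios = ratio_table(lambda h: h ** 2, hs)  # h^2 is o(h)
linear_ratios = ratio_table(lambda h: h, hs)       # h is not o(h)
# Truncated series with c_k = 0.5 (assumed coefficients satisfying |c_k| < 1).
series_ratios = ratio_table(lambda h: sum(0.5 * h ** k for k in range(2, 20)), hs)

print(square_ratios)
print(linear_ratios)
print(series_ratios)
```

The ratios for h^2 and for the series shrink toward 0 as h does, while the ratio for h stays at 1.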


Similarly,

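$$
\begin{aligned}
P(N(h) = 1) &= \lambda h e^{-\lambda h} \\
&= \lambda h \left[ 1 - \lambda h + \frac{(\lambda h)^2}{2!} - \frac{(\lambda h)^3}{3!} + \cdots \right] \\
&= \lambda h - \lambda^2 h^2 + \frac{(\lambda h)^3}{2!} - \frac{(\lambda h)^4}{3!} + \cdots \\
&= \lambda h + o(h).
\end{aligned}
$$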

Finally,

$$
\begin{aligned}
P(N(h) \ge 2) &= 1 - P(N(h) = 1) - P(N(h) = 0) \\
&= 1 - (\lambda h + o(h)) - (1 - \lambda h + o(h)) \\
&= -o(h) - o(h) = o(h).
\end{aligned}
$$

Thus Definition 1 implies Definition 2.
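The two o(h) statements can also be checked numerically from the Poisson probabilities of Definition 1. In this sketch the function name `small_interval_terms` and the rate λ = 2 are illustrative assumptions.

```python
import math

def small_interval_terms(rate, h):
    """Return ((P(N(h)=1) - rate*h) / h, P(N(h)>=2) / h) under Definition 1."""
    p0 = math.exp(-rate * h)             # P(N(h) = 0)
    p1 = rate * h * math.exp(-rate * h)  # P(N(h) = 1)
    p2_plus = 1.0 - p0 - p1              # P(N(h) >= 2)
    return (p1 - rate * h) / h, p2_plus / h

rate = 2.0
rows = [small_interval_terms(rate, 10.0 ** (-k)) for k in range(1, 6)]
for row in rows:
    print(row)  # both entries head toward 0 as h shrinks
```

Both ratios go to 0 with h, which is what P(N(h) = 1) = λh + o(h) and P(N(h) ≥ 2) = o(h) assert.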

A third way to define the Poisson process is to define the distribution of the time between events. We will see in the next lecture that the times between events are independent and identically distributed Exponential(λ) random variables. For now we can gain some insight into this fact by once again considering the Bernoulli process.

Imagine that you start observing the Bernoulli process at some arbitrary trial, such that you don’t know how many trials have gone before and you don’t know when the last successful trial was. Still you would know that the distribution of the time until the next successful trial was h times a Geometric random variable with parameter λh. In other words, you don’t need to know anything about the past of the process to know the distribution of the time to the next success, and in fact this is the same as the distribution until the first success. That is, the distribution of the time between successes in the Bernoulli process is memoryless.


When you pass to the limit as h → 0 you get the Poisson process with rate λ, and you should expect that you will retain this memoryless property in the limit. Indeed you do, and since the only continuous distribution on [0, ∞) with the memoryless property is the Exponential distribution, you may deduce that this is the distribution of the time between events in a Poisson process. Moreover, you should also inherit from the Bernoulli process that the times between successive events are independent and identically distributed.

As a final aside, we remark that this discussion also suggests that the Exponential distribution is a limiting form of the Geometric distribution, as the probability of success λh in each trial goes to 0. This is indeed the case. As we mentioned above, the time between successful trials in the Bernoulli process is distributed as Y = hX, where X is a Geometric random variable with parameter λh. One can verify that for any t > 0, we have $P(Y > t) \to e^{-\lambda t}$ as h → 0:

$$
\begin{aligned}
P(Y > t) &= P(hX > t) = P(X > t/h) \\
&= (1 - \lambda h)^{\lceil t/h \rceil} \\
&= (1 - \lambda h)^{t/h} (1 - \lambda h)^{\lceil t/h \rceil - t/h} \\
&= \left( 1 - \frac{\lambda t}{t/h} \right)^{t/h} (1 - \lambda h)^{\lceil t/h \rceil - t/h} \\
&\to e^{-\lambda t} \quad \text{as } h \to 0,
\end{aligned}
$$

where $\lceil t/h \rceil$ is the smallest integer greater than or equal to t/h. In other words, the distribution of Y converges to the Exponential(λ) distribution as h → 0.
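The convergence can be observed numerically by evaluating the survival expression from the derivation for shrinking h. In this sketch the function name `scaled_geometric_survival` and the values λ = 1.5, t = 2 are illustrative assumptions.

```python
import math

def scaled_geometric_survival(t, rate, h):
    """(1 - rate*h) ** ceil(t/h): the expression for P(Y > t) used in the text."""
    return (1.0 - rate * h) ** math.ceil(t / h)

rate, t = 1.5, 2.0
target = math.exp(-rate * t)  # Exponential(rate) survival probability at t
errors = [abs(scaled_geometric_survival(t, rate, h) - target)
          for h in (0.1, 0.01, 0.001, 0.0001)]
print(target)
print(errors)  # the error shrinks as h gets smaller
```

The error shrinks roughly in proportion to h, consistent with the scaled Geometric distribution converging to the Exponential(λ) distribution.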

Note that the above discussion also illustrates that the Geometric distribution is a discrete distribution with the memoryless property.