Continuous Time Markov Chains, Lecture notes of Statistics

Construction and Basic Definitions of this lecture with examples

Typology: Lecture notes

2020/2021

Uploaded on 05/24/2021

ilyastrab

Chapter 6

Continuous Time Markov Chains

In Chapter 3, we considered stochastic processes that were discrete in both time and space, and that satisfied the Markov property: the behavior of the future of the process only depends upon the current state and not any of the rest of the past. Here we generalize such models by allowing for time to be continuous. As before, we will always take our state space to be either finite or countably infinite.

A good mental image to have when first encountering continuous time Markov chains is simply a discrete time Markov chain in which transitions can happen at any time. We will see in the next section that this image is a very good one, and that the Markov property will imply that the jump times, as opposed to simply being integers as in the discrete time setting, will be exponentially distributed.

6.1 Construction and Basic Definitions

We wish to construct a continuous time process on some countable state space $S$ that satisfies the Markov property. That is, letting $\mathcal{F}_X(s)$ denote all the information pertaining to the history of $X$ up to time $s$, and letting $j \in S$ and $s \le t$, we want

\[
P\{X(t) = j \mid \mathcal{F}_X(s)\} = P\{X(t) = j \mid X(s)\}. \tag{6.1}
\]

We also want the process to be time-homogeneous so that

\[
P\{X(t) = j \mid X(s)\} = P\{X(t - s) = j \mid X(0)\}. \tag{6.2}
\]

We will call any process satisfying (6.1) and (6.2) a time-homogeneous continuous time Markov chain, though a more useful equivalent definition in terms of transition rates will be given in Definition 6.1.3 below. Property (6.1) should be compared with the discrete time analog (3.3). As we did for the Poisson process, which we shall see is the simplest (and most important) continuous time Markov chain, we will attempt to understand such processes in more than one way.

Before proceeding, we make the technical assumption that the processes under consideration are right-continuous. This implies that if a transition occurs “at time t”, then we take X(t) to be the new state and note that X(t) ≠ X(t−).

Example 6.1.1. Consider a two state continuous time Markov chain. We denote the states by 1 and 2, and assume there can only be transitions between the two states (i.e. we do not allow 1 → 1). Graphically, we have

1 ⇄ 2.

Note that if we were to model the dynamics via a discrete time Markov chain, the transition matrix would simply be

\[
P = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix},
\]

and the dynamics are quite trivial: the process begins in state 1 or 2, depending upon the initial distribution, and then deterministically transitions between the two states. At this point, we do not know how to understand the dynamics in the continuous time setting. All we know is that the distribution of the process should only depend upon the current state, and not the history. This does not yet tell us when the transitions will occur. □

Motivated by Example 6.1.1, our first question pertaining to continuous time Markov chains, and one whose answer will eventually lead to a general construction/simulation method, is: how long will this process remain in a given state, say $x \in S$? Explicitly, suppose $X(0) = x$ and let $T_x$ denote the time we transition away from state $x$. To find the distribution of $T_x$, we let $s, t \ge 0$ and consider

\[
\begin{aligned}
P\{T_x > s + t \mid T_x > s\}
&= P\{X(r) = x \text{ for } r \in [0, s+t] \mid X(r) = x \text{ for } r \in [0, s]\} \\
&= P\{X(r) = x \text{ for } r \in [s, s+t] \mid X(r) = x \text{ for } r \in [0, s]\} \\
&= P\{X(r) = x \text{ for } r \in [s, s+t] \mid X(s) = x\} && \text{(Markov property)} \\
&= P\{X(r) = x \text{ for } r \in [0, t] \mid X(0) = x\} && \text{(time homogeneity)} \\
&= P\{T_x > t\}.
\end{aligned}
\]

Therefore, $T_x$ satisfies the loss of memory property, and is therefore exponentially distributed (since the exponential random variable is the only continuous random variable with this property). We denote the parameter of the exponential holding time for state $x$ as $\lambda(x)$. We make the useful observation that

\[
E\,T_x = \frac{1}{\lambda(x)}.
\]

Thus, the higher the rate λ(x), representing the rate out of state x, the smaller the expected time for the transition to occur, which is intuitively pleasing.
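The loss of memory property and the mean holding time can be checked numerically. The following is a minimal sketch using only the Python standard library; the values of `lam`, `s` and `t` are hypothetical and chosen purely for illustration.

```python
import math
import random


def survival(lam, t):
    """P{T > t} for an exponential holding time with rate lam."""
    return math.exp(-lam * t)


lam, s, t = 2.0, 0.7, 1.3            # hypothetical values for illustration
conditional = survival(lam, s + t) / survival(lam, s)   # P{T > s+t | T > s}
print(conditional, survival(lam, t))  # the two numbers agree

rng = random.Random(0)                # fixed seed only for reproducibility
samples = [rng.expovariate(lam) for _ in range(100_000)]
sample_mean = sum(samples) / len(samples)
print(sample_mean, 1 / lam)           # the sample mean is close to 1/lambda
```

The first comparison is exact up to rounding, since it is just the identity $e^{-\lambda(s+t)}/e^{-\lambda s} = e^{-\lambda t}$; the second is a Monte Carlo estimate and only approximates $1/\lambda$.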

Example 6.1.2. We return to Example 6.1.1, though now we assume the rate from state 1 to state 2 is λ(1) > 0, and the rate from state 2 to state 1 is λ(2) > 0. We

as $h \to 0$, where the $o(h)$ in the first equality represents the probability of seeing two or more jumps (each with an exponential distribution) in the time window $[0, h]$. Therefore, $\lambda(x, y)$ yields the local rate, or intensity, of transitioning from state $x$ to state $y$. It is worth explicitly pointing out that for $x \in S$

\[
\sum_{y \ne x} \lambda(x, y) = \sum_{y \ne x} \lambda(x) p_{xy} = \lambda(x).
\]

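Note that we also have

\[
\begin{aligned}
P\{X(h) = x \mid X(0) = x\}
&= 1 - \sum_{y \ne x} P\{X(h) = y \mid X(0) = x\} \\
&= 1 - \sum_{y \ne x} \lambda(x, y) h + o(h) \\
&= 1 - \lambda(x) h \sum_{y \ne x} p_{xy} + o(h) \\
&= 1 - \lambda(x) h + o(h).
\end{aligned}
\]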

Similarly to our consideration of the Poisson process, it can be argued that any time homogeneous process satisfying the local conditions (6.3) and (6.4) also satisfies the Markov property (6.1). This is not surprising as the conditions (6.3)-(6.4) only make use of the current state of the system and ignore the entire past. This leads to a formal definition of a continuous time Markov chain that incorporates all the relevant parameters of the model and is probably the most common definition in the literature.

Definition 6.1.3. A time-homogeneous continuous time Markov chain with transition rates $\lambda(x, y)$ is a stochastic process $X(t)$ taking values in a finite or countably infinite state space $S$ satisfying

\[
\begin{aligned}
P\{X(t + h) = x \mid X(t) = x\} &= 1 - \lambda(x) h + o(h), \\
P\{X(t + h) = y \mid X(t) = x\} &= \lambda(x, y) h + o(h),
\end{aligned}
\]

where $y \ne x$, and $\lambda(x) = \sum_{y \ne x} \lambda(x, y)$.

When only the local rates $\lambda(x, y)$ are given in the construction of the chain, then it is important to recognize that the transition probabilities of the chain can be recovered via the identity

\[
p_{xy} = \frac{\lambda(x, y)}{\lambda(x)} = \frac{\lambda(x, y)}{\sum_{y' \ne x} \lambda(x, y')}.
\]
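As an illustration of this identity, the following sketch recovers the holding rates $\lambda(x)$ and the jump probabilities $p_{xy}$ from a table of local rates. The function name `embedded_chain` and the numerical rates are assumptions made for this example and do not come from the notes.

```python
def embedded_chain(rates):
    """Return (holding_rates, jump_probs) for a dict of local rates.

    rates maps (x, y) with x != y to the local rate lambda(x, y) >= 0.
    """
    holding = {}
    for (x, y), rate in rates.items():
        if x == y:
            raise ValueError("rates must not contain self-transitions")
        holding[x] = holding.get(x, 0.0) + rate      # lambda(x) = sum over y
    jump = {(x, y): rate / holding[x]                # p_xy = lambda(x,y)/lambda(x)
            for (x, y), rate in rates.items() if holding[x] > 0}
    return holding, jump


# Hypothetical rates for a three state chain 1 <-> 2 <-> 3.
rates = {(1, 2): 2.0, (2, 1): 1.0, (2, 3): 3.0, (3, 2): 0.5}
holding, jump = embedded_chain(rates)
print(holding[2])       # lambda(2) = lambda(2,1) + lambda(2,3)
print(jump[(2, 1)])     # p_21 = lambda(2,1) / lambda(2)
```

A dictionary keyed by pairs of states is used here so that countably infinite state spaces with only finitely many listed rates can be handled in the same way as finite ones.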

Example 6.1.4. Let N be a Poisson process with intensity λ > 0. As N satisfies

\[
\begin{aligned}
P\{N(t + h) = j + 1 \mid N(t) = j\} &= \lambda h + o(h), \\
P\{N(t + h) = j \mid N(t) = j\} &= 1 - \lambda h + o(h),
\end{aligned}
\]

we see that it is a continuous time Markov chain. Note also that any Poisson process is the continuous time version of the deterministically monotone chain from Chapter 3. □

Example 6.1.5. Consider again the three state Markov chain

\[
1 \;\underset{\lambda(2,1)}{\overset{\lambda(1,2)}{\rightleftarrows}}\; 2 \;\underset{\lambda(3,2)}{\overset{\lambda(2,3)}{\rightleftarrows}}\; 3,
\]

where the local transition rates have been placed next to their respective arrows. Note that the holding time in state two is an exponential random variable with a parameter of

\[
\lambda(2) \overset{\text{def}}{=} \lambda(2, 1) + \lambda(2, 3),
\]

and the probability that the chain enters state 1 after leaving state 2 is

\[
p_{21} \overset{\text{def}}{=} \frac{\lambda(2, 1)}{\lambda(2, 1) + \lambda(2, 3)},
\]

whereas the probability that the chain enters state 3 after leaving state 2 is

\[
p_{23} \overset{\text{def}}{=} \frac{\lambda(2, 3)}{\lambda(2, 1) + \lambda(2, 3)}.
\]

This chain could then be simulated by sequentially computing holding times and transitions. □

An algorithmic construction of a general continuous time Markov chain should now be apparent, and will involve two building blocks. The first will be a stream of unit exponential random variables used to construct our holding times, and the second will be a discrete time Markov chain, denoted $X_n$, with transition probabilities $p_{xy}$ that will be used to determine the sequence of states. Note that for this discrete time chain we necessarily have that $p_{xx} = 0$ for each $x$. We also explicitly note that the discrete time chain, $X_n$, is different from the continuous time Markov chain, $X(t)$, and the reader should be certain to clarify this distinction. The discrete time chain is often called the embedded chain associated with the process $X(t)$.

Algorithm 1. (Algorithmic construction of continuous time Markov chain)

Input:

  • Let $X_n$, $n \ge 0$, be a discrete time Markov chain with transition matrix $Q$. Let the initial distribution of this chain be denoted by $\alpha$ so that $P\{X_0 = k\} = \alpha_k$.
  • Let $E_n$, $n \ge 0$, be a sequence of independent unit exponential random variables.

Algorithmic construction:

  1. Select $X(0) = X_0$ according to the initial distribution $\alpha$.
  2. Let $T_0 = 0$ and define $W(0) = E_0/\lambda(X(0))$, which is exponential with parameter $\lambda(X(0))$, to be the waiting time in state $X(0)$.
  3. Let $T_1 = T_0 + W(0)$, and define $X(t) = X(0)$ for all $t \in [T_0, T_1)$.
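The steps listed above can be sketched in Python as follows. Only the first three steps of Algorithm 1 appear in this text, so the loop below assumes that the construction simply repeats them from each new state, choosing the next state from the embedded chain with probabilities $p_{xy} = \lambda(x, y)/\lambda(x)$. The function name `simulate_ctmc`, the use of a fixed starting state in place of the initial distribution $\alpha$, and the numerical rates are all assumptions made for illustration.

```python
import random


def simulate_ctmc(rates, x0, t_end, rng):
    """Simulate one path of a continuous time Markov chain on [0, t_end].

    rates maps (x, y), x != y, to the local rate lambda(x, y).
    Returns the jump times [T_0, T_1, ...] and the states entered at them.
    """
    times, states = [0.0], [x0]
    t, x = 0.0, x0
    while True:
        # Outgoing rates from the current state and their sum lambda(x).
        out = [(y, r) for (z, y), r in rates.items() if z == x and r > 0]
        lam = sum(r for _, r in out)
        if lam == 0:                      # absorbing state: no more jumps
            break
        # Holding time W = E / lambda(x), with E a unit exponential.
        t += rng.expovariate(1.0) / lam
        if t > t_end:
            break
        # Embedded chain step: jump to y with probability lambda(x, y) / lambda(x).
        targets = [y for y, _ in out]
        weights = [r for _, r in out]
        x = rng.choices(targets, weights=weights)[0]
        times.append(t)
        states.append(x)
    return times, states


rng = random.Random(0)                    # fixed seed only for reproducibility
rates = {(1, 2): 2.0, (2, 1): 1.0}        # hypothetical two state example
times, states = simulate_ctmc(rates, x0=1, t_end=10.0, rng=rng)
print(list(zip(times, states))[:5])
```

The path is right-continuous by construction: `states[k]` is the value of $X(t)$ on $[T_k, T_{k+1})$.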

See Propositions 2.3.18 and 2.3.19. We close this section with three examples.

Example 6.1.6. We consider again a random walker on $S = \{0, 1, \dots\}$. We suppose the transition intensities are

\[
\begin{aligned}
\lambda(i, i + 1) &= \lambda, \\
\lambda(i, i - 1) &= \mu, \quad \text{if } i > 0,
\end{aligned}
\]

and $\lambda(0, -1) = 0$. Therefore, the probability of the embedded discrete time Markov chain transitioning up if the current state is $i \ne 0$ is $\lambda/(\lambda + \mu)$, whereas the probability of transitioning down is $\mu/(\lambda + \mu)$. When $i \ne 0$, the holding times will always be exponentially distributed with a parameter of $\lambda + \mu$.

Example 6.1.7. We generalize Example 6.1.6 by allowing the transition rates to depend upon the current state of the system. As in the discrete time setting this leads to a birth and death process. More explicitly, for $i \in \{0, 1, \dots\}$ we let

\[
\lambda(i, i + 1) = B(i), \qquad \lambda(i, i - 1) = D(i),
\]

where $D(0) = 0$. Note that the transition rates are now state dependent, and may even be unbounded as $i \to \infty$. Common choices for the rates include

\[
B(i) = \lambda i, \qquad D(i) = \mu i,
\]

for some scalars $\lambda, \mu > 0$. Another common model would be to assume a population satisfies a logistic growth model,

\[
B(i) = r i, \qquad D(i) = \frac{r}{K} i^2,
\]

where $K$ is the carrying capacity. Analogously to Example 5.2.18, if we let $X(t)$ denote the state of the system at time $t$, we have that $X(t)$ solves the stochastic equation

\[
X(t) = X(0) + Y_1\!\left(\int_0^t B(X(s))\,ds\right) - Y_2\!\left(\int_0^t D(X(s))\,ds\right), \tag{6.5}
\]

where $Y_1$ and $Y_2$ are independent unit-rate Poisson processes. As in Example 5.2.18, it is now an exercise to show that the solution to (6.5) satisfies the correct local intensity relations of Definition 6.1.3. For example, denoting

\[
A(t) \overset{\text{def}}{=} Y_1\!\left(\int_0^t B(X(s))\,ds\right), \qquad
D(t) \overset{\text{def}}{=} Y_2\!\left(\int_0^t D(X(s))\,ds\right),
\]

we see that

\[
\begin{aligned}
P\{X(t + h) = x + 1 \mid X(t) = x\}
&= P\{A(t + h) - A(t) = 1,\ D(t + h) - D(t) = 0 \mid X(t) = x\} + o(h) \\
&= B(x)h\,(1 - D(x)h) + o(h) \\
&= B(x)h + o(h).
\end{aligned}
\]

□

Example 6.1.8. We will model the dynamical behavior of a single gene, the mRNA molecules it produces, and finally the resulting proteins via a continuous time Markov chain. It is an entirely reasonable question to ask whether it makes sense to model the reaction times of such cellular processes via exponential random variables. The answer is almost undoubtedly “no”; however, the model should be interpreted as an approximation to reality and has been quite successful in elucidating cellular dynamics. It is also a much more realistic model than a classical ODE approach, which is itself a crude approximation to the continuous time Markov chain model (we will discuss this fact later).

Consider a single gene that is producing mRNA (this process is called transcription) at a constant rate of $\lambda_1$, where the units of time are hours, say. Further, we suppose the mRNA molecules are producing proteins (this process is called translation) at a rate of $\lambda_2 \cdot (\#\text{mRNA})$, for some $\lambda_2 > 0$. Next, we assume that the mRNA molecules are being degraded at a rate of $d_m \cdot (\#\text{mRNA})$, and proteins are being degraded at a rate of $d_p \cdot (\#\text{proteins})$. Graphically, we may represent this system via

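\[
\begin{aligned}
G &\xrightarrow{\lambda_1} G + M \\
M &\xrightarrow{\lambda_2} M + P \\
M &\xrightarrow{d_m} \emptyset \\
P &\xrightarrow{d_p} \emptyset.
\end{aligned}
\]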
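The four reactions above can be simulated directly from their rates. The sketch below is an assumption-laden illustration rather than the construction used in the notes: it tracks the pair (number of mRNA molecules, number of proteins), treats each reaction as adding a fixed vector to that pair, and uses hypothetical parameter values. The names `REACTIONS`, `propensities` and `simulate_gene` are invented for this example.

```python
import random

# Change in (m, p) = (#mRNA, #proteins) caused by each reaction, in the order:
# transcription, translation, mRNA degradation, protein degradation.
REACTIONS = [(1, 0), (0, 1), (-1, 0), (0, -1)]


def propensities(m, p, lam1, lam2, d_m, d_p):
    """Rates of the four reactions in state (m, p)."""
    return [lam1, lam2 * m, d_m * m, d_p * p]


def simulate_gene(t_end, lam1, lam2, d_m, d_p, rng, state=(0, 0)):
    """Simulate the gene model up to time t_end and return the final (m, p)."""
    t, (m, p) = 0.0, state
    while True:
        rates = propensities(m, p, lam1, lam2, d_m, d_p)
        total = sum(rates)
        if total == 0:
            break
        t += rng.expovariate(total)          # exponential holding time
        if t > t_end:
            break
        # Pick a reaction with probability proportional to its rate.
        step_m, step_p = rng.choices(REACTIONS, weights=rates)[0]
        m, p = m + step_m, p + step_p
    return m, p


rng = random.Random(0)                       # fixed seed only for reproducibility
# Hypothetical parameters: lam1 = 2, lam2 = 1, d_m = 1, d_p = 1 (per hour).
print(simulate_gene(5.0, 2.0, 1.0, 1.0, 1.0, rng))
```

Because a degradation reaction has rate zero whenever the corresponding count is zero, the simulated counts never become negative.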

It is important to note that this is not the only way to write down these reactions. For example, many in the biological communities would write $M \to P$, as opposed to $M \to M + P$. However, we feel it is important to stress, through the notation $M \to M + P$, that the mRNA molecule is not lost during the course of the reaction.

As the number of genes in the model is assumed to be constant in time, the state space should be taken to be $\mathbb{Z}^2_{\ge 0}$. Therefore, we let $X(t) \in \mathbb{Z}^2_{\ge 0}$ be the state of the process at time $t$ where the first component gives the number of mRNA molecules and the second gives the number of proteins. Now we ask: what are the possible transitions in the model, and what are the rates? We see that the possible transitions are given by addition of the reaction vectors $(1, 0)^{T}$, …

Thus, we see that we can construct an explosive birth process by requiring that the holding times satisfy

\[
\sum_n \frac{1}{\lambda(X_n)} < \infty.
\]

Example 6.2.3. Consider a pure birth process in which the embedded discrete time Markov chain is the deterministically monotone chain of Example 3.1.5. Suppose that the holding time parameter in state $i$ is $\lambda(i)$. Finally, let $X(t)$ denote the state of the continuous time process at time $t$. Note that the stochastic equation satisfied by $X$ is

\[
X(t) = X(0) + N\!\left(\int_0^t \lambda(X(s))\,ds\right).
\]

Suppose that $\lambda(n) = \lambda n^2$ for some $\lambda > 0$ and that $X(0) = 1$. Then the $n$th holding time is determined by an exponential random variable with parameter $\lambda n^2$, which we denote by $E_n$. Since

\[
\sum_n \frac{1}{\lambda n^2} < \infty,
\]

we may conclude by Proposition 6.2.2 that

\[
P\Big\{\sum_n E_n < \infty\Big\} = 1,
\]

and the process is explosive. The stochastic equation for this model is

\[
X(t) = X(0) + N\!\left(\lambda \int_0^t X(s)^2\,ds\right),
\]

and should be compared with the deterministic ordinary differential equation

\[
x'(t) = \lambda x^2(t) \iff x(t) = x(0) + \lambda \int_0^t x(s)^2\,ds,
\]

which also explodes in finite time. □
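To see the explosion numerically, note that the explosion time is the sum of the holding times, whose mean is $\sum_{n \ge 1} 1/(\lambda n^2) = \pi^2/(6\lambda)$. The sketch below compares a partial sum and a Monte Carlo estimate with this value. The function names, the truncation levels and the choice $\lambda = 1$ are assumptions made for illustration; truncating the sum slightly underestimates the true explosion time.

```python
import math
import random


def mean_explosion_time(lam, n_terms):
    """Partial sum of E[T_inf] = sum_{n >= 1} 1 / (lam * n^2)."""
    return sum(1.0 / (lam * n * n) for n in range(1, n_terms + 1))


def sample_explosion_time(lam, n_terms, rng):
    """One sample of sum_{n <= n_terms} E_n with E_n ~ Exp(lam * n^2)."""
    return sum(rng.expovariate(lam * n * n) for n in range(1, n_terms + 1))


lam = 1.0                                # hypothetical rate constant
print(mean_explosion_time(lam, 10_000), math.pi ** 2 / (6 * lam))

rng = random.Random(0)                   # fixed seed only for reproducibility
samples = [sample_explosion_time(lam, 500, rng) for _ in range(2_000)]
estimate = sum(samples) / len(samples)
print(estimate)                          # Monte Carlo estimate of the mean
```

Every sampled sum is finite, and the estimate settles near $\pi^2/6 \approx 1.645$, in line with the conclusion that infinitely many jumps occur in finite time.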

Example 6.2.4. Consider a continuous time Markov chain with state space $\{-2, -1, 0, 1, 2, \dots\}$. We suppose that the graph of the model is

\[
-2 \;\underset{1}{\overset{1}{\rightleftarrows}}\; -1 \xleftarrow{\;2\;} 0 \xrightarrow{\;1\;} 1 \xrightarrow{\;1\;} 2 \xrightarrow{\;2^2\;} 3 \xrightarrow{\;3^2\;} \cdots,
\]

where, in general, the intensity of $n \to n + 1$, for $n \ge 1$, is $\lambda(n) = n^2$. From the previous example, we know this process is explosive. However, if $X(0) \in \{-2, -1\}$, then the probability of explosion is zero,² whereas if $X(0) = 0$, the probability of explosion is 1/3. □

The following proposition characterizes the most common ways in which a process is non-explosive. A full proof can be found in [13].

² This is proven by the next proposition, but it should be clear.

Proposition 6.2.5. For any $i \in S$,

\[
P_i\{T_\infty < \infty\} = P_i\Big\{\sum_n \frac{1}{\lambda(X_n)} < \infty\Big\},
\]

and therefore, the continuous time Markov chain is non-explosive iff

\[
\sum_n \frac{1}{\lambda(X_n)} = \infty
\]

$P_i$-almost surely for every $i \in S$. In particular,

(1) If $\lambda(i) \le c$ for all $i \in S$ for some $c > 0$, then the chain is non-explosive.

(2) If $S$ is a finite set, then the chain is non-explosive.

(3) If $T \subset S$ are the transient states of $\{X_n\}$ and if

\[
P_i\{X_n \in T, \ \forall n\} = 0,
\]

for every $i \in S$, then the chain is non-explosive.

Proof. The equivalence of the probabilities is shown in [13, Section 5.2]. We will prove results (1), (2), and (3). For (1), simply note that

\[
\sum_n \frac{1}{\lambda(X_n)} \ge \sum_n \frac{1}{c} = \infty.
\]

To show (2), we note that if the state space is finite, we may simply take $c = \max_i \lambda(i)$, and apply (1). We will now show (3). If $P_i\{X_n \in T, \ \forall n\} = 0$, then entry into $T^c$ is assured. There must, therefore, be a state $i \in T^c$ which is hit infinitely often (note that this value can be different for different realizations of the process). Let the infinite sequence of times when $X_n = i$ be denoted by $\{n_j\}$. Then,

\[
\sum_n \frac{1}{\lambda(X_n)} \ge \sum_j \frac{1}{\lambda(X_{n_j})} = \sum_j \frac{1}{\lambda(i)} = \infty.
\]

We will henceforth have a running assumption that, unless otherwise explicitly stated, all processes considered are non-explosive. However, we will return to explosiveness later and prove another useful condition that implies a process is non-explosive. This condition will essentially be a linearity condition on the intensities. This condition is sufficient to prove the non-explosiveness of most processes in the queueing literature. Unfortunately, the world of biology is not so easy: most processes of interest are highly non-linear, and it is, in general, quite a difficult (and open) problem to characterize which systems are non-explosive.

Thus,

\[
P'_{ij}(t) = -\lambda(j) P_{ij}(t) + \sum_{y \ne j} P_{iy}(t) \lambda(y, j). \tag{6.8}
\]

These are the Kolmogorov forward equations for the process. In the biology literature this system of equations is termed the chemical master equation. We point out that there was a small mathematical “sleight of hand” in the above calculation. To move from (6.6) to (6.7), we had to assume that

\[
\sum_y P_{iy}(t)\, o_y(h) = o(h),
\]

where we write $o_y(h)$ to show that the size of the error can depend upon the state $y$. This condition is satisfied for all systems we will consider.

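Definition 6.3.1. Let $X(t)$ be a continuous time Markov chain on some state space $S$ with transition intensities $\lambda(i, j) \ge 0$. Recalling that

\[
\lambda(i) = \sum_{j \ne i} \lambda(i, j),
\]

the matrix

\[
A_{ij} =
\begin{cases}
-\lambda(i), & \text{if } i = j \\
\lambda(i, j), & \text{if } i \ne j
\end{cases}
=
\begin{cases}
-\sum_{k \ne i} \lambda(i, k), & \text{if } i = j \\
\lambda(i, j), & \text{if } i \ne j
\end{cases}
\]

is called the generator, or infinitesimal generator, or generator matrix of the Markov chain.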

We see that the Kolmogorov forward equations (6.8) can be written as the matrix differential equation

\[
P'(t) = P(t)A,
\]

since

\[
\begin{aligned}
(P(t)A)_{ij} &= \sum_y P_{iy}(t) A_{yj} = P_{ij}(t) A_{jj} + \sum_{y \ne j} P_{iy}(t) A_{yj} \\
&= -\lambda(j) P_{ij}(t) + \sum_{y \ne j} P_{iy}(t) \lambda(y, j).
\end{aligned}
\]

At least formally, this system can be solved:

\[
P(t) = P(0)e^{tA} = e^{tA},
\]

where $e^{tA}$ is the matrix exponential and we used that $P(0) = I$, the identity matrix. Recall that the matrix exponential is defined by

\[
e^{At} \overset{\text{def}}{=} \sum_{n=0}^{\infty} \frac{t^n A^n}{n!}.
\]

This solution is always valid in the case that the state space is finite.

We make the following observations pertaining to the generator $A$:

  1. The elements on the main diagonal are all strictly negative (provided $\lambda(i) > 0$ for every state $i$, i.e. no state is absorbing).
  2. The elements off the main diagonal are non-negative.
  3. Each row sums to zero.

We also point out that given a state space S, the infinitesimal generator A completely determines the Markov chain as it contains all the local information pertaining to the transitions: λ(i, j). Thus, it is sufficient to characterize a chain by simply providing a state space, S, and generator, A.
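As a small illustration, a generator can be assembled from a table of local rates and the observations above checked directly. The function name `generator`, the state labels and the numerical rates below are hypothetical and serve only as an example.

```python
def generator(states, rates):
    """Build the generator matrix A as a list of rows.

    states is an ordered list; rates maps (i, j), i != j, to lambda(i, j).
    """
    index = {s: k for k, s in enumerate(states)}
    n = len(states)
    A = [[0.0] * n for _ in range(n)]
    for (i, j), rate in rates.items():
        A[index[i]][index[j]] = rate                 # off-diagonal: lambda(i, j)
    for k in range(n):
        # Diagonal: -lambda(i), the negative sum of the off-diagonal row entries.
        A[k][k] = -sum(A[k][m] for m in range(n) if m != k)
    return A


# Hypothetical rates on three states, chosen only for illustration.
A = generator(["a", "b", "c"], {("a", "b"): 2.0, ("b", "a"): 1.0,
                                ("b", "c"): 3.0, ("c", "b"): 0.5})
for row in A:
    print(row, sum(row))      # each row sums to zero
```

Storing the matrix as a list of rows keeps the sketch free of external dependencies; for larger state spaces an array library would be the more natural choice.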

Example 6.3.2. A molecule transitions between states 0 and 1. The transition rates are $\lambda(0, 1) = 3$ and $\lambda(1, 0) = 1$. The generator matrix is

\[
A = \begin{pmatrix} -3 & 3 \\ 1 & -1 \end{pmatrix}.
\]

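Example 6.3.3. Consider a mathematician wandering between three coffee shops with graphical structure

\[
A \;\underset{\lambda_1}{\overset{\mu_1}{\rightleftarrows}}\; B \;\underset{\lambda_2}{\overset{\mu_2}{\rightleftarrows}}\; C.
\]

The infinitesimal generator of this process is

\[
A = \begin{pmatrix}
-\mu_1 & \mu_1 & 0 \\
\lambda_1 & -(\lambda_1 + \mu_2) & \mu_2 \\
0 & \lambda_2 & -\lambda_2
\end{pmatrix},
\]

and the transition matrix for the embedded Markov chain is

\[
P = \begin{pmatrix}
0 & 1 & 0 \\
\lambda_1/(\lambda_1 + \mu_2) & 0 & \mu_2/(\lambda_1 + \mu_2) \\
0 & 1 & 0
\end{pmatrix}.
\]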

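Example 6.3.4. For a unit-rate Poisson process, we have

\[
A = \begin{pmatrix}
-1 & 1 & 0 & 0 & \cdots \\
0 & -1 & 1 & 0 & \cdots \\
0 & 0 & -1 & 1 & \cdots \\
\vdots & & & \ddots & \ddots
\end{pmatrix}.
\]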

If we are given an initial condition, $\alpha$, then $\alpha P(t)$ is the vector with $j$th element

\[
(\alpha P(t))_j = \sum_i \alpha_i P_{ij}(t) = \sum_i P\{X(t) = j \mid X(0) = i\} P\{X(0) = i\} \overset{\text{def}}{=} P_\alpha\{X(t) = j\},
\]

giving the probability of being in state $j$ at time $t$ given an initial distribution of $\alpha$. Thus, we see that if $\alpha$ is given, we have

\[
\alpha P(t) = P_\alpha(t) = \alpha e^{tA}. \tag{6.9}
\]

We let $s_n = nt/N$ for some large $N$, denote $\Delta = t/N$, and see

\[
\begin{aligned}
P\{X(t) = j, T_1 \le t \mid X(0) = i\}
&= \sum_{n=0}^{N-1} P\{X(t) = j, T_1 \in (s_n, s_{n+1}) \mid X(0) = i\} \\
&= \sum_{n=0}^{N-1} P\{X(t) = j \mid X(0) = i, T_1 \in (s_n, s_{n+1})\}\, P\{T_1 \in (s_n, s_{n+1}) \mid X(0) = i\} \\
&= \sum_{n=0}^{N-1} P\{X(t) = j \mid X(0) = i, T_1 \in (s_n, s_{n+1})\} \left( \lambda(i) e^{-\lambda(i) s_n} \Delta + O(\Delta^2) \right) \\
&= \sum_{n=0}^{N-1} \lambda(i) e^{-\lambda(i) s_n} \sum_{k \ne i} P\{X(t) = j, X_1 = k \mid X(0) = i, T_1 \in (s_n, s_{n+1})\} \Delta + O(\Delta) \\
&= \sum_{n=0}^{N-1} \lambda(i) e^{-\lambda(i) s_n} \sum_{k \ne i} \Big( P\{X(t) = j \mid X_1 = k, X(0) = i, T_1 \in (s_n, s_{n+1})\} \\
&\qquad\qquad \times P\{X_1 = k \mid X(0) = i, T_1 \in (s_n, s_{n+1})\} \Big) \Delta + O(\Delta) \\
&= \sum_{n=0}^{N-1} \lambda(i) e^{-\lambda(i) s_n} \sum_{k \ne i} Q_{ik} P_{kj}(t - s_n) \Delta + O(\Delta) \\
&\to \int_0^t \lambda(i) e^{-\lambda(i) s} \sum_{k \ne i} Q_{ik} P_{kj}(t - s)\,ds,
\end{aligned}
\]

as $\Delta \to 0$. Combining the above shows the result.

Proposition 6.3.6. For all $i, j \in S$, we have that $P_{ij}(t)$ is continuously differentiable and

\[
P'(t) = AP(t), \tag{6.10}
\]

which in component form is

\[
P'_{ij}(t) = \sum_k A_{ik} P_{kj}(t).
\]

The system of equations (6.10) is called the Kolmogorov backwards equations. Note that the difference with the forward equations is the order of the multiplication of $P(t)$ and $A$. However, the solution of the backwards equation is once again seen to be

\[
P(t) = e^{tA},
\]

agreeing with previous results.

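Proof. Use the substitution $u = t - s$ in the integral equation to find that

\[
\begin{aligned}
P_{ij}(t) &= \delta_{ij} e^{-\lambda(i)t} + \int_0^t \lambda(i) e^{-\lambda(i)s} \sum_{k \ne i} Q_{ik} P_{kj}(t - s)\,ds \\
&= \delta_{ij} e^{-\lambda(i)t} + \int_0^t \lambda(i) e^{-\lambda(i)(t - u)} \sum_{k \ne i} Q_{ik} P_{kj}(u)\,du \\
&= e^{-\lambda(i)t} \left( \delta_{ij} + \int_0^t \lambda(i) e^{\lambda(i)u} \sum_{k \ne i} Q_{ik} P_{kj}(u)\,du \right).
\end{aligned}
\]

Differentiating yields

\[
\begin{aligned}
P'_{ij}(t) &= -\lambda(i) e^{-\lambda(i)t} \left( \delta_{ij} + \int_0^t \lambda(i) e^{\lambda(i)u} \sum_{k \ne i} Q_{ik} P_{kj}(u)\,du \right)
+ e^{-\lambda(i)t} \cdot \lambda(i) e^{\lambda(i)t} \sum_{k \ne i} Q_{ik} P_{kj}(t) \\
&= -\lambda(i) P_{ij}(t) + \lambda(i) \sum_{k \ne i} Q_{ik} P_{kj}(t) \\
&= \sum_k \left( -\lambda(i) \delta_{ik} P_{kj}(t) \right) + \sum_k \lambda(i) Q_{ik} P_{kj}(t) \\
&= \sum_k \left( -\lambda(i) \delta_{ik} + \lambda(i) Q_{ik} \right) P_{kj}(t) \\
&= \sum_k A_{ik} P_{kj}(t),
\end{aligned}
\]

where the third equality uses $Q_{ii} = 0$.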

Both the forward and backward equations can be used to solve for the associated probabilities as the next example demonstrates.

Example 6.3.7. We consider a two state, $\{0, 1\}$, continuous time Markov chain with generator matrix

\[
A = \begin{pmatrix} -\lambda & \lambda \\ \mu & -\mu \end{pmatrix}.
\]

We will use both the forwards and backwards equations to solve for $P(t)$.

Approach 1: Backward equation. While we want to compute $P_{ij}(t)$ for each pair $i, j \in \{0, 1\}$, we know that

\[
P_{00}(t) + P_{01}(t) = P_{10}(t) + P_{11}(t) = 1,
\]

for all $t \ge 0$, and so it is sufficient to solve just for $P_{00}(t)$ and $P_{10}(t)$. The backwards equation is $P'(t) = AP(t)$, yielding the equations

\[
\begin{aligned}
P'_{00}(t) &= \lambda [P_{10}(t) - P_{00}(t)], \\
P'_{10}(t) &= \mu [P_{00}(t) - P_{10}(t)].
\end{aligned}
\]
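The remainder of this calculation is not included in this text. One way to finish it, offered here as a sketch rather than as the notes' own argument, is to subtract the two equations and use the initial conditions $P_{00}(0) = 1$ and $P_{10}(0) = 0$:

```latex
\begin{aligned}
\frac{d}{dt}\bigl[P_{00}(t) - P_{10}(t)\bigr]
  &= -(\lambda + \mu)\bigl[P_{00}(t) - P_{10}(t)\bigr]
  \quad\Longrightarrow\quad
  P_{00}(t) - P_{10}(t) = e^{-(\lambda + \mu)t}, \\
P'_{00}(t) &= -\lambda e^{-(\lambda + \mu)t}
  \quad\Longrightarrow\quad
  P_{00}(t) = \frac{\mu}{\lambda + \mu} + \frac{\lambda}{\lambda + \mu}\, e^{-(\lambda + \mu)t}, \\
P_{10}(t) &= P_{00}(t) - e^{-(\lambda + \mu)t}
  = \frac{\mu}{\lambda + \mu} - \frac{\mu}{\lambda + \mu}\, e^{-(\lambda + \mu)t}.
\end{aligned}
```

With $\lambda = 3$ and $\mu = 1$ these reduce to $P_{00}(t) = 1/4 + (3/4)e^{-4t}$ and $P_{10}(t) = 1/4 - (1/4)e^{-4t}$, which agrees with the first column of the matrix exponential computed in Example 6.3.9.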

Example 6.3.8 (Computing matrix exponentials). Suppose that $A$ is an $n \times n$ matrix with $n$ linearly independent eigenvectors. Then, letting $D$ be a diagonal matrix consisting of the eigenvalues of $A$, we can decompose $A$ into

\[
A = QDQ^{-1},
\]

where $Q$ consists of the eigenvectors of $A$ (ordered similarly to the order of the eigenvalues in $D$). In this case, we get the very nice identity

\[
e^{At} = \sum_{n=0}^{\infty} \frac{t^n (QDQ^{-1})^n}{n!} = Q \left( \sum_{n=0}^{\infty} \frac{t^n D^n}{n!} \right) Q^{-1} = Q e^{Dt} Q^{-1},
\]

where $e^{Dt}$, because $D$ is diagonal, is a diagonal matrix with diagonal elements $e^{\lambda_i t}$, where $\lambda_i$ is the $i$th eigenvalue.

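Example 6.3.9. We now solve the above problem using the matrix exponential. Supposing, for concreteness, that $\lambda = 3$ and $\mu = 1$, we have that the generator matrix is

\[
A = \begin{pmatrix} -3 & 3 \\ 1 & -1 \end{pmatrix}.
\]

It is easy to check that the eigenvalues are $0, -4$ and the associated eigenvectors are $[1, 1]^T$ and $[-3, 1]^T$. Therefore,

\[
Q = \begin{pmatrix} 1 & -3 \\ 1 & 1 \end{pmatrix}, \qquad
Q^{-1} = \begin{pmatrix} 1/4 & 3/4 \\ -1/4 & 1/4 \end{pmatrix},
\]

and

\[
e^{tA} = \begin{pmatrix}
1/4 + (3/4)e^{-4t} & 3/4 - (3/4)e^{-4t} \\
1/4 - (1/4)e^{-4t} & 3/4 + (1/4)e^{-4t}
\end{pmatrix}.
\]

You should note that

\[
\lim_{t \to \infty} e^{tA} = \begin{pmatrix} 1/4 & 3/4 \\ 1/4 & 3/4 \end{pmatrix},
\]

which has a common row. Thus, for example, in the long run, the chain will be in state zero with a probability of 1/4. □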
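The closed form above can be checked against the series definition of the matrix exponential. The following sketch uses a truncated Taylor series written with plain Python lists; the helper names `mat_mul` and `expm_taylor`, the truncation at 60 terms and the test time $t = 0.5$ are choices made for this illustration, and a truncated series is only appropriate for small matrices and moderate values of $t$.

```python
import math


def mat_mul(X, Y):
    """Product of two square matrices stored as lists of rows."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def expm_taylor(A, t, terms=60):
    """Approximate e^{tA} by the truncated series sum_{n < terms} (tA)^n / n!."""
    n = len(A)
    result = [[float(i == j) for j in range(n)] for i in range(n)]  # identity
    term = [row[:] for row in result]                               # (tA)^0 / 0!
    for k in range(1, terms):
        # Multiply the previous term by tA / k to obtain (tA)^k / k!.
        term = mat_mul(term, [[t * a / k for a in row] for row in A])
        result = [[result[i][j] + term[i][j] for j in range(n)]
                  for i in range(n)]
    return result


A = [[-3.0, 3.0], [1.0, -1.0]]
t = 0.5
numeric = expm_taylor(A, t)
e = math.exp(-4 * t)
closed = [[0.25 + 0.75 * e, 0.75 - 0.75 * e],
          [0.25 - 0.25 * e, 0.75 + 0.25 * e]]
print(numeric)
print(closed)
```

The two printed matrices agree to within floating point rounding, and each row of either matrix sums to one, as it must for a matrix of transition probabilities.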

6.4 Stationary Distributions

In this section we will parallel our treatment of stationary distributions for discrete time Markov chains. We will aim for intuition, as opposed to attempting to prove everything, and point the interested reader to [13] and [11] for the full details of the proofs.

6.4.1 Classification of states

We start by again classifying the states of our process. Viewing a continuous time Markov chain as an embedded discrete time Markov chain with exponential holding times makes the classification of states, analogous to Section 3.4 in the discrete time setting, easy. We will again denote our state space as S.

Definition 6.4.1. The communication classes of the continuous time Markov chain $X(t)$ are the communication classes of the embedded Markov chain $X_n$. If there is only one communication class, we say the chain is irreducible; otherwise it is said to be reducible.

Noting that $X(t)$ will return to a state $i$ infinitely often if and only if the embedded discrete time chain does (even in the case of an explosion!) motivates the following.

Definition 6.4.2. State $i \in S$ is called recurrent for $X(t)$ if $i$ is recurrent for the embedded discrete time chain $X_n$. Otherwise, $i$ is transient.

Definition 6.4.3. Let $T_1$ denote the first jump time of the continuous time chain. We define

\[
\tau_i \overset{\text{def}}{=} \inf\{t \ge T_1 : X(t) = i\},
\]

and set $m_i = E_i \tau_i$. We say that state $i$ is positive recurrent if $m_i < \infty$.

Note that, perhaps surprisingly, we do not define $i$ to be positive recurrent if $i$ is positive recurrent for the discrete time chain. In Example 6.4.10 we will demonstrate that $i$ may be positive recurrent for $X_n$, while not for $X(t)$.

As in the discrete time setting, recurrence, transience, and positive recurrence are class properties. Note that the concept of periodicity no longer plays a role, or even makes sense to define, as time is no longer discrete. In fact, if $P(t)$ is the matrix with entries $P_{ij}(t) = P\{X(t) = j \mid X(0) = i\}$ for an irreducible continuous time chain, then for every $t > 0$, $P(t)$ has strictly positive entries because there is necessarily a path between $i$ and $j$, and a non-zero probability of moving along that path in time $t > 0$.

6.4.2 Invariant measures

Recall that equation (6.9) states that if the initial distribution of the process is α, then αP (t) is the vector whose ith component gives the probability that X(t) = i. We therefore define an invariant measure in the following manner.

Definition 6.4.4. A measure $\eta = \{\eta_j, j \in S\}$ on $S$ is called invariant if for all $t > 0$

\[
\eta P(t) = \eta.
\]

If this measure is a probability distribution (i.e. sums to one), then it is called a stationary distribution.
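As a quick check of this definition, the sketch below uses the closed-form $P(t)$ from Example 6.3.9 (the two state chain with $\lambda = 3$, $\mu = 1$) and tests the candidate $\eta = (1/4, 3/4)$, taken from the common row of the limiting matrix. The function names `P` and `left_multiply` and the particular times tested are assumptions made for illustration.

```python
import math


def P(t):
    """Closed-form transition matrix from Example 6.3.9 (lambda = 3, mu = 1)."""
    e = math.exp(-4 * t)
    return [[0.25 + 0.75 * e, 0.75 - 0.75 * e],
            [0.25 - 0.25 * e, 0.75 + 0.25 * e]]


def left_multiply(eta, M):
    """Row vector times matrix: (eta M)_j = sum_i eta_i M_ij."""
    return [sum(eta[i] * M[i][j] for i in range(len(eta)))
            for j in range(len(M[0]))]


eta = [0.25, 0.75]                 # candidate invariant measure
for t in (0.1, 1.0, 10.0):
    print(t, left_multiply(eta, P(t)))   # stays at eta for every t
```

Since the entries of $\eta$ sum to one, this candidate is a stationary distribution for that chain; a different starting vector, such as $(1, 0)$, is visibly changed by $P(t)$.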