Notes on Markov Chains | Stochastic Process | PSTAT 160A, Study notes of Statistics

Markov Chains Material Type: Notes; Class: STOCHASTIC PROCESS; Subject: Statistics & Applied Probability; University: University of California - Santa Barbara;

Typology: Study notes

2011/2012

Uploaded on 03/17/2012

PSTAT160A: Markov Chains
University of California, Santa Barbara
Gerard Brunick
Last Updated: March 7, 2012


These notes are a minor modification of a set of notes which were generously shared with the present “author” by Gordan Žitković, who currently works in the Department of Mathematics at The University of Texas at Austin. Any mistakes in these notes were undoubtedly introduced by the present author when he modified the original presentation.

Contents

1 Markov Chains
1.1 The Markov property
1.2 Examples
1.3 Chapman-Kolmogorov relations
1.4 Strong Markov property

2 Classification of States
2.1 The Communication Relation
2.2 Classes
2.3 Transience and recurrence

3 Limiting Probabilities
3.1 Stationary distributions
3.2 Limiting distributions

1 Markov Chains

1.1 The Markov property

Simply put, a stochastic process has the Markov property if its future evolution depends only on its current position, not on how it got there. Here is a more precise, mathematical definition. It will be assumed throughout this course that any stochastic process $(X_n)_{n\in\mathbb{N}_0}$ takes values in a countable set $S$, the state space. Usually, $S$ will be either $\mathbb{N}_0$ (as in the case of branching processes) or $\mathbb{Z}$ (random walks). Sometimes, a more general, but still countable, state space $S$ will be needed. A generic element of $S$ will be denoted by $i$ or $j$.

Definition 1.1. A stochastic process $(X_n)_{n\in\mathbb{N}_0}$ taking values in a countable state space $S$ is said to have the Markov property or be a Markov process if

$$P[X_{n+1} = i_{n+1} \mid X_m = i_m,\ 0 \le m \le n] = P[X_{n+1} = i_{n+1} \mid X_n = i_n], \tag{1.1}$$

for all $n \in \mathbb{N}_0$ and all $i_0, i_1, \dots, i_n, i_{n+1} \in S$, whenever the two conditional probabilities are well-defined, i.e., when $P[X_n = i_n, \dots, X_1 = i_1, X_0 = i_0] > 0$.

The condition $P[X_n = i_n, \dots, X_0 = i_0] > 0$ will be assumed (without explicit mention) every time we write a conditional expression like the one in (1.1). We will actually further simplify the situation and assume that the rules which determine how the process evolves do not change over time.

Definition 1.2. We say that a Markov process $(X_n)_{n\in\mathbb{N}_0}$ taking values in the countable state space $S$ is a Markov chain if

$$P[X_{n+1} = j \mid X_n = i] = P[X_{m+1} = j \mid X_m = i], \tag{1.2}$$

for all $m, n \in \mathbb{N}_0$.

Markov chains are (relatively) easy to work with because the Markov property allows us to compute all the probabilities, expectations, etc. we might be interested in by using only two ingredients.

  1. Initial probability $\pi = \{\pi_i : i \in S\}$, $\pi_i = P[X_0 = i]$ - the initial probability distribution of the process, and
  2. Transition probabilities $P_{ij} = P[X_{n+1} = j \mid X_n = i]$ - the mechanism that the process uses to jump around.

Indeed, if one knows all $\pi_i$ and all $P_{ij}$, and wants to compute a joint distribution $P[X_n = i_n, X_{n-1} = i_{n-1}, \dots, X_0 = i_0]$, one needs to use the definition of conditional probability and the Markov property several times (the multiplication theorem from your elementary probability course) to get

$$\begin{aligned}
P[X_n = i_n, \dots, X_0 = i_0] &= P[X_n = i_n \mid X_{n-1} = i_{n-1}, \dots, X_0 = i_0]\; P[X_{n-1} = i_{n-1}, \dots, X_0 = i_0] \\
&= P[X_n = i_n \mid X_{n-1} = i_{n-1}]\; P[X_{n-1} = i_{n-1}, \dots, X_0 = i_0] \\
&= P_{i_{n-1} i_n}\; P[X_{n-1} = i_{n-1}, \dots, X_0 = i_0].
\end{aligned}$$

Repeating the same procedure, we get

$$P[X_n = i_n, \dots, X_0 = i_0] = \pi_{i_0} \times P_{i_0 i_1} \times \cdots \times P_{i_{n-2} i_{n-1}} \times P_{i_{n-1} i_n}.$$
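The product formula above translates directly into a short computation. The following is a minimal sketch, not part of the original notes: the function name `path_probability`, the dictionary representation of $\pi$ and $P$, and the two-state chain are all illustrative assumptions.

```python
def path_probability(pi, P, path):
    """Probability of observing exactly the path (i_0, i_1, ..., i_n).

    pi maps a state to its initial probability; P maps a state i to a
    dictionary {j: P_ij}. Missing entries are treated as probability 0.
    """
    prob = pi.get(path[0], 0.0)              # pi_{i_0}
    for i, j in zip(path, path[1:]):         # consecutive pairs (i_k, i_{k+1})
        prob *= P.get(i, {}).get(j, 0.0)     # multiply by P_{i_k i_{k+1}}
    return prob


# Hypothetical two-state chain on S = {"a", "b"}, started in "a".
pi = {"a": 1.0, "b": 0.0}
P = {"a": {"a": 0.9, "b": 0.1}, "b": {"a": 0.5, "b": 0.5}}

# pi_a * P_aa * P_ab * P_bb = 1.0 * 0.9 * 0.1 * 0.5
example = path_probability(pi, P, ["a", "a", "b", "b"])
```

Only the initial distribution and the one-step transition probabilities enter the computation, which is exactly the point of the two "ingredients" listed above.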

When $S$ is finite, there is no loss of generality in assuming that $S = \{1, 2, \dots, n\}$, and then we usually organize the entries of $\pi$ into a row vector

$$\pi = (\pi_1, \pi_2, \dots, \pi_n),$$

Proof. We can write the event $\{X_n = i\} \cap A$ as a disjoint (countable) union of events of the form $\{X_0 = k_0, \dots, X_{n-1} = k_{n-1}, X_n = i\}$, and we know that

$$P[X_{n+1} = j \mid X_0 = k_0, \dots, X_{n-1} = k_{n-1}, X_n = i] = P_{ij},$$

so the result again follows from the previous lemma.

1.2 Examples

Here are some examples of Markov chains - for each one we write down the transition matrix. The initial distribution is sometimes left unspecified because it does not really change anything.

  1. Random walks. Let $(X_n)_{n\in\mathbb{N}_0}$ be a simple random walk. Let us show that it indeed has the Markov property (1.1). Remember, first, that $X_n = \sum_{k=1}^{n} \xi_k$, where the $\xi_k$ are independent coin-tosses. For a choice of $i_0, \dots, i_{n+1}$ (such that $i_0 = 0$ and $i_{k+1} - i_k = \pm 1$) we have

$$\begin{aligned}
P[X_{n+1} = i_{n+1} \mid X_n = i_n, \dots, X_1 = i_1, X_0 = i_0]
&= P[X_{n+1} - X_n = i_{n+1} - i_n \mid X_n = i_n, \dots, X_1 = i_1, X_0 = i_0] \\
&= P[\xi_{n+1} = i_{n+1} - i_n \mid X_n = i_n, \dots, X_1 = i_1, X_0 = i_0] \\
&= P[\xi_{n+1} = i_{n+1} - i_n],
\end{aligned}$$

where the last equality follows from the fact that the increment $\xi_{n+1}$ is independent of the previous increments, and, therefore, of the values of $X_1, X_2, \dots, X_n$. The last line above does not depend on $i_{n-1}, \dots, i_1, i_0$, so $X$ indeed has the Markov property. The state space $S$ of $(X_n)_{n\in\mathbb{N}_0}$ is the set $\mathbb{Z}$ of all integers, and the initial distribution $\pi$ is very simple: we start at $0$ with probability $1$ (so that $\pi_0 = 1$ and $\pi_i = 0$ for $i \neq 0$). The transition probabilities are simple to write down:

$$P_{ij} = \begin{cases} p, & j = i + 1, \\ q, & j = i - 1, \\ 0, & \text{otherwise.} \end{cases}$$

These can be written down in an infinite matrix,

$$P = \begin{pmatrix}
\ddots & \vdots & \vdots & \vdots & \vdots & \vdots & \\
\cdots & 0 & p & 0 & 0 & 0 & \cdots \\
\cdots & q & 0 & p & 0 & 0 & \cdots \\
\cdots & 0 & q & 0 & p & 0 & \cdots \\
\cdots & 0 & 0 & q & 0 & p & \cdots \\
\cdots & 0 & 0 & 0 & q & 0 & \cdots \\
\cdots & 0 & 0 & 0 & 0 & q & \cdots \\
 & \vdots & \vdots & \vdots & \vdots & \vdots & \ddots
\end{pmatrix},$$

but it does not help our understanding much.

  2. Branching processes. Let $(X_n)_{n\in\mathbb{N}_0}$ be a simple branching process with the branching distribution $(p_n)_{n\in\mathbb{N}_0}$. As you surely remember, it is constructed as follows: $X_0 = 1$ and $X_{n+1} = \sum_{k=1}^{X_n} X_{n,k}$, where $\{X_{n,k}\}_{n\in\mathbb{N}_0,\, k\in\mathbb{N}}$ is a family of independent random variables with distribution $(p_n)_{n\in\mathbb{N}_0}$. It is now not very difficult to show that $(X_n)_{n\in\mathbb{N}_0}$ is a Markov chain:

$$\begin{aligned}
P[X_{n+1} = i_{n+1} \mid X_n = i_n, \dots, X_1 = i_1, X_0 = i_0]
&= P\Big[\sum_{k=1}^{X_n} X_{n,k} = i_{n+1} \,\Big|\, X_n = i_n, \dots, X_1 = i_1, X_0 = i_0\Big] \\
&= P\Big[\sum_{k=1}^{i_n} X_{n,k} = i_{n+1} \,\Big|\, X_n = i_n, \dots, X_1 = i_1, X_0 = i_0\Big] \\
&= P\Big[\sum_{k=1}^{i_n} X_{n,k} = i_{n+1}\Big],
\end{aligned}$$

where, just like in the random-walk case, the last equality follows from the fact that the random variables $X_{n,k}$, $k \in \mathbb{N}$, are independent of all $X_{m,k}$, $m < n$, $k \in \mathbb{N}$. In particular, they are independent of $X_n, X_{n-1}, \dots, X_1, X_0$, which are obtained as combinations of $X_{m,k}$, $m < n$, $k \in \mathbb{N}$. The computation above also reveals the structure of the transition probabilities $P_{ij}$, $i, j \in S = \mathbb{N}_0$:

$$P_{ij} = P\Big[\sum_{k=1}^{i} X_{n,k} = j\Big].$$

There is little we can do to make the expression above more explicit, but we can remember generating functions and write $G_i(s) = \sum_{j=0}^{\infty} P_{ij} s^j$ (remember that each row of the transition matrix is a probability distribution). Thus, $G_i(s) = (G(s))^i$ (why?), where $G(s)$ is the generating function of the branching distribution. Analogously to the random walk case, we have

$$\pi_i = \begin{cases} 1, & i = 1, \\ 0, & i \neq 1. \end{cases}$$

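  3. Gambler’s ruin. In Gambler’s ruin, a gambler starts with $x$ dollars, where $0 \le x \le a \in \mathbb{N}$, and in each play either wins a dollar (with probability $p \in (0, 1)$) or loses a dollar (with probability $q = 1 - p$). When the gambler reaches either $0$ or $a$, the game stops. The transition probabilities are similar to those of a random walk, but differ from them at the boundaries $0$ and $a$, which are absorbing. The state space is finite, $S = \{0, 1, \dots, a\}$, and the matrix $P$ is, therefore, given by

$$P = \begin{pmatrix}
1 & 0 & 0 & 0 & \cdots & 0 & 0 & 0 \\
q & 0 & p & 0 & \cdots & 0 & 0 & 0 \\
0 & q & 0 & p & \cdots & 0 & 0 & 0 \\
0 & 0 & q & 0 & \cdots & 0 & 0 & 0 \\
\vdots & \vdots & \vdots & \vdots & \ddots & \vdots & \vdots & \vdots \\
0 & 0 & 0 & 0 & \cdots & 0 & p & 0 \\
0 & 0 & 0 & 0 & \cdots & q & 0 & p \\
0 & 0 & 0 & 0 & \cdots & 0 & 0 & 1
\end{pmatrix}.$$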

  7. Making a non-Markov chain into a Markov chain. How can we turn the process of Example 6 into a Markov chain? Obviously, the problem is that the frog has to remember the number of the leaf it came from in order to decide where to jump next. The way out is to make this information a part of the state. In other words, we need to change the state space. Instead of just $S = \{1, 2, \dots, N\}$, we set $S = \{(i, j) : i, j \in \{1, 2, \dots, N\}\}$. In words, the state of the process will now contain not only the number of the current leaf (i.e., $i$) but also the number of the leaf we came from (i.e., $j$). There is a bit of freedom with the initial state, but we simply assume that we start from $(1, 1)$. Starting from the state $(i, j)$, the frog can jump to any state of the form $(k, i)$ with $k \neq i$ and $k \neq j$ (with equal probabilities). Note that some states will never be visited (like $(i, i)$ for $i \neq 1$), so we could have reduced the state space a little bit right from the start.
  8. A more complicated example. Let $(X_n)_{n\in\mathbb{N}_0}$ be a simple symmetric random walk. The absolute-value process $Y_n = |X_n|$, $n \in \mathbb{N}_0$, is also a Markov chain. This process is sometimes called the reflected random walk. It is easy to see (why?) that

$$P\big[\,|X_{n+1}| = 1 \,\big|\, |X_m| = i_m,\ 0 \le m \le n\,\big] = 1$$

when $i_n = 0$, so we will show that

$$P\big[\,|X_{n+1}| = i_n \pm 1 \,\big|\, |X_m| = i_m,\ 0 \le m \le n\,\big] = \tfrac{1}{2} \tag{1.3}$$

when $i_n \in \mathbb{N}$. First observe that if $j_n \neq 0$, then

$$P\big[\,|X_{n+1}| = |j_n| \pm 1 \,\big|\, X_m = j_m,\ 0 \le m \le n\,\big] = P\big[\,X_{n+1} = j_n \pm \operatorname{sgn}(j_n) \,\big|\, X_m = j_m,\ 0 \le m \le n\,\big] = \tfrac{1}{2},$$

where

$$\operatorname{sgn}(x) = \begin{cases} 1, & x > 0, \\ 0, & x = 0, \\ -1, & x < 0. \end{cases}$$

As we can partition the event $\{|X_m| = i_m,\ 0 \le m \le n\}$ into a finite number of disjoint events of the form $\{X_m = j_m,\ 0 \le m \le n\}$, we may apply Lemma 1.3 to conclude that (1.3) holds. We have now shown that $|X_n|$ is a Markov chain on the state space $S = \mathbb{N}_0$ with

$$P_{0j} = \begin{cases} 1, & j = 1, \\ 0, & \text{otherwise,} \end{cases}$$

and

$$P_{ij} = \begin{cases} 1/2, & j = i \pm 1, \\ 0, & \text{otherwise,} \end{cases}$$

when $i \in \mathbb{N}$.

  9. A more realistic example. In a game of tennis, the scoring system is as follows: both players (let us call them Amélie and Björn) start with the score of 0. Each time Amélie wins a point, her score moves a step up in the following hierarchy

$$0 \mapsto 15 \mapsto 30 \mapsto 40.$$

Once Amélie reaches 40 and scores a point, three things can happen:

  1. if Björn’s score is 30 or less, Amélie wins the game,
  2. if Björn’s score is 40, Amélie’s score moves up to “advantage”, and
  3. if Björn’s score is “advantage”, nothing happens to Amélie’s score, but Björn’s score falls back to 40.

Finally, if Amélie’s score is “advantage” and she wins a point, she wins the game. The situation is entirely symmetric for Björn. We suppose that the probability that Amélie wins each point is $p \in (0, 1)$, independently of the current score.

[Figure 1 (image not reproduced): directed graph of the tennis chain. Nodes: (0, 0), (0, 15), (15, 0), (0, 30), (15, 15), (0, 40), (15, 30), (15, 40), (30, 40), (30, 0), (30, 15), (30, 30), (40, 0), (40, 15), (40, 30), (40, 40), (Adv, 40), (40, Adv), “Amélie wins”, “Björn wins”. Edges are labeled p, q, or 1.]

Figure 1: Markov chains with a finite number of states are usually represented by directed graphs (like the one in the figure above). The nodes are states, two states $i, j$ are linked by a (directed) edge if the transition probability $P_{ij}$ is non-zero, and the number $P_{ij}$ is written above the link. If $P_{ij} = 0$, no edge is drawn.

For $n = 0$, we clearly have

$$P^{(0)}_{ij} = \begin{cases} 1, & i = j, \\ 0, & i \neq j. \end{cases}$$

Once we have defined the multi-step transition probabilities $P^{(n)}_{ij}$, $i, j \in S$, $n \in \mathbb{N}_0$, we need to be able to compute them. This computation is central in various applications of Markov chains: it relates the small-time (one-step) behavior, which is usually easy to observe and model, to the long-time (multi-step) behavior, which is really of interest. Before we state the main result in this direction, let us remember how matrices are multiplied. When $A$ and $B$ are $n \times n$ matrices, the product $C = AB$ is also an $n \times n$ matrix and its $ij$-entry $C_{ij}$ is given as

$$C_{ij} = \sum_{k=1}^{n} A_{ik} B_{kj}.$$

There is nothing special about finiteness in the above definition. If $A$ and $B$ were infinite matrices $A = (A_{ij})_{i,j\in S}$, $B = (B_{ij})_{i,j\in S}$ for some countable set $S$, the same procedure could be used to define $C = AB$. Indeed, $C$ will also be an “$S \times S$”-matrix and

$$C_{ij} = \sum_{k\in S} A_{ik} B_{kj},$$

as long as the (infinite) sum above converges absolutely. In the case of a typical transition matrix $P$, convergence will not be a problem since $P$ is a stochastic matrix, i.e., it has the following two properties (why?):

  1. $P_{ij} \ge 0$, for all $i, j \in S$, and
  2. $\sum_{j\in S} P_{ij} = 1$, for all $i \in S$ (in particular, $P_{ij} \in [0, 1]$, for all $i, j$).

When $P = (P_{ij})_{i,j\in S}$ and $P' = (P'_{ij})_{i,j\in S}$ are two $S \times S$-stochastic matrices, the series $\sum_{k\in S} P_{ik} P'_{kj}$ converges absolutely since $0 \le P'_{kj} \le 1$ for all $k, j \in S$ and so

$$\sum_{k\in S} P_{ik} P'_{kj} \le \sum_{k\in S} P_{ik} \le 1, \quad \text{for all } i, j \in S.$$

Moreover, a product $C$ of two stochastic matrices $A$ and $B$ is always a stochastic matrix: the entries of $C$ are clearly nonnegative and (by Tonelli’s theorem)

$$\sum_{j\in S} C_{ij} = \sum_{j\in S} \sum_{k\in S} A_{ik} B_{kj} = \sum_{k\in S} A_{ik} \underbrace{\sum_{j\in S} B_{kj}}_{=1} = \sum_{k\in S} A_{ik} = 1.$$

Proposition 1.6. Let $P^n$ be the $n$-th (matrix) power of the transition matrix $P$. Then $P^{(n)}_{ij} = (P^n)_{ij}$, for $i, j \in S$.

Proof. We proceed by induction. For $n = 1$ the statement follows directly from the definition of the matrix $P$. Supposing that $P^{(n)}_{ij} = (P^n)_{ij}$ for all $i, j$, we have

$$\begin{aligned}
P^{(n+1)}_{ij} &= P[X_{n+1} = j \mid X_0 = i] \\
&= \sum_{k\in S} P[X_n = k \mid X_0 = i]\; P[X_{n+1} = j \mid X_0 = i, X_n = k] \\
&= \sum_{k\in S} P[X_n = k \mid X_0 = i]\; P[X_{n+1} = j \mid X_n = k] \\
&= \sum_{k\in S} P^{(n)}_{ik} P_{kj},
\end{aligned}$$

where the second equality follows from the law of total probability, the third follows from Corollary 1.5, and the fourth one from homogeneity. The last sum above is nothing but the expression for the matrix product of $P^n$ and $P$, and so we have proven the induction step.

Using Proposition 1.6, we can write a simple expression for the distribution of the random variable $X_n$, for $n \in \mathbb{N}_0$. Remember that the initial distribution (the distribution of $X_0$) is denoted by $\pi = (\pi_i)_{i\in S}$. Analogously, we define the vector $\pi^{(n)} = (\pi^{(n)}_i)_{i\in S}$ by

$$\pi^{(n)}_i = P[X_n = i], \quad i \in S.$$

Using the law of total probability, we have

$$\pi^{(n)}_i = P[X_n = i] = \sum_{k\in S} P[X_0 = k]\; P[X_n = i \mid X_0 = k] = \sum_{k\in S} \pi_k P^{(n)}_{ki}.$$

We usually interpret $\pi$ as a (row) vector, so the above relationship can be expressed using vector-matrix multiplication: $\pi^{(n)} = \pi P^n$.
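For a finite state space, the relation between the initial distribution and the distribution of $X_n$ can be evaluated by applying the one-step update $n$ times. The sketch below is an illustration only: the function names and the two-state matrix are assumptions, not taken from the notes, and states are indexed $0, 1, \dots$ for convenience.

```python
def step_distribution(pi, P):
    """One step of the chain: returns the row vector pi * P."""
    n = len(P)
    return [sum(pi[k] * P[k][i] for k in range(n)) for i in range(n)]


def distribution_after(pi, P, n_steps):
    """Distribution of X_n given that X_0 has distribution pi."""
    for _ in range(n_steps):
        pi = step_distribution(pi, P)
    return pi


# Hypothetical two-state chain, started in state 0 with probability 1.
P = [[0.9, 0.1],
     [0.5, 0.5]]
pi0 = [1.0, 0.0]
pi2 = distribution_after(pi0, P, 2)   # distribution of X_2
```

Multiplying the vector by $P$ repeatedly costs less than forming the matrix power first, which is a reasonable choice when only the distribution of $X_n$ is needed and not all multi-step transition probabilities.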

The following corollary shows a simple, yet fundamental, relationship between different multi-step transition probabilities $P^{(n)}_{ij}$.

Corollary 1.7 (Chapman-Kolmogorov relations). For $n, m \in \mathbb{N}_0$ and $i, j \in S$ we have

$$P^{(m+n)}_{ij} = \sum_{k\in S} P^{(m)}_{ik} P^{(n)}_{kj}.$$

Proof. The statement follows directly from the matrix equality

$$P^{m+n} = P^m P^n.$$
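The Chapman-Kolmogorov relations can be checked numerically on a small chain. This is a sketch under stated assumptions: the three-state matrix is made up for illustration, and `mat_mul` and `mat_pow` are plain-Python helpers written here, not functions from the notes or from any library.

```python
def mat_mul(A, B):
    """Product of two square matrices given as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mat_pow(P, n):
    """n-th matrix power of P, with P^0 equal to the identity matrix."""
    size = len(P)
    result = [[1.0 if i == j else 0.0 for j in range(size)] for i in range(size)]
    for _ in range(n):
        result = mat_mul(result, P)
    return result


# Hypothetical three-state stochastic matrix (each row sums to 1).
P = [[0.50, 0.50, 0.00],
     [0.25, 0.50, 0.25],
     [0.00, 0.50, 0.50]]

m, n = 3, 4
lhs = mat_pow(P, m + n)                      # entries P_ij^(m+n)
rhs = mat_mul(mat_pow(P, m), mat_pow(P, n))  # sum over k of P_ik^(m) P_kj^(n)
max_gap = max(abs(lhs[i][j] - rhs[i][j]) for i in range(3) for j in range(3))
```

Here `max_gap` should differ from zero only by floating-point rounding, and every row of `lhs` should still sum to 1, in line with the fact that products of stochastic matrices are stochastic.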

It is usually difficult to compute $P^n$ for a general transition matrix $P$ and a large $n$. We will see later that it will be easier to find the limiting values $\lim_{n\to\infty} P^{(n)}_{ij}$. In the meantime, here is a simple example where this can be done by hand

1.4 Strong Markov property

The rules which describe the evolution of a Markov chain do not change over time. As a result, if we stop a Markov chain at a stopping time, and then restart the process after this stopping time, the restarted process is again a Markov chain with the same transition function. This useful observation is known as the “Strong Markov property.”

Proposition 1.9. Let $(X_n)_{n\in\mathbb{N}_0}$ be a Markov chain with transition function $P = (P_{ij})_{i,j\in S}$, and let $T$ be a stopping time with respect to $(X_n)_{n\in\mathbb{N}_0}$. Then, conditional on the event $\{T < \infty\}$, the process $Y_n = X_{T+n}$ is also a Markov chain with the same transition function.

Proof. It is enough to check that

$$P[Y_{n+1} = j \mid T < \infty,\ Y_n = i,\ Y_m = k_m,\ 0 \le m \le n-1] = P_{ij}, \tag{1.5}$$

for all $n$, $i$, $j$, and $k_0, \dots, k_{n-1}$. Moreover, if we show that

$$P[Y_{n+1} = j \mid T = \ell,\ Y_n = i,\ Y_m = k_m,\ 0 \le m \le n-1] = P_{ij} \tag{1.6}$$

for all $\ell \in \mathbb{N}_0$, then (1.5) follows from Lemma 1.3 and we are done, so it is really enough to show (1.6). We have

$$\begin{aligned}
P[Y_{n+1} = j \mid T = \ell,\ Y_n = i,\ Y_m = k_m,\ 0 \le m \le n-1]
&= P[X_{\ell+n+1} = j \mid T = \ell,\ X_{\ell+n} = i,\ X_{\ell+m} = k_m,\ 0 \le m \le n-1] \\
&= P[X_{\ell+n+1} = j \mid X_{\ell+n} = i] = P_{ij},
\end{aligned}$$

where the first equality follows from the definition of $Y$, and the second equality follows from the fact that the event

$$\{T = \ell,\ X_{\ell+m} = k_m,\ 0 \le m \le n-1\}$$

is determined by the random variables $X_0, \dots, X_{\ell+n}$, so Corollary 1.5 says that we may remove it from the conditioning.

One easy consequence of the strong Markov property is that the multistep transition probabilities are time independent when the single-step transition probabilities are time independent.

Corollary 1.10. Let $(X_n)_{n\in\mathbb{N}_0}$ be a Markov chain with multistep transition probabilities

$$P^{(n)}_{ij} = P[X_n = j \mid X_0 = i].$$

Then

$$P[X_{m+n} = j \mid X_m = i] = P^{(n)}_{ij}$$

for all $m \in \mathbb{N}_0$.

Proof. Let $T = m$ denote a deterministic stopping time, and set $Y_n = X_{T+n} = X_{m+n}$. Then $Y$ is a Markov chain with the same transition probabilities as $X$, so

$$P[Y_n = j \mid Y_0 = i] = P^{(n)}_{ij},$$

but

$$P[Y_n = j \mid Y_0 = i] = P[X_{m+n} = j \mid X_m = i],$$

so we are done.

2 Classification of States

2.1 The Communication Relation

Let $(X_n)_{n\in\mathbb{N}_0}$ be a Markov chain on the state space $S$. For a given state $i \in S$, define the hitting time $T_i$ of $i$ as

$$T_i = \min\{n \in \mathbb{N}_0 : X_n = i\}. \tag{2.1}$$

It is easy to check that $T_i$ is a stopping time with respect to $(X_n)_{n\in\mathbb{N}_0}$. As always, we allow $T_i$ to take the value $+\infty$ if the process never reaches the state $i$. The hitting times are important both for immediate applications of $(X_n)_{n\in\mathbb{N}_0}$ and for a better understanding of the structure of Markov chains.

Example 2.1. Let $(X_n)_{n\in\mathbb{N}_0}$ be the chain which models a game of tennis (Example 9 in Section 1.2). The probability of winning for (say) Amélie can be phrased in terms of hitting times:

$$P[\text{Amélie wins}] = P[T_{i_A} < T_{i_B}],$$

where $i_A$ = “Amélie wins” and $i_B$ = “Björn wins” (the two absorbing states of the chain).
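The hitting-time description above can be turned into a small numerical experiment. The following sketch makes several assumptions that are not in the notes: scores are encoded by the number of points won (0, 1, 2, 3 for 0, 15, 30, 40 and 4 for "advantage"), the function names are invented for illustration, and the probability is approximated by pushing the distribution of the chain forward for a fixed number of points instead of solving for it exactly.

```python
def next_state(state, amelie_wins_point):
    """State after one point; scores 0..3 mean 0, 15, 30, 40 and 4 means advantage."""
    a, b = state
    if amelie_wins_point:
        a += 1
    else:
        b += 1
    if a >= 4 and a - b >= 2:
        return "Amelie wins"
    if b >= 4 and b - a >= 2:
        return "Bjorn wins"
    if a == 4 and b == 4:
        return (3, 3)          # losing the advantage leads back to (40, 40)
    return (a, b)


def amelie_win_probability(p, n_points=400):
    """Approximates P[T_A < T_B] by the mass absorbed in 'Amelie wins' after n_points."""
    dist = {(0, 0): 1.0}       # the game starts at (0, 0)
    won = 0.0
    for _ in range(n_points):
        new_dist = {}
        for state, mass in dist.items():
            for amelie_wins_point, prob in ((True, p), (False, 1.0 - p)):
                target = next_state(state, amelie_wins_point)
                if target == "Amelie wins":
                    won += mass * prob
                elif target != "Bjorn wins":
                    new_dist[target] = new_dist.get(target, 0.0) + mass * prob
        dist = new_dist
    return won


even_match = amelie_win_probability(0.5)
```

The truncation at `n_points` ignores games that are still undecided; since the mass circulating among (40, 40), (Adv, 40) and (40, Adv) shrinks geometrically, the neglected probability is negligible for the default value.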

Having introduced the hitting times $T_i$, let us give a few more definitions. It will be convenient to let $P_i$ denote the probability $P_i[A] = P[A \mid X_0 = i]$. We use $P_i$ to signify that we are starting the chain from the state $i$, i.e., $P_i$ corresponds to a Markov chain whose transition matrix is the same as the one of $(X_n)_{n\in\mathbb{N}_0}$, but the initial distribution is given by $P_i[X_0 = i] = 1$.

Definition 2.2. The state $j \in S$ is said to be accessible from the state $i \in S$, written $i \to j$, if

$$P_i[T_j < \infty] > 0.$$

In other words, $j$ is accessible from $i$ if there is a non-zero chance that the Markov chain $X$ will eventually visit $j$ if it starts from $i$.

Example 2.3. In the tennis example, every state is accessible from $(0, 0)$ (the fact that $p \in (0, 1)$ is important here), but $(0, 0)$ is not accessible from any other state. The states accessible from $(40, 40)$ are $(40, 40)$ itself, $(40, \text{Adv})$, $(\text{Adv}, 40)$, “Amélie wins” and “Björn wins”.

Proposition 2.4. $i \to j$ if and only if $P^{(n)}_{ij} > 0$ for some $n \in \mathbb{N}_0$.

  1. Start from an arbitrary state (call it 1).
  2. Identify all states $j$ that communicate with it - don’t forget that always $i \leftrightarrow i$, for all $i$.
  3. That is your first class, call it $C_1$. If there are no elements left, then there is only one class $C_1 = S$. If there is an element in $S \setminus C_1$, repeat the procedure above starting from that element.
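For a finite chain, the procedure above can be sketched in a few lines. The code is an illustration under assumptions: states are indexed $0, 1, \dots$, accessibility is computed by a graph search along edges with $P_{ij} > 0$ (which is how Proposition 2.4 reads for a finite matrix), and the function names and the small gambler's ruin matrix are chosen here for the example.

```python
def reachable_from(P, start):
    """Set of states j accessible from start (start itself is always included)."""
    seen = {start}
    stack = [start]
    while stack:
        i = stack.pop()
        for j, prob in enumerate(P[i]):
            if prob > 0 and j not in seen:
                seen.add(j)
                stack.append(j)
    return seen


def communication_classes(P):
    """Partition of the states into classes of mutually accessible states."""
    n = len(P)
    reach = [reachable_from(P, i) for i in range(n)]
    classes, assigned = [], set()
    for i in range(n):
        if i in assigned:
            continue
        # j communicates with i when each is accessible from the other.
        cls = sorted(j for j in reach[i] if i in reach[j])
        classes.append(cls)
        assigned.update(cls)
    return classes


# Gambler's ruin with a = 4 and p = 1/2 (states 0, 1, 2, 3, 4).
P = [[1.0, 0.0, 0.0, 0.0, 0.0],
     [0.5, 0.0, 0.5, 0.0, 0.0],
     [0.0, 0.5, 0.0, 0.5, 0.0],
     [0.0, 0.0, 0.5, 0.0, 0.5],
     [0.0, 0.0, 0.0, 0.0, 1.0]]
classes = communication_classes(P)
```

For this matrix the result consists of the two absorbing states as singleton classes and the intermediate states as a third class.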

Example 2.9. A simple random walk is irreducible: you can get anywhere on Z from anywhere else.

Example 2.10. In the gambler’s ruin problem there are three classes:

  1. the winning state by itself,
  2. the losing state by itself, and
  3. the collection of intermediate states.

Example 2.11. If the process marched deterministically to the right on Z, then each point in the state space is its own class.

Example 2.12. In the tennis example, all the states except for those contained in the set $E = \{(40, \text{Adv}), (40, 40), (\text{Adv}, 40), \text{Amélie wins}, \text{Björn wins}\}$ communicate only with themselves, so each $i \in S \setminus E$ is in a class by itself. The winning states “Amélie wins” and “Björn wins” are absorbing, and, so, also form classes with one element. Finally, the three states in $\{(40, \text{Adv}), (40, 40), (\text{Adv}, 40)\}$ communicate with each other, so they form the last class.

Certain properties of states are shared between all elements in a class. Knowing which properties share this feature is useful for a simple reason - if you can check them for a single class member, you know automatically that all the other elements of the class share it.

Definition 2.13. A property is called a class property if it holds for all states in a class whenever it holds for any one particular state in that class.

Put differently, a property is a class property if and only if either all states in a class have it or none does.

2.3 Transience and recurrence

It is often important to know whether a Markov chain will ever return to its initial state, and if so, how often. The notions of transience and recurrence address this question.

Definition 2.14. The $m$-th time that the process $X$ returns to the state $j$ is defined to be

$$T^{(m)}_j = \min\Big\{ n \in \mathbb{N} : \sum_{k=1}^{n} 1_{\{X_k = j\}} = m \Big\}$$

for $m \in \mathbb{N}$.

Notice that $T^{(m)}_j \ge 1$, even if $X_0 = j$.

Definition 2.15. A state $i \in S$ is said to be

  1. recurrent if $P_i\big[T^{(1)}_i < \infty\big] = 1$, and
  2. transient if it is not recurrent.

A recurrent state is

  1. positive recurrent if $E_i\big[T^{(1)}_i\big] < \infty$ ($E_i$ means expectation when the probability is $P_i$),
  2. and null recurrent otherwise.

A state is recurrent if we are sure we will come back to it eventually (with probability 1). It is positive recurrent if the time between two consecutive visits has finite expectation. Null recurrence means that we will return with probability 1, but the expected waiting time is infinite. A state is transient if there is a positive chance (however small) that the chain will never return to it.

Example 2.16 (One-dimensional random walk). We studied transience and recurrence in the lectures about random walks (we just did not call them that). The situation depends heavily on the probability $p$ of making an up-step. If $p > \frac{1}{2}$, there is a positive probability that the first step will be “up”, so that $X_1 = 1$. Then, we know that there is a positive probability that the walk will never hit $0$ again. Therefore, there is a positive probability of never returning to $0$, which means that the state $0$ is transient. A similar argument can be made for any state $i$ and any probability $p \neq \frac{1}{2}$. What happens when $p = \frac{1}{2}$? In order to come back to $0$, the walk needs to return there from its position at time $n = 1$. If it went up, then we have to wait for the walk to hit $0$ starting from $1$. We have shown that this will happen sooner or later, but that the expected time it takes is infinite. The same argument works if $X_1 = -1$. All in all, $0$ (and all other states) are null-recurrent.
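The dichotomy in this example can be illustrated numerically through the sum of the return probabilities $P^{(m)}_{00}$. The sketch relies on two standard facts that are assumptions as far as this excerpt is concerned: for the simple random walk, $P^{(2n)}_{00} = \binom{2n}{n} p^n q^n$ (and odd-step return probabilities vanish), and a state $i$ is recurrent exactly when $\sum_m P^{(m)}_{ii} = \infty$, the criterion used in the proof of Proposition 2.22. The function name is hypothetical.

```python
def return_probability_sum(p, n_max):
    """Partial sum of P_00^(2n) = C(2n, n) (p q)^n over n = 0, 1, ..., n_max."""
    pq = p * (1.0 - p)
    term = 1.0     # the n = 0 term, P_00^(0) = 1
    total = 1.0
    for n in range(n_max):
        # C(2n + 2, n + 1) / C(2n, n) = 2 (2n + 1) / (n + 1)
        term *= 2.0 * (2 * n + 1) / (n + 1) * pq
        total += term
    return total


biased = return_probability_sum(0.6, 2000)          # partial sums level off
symmetric_small = return_probability_sum(0.5, 10000)
symmetric_large = return_probability_sum(0.5, 40000)  # keeps growing with n_max
```

For $p = 0.6$ the partial sums settle near a finite value, consistent with transience; for $p = \frac{1}{2}$ they keep growing (roughly like the square root of the number of terms), consistent with recurrence.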

Example 2.17 (Gambler’s Ruin). The absorbing states $0$ and $a$ are (trivially) positive recurrent. All the other states are transient: starting from any state $i \in \{1, 2, \dots, a-1\}$, there is a positive probability (equal to $p^{a-i}$) of winning every one of the next $a - i$ games and, thus, getting absorbed in $a$ before returning to $i$.
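A quick numerical experiment makes the transience of the intermediate states visible: after many steps, essentially all probability mass sits in the absorbing states. This is a sketch with assumed names (`gamblers_ruin_matrix`, `evolve_from`) and assumed parameters ($a = 4$, $p = \frac{1}{2}$, starting capital 2), not a computation from the notes.

```python
def gamblers_ruin_matrix(a, p):
    """Transition matrix on {0, 1, ..., a} with absorbing states 0 and a."""
    q = 1.0 - p
    P = [[0.0] * (a + 1) for _ in range(a + 1)]
    P[0][0] = 1.0
    P[a][a] = 1.0
    for i in range(1, a):
        P[i][i - 1] = q    # lose a dollar
        P[i][i + 1] = p    # win a dollar
    return P


def evolve_from(start, P, n_steps):
    """Distribution of X_n when the chain starts in the state start."""
    size = len(P)
    dist = [0.0] * size
    dist[start] = 1.0
    for _ in range(n_steps):
        dist = [sum(dist[k] * P[k][j] for k in range(size)) for j in range(size)]
    return dist


P = gamblers_ruin_matrix(a=4, p=0.5)
dist = evolve_from(2, P, 200)
interior_mass = sum(dist[1:4])   # mass still on the transient states 1, 2, 3
```

With these parameters `interior_mass` is numerically zero and the remaining mass is split evenly between ruin (state 0) and winning (state 4), as one would expect from the symmetry of the setup.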

Example 2.18 (Deterministic movement to the right). If the process marches in one direction on $\mathbb{Z}$, then all states are transient.

Proposition 2.19. If $(X_n)_{n\in\mathbb{N}_0}$ is a Markov chain and the state $i$ is recurrent, then we have

$$P_i\big[T^{(m)}_i < \infty\big] = 1 \quad \text{for all } m \in \mathbb{N}.$$

In particular, a Markov chain which starts at a recurrent state revisits that state infinitely often with probability one.

Proof. The idea here is simple: if we start at $i$ and $i$ is recurrent, then we will return to $i$ over and over again. To make this rigorous, we use the strong Markov property. The proof is by induction, so suppose that $P_i\big[T^{(m)}_i < \infty\big] = 1$ and set $Y_n = X_{T^{(m)}_i + n}$. As $T^{(m)}_i$ is a stopping time, $Y_n$ is a Markov chain that starts at $i$ with the same transition probabilities as $X$. Let $S^{(1)}_i = \min\{n \in \mathbb{N} : Y_n = i\}$

Proposition 2.22. Transience and recurrence are class properties.

Proof. Suppose that the state $i$ is recurrent, and that $j$ is in its class, i.e., that $i \leftrightarrow j$. Then, there exist natural numbers $\ell$ and $n$ such that $P^{(\ell)}_{ji} > 0$ and $P^{(n)}_{ij} > 0$. By the Chapman-Kolmogorov relations (use Remark 2.6 twice), for each $m \in \mathbb{N}$, we have

$$P^{(\ell+m+n)}_{jj} \ge P^{(\ell)}_{ji} P^{(m)}_{ii} P^{(n)}_{ij}.$$

In other words, there exists a positive constant $c$ (take $c = P^{(\ell)}_{ji} P^{(n)}_{ij}$), independent of $m$, such that

$$P^{(\ell+m+n)}_{jj} \ge c\, P^{(m)}_{ii}.$$

Therefore, by recurrence of $i$ we have $\sum_{m=1}^{\infty} P^{(m)}_{ii} = \infty$, and

$$\sum_{m=1}^{\infty} P^{(m)}_{jj} \ge \sum_{m=1}^{\infty} P^{(\ell+m+n)}_{jj} \ge c \sum_{m=1}^{\infty} P^{(m)}_{ii} = +\infty,$$

and so $j$ is recurrent. Therefore, recurrence is a class property. Since transience is just the opposite of recurrence, it is clear that transience is also a class property.

Remark 2.23. With a bit more work, it can be shown that positive recurrence and null recurrence are both class properties.

3 Limiting Probabilities

Transitions between different states of a Markov chain describe short-time behavior of the chain. In most models used in physical and social sciences, systems change states many times per second. In a rare few, the time scale of the steps can be measured in hours or days. What is of interest, however, is the long-term behavior of the system, measured in thousands, millions, or even billions of steps. Here is an example: for a typical liquid stock traded on the New York Stock Exchange, there is a trade every few seconds, and each trade changes the price (state) of the stock a little bit. What is of interest to an investor is, however, the distribution of the stock-price in 6 months, in a year or, in 30 years - just in time for retirement. A back-of-an-envelope calculation shows that there are, approximately, 50 million trades in 30 years. So, a grasp of very-long time behavior of a Markov chain is one of the most important achievements of probability in general, and stochastic-process theory in particular. We only scratch the surface in these notes.

3.1 Stationary distributions

Definition 3.1. A probability distribution $\pi = (\pi_i)_{i\in S}$ on the state space $S$ of a Markov chain with transition matrix $P$ is called a stationary or equilibrium distribution if

$$P[X_1 = i] = \pi_i \text{ for all } i \in S, \quad \text{whenever } P[X_0 = i] = \pi_i \text{ for all } i \in S.$$

In words, $\pi$ is called a stationary distribution if the distribution of $X_1$ is equal to that of $X_0$ when the distribution of $X_0$ is $\pi$. Here is a hands-on characterization:

Proposition 3.2. A vector $\pi = (\pi_i)_{i\in S}$ with $\sum_{i\in S} \pi_i = 1$ is a stationary distribution if and only if

$$\pi P = \pi, \tag{3.1}$$

when $\pi$ is interpreted as a row vector of length $|S|$ and (3.1) is interpreted as matrix multiplication.

Proof. This is immediate since $\pi P$ is the distribution of $X_1$ when the distribution of $X_0$ is $\pi$.
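The characterization $\pi P = \pi$ is easy to test for a given candidate vector. The sketch below is illustrative only: `is_stationary` is a name chosen here, and the three-state matrix is a made-up example whose columns also sum to 1, which is why the uniform distribution works for it.

```python
def is_stationary(pi, P, tol=1e-12):
    """Checks whether the row vector pi satisfies pi P = pi up to tol."""
    n = len(P)
    pi_next = [sum(pi[k] * P[k][i] for k in range(n)) for i in range(n)]
    return all(abs(pi_next[i] - pi[i]) <= tol for i in range(n))


# Hypothetical chain on three states; rows and columns both sum to 1.
P = [[0.50, 0.25, 0.25],
     [0.25, 0.50, 0.25],
     [0.25, 0.25, 0.50]]

uniform = [1.0 / 3, 1.0 / 3, 1.0 / 3]
point_mass = [1.0, 0.0, 0.0]

uniform_ok = is_stationary(uniform, P)        # uniform vector is left unchanged by P
point_mass_ok = is_stationary(point_mass, P)  # a point mass is spread out by P
```

A tolerance is used because the products are computed in floating point; an exact check would require rational arithmetic.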

Example 3.3. Let $(X_n)_{n\in\mathbb{N}_0}$ be a Markov chain on $S = \{1, 2, 3\}$ with transition probabilities

P =

This chain is irreducible and $\pi = (\frac{1}{3}, \frac{1}{3}, \frac{1}{3})$ is a stationary distribution.

Example 3.4. Let $(X_n)_{n\in\mathbb{N}_0}$ be a Markov chain on $S = \{1, 2, 3\}$ with transition probabilities

P =

so $X$ simply rotates about the vertices of a triangle. This chain is irreducible and $\pi = (\frac{1}{3}, \frac{1}{3}, \frac{1}{3})$ is a stationary distribution.

Example 3.5 (A model of diffusion in a glass). Let us get back to the story about the glass of water and let us analyze a simplified model of that phenomenon. Our glass will be represented by the set $\{0, 1, 2, \dots, a\}$, where $0$ and $a$ are the positions adjacent to the walls of the glass. The ink particle performs a simple random walk inside the glass. Once it reaches the state $0$ it either takes a step to the right to $1$ (with probability $\frac{1}{2}$) or tries to go left (also with probability $\frac{1}{2}$). The passage to the left is blocked by the wall, so the particle ends up staying where it is. The same thing happens at the other wall. All in all, we get a Markov chain with the following transition matrix

$$P = \begin{pmatrix}
\frac{1}{2} & \frac{1}{2} & 0 & 0 & \cdots & 0 & 0 & 0 \\
\frac{1}{2} & 0 & \frac{1}{2} & 0 & \cdots & 0 & 0 & 0 \\
0 & \frac{1}{2} & 0 & \frac{1}{2} & \cdots & 0 & 0 & 0 \\
\vdots & \vdots & \vdots & \vdots & \ddots & \vdots & \vdots & \vdots \\
0 & 0 & 0 & 0 & \cdots & \frac{1}{2} & 0 & \frac{1}{2} \\
0 & 0 & 0 & 0 & \cdots & 0 & \frac{1}{2} & \frac{1}{2}
\end{pmatrix}.$$

Let us see what happens when we start the chain with a distribution concentrated in a/ 2 (assuming that a is even); a graphical representation of the distribution of X 3 , X 12 , X 30 , X 200 , X 700 and X 5000 when a = 30 represents the behavior of the system very well (the y axis is on a different scales on