Markov chains are sequences of random variables (or vectors) that possess the so-called Markov property: given one term in the chain (the present), the subsequent terms (the future) are conditionally independent of the previous terms (the past).
This lecture is a roadmap to Markov chains. Unlike most of the lectures in this textbook, it is not an exhaustive treatment of the subject with detailed proofs, derivations, examples and exercises (for which a separate textbook would be needed). Its aim is to provide a quick introduction to some concepts that are often used in applied statistics (e.g., in Markov Chain Monte Carlo methods). Readers who are interested in a more detailed treatment can have a look at the references reported at the end of this page.
Table of contents
Let us start with a formal definition.
Definition Let be a sequence of random vectors. The sequence is said to be a Markov chain if and only if any given term of the sequence is independent of all terms preceding conditional on : where the letter denotes a conditional distribution function.
The state space of a Markov chain is the set of all possible realizations of the terms of the chain. In other words, for any given term , the support of is included in .
In what follows we present the main facts about Markov chains, by tackling, in order of increasing difficulty, the cases of:
a finite state space
an infinite but countable state space
an uncountable state space
Let us start with the case of a finite state space. In particular, we assume thatthat is, the terms of the chain can take one of values .
We specify an initial distribution for the first value of the chain, that is, a vector of initial probabilities and impose
Then, we can choose a transition probability matrix (a matrix such that each of its rows is a vector of probabilities) and imposefor all and . The assumption that is equal for all is called time-homogeneity (think of the index as a measure of time).
These two choices (of the initial distribution and the transition distribution ) completely determine the distribution of all the terms of the chain. As a matter of fact, for any , we have where
The latter equality holds because
If, for a given transition probability matrix , there is an initial distribution such that the distribution of all the terms of the chain is equal to the initial distribution, then is called a stationary distribution (or invariant distribution) of the chain.
When is a stationary distribution, we have that
Together with the fact thatit implies
A Markov chain with finite state space is said to satisfy the detailed balance condition if and only if there exists a distribution such thatfor any .
By summing both sides of the equation over , we getor
But
Therefore,for any . But this can be written in matrix form as
As a consequence, is a stationary distribution of the chain.
Note that detailed balance is a more stringent requirement than the existence of a stationary distribution: the former implies the latter, but the converse is not true.
We now introduce some properties of Markov chains that are frequently used to study their asymptotic behavior. The first such property is so-called irreducibility.
Let . Define the hitting time where the subscript indicates that the chain is assumed to start from .
We say that leads to if and only if
Again, the notation means that the probability is computed under the assumption that .
A Markov chain is said to be irreducible if and only if every state leads to itself and to every other state, that is, if and only if there is a positive probability that for any starting state the chain will reach any other state (including itself) in finite time.
A state is called recurrent if and only if
Otherwise, it is called transient.
In other words, a state is recurrent if and only if the probability that the chain will return to in finite time (after having started from itself) is equal to .
A Markov chain is called recurrent if and only if all the elements of its state space are recurrent.
Let . The period of is defined as where is the greatest common denominator. In other words, is the minimum time the chain takes to return to (after starting from itself).
It is possible to prove that if the chain is irreducible, then all its states have the same period , which is called the period of the chain.
A chain is called aperiodic if and only if the period of the chain is .
We have the following important result.
Proposition If a Markov chain with a finite state space is irreducible, then it has a unique stationary distribution.
If we assume not only irreducibility, but also aperiodicity, we get the following result.
Proposition If a Markov chain with a finite state space is irreducible and aperiodic, then, irrespective of the initial distribution , where is the unique stationary distribution of the chain.
So, even if the initial distribution of the chain is not the stationary distribution, the terms of the sequence become less and less dependent on the initial value as increases and their distributions converge to the stationary distribution.
Irreducibility can also be used to prove a Strong Law of Large Numbers.
Proposition If a Markov chain with a finite state space is irreducible, then, for any bounded function ,where is the unique stationary distribution of the chain and denotes almost sure convergence as tends to infinity.
Thus, when the state space is finite, irreducibility guarantees that sample averages taken across the chain converge to population averages taken across the states.
This kind of proposition is often called an ergodic theorem.
We now tackle the case in which the state space is infinite but countable.
In particular, we assume thatthat is, the elements of the state space can be arranged into a sequence .
The initial distribution of the chain is a sequence of initial probabilities such that
Then, we can choose a double sequence of transition probabilities such that, for any index ,and we imposefor all , and .
Note that the transition probabilities are independent of . This property is called time-homogeneity.
These two choices (of the initial distribution and the transition probabilities ) completely determine the distribution of all the terms of the chain.
The distribution of is a sequence whose elements can be derived as follows:
The distributions of the other terms of the chain are sequences that can be derived recursively as follows:
The concept of stationary distribution is almost identical to that introduced in the case of a finite state space.
If, for a given double sequence of transition probabilities , there is an initial distribution such that the distribution of all the terms of the chain is equal to the initial distribution, then is called a stationary distribution of the chain.
Furthermore, we have that
A Markov chain with countable state space is said to satisfy the detailed balance condition if and only if there exists a distribution such thatfor any .
This implies thator
But
Therefore,for any . So, is a stationary distribution of the chain.
The concept of irreducibility is the same found in the case of a finite state space.
When dealing with countable state spaces, we usually strengthen the concept of recurrence introduced for the case of a finite state space.
Remember that a state is called recurrent if and only if the probability that the chain will return to (after having started from itself) is equal to .
Even if a state is recurrent, it could happen thatthat is, the expected value of the return time is infinite.
A state is said to be positive recurrent if and only if the latter possibility is ruled out, that is, if and only if
A Markov chain is called positive recurrent if and only if all the elements of its state space are positive recurrent.
The concept of aperiodicity is identical to that found in the case of a finite state space.
The following important result holds.
Proposition If a Markov chain with a countable state space is irreducible and positive recurrent, then it has a unique stationary distribution.
Note that in the case of a finite state space irreducibility was sufficient to obtain the uniqueness of the stationary distribution. In the countable case, we need to add the requirement of positive recurrence.
The following convergence result holds for chains having a countable state space.
Proposition If a Markov chain with a countable state space is irreducible, positive recurrent and aperiodic, then, irrespective of the initial distribution , for any , where is the unique stationary distribution of the chain.
Compare this proposition to the one for finite state spaces:
finite state space + irreducibility + aperiodicity convergence to the stationary distribution;
countable state space + irreducibility + aperiodicity + positive recurrence convergence to the stationary distribution.
Positive recurrence is needed also to prove a Strong Law of Large Numbers.
Proposition If a Markov chain with a countable state space is irreducible and positive recurrent, thenwhere is any function such that the expected value exists and is finite, is the unique stationary distribution of the chain and denotes almost sure convergence as tends to infinity.
Often, there are cases in which positive recurrence is difficult to prove, but we know that the chain has a stationary distribution. In these cases, we can use the following proposition.
Proposition If a Markov chain with a countable state space is irreducible and has a unique stationary distribution , thenfor any function such that the expected value exists and is finite.
We now analyze the more difficult case in which the state space is infinite and uncountable.
The initial distribution of the chain is a probability measure such that for any event .
Then, we can choose a function called transition kernel and we imposefor all and all events .
The transition kernel does not depend on . It is time-homogeneous.
These two choices (of the initial distribution and the transition kernel ) completely determine the distribution of all the terms of the chain.
The distribution of can be derived as follows: where is any event and the integral is a Lebesgue integral with respect to the probability measure .
The distributions of the other terms of the chain can be derived recursively as follows:
When the terms of the sequence are continuous variables (or vectors), then has a probability density such thatand the transition kernel can be expressed in terms of a conditional probability density :
As a consequence, the marginal densities of the terms of the chain can be expressed as
The concept of stationary distribution is similar to that found above for the cases of a finite and a countable state space.
If, for a given transition kernel , there is an initial distribution such that the distribution of all the terms of the chain is equal to the initial distribution, then is called a stationary distribution of the chain.
Furthermore, we have that satisfiesfor any .
In the case in which the transition kernel and the stationary distribution have densities, we can write
A Markov chain with uncountable state space and transition kernel is said to satisfy the detailed balance condition if and only if there exists a probability measure such that
If the measure and the transition kernel can be written in terms of probability densities, then the detailed balance condition can be written as
By integrating both sides of the equation with respect to , we getor
Sincewe get
Thus, is a stationary distribution.
The concept of irreducibility can be generalized to the case of an uncountable state space.
Let and be an event. Define the hitting time where the subscript indicates that the chain is assumed to start from .
We say that leads to if and only if
Again, the notation means that the probability is computed under the assumption that .
A Markov chain is said to be -irreducible if and only if there is a measure such that every state leads to when .
In other words, a chain is said to be -irreducible if and only if there is a positive probability that for any starting state the chain will reach any set having positive measure in finite time.
When dealing with uncountable state spaces, we often use a concept of recurrence, called Harris recurrence, that is even stronger than the concept of positive recurrence introduced in the case of an uncountable state space.
An event is called Harris recurrent if and only if the probability that the chain will return to infinitely often (after having started from a point belonging to ) is equal to .
A Markov chain is called Harris recurrent if and only if it is -irreducible and all the events such that are Harris recurrent.
In the uncountable case, the definition of aperiodicity is slightly more complicated.
A Markov chain is said to have period if its state space can be partitioned into at most mutually exclusive events such that the chain always takes exactly periods to cycle through these events.
In symbols, if the events are ,...,, then you have
A chain is said to be aperiodic if its period is .
Unlike in the finite and countable cases, -irreducibility and Harris recurrence are not sufficient to guarantee the existence and uniqueness of the stationary distribution. Further technical conditions need to be met (see, e.g., Glynn 2013).
Because -irreducibility and Harris recurrence are not sufficient to guarantee the existence and uniqueness of the stationary distribution, the latter is often directly added among the sufficient conditions for the convergence of the chain to a stationary distribution.
Proposition If a Markov chain with an uncountable state space is -irreducible, Harris recurrent, aperiodic and has a stationary distribution , then converges to .
The convergence of the sequence of probability distributions to the stationary distribution is in total variation norm (a technical detail that we can safely skip here).
In the uncountable case, the following ergodic theorem holds.
Proposition If a Markov chain with an uncountable state space is -irreducible, Harris recurrent and has a stationary distribution , thenfor any function such that the expected value exists and is finite.
Gilks, W. R., Richardson, S., Spiegelhalter D. (1995) Markov Chain Monte Carlo in Practice, CRC Press.
Glynn, P. W. (2013) Harris recurrence, Stochastic Systems lecture notes, Stanford University.
Hoel, P. G., Port, S. C., Stone, C. J. (1986) Introduction to Stochastic Processes, Waveland Press.
Norris, J. R. (1998) Markov Chains, Cambridge University Press.
Pishro-Nik, H. (2014) Introduction to Probability, Statistics, and Random Processes, Kappa Research, LLC.
Roberts, G. O., Rosenthal, J. S. (2006) Harris recurrence of Metropolis-within-Gibbs and trans-dimensional Markov Chains, The Annals of Applied Probability, Vol. 16, No. 4, 2123-2139.
Most of the learning materials found on this website are now available in a traditional textbook format.