The Monte Carlo method is a computational method that consists in using a computer-generated sample from a given probability distribution to produce a plug-in estimate of some feature of the given distribution (such as, for example, a moment or a quantile).

In what follows, we are often going to refer to the plug-in principle. If you are not familiar with the concept, you are advised to first read the lecture entitled Plug-in principle.

Let be a random variable with distribution function and suppose that we can generate (through a computer) a sample of realizations of random variables , ..., all having distribution function .

Denote by a feature of the distribution (e.g., its mean, its variance, or one of its quantiles).

Denote by the empirical distribution of the sample (i.e., a probability distribution that assigns probability to each one of the values ).

Then, the plug-in
estimateis
a **Monte Carlo approximation** of
.

In other words, the feature of the orignal distribution is approximated by the same feature of the empirical distribution of the computer-generated sample.

Example Let be a random variable having distribution function and let be a function. Suppose we want to approximate where we have written the expected value of as a Riemann-Stieltjes integral (see the lecture entitled Expected value) in order to highlight the fact that it depends on the distribution function . Now, suppose we have a computer-generated sample of draws , ..., from the distribution . Denote their empirical distribution function by . Then, the Monte Carlo approximation of iswhere we have used the facts that the empirical distribution is discrete and the expected value of a discrete random variable is the weighted sum of the values it can take, with weights equal to their respective probabilities (in this case the weights are all equal to ). In other words, we can approximate the expected value with the sample mean of the computer-generated draws .

When the Monte Carlo method is used to approximate an expected value (as in
the previous example), then the method is called **Monte Carlo
integration**. It consists in approximating the expected value of a
random variable
with the mean of a computer-generated sample of observations drawn at random
from the distribution of
:where
,...,
are the computer-generated draws from the distribution of
.
If the draws are IID, then
Kolmogorov's Strong Law of Large Numbers applies and the approximation
converges almost surely to
as the number of draws
becomes large. It is also possible, however, that the draws are not IID (as,
for instance, in the so-called Markov Chain Monte Carlo methods), but the
approximation converges anyway to
because a Law of Large Numbers for correlated sequences applies (see the
lecture entitled Laws of Large Numbers for more
details).

This approximation method is called Monte Carlo integration because the expected value being approximated is in fact an integral:where is the distribution function of , and the integral is a Riemann-Stieltjes integral. Moreover, if is a continuous variable, the integral can be written as an ordinary Riemann integral:where is the probability density function of .

It is very important to note that the method of Monte Carlo integration is often used when the distribution function of is not known analytically, but can be written as a function of a random vector and it is easy to produce computer-generated draws of . In such a case, draws ,..., are generated from the distribution of , which are then transformed into draws from the distribution of :and the expected value of is approximated by

Monte Carlo integration is also used to compute integrals that have no particular probabilistic interpretation but can be written as expected values. Suppose the integral to be computed iswhere is any integrable function. Now, take any legitimate probability density function such that for and for or . Then we can writewhere is a random variable having probability density function . If we are able to produce a computer-generated sample of draws ,..., from the distribution of , then the integral can be approximated as follows:

Up to this point, we have not clarified what we mean by a computer-generated sample of draws from a given distribution. While we will not go into details, we would like to highlight some facts.

First of all, there exist several algorithms that allow to use a computer to
produce sequences of numbers, called **pseudo-random** numbers,
which are not truly random, but approximate fairly well the properties of a
truly random sequence. If a statistician observed sequences of numbers
produced by these algorithms without knowing how they were produced, she would
conclude that they are indeed random (even after carrying out rigorous
statistical tests to check their randomness). See Wikipedia's article on
pseudo-random
number generators for more details.

Second, most available algorithms for producing sequences of pseudo-random numbers produce sequences of independent draws from a uniform distribution on the interval . These sequences of uniform random numbers are then transformed in some way in order to obtain sequences drawn from other distributions. For example, if is a pseudo-random number having a uniform distribution on and is an invertible distribution function, then the numberhas distribution function .

Proof

We have

This method of generating pseudo-random numbers from non-uniform distributions
is called **inverse transformation method**. Lots of other
methods exist (see, e.g., Wikipedia's article on
non-uniform
pseudo-random variate generation).

Third, all commonly used statistical software packages include efficient and thoroughly tested pseudo-random number generators for extracting random samples from the most important probability distributions. So, if you need samples of observations from these distributions to compute a Monte Carlo approximation, all you need to do is to find out how to use the pseudo-random number generator that comes with your favorite statistical software.

By approximating a quantity of interest with its Monte Carlo approximation , we commit an approximation error

In general, the properties of this approximation error (mean, variance,
asymptotic convergence) depend on the properties of the plug-in estimator
.
But, as noted in the lecture on the
plug-in principle, these
properties can be difficult to study in **general** terms.
However, things are relatively simple in the **particular** case
of Monte Carlo integration, that is, when
is an integral (an expected value). In this
case,and

When ,..., are independent and have the same distribution of , then converges almost surely to by Kolmogorov's Strong Law of Large Numbers (provided ). But this implies that the approximation error converges to zero as the sample size gets large. Furthermore, the expected value and the variance of the approximation error can be easily computed:

Proof

The expected value isand the variance is

From these formulae it is clear that the variance of the approximation error can be made as small as desired by increasing the sample size of the computer-generated sample (at the cost of increasing the computational intensity of the approximation, that is, the computer time and memory required to compute the approximation).

Note also that, since is usually not known, it can be estimated by the sample varianceand the variance of the approximation error is estimated by

Finally, let us mention that there exist several techniques that can be used to reduce the variance of a Monte Carlo approximation. These techniques are called variance reduction techniques. One of the most popular of these techniques is introduced in the lecture entitled Importance sampling.

The book

Most of the learning materials found on this website are now available in a traditional textbook format.

Featured pages

- Chi-square distribution
- Binomial distribution
- Beta function
- Beta distribution
- Exponential distribution
- Delta method

Explore

Main sections

- Mathematical tools
- Fundamentals of probability
- Probability distributions
- Asymptotic theory
- Fundamentals of statistics
- Glossary

About

Glossary entries

- Discrete random variable
- Almost sure
- Type II error
- Power function
- Binomial coefficient
- Continuous mapping theorem

Share