
Posterior probability

by Marco Taboga, PhD

The posterior probability is one of the quantities involved in Bayes' rule.

It is the conditional probability of a given event, computed after observing a second event whose conditional and unconditional probabilities were known in advance.

It is derived by updating the prior probability, which was assigned to the first event before observing the second event.



The following is a more formal definition.

Definition Let $A$ and $B$ be two events whose probabilities $P(A)$ and $P(B)$ are known. If also the conditional probability $P(B|A)$ is known, Bayes' rule gives $$P(A|B)=\frac{P(B|A)P(A)}{P(B)}$$ The conditional probability $P(A|B)$ thus computed is called posterior probability.

In other words, the posterior probability is the conditional probability $P(A|B)$ calculated after receiving the information that the event $B$ has happened.


Suppose that an individual is extracted at random from a population of men.

Denote by $A$ the event that the individual is married and by $B$ the event that he is childless, and suppose that the probabilities $P(A)$, $P(B)$ and $P(B|A)$ are known.

If the individual extracted at random from the population turns out to be childless, what is the conditional probability that he is married?

This conditional probability is called posterior probability, and it can be computed by using Bayes' rule above.

The quantities involved in the computation are the prior $P(A)$, the likelihood $P(B|A)$ and the marginal $P(B)$.

The posterior probability is $$P(A|B)=\frac{P(B|A)P(A)}{P(B)}$$
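The computation can be sketched in a few lines; the probability values below are hypothetical placeholders (the example's original numbers are not reproduced here), chosen only to illustrate the formula.

```python
# Hypothetical inputs (assumed values, not from the original example):
p_A = 0.5          # prior P(A): probability that the individual is married
p_B_given_A = 0.2  # likelihood P(B|A): probability of being childless if married
p_B = 0.4          # marginal P(B): probability of being childless

# Bayes' rule: P(A|B) = P(B|A) * P(A) / P(B)
p_A_given_B = p_B_given_A * p_A / p_B
print(p_A_given_B)  # prints 0.25
```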

Quantities involved in the formula

There are four quantities in the formula $$P(A|B)=\frac{P(B|A)P(A)}{P(B)}$$

We have said that $P(A|B)$ is called posterior probability.

The other three quantities are:

  1. the prior probability $P(A)$;

  2. the likelihood (or conditional probability) $P(B|A)$;

  3. the marginal probability $P(B)$.

We need to know these three quantities in order to compute the posterior.

Law of total probability

Sometimes, we do not know the marginal probability $P(B)$, but we know $P(B|A^{c})$, the likelihood of $B$ given the complement of $A$.

In those cases, we can use the law of total probability: $$P(B)=P(B|A)P(A)+P(B|A^{c})P(A^{c})$$ where $$P(A^{c})=1-P(A)$$
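As a sketch (with assumed values, not taken from the lecture), the marginal $P(B)$ can be recovered from the two likelihoods and the prior, and then plugged into Bayes' rule:

```python
# Assumed inputs (illustrative placeholders):
p_A = 0.5            # prior P(A)
p_B_given_A = 0.2    # likelihood P(B|A)
p_B_given_Ac = 0.6   # likelihood P(B|A^c), where A^c is the complement of A

# Law of total probability: P(B) = P(B|A) P(A) + P(B|A^c) P(A^c)
p_Ac = 1 - p_A
p_B = p_B_given_A * p_A + p_B_given_Ac * p_Ac

# Posterior via Bayes' rule
p_A_given_B = p_B_given_A * p_A / p_B
```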

Posterior distribution

A related concept is that of a posterior probability distribution, or posterior distribution for short.

In Bayesian statistics, we assume that some observed data x have been drawn from a distribution that depends on a parameter $\theta$.

In formal terms, we write this assumption as a likelihood $$f(x|\theta)$$ where $f$ denotes a probability density function if $x$ is continuous or a probability mass function if $x$ is discrete.

We assign a probability distribution $$f(\theta)$$ to the parameter, called a prior distribution.

The prior distribution reflects our subjective beliefs or information acquired previously.

The posterior distribution is $$f(\theta|x)=\frac{f(x|\theta)f(\theta)}{f(x)}$$

The posterior distribution tells us how our prior has changed in light of the information provided by the data x.

Computation of the posterior

Thanks to its conceptual simplicity, the Bayesian approach is extremely powerful and versatile.

All we need to do is to specify a prior and a likelihood, and we face virtually no constraints in doing so.

The marginal distribution $f(x)$ is derived from the prior and the likelihood.

We first derive the joint distribution $$f(x,\theta)=f(x|\theta)f(\theta)$$ and then we marginalize it (by integrating or summing out $\theta$) to obtain the marginal $f(x)$.

In the continuous case, the marginal is computed by integration: $$f(x)=\int f(x|\theta)f(\theta)\,d\theta$$

In the discrete case, it is derived by calculating a sum: $$f(x)=\sum_{\theta}f(x|\theta)f(\theta)$$

Both the integral and the sum are over the whole support of $\theta$.
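When $\theta$ takes finitely many values, the sum above is directly computable. The sketch below (with an assumed binomial likelihood and a uniform prior on a grid, not taken from the lecture) computes the marginal and the posterior by summation:

```python
import math

# Discretize theta on a grid and put a uniform prior on it (assumed setup).
thetas = [i / 100 for i in range(1, 100)]   # support of theta
prior = [1 / len(thetas)] * len(thetas)     # uniform prior f(theta)

k, n = 7, 10                                # assumed data: 7 successes in 10 trials

def likelihood(theta):
    # Binomial likelihood f(x|theta)
    return math.comb(n, k) * theta**k * (1 - theta)**(n - k)

# Joint f(x, theta) = f(x|theta) f(theta), then marginal f(x) by summation
joint = [likelihood(t) * p for t, p in zip(thetas, prior)]
marginal = sum(joint)

# Posterior f(theta|x) = f(x|theta) f(theta) / f(x)
posterior = [j / marginal for j in joint]
```

The posterior sums to one by construction, since dividing by the marginal is exactly the normalization step.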

Closed-form posterior

There are important cases in which we are able to derive the marginal $f(x)$ in closed form.

In those cases, the posterior $f(\theta|x)$ is known analytically.

If we are lucky, $f(\theta|x)$ is also a distribution whose properties (e.g., the mean and the variance) are well known.
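A classic fortunate case is the Beta-binomial model: with a Beta prior on $\theta$ and a binomial likelihood, the posterior is again a Beta distribution, so its mean and variance are available in closed form. The hyperparameters and data below are assumed for illustration:

```python
# Beta(a, b) prior on theta; k successes observed in n Bernoulli trials.
a, b = 2.0, 2.0        # prior hyperparameters (assumed)
k, n = 7, 10           # observed data (assumed)

# Standard conjugate update: the posterior is Beta(a + k, b + n - k)
a_post, b_post = a + k, b + n - k

# Well-known properties of the Beta(a_post, b_post) posterior:
posterior_mean = a_post / (a_post + b_post)
posterior_var = (a_post * b_post) / ((a_post + b_post) ** 2 * (a_post + b_post + 1))
```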

Some examples of these fortunate cases can be found in the lectures on:

Computational challenges

In many other cases, however, we are not able to marginalize the joint distribution because the integral (or the sum) above is intractable.

In those cases, there are numerical methods that allow us to draw Monte Carlo samples from the posterior distribution.

Such methods are discussed in the lecture on Markov Chain Monte Carlo methods.
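As a minimal illustration (not the lecture's implementation), a random-walk Metropolis sampler can draw from a posterior known only up to the normalizing constant $f(x)$. Here the unnormalized target is $\theta^{7}(1-\theta)^{3}$, i.e., a Beta(8,4)-shaped posterior arising from an assumed uniform prior and 7 successes in 10 Bernoulli trials:

```python
import random

random.seed(0)

def unnorm_posterior(theta):
    # f(x|theta) f(theta) up to a constant; zero outside the support
    if not 0.0 < theta < 1.0:
        return 0.0
    return theta**7 * (1.0 - theta)**3

samples, theta = [], 0.5
for _ in range(20000):
    proposal = theta + random.gauss(0, 0.1)          # random-walk proposal
    # Accept with probability min(1, target(proposal) / target(current));
    # the unknown normalizing constant f(x) cancels in this ratio.
    if random.random() < unnorm_posterior(proposal) / unnorm_posterior(theta):
        theta = proposal
    samples.append(theta)

burned = samples[5000:]                              # discard burn-in
mcmc_mean = sum(burned) / len(burned)                # close to the Beta(8,4) mean 2/3
```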

There are also popular methods that allow us to approximate the posterior distribution with relatively simple distributions, such as mixtures of normals. These methods are called variational inference methods.

Maximum a posteriori estimation

Moreover, we can derive interesting information about the posterior even if we do not know $f(x)$.

For example, we can find the Maximum A Posteriori (MAP) estimator of $\theta$.

The MAP estimator, denoted by $\hat{\theta}_{MAP}$, solves the optimization problem $$\hat{\theta}_{MAP}=\arg\max_{\theta}f(\theta|x)$$ which is equivalent to the problem $$\hat{\theta}_{MAP}=\arg\max_{\theta}f(x|\theta)f(\theta)$$

We can drop the unknown denominator $f(x)$ from the objective function because it does not depend on $\theta$.

The MAP estimator is the mode of the posterior distribution, that is, the value of the parameter that is most likely according to the posterior distribution.
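A sketch of this point, with assumed Beta prior hyperparameters and binomial data: maximizing $f(x|\theta)f(\theta)$ on a grid recovers the posterior mode without ever computing $f(x)$.

```python
a, b = 2.0, 2.0        # Beta prior hyperparameters (assumed)
k, n = 7, 10           # observed data (assumed)

def unnorm_posterior(theta):
    # f(x|theta) f(theta) up to constants: a Beta(a + k, b + n - k) kernel
    return theta ** (a + k - 1) * (1 - theta) ** (b + n - k - 1)

# MAP by numerical argmax over a grid; f(x) never appears
grid = [i / 10000 for i in range(1, 10000)]
theta_map = max(grid, key=unnorm_posterior)

# Closed-form mode of Beta(a + k, b + n - k), for comparison
mode = (a + k - 1) / (a + b + n - 2)
```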

How to interpret the posterior

The posterior distribution is interpreted as a summary of two sources of information: the prior distribution, which embodies our beliefs before seeing the data, and the likelihood, which conveys the information provided by the observed data x.

Being able to summarize these two sources of information in a single object (the posterior) is one of the main strengths of the Bayesian approach.

How to use the posterior

What do we do after computing the posterior?

There are many things we can do, such as computing point estimates (e.g., the posterior mean or mode), deriving credible intervals, and making predictions about new data.

More details

More details about the posterior probability and posterior distributions can be found in the lectures on:


How to cite

Please cite as:

Taboga, Marco (2021). "Posterior probability", Lectures on probability theory and mathematical statistics. Kindle Direct Publishing. Online appendix.
