Hierarchical Bayesian models

A hierarchical Bayesian model is a model in which the prior distribution of some of the model parameters depends on other parameters, which are also assigned a prior.

Table of contents

Definition
Examples

Example 1 - Random means
Example 2 - Normal mean and Gamma precision

Computations
More than two levels

Definition

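Given the observed data $x$, in a hierarchical Bayesian model, the likelihood $p(x \mid \theta, \phi)$ depends on two parameter vectors $\theta$ and $\phi$, and the prior $p(\theta, \phi) = p(\theta \mid \phi)\, p(\phi)$ is specified by separately specifying the conditional distribution $p(\theta \mid \phi)$ and the distribution $p(\phi)$.

In the literature it is often required that the likelihood does not depend on $\phi$, that is,

$$p(x \mid \theta, \phi) = p(x \mid \theta) \qquad (1)$$

In this special case, the parameter $\phi$ is called hyper-parameter and the prior $p(\phi)$ is called hyper-prior.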

We use a broader definition of hierarchical model, one that does not necessarily include assumption (1), because it allows a unified treatment of several interesting models.

Examples

The following examples illustrate two popular models that fall within our definition.

Example 1 - Random means

Suppose the sample $x = (x_1, \ldots, x_n)$ is a vector of draws from normal distributions having different unknown means $\mu_i$ and a known common variance $\sigma^2$:

$$x_i \mid \mu_i \sim N(\mu_i, \sigma^2)$$

Denote by $\mu = (\mu_1, \ldots, \mu_n)$ the vector of means.

Conditional on $\mu$, the observations $x_i$ are assumed to be independent. As a consequence, the likelihood of the whole sample, conditional on $\mu$, can be written as

$$p(x \mid \mu) = \prod_{i=1}^{n} p(x_i \mid \mu_i)$$

Now, assume the means $\mu_i$ are a sample of IID draws from a normal distribution with unknown mean $m$ and known variance $\tau^2$, so that

$$p(\mu \mid m) = \prod_{i=1}^{n} p(\mu_i \mid m), \qquad \mu_i \mid m \sim N(m, \tau^2)$$

Finally, we assign a normal prior (with known mean $m_0$ and variance $\nu^2$) to the hyper-parameter $m$:

$$m \sim N(m_0, \nu^2)$$

The model just described is a hierarchical model. With the notation used in the definition, we have $\theta = \mu$, $\phi = m$, and the added assumption that $p(x \mid \mu, m) = p(x \mid \mu)$.
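The three levels of the random-means model can be read as a recipe for generating data from the top down: draw the hyper-parameter $m$, then the means $\mu_i$ given $m$, then the observations $x_i$ given $\mu_i$. The following is a minimal sketch of that recipe, assuming NumPy; the function name `simulate_random_means` and the numerical values are hypothetical and chosen only for illustration.

```python
import numpy as np

def simulate_random_means(n, m0, nu2, tau2, sigma2, rng):
    """Draw (m, mu, x) from the random-means hierarchy (illustrative helper)."""
    # Top level: hyper-parameter m ~ N(m0, nu2)
    m = rng.normal(m0, np.sqrt(nu2))
    # Middle level: means mu_i | m ~ N(m, tau2), IID given m
    mu = rng.normal(m, np.sqrt(tau2), size=n)
    # Bottom level: observations x_i | mu_i ~ N(mu_i, sigma2), independent given mu
    x = rng.normal(mu, np.sqrt(sigma2))
    return m, mu, x

# Hypothetical values for the known quantities m0, nu2, tau2, sigma2
rng = np.random.default_rng(0)
m, mu, x = simulate_random_means(n=5, m0=0.0, nu2=4.0, tau2=1.0, sigma2=0.25, rng=rng)
```

Note that `x` is generated from `mu` alone: once the means are known, `m` plays no further role, which is the added assumption stated above.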

Example 2 - Normal mean and Gamma precision

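Suppose that the sample $x = (x_1, \ldots, x_n)$ is a vector of IID draws from a normal distribution having unknown mean $\mu$ and unknown variance $\sigma^2$.

The likelihood of the whole sample, conditional on $\mu$ and $\sigma^2$, is

$$p(x \mid \mu, \sigma^2) = (2\pi\sigma^2)^{-n/2} \exp\left( -\frac{1}{2\sigma^2} \sum_{i=1}^{n} (x_i - \mu)^2 \right)$$

Now, assume that the mean $\mu$ is itself normal with known mean $\mu_0$ and variance $\sigma^2/\nu$, where $\nu$ is a known parameter:

$$\mu \mid \sigma^2 \sim N(\mu_0, \sigma^2/\nu)$$

Finally, we assign an inverse-Gamma prior to the parameter $\sigma^2$ (i.e., a Gamma distribution to the precision $1/\sigma^2$):

$$1/\sigma^2 \sim \mathrm{Gamma}(\alpha, \beta)$$

where $\alpha$ and $\beta$ are the two parameters of the Gamma distribution.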

This is a very popular model, known as the normal-inverse-Gamma model.

It fits the above definition of a hierarchical model with $\theta = \mu$, $\phi = \sigma^2$.
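In this model the variance appears both in the prior of the mean and in the likelihood, so the likelihood depends on both parameters. A short simulation makes this visible. The sketch below assumes NumPy and, as a labeled assumption not taken from the text, a shape/rate parameterization of the Gamma distribution for the precision; the function name and the numerical values are hypothetical.

```python
import numpy as np

def simulate_normal_inverse_gamma(n, mu0, nu, alpha, beta, rng):
    """Draw (mu, sigma2, x) from the normal-inverse-Gamma hierarchy (illustrative helper)."""
    # Precision 1/sigma2 ~ Gamma; ASSUMPTION: alpha is a shape and beta a rate,
    # so NumPy's scale argument is 1/beta.
    precision = rng.gamma(alpha, 1.0 / beta)
    sigma2 = 1.0 / precision
    # Mean mu | sigma2 ~ N(mu0, sigma2 / nu)
    mu = rng.normal(mu0, np.sqrt(sigma2 / nu))
    # Observations x_i | mu, sigma2 ~ N(mu, sigma2), IID; sigma2 is used again here
    x = rng.normal(mu, np.sqrt(sigma2), size=n)
    return mu, sigma2, x

# Hypothetical values for the known quantities mu0, nu, alpha, beta
rng = np.random.default_rng(0)
mu, sigma2, x = simulate_normal_inverse_gamma(n=10, mu0=1.0, nu=2.0, alpha=3.0, beta=2.0, rng=rng)
```

Because `sigma2` enters the last sampling step directly, the restriction that the likelihood be free of the second parameter does not hold here, which is why the broader definition is needed to cover this model.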

Computations

The computation of the posterior distribution is usually performed in steps: first $\phi$ is taken as given, and a conditional distribution for $\theta$ is derived; then a posterior for $\phi$ is computed.

The steps are as follows.

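1. Conditional on $\phi$ (i.e., by keeping it fixed), compute:
   1. the prior predictive distribution of $x$: $p(x \mid \phi) = \int p(x \mid \theta, \phi)\, p(\theta \mid \phi)\, d\theta$
   2. the posterior distribution of $\theta$: $p(\theta \mid x, \phi) = \dfrac{p(x \mid \theta, \phi)\, p(\theta \mid \phi)}{p(x \mid \phi)}$
2. By using $p(x \mid \phi)$ from step 1, compute:
   1. the prior predictive distribution of $x$: $p(x) = \int p(x \mid \phi)\, p(\phi)\, d\phi$
   2. the posterior marginal distribution of $\phi$: $p(\phi \mid x) = \dfrac{p(x \mid \phi)\, p(\phi)}{p(x)}$
3. Compute the posterior joint distribution of $\theta$ and $\phi$: $p(\theta, \phi \mid x) = p(\theta \mid x, \phi)\, p(\phi \mid x)$
4. Compute the posterior marginal distribution of $\theta$: $p(\theta \mid x) = \int p(\theta, \phi \mid x)\, d\phi$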
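For the random-means model of Example 1, every integral in these steps has a closed form, because all the distributions involved are normal. The sketch below walks through the steps for that model, assuming NumPy. The closed-form expressions are standard normal-normal conjugacy results rather than formulas quoted from the text, and the function name `random_means_posterior` is hypothetical.

```python
import numpy as np

def random_means_posterior(x, m0, nu2, tau2, sigma2):
    """Posterior moments for Example 1, following the stepwise procedure."""
    x = np.asarray(x, dtype=float)
    n = x.size

    # Step 1.1: integrate out mu_i given m, so that x_i | m ~ N(m, sigma2 + tau2)
    s2 = sigma2 + tau2

    # Step 1.2: mu_i | x, m ~ N(w * x_i + (1 - w) * m, cond_var)
    w = tau2 / s2
    cond_var = sigma2 * tau2 / s2

    # Step 2.2: treat p(x | m) as a likelihood and update the prior m ~ N(m0, nu2)
    m_var = 1.0 / (1.0 / nu2 + n / s2)
    m_mean = m_var * (m0 / nu2 + x.sum() / s2)

    # Steps 3 and 4: combine p(mu_i | x, m) with p(m | x) and integrate out m;
    # the result is normal, with mean and variance given by the laws of
    # total expectation and total variance
    mu_mean = w * x + (1.0 - w) * m_mean
    mu_var = cond_var + (1.0 - w) ** 2 * m_var
    return m_mean, m_var, mu_mean, mu_var

# Hypothetical data and known quantities
x_obs = np.array([0.3, -1.2, 2.5, 0.9])
m_mean, m_var, mu_mean, mu_var = random_means_posterior(
    x_obs, m0=0.5, nu2=4.0, tau2=1.0, sigma2=0.25
)
```

Each posterior mean in `mu_mean` is a weighted average of the corresponding observation and the posterior mean of $m$, so the estimates of the individual means are pulled toward a common value.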

When we are not able to carry out the integrations required to derive the predictive distributions, or when we cannot compute posteriors with Bayes' rule, then we can use other computational methods (e.g., the factorization method illustrated in the lecture on Bayesian inference). In these cases, the steps of the above procedure remain valid: we first derive posterior and predictive distributions given $\phi$, by using whatever method is available to us; then, we use the conditional distributions thus derived to compute the posterior of $\phi$.

More than two levels

In the definition above, there were only two levels: a parameter $\theta$ and a hyper-parameter $\phi$.

The definition can be generalized to more than two levels. For example, we could have a third parameter $\psi$, the likelihood $p(x \mid \theta, \phi, \psi)$ and the prior $p(\theta, \phi, \psi) = p(\theta \mid \phi, \psi)\, p(\phi \mid \psi)\, p(\psi)$, which is specified by separately specifying the conditional distributions $p(\theta \mid \phi, \psi)$, $p(\phi \mid \psi)$ and the distribution $p(\psi)$.

With more than two levels, the computation strategy is similar to that illustrated in the previous section. First, we take all parameters but one as given, and we derive the prior predictive distribution of $x$, conditional on the parameters that have been kept fixed. Then, we use the predictive distribution thus obtained as likelihood, and we use it to obtain another prior predictive distribution for $x$, conditional on a smaller number of parameters than in the previous step. And so on.
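The repeated "integrate out one parameter, reuse the result as a likelihood" pattern is easiest to see in a chain of normal distributions. As a hypothetical illustration, not taken from the text, suppose $x \mid \theta \sim N(\theta, a)$, $\theta \mid \phi \sim N(\phi, b)$, $\phi \mid \psi \sim N(\psi, c)$ and $\psi \sim N(d_0, d)$. Each integration keeps the predictive distribution of $x$ normal, re-centers it on the next parameter up, and adds that level's variance. The sketch below, assuming NumPy, records the predictive variance after each step; the function name is hypothetical.

```python
import numpy as np

def predictive_variances(level_variances):
    """Variance of x after integrating out 0, 1, 2, ... levels of a normal chain.

    level_variances[0] is Var(x | theta); the last entry is the variance of the
    top-level prior. Entry k of the result is the variance of the predictive
    distribution of x conditional on the parameter k levels above the data.
    """
    # Integrating out one normal level adds its variance to the predictive variance
    return np.cumsum(np.asarray(level_variances, dtype=float))

# Hypothetical variances a, b, c, d for the four levels
variances = predictive_variances([0.5, 1.0, 2.0, 4.0])
# variances[0]: x | theta,  variances[1]: x | phi,
# variances[2]: x | psi,    variances[3]: marginal of x
```

In non-normal models the integrals generally do not reduce to adding variances, but the order of operations is the same.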

How to cite

Please cite as:

Taboga, Marco (2021). "Hierarchical Bayesian models", Lectures on probability theory and mathematical statistics. Kindle Direct Publishing. Online appendix. https://www.statlect.com/fundamentals-of-statistics/Hierarchical-Bayesian-models.