
Statistical model

by , PhD

A statistical model is a set of assumptions about the probability distribution that generated some observed data. In mathematical terms, the assumptions are formulated as restrictions on the set of probability distributions that could have generated the data.

We provide here some examples of statistical models.

Example Suppose that we randomly draw $n$ individuals from a certain population and measure their height. The measurements can be regarded as realizations of $n$ random variables $X_{1},\ldots ,X_{n}$. In principle, these random variables could have any probability distribution. If we assume that they have a normal distribution, as is often done for height measurements, then we are formulating a statistical model: we are placing a restriction on the set of probability distributions that could have generated the data.

Example In the previous example, the random variables $X_{1},\ldots ,X_{n}$ could in principle have some form of dependence. If we assume that they are statistically independent, then we are placing a further restriction on their joint distribution, that is, we are adding an assumption to our statistical model.
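The model built up in the first two examples (normality plus independence) can be illustrated with a minimal simulation sketch. The function name `simulate_heights`, the mean of 170 and standard deviation of 10 (in centimetres), and the seed are hypothetical choices made only for illustration; they do not come from the examples above.

```python
import random


def simulate_heights(n, mu, sigma, seed=None):
    """Draw n independent values from a normal distribution N(mu, sigma^2)."""
    rng = random.Random(seed)
    # Successive draws do not depend on each other, which mirrors the
    # independence assumption added in the second example.
    return [rng.gauss(mu, sigma) for _ in range(n)]


# Hypothetical parameter values, chosen only for illustration.
heights = simulate_heights(n=5, mu=170.0, sigma=10.0, seed=0)
print(heights)
```

Any sample produced this way is, by construction, generated by one of the distributions that the model allows. Real data come with no such guarantee, which is why the model is an assumption.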

Example Suppose that for the same $n$ individuals we also collect weight measurements $Y_{1},\ldots ,Y_{n}$, and we assume that there is a linear relation between weight and height, described by a regression equation $Y_{i}=\alpha +\beta X_{i}+\varepsilon _{i}$, where $X_{i}$ is the height of the $i$-th individual, $\alpha $ and $\beta $ are regression coefficients and $\varepsilon _{i}$ is an error term. This is a statistical model because we have placed a restriction on the set of joint distributions that could have generated the couples $(X_{i},Y_{i})$: we have ruled out all the joint distributions in which the two variables have a non-linear relation (e.g., quadratic).

Example If we assume that all the errors $\varepsilon _{i}$ in the previous regression equation have the same variance (i.e., the errors are not heteroskedastic), then we are placing a further restriction on the set of data-generating distributions. Thus, we have yet another statistical model.
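As a rough sketch of the regression model with equal-variance errors, the snippet below generates weights from given heights. The name `simulate_weights` and all numerical values are hypothetical. Drawing the errors from a normal distribution is an extra assumption made only so that the sketch can produce numbers; the examples above require only a linear relation and a common error variance.

```python
import random


def simulate_weights(heights, alpha, beta, sigma, seed=None):
    """Generate Y_i = alpha + beta * X_i + eps_i with eps_i drawn from N(0, sigma^2)."""
    rng = random.Random(seed)
    weights = []
    for x in heights:
        # The same sigma is used for every observation, so all errors
        # share one variance (the restriction added in the last example).
        eps = rng.gauss(0.0, sigma)
        weights.append(alpha + beta * x + eps)
    return weights


# Hypothetical heights (cm) and coefficients, chosen only for illustration.
heights = [150.0, 160.0, 170.0, 180.0, 190.0]
weights = simulate_weights(heights, alpha=-100.0, beta=1.0, sigma=5.0, seed=0)
print(list(zip(heights, weights)))
```

Letting `sigma` vary with `x` would produce heteroskedastic errors, that is, data generated by a distribution that the restricted model rules out.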

Parametric models

The previous examples have illustrated that a model is just a set of probability distributions that might have generated the observed data. Denote such a set by $\Phi $.

When the set $\Phi $ is put into correspondence with a set $\Theta \subseteq \mathbb{R}^{p}$ of real vectors, then we have a parametric model.

The set $\Theta $ is called the parameter space and any one of its members $\theta \in \Theta $ is called a parameter.

Example If we assume, as we did in the first example above, that the height measurements $X_{1},\ldots ,X_{n}$ come from a normal distribution, then the set $\Phi $ is the set of all normal distributions. But a normal distribution is completely characterized by its mean $\mu $ and its variance $\sigma ^{2}$. As a consequence, each member of $\Phi $ is put in correspondence with a vector of parameters $\theta =(\mu ,\sigma ^{2})$. The mean $\mu $ can take any real value and the variance $\sigma ^{2}$ needs to be positive. Therefore, the parameter space is $\Theta =\mathbb{R}\times (0,\infty )$.
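The correspondence between parameters and distributions in the normal example can be sketched in code. The helper names `in_parameter_space` and `distribution` are hypothetical; `NormalDist` is the normal distribution class from Python's standard `statistics` module.

```python
import math
from statistics import NormalDist


def in_parameter_space(theta):
    """Check that theta = (mu, sigma2) has a real mean and a positive variance."""
    mu, sigma2 = theta
    return math.isfinite(mu) and math.isfinite(sigma2) and sigma2 > 0


def distribution(theta):
    """Map a parameter theta = (mu, sigma2) to the normal distribution it indexes."""
    if not in_parameter_space(theta):
        raise ValueError("theta is outside the parameter space")
    mu, sigma2 = theta
    # NormalDist is parameterized by the standard deviation, not the variance.
    return NormalDist(mu=mu, sigma=math.sqrt(sigma2))


# Hypothetical parameter value, chosen only for illustration.
dist = distribution((170.0, 100.0))
print(dist.mean, dist.variance)
```

Each admissible pair of mean and variance picks out exactly one normal distribution, and a pair with a non-positive variance is rejected because it lies outside the parameter space.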

How is a statistical model used?

What do we do after selecting a statistical model, that is, after restricting our attention to a set $\Phi $ of probability distributions that could have generated the data (and to a parameter space $\Theta $ put into correspondence with $\Phi $)?

The typical things we do are:

Popular models

There are countless statistical models as their number is limited only by statisticians' imagination. However, you might want to familiarize yourself with two of the most popular models:

More details

More details about statistical modelling can be found in the lecture on statistical inference.

