
Multinomial distribution

by Marco Taboga, PhD

The multinomial distribution is a multivariate discrete distribution that generalizes the binomial distribution.

Table of Contents

How the distribution is used
Prerequisite
Definition
Representation as a sum of Multinoulli random vectors
Expected value
Covariance matrix
Joint moment generating function
Joint characteristic function
Solved exercises
How to cite

How the distribution is used

If you perform a probabilistic experiment n times and the experiment can have only two outcomes, then the number of times you obtain one of the two outcomes is a binomial random variable.

If, instead, you perform an experiment n times and each trial can have K outcomes (K can be any natural number), and you denote by $X_{i}$ the number of times that you obtain the i-th outcome, then the random vector X defined as
$$X=\begin{bmatrix}X_{1}\\ X_{2}\\ \vdots \\ X_{K}\end{bmatrix}$$
is a multinomial random vector.
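As an informal illustration (not part of the original lecture), the following NumPy sketch simulates n trials with K possible outcomes and tallies the counts; the values of n, K and p are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)

n, K = 10, 3                      # number of trials and number of possible outcomes (illustrative)
p = np.array([0.5, 0.25, 0.25])   # probabilities of the K outcomes

# Perform the experiment n times; each trial yields one of the K outcomes.
trials = rng.choice(K, size=n, p=p)

# X[i] counts how many times the i-th outcome occurred: a realization of a multinomial vector.
X = np.bincount(trials, minlength=K)
print(X, X.sum())                 # the entries always sum to n
```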

Prerequisite

A multinomial vector can be seen as a sum of mutually independent Multinoulli random vectors.

This connection between the multinomial and Multinoulli distributions will be illustrated in detail in the rest of this lecture and will be used to demonstrate several properties of the multinomial distribution.

For this reason, we highly recommend studying the Multinoulli distribution before reading the following sections.

Definition

Multinomial random vectors are characterized as follows.

Definition Let X be a $K\times 1$ discrete random vector. Let $n\in \mathbb{N}$. Let the support of X be the set of $K\times 1$ vectors having non-negative integer entries summing up to n:
$$R_{X}=\left\{ x\in \mathbb{Z}_{+}^{K}:\sum_{i=1}^{K}x_{i}=n\right\}$$
Let $p_{1}$, ..., $p_{K}$ be K strictly positive numbers such that
$$\sum_{i=1}^{K}p_{i}=1$$
We say that X has a multinomial distribution with probabilities $p_{1}$, ..., $p_{K}$ and number of trials n, if its joint probability mass function is
$$p_{X}(x_{1},\ldots ,x_{K})=\begin{cases}\dfrac{n!}{x_{1}!\cdots x_{K}!}\,p_{1}^{x_{1}}\cdots p_{K}^{x_{K}} & \text{if }(x_{1},\ldots ,x_{K})\in R_{X}\\ 0 & \text{otherwise}\end{cases}$$
where $\dfrac{n!}{x_{1}!\cdots x_{K}!}$ is the multinomial coefficient.
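For concreteness, here is a minimal Python sketch of the joint probability mass function above; the helper multinomial_pmf and the example numbers are only illustrative, and the optional SciPy cross-check assumes SciPy is installed.

```python
import math
import numpy as np

def multinomial_pmf(x, p):
    """Joint pmf of a multinomial vector: (n! / (x_1! ... x_K!)) * p_1^x_1 * ... * p_K^x_K."""
    x = np.asarray(x)
    p = np.asarray(p, dtype=float)
    n = int(x.sum())
    coeff = math.factorial(n)
    for k in x:                     # divide by x_1!, ..., x_K! (each division is exact)
        coeff //= math.factorial(int(k))
    return coeff * np.prod(p ** x)

print(multinomial_pmf([5, 3, 2], [0.5, 0.25, 0.25]))

# Optional cross-check (if SciPy is available):
# from scipy.stats import multinomial
# print(multinomial.pmf([5, 3, 2], n=10, p=[0.5, 0.25, 0.25]))
```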

Representation as a sum of Multinoulli random vectors

The connection between the multinomial and the Multinoulli distribution is illustrated by the following propositions.

Proposition If a random vector X has a multinomial distribution with probabilities $p_{1}$, ..., $p_{K}$ and number of trials $n=1$, then it has a Multinoulli distribution with probabilities $p_{1}$, ..., $p_{K}$.

Proof

The support of X is
$$R_{X}=\left\{ x\in \{0,1\}^{K}:\sum_{i=1}^{K}x_{i}=1\right\}$$
and its joint probability mass function is
$$p_{X}(x_{1},\ldots ,x_{K})=\frac{1!}{x_{1}!\cdots x_{K}!}\,p_{1}^{x_{1}}\cdots p_{K}^{x_{K}}$$
But
$$\frac{1!}{x_{1}!\cdots x_{K}!}=1$$
because, for each $i=1,\ldots ,K$, either $x_{i}=0$ or $x_{i}=1$, and $0!=1!=1$. As a consequence,
$$p_{X}(x_{1},\ldots ,x_{K})=p_{1}^{x_{1}}\cdots p_{K}^{x_{K}}$$
which is the joint probability mass function of a Multinoulli distribution.

Proposition A random vector X having a multinomial distribution with parameters $p_{1}$, ..., $p_{K}$ and n can be written as
$$X=\sum_{i=1}^{n}Y_{i}$$
where $Y_{1}$, ..., $Y_{n}$ are n independent random vectors all having a Multinoulli distribution with parameters $p_{1}$, ..., $p_{K}$.

Proof

The sum $Y_{1}+\ldots +Y_{n}$ is equal to the vector $x=(x_{1},\ldots ,x_{K})$ when, for each $i=1,\ldots ,K$, exactly $x_{i}$ of the vectors $Y_{1}$, ..., $Y_{n}$ are equal to the i-th vector of the canonical basis of $\mathbb{R}^{K}$ (that is, the i-th outcome is obtained $x_{i}$ times). Provided $x_{i}\geq 0$ for each i and $\sum_{i=1}^{K}x_{i}=n$, there are several different realizations of the vector $(Y_{1},\ldots ,Y_{n})$ satisfying these conditions. Since $Y_{1}$, ..., $Y_{n}$ are independent Multinoulli vectors, each of these realizations has probability
$$p_{1}^{x_{1}}\cdots p_{K}^{x_{K}}$$
(see also the proof of the previous proposition). Furthermore, the number of the realizations satisfying the above conditions is equal to the number of partitions of n objects into K groups having numerosities $x_{1}$, ..., $x_{K}$ (see the lecture entitled Partitions), which in turn is equal to the multinomial coefficient
$$\frac{n!}{x_{1}!\cdots x_{K}!}$$
Therefore,
$$P\left( \sum_{i=1}^{n}Y_{i}=x\right) =\frac{n!}{x_{1}!\cdots x_{K}!}\,p_{1}^{x_{1}}\cdots p_{K}^{x_{K}}$$
which proves that X and $\sum_{i=1}^{n}Y_{i}$ have the same distribution.
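The representation can also be illustrated numerically: in the sketch below (NumPy, illustrative parameters, not part of the original proof), n one-hot Multinoulli draws are summed, and the result has the same distribution as a single multinomial draw.

```python
import numpy as np

rng = np.random.default_rng(0)

n, K = 10, 3
p = np.array([0.5, 0.25, 0.25])

# Draw n independent Multinoulli (one-hot) vectors Y_1, ..., Y_n.
outcomes = rng.choice(K, size=n, p=p)
Y = np.eye(K, dtype=int)[outcomes]   # each row is a one-hot K-dimensional vector

# Their sum is a single draw from the multinomial distribution with parameters p and n.
X = Y.sum(axis=0)
print(X)

# NumPy can also sample the multinomial directly, in a single call:
print(rng.multinomial(n, p))
```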

Expected value

The expected value of a multinomial random vector X is
$$\mathrm{E}[X]=np$$
where the $K\times 1$ vector p is defined as follows:
$$p=\begin{bmatrix}p_{1}\\ \vdots \\ p_{K}\end{bmatrix}$$

Proof

Using the fact that X can be written as a sum of n Multinoulli random vectors with parameters $p_{1}$, ..., $p_{K}$, we obtain
$$\mathrm{E}[X]=\mathrm{E}\left[ \sum_{i=1}^{n}Y_{i}\right] =\sum_{i=1}^{n}\mathrm{E}[Y_{i}]=np$$
where $\mathrm{E}[Y_{i}]=p$ is the expected value of a Multinoulli random vector.
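A quick Monte Carlo sanity check of $\mathrm{E}[X]=np$, using NumPy with illustrative parameters (not part of the original proof):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10
p = np.array([0.5, 0.25, 0.25])

# Sample many multinomial vectors and compare the sample mean with the exact value n*p.
samples = rng.multinomial(n, p, size=100_000)
print(samples.mean(axis=0))   # Monte Carlo estimate of E[X]
print(n * p)                  # exact expected value
```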

Covariance matrix

The covariance matrix of a multinomial random vector X is
$$\mathrm{Var}[X]=n\Sigma$$
where $\Sigma$ is a $K\times K$ matrix whose generic entry is
$$\Sigma _{ij}=\begin{cases}p_{i}(1-p_{i}) & \text{if }i=j\\ -p_{i}p_{j} & \text{if }i\neq j\end{cases}$$

Proof

Since X can be represented as a sum of n independent Multinoulli random vectors with parameters $p_{1}$, ..., $p_{K}$, we obtain
$$\mathrm{Var}[X]=\mathrm{Var}\left[ \sum_{i=1}^{n}Y_{i}\right] =\sum_{i=1}^{n}\mathrm{Var}[Y_{i}]=n\Sigma$$
where the second equality follows from the mutual independence of $Y_{1}$, ..., $Y_{n}$, and $\Sigma =\mathrm{Var}[Y_{i}]$ is the covariance matrix of a Multinoulli random vector.
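Similarly, the formula $\mathrm{Var}[X]=n\Sigma$ can be checked numerically; the sketch below builds $\Sigma$ as $\mathrm{diag}(p)-pp^{\intercal}$ and compares $n\Sigma$ with the sample covariance of simulated draws (illustrative parameters, not part of the original proof).

```python
import numpy as np

n = 10
p = np.array([0.5, 0.25, 0.25])

# Exact covariance matrix n*Sigma, where Sigma has p_i(1-p_i) on the diagonal
# and -p_i*p_j off the diagonal; this is exactly diag(p) - p p'.
Sigma = np.diag(p) - np.outer(p, p)
print(n * Sigma)

# Monte Carlo check: the sample covariance of simulated draws should be close.
rng = np.random.default_rng(0)
samples = rng.multinomial(n, p, size=200_000)
print(np.cov(samples, rowvar=False))
```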

Joint moment generating function

The joint moment generating function of a multinomial random vector X is defined for any $t\in \mathbb{R}^{K}$:
$$M_{X}(t)=\left( \sum_{j=1}^{K}p_{j}\exp (t_{j})\right) ^{n}$$

Proof

Since X can be written as a sum of n independent Multinoulli random vectors with parameters $p_{1}$, ..., $p_{K}$, the joint moment generating function of X is derived from that of the summands:
$$M_{X}(t)=\mathrm{E}\left[ \exp (t^{\intercal }X)\right] =\mathrm{E}\left[ \exp \left( t^{\intercal }\sum_{i=1}^{n}Y_{i}\right) \right] =\prod_{i=1}^{n}\mathrm{E}\left[ \exp (t^{\intercal }Y_{i})\right] =\left( \sum_{j=1}^{K}p_{j}\exp (t_{j})\right) ^{n}$$
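As an informal check of the closed-form expression, one can compare it with a Monte Carlo estimate of $\mathrm{E}\left[ \exp (t^{\intercal }X)\right]$ at an arbitrary point t (NumPy sketch with illustrative values, not part of the original derivation):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10
p = np.array([0.5, 0.25, 0.25])
t = np.array([0.1, -0.2, 0.05])   # an arbitrary point at which to evaluate the mgf

# Closed-form value: (sum_j p_j * exp(t_j))^n
print((p @ np.exp(t)) ** n)

# Monte Carlo estimate of E[exp(t'X)]
samples = rng.multinomial(n, p, size=200_000)
print(np.exp(samples @ t).mean())
```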

Joint characteristic function

The joint characteristic function of X is
$$\varphi _{X}(t)=\left( \sum_{j=1}^{K}p_{j}\exp (\mathrm{i}\,t_{j})\right) ^{n}$$

Proof

The derivation is similar to the derivation of the joint moment generating function (see above):
$$\varphi _{X}(t)=\mathrm{E}\left[ \exp (\mathrm{i}\,t^{\intercal }X)\right] =\prod_{r=1}^{n}\mathrm{E}\left[ \exp (\mathrm{i}\,t^{\intercal }Y_{r})\right] =\left( \sum_{j=1}^{K}p_{j}\exp (\mathrm{i}\,t_{j})\right) ^{n}$$

Solved exercises

Below you can find some exercises with explained solutions.

Exercise 1

A shop selling two items, labeled A and B, needs to construct a probabilistic model of the sales that will be generated by its next 10 customers. Each time a customer arrives, only three outcomes are possible: 1) nothing is sold; 2) one unit of item A is sold; 3) one unit of item B is sold. It has been estimated that the probabilities of these three outcomes are 0.50, 0.25 and 0.25 respectively. Furthermore, the shopping behavior of a customer is independent of the shopping behavior of all other customers. Denote by X a $3\times 1$ vector whose entries $X_{1}$, $X_{2}$ and $X_{3}$ are equal to the number of times each of the three outcomes occurs. Derive the expected value and the covariance matrix of X.

Solution

The vector X has a multinomial distribution with parameters
$$p=\begin{bmatrix}0.50\\ 0.25\\ 0.25\end{bmatrix}$$
and $n=10$. Therefore, its expected value is
$$\mathrm{E}[X]=np=\begin{bmatrix}5\\ 2.5\\ 2.5\end{bmatrix}$$
and its covariance matrix is
$$\mathrm{Var}[X]=n\Sigma =10\begin{bmatrix}0.25 & -0.125 & -0.125\\ -0.125 & 0.1875 & -0.0625\\ -0.125 & -0.0625 & 0.1875\end{bmatrix}=\begin{bmatrix}2.5 & -1.25 & -1.25\\ -1.25 & 1.875 & -0.625\\ -1.25 & -0.625 & 1.875\end{bmatrix}$$
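The same numbers can be reproduced with a few lines of NumPy, as a sanity check of the formulas $\mathrm{E}[X]=np$ and $\mathrm{Var}[X]=n\Sigma$ (this check is not part of the original solution):

```python
import numpy as np

p = np.array([0.50, 0.25, 0.25])
n = 10

mean = n * p                               # E[X] = n p
cov = n * (np.diag(p) - np.outer(p, p))    # Var[X] = n Sigma

print(mean)   # [5.  2.5 2.5]
print(cov)
# [[ 2.5   -1.25  -1.25 ]
#  [-1.25   1.875 -0.625]
#  [-1.25  -0.625  1.875]]
```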

Exercise 2

Given the assumptions made in the previous exercise, suppose that item A costs $1,000 and item B costs $2,000. Derive the expected value and the variance of the total revenue generated by the 10 customers.

Solution

The total revenue Y can be written as a linear transformation of the vector X:
$$Y=aX$$
where
$$a=\begin{bmatrix}0 & 1000 & 2000\end{bmatrix}$$
By the linearity of the expected value operator, we obtain
$$\mathrm{E}[Y]=a\,\mathrm{E}[X]=0\times 5+1000\times 2.5+2000\times 2.5=7500$$
By using the formula for the covariance matrix of a linear transformation, we obtain
$$\mathrm{Var}[Y]=a\,\mathrm{Var}[X]\,a^{\intercal }=1000^{2}\times 1.875+2000^{2}\times 1.875+2\times 1000\times 2000\times (-0.625)=6{,}875{,}000$$
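Again as an informal check (not part of the original solution), the same figures follow from a direct computation in NumPy:

```python
import numpy as np

p = np.array([0.50, 0.25, 0.25])
n = 10
a = np.array([0.0, 1000.0, 2000.0])   # revenue per unit for the three outcomes

mean_X = n * p
cov_X = n * (np.diag(p) - np.outer(p, p))

expected_revenue = a @ mean_X         # E[Y] = a E[X]
revenue_variance = a @ cov_X @ a      # Var[Y] = a Var[X] a'

print(expected_revenue)               # 7500.0
print(revenue_variance)               # 6875000.0 (up to floating-point rounding)
```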

How to cite

Please cite as:

Taboga, Marco (2021). "Multinomial distribution", Lectures on probability theory and mathematical statistics. Kindle Direct Publishing. Online appendix. https://www.statlect.com/probability-distributions/multinomial-distribution.
