
Covariance

Covariance is a measure of association between two random variables. It is positive if the deviations of the two variables from their respective means tend to have the same sign and negative if the deviations tend to have opposite signs.

The covariance between two random variables X and Y, denoted by $\mathrm{Cov}[X,Y]$, is defined as follows:
$$\mathrm{Cov}[X,Y]=\mathrm{E}\left[(X-\mathrm{E}[X])(Y-\mathrm{E}[Y])\right]$$
provided the above expected values exist and are well-defined.

Interpretation

To understand the meaning of covariance, let us analyze how it is constructed. It is the expected value of the product $\overline{X}\,\overline{Y}$, where $\overline{X}$ and $\overline{Y}$ are defined as follows:
$$\overline{X}=X-\mathrm{E}[X],\qquad \overline{Y}=Y-\mathrm{E}[Y]$$
that is, $\overline{X}$ and $\overline{Y}$ are the deviations of X and Y from their respective means.

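When $\overline{X}\,\overline{Y}$ is positive, it means that:

  1. either X and Y are both above their respective means;

  2. or X and Y are both below their respective means.

On the contrary, when $\overline{X}\,\overline{Y}$ is negative, it means that:

  1. either X is above its mean and Y is below its mean;

  2. or X is below its mean and Y is above its mean.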

In other words, when $\overline{X}\,\overline{Y}$ is positive, X and Y are concordant (their deviations from the mean have the same sign); when $\overline{X}\,\overline{Y}$ is negative, X and Y are discordant (their deviations from the mean have opposite signs). Since
$$\mathrm{Cov}[X,Y]=\mathrm{E}\left[\overline{X}\,\overline{Y}\right]$$
a positive covariance means that on average X and Y are concordant; on the contrary, a negative covariance means that on average X and Y are discordant.

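Thus, the covariance of X and Y provides a measure of the degree to which X and Y tend to "move together": a positive covariance indicates that the deviations of X and Y from their respective means tend to have the same sign; a negative covariance indicates that the deviations of X and Y from their respective means tend to have opposite signs. Intuitively, we could express the concept as follows:
$$\begin{aligned}
\mathrm{Cov}[X,Y]>0 &\;\Rightarrow\; X \text{ tends to be high when } Y \text{ is high, and low when } Y \text{ is low}\\
\mathrm{Cov}[X,Y]<0 &\;\Rightarrow\; X \text{ tends to be high when } Y \text{ is low, and low when } Y \text{ is high}
\end{aligned}$$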

When $\mathrm{Cov}[X,Y]=0$, X and Y do not display either of the above two tendencies.
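The construction described in this section can be sketched numerically. The snippet below is only an illustration: the joint probability mass function `joint_pmf` and the helper `expectation` are assumptions introduced here, not objects defined in the text. It computes the product of the deviations at each support point and then averages those products using the probabilities as weights.

```python
from fractions import Fraction

# Hypothetical joint pmf of (X, Y), assumed only for illustration:
# each key is a support point (x, y), each value its probability.
joint_pmf = {
    (0, 0): Fraction(2, 5),
    (0, 1): Fraction(1, 10),
    (1, 0): Fraction(1, 10),
    (1, 1): Fraction(2, 5),
}


def expectation(pmf, g):
    """Expected value of g(X, Y) under a finite joint pmf."""
    return sum(p * g(x, y) for (x, y), p in pmf.items())


mean_x = expectation(joint_pmf, lambda x, y: x)
mean_y = expectation(joint_pmf, lambda x, y: y)

# Product of the deviations at each support point: positive where X and Y
# are concordant, negative where they are discordant.
deviation_products = {
    (x, y): (x - mean_x) * (y - mean_y) for (x, y) in joint_pmf
}

# The covariance is the probability-weighted average of those products.
covariance = sum(joint_pmf[pt] * deviation_products[pt] for pt in joint_pmf)

for pt, product in deviation_products.items():
    label = "concordant" if product > 0 else "discordant"
    print(pt, label, product)
print("covariance:", covariance)
```

With these assumed probabilities, most of the mass sits on the two concordant points, so the weighted average of the products (the covariance) comes out positive, equal to 3/20.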

A covariance formula

The following covariance formula is often used to compute the covariance between two random variables:
$$\mathrm{Cov}[X,Y]=\mathrm{E}[XY]-\mathrm{E}[X]\,\mathrm{E}[Y]$$

Proof

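First expand the product:
$$(X-\mathrm{E}[X])(Y-\mathrm{E}[Y])=XY-X\,\mathrm{E}[Y]-\mathrm{E}[X]\,Y+\mathrm{E}[X]\,\mathrm{E}[Y]$$
Then, by linearity of the expected value:
$$\begin{aligned}
\mathrm{Cov}[X,Y]&=\mathrm{E}\left[XY-X\,\mathrm{E}[Y]-\mathrm{E}[X]\,Y+\mathrm{E}[X]\,\mathrm{E}[Y]\right]\\
&=\mathrm{E}[XY]-\mathrm{E}[X]\,\mathrm{E}[Y]-\mathrm{E}[X]\,\mathrm{E}[Y]+\mathrm{E}[X]\,\mathrm{E}[Y]\\
&=\mathrm{E}[XY]-\mathrm{E}[X]\,\mathrm{E}[Y]
\end{aligned}$$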

This formula also makes clear that the covariance exists and is well-defined only as long as $\mathrm{E}[X]$, $\mathrm{E}[Y]$ and $\mathrm{E}[XY]$ exist and are well-defined.
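As a quick sanity check, the covariance formula can be compared with the definition on a small discrete distribution. The sketch below uses a hypothetical joint probability mass function (the values in `joint_pmf` are assumptions chosen for illustration) and exact rational arithmetic, so that the two results can be compared without rounding error.

```python
from fractions import Fraction

# Hypothetical joint pmf of (X, Y), assumed only for illustration.
joint_pmf = {
    (1, 2): Fraction(1, 4),
    (2, 1): Fraction(1, 4),
    (3, 5): Fraction(1, 2),
}


def expectation(pmf, g):
    """Expected value of g(X, Y) under a finite joint pmf."""
    return sum(p * g(x, y) for (x, y), p in pmf.items())


e_x = expectation(joint_pmf, lambda x, y: x)
e_y = expectation(joint_pmf, lambda x, y: y)
e_xy = expectation(joint_pmf, lambda x, y: x * y)

# Definition: expected value of the product of the deviations.
cov_by_definition = expectation(
    joint_pmf, lambda x, y: (x - e_x) * (y - e_y)
)

# Formula: E[XY] - E[X]E[Y].
cov_by_formula = e_xy - e_x * e_y

print(cov_by_definition, cov_by_formula)
```

Both computations give 19/16 for this assumed distribution. The formula needs only three expected values, which is why it is usually the more convenient route.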

Example

The following example shows how to compute the covariance between two discrete random variables.

Example Let X be a $2\times 1$ random vector and denote its components by X_1 and X_2. Let the support of X be: [eq18]and its joint probability mass function be:[eq19]The support of X_1 is:[eq20]and its marginal probability mass function is:[eq21]The expected value of X_1 is:[eq22]The support of X_2 is:[eq23]and its marginal probability mass function is:[eq24]The expected value of X_2 is:[eq25]Using the transformation theorem, we can compute the expected value of $X_{1}X_{2}$:[eq26]Hence, the covariance between X_1 and X_2 is:[eq27]
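The steps of the example (marginal probability mass functions, expected values of the components, expected value of the product, covariance) can be retraced in code. The sketch below is only an illustration of the procedure: the support and the probabilities in `joint_pmf` are assumed values, and the helper `marginal` is a hypothetical name introduced here.

```python
from collections import defaultdict
from fractions import Fraction

# Assumed joint pmf of the random vector (X_1, X_2), for illustration only.
joint_pmf = {
    (1, 1): Fraction(1, 2),
    (2, 0): Fraction(1, 3),
    (0, 0): Fraction(1, 6),
}


def marginal(pmf, index):
    """Marginal pmf of the component at position `index`."""
    result = defaultdict(Fraction)  # missing keys start at Fraction(0)
    for point, p in pmf.items():
        result[point[index]] += p
    return dict(result)


marginal_1 = marginal(joint_pmf, 0)
marginal_2 = marginal(joint_pmf, 1)

# Expected values of the components, computed from the marginals.
e_x1 = sum(x * p for x, p in marginal_1.items())
e_x2 = sum(x * p for x, p in marginal_2.items())

# Expected value of the product X_1 * X_2, computed from the joint pmf
# (transformation theorem).
e_x1x2 = sum(x1 * x2 * p for (x1, x2), p in joint_pmf.items())

covariance = e_x1x2 - e_x1 * e_x2
print(e_x1, e_x2, e_x1x2, covariance)
```

For these assumed values the expected values are 7/6 and 1/2, the expected value of the product is 1/2, and the covariance is -1/12.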

More details

The following subsections contain more details on covariance.

Covariance of a random variable with itself

Let X be a random variable. Then:
$$\mathrm{Cov}[X,X]=\mathrm{Var}[X]$$

Proof

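This follows from the definition of variance:
$$\mathrm{Var}[X]=\mathrm{E}\left[(X-\mathrm{E}[X])^{2}\right]=\mathrm{E}\left[(X-\mathrm{E}[X])(X-\mathrm{E}[X])\right]=\mathrm{Cov}[X,X]$$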

Symmetry

The covariance operator is symmetric:
$$\mathrm{Cov}[X,Y]=\mathrm{Cov}[Y,X]$$

Proof

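Using the definition of covariance:
$$\mathrm{Cov}[X,Y]=\mathrm{E}\left[(X-\mathrm{E}[X])(Y-\mathrm{E}[Y])\right]=\mathrm{E}\left[(Y-\mathrm{E}[Y])(X-\mathrm{E}[X])\right]=\mathrm{Cov}[Y,X]$$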

Variance of the sum of two random variables

Let X_1 and X_2 be two random variables. Then the variance of their sum is:
$$\mathrm{Var}[X_{1}+X_{2}]=\mathrm{Var}[X_{1}]+\mathrm{Var}[X_{2}]+2\,\mathrm{Cov}[X_{1},X_{2}]$$

Proof

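The above formula is derived as follows:
$$\begin{aligned}
\mathrm{Var}[X_{1}+X_{2}]&=\mathrm{E}\left[\left(X_{1}+X_{2}-\mathrm{E}[X_{1}+X_{2}]\right)^{2}\right]\\
&=\mathrm{E}\left[\left((X_{1}-\mathrm{E}[X_{1}])+(X_{2}-\mathrm{E}[X_{2}])\right)^{2}\right]\\
&=\mathrm{E}\left[(X_{1}-\mathrm{E}[X_{1}])^{2}\right]+\mathrm{E}\left[(X_{2}-\mathrm{E}[X_{2}])^{2}\right]+2\,\mathrm{E}\left[(X_{1}-\mathrm{E}[X_{1}])(X_{2}-\mathrm{E}[X_{2}])\right]\\
&=\mathrm{Var}[X_{1}]+\mathrm{Var}[X_{2}]+2\,\mathrm{Cov}[X_{1},X_{2}]
\end{aligned}$$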

Thus, to compute the variance of the sum of two random variables we need to know their covariance.

It follows that the formula:
$$\mathrm{Var}[X_{1}+X_{2}]=\mathrm{Var}[X_{1}]+\mathrm{Var}[X_{2}]$$
holds only when X_1 and X_2 have zero covariance.

The formula for the variance of a sum of two random variables can be generalized to sums of more than two random variables (see variance of the sum of n random variables).
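The role of the covariance term can be checked on a small example. The sketch below assumes a hypothetical joint probability mass function for (X_1, X_2) (the values in `joint_pmf` are illustrative assumptions) and compares the variance of the sum, computed directly, with the sum of the two variances plus twice the covariance.

```python
from fractions import Fraction

# Assumed joint pmf of (X_1, X_2), for illustration only.
joint_pmf = {
    (1, 1): Fraction(1, 2),
    (2, 0): Fraction(1, 3),
    (0, 0): Fraction(1, 6),
}


def expectation(pmf, g):
    """Expected value of g(X_1, X_2) under a finite joint pmf."""
    return sum(p * g(x1, x2) for (x1, x2), p in pmf.items())


e_x1 = expectation(joint_pmf, lambda x1, x2: x1)
e_x2 = expectation(joint_pmf, lambda x1, x2: x2)

var_x1 = expectation(joint_pmf, lambda x1, x2: (x1 - e_x1) ** 2)
var_x2 = expectation(joint_pmf, lambda x1, x2: (x2 - e_x2) ** 2)
cov_x1_x2 = expectation(joint_pmf, lambda x1, x2: (x1 - e_x1) * (x2 - e_x2))

# Variance of the sum, computed directly from the distribution of X_1 + X_2.
e_sum = expectation(joint_pmf, lambda x1, x2: x1 + x2)
var_sum = expectation(joint_pmf, lambda x1, x2: (x1 + x2 - e_sum) ** 2)

print(var_sum, var_x1 + var_x2 + 2 * cov_x1_x2, var_x1 + var_x2)
```

With these assumed values the covariance is negative (-1/12), so the variance of the sum (5/9) is smaller than the sum of the variances (13/18). Ignoring the covariance term would overstate the variance here.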

Bilinearity of the covariance operator

The covariance operator is linear in both of its arguments. Let X_1, X_2 and Y be three random variables and let a_1 and a_2 be two constants. Then, the first argument is linear:
$$\mathrm{Cov}[a_{1}X_{1}+a_{2}X_{2},Y]=a_{1}\,\mathrm{Cov}[X_{1},Y]+a_{2}\,\mathrm{Cov}[X_{2},Y]$$

Proof

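This is proved using the linearity of the expected value:
$$\begin{aligned}
\mathrm{Cov}[a_{1}X_{1}+a_{2}X_{2},Y]&=\mathrm{E}\left[\left(a_{1}X_{1}+a_{2}X_{2}-\mathrm{E}[a_{1}X_{1}+a_{2}X_{2}]\right)(Y-\mathrm{E}[Y])\right]\\
&=\mathrm{E}\left[\left(a_{1}(X_{1}-\mathrm{E}[X_{1}])+a_{2}(X_{2}-\mathrm{E}[X_{2}])\right)(Y-\mathrm{E}[Y])\right]\\
&=a_{1}\,\mathrm{E}\left[(X_{1}-\mathrm{E}[X_{1}])(Y-\mathrm{E}[Y])\right]+a_{2}\,\mathrm{E}\left[(X_{2}-\mathrm{E}[X_{2}])(Y-\mathrm{E}[Y])\right]\\
&=a_{1}\,\mathrm{Cov}[X_{1},Y]+a_{2}\,\mathrm{Cov}[X_{2},Y]
\end{aligned}$$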

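By symmetry, the covariance operator is also linear in its second argument:
$$\mathrm{Cov}[Y,a_{1}X_{1}+a_{2}X_{2}]=a_{1}\,\mathrm{Cov}[Y,X_{1}]+a_{2}\,\mathrm{Cov}[Y,X_{2}]$$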

Linearity in both the first and second argument is called bilinearity.

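By iteratively applying the above arguments, one can prove that bilinearity also holds for linear combinations of more than two variables. If X_1, ..., X_n and Y_1, ..., Y_m are random variables and a_1, ..., a_n and b_1, ..., b_m are constants, then:
$$\mathrm{Cov}\left[\sum_{i=1}^{n}a_{i}X_{i},\sum_{j=1}^{m}b_{j}Y_{j}\right]=\sum_{i=1}^{n}\sum_{j=1}^{m}a_{i}b_{j}\,\mathrm{Cov}[X_{i},Y_{j}]$$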
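Linearity in the first argument can be verified numerically on a small discrete distribution. The sketch below is an illustration under stated assumptions: the joint probability mass function of (X_1, X_2, Y), the constants `a1` and `a2`, and the helper names are all hypothetical choices made here.

```python
from fractions import Fraction

# Assumed joint pmf of (X_1, X_2, Y), for illustration only.
joint_pmf = {
    (0, 1, 2): Fraction(1, 4),
    (1, 0, 1): Fraction(1, 4),
    (2, 2, 0): Fraction(1, 4),
    (1, 3, 3): Fraction(1, 4),
}


def expectation(pmf, g):
    """Expected value of g(outcome) under a finite joint pmf."""
    return sum(p * g(w) for w, p in pmf.items())


def covariance(pmf, f, g):
    """Cov[f, g] computed as E[f g] - E[f] E[g]."""
    e_fg = expectation(pmf, lambda w: f(w) * g(w))
    return e_fg - expectation(pmf, f) * expectation(pmf, g)


def x1(w):
    return w[0]


def x2(w):
    return w[1]


def y(w):
    return w[2]


a1, a2 = Fraction(3), Fraction(-2)  # arbitrary constants


def combination(w):
    """The linear combination a1 * X_1 + a2 * X_2."""
    return a1 * x1(w) + a2 * x2(w)


left_hand_side = covariance(joint_pmf, combination, y)
right_hand_side = (
    a1 * covariance(joint_pmf, x1, y) + a2 * covariance(joint_pmf, x2, y)
)
print(left_hand_side, right_hand_side)
```

Because the arithmetic is exact, the two sides agree exactly (both equal -5/2 for these assumed values). By symmetry, the same check applies with the arguments swapped.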

Variance of the sum of n random variables

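The variance of the sum of n random variables is:
$$\mathrm{Var}\left[\sum_{i=1}^{n}X_{i}\right]=\sum_{i=1}^{n}\mathrm{Var}[X_{i}]+2\sum_{i=2}^{n}\sum_{j=1}^{i-1}\mathrm{Cov}[X_{i},X_{j}]$$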

Proof

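This is demonstrated using the bilinearity of the covariance operator (see above):
$$\begin{aligned}
\mathrm{Var}\left[\sum_{i=1}^{n}X_{i}\right]&=\mathrm{Cov}\left[\sum_{i=1}^{n}X_{i},\sum_{j=1}^{n}X_{j}\right]\\
&=\sum_{i=1}^{n}\sum_{j=1}^{n}\mathrm{Cov}[X_{i},X_{j}]\\
&=\sum_{i=1}^{n}\mathrm{Cov}[X_{i},X_{i}]+\sum_{i=1}^{n}\sum_{j\neq i}\mathrm{Cov}[X_{i},X_{j}]\\
&=\sum_{i=1}^{n}\mathrm{Var}[X_{i}]+2\sum_{i=2}^{n}\sum_{j=1}^{i-1}\mathrm{Cov}[X_{i},X_{j}]
\end{aligned}$$
where the last step uses the symmetry of the covariance operator.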

This formula implies that when all the random variables in the sum have zero covariance with each other, then the variance of the sum is just the sum of the variances:
$$\mathrm{Var}\left[\sum_{i=1}^{n}X_{i}\right]=\sum_{i=1}^{n}\mathrm{Var}[X_{i}]$$
This is true, for example, when the random variables in the sum are mutually independent (because independence implies zero covariance).
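The formula for n random variables can be checked in the same spirit. The sketch below assumes a hypothetical joint probability mass function for three random variables (the values in `joint_pmf` are illustrative assumptions) and compares the variance of the sum, computed directly, with the sum of the variances plus twice the sum of the covariances over all distinct pairs.

```python
from fractions import Fraction
from itertools import combinations

# Assumed joint pmf of (X_1, X_2, X_3), for illustration only.
joint_pmf = {
    (0, 1, 2): Fraction(1, 4),
    (1, 0, 1): Fraction(1, 4),
    (2, 2, 0): Fraction(1, 4),
    (1, 3, 3): Fraction(1, 4),
}
n = 3  # number of random variables in the sum


def expectation(pmf, g):
    """Expected value of g(outcome) under a finite joint pmf."""
    return sum(p * g(w) for w, p in pmf.items())


def covariance(pmf, i, j):
    """Covariance between the components at positions i and j."""
    e_ij = expectation(pmf, lambda w: w[i] * w[j])
    e_i = expectation(pmf, lambda w: w[i])
    e_j = expectation(pmf, lambda w: w[j])
    return e_ij - e_i * e_j


# Left-hand side: variance of X_1 + ... + X_n, computed directly.
e_sum = expectation(joint_pmf, lambda w: sum(w))
variance_of_sum = expectation(joint_pmf, lambda w: (sum(w) - e_sum) ** 2)

# Right-hand side: variances (Cov[X_i, X_i]) plus twice the covariances
# of all distinct pairs.
sum_of_variances = sum(covariance(joint_pmf, i, i) for i in range(n))
sum_of_covariances = sum(
    covariance(joint_pmf, i, j) for i, j in combinations(range(n), 2)
)
right_hand_side = sum_of_variances + 2 * sum_of_covariances

print(variance_of_sum, right_hand_side)
```

For these assumed values both sides equal 7/2, while the sum of the variances alone is 3. The difference is exactly twice the sum of the pairwise covariances, which is why the simplified formula requires all of them to be zero.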

Solved exercises

Below you can find some exercises with explained solutions:

  1. Exercise set 1 (covariance between discrete random variables).

  2. Exercise set 2 (covariance between absolutely continuous random variables).
