Covariance formula

by Marco Taboga, PhD

A covariance formula is an equation used to define or calculate the covariance between two variables.

There are several formulae that can be used, depending on the situation.

General formula

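We begin with a general formula, used to define the covariance between two random variables $X$ and $Y$:
$$\mathrm{Cov}[X,Y] = \mathrm{E}\big[(X - \mathrm{E}[X])(Y - \mathrm{E}[Y])\big]$$
where $\mathrm{E}$ denotes the expected value operator, so that $\mathrm{E}[X]$ and $\mathrm{E}[Y]$ are the expected values of $X$ and $Y$.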

This is a definition, and it is useful because of its generality. However, to compute the covariance in practice, you need the more specific equations below.

Formula for discrete variables

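When the two random variables are discrete, the above formula can be written as
$$\mathrm{Cov}[X,Y] = \sum_{(x,y) \in R_{XY}} (x - \mathrm{E}[X])(y - \mathrm{E}[Y])\, p_{XY}(x,y)$$
where $R_{XY}$ is the support of the random vector $(X,Y)$, that is, the set of couples $(x,y)$ it can take, and $p_{XY}(x,y)$ is the joint probability mass function of $X$ and $Y$.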

In other words, we sum the products of the deviations of the two random variables from their respective means. Each product is weighted by a probability.
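The weighted sum described above can be sketched in a few lines of Python. The joint probability mass function used here is hypothetical (it is not the one from the example that follows), and the function name `discrete_covariance` is only an illustrative choice. Exact fractions are used so that no rounding error enters the result.

```python
from fractions import Fraction as F

# Hypothetical joint pmf, chosen only for illustration:
# maps each couple (x, y) in the support to its probability.
pmf = {
    (1, 2): F(1, 2),
    (2, 4): F(1, 4),
    (3, 3): F(1, 4),
}

def discrete_covariance(pmf):
    """Covariance of two discrete variables from their joint pmf."""
    # Expected values: probability-weighted sums over the support.
    mean_x = sum(x * p for (x, y), p in pmf.items())
    mean_y = sum(y * p for (x, y), p in pmf.items())
    # Probability-weighted sum of the products of the deviations.
    return sum((x - mean_x) * (y - mean_y) * p for (x, y), p in pmf.items())

cov = discrete_covariance(pmf)
print(cov)  # 7/16
```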

Example

Suppose that the probability mass function is[eq5]

The support $R_{XY}$ contains three possible couples:[eq6]

The calculations are performed as follows:[eq7]

Formula for continuous variables

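When the two random variables are continuous, the covariance formula involves a double integral:
$$\mathrm{Cov}[X,Y] = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} (x - \mathrm{E}[X])(y - \mathrm{E}[Y])\, f_{XY}(x,y)\, dx\, dy$$
where $f_{XY}(x,y)$ is the joint probability density function of $X$ and $Y$.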

How to compute the double integral

The double integral is computed in two steps:

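  1. we calculate the inner integral over $x$, $\int_{-\infty}^{\infty} (x - \mathrm{E}[X])(y - \mathrm{E}[Y])\, f_{XY}(x,y)\, dx$, which will be found to be a function of $y$ only because $x$ is "integrated out";

  2. we compute the outer integral over $y$, $\int_{-\infty}^{\infty} \left[ \int_{-\infty}^{\infty} (x - \mathrm{E}[X])(y - \mathrm{E}[Y])\, f_{XY}(x,y)\, dx \right] dy$.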
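The two steps can also be carried out numerically. The sketch below is only an illustration under stated assumptions: the joint density $f(x,y) = x + y$ on the unit square is hypothetical (it is not the density of the example that follows), and the integrals are approximated with a simple midpoint rule rather than computed in closed form. The helper names `midpoint_integral` and `double_integral` are illustrative.

```python
def midpoint_integral(g, a, b, n=200):
    """Approximate the integral of g over [a, b] with the midpoint rule."""
    h = (b - a) / n
    return h * sum(g(a + (i + 0.5) * h) for i in range(n))

# Hypothetical joint density: f(x, y) = x + y on the unit square, 0 elsewhere.
def f(x, y):
    return x + y

def double_integral(g):
    """Integrate g(x, y) * f(x, y) over the unit square in two steps."""
    # Step 1: the inner integral over x, which is a function of y only.
    def inner(y):
        return midpoint_integral(lambda x: g(x, y) * f(x, y), 0.0, 1.0)
    # Step 2: the outer integral over y.
    return midpoint_integral(inner, 0.0, 1.0)

mean_x = double_integral(lambda x, y: x)  # exact value: 7/12
mean_y = double_integral(lambda x, y: y)  # exact value: 7/12
cov_xy = double_integral(lambda x, y: (x - mean_x) * (y - mean_y))
print(cov_xy)  # close to the exact value -1/144
```

For this hypothetical density the exact covariance is $-1/144$, so the printed number gives a quick check on the accuracy of the approximation.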

Example

Let the joint probability density function be[eq12]

In order to compute the expected values, we first need to find the marginal density functions:[eq13]

We can now work out the covariance:[eq14]

Covariance formula based on moments

Instead of using the formulae above to find the covariance, it is often easier to use the following equivalent equation based on moments and cross moments:
$$\mathrm{Cov}[X,Y] = \mathrm{E}[XY] - \mathrm{E}[X]\,\mathrm{E}[Y]$$
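As a quick check that the moment-based equation agrees with the definition, the following sketch computes the covariance both ways. The joint probability mass function is hypothetical and the helper `expectation` is an illustrative name, not part of any library.

```python
from fractions import Fraction as F

# Hypothetical joint pmf, chosen only for illustration.
pmf = {(1, 2): F(1, 2), (2, 4): F(1, 4), (3, 3): F(1, 4)}

def expectation(g, pmf):
    """Expected value of g(X, Y) under the joint pmf."""
    return sum(g(x, y) * p for (x, y), p in pmf.items())

e_x = expectation(lambda x, y: x, pmf)       # first moment of X
e_y = expectation(lambda x, y: y, pmf)       # first moment of Y
e_xy = expectation(lambda x, y: x * y, pmf)  # cross moment of X and Y

# Moment-based formula versus the definition via deviations from the means.
cov_moments = e_xy - e_x * e_y
cov_definition = expectation(lambda x, y: (x - e_x) * (y - e_y), pmf)
print(cov_moments, cov_definition)  # 7/16 7/16
```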

Example

In the previous example, after finding the expected values of X and Y, we could have done:[eq16]

Use with moment generating function

When we know the joint moment generating function of $X$ and $Y$, we can use it to compute the moments $\mathrm{E}[X]$, $\mathrm{E}[Y]$ and $\mathrm{E}[XY]$ and then plug their values into the formula above.
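The moments are obtained from the partial derivatives of the joint moment generating function evaluated at zero: the first partial derivatives give the two expected values and the mixed second partial derivative gives the cross moment. The sketch below illustrates this with a hypothetical joint moment generating function (that of a bivariate normal distribution with assumed parameters) and approximates the derivatives with central finite differences instead of differentiating analytically.

```python
import math

# Hypothetical joint mgf: a bivariate normal with means (1, 2),
# variances (2, 3) and covariance 0.5. All values are assumptions.
MU_X, MU_Y = 1.0, 2.0
VAR_X, VAR_Y, COV_XY = 2.0, 3.0, 0.5

def mgf(t1, t2):
    quad = VAR_X * t1**2 + 2 * COV_XY * t1 * t2 + VAR_Y * t2**2
    return math.exp(MU_X * t1 + MU_Y * t2 + 0.5 * quad)

h = 1e-3  # step for the central finite differences

# First moments: first partial derivatives of the mgf at (0, 0).
e_x = (mgf(h, 0) - mgf(-h, 0)) / (2 * h)
e_y = (mgf(0, h) - mgf(0, -h)) / (2 * h)
# Cross moment: mixed second partial derivative at (0, 0).
e_xy = (mgf(h, h) - mgf(h, -h) - mgf(-h, h) + mgf(-h, -h)) / (4 * h**2)

cov = e_xy - e_x * e_y
print(cov)  # approximately 0.5, the covariance built into the mgf
```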

Formulae for the sample covariance

Until now, we have discussed how to calculate the covariance between two random variables.

However, there is another concept, that of sample covariance, which is used to measure the degree of association between two observed variables in a sample of data.

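Given $n$ observed couples $(x_1,y_1),\ldots,(x_n,y_n)$, their sample covariance is calculated as
$$\frac{1}{n}\sum_{j=1}^{n} (x_j - \overline{x})(y_j - \overline{y})$$
where $\overline{x}$ and $\overline{y}$ are the sample means of the two variables:
$$\overline{x} = \frac{1}{n}\sum_{j=1}^{n} x_j, \qquad \overline{y} = \frac{1}{n}\sum_{j=1}^{n} y_j$$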

Unbiased sample covariance

An alternative to the formula above is the so-called unbiased sample covariance
$$s_{xy} = \frac{1}{n-1}\sum_{j=1}^{n} (x_j - \overline{x})(y_j - \overline{y})$$

The only difference is that we divide by $n-1$ instead of dividing by $n$.

If the $n$ observed couples are independent draws from the joint distribution of two random variables $X$ and $Y$, then $s_{xy}$ is an unbiased estimator of $\mathrm{Cov}[X,Y]$.

Example

In this example, there are four observed couples, whose values are reported in the columns of the table below.

The last three rows of the table are used to calculate the means and the sample covariance (biased and unbiased).

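| Observation number | $x_j$ | Deviation of $x_j$ from mean | $y_j$ | Deviation of $y_j$ from mean | Product of deviations |
|--------------------|-------|------------------------------|-------|------------------------------|-----------------------|
| 1                  | 1     | -1                           | 5     | 2                            | -2                    |
| 2                  | 3     | 1                            | 0     | -3                           | -3                    |
| 3                  | 0     | -2                           | -1    | -4                           | 8                     |
| 4                  | 4     | 2                            | 8     | 5                            | 10                    |
| Sum                | 8     | 0                            | 12    | 0                            | 13                    |
| Divide sum by $n$  | 2     |                              | 3     |                              | 13/4                  |
| Divide sum by $n-1$ |      |                              |       |                              | 13/3                  |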
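
The calculations in the table can be reproduced with a short Python sketch. The function name `sample_covariance` and its `unbiased` flag are illustrative choices, not part of any library; exact fractions are used so that the results match the table.

```python
from fractions import Fraction as F

# The four observed couples from the table above.
xs = [1, 3, 0, 4]
ys = [5, 0, -1, 8]

def sample_covariance(xs, ys, unbiased=False):
    """Sample covariance; divides by n - 1 when unbiased is True."""
    n = len(xs)
    mean_x = F(sum(xs), n)
    mean_y = F(sum(ys), n)
    # Sum of the products of the deviations from the sample means.
    total = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return total / (n - 1 if unbiased else n)

print(sample_covariance(xs, ys))                 # 13/4
print(sample_covariance(xs, ys, unbiased=True))  # 13/3
```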

More details, proofs and exercises

More details about these formulae - including proofs and solved exercises - can be found in the lecture on Covariance.

How to cite

Please cite as:

Taboga, Marco (2021). "Covariance formula", Lectures on probability theory and mathematical statistics. Kindle Direct Publishing. Online appendix. https://www.statlect.com/glossary/covariance-formula.
