Covariance is a measure of association between two random variables. It is positive if the deviations of the two variables from their respective means tend to have the same sign and negative if the deviations tend to have opposite signs.
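The covariance between two random variables $X$ and $Y$, denoted by $\operatorname{Cov}[X,Y]$, is defined as follows:
$$\operatorname{Cov}[X,Y] = \operatorname{E}\big[(X - \operatorname{E}[X])(Y - \operatorname{E}[Y])\big]$$
provided the above expected values exist and are well-defined.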
To understand the meaning of covariance, let us analyze how it is constructed. It is the expected value of the product $\bar{X}\bar{Y}$, where $\bar{X}$ and $\bar{Y}$ are defined as follows:
$$\bar{X} = X - \operatorname{E}[X], \qquad \bar{Y} = Y - \operatorname{E}[Y]$$
i.e., $\bar{X}$ and $\bar{Y}$ are the deviations of $X$ and $Y$ from their respective means.
When $\bar{X}\bar{Y}$ is positive, it means that:
either $X$ and $Y$ are both above their respective means;
or $X$ and $Y$ are both below their respective means.
On the contrary, when $\bar{X}\bar{Y}$ is negative, it means that:
either $X$ is above its mean and $Y$ is below its mean;
or $X$ is below its mean and $Y$ is above its mean.
In other words, when $\bar{X}\bar{Y}$ is positive, $X$ and $Y$ are concordant (their deviations from the mean have the same sign); when $\bar{X}\bar{Y}$ is negative, $X$ and $Y$ are discordant (their deviations from the mean have opposite signs).
Since $\operatorname{Cov}[X,Y] = \operatorname{E}[\bar{X}\bar{Y}]$, a positive covariance means that on average $X$ and $Y$ are concordant; on the contrary, a negative covariance means that on average $X$ and $Y$ are discordant.
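Thus, the covariance of $X$ and $Y$ provides a measure of the degree to which $X$ and $Y$ tend to "move together": a positive covariance indicates that the deviations of $X$ and $Y$ from their respective means tend to have the same sign; a negative covariance indicates that the deviations of $X$ and $Y$ from their respective means tend to have opposite signs. Intuitively, we could express the concept as follows:
$$\operatorname{Cov}[X,Y] > 0 \;\Rightarrow\; \text{deviations tend to have the same sign}, \qquad \operatorname{Cov}[X,Y] < 0 \;\Rightarrow\; \text{deviations tend to have opposite signs}$$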
When $\operatorname{Cov}[X,Y] = 0$, $X$ and $Y$ do not display either of the two tendencies above.
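The link between the sign of the product of deviations and the sign of the covariance can be checked on a small case. The sketch below assumes a hypothetical joint probability mass function for two discrete variables (the values are illustrative only). Note that the covariance weights each outcome by the size of the product of deviations, so it is not simply a count of concordant versus discordant outcomes.

```python
from fractions import Fraction

# Hypothetical joint pmf of (X, Y): outcome -> probability (illustrative values).
pmf = {
    (0, 0): Fraction(2, 5),
    (1, 1): Fraction(2, 5),
    (0, 1): Fraction(1, 10),
    (1, 0): Fraction(1, 10),
}

# Means of X and Y.
mean_x = sum(p * x for (x, _), p in pmf.items())
mean_y = sum(p * y for (_, y), p in pmf.items())

# Probability that the two deviations have the same sign (concordant)
# or opposite signs (discordant).
p_concordant = sum(p for (x, y), p in pmf.items() if (x - mean_x) * (y - mean_y) > 0)
p_discordant = sum(p for (x, y), p in pmf.items() if (x - mean_x) * (y - mean_y) < 0)

# Covariance as the expected value of the product of deviations.
cov = sum(p * (x - mean_x) * (y - mean_y) for (x, y), p in pmf.items())

print(p_concordant, p_discordant, cov)
```

In this assumed example the concordant outcomes carry most of the probability and the covariance is positive.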
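The following covariance formula is often used to compute the covariance between two random variables:
$$\operatorname{Cov}[X,Y] = \operatorname{E}[XY] - \operatorname{E}[X]\operatorname{E}[Y]$$
First expand the product:
$$(X - \operatorname{E}[X])(Y - \operatorname{E}[Y]) = XY - X\operatorname{E}[Y] - \operatorname{E}[X]Y + \operatorname{E}[X]\operatorname{E}[Y]$$
Then, by linearity of the expected value:
$$\begin{aligned}
\operatorname{Cov}[X,Y] &= \operatorname{E}\big[XY - X\operatorname{E}[Y] - \operatorname{E}[X]Y + \operatorname{E}[X]\operatorname{E}[Y]\big] \\
&= \operatorname{E}[XY] - \operatorname{E}[X]\operatorname{E}[Y] - \operatorname{E}[X]\operatorname{E}[Y] + \operatorname{E}[X]\operatorname{E}[Y] \\
&= \operatorname{E}[XY] - \operatorname{E}[X]\operatorname{E}[Y]
\end{aligned}$$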
This formula also makes clear that the covariance exists and is well-defined only as long as $\operatorname{E}[X]$, $\operatorname{E}[Y]$ and $\operatorname{E}[XY]$ exist and are well-defined.
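As a quick check that the definition and the covariance formula agree, here is a minimal sketch using exact rational arithmetic. The joint probability mass function and the helper names (`expectation`, `cov_definition`, `cov_formula`) are assumptions made for illustration.

```python
from fractions import Fraction


def expectation(pmf, g):
    """Expected value of g(x, y) under a joint pmf given as {(x, y): probability}."""
    return sum(p * g(x, y) for (x, y), p in pmf.items())


def cov_definition(pmf):
    """Covariance as E[(X - E[X]) (Y - E[Y])]."""
    ex = expectation(pmf, lambda x, y: x)
    ey = expectation(pmf, lambda x, y: y)
    return expectation(pmf, lambda x, y: (x - ex) * (y - ey))


def cov_formula(pmf):
    """Covariance as E[XY] - E[X] E[Y]."""
    ex = expectation(pmf, lambda x, y: x)
    ey = expectation(pmf, lambda x, y: y)
    return expectation(pmf, lambda x, y: x * y) - ex * ey


# Hypothetical joint pmf (illustrative values only).
pmf = {(1, 2): Fraction(1, 4), (2, 1): Fraction(1, 4), (3, 5): Fraction(1, 2)}

print(cov_definition(pmf), cov_formula(pmf))
```

Both functions return the same value for this pmf, as the derivation predicts.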
The following example shows how to compute the covariance between two discrete random variables.
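Example
Let $X$ be a $2 \times 1$ random vector and denote its components by $X_1$ and $X_2$. Let the support of $X$ be $R_X$ and its joint probability mass function be $p_X(x_1, x_2)$.
The support of $X_1$ is:
$$R_{X_1} = \{x_1 : (x_1, x_2) \in R_X \text{ for some } x_2\}$$
and its marginal probability mass function is:
$$p_{X_1}(x_1) = \sum_{\{x_2 : (x_1, x_2) \in R_X\}} p_X(x_1, x_2)$$
The expected value of $X_1$ is:
$$\operatorname{E}[X_1] = \sum_{x_1 \in R_{X_1}} x_1 \, p_{X_1}(x_1)$$
The support of $X_2$ is:
$$R_{X_2} = \{x_2 : (x_1, x_2) \in R_X \text{ for some } x_1\}$$
and its marginal probability mass function is:
$$p_{X_2}(x_2) = \sum_{\{x_1 : (x_1, x_2) \in R_X\}} p_X(x_1, x_2)$$
The expected value of $X_2$ is:
$$\operatorname{E}[X_2] = \sum_{x_2 \in R_{X_2}} x_2 \, p_{X_2}(x_2)$$
Using the transformation theorem, we can compute the expected value of $X_1 X_2$:
$$\operatorname{E}[X_1 X_2] = \sum_{(x_1, x_2) \in R_X} x_1 x_2 \, p_X(x_1, x_2)$$
Hence, the covariance between $X_1$ and $X_2$ is:
$$\operatorname{Cov}[X_1, X_2] = \operatorname{E}[X_1 X_2] - \operatorname{E}[X_1]\operatorname{E}[X_2]$$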
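The steps of the example (marginals, expected values, expected product, covariance) can be traced numerically. The sketch below assumes a hypothetical joint probability mass function with three equally likely points; these values are chosen for illustration and are not taken from the example above.

```python
from collections import defaultdict
from fractions import Fraction

# Hypothetical joint pmf of (X1, X2): three equally likely points (illustrative).
joint_pmf = {
    (1, 1): Fraction(1, 3),
    (2, 0): Fraction(1, 3),
    (0, 0): Fraction(1, 3),
}

# Marginal pmfs: sum the joint pmf over the other component.
marginal_1 = defaultdict(Fraction)
marginal_2 = defaultdict(Fraction)
for (x1, x2), p in joint_pmf.items():
    marginal_1[x1] += p
    marginal_2[x2] += p

# Expected values of the two components, computed from the marginals.
mean_1 = sum(x * p for x, p in marginal_1.items())
mean_2 = sum(x * p for x, p in marginal_2.items())

# Expected value of the product, computed from the joint pmf
# (transformation theorem).
mean_product = sum(x1 * x2 * p for (x1, x2), p in joint_pmf.items())

# Covariance via E[X1 X2] - E[X1] E[X2].
covariance = mean_product - mean_1 * mean_2

print(dict(marginal_1), dict(marginal_2))
print(mean_1, mean_2, mean_product, covariance)
```

For this assumed pmf the covariance is zero even though the two components are not independent: knowing that the first component equals 1 determines the second.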
The following subsections contain more details on covariance.
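Let $X$ be a random variable, then:
$$\operatorname{Cov}[X,X] = \operatorname{Var}[X]$$
This follows from the definition of variance:
$$\operatorname{Var}[X] = \operatorname{E}\big[(X - \operatorname{E}[X])^2\big] = \operatorname{E}\big[(X - \operatorname{E}[X])(X - \operatorname{E}[X])\big] = \operatorname{Cov}[X,X]$$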
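The covariance operator is symmetric:
$$\operatorname{Cov}[X,Y] = \operatorname{Cov}[Y,X]$$
Using the definition of covariance:
$$\operatorname{Cov}[X,Y] = \operatorname{E}\big[(X - \operatorname{E}[X])(Y - \operatorname{E}[Y])\big] = \operatorname{E}\big[(Y - \operatorname{E}[Y])(X - \operatorname{E}[X])\big] = \operatorname{Cov}[Y,X]$$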
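Let $X_1$ and $X_2$ be two random variables. Then the variance of their sum is:
$$\operatorname{Var}[X_1 + X_2] = \operatorname{Var}[X_1] + \operatorname{Var}[X_2] + 2\operatorname{Cov}[X_1, X_2]$$
The above formula is derived as follows:
$$\begin{aligned}
\operatorname{Var}[X_1 + X_2] &= \operatorname{E}\big[(X_1 + X_2 - \operatorname{E}[X_1 + X_2])^2\big] \\
&= \operatorname{E}\big[\big((X_1 - \operatorname{E}[X_1]) + (X_2 - \operatorname{E}[X_2])\big)^2\big] \\
&= \operatorname{E}\big[(X_1 - \operatorname{E}[X_1])^2\big] + \operatorname{E}\big[(X_2 - \operatorname{E}[X_2])^2\big] + 2\operatorname{E}\big[(X_1 - \operatorname{E}[X_1])(X_2 - \operatorname{E}[X_2])\big] \\
&= \operatorname{Var}[X_1] + \operatorname{Var}[X_2] + 2\operatorname{Cov}[X_1, X_2]
\end{aligned}$$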
Thus, to compute the variance of the sum of two random variables we need to know their covariance.
It follows that the formula:
$$\operatorname{Var}[X_1 + X_2] = \operatorname{Var}[X_1] + \operatorname{Var}[X_2]$$
holds only when $X_1$ and $X_2$ have zero covariance.
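The role of the covariance term in the variance of a sum can be verified on a small case. This is a minimal sketch, assuming a hypothetical joint probability mass function and illustrative helper names; it compares the variance of the sum computed directly with the sum of the two variances plus twice the covariance.

```python
from fractions import Fraction

# Hypothetical joint pmf of (X1, X2) (illustrative values only).
pmf = {(1, 2): Fraction(1, 4), (2, 1): Fraction(1, 4), (3, 5): Fraction(1, 2)}


def expectation(g):
    """Expected value of g(x1, x2) under the joint pmf."""
    return sum(p * g(x1, x2) for (x1, x2), p in pmf.items())


def variance(g):
    """Variance of g(X1, X2), computed as E[g^2] - E[g]^2."""
    return expectation(lambda x1, x2: g(x1, x2) ** 2) - expectation(g) ** 2


var_1 = variance(lambda x1, x2: x1)
var_2 = variance(lambda x1, x2: x2)
cov_12 = (
    expectation(lambda x1, x2: x1 * x2)
    - expectation(lambda x1, x2: x1) * expectation(lambda x1, x2: x2)
)

# Variance of the sum, computed directly and via the formula.
var_sum_direct = variance(lambda x1, x2: x1 + x2)
var_sum_formula = var_1 + var_2 + 2 * cov_12

print(var_sum_direct, var_sum_formula, var_1 + var_2)
```

Because the covariance is nonzero for this assumed pmf, the plain sum of the variances differs from the variance of the sum.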
The formula for the variance of a sum of two random variables can be generalized to sums of more than two random variables (see variance of the sum of n random variables).
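The covariance operator is linear in both of its arguments. Let $X_1$, $X_2$ and $Y$ be three random variables and let $a_1$ and $a_2$ be two constants. Then, the first argument is linear:
$$\operatorname{Cov}[a_1 X_1 + a_2 X_2, Y] = a_1 \operatorname{Cov}[X_1, Y] + a_2 \operatorname{Cov}[X_2, Y]$$
This is proved using the linearity of the expected value:
$$\begin{aligned}
\operatorname{Cov}[a_1 X_1 + a_2 X_2, Y] &= \operatorname{E}\big[(a_1 X_1 + a_2 X_2 - \operatorname{E}[a_1 X_1 + a_2 X_2])(Y - \operatorname{E}[Y])\big] \\
&= \operatorname{E}\big[\big(a_1 (X_1 - \operatorname{E}[X_1]) + a_2 (X_2 - \operatorname{E}[X_2])\big)(Y - \operatorname{E}[Y])\big] \\
&= a_1 \operatorname{E}\big[(X_1 - \operatorname{E}[X_1])(Y - \operatorname{E}[Y])\big] + a_2 \operatorname{E}\big[(X_2 - \operatorname{E}[X_2])(Y - \operatorname{E}[Y])\big] \\
&= a_1 \operatorname{Cov}[X_1, Y] + a_2 \operatorname{Cov}[X_2, Y]
\end{aligned}$$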
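By symmetry, the second argument is also linear:
$$\operatorname{Cov}[Y, a_1 X_1 + a_2 X_2] = a_1 \operatorname{Cov}[Y, X_1] + a_2 \operatorname{Cov}[Y, X_2]$$
Linearity in both the first and second argument is called bilinearity.
By iteratively applying the above arguments, one can prove that bilinearity also holds for linear combinations of more than two variables:
$$\operatorname{Cov}\left[\sum_{i=1}^{n} a_i X_i, \sum_{j=1}^{m} b_j Y_j\right] = \sum_{i=1}^{n} \sum_{j=1}^{m} a_i b_j \operatorname{Cov}[X_i, Y_j]$$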
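Linearity in the first argument can be spot-checked numerically. The sketch below assumes a hypothetical joint probability mass function over three variables and arbitrary constants; the variable and function names are illustrative.

```python
from fractions import Fraction

# Hypothetical joint pmf of (X1, X2, Y): outcome -> probability (illustrative).
pmf = {
    (0, 1, 2): Fraction(1, 2),
    (1, 0, 1): Fraction(1, 4),
    (2, 3, 0): Fraction(1, 4),
}


def expectation(g):
    """Expected value of g(outcome) under the joint pmf."""
    return sum(p * g(outcome) for outcome, p in pmf.items())


def cov(f, g):
    """Covariance between f(outcome) and g(outcome): E[fg] - E[f] E[g]."""
    return expectation(lambda o: f(o) * g(o)) - expectation(f) * expectation(g)


a1, a2 = 3, -2  # arbitrary constants


def x1(o):
    return o[0]


def x2(o):
    return o[1]


def y(o):
    return o[2]


# Left-hand side: covariance of the linear combination with Y.
lhs = cov(lambda o: a1 * x1(o) + a2 * x2(o), y)
# Right-hand side: the same linear combination of the covariances.
rhs = a1 * cov(x1, y) + a2 * cov(x2, y)

print(lhs, rhs)
```

The two sides coincide exactly because the arithmetic is done with rational numbers rather than floating point.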
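The variance of the sum of $n$ random variables $X_1, \ldots, X_n$ is:
$$\operatorname{Var}\left[\sum_{i=1}^{n} X_i\right] = \sum_{i=1}^{n} \operatorname{Var}[X_i] + 2 \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} \operatorname{Cov}[X_i, X_j]$$
This is demonstrated using the bilinearity of the covariance operator (see above):
$$\begin{aligned}
\operatorname{Var}\left[\sum_{i=1}^{n} X_i\right] &= \operatorname{Cov}\left[\sum_{i=1}^{n} X_i, \sum_{j=1}^{n} X_j\right] \\
&= \sum_{i=1}^{n} \sum_{j=1}^{n} \operatorname{Cov}[X_i, X_j] \\
&= \sum_{i=1}^{n} \operatorname{Cov}[X_i, X_i] + \sum_{i=1}^{n} \sum_{j \neq i} \operatorname{Cov}[X_i, X_j] \\
&= \sum_{i=1}^{n} \operatorname{Var}[X_i] + 2 \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} \operatorname{Cov}[X_i, X_j]
\end{aligned}$$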
This formula implies that when all the random variables in the sum have zero covariance with each other, then the variance of the sum is just the sum of the variances:
$$\operatorname{Var}\left[\sum_{i=1}^{n} X_i\right] = \sum_{i=1}^{n} \operatorname{Var}[X_i]$$
This is true, for example, when the random variables in the sum are mutually independent (because independence implies zero covariance).
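The formula for the variance of a sum of several random variables can be checked in the same spirit. This is a minimal sketch, assuming a hypothetical joint probability mass function over three variables; it sums the variances and twice the covariances of all distinct pairs, then compares the result with the variance of the sum computed directly.

```python
from fractions import Fraction
from itertools import combinations

# Hypothetical joint pmf of (X1, X2, X3): outcome -> probability (illustrative).
pmf = {
    (0, 1, 2): Fraction(1, 2),
    (1, 0, 1): Fraction(1, 4),
    (2, 3, 0): Fraction(1, 4),
}
n = 3  # number of random variables in each outcome


def expectation(g):
    """Expected value of g(outcome) under the joint pmf."""
    return sum(p * g(outcome) for outcome, p in pmf.items())


def cov(i, j):
    """Covariance between components i and j; cov(i, i) is the variance."""
    mean_i = expectation(lambda o: o[i])
    mean_j = expectation(lambda o: o[j])
    return expectation(lambda o: o[i] * o[j]) - mean_i * mean_j


# Variance of the sum, computed directly from the distribution of the sum.
var_direct = expectation(lambda o: sum(o) ** 2) - expectation(lambda o: sum(o)) ** 2

# Sum of the variances plus twice the covariances of all distinct pairs.
var_formula = sum(cov(i, i) for i in range(n)) + 2 * sum(
    cov(i, j) for i, j in combinations(range(n), 2)
)

print(var_direct, var_formula)
```

`combinations(range(n), 2)` enumerates each pair with `i < j` exactly once, which matches the double sum in the formula.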
Below you can find some exercises with explained solutions:
Exercise set 1 (covariance between discrete random variables).
Exercise set 2 (covariance between absolutely continuous random variables).
