
Linear correlation

Let X and Y be two random variables. The linear correlation coefficient (or Pearson's correlation coefficient) between X and Y, denoted by $\operatorname{Corr}[X,Y]$ or by $\rho_{XY}$, is defined as follows:
$$\operatorname{Corr}[X,Y]=\frac{\operatorname{Cov}[X,Y]}{\operatorname{stdev}[X]\operatorname{stdev}[Y]}$$
where $\operatorname{Cov}[X,Y]$ is the covariance between X and Y and $\operatorname{stdev}[X]$ and $\operatorname{stdev}[Y]$ are the standard deviations of X and Y. The linear correlation coefficient is well-defined only as long as $\operatorname{Cov}[X,Y]$, $\operatorname{stdev}[X]$ and $\operatorname{stdev}[Y]$ exist and are well-defined. Moreover, while the ratio is well-defined only if $\operatorname{stdev}[X]$ and $\operatorname{stdev}[Y]$ are strictly greater than zero, it is often assumed that $\operatorname{Corr}[X,Y]=0$ when one of the two standard deviations is zero. This is equivalent to assuming that $0/0=0$, because $\operatorname{Cov}[X,Y]=0$ when one of the two standard deviations is zero.
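The definition, including the convention for a zero standard deviation, can be sketched in Python for discrete random variables. This is only a minimal illustration: the function name `correlation_from_pmf` and the representation of the joint probability mass function as a dictionary mapping `(x, y)` pairs to probabilities are assumptions made here, not part of the lecture.

```python
import math


def correlation_from_pmf(pmf):
    """Linear correlation of (X, Y) from a joint pmf given as {(x, y): probability}.

    The dictionary format is an assumption of this sketch.
    """
    mean_x = sum(p * x for (x, y), p in pmf.items())
    mean_y = sum(p * y for (x, y), p in pmf.items())
    var_x = sum(p * (x - mean_x) ** 2 for (x, y), p in pmf.items())
    var_y = sum(p * (y - mean_y) ** 2 for (x, y), p in pmf.items())
    cov_xy = sum(p * (x - mean_x) * (y - mean_y) for (x, y), p in pmf.items())
    std_x, std_y = math.sqrt(var_x), math.sqrt(var_y)
    # Convention from the text: correlation is set to zero when one of the
    # two standard deviations is zero (the covariance is zero in that case).
    if std_x == 0 or std_y == 0:
        return 0.0
    return cov_xy / (std_x * std_y)


# X and Y always equal: perfect positive linear dependence.
print(correlation_from_pmf({(0, 0): 0.5, (1, 1): 0.5}))
# X is constant, so its standard deviation is zero and the convention applies.
print(correlation_from_pmf({(1, 0): 0.5, (1, 1): 0.5}))
```

The exact comparison with zero is adequate for small hand-written examples like these; with probabilities that are not exactly representable in floating point, a tolerance would be a safer choice.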

Interpretation

Linear correlation is a measure of dependence (or association) between two random variables. Its interpretation is similar to the interpretation of covariance (see the lecture entitled Covariance for a detailed explanation).

The correlation between X and Y provides a measure of the degree to which X and Y tend to "move together": $\operatorname{Corr}[X,Y]>0$ indicates that deviations of X and Y from their respective means tend to have the same sign; $\operatorname{Corr}[X,Y]<0$ indicates that deviations of X and Y from their respective means tend to have opposite signs; when $\operatorname{Corr}[X,Y]=0$, X and Y do not display either of these two tendencies.

Linear correlation has the property of being bounded between $-1$ and $1$:
$$-1\leq \operatorname{Corr}[X,Y]\leq 1$$

Thanks to this property, correlation makes it easy to gauge the strength of the linear dependence between two random variables: the closer the correlation is to $1$, the stronger the positive linear dependence between X and Y (and the closer it is to $-1$, the stronger the negative linear dependence between X and Y).
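A brief sketch of why the bound holds, offered as a standard argument rather than as the lecture's own proof. Assuming that both variances are finite and strictly positive, and writing $\operatorname{E}$, $\operatorname{Cov}$ and $\operatorname{stdev}$ for expected value, covariance and standard deviation (notation assumed here), the Cauchy-Schwarz inequality applied to the deviations from the means gives:

```latex
\left\vert \operatorname{Cov}[X,Y]\right\vert
  = \left\vert \operatorname{E}\!\left[(X-\operatorname{E}[X])(Y-\operatorname{E}[Y])\right]\right\vert
  \leq \sqrt{\operatorname{E}\!\left[(X-\operatorname{E}[X])^{2}\right]}\,
       \sqrt{\operatorname{E}\!\left[(Y-\operatorname{E}[Y])^{2}\right]}
  = \operatorname{stdev}[X]\operatorname{stdev}[Y]
```

Dividing both sides by the product of the two standard deviations shows that the absolute value of the correlation coefficient cannot exceed 1.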

Terminology

The following terminology is often used:

  1. If $\operatorname{Corr}[X,Y]>0$ then X and Y are said to be positively linearly correlated (or simply positively correlated).

  2. If $\operatorname{Corr}[X,Y]<0$ then X and Y are said to be negatively linearly correlated (or simply negatively correlated).

  3. If $\operatorname{Corr}[X,Y]\neq 0$ then X and Y are said to be linearly correlated (or simply correlated).

  4. If $\operatorname{Corr}[X,Y]=0$ then X and Y are said to be uncorrelated. Also note that $\operatorname{Cov}[X,Y]=0$ implies $\operatorname{Corr}[X,Y]=0$; therefore, two random variables X and Y are uncorrelated whenever $\operatorname{Cov}[X,Y]=0$.

Example

The following example shows how to compute the coefficient of linear correlation between two discrete random variables.

Example. Let X be a $2$-dimensional random vector and denote its components by X_1 and X_2. Let the support of X be: [eq24] and its joint probability mass function be: [eq25]

The support of X_1 is: [eq26] and its probability mass function is: [eq27] The expected value of X_1 is: [eq28] The expected value of $X_{1}^{2}$ is: [eq29] The variance of X_1 is: [eq30] The standard deviation of X_1 is: [eq31]

The support of X_2 is: [eq32] and its probability mass function is: [eq33] The expected value of X_2 is: [eq34] The expected value of $X_{2}^{2}$ is: [eq35] The variance of X_2 is: [eq36] The standard deviation of X_2 is: [eq37]

Using the transformation theorem, we can compute the expected value of $X_{1}X_{2}$: [eq38] Hence, the covariance between X_1 and X_2 is: [eq39] and the linear correlation coefficient is: [eq40]
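The sequence of steps in the example (marginal distributions, first and second moments, variances, the cross moment, covariance, correlation) can be sketched in Python as follows. The joint probability mass function used here is a hypothetical placeholder chosen for illustration; it is not claimed to be the one in the example, and the helper names `marginal` and `moment` are likewise assumptions of this sketch.

```python
from fractions import Fraction
import math

# Hypothetical joint pmf of (X_1, X_2): illustrative values only,
# not taken from the example in the text.
quarter = Fraction(1, 4)
joint_pmf = {(0, 0): quarter, (1, 0): quarter, (1, 1): quarter, (2, 1): quarter}


def marginal(pmf, index):
    """Marginal pmf of the component at position `index` (0 or 1)."""
    out = {}
    for point, p in pmf.items():
        out[point[index]] = out.get(point[index], 0) + p
    return out


def moment(pmf_1d, power):
    """Expected value of the given power of a discrete random variable."""
    return sum(p * value ** power for value, p in pmf_1d.items())


pmf_x1 = marginal(joint_pmf, 0)
pmf_x2 = marginal(joint_pmf, 1)

mean_x1, mean_x2 = moment(pmf_x1, 1), moment(pmf_x2, 1)
var_x1 = moment(pmf_x1, 2) - mean_x1 ** 2
var_x2 = moment(pmf_x2, 2) - mean_x2 ** 2
std_x1, std_x2 = math.sqrt(var_x1), math.sqrt(var_x2)

# Cross moment E[X_1 X_2], computed directly on the joint pmf
# (the transformation theorem step).
mean_x1x2 = sum(p * x1 * x2 for (x1, x2), p in joint_pmf.items())

cov = mean_x1x2 - mean_x1 * mean_x2
corr = float(cov) / (std_x1 * std_x2)

print("Var[X_1] =", var_x1, " Var[X_2] =", var_x2)
print("Cov[X_1, X_2] =", cov)
print("Corr[X_1, X_2] =", corr)
```

With these placeholder values the covariance is 1/4 and the correlation is 1/√2, roughly 0.707. Exact fractions are used for the moments so that the intermediate results can be compared with a hand calculation.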

More details

Correlation of a random variable with itself

Let X be a random variable with strictly positive standard deviation. Then:
$$\operatorname{Corr}[X,X]=1$$

Proof

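This is proved as follows:
$$\operatorname{Corr}[X,X]=\frac{\operatorname{Cov}[X,X]}{\operatorname{stdev}[X]\operatorname{stdev}[X]}=\frac{\operatorname{Var}[X]}{\operatorname{Var}[X]}=1$$
where we have used the fact that:
$$\operatorname{Cov}[X,X]=\operatorname{Var}[X]$$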

Symmetry

The linear correlation coefficient is symmetric:
$$\operatorname{Corr}[X,Y]=\operatorname{Corr}[Y,X]$$

Proof

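This is proved as follows:
$$\operatorname{Corr}[X,Y]=\frac{\operatorname{Cov}[X,Y]}{\operatorname{stdev}[X]\operatorname{stdev}[Y]}=\frac{\operatorname{Cov}[Y,X]}{\operatorname{stdev}[Y]\operatorname{stdev}[X]}=\operatorname{Corr}[Y,X]$$
where we have used the fact that covariance is symmetric:
$$\operatorname{Cov}[X,Y]=\operatorname{Cov}[Y,X]$$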
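Both properties in this section (unit correlation of a variable with itself, and symmetry) can be checked numerically on a small discrete distribution. The sketch below is only a sanity check under stated assumptions, not a proof: the joint probability mass function is hypothetical, and the helpers `expect` and `corr` are names introduced here for illustration.

```python
import math


def expect(pmf, g):
    """Expected value of g(X, Y) under a joint pmf {(x, y): probability}."""
    return sum(p * g(x, y) for (x, y), p in pmf.items())


def corr(pmf):
    """Linear correlation; assumes both standard deviations are positive."""
    mx = expect(pmf, lambda x, y: x)
    my = expect(pmf, lambda x, y: y)
    cov = expect(pmf, lambda x, y: (x - mx) * (y - my))
    sx = math.sqrt(expect(pmf, lambda x, y: (x - mx) ** 2))
    sy = math.sqrt(expect(pmf, lambda x, y: (y - my) ** 2))
    return cov / (sx * sy)


# Hypothetical joint pmf of (X, Y); the x values are all distinct, so the
# pmf of (X, X) below has no colliding keys.
pmf_xy = {(0, 1): 0.2, (1, 3): 0.5, (2, 2): 0.3}
pmf_yx = {(y, x): p for (x, y), p in pmf_xy.items()}  # roles of X and Y swapped
pmf_xx = {(x, x): p for (x, _), p in pmf_xy.items()}  # X paired with itself

print(math.isclose(corr(pmf_xx), 1.0))               # correlation with itself
print(math.isclose(corr(pmf_xy), corr(pmf_yx)))      # symmetry
```

The comparisons use `math.isclose` rather than exact equality because the floating-point operations are carried out in a different order in each case.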

Solved exercises

Below you can find some exercises with explained solutions:

  1. Exercise set 1
