Statlect, the Digital Textbook

Linear correlation

Let X and Y be two random variables. The linear correlation coefficient (or Pearson's correlation coefficient) between X and Y, denoted by $\mathrm{Corr}[X,Y]$ or by $\rho_{XY}$, is defined as follows:$$\mathrm{Corr}[X,Y]=\frac{\mathrm{Cov}[X,Y]}{\mathrm{std}[X]\,\mathrm{std}[Y]}$$where $\mathrm{Cov}[X,Y]$ is the covariance between X and Y and $\mathrm{std}[X]$ and $\mathrm{std}[Y]$ are the standard deviations of X and Y. Of course, the linear correlation coefficient is well-defined only as long as $\mathrm{Cov}[X,Y]$, $\mathrm{std}[X]$ and $\mathrm{std}[Y]$ exist and are well-defined. Moreover, while the ratio is well-defined only if $\mathrm{std}[X]$ and $\mathrm{std}[Y]$ are strictly greater than zero, it is often assumed that $\mathrm{Corr}[X,Y]=0$ when one of the two standard deviations is zero. This is equivalent to assuming that $\mathrm{Corr}[X,Y]=0$ whenever $\mathrm{Cov}[X,Y]=0$, because $\mathrm{Cov}[X,Y]=0$ when one of the two standard deviations is zero.
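As a quick illustration of the definition, the following Python sketch computes the sample version of the coefficient (covariance divided by the product of the standard deviations) on made-up data; the helper name `correlation` and the data are illustrative, not part of the lecture:

```python
# A minimal sketch of the definition: Corr[X, Y] = Cov[X, Y] / (std[X] * std[Y]).
# The data below is purely illustrative.
from math import sqrt

def correlation(xs, ys):
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n
    sx = sqrt(sum((x - mx) ** 2 for x in xs) / n)
    sy = sqrt(sum((y - my) ** 2 for y in ys) / n)
    if sx == 0 or sy == 0:
        # convention from the text: set the correlation to zero
        # when one of the two standard deviations is zero
        return 0.0
    return cov / (sx * sy)

xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]   # ys = 2 * xs, a perfect positive linear relation
print(correlation(xs, ys))  # 1.0 up to floating-point error
```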

Interpretation

Linear correlation is a measure of dependence (or association) between two random variables. Its interpretation is similar to the interpretation of covariance (see the lecture entitled Covariance for a detailed explanation).

The correlation between X and Y provides a measure of the degree to which X and Y tend to "move together": $\mathrm{Corr}[X,Y]>0$ indicates that deviations of X and Y from their respective means tend to have the same sign; $\mathrm{Corr}[X,Y]<0$ indicates that deviations of X and Y from their respective means tend to have opposite signs; when $\mathrm{Corr}[X,Y]=0$, X and Y do not display either of these two tendencies.

Linear correlation has the property of being bounded between $-1$ and 1:$$-1\leq \mathrm{Corr}[X,Y]\leq 1$$

Thanks to this property, correlation makes it easy to gauge the intensity of the linear dependence between two random variables: the closer the correlation is to 1, the stronger the positive linear dependence between X and Y (and the closer it is to $-1$, the stronger the negative linear dependence between X and Y).
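This can be checked numerically: an exact linear relation $Y=aX+b$ attains the bounds ($+1$ for $a>0$, $-1$ for $a<0$), while a noisy linear relation falls strictly inside them. A minimal sketch, with illustrative data:

```python
# Illustrates boundedness: exact linear relations give correlation +1 or -1,
# while a perturbed linear relation gives a value strictly between them.
from statistics import mean, pstdev

def corr(xs, ys):
    mx, my = mean(xs), mean(ys)
    cov = mean((x - mx) * (y - my) for x, y in zip(xs, ys))
    return cov / (pstdev(xs) * pstdev(ys))

xs = [1, 2, 3, 4, 5, 6]
print(corr(xs, [3 * x + 7 for x in xs]))    # exact positive linear relation: 1.0
print(corr(xs, [-3 * x + 7 for x in xs]))   # exact negative linear relation: -1.0
noisy = [x + n for x, n in zip(xs, [1, -1, 1, -1, 1, -1])]
print(corr(xs, noisy))                      # weaker positive dependence, inside (0, 1)
```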

Terminology

The following terminology is often used:

  1. If $\mathrm{Corr}[X,Y]>0$, then X and Y are said to be positively linearly correlated (or simply positively correlated).

  2. If $\mathrm{Corr}[X,Y]<0$, then X and Y are said to be negatively linearly correlated (or simply negatively correlated).

  3. If $\mathrm{Corr}[X,Y]\neq 0$, then X and Y are said to be linearly correlated (or simply correlated).

  4. If $\mathrm{Corr}[X,Y]=0$, then X and Y are said to be uncorrelated. Also note that $\mathrm{Corr}[X,Y]=0$ if and only if $\mathrm{Cov}[X,Y]=0$; therefore, two random variables X and Y are uncorrelated whenever $\mathrm{Cov}[X,Y]=0$.

Example

The following example shows how to compute the coefficient of linear correlation between two discrete random variables.

Example Let X be a $2$-dimensional random vector and denote its components by X_1 and X_2. Let the support of X be [eq24]and its joint probability mass function be[eq25]The support of X_1 is[eq26]and its probability mass function is[eq27]The expected value of X_1 is[eq28]The expected value of $X_{1}^{2}$ is[eq29]The variance of X_1 is[eq30]The standard deviation of X_1 is:[eq31]The support of X_2 is:[eq32]and its probability mass function is[eq33]The expected value of X_2 is[eq34]The expected value of $X_{2}^{2}$ is[eq35]The variance of X_2 is[eq36]The standard deviation of X_2 is[eq37]Using the transformation theorem, we can compute the expected value of $X_{1}X_{2}$:[eq38]Hence, the covariance between X_1 and X_2 is[eq39]and the linear correlation coefficient is:[eq40]
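The same recipe can be scripted. The joint probability mass function below is a hypothetical stand-in (not the one used in the example), but the steps are identical: marginal moments, expected value of the product via the transformation theorem, then the covariance formula.

```python
# Correlation of the two components of a discrete random vector,
# computed directly from a *hypothetical* joint pmf.
from math import sqrt

pmf = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}  # assumed, sums to 1

# marginal first moments: E[X1] = sum of x1 * p(x1, x2) over the support, etc.
e1 = sum(x1 * p for (x1, x2), p in pmf.items())
e2 = sum(x2 * p for (x1, x2), p in pmf.items())
# second moments and variances: Var[Xi] = E[Xi^2] - E[Xi]^2
var1 = sum(x1 ** 2 * p for (x1, x2), p in pmf.items()) - e1 ** 2
var2 = sum(x2 ** 2 * p for (x1, x2), p in pmf.items()) - e2 ** 2
# E[X1 * X2] via the transformation theorem, then the covariance formula
e12 = sum(x1 * x2 * p for (x1, x2), p in pmf.items())
cov = e12 - e1 * e2
rho = cov / (sqrt(var1) * sqrt(var2))
print(rho)  # 0.6 for this pmf, up to floating-point error
```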

More details

The following sections contain more details about the linear correlation coefficient.

Correlation of a random variable with itself

Let X be a random variable. Then$$\mathrm{Corr}[X,X]=1$$

Proof

This is proved as follows:$$\mathrm{Corr}[X,X]=\frac{\mathrm{Cov}[X,X]}{\mathrm{std}[X]\,\mathrm{std}[X]}=\frac{\mathrm{Var}[X]}{\mathrm{Var}[X]}=1$$where we have used the fact that$$\mathrm{Cov}[X,X]=\mathrm{Var}[X]$$

Symmetry

The linear correlation coefficient is symmetric:$$\mathrm{Corr}[X,Y]=\mathrm{Corr}[Y,X]$$

Proof

This is proved as follows:$$\mathrm{Corr}[X,Y]=\frac{\mathrm{Cov}[X,Y]}{\mathrm{std}[X]\,\mathrm{std}[Y]}=\frac{\mathrm{Cov}[Y,X]}{\mathrm{std}[Y]\,\mathrm{std}[X]}=\mathrm{Corr}[Y,X]$$where we have used the fact that covariance is symmetric:$$\mathrm{Cov}[X,Y]=\mathrm{Cov}[Y,X]$$

Solved exercises

Below you can find some exercises with explained solutions.

Exercise 1

Let X be a $2\times 1$ discrete random vector and denote its components by X_1 and X_2. Let the support of X be[eq47]and its joint probability mass function be[eq48]

Compute the coefficient of linear correlation between X_1 and X_2.

Solution

The support of X_1 is[eq49]and its marginal probability mass function is[eq50]The expected value of X_1 is[eq51]The expected value of $X_{1}^{2}$ is[eq52]The variance of X_1 is[eq53]The standard deviation of X_1 is[eq54]The support of X_2 is[eq55]and its marginal probability mass function is[eq56]The expected value of X_2 is[eq57]The expected value of $X_{2}^{2}$ is[eq58]The variance of X_2 is[eq59]The standard deviation of X_2 is[eq60]Using the transformation theorem, we can compute the expected value of $X_{1}X_{2}$:[eq61]Hence, the covariance between X_1 and X_2 is[eq62]and the coefficient of linear correlation between X_1 and X_2 is[eq63]

Exercise 2

Let X be a $2\times 1$ discrete random vector and denote its entries by X_1 and X_2. Let the support of X be[eq64]and its joint probability mass function be[eq65]

Compute the coefficient of linear correlation between X_1 and X_2.

Solution

The support of X_1 is[eq66]and its marginal probability mass function is[eq67]The mean of X_1 is[eq68]The expected value of $X_{1}^{2}$ is[eq69]The variance of X_1 is[eq70]The standard deviation of X_1 is[eq71]The support of X_2 is[eq72]and its probability mass function is[eq73]The mean of X_2 is[eq74]The expected value of $X_{2}^{2}$ is[eq75]The variance of X_2 is[eq76]The standard deviation of X_2 is[eq77]The expected value of the product $X_{1}X_{2}$ can be derived using the transformation theorem:[eq78]Therefore, putting the pieces together, the covariance between X_1 and $X_{2}$ is[eq79]and the coefficient of linear correlation between X_1 and X_2 is[eq80]

Exercise 3

Let [eq81] be an absolutely continuous random vector with support [eq82]and let its joint probability density function be[eq83]Compute the coefficient of linear correlation between X and Y.

Solution

The support of Y is[eq84]When $y\notin R_{Y}$, the marginal probability density function of Y is 0, while, when $y\in R_{Y}$, the marginal probability density function of Y can be obtained by integrating x out of the joint probability density as follows:[eq85]Thus, the marginal probability density function of Y is[eq86]The expected value of Y is[eq87]The expected value of $Y^{2}$ is[eq88]The variance of Y is[eq89]The standard deviation of Y is[eq90]The support of X is[eq91]When $x\notin R_{X}$, the marginal probability density function of X is 0, while, when $x\in R_{X}$, the marginal probability density function of X can be obtained by integrating $y$ out of the joint probability density as follows:[eq92]We do not explicitly compute the integral, but we write the marginal probability density function of X as follows:[eq93]The expected value of X is[eq94]The expected value of $X^{2}$ is[eq95]The variance of X is[eq96]The standard deviation of X is[eq97]The expected value of the product $XY$ can be computed by using the transformation theorem:[eq98]Hence, by the covariance formula, the covariance between X and Y is[eq99]and the coefficient of linear correlation between X and Y is[eq100]
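The continuous case can also be checked numerically. The joint density used below, $f(x,y)=x+y$ on the unit square, is a hypothetical stand-in for the (unspecified) density of the exercise; the recipe follows the solution's steps: compute the relevant expected values as double integrals (here approximated with a midpoint rule), then apply the covariance formula.

```python
# Correlation of an absolutely continuous random vector, approximated numerically.
# f(x, y) = x + y on [0, 1] x [0, 1] is an *assumed* joint pdf (it integrates to 1).
from math import sqrt

def f(x, y):
    return x + y  # hypothetical joint probability density function

n = 400  # midpoint-rule grid resolution per axis
h = 1.0 / n
pts = [(i + 0.5) * h for i in range(n)]

def expect(g):
    # E[g(X, Y)] ~ sum over grid midpoints of g * f * cell area
    return sum(g(x, y) * f(x, y) * h * h for x in pts for y in pts)

ex = expect(lambda x, y: x)
ey = expect(lambda x, y: y)
cov = expect(lambda x, y: x * y) - ex * ey       # covariance formula
vx = expect(lambda x, y: x * x) - ex ** 2
vy = expect(lambda x, y: y * y) - ey ** 2
rho = cov / (sqrt(vx) * sqrt(vy))
print(rho)  # approximately -1/11 for this density
```

For this density the exact values are $\mathrm{E}[X]=7/12$, $\mathrm{Cov}[X,Y]=-1/144$ and $\mathrm{Corr}[X,Y]=-1/11$, so the numerical result can be sanity-checked against them.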
