
Linear correlation

by Marco Taboga, PhD

Linear correlation is a measure of dependence between two random variables.

It has the following characteristics:

it ranges between -1 and 1;
it is proportional to covariance;
its interpretation is very similar to that of covariance.

Figure: scatter plots of pairs of random variables having different correlation coefficients.

Table of contents

Definition
Zero standard deviations
Interpretation
Terminology
Example
More details
1. Correlation of a random variable with itself
2. Symmetry
Solved exercises

Definition

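Let $X$ and $Y$ be two random variables.

The linear correlation coefficient (or Pearson's correlation coefficient) between $X$ and $Y$ is $\mathrm{Corr}[X,Y]=\frac{\mathrm{Cov}[X,Y]}{\mathrm{stdev}[X]\,\mathrm{stdev}[Y]}$ where:

$\mathrm{Cov}[X,Y]$ is the covariance between $X$ and $Y$;
$\mathrm{stdev}[X]$ and $\mathrm{stdev}[Y]$ are the standard deviations of $X$ and $Y$.

The linear correlation coefficient is well-defined only as long as $\mathrm{Cov}[X,Y]$, $\mathrm{stdev}[X]$ and $\mathrm{stdev}[Y]$ exist and are well-defined.

It is often denoted by $\rho _{XY}$.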
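As a rough illustration of the definition, the following sketch computes the coefficient for two discrete random variables directly from a joint probability mass function. The function name `correlation` and the pmf `example_pmf` are hypothetical and are not taken from the lecture.

```python
from math import sqrt

def correlation(joint_pmf):
    """Return Corr[X, Y] for a joint pmf given as {(x, y): probability}."""
    mean_x = sum(p * x for (x, y), p in joint_pmf.items())
    mean_y = sum(p * y for (x, y), p in joint_pmf.items())
    # Covariance: expected product of the deviations from the means.
    cov = sum(p * (x - mean_x) * (y - mean_y) for (x, y), p in joint_pmf.items())
    # Variances, whose square roots are the standard deviations.
    var_x = sum(p * (x - mean_x) ** 2 for (x, y), p in joint_pmf.items())
    var_y = sum(p * (y - mean_y) ** 2 for (x, y), p in joint_pmf.items())
    return cov / (sqrt(var_x) * sqrt(var_y))

# Hypothetical joint pmf, chosen only for illustration.
example_pmf = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}
rho = correlation(example_pmf)
print(rho)
```

With this pmf the covariance is 0.15 and both standard deviations are 0.5, so the coefficient is approximately 0.6.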

Zero standard deviations

In principle, the ratio is well-defined only if $\mathrm{stdev}[X]$ and $\mathrm{stdev}[Y]$ are strictly greater than zero.

However, it is often assumed that $\mathrm{Corr}[X,Y]=0$ when one of the two standard deviations is zero.

This is equivalent to assuming that $0/0=0$ because $\mathrm{Cov}[X,Y]=0$ when one of the two standard deviations is zero.
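In code, this convention can be handled with an explicit guard before the division. The sketch below is only one possible way to do it; the function name `correlation_with_convention` and its arguments are hypothetical.

```python
from math import sqrt

def correlation_with_convention(cov, var_x, var_y):
    """Return Cov / (stdev_x * stdev_y), using the convention 0/0 = 0."""
    std_x, std_y = sqrt(var_x), sqrt(var_y)
    if std_x == 0 or std_y == 0:
        # A variable with zero standard deviation has zero covariance with
        # any other variable, so the ratio would be 0/0; by convention it is 0.
        return 0.0
    return cov / (std_x * std_y)

print(correlation_with_convention(0.0, 0.0, 2.0))     # degenerate case
print(correlation_with_convention(0.15, 0.25, 0.25))  # ordinary case
```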

Interpretation

The interpretation is similar to the interpretation of covariance: the correlation between $X$ and $Y$ provides a measure of how similar their deviations from the respective means are (see the lecture on Covariance for a detailed explanation).

Linear correlation ranges between $-1$ and $1$: $-1\leq \mathrm{Corr}[X,Y]\leq 1$.

Thanks to this property, correlation allows us to easily understand the intensity of the linear dependence between two random variables:

the closer correlation is to $1$, the stronger the positive linear dependence between $X$ and $Y$ is;
the closer it is to $-1$, the stronger the negative linear dependence between $X$ and $Y$ is.
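The two extremes can be illustrated numerically. The sketch below assumes a hypothetical setting in which a finite list of pairs is equally likely; the helper `corr_equally_likely` and the data are made up for illustration. An exactly increasing line gives a coefficient of 1, an exactly decreasing line gives -1, and a roughly increasing pattern falls in between.

```python
from math import sqrt

def corr_equally_likely(xs, ys):
    """Correlation of (X, Y) when the pairs (xs[i], ys[i]) are equally likely."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n
    std_x = sqrt(sum((x - mean_x) ** 2 for x in xs) / n)
    std_y = sqrt(sum((y - mean_y) ** 2 for y in ys) / n)
    return cov / (std_x * std_y)

xs = [1, 2, 3, 4, 5]
increasing = corr_equally_likely(xs, [2 * x + 1 for x in xs])   # exact increasing line
decreasing = corr_equally_likely(xs, [-3 * x + 4 for x in xs])  # exact decreasing line
scattered = corr_equally_likely(xs, [2, 1, 4, 3, 5])            # roughly increasing
print(increasing, decreasing, scattered)
```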

Terminology

The following terminology is often used:

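If $\mathrm{Corr}[X,Y]>0$ then $X$ and $Y$ are said to be positively linearly correlated (or simply positively correlated).
If $\mathrm{Corr}[X,Y]<0$ then $X$ and $Y$ are said to be negatively linearly correlated (or simply negatively correlated).
If $\mathrm{Corr}[X,Y]\neq 0$ then $X$ and $Y$ are said to be linearly correlated (or simply correlated).
If $\mathrm{Corr}[X,Y]=0$ then $X$ and $Y$ are said to be uncorrelated. Also note that $\mathrm{Cov}[X,Y]=0\Rightarrow \mathrm{Corr}[X,Y]=0$. Therefore, two random variables $X$ and $Y$ are uncorrelated whenever $\mathrm{Cov}[X,Y]=0$.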
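The terminology maps onto the sign of the coefficient, which a small helper can make explicit. The function name `describe_correlation` is hypothetical; with floating-point inputs, one might prefer to compare against a small tolerance rather than exactly zero.

```python
def describe_correlation(rho):
    """Return the term used for a correlation coefficient rho."""
    if rho > 0:
        return "positively correlated"
    if rho < 0:
        return "negatively correlated"
    return "uncorrelated"

for value in (0.6, -0.3, 0.0):
    print(value, describe_correlation(value))
```

In both of the first two cases the variables are also simply called correlated.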

Example

In this example we show how to compute the coefficient of linear correlation between two discrete random variables.

Let $X$ be a $2$-dimensional random vector and denote its entries by $X_{1}$ and $X_{2}$.

Let the support of $X$ be $R_{X}$ and its joint probability mass function be [eq21]

The support of $X_{1}$ is $R_{X_{1}}$ and its probability mass function is [eq23]

The expected value of $X_{1}$ is [eq24]

The expected value of $X_{1}^{2}$ is [eq25]

The variance of $X_{1}$ is [eq26]

The standard deviation of $X_{1}$ is [eq27]

The support of $X_{2}$ is $R_{X_{2}}$ and its probability mass function is [eq29]

The expected value of $X_{2}$ is [eq30]

The expected value of $X_{2}^{2}$ is [eq31]

The variance of $X_{2}$ is [eq32]

The standard deviation of $X_{2}$ is [eq33]

Using the transformation theorem, we can compute the expected value of $X_{1}X_{2}$: [eq34]

Hence, the covariance between $X_{1}$ and $X_{2}$ is $\mathrm{Cov}[X_{1},X_{2}]=\mathrm{E}[X_{1}X_{2}]-\mathrm{E}[X_{1}]\mathrm{E}[X_{2}]$ and the linear correlation coefficient is [eq36]
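The same sequence of steps (marginals, means, second moments, variances, expected product, covariance, correlation) can be sketched in code. The joint pmf below is hypothetical and is not the one used in the example above; the helper names `marginal` and `moment` are also made up. Exact fractions are used so that the intermediate quantities can be checked by hand.

```python
from collections import defaultdict
from fractions import Fraction
from math import sqrt

# Hypothetical joint pmf of (X1, X2), chosen only for illustration.
joint = {(0, 0): Fraction(1, 2), (1, 1): Fraction(1, 4), (2, 1): Fraction(1, 4)}

def marginal(joint_pmf, index):
    """Marginal pmf of the entry at position `index` (0 for X1, 1 for X2)."""
    pmf = defaultdict(Fraction)
    for point, p in joint_pmf.items():
        pmf[point[index]] += p
    return dict(pmf)

def moment(pmf, power):
    """Expected value of the variable raised to `power`."""
    return sum(p * value ** power for value, p in pmf.items())

pmf_1, pmf_2 = marginal(joint, 0), marginal(joint, 1)
mean_1, mean_2 = moment(pmf_1, 1), moment(pmf_2, 1)
var_1 = moment(pmf_1, 2) - mean_1 ** 2   # Var = E[X^2] - E[X]^2
var_2 = moment(pmf_2, 2) - mean_2 ** 2
# Transformation theorem: E[X1 X2] is the probability-weighted sum of products.
mean_product = sum(p * x1 * x2 for (x1, x2), p in joint.items())
cov = mean_product - mean_1 * mean_2
corr = float(cov) / (sqrt(var_1) * sqrt(var_2))
print(mean_1, var_1, mean_2, var_2, cov, corr)
```

For this pmf the means are 3/4 and 1/2, the variances are 11/16 and 1/4, the covariance is 3/8, and the correlation is $3/\sqrt{11}$, roughly 0.90.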

More details

The following sections contain more details about the linear correlation coefficient.

Correlation of a random variable with itself

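Let $X$ be a random variable, then $\mathrm{Corr}[X,X]=1$.

Proof

This is proved as follows: $\mathrm{Corr}[X,X]=\frac{\mathrm{Cov}[X,X]}{\mathrm{stdev}[X]\,\mathrm{stdev}[X]}=\frac{\mathrm{Var}[X]}{\mathrm{Var}[X]}=1$ where we have used the fact that $\mathrm{Cov}[X,X]=\mathrm{Var}[X]$.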

Symmetry

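The linear correlation coefficient is symmetric: $\mathrm{Corr}[X,Y]=\mathrm{Corr}[Y,X]$.

Proof

This is proved as follows: $\mathrm{Corr}[X,Y]=\frac{\mathrm{Cov}[X,Y]}{\mathrm{stdev}[X]\,\mathrm{stdev}[Y]}=\frac{\mathrm{Cov}[Y,X]}{\mathrm{stdev}[Y]\,\mathrm{stdev}[X]}=\mathrm{Corr}[Y,X]$ where we have used the fact that covariance is symmetric: $\mathrm{Cov}[X,Y]=\mathrm{Cov}[Y,X]$.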
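Both properties, the unit correlation of a variable with itself and symmetry, can be checked numerically on a small case. This is only a sanity check on a hypothetical joint pmf, not a substitute for the proofs; the helper `corr` and the dictionaries below are illustrative names.

```python
from math import sqrt

def corr(joint_pmf):
    """Corr[X, Y] for a joint pmf given as {(x, y): probability}."""
    mean_x = sum(p * x for (x, y), p in joint_pmf.items())
    mean_y = sum(p * y for (x, y), p in joint_pmf.items())
    cov = sum(p * (x - mean_x) * (y - mean_y) for (x, y), p in joint_pmf.items())
    std_x = sqrt(sum(p * (x - mean_x) ** 2 for (x, y), p in joint_pmf.items()))
    std_y = sqrt(sum(p * (y - mean_y) ** 2 for (x, y), p in joint_pmf.items()))
    return cov / (std_x * std_y)

# Hypothetical joint pmf of (X, Y).
joint = {(0, 0): 0.5, (1, 1): 0.25, (2, 1): 0.25}

# Joint pmf of (Y, X): swap the coordinates of every support point.
swapped = {(y, x): p for (x, y), p in joint.items()}

# Joint pmf of (X, X): all probability mass lies on the diagonal.
with_itself = {}
for (x, y), p in joint.items():
    with_itself[(x, x)] = with_itself.get((x, x), 0.0) + p

print(corr(with_itself))            # correlation of X with itself
print(corr(joint), corr(swapped))   # Corr[X, Y] and Corr[Y, X]
```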

Solved exercises

Below you can find some exercises with explained solutions.

Exercise 1

Let $X$ be a discrete random vector and denote its components by $X_{1}$ and $X_{2}$.

Let the support of $X$ be [eq43] and its joint probability mass function be [eq44]

Compute the coefficient of linear correlation between $X_{1}$ and $X_{2}$.

Solution

The support of $X_{1}$ is $R_{X_{1}}$ and its marginal probability mass function is [eq46]

The expected value of $X_{1}$ is [eq47]

The expected value of $X_{1}^{2}$ is [eq48]

The variance of $X_{1}$ is [eq49]

The standard deviation of $X_{1}$ is [eq50]

The support of $X_{2}$ is $R_{X_{2}}$ and its marginal probability mass function is [eq52]

The expected value of $X_{2}$ is [eq53]

The expected value of $X_{2}^{2}$ is [eq54]

The variance of $X_{2}$ is [eq55]

The standard deviation of $X_{2}$ is [eq56]

Using the transformation theorem, we can compute the expected value of $X_{1}X_{2}$: [eq57]

Hence, the covariance between $X_{1}$ and $X_{2}$ is [eq58] and the coefficient of linear correlation between $X_{1}$ and $X_{2}$ is [eq59]

Exercise 2

Let $X$ be a discrete random vector and denote its entries by $X_{1}$ and $X_{2}$.

Let the support of $X$ be [eq60] and its joint probability mass function be [eq61]

Compute the coefficient of linear correlation between $X_{1}$ and $X_{2}$.

Solution

The support of $X_{1}$ is $R_{X_{1}}$ and its marginal probability mass function is [eq63]

The mean of $X_{1}$ is [eq64]

The expected value of $X_{1}^{2}$ is [eq65]

The variance of $X_{1}$ is [eq66]

The standard deviation of $X_{1}$ is [eq67]

The support of $X_{2}$ is $R_{X_{2}}$ and its marginal probability mass function is [eq69]

The mean of $X_{2}$ is [eq70]

The expected value of $X_{2}^{2}$ is [eq71]

The variance of $X_{2}$ is [eq72]

The standard deviation of $X_{2}$ is [eq73]

The expected value of the product $X_{1}X_{2}$ can be derived using the transformation theorem: [eq74]

Therefore, putting the pieces together, the covariance between $X_{1}$ and $X_{2}$ is [eq75] and the coefficient of linear correlation between $X_{1}$ and $X_{2}$ is [eq76]

Exercise 3

Let $\left[ X\ Y\right]$ be a continuous random vector with support $R_{XY}$ and let its joint probability density function be [eq79]

Compute the coefficient of linear correlation between $X$ and $Y$.

Solution

The support of $Y$ is $R_{Y}$. When $y\notin R_{Y}$, the marginal probability density function of $Y$ is $f_{Y}(y)=0$, while, when $y\in R_{Y}$, the marginal probability density function of $Y$ can be obtained by integrating $x$ out of the joint probability density as follows: [eq81]

Thus, the marginal probability density function of $Y$ is [eq82]

The expected value of $Y$ is [eq83]

The expected value of $Y^{2}$ is [eq84]

The variance of $Y$ is [eq85]

The standard deviation of $Y$ is [eq86]

The support of $X$ is $R_{X}$. When $x\notin R_{X}$, the marginal probability density function of $X$ is $f_{X}(x)=0$, while, when $x\in R_{X}$, the marginal probability density function of $X$ can be obtained by integrating $y$ out of the joint probability density as follows: [eq88]

We do not explicitly compute the integral, but we write the marginal probability density function of $X$ as follows: [eq89]

The expected value of $X$ is [eq90]

The expected value of $X^{2}$ is [eq91]

The variance of $X$ is [eq92]

The standard deviation of $X$ is [eq93]

The expected value of the product $XY$ can be computed by using the transformation theorem: [eq94]

Hence, by the covariance formula, the covariance between $X$ and $Y$ is [eq95] and the coefficient of linear correlation between $X$ and $Y$ is [eq96]
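For continuous random vectors, the same quantities can be approximated numerically when the integrals are inconvenient to compute by hand. The sketch below uses a hypothetical joint density, $f(x,y)=x+y$ on the unit square, which is not the density of this exercise, and a simple midpoint rule; the names `density` and `expectation` are illustrative.

```python
from math import sqrt

def density(x, y):
    """Hypothetical joint pdf on [0, 1] x [0, 1]; it integrates to 1."""
    return x + y

def expectation(g, n=200):
    """Approximate E[g(X, Y)] with a midpoint rule on an n-by-n grid."""
    h = 1.0 / n
    total = 0.0
    for i in range(n):
        x = (i + 0.5) * h
        for j in range(n):
            y = (j + 0.5) * h
            total += g(x, y) * density(x, y)
    return total * h * h

mean_x = expectation(lambda x, y: x)
mean_y = expectation(lambda x, y: y)
var_x = expectation(lambda x, y: x * x) - mean_x ** 2
var_y = expectation(lambda x, y: y * y) - mean_y ** 2
cov = expectation(lambda x, y: x * y) - mean_x * mean_y
corr = cov / (sqrt(var_x) * sqrt(var_y))
print(corr)
```

For this particular density the exact value is $-1/11$, so the printed result should be close to -0.0909.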

How to cite

Please cite as:

Taboga, Marco (2021). "Linear correlation", Lectures on probability theory and mathematical statistics. Kindle Direct Publishing. Online appendix. https://www.statlect.com/fundamentals-of-probability/linear-correlation.