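Let $X$ and $Y$ be two random variables. The linear correlation coefficient (or Pearson's correlation coefficient) between $X$ and $Y$, denoted by $\mathrm{Corr}[X,Y]$ or by $\rho_{XY}$, is defined as follows:
$$\mathrm{Corr}[X,Y]=\frac{\mathrm{Cov}[X,Y]}{\mathrm{stdev}[X]\,\mathrm{stdev}[Y]}$$
where $\mathrm{Cov}[X,Y]$ is the covariance between $X$ and $Y$, and $\mathrm{stdev}[X]$ and $\mathrm{stdev}[Y]$ are the standard deviations of $X$ and $Y$. Of course, the linear correlation coefficient is well-defined only as long as $\mathrm{Cov}[X,Y]$, $\mathrm{stdev}[X]$ and $\mathrm{stdev}[Y]$ exist and are well-defined. Moreover, while the ratio is well-defined only if $\mathrm{stdev}[X]$ and $\mathrm{stdev}[Y]$ are strictly greater than zero, it is often assumed that $\mathrm{Corr}[X,Y]=0$ when one of the two standard deviations is zero. This is equivalent to assuming that $0/0=0$, because $\mathrm{Cov}[X,Y]=0$ when one of the two standard deviations is zero.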
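The definition, together with the zero-standard-deviation convention, can be sketched in a few lines of Python. The function name `correlation` and its arguments are illustrative choices, not part of the lecture; the inputs are assumed to be the population covariance and standard deviations, computed elsewhere.

```python
import math

def correlation(cov_xy: float, sd_x: float, sd_y: float) -> float:
    """Corr[X,Y] = Cov[X,Y] / (stdev[X] * stdev[Y]).

    Applies the convention that the coefficient is 0 when either
    standard deviation is zero (that is, 0/0 is treated as 0).
    """
    if sd_x < 0 or sd_y < 0:
        raise ValueError("standard deviations must be non-negative")
    if sd_x == 0 or sd_y == 0:
        return 0.0
    return cov_xy / (sd_x * sd_y)

print(correlation(0.25, math.sqrt(0.5), 0.5))  # approximately 0.7071
print(correlation(0.0, 0.0, 1.3))              # 0.0 by convention
```

Returning `0.0` in the degenerate case mirrors the convention described above; a stricter implementation could raise an error instead.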
Linear correlation is a measure of dependence (or association) between two random variables. Its interpretation is similar to the interpretation of covariance (see the lecture entitled Covariance for a detailed explanation).
The correlation between $X$ and $Y$ provides a measure of the degree to which $X$ and $Y$ tend to "move together": $\mathrm{Corr}[X,Y]>0$ indicates that deviations of $X$ and $Y$ from their respective means tend to have the same sign; $\mathrm{Corr}[X,Y]<0$ indicates that deviations of $X$ and $Y$ from their respective means tend to have opposite signs; when $\mathrm{Corr}[X,Y]=0$, $X$ and $Y$ do not display either of these two tendencies.
Linear correlation has the property of being bounded between $-1$ and $1$:
$$-1\leq \mathrm{Corr}[X,Y]\leq 1$$
Thanks to this property, correlation makes it easy to gauge the intensity of the linear dependence between two random variables: the closer the correlation is to $1$, the stronger the positive linear dependence between $X$ and $Y$ (and the closer it is to $-1$, the stronger the negative linear dependence between $X$ and $Y$).
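The two extremes of the range are attained under exact linear dependence: if $Y=a+bX$ with $b\neq 0$ and $X$ has positive variance, the correlation equals $1$ for $b>0$ and $-1$ for $b<0$. The sketch below checks this numerically. The pmf `pmf_x` and the helper `corr_with_linear_map` are hypothetical and chosen only for illustration.

```python
import math

# Hypothetical pmf of X (value -> probability), for illustration only.
pmf_x = {0: 0.2, 1: 0.5, 3: 0.3}

def corr_with_linear_map(pmf, a, b):
    """Correlation between X and Y = a + b*X, computed from the pmf of X (b must be nonzero)."""
    def mean(g):
        return sum(g(x) * p for x, p in pmf.items())

    ex = mean(lambda x: x)
    ey = mean(lambda x: a + b * x)
    cov = mean(lambda x: (x - ex) * (a + b * x - ey))
    sd_x = math.sqrt(mean(lambda x: (x - ex) ** 2))
    sd_y = math.sqrt(mean(lambda x: (a + b * x - ey) ** 2))
    return cov / (sd_x * sd_y)

print(corr_with_linear_map(pmf_x, a=2.0, b=0.5))   # close to 1
print(corr_with_linear_map(pmf_x, a=2.0, b=-4.0))  # close to -1
```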
The following terminology is often used:
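If $\mathrm{Corr}[X,Y]>0$ then $X$ and $Y$ are said to be positively linearly correlated (or simply positively correlated).
If $\mathrm{Corr}[X,Y]<0$ then $X$ and $Y$ are said to be negatively linearly correlated (or simply negatively correlated).
If $\mathrm{Corr}[X,Y]\neq 0$ then $X$ and $Y$ are said to be linearly correlated (or simply correlated).
If $\mathrm{Corr}[X,Y]=0$ then $X$ and $Y$ are said to be uncorrelated. Also note that $\mathrm{Cov}[X,Y]=0$ implies $\mathrm{Corr}[X,Y]=0$; therefore, two random variables $X$ and $Y$ are uncorrelated whenever $\mathrm{Cov}[X,Y]=0$.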
The following example shows how to compute the coefficient of linear correlation between two discrete random variables.
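Example. Let $X$ be a $2$-dimensional random vector and denote its components by $X_1$ and $X_2$. Let the support of $X$ be $R_X$ and its joint probability mass function be $p_X(x_1,x_2)$.

The support of $X_1$ is $R_{X_1}$ and its probability mass function is
$$p_{X_1}(x_1)=\sum_{x_2:\,(x_1,x_2)\in R_X}p_X(x_1,x_2)$$
The expected value of $X_1$ is
$$\mathrm{E}[X_1]=\sum_{x_1\in R_{X_1}}x_1\,p_{X_1}(x_1)$$
The expected value of $X_1^2$ is
$$\mathrm{E}[X_1^2]=\sum_{x_1\in R_{X_1}}x_1^2\,p_{X_1}(x_1)$$
The variance of $X_1$ is
$$\mathrm{Var}[X_1]=\mathrm{E}[X_1^2]-\mathrm{E}[X_1]^2$$
The standard deviation of $X_1$ is
$$\mathrm{stdev}[X_1]=\sqrt{\mathrm{Var}[X_1]}$$

The support of $X_2$ is $R_{X_2}$ and its probability mass function is
$$p_{X_2}(x_2)=\sum_{x_1:\,(x_1,x_2)\in R_X}p_X(x_1,x_2)$$
The expected value of $X_2$ is
$$\mathrm{E}[X_2]=\sum_{x_2\in R_{X_2}}x_2\,p_{X_2}(x_2)$$
The expected value of $X_2^2$ is
$$\mathrm{E}[X_2^2]=\sum_{x_2\in R_{X_2}}x_2^2\,p_{X_2}(x_2)$$
The variance of $X_2$ is
$$\mathrm{Var}[X_2]=\mathrm{E}[X_2^2]-\mathrm{E}[X_2]^2$$
The standard deviation of $X_2$ is
$$\mathrm{stdev}[X_2]=\sqrt{\mathrm{Var}[X_2]}$$

Using the transformation theorem, we can compute the expected value of $X_1X_2$:
$$\mathrm{E}[X_1X_2]=\sum_{(x_1,x_2)\in R_X}x_1x_2\,p_X(x_1,x_2)$$
Hence, the covariance between $X_1$ and $X_2$ is
$$\mathrm{Cov}[X_1,X_2]=\mathrm{E}[X_1X_2]-\mathrm{E}[X_1]\,\mathrm{E}[X_2]$$
and the linear correlation coefficient is:
$$\mathrm{Corr}[X_1,X_2]=\frac{\mathrm{Cov}[X_1,X_2]}{\mathrm{stdev}[X_1]\,\mathrm{stdev}[X_2]}$$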
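The same sequence of steps can be sketched in Python for a small discrete random vector. The joint probability mass function `joint_pmf` below is hypothetical (it is not the one used in the example above), and moments are computed directly from the joint pmf, which gives the same results as passing through the marginals.

```python
from fractions import Fraction
import math

# Hypothetical joint pmf: (x1, x2) -> probability. Illustrative values only.
joint_pmf = {
    (0, 0): Fraction(1, 4),
    (1, 0): Fraction(1, 4),
    (1, 1): Fraction(1, 4),
    (2, 1): Fraction(1, 4),
}

def expect(g):
    """E[g(X1, X2)] via the transformation theorem: sum of g times the pmf over the support."""
    return sum(g(x1, x2) * p for (x1, x2), p in joint_pmf.items())

e1 = expect(lambda x1, x2: x1)                    # E[X1]
e2 = expect(lambda x1, x2: x2)                    # E[X2]
var1 = expect(lambda x1, x2: x1 ** 2) - e1 ** 2   # Var[X1] = E[X1^2] - E[X1]^2
var2 = expect(lambda x1, x2: x2 ** 2) - e2 ** 2   # Var[X2] = E[X2^2] - E[X2]^2
cov = expect(lambda x1, x2: x1 * x2) - e1 * e2    # Cov = E[X1 X2] - E[X1] E[X2]
corr = float(cov) / math.sqrt(var1 * var2)        # Cov / (stdev[X1] * stdev[X2])

print(e1, var1, e2, var2, cov)  # 1 1/2 1/2 1/4 1/4
print(corr)                     # approximately 0.7071
```

Using `Fraction` keeps the moments exact, so only the final square root introduces floating-point rounding.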
The following sections contain more details about the linear correlation coefficient.
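Let $X$ be a random variable, then
$$\mathrm{Corr}[X,X]=1$$
This is proved as follows:
$$\mathrm{Corr}[X,X]=\frac{\mathrm{Cov}[X,X]}{\mathrm{stdev}[X]\,\mathrm{stdev}[X]}=\frac{\mathrm{Var}[X]}{\mathrm{stdev}[X]^2}=\frac{\mathrm{Var}[X]}{\mathrm{Var}[X]}=1$$
where we have used the fact that
$$\mathrm{Cov}[X,X]=\mathrm{Var}[X]$$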
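The linear correlation coefficient is symmetric:
$$\mathrm{Corr}[X,Y]=\mathrm{Corr}[Y,X]$$
This is proved as follows:
$$\mathrm{Corr}[X,Y]=\frac{\mathrm{Cov}[X,Y]}{\mathrm{stdev}[X]\,\mathrm{stdev}[Y]}=\frac{\mathrm{Cov}[Y,X]}{\mathrm{stdev}[Y]\,\mathrm{stdev}[X]}=\mathrm{Corr}[Y,X]$$
where we have used the fact that covariance is symmetric:
$$\mathrm{Cov}[X,Y]=\mathrm{Cov}[Y,X]$$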
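Both properties (unit correlation of a variable with itself, and symmetry) can be checked numerically. The sketch below assumes a hypothetical joint pmf `joint` for a pair $(X,Y)$; the helper names `corr`, `first` and `second` are illustrative.

```python
import math

# Hypothetical joint pmf of (X, Y): (x, y) -> probability. Illustrative values only.
joint = {(0, 1): 0.1, (1, 1): 0.3, (1, 2): 0.2, (3, 0): 0.4}

def corr(pmf, f, g):
    """Correlation between f(X, Y) and g(X, Y) under the joint pmf."""
    def mean(h):
        return sum(h(x, y) * p for (x, y), p in pmf.items())

    mf, mg = mean(f), mean(g)
    cov = mean(lambda x, y: (f(x, y) - mf) * (g(x, y) - mg))
    sd_f = math.sqrt(mean(lambda x, y: (f(x, y) - mf) ** 2))
    sd_g = math.sqrt(mean(lambda x, y: (g(x, y) - mg) ** 2))
    return cov / (sd_f * sd_g)

def first(x, y):
    """Select the component X."""
    return x

def second(x, y):
    """Select the component Y."""
    return y

# Corr[X, X] should equal 1 (up to rounding).
print(math.isclose(corr(joint, first, first), 1.0))
# Corr[X, Y] should equal Corr[Y, X].
print(math.isclose(corr(joint, first, second), corr(joint, second, first)))
```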
Below you can find some exercises with explained solutions.
Let $X$ be a discrete random vector and denote its components by $X_1$ and $X_2$. Let the support of $X$ be $R_X$ and its joint probability mass function be $p_X(x_1,x_2)$.
Compute the coefficient of linear correlation between $X_1$ and $X_2$.
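The support of $X_1$ is $R_{X_1}$ and its marginal probability mass function is
$$p_{X_1}(x_1)=\sum_{x_2:\,(x_1,x_2)\in R_X}p_X(x_1,x_2)$$
The expected value of $X_1$ is
$$\mathrm{E}[X_1]=\sum_{x_1\in R_{X_1}}x_1\,p_{X_1}(x_1)$$
The expected value of $X_1^2$ is
$$\mathrm{E}[X_1^2]=\sum_{x_1\in R_{X_1}}x_1^2\,p_{X_1}(x_1)$$
The variance of $X_1$ is
$$\mathrm{Var}[X_1]=\mathrm{E}[X_1^2]-\mathrm{E}[X_1]^2$$
The standard deviation of $X_1$ is
$$\mathrm{stdev}[X_1]=\sqrt{\mathrm{Var}[X_1]}$$

The support of $X_2$ is $R_{X_2}$ and its marginal probability mass function is
$$p_{X_2}(x_2)=\sum_{x_1:\,(x_1,x_2)\in R_X}p_X(x_1,x_2)$$
The expected value of $X_2$ is
$$\mathrm{E}[X_2]=\sum_{x_2\in R_{X_2}}x_2\,p_{X_2}(x_2)$$
The expected value of $X_2^2$ is
$$\mathrm{E}[X_2^2]=\sum_{x_2\in R_{X_2}}x_2^2\,p_{X_2}(x_2)$$
The variance of $X_2$ is
$$\mathrm{Var}[X_2]=\mathrm{E}[X_2^2]-\mathrm{E}[X_2]^2$$
The standard deviation of $X_2$ is
$$\mathrm{stdev}[X_2]=\sqrt{\mathrm{Var}[X_2]}$$

Using the transformation theorem, we can compute the expected value of $X_1X_2$:
$$\mathrm{E}[X_1X_2]=\sum_{(x_1,x_2)\in R_X}x_1x_2\,p_X(x_1,x_2)$$
Hence, the covariance between $X_1$ and $X_2$ is
$$\mathrm{Cov}[X_1,X_2]=\mathrm{E}[X_1X_2]-\mathrm{E}[X_1]\,\mathrm{E}[X_2]$$
and the coefficient of linear correlation between $X_1$ and $X_2$ is
$$\mathrm{Corr}[X_1,X_2]=\frac{\mathrm{Cov}[X_1,X_2]}{\mathrm{stdev}[X_1]\,\mathrm{stdev}[X_2]}$$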
Let $X$ be a discrete random vector and denote its entries by $X_1$ and $X_2$. Let the support of $X$ be $R_X$ and its joint probability mass function be $p_X(x_1,x_2)$.
Compute the covariance between $X_1$ and $X_2$.
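The support of $X_1$ is $R_{X_1}$ and its marginal probability mass function is
$$p_{X_1}(x_1)=\sum_{x_2:\,(x_1,x_2)\in R_X}p_X(x_1,x_2)$$
The mean of $X_1$ is
$$\mathrm{E}[X_1]=\sum_{x_1\in R_{X_1}}x_1\,p_{X_1}(x_1)$$
The expected value of $X_1^2$ is
$$\mathrm{E}[X_1^2]=\sum_{x_1\in R_{X_1}}x_1^2\,p_{X_1}(x_1)$$
The variance of $X_1$ is
$$\mathrm{Var}[X_1]=\mathrm{E}[X_1^2]-\mathrm{E}[X_1]^2$$
The standard deviation of $X_1$ is
$$\mathrm{stdev}[X_1]=\sqrt{\mathrm{Var}[X_1]}$$

The support of $X_2$ is $R_{X_2}$ and its probability mass function is
$$p_{X_2}(x_2)=\sum_{x_1:\,(x_1,x_2)\in R_X}p_X(x_1,x_2)$$
The mean of $X_2$ is
$$\mathrm{E}[X_2]=\sum_{x_2\in R_{X_2}}x_2\,p_{X_2}(x_2)$$
The expected value of $X_2^2$ is
$$\mathrm{E}[X_2^2]=\sum_{x_2\in R_{X_2}}x_2^2\,p_{X_2}(x_2)$$
The variance of $X_2$ is
$$\mathrm{Var}[X_2]=\mathrm{E}[X_2^2]-\mathrm{E}[X_2]^2$$
The standard deviation of $X_2$ is
$$\mathrm{stdev}[X_2]=\sqrt{\mathrm{Var}[X_2]}$$

The expected value of the product $X_1X_2$ can be derived using the transformation theorem:
$$\mathrm{E}[X_1X_2]=\sum_{(x_1,x_2)\in R_X}x_1x_2\,p_X(x_1,x_2)$$
Therefore, putting the pieces together, the covariance between $X_1$ and $X_2$ is
$$\mathrm{Cov}[X_1,X_2]=\mathrm{E}[X_1X_2]-\mathrm{E}[X_1]\,\mathrm{E}[X_2]$$
and the coefficient of linear correlation between $X_1$ and $X_2$ is
$$\mathrm{Corr}[X_1,X_2]=\frac{\mathrm{Cov}[X_1,X_2]}{\mathrm{stdev}[X_1]\,\mathrm{stdev}[X_2]}$$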
Let $[X\ Y]$ be an absolutely continuous random vector with support $R_{XY}$ and let its joint probability density function be $f_{XY}(x,y)$. Compute the covariance between $X$ and $Y$.
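The support of $X$ is $R_X$. When $x\notin R_X$, the marginal probability density function of $X$ is $0$, while, when $x\in R_X$, the marginal probability density function of $X$ can be obtained by integrating $y$ out of the joint probability density as follows:
$$f_X(x)=\int_{-\infty}^{\infty}f_{XY}(x,y)\,dy$$
Thus, the marginal probability density function of $X$ is
$$f_X(x)=\begin{cases}\int_{-\infty}^{\infty}f_{XY}(x,y)\,dy & \text{if } x\in R_X\\ 0 & \text{otherwise}\end{cases}$$
The expected value of $X$ is
$$\mathrm{E}[X]=\int_{-\infty}^{\infty}x\,f_X(x)\,dx$$
The expected value of $X^2$ is
$$\mathrm{E}[X^2]=\int_{-\infty}^{\infty}x^2\,f_X(x)\,dx$$
The variance of $X$ is
$$\mathrm{Var}[X]=\mathrm{E}[X^2]-\mathrm{E}[X]^2$$
The standard deviation of $X$ is
$$\mathrm{stdev}[X]=\sqrt{\mathrm{Var}[X]}$$

The support of $Y$ is $R_Y$. When $y\notin R_Y$, the marginal probability density function of $Y$ is $0$, while, when $y\in R_Y$, the marginal probability density function of $Y$ can be obtained by integrating $x$ out of the joint probability density as follows:
$$f_Y(y)=\int_{-\infty}^{\infty}f_{XY}(x,y)\,dx$$
We do not explicitly compute the integral, but we write the marginal probability density function of $Y$ as follows:
$$f_Y(y)=\begin{cases}\int_{-\infty}^{\infty}f_{XY}(x,y)\,dx & \text{if } y\in R_Y\\ 0 & \text{otherwise}\end{cases}$$
The expected value of $Y$ is
$$\mathrm{E}[Y]=\int_{-\infty}^{\infty}y\,f_Y(y)\,dy$$
The expected value of $Y^2$ is
$$\mathrm{E}[Y^2]=\int_{-\infty}^{\infty}y^2\,f_Y(y)\,dy$$
The variance of $Y$ is
$$\mathrm{Var}[Y]=\mathrm{E}[Y^2]-\mathrm{E}[Y]^2$$
The standard deviation of $Y$ is
$$\mathrm{stdev}[Y]=\sqrt{\mathrm{Var}[Y]}$$

The expected value of the product $XY$ can be computed by using the transformation theorem:
$$\mathrm{E}[XY]=\int_{-\infty}^{\infty}\int_{-\infty}^{\infty}xy\,f_{XY}(x,y)\,dy\,dx$$
Hence, by the covariance formula, the covariance between $X$ and $Y$ is
$$\mathrm{Cov}[X,Y]=\mathrm{E}[XY]-\mathrm{E}[X]\,\mathrm{E}[Y]$$
and the coefficient of linear correlation between $X$ and $Y$ is
$$\mathrm{Corr}[X,Y]=\frac{\mathrm{Cov}[X,Y]}{\mathrm{stdev}[X]\,\mathrm{stdev}[Y]}$$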
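For an absolutely continuous random vector, the same moments can be approximated numerically. The sketch below assumes a hypothetical joint density $f(x,y)=x+y$ on the unit square (not the density of the exercise above) and uses a simple midpoint rule; for this density the exact values are $\mathrm{Cov}[X,Y]=-1/144$ and $\mathrm{Corr}[X,Y]=-1/11$.

```python
import math

def density(x, y):
    """Hypothetical joint pdf f(x, y) = x + y on the unit square [0, 1] x [0, 1]."""
    return x + y

def expect(g, n=200):
    """Approximate E[g(X, Y)] with a midpoint rule on an n-by-n grid over the unit square."""
    h = 1.0 / n
    total = 0.0
    for i in range(n):
        x = (i + 0.5) * h
        for j in range(n):
            y = (j + 0.5) * h
            total += g(x, y) * density(x, y)
    return total * h * h

ex = expect(lambda x, y: x)                     # E[X]
ey = expect(lambda x, y: y)                     # E[Y]
var_x = expect(lambda x, y: x ** 2) - ex ** 2   # Var[X] = E[X^2] - E[X]^2
var_y = expect(lambda x, y: y ** 2) - ey ** 2   # Var[Y] = E[Y^2] - E[Y]^2
cov = expect(lambda x, y: x * y) - ex * ey      # Cov = E[XY] - E[X] E[Y]
corr = cov / math.sqrt(var_x * var_y)

print(cov, corr)  # close to the exact values -1/144 and -1/11
```

The midpoint rule is used here only because it needs no external library; a dedicated quadrature routine or symbolic integration would be a reasonable alternative.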