Let $X$ and $Y$ be two random variables. The linear correlation coefficient (or Pearson's correlation coefficient) between $X$ and $Y$, denoted by $\operatorname{Corr}[X,Y]$ or by $\rho_{XY}$, is defined as follows:$$\operatorname{Corr}[X,Y]=\frac{\operatorname{Cov}[X,Y]}{\sigma_{X}\sigma_{Y}}$$where $\operatorname{Cov}[X,Y]$ is the covariance between $X$ and $Y$, and $\sigma_{X}$ and $\sigma_{Y}$ are the standard deviations of $X$ and $Y$. Of course, the linear correlation coefficient is well-defined only as long as $\operatorname{Cov}[X,Y]$, $\sigma_{X}$ and $\sigma_{Y}$ exist and are well-defined. Moreover, while the ratio is well-defined only if $\sigma_{X}$ and $\sigma_{Y}$ are strictly greater than zero, it is often assumed that $\operatorname{Corr}[X,Y]=0$ when one of the two standard deviations is zero. This is equivalent to assuming that $\operatorname{Corr}[X,Y]=\operatorname{Cov}[X,Y]$, because $\operatorname{Cov}[X,Y]=0$ when one of the two standard deviations is zero.
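As an illustrative sketch (not part of the original text), the definitional ratio can be estimated from two samples of hypothetical data and compared against NumPy's built-in estimator, which applies the same formula:

```python
import numpy as np

# Hypothetical data: y is a noisy increasing linear function of x,
# so we expect a positive correlation.
rng = np.random.default_rng(0)
x = rng.normal(size=1000)
y = 0.5 * x + rng.normal(size=1000)

# Corr[X, Y] = Cov[X, Y] / (sigma_X * sigma_Y), using sample moments.
cov_xy = np.mean((x - x.mean()) * (y - y.mean()))  # sample covariance
corr = cov_xy / (x.std() * y.std())                # the ratio above

# np.corrcoef computes the same quantity (normalization factors cancel).
assert np.isclose(corr, np.corrcoef(x, y)[0, 1])
```

Note that the choice between population-style (`ddof=0`) and sample-style (`ddof=1`) moments does not matter here: the same factor appears in numerator and denominator and cancels in the ratio.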
Linear correlation is a measure of dependence (or association) between two random variables. Its interpretation is similar to the interpretation of covariance (see the lecture entitled Covariance for a detailed explanation).
The correlation between $X$ and $Y$ provides a measure of the degree to which $X$ and $Y$ tend to "move together": $\operatorname{Corr}[X,Y]>0$ indicates that deviations of $X$ and $Y$ from their respective means tend to have the same sign; $\operatorname{Corr}[X,Y]<0$ indicates that deviations of $X$ and $Y$ from their respective means tend to have opposite signs; when $\operatorname{Corr}[X,Y]=0$, $X$ and $Y$ display neither of these two tendencies.
Linear correlation has the property of being bounded between $-1$ and $1$:$$-1\leq\operatorname{Corr}[X,Y]\leq 1$$
Thanks to this property, correlation makes it easy to gauge the intensity of the linear dependence between two random variables: the closer the correlation is to $1$, the stronger the positive linear dependence between $X$ and $Y$ (and the closer it is to $-1$, the stronger the negative linear dependence between $X$ and $Y$).
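This behavior can be checked numerically. The following sketch uses assumed synthetic data: an exact increasing linear function of $X$, an exact decreasing one, and a noisy linear relationship:

```python
import numpy as np

# Hypothetical data illustrating the bounds of the correlation coefficient.
rng = np.random.default_rng(2)
x = rng.normal(size=2000)

exact = 3 * x + 1                          # exact increasing linear function
neg = -2 * x                               # exact decreasing linear function
noisy = 3 * x + 1 + rng.normal(size=2000)  # linear relationship plus noise

print(np.corrcoef(x, exact)[0, 1])  # 1 (up to rounding)
print(np.corrcoef(x, neg)[0, 1])    # -1 (up to rounding)
print(np.corrcoef(x, noisy)[0, 1])  # strictly between 0 and 1
```

An exact linear relationship attains the bound $\pm 1$, while adding noise pulls the correlation toward zero.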
The following terminology is often used:
If $\operatorname{Corr}[X,Y]>0$, then $X$ and $Y$ are said to be positively linearly correlated (or simply positively correlated).
If $\operatorname{Corr}[X,Y]<0$, then $X$ and $Y$ are said to be negatively linearly correlated (or simply negatively correlated).
If $\operatorname{Corr}[X,Y]\neq 0$, then $X$ and $Y$ are said to be linearly correlated (or simply correlated).
If $\operatorname{Corr}[X,Y]=0$, then $X$ and $Y$ are said to be uncorrelated. Also note that $\operatorname{Corr}[X,Y]=0$ if and only if $\operatorname{Cov}[X,Y]=0$; therefore, two random variables $X$ and $Y$ are uncorrelated whenever $\operatorname{Cov}[X,Y]=0$.
The following example shows how to compute the coefficient of linear correlation between two discrete random variables.
Example Let $X$ and $Y$ be the two components of a $2$-dimensional discrete random vector with known support and joint probability mass function $p_{XY}(x,y)$. The computation of the correlation coefficient proceeds in the following steps. First, derive the support and the marginal probability mass function of $X$ by summing the joint probability mass function over the values of $Y$; from it, compute the expected values $\operatorname{E}[X]$ and $\operatorname{E}[X^{2}]$, the variance$$\operatorname{Var}[X]=\operatorname{E}[X^{2}]-\operatorname{E}[X]^{2}$$and the standard deviation $\sigma_{X}=\sqrt{\operatorname{Var}[X]}$. Then, repeat the same steps for $Y$ to obtain $\operatorname{E}[Y]$ and $\sigma_{Y}$. Next, use the transformation theorem to compute the expected value of the product:$$\operatorname{E}[XY]=\sum_{(x,y)}x\,y\,p_{XY}(x,y)$$Hence, the covariance between $X$ and $Y$ is$$\operatorname{Cov}[X,Y]=\operatorname{E}[XY]-\operatorname{E}[X]\operatorname{E}[Y]$$and the linear correlation coefficient is$$\operatorname{Corr}[X,Y]=\frac{\operatorname{Cov}[X,Y]}{\sigma_{X}\sigma_{Y}}$$
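The steps of the example above can be sketched in code. The joint probability mass function used here is an assumed, hypothetical one, chosen only for illustration:

```python
import math

# Assumed joint pmf over pairs (x, y); these values are hypothetical.
pmf = {(0, 0): 0.4, (1, 0): 0.1, (0, 1): 0.1, (1, 1): 0.4}

ex = sum(p * x for (x, y), p in pmf.items())       # E[X]
ey = sum(p * y for (x, y), p in pmf.items())       # E[Y]
ex2 = sum(p * x * x for (x, y), p in pmf.items())  # E[X^2]
ey2 = sum(p * y * y for (x, y), p in pmf.items())  # E[Y^2]
exy = sum(p * x * y for (x, y), p in pmf.items())  # E[XY] (transformation theorem)

cov = exy - ex * ey                                # Cov[X, Y]
sx = math.sqrt(ex2 - ex ** 2)                      # sigma_X
sy = math.sqrt(ey2 - ey ** 2)                      # sigma_Y
corr = cov / (sx * sy)                             # Corr[X, Y]
print(round(corr, 10))
```

For this particular pmf the result is $0.6$: the mass concentrated on $(0,0)$ and $(1,1)$ makes deviations of $X$ and $Y$ from their means tend to share the same sign.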
Let $X$ be a random variable. Then, the correlation of $X$ with itself equals $1$:$$\operatorname{Corr}[X,X]=1$$
This is proved as follows:$$\operatorname{Corr}[X,X]=\frac{\operatorname{Cov}[X,X]}{\sigma_{X}\sigma_{X}}=\frac{\operatorname{Var}[X]}{\operatorname{Var}[X]}=1$$where we have used the fact that:$$\operatorname{Cov}[X,X]=\operatorname{Var}[X]$$
The linear correlation coefficient is symmetric:$$\operatorname{Corr}[X,Y]=\operatorname{Corr}[Y,X]$$
This is proved as follows:$$\operatorname{Corr}[X,Y]=\frac{\operatorname{Cov}[X,Y]}{\sigma_{X}\sigma_{Y}}=\frac{\operatorname{Cov}[Y,X]}{\sigma_{Y}\sigma_{X}}=\operatorname{Corr}[Y,X]$$where we have used the fact that covariance is symmetric:$$\operatorname{Cov}[X,Y]=\operatorname{Cov}[Y,X]$$
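Both properties can be verified numerically on assumed sample data: the diagonal of a correlation matrix (each variable correlated with itself) is $1$, and the matrix is symmetric:

```python
import numpy as np

# Hypothetical samples of two unrelated variables.
rng = np.random.default_rng(1)
x = rng.normal(size=500)
y = rng.exponential(size=500)

c = np.corrcoef(x, y)                # 2x2 correlation matrix
assert np.isclose(c[0, 0], 1.0)      # Corr[X, X] = 1
assert np.isclose(c[1, 1], 1.0)      # Corr[Y, Y] = 1
assert np.isclose(c[0, 1], c[1, 0])  # Corr[X, Y] = Corr[Y, X]
```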