
Computing the Pearson Correlation Coefficient

 

One formula for the Pearson correlation coefficient r is as follows:


\begin{displaymath}
r = \frac{ \sum{X Y} - \frac{ (\sum{X})(\sum{Y})}{n} }
{ \sqrt{ \left( \sum{X^2} -\frac{ (\sum{X})^2}{n} \right)
\left( \sum{Y^2} -\frac{ (\sum{Y})^2}{n} \right) } }
\end{displaymath} (10.1)

The following numerical example shows how formula (10.1) is used:


\begin{displaymath}\begin{array}{ll}
{\bf X} & {\bf Y} \\
1 & 2 \\
3 & 5 \\
4 & 5 \\
4 & 8
\end{array}\end{displaymath}

$\sum{X Y} = (1)(2) + (3)(5) + (4)(5) + (4)(8) = 69 $
$\sum{X} = 1 + 3 + 4 + 4 = 12$
$\sum{Y} = 2 + 5 + 5 + 8 = 20$
$\sum{X^2} = 1^2 + 3^2 + 4^2 + 4^2 = 42$
$\sum{Y^2} = 2^2 + 5^2 + 5^2 + 8^2 = 118$

\begin{displaymath}r = \frac{ 69 - \frac{ (12)(20)}{4} }
{ \sqrt{ \left( 42 -\frac{ (12)^2}{4} \right)
\left( 118 -\frac{ (20)^2}{4} \right) } } = .866
\end{displaymath}
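The same arithmetic can be checked with a few lines of code. The following is a minimal sketch in Python, not part of the original text; the function name `pearson_r_sums` and the variable names are chosen here purely for illustration. It evaluates formula (10.1) from the five sums computed above.

```python
from math import sqrt


def pearson_r_sums(xs, ys):
    """Pearson r from raw sums, following formula (10.1).

    The function name is hypothetical, chosen for this illustration.
    """
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)

    numerator = sum_xy - sum_x * sum_y / n
    denominator = sqrt((sum_x2 - sum_x ** 2 / n) * (sum_y2 - sum_y ** 2 / n))
    return numerator / denominator


# The four (X, Y) pairs from the numerical example.
X = [1, 3, 4, 4]
Y = [2, 5, 5, 8]

r = pearson_r_sums(X, Y)
print(round(r, 3))  # 0.866
```

The sketch assumes that neither variable is constant; if all X-values (or all Y-values) are equal, the denominator is zero and r is undefined.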

A second formula is harder to compute by hand but easier to interpret:

 \begin{displaymath}
r = \frac{ \sum (X-\overline{X})(Y-\overline{Y}) }
{ \sqrt{ \sum (X-\overline{X})^2 } \sqrt{ \sum (Y-\overline{Y})^2 } }
\end{displaymath} (10.2)
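
As a check, formula (10.2) can be applied to the four-point numerical example used for (10.1), where $\overline{X} = 12/4 = 3$ and $\overline{Y} = 20/4 = 5$. The deviations $X-\overline{X}$ are $-2, 0, 1, 1$ and the deviations $Y-\overline{Y}$ are $-3, 0, 0, 3$, so

\begin{displaymath}\begin{array}{rcl}
\sum (X-\overline{X})(Y-\overline{Y}) & = & (-2)(-3) + (0)(0) + (1)(0) + (1)(3) = 9 \\
\sum (X-\overline{X})^2 & = & (-2)^2 + 0^2 + 1^2 + 1^2 = 6 \\
\sum (Y-\overline{Y})^2 & = & (-3)^2 + 0^2 + 0^2 + 3^2 = 18 \\
r & = & \frac{9}{\sqrt{6} \sqrt{18}} = .866
\end{array}\end{displaymath}

This agrees with the value obtained from (10.1).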

Consider the Ad Spending example at the start of this chapter. For many of the (X, Y) points, both coordinates are above average, since companies with higher than average Advertising Spending also tend to have higher than average Impressions. Both $X-\overline{X}$ and $Y-\overline{Y}$ are positive for these companies, so the product $(X-\overline{X})(Y-\overline{Y})$ is positive. Most of the remaining companies have lower than average Spending and lower than average Impressions. Both $X-\overline{X}$ and $Y-\overline{Y}$ are negative for these companies, but the product $(X-\overline{X})(Y-\overline{Y})$ is still positive! Hence the numerator in (10.2) tends to be a large positive number for the Ad Spending data.

If the points sloped downwards instead, high X-values would tend to go with low Y-values, and low X-values with high Y-values. In either case $X-\overline{X}$ and $Y-\overline{Y}$ have opposite signs, so the product $(X-\overline{X})(Y-\overline{Y})$ is negative for these points and the numerator tends to be negative. This is partly how the correlation formula (10.2) works. The denominator terms rescale the numerator so that r always lies between -1 and +1.
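
The sign argument can be seen directly in a small computation. The sketch below is in Python and is not part of the original text; the function name `pearson_r_deviations` is hypothetical, and `Y_down` is a made-up data set (the Y-values of the earlier numerical example in reverse order), used only to produce a downward-sloping pattern. It evaluates formula (10.2) and also returns the individual products $(X-\overline{X})(Y-\overline{Y})$.

```python
from math import sqrt


def pearson_r_deviations(xs, ys):
    """Pearson r from deviations about the means, following formula (10.2).

    Returns r together with the list of products (X - Xbar)(Y - Ybar).
    The function name is hypothetical, chosen for this illustration.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    products = [a * b for a, b in zip(dx, dy)]

    denominator = sqrt(sum(a * a for a in dx)) * sqrt(sum(b * b for b in dy))
    return sum(products) / denominator, products


X = [1, 3, 4, 4]
Y_up = [2, 5, 5, 8]    # data from the numerical example (upward slope)
Y_down = [8, 5, 5, 2]  # made-up data: same values reversed (downward slope)

r_up, products_up = pearson_r_deviations(X, Y_up)
r_down, products_down = pearson_r_deviations(X, Y_down)

print(products_up, round(r_up, 3))      # no negative products, r > 0
print(products_down, round(r_down, 3))  # no positive products, r < 0
```

For the upward-sloping data every nonzero product is positive and r is positive; for the reversed data every nonzero product is negative and r has the same magnitude with the opposite sign.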



2003-09-08