
# Correlation

The following data appeared in the Wall Street Journal in 1984. Advertisements were selected by an annual survey conducted by Video Board Tests, Inc., a New York ad-testing company, based on interviews with 20,000 adults who were asked to name the most outstanding TV commercial they had seen, noticed, and liked. The retained impressions were based on a survey of 4,000 adults, in which regular product users were asked to cite a commercial they had seen for that product category in the past week. 'TV Ad Budget' is the 1983 advertising budget in \$ millions. 'Impressions' is the estimated number of retained impressions per week, in millions.

Company          TV Ad Budget   Impressions
                 ($ millions)   (millions per week)

MILLER LITE           50.1      32.1
PEPSI                 74.1      99.6
STROH'S               19.3      11.7
FED'L EXPRESS         22.9      21.9
BURGER KING           82.4      60.8
COCA-COLA             40.1      78.6
MC DONALD'S          185.9      92.4
MCI                   26.9      50.7
DIET COLA             20.4      21.4
FORD                 166.2      40.1
LEVI'S                27.0      40.8
BUD LITE              45.6      10.4
ATT/BELL             154.9      88.9
CALVIN KLEIN           5.0      12.0
WENDY'S               49.7      29.2
POLAROID              26.9      38.0
SHASTA                 5.7      10.0
MEOW MIX               7.6      12.3
OSCAR MAYER            9.2      23.4
CREST                 32.4      71.1
KIBBLES 'N BITS        6.1       4.4


Figure 10.1 shows a scatterplot of Impressions versus TV Ad Budget. Note that the points fall loosely around an upward-sloping line. We say that there is a positive linear association, or linear relationship, between spending and the number of impressions made.

If the points fall around a straight line sloped downwards, we say that there is a negative association.

The direction and strength of association are often expressed in a single number called the (Pearson) correlation coefficient. Typically denoted by r, the correlation coefficient is a number between -1 and +1, inclusive. A value of r=0 means that no linear association exists; the points either look like a random scatter or fall around a horizontal line. A value of r=+1 (or r=-1) indicates a perfectly linear relationship; all the points fall on a straight line sloped upwards (downwards).

The correlation between TV Ad Budget and Impressions in the data above is +.65.
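As a rough check on that figure, the sketch below recomputes r from the table using the standard product-moment definition (the sum of products of deviations divided by the square root of the product of the sums of squared deviations). The names `data` and `pearson_r` are illustrative, not part of the original notes, and the calculation may be laid out differently elsewhere.

```python
from math import sqrt

# (TV ad budget in $ millions, impressions in millions per week),
# copied row by row from the table above.
data = [
    (50.1, 32.1), (74.1, 99.6), (19.3, 11.7), (22.9, 21.9),
    (82.4, 60.8), (40.1, 78.6), (185.9, 92.4), (26.9, 50.7),
    (20.4, 21.4), (166.2, 40.1), (27.0, 40.8), (45.6, 10.4),
    (154.9, 88.9), (5.0, 12.0), (49.7, 29.2), (26.9, 38.0),
    (5.7, 10.0), (7.6, 12.3), (9.2, 23.4), (32.4, 71.1),
    (6.1, 4.4),
]


def pearson_r(xs, ys):
    """Pearson correlation of two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of products and squares of deviations from the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


budgets = [budget for budget, _ in data]
impressions = [imp for _, imp in data]
r = pearson_r(budgets, impressions)
print(round(r, 2))  # about 0.65 for the 21 rows above
```

Because r is unchanged by rescaling either variable, the same value results if budgets are entered in dollars rather than millions of dollars.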

Figure 10.2 shows various scatterplots with different correlations. Although .5 is halfway between 0 and 1, note that the plot corresponding to r=.5 barely shows a pattern of association. In practice, two unrelated variables (e.g. GPA and, say, shoe size) can show a sample correlation as large as .3 purely by chance. When the correlation reaches +1 or -1, all points fall on a straight line.
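The remark about accidental correlations can be illustrated with a small simulation. The sketch below draws pairs of unrelated samples and counts how often their correlation reaches .3 in absolute value. The sample size of 20 points, the 2000 trials, the normal distribution, and the function names are all assumptions made for illustration; none of them come from the notes.

```python
import random
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation of two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def chance_correlations(n_points, n_trials, seed=0):
    """Correlations between independently generated (unrelated) samples."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    results = []
    for _ in range(n_trials):
        xs = [rng.gauss(0, 1) for _ in range(n_points)]
        ys = [rng.gauss(0, 1) for _ in range(n_points)]  # drawn independently of xs
        results.append(pearson_r(xs, ys))
    return results


# Assumed settings: 20 points per plot, 2000 simulated plots.
rs = chance_correlations(n_points=20, n_trials=2000)
share = sum(abs(r) >= 0.3 for r in rs) / len(rs)
print(f"share of unrelated samples with |r| >= 0.3: {share:.2f}")
```

With larger samples the share shrinks, so how surprising a correlation of .3 is depends heavily on how many points the plot contains.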


2003-09-08