We need to discuss an estimate of scale that is used in conjunction with the mean. It is a measure of deviation from the mean. For instance, the value $x_1 - \bar{x}$ is the deviation of the first point from the mean. Hence, we have the n deviations: $x_1 - \bar{x},\ x_2 - \bar{x},\ \ldots,\ x_n - \bar{x}$.
It does not matter here whether a deviation is negative or positive. One way to get rid of the sign is to square the deviation. But we still have n squared deviations, so we take the average of these squared deviations, except that we divide by n - 1 and not n. The resulting statistic is called the sample variance, and we usually use the symbol $s^2$ to represent it. However, the units of $s^2$ are squared units. For example, if our data consist of the weights in pounds of individuals, then $s^2$ will be in pounds squared. We rectify this by taking the square root, and we call the resulting statistic the sample standard deviation, $s$. In notation, with $\bar{x}$ denoting the sample mean, we have
$$s^2 = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2, \qquad s = \sqrt{s^2}.$$
Let's use the simple data set 11, 18, 6, 4, 8, 15, 22 as an example. The sample mean is 12, hence the deviations are -1, 6, -6, -8, -4, 3, and 10. The squared deviations are 1, 36, 36, 64, 16, 9, and 100, which sum to 262. Thus
$$s^2 = \frac{262}{6} \approx 43.67,$$
so that the sample standard deviation is
$$s = \sqrt{43.67} \approx 6.61.$$
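The arithmetic in this example can be checked with a short script. The following is a minimal Python sketch, not part of the original text: the function name `sample_variance` is an illustrative choice, and the standard-library `statistics` module is used only as a cross-check on the hand computation.

```python
import math
import statistics


def sample_variance(data):
    """Sum of squared deviations from the mean, divided by n - 1."""
    n = len(data)
    mean = sum(data) / n
    squared_deviations = [(x - mean) ** 2 for x in data]
    return sum(squared_deviations) / (n - 1)


data = [11, 18, 6, 4, 8, 15, 22]
s2 = sample_variance(data)   # sample variance
s = math.sqrt(s2)            # sample standard deviation
print(round(s2, 2), round(s, 2))  # prints: 43.67 6.61

# Cross-check the definition against the standard library.
assert math.isclose(s2, statistics.variance(data))
assert math.isclose(s, statistics.stdev(data))
```

Both `statistics.variance` and `statistics.stdev` use the n - 1 divisor, which is why they agree with the definition above.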
Of course, the easy way to compute s is to enter these data into the data box and choose summary from the analysis menu, then check the variable name and the covariance button.
The sample standard deviation is not robust, as the table below dramatically shows. It uses the simple example above, changing only the last data point:
Data                          median   mean    IQR   s
Set 1: 11 18 6 4 8 15 22      11       12      12    6.61
Set 2: 11 18 6 4 8 15 72      11       19.1    12    23.8
Set 3: 11 18 6 4 8 15 720     11       112     12    268
Set 4: 11 18 6 4 8 15 2200    11       323     12    828
Set 5: 11 18 6 4 8 15 7200    11       1037    12    2717
Set 6: 11 18 6 4 8 15 72000   11       10295   12    27210
Even the first change (22 to 72) brings almost a fourfold increase in noise as measured by s. The median and the interquartile range, in contrast, are robust: neither changes at all.
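The rows of the table can be recomputed with a few lines of Python. This is a sketch under one stated assumption: the text does not say which quartile rule it uses, so the code relies on the default "exclusive" method of `statistics.quantiles`, which happens to give an interquartile range of 12 for these data. Other quartile conventions can give a different value. The helper name `summarize` is hypothetical.

```python
import statistics


def summarize(data):
    """Return (median, mean, IQR, s) for a list of numbers."""
    # Assumption: quartiles via the default 'exclusive' method.
    q1, _, q3 = statistics.quantiles(data, n=4)
    return (
        statistics.median(data),
        statistics.mean(data),
        q3 - q1,
        statistics.stdev(data),
    )


base = [11, 18, 6, 4, 8, 15]
for last in (22, 72, 720, 2200, 7200, 72000):
    median, mean, iqr, s = summarize(base + [last])
    print(f"last={last:>6}  median={median}  mean={mean:.1f}  "
          f"IQR={iqr:.0f}  s={s:.2f}")
```

Only the last value changes between rows, so any statistic that moves is reacting to that single point.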
Data set 1
102 131 137 63 42 12 23 49 63 21
56 68 35 63 62 19 85 38 76 29
31 16 0 8 47 40 2 44 8 16
7 43 2 50 22 1 51 34 4 78
Data set 2
1020 131 137 63 42 12 23 49 63 21
56 68 35 63 62 19 85 38 76 29
31 16 0 8 47 40 2 44 8 16
7 43 2 50 22 1 51 34 4 78
Notice that in the second data set, the 102 was changed to 1020. Which statistics were robust to this change? Which weren't?
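One way to check an answer to this question is to compute the same four statistics for both data sets and compare them side by side. The sketch below is illustrative only: the names `data_set_1`, `data_set_2`, and `summarize` are hypothetical, and the quartiles again assume the default "exclusive" method of Python's `statistics.quantiles`.

```python
import statistics

data_set_1 = [
    102, 131, 137, 63, 42, 12, 23, 49, 63, 21,
    56, 68, 35, 63, 62, 19, 85, 38, 76, 29,
    31, 16, 0, 8, 47, 40, 2, 44, 8, 16,
    7, 43, 2, 50, 22, 1, 51, 34, 4, 78,
]
# Data set 2 is identical except that the first value, 102, becomes 1020.
data_set_2 = [1020] + data_set_1[1:]


def summarize(data):
    """Return the four summary statistics as a dictionary."""
    q1, _, q3 = statistics.quantiles(data, n=4)  # assumed quartile rule
    return {
        "median": statistics.median(data),
        "mean": statistics.mean(data),
        "IQR": q3 - q1,
        "s": statistics.stdev(data),
    }


summary_1 = summarize(data_set_1)
summary_2 = summarize(data_set_2)
for name in summary_1:
    print(f"{name:>6}: {summary_1[name]:10.2f} -> {summary_2[name]:10.2f}")
```

A statistic whose two printed values agree was unaffected by the changed point.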
First: 5.9 6.8 6.4 7.0 6.6 7.7 7.2 6.9 6.2
Fourth: 5.3 5.6 5.5 5.1 6.2 5.8 5.8
Placebo: 64 49 54 64 97 66 76 44 71 89
70 72 71 55 60 62 46 77 86 71
Drug A: 40 31 50 48 152 44 74 38 81 64