We need to discuss an estimate of scale that we use in conjunction with the mean. It is a measure of deviation from the mean. For instance, the value $x_1 - \bar{x}$ is the deviation of the first point from the mean. Hence, we have the *n* deviations:

$$x_1 - \bar{x},\; x_2 - \bar{x},\; \ldots,\; x_n - \bar{x}.$$

It does not matter here whether the deviation is negative or positive. One way to get rid of the sign is to square the deviation. But we still have *n* squared deviations. So we will take the average of these squared deviations, except we will divide by *n - 1* and not *n*. The resulting statistic is called the **sample variance** and we usually use the symbol *s*^{2} to represent it. However, the units of *s*^{2} are squared units. For example, if our data consist of the weights in pounds of individuals, then *s*^{2} will be in pounds squared. We rectify this by taking the square root, and we call the resulting statistic the **sample standard deviation**, *s*. In notation we have

$$s^2 = \frac{1}{n-1}\sum_{i=1}^{n}\left(x_i - \bar{x}\right)^2, \qquad s = \sqrt{s^2}.$$

Let's use the simple data set 11, 18, 6, 4, 8, 15, 22 as an example. The sample mean is 12, hence the deviations are -1, 6, -6, -8, -4, 3, and 10. The squared deviations are 1, 36, 36, 64, 16, 9, and 100, which sum to 262. Thus
*s*^{2} = 262/6 ≈ 43.67, so the sample standard deviation is *s* = $\sqrt{262/6} \approx 6.61$.
Of course, the easy way to compute *s* is to enter these data into the data box and choose summary from the analysis menu. Then check the variable name and the **covariance** button.
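
For readers working outside the summary module, the hand computation can be sketched in a few lines of Python. This is only an illustration using the standard library; the variable names are chosen here and are not part of the course software.

```python
import math
import statistics

data = [11, 18, 6, 4, 8, 15, 22]
n = len(data)

mean = sum(data) / n                     # sample mean
deviations = [x - mean for x in data]    # one deviation per data point
squared = [d ** 2 for d in deviations]   # squaring removes the sign

s2 = sum(squared) / (n - 1)              # divide by n - 1, not n
s = math.sqrt(s2)                        # back to the original units

print(f"mean = {mean:g}, s^2 = {s2:.2f}, s = {s:.2f}")
# prints: mean = 12, s^2 = 43.67, s = 6.61

# Cross-check: the standard library also divides by n - 1.
assert math.isclose(s2, statistics.variance(data))
assert math.isclose(s, statistics.stdev(data))
```

The step-by-step lists make it easy to compare each intermediate value with the hand calculation; in practice `statistics.stdev(data)` alone gives the same result.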

The sample standard deviation is not robust, as the table below dramatically shows. It uses the simple example with changes to the last data point:

| Set | Data | median | mean | IQ | *s* |
|---|---|---|---|---|---|
| 1 | 11 18 6 4 8 15 22 | 11 | 12 | 12 | 6.61 |
| 2 | 11 18 6 4 8 15 72 | 11 | 19.1 | 12 | 23.8 |
| 3 | 11 18 6 4 8 15 720 | 11 | 112 | 12 | 268 |
| 4 | 11 18 6 4 8 15 2200 | 11 | 323 | 12 | 828 |
| 5 | 11 18 6 4 8 15 7200 | 11 | 1037 | 12 | 2717 |
| 6 | 11 18 6 4 8 15 72000 | 11 | 10295 | 12 | 27210 |

Even the first change (22 to 72) brings almost a 4-fold increase in noise as measured by *s*.
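
The robustness comparison for these six data sets can be reproduced with a short Python sketch. The helper `summarize` is a hypothetical name introduced here. The text does not say which quartile definition lies behind the IQ column, so the sketch assumes the default "exclusive" method of `statistics.quantiles` (Python 3.8 or later), which happens to give the value 12 shown for these data; other quartile conventions give a different number.

```python
import statistics

def summarize(data):
    """Return the median, mean, interquartile range, and s for one data set."""
    # Assumption: quartiles use the default method="exclusive".
    q1, _, q3 = statistics.quantiles(data, n=4)
    return {
        "median": statistics.median(data),
        "mean": statistics.mean(data),
        "iq": q3 - q1,
        "s": statistics.stdev(data),   # divides by n - 1
    }

base = [11, 18, 6, 4, 8, 15]           # the six points that never change
for last in (22, 72, 720, 2200, 7200, 72000):
    stats = summarize(base + [last])
    print(f"last = {last:>5}: median = {stats['median']}, "
          f"mean = {stats['mean']:.1f}, IQ = {stats['iq']:g}, "
          f"s = {stats['s']:.2f}")
```

The median and the interquartile range stay fixed because they depend only on the ordering of the middle observations, while the mean and *s* follow the altered point.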

What does

1. Use the summary module to obtain these statistics for the two data sets in #1, Exercise 1.4. Using these statistics, obtain comparison boxplots of the two samples.
2. Check the robustness of the statistics in the descriptive statistics command on the following two data sets using the summary module.

   Data set 1: 102 131 137 63 42 12 23 49 63 21 56 68 35 63 62 19 85 38 76 29 31 16 0 8 47 40 2 44 8 16 7 43 2 50 22 1 51 34 4 78

   Data set 2: 1020 131 137 63 42 12 23 49 63 21 56 68 35 63 62 19 85 38 76 29 31 16 0 8 47 40 2 44 8 16 7 43 2 50 22 1 51 34 4 78

   Notice that in the second data set, the 102 was changed to 1020. Which statistics were robust to this change? Which weren't?
3. Same as the last exercise but change the 1020 to 10200.
4. Same as the last exercise but change the 10200 to 102000.
5. Did Manuel I shortchange the people by having less silver in the later mintings? Try to answer this question by comparing the following two data sets (use comparison boxplots). The first data set is the amount of silver (percentage) in Manuel's first minting, while the second data set is the amount of silver (percentage) in Manuel's fourth minting.

   First: 5.9 6.8 6.4 7.0 6.6 7.7 7.2 6.9 6.2

   Fourth: 5.3 5.6 5.5 5.1 6.2 5.8 5.8

6. A drug compound (call it A) was tested using the LDL (low-density lipoprotein) levels of quail. In the experiment, 30 quail were randomly chosen; 20 were assigned to a placebo and the other 10 to the treatment using Drug A. The drug was mixed in their food. Other than this, though, the quail were treated the same. At the end of the treatment period, the LDL levels of the quail were measured and are given below. Here smaller is definitely better. The data are real.

   Placebo: 64 49 54 64 97 66 76 44 71 89 70 72 71 55 60 62 46 77 86 71

   Drug A: 40 31 50 48 152 44 74 38 81 64

   (a) Obtain comparison dot plots of the data and try to decide if Drug A was effective.

   (b) Obtain the descriptive statistics for each data set. Which (difference in means, difference in medians, difference in HL) seems more appropriate here? Why?