
Measures of Center

The sample median, Q2, is one measure of center which we have already discussed. Recall that 50% of the data are less than or equal to Q2 and 50% of the data are greater than or equal to Q2. Another measure of center that we will frequently use is the sample mean, which is just the arithmetic average of the sample; i.e., add up all the data and divide by the sample size n. In terms of notation we will use $\bar{x}$ to denote the average of $x_1, x_2, \ldots, x_n$; that is, $\bar{x} = (x_1 + x_2 + \cdots + x_n)/n$. For example, consider the data (Set 1):
Set 1:    11    18     6     4     8    15    22
The median is 11. The data add up to 84 and there are 7 data points; hence, the sample mean is 84/7 = 12. You can use the summary module to obtain the sample mean.
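
The arithmetic for Set 1 can also be checked outside the summary module. The following is a minimal Python sketch; the variable names are illustrative and are not part of the course software.

```python
from statistics import median

# Set 1 from the text
set1 = [11, 18, 6, 4, 8, 15, 22]

total = sum(set1)              # add up all the data
n = len(set1)                  # sample size
sample_mean = total / n        # arithmetic average
sample_median = median(set1)   # middle value of the ordered data

print("sum =", total, " n =", n)
print("mean =", sample_mean, " median =", sample_median)
```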

What does the mean mean? The mean is the center of gravity of a histogram of the sample along its horizontal axis. Consider, yet again, the 25 Etruscan skull sizes: 

126    132    138    140    141    141    142    143    144    144    144
145    146    147    148    148    149    149    150    150    150    154
155    158    158

Again enter these data into the data box and choose summary from the analysis menu. The sample average is 145.68 while the median is 146. To get a histogram of the data just click on the histogram button before submitting. The histogram is approximately symmetric so it is not surprising that the mean and the median are similar. But for data sets which are asymmetric these statistics can be quite different.
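
The center-of-gravity interpretation can be checked numerically: the deviations of the data from the mean balance out, so they sum to zero. The sketch below, in Python with illustrative variable names, recomputes the mean and median of the skull data and verifies this balance property up to floating-point rounding.

```python
from statistics import median

# The 25 Etruscan skull sizes listed above
skulls = [126, 132, 138, 140, 141, 141, 142, 143, 144, 144, 144,
          145, 146, 147, 148, 148, 149, 149, 150, 150, 150, 154,
          155, 158, 158]

n = len(skulls)
xbar = sum(skulls) / n         # sample mean
med = median(skulls)           # sample median (13th ordered value)

# Balance property: deviations from the mean sum to zero,
# which is why the mean acts as the center of gravity.
balance = sum(x - xbar for x in skulls)

print("n =", n, " mean =", round(xbar, 2), " median =", med)
print("sum of deviations from the mean is about", abs(round(balance, 8)))
```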

Furthermore, the mean is quite sensitive to outliers. Consider again the simple data set: 11, 18, 6, 4, 8, 15, 22. The median and mean are 11 and 12, respectively. Both statistics are in the center of the data, which is where they should be since they are measures of center. Now suppose that instead of 22 the last data point is 72. The median of course does not change, but the mean is now 134/7 = 19.14. That is, the median is still in the center of the data but the mean has moved beyond the sixth ordered data point, 18. The mean is no longer measuring the center of the data. If the last data point is 220 instead of 22, the mean changes to 282/7 = 40.3, well beyond the center of the data. Below is a table of data sets. The first row is the original set, and each subsequent row replaces the data point 22 with a larger value. Another statistic given in the table is s, which we will discuss later.

                      Data                        median   mean   IQ   s
Set 1: 11    18     6     4     8    15    22       11     12     12   6.61
Set 2: 11    18     6     4     8    15    72       11     19.1   12   23.8
Set 3: 11    18     6     4     8    15   720       11    112     12   268
Set 4: 11    18     6     4     8    15  2200       11    323     12   828
Set 5: 11    18     6     4     8    15  7200       11   1037     12  2717
Set 6: 11    18     6     4     8    15 72000       11  10295     12 27210
Thus the mean is very sensitive to outliers while the median is not. Hence the mean is not a robust statistic.
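
The median, mean, and s columns of the table can be reproduced with a short Python sketch. It assumes that s is the usual sample standard deviation with divisor n - 1 (an assumption, since s is only defined later, but one that matches the tabulated values). The IQ column is left out because its value depends on the quartile convention in use.

```python
from statistics import mean, median, stdev

# First six data points are common to every set; only the last one changes.
base = [11, 18, 6, 4, 8, 15]
last_values = [22, 72, 720, 2200, 7200, 72000]

rows = []
for last in last_values:
    data = base + [last]
    # stdev() uses the n - 1 divisor (assumed here to be the table's s)
    rows.append((last, median(data), mean(data), stdev(data)))

for last, med, avg, s in rows:
    print(f"last = {last:>6}   median = {med}   mean = {avg:10.1f}   s = {s:10.2f}")
```

The median column never moves, while the mean and s grow with the outlier.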

Another measure of center which we use occasionally is the median of all the pairwise averages of the data. For the simple data set 11, 18, 6, 4, 8, 15, 22, just order the data and make a table with rows and columns labeled by these data points. Then compute the average of each pair formed by a row label and a column label. This is shown in the table below. These pairwise averages are called Walsh averages. For a pair of data points we only compute the average once; hence, we only need the top half of the table, including the diagonal (each point paired with itself), as shown.

           4       6       8      11      15      18    22
        4  4       5       6      7.5     9.5     11    13
        6          6       7      8.5    10.5     12    14
        8                  8      9.5    11.5     13    15
       11                         11     13      14.5   16.5
       15                                15      16.5   18.5
       18                                        18     20
       22                                               22

Ignore the row and column labels and compute the median of the other entries in the table. There are 28 entries in the table, so the median is the average of the 14th and 15th ordered entries; i.e., the average of 11.5 and 12, which is 11.75. This estimate is often called the Hodges-Lehmann estimate, so we will denote it by HL. Okay, I realize it is not fun to compute this table, so you can also do it the easy way. Just enter these data into the data box, choose summary from the analysis menu and check the button for numerical summaries before submitting. As you will see, HL = 11.75.
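
For readers who want to see the computation spelled out, here is a minimal Python sketch of the Walsh averages and the Hodges-Lehmann estimate. The function names `walsh_averages` and `hodges_lehmann` are made up for this illustration and are not part of the summary module.

```python
from itertools import combinations_with_replacement
from statistics import median

def walsh_averages(data):
    """Pairwise averages (a + b) / 2, each pair counted once,
    including every point paired with itself (the table's diagonal)."""
    ordered = sorted(data)
    return [(a + b) / 2 for a, b in combinations_with_replacement(ordered, 2)]

def hodges_lehmann(data):
    """Median of the Walsh averages."""
    return median(walsh_averages(data))

set1 = [11, 18, 6, 4, 8, 15, 22]
w = sorted(walsh_averages(set1))

print("number of Walsh averages:", len(w))    # n(n + 1)/2 entries
print("14th and 15th ordered:", w[13], w[14])
print("HL =", hodges_lehmann(set1))
```

Forming all n(n + 1)/2 averages is fine for small samples like this one; for very large samples the quadratic number of pairs becomes the cost to watch.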

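The Hodges-Lehmann estimate is robust. If you change the last data point to 72000, the Hodges-Lehmann estimate remains at 11.75. Only the seven Walsh averages that involve the last data point change, and they were already above the middle of the ordered list, so the 14th and 15th ordered entries are still 11.5 and 12.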
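
The robustness claim can be checked directly by comparing the three measures of center as the last data point grows. This is a self-contained Python sketch; the helper `hl` is a hypothetical name for the median of the pairwise averages, each pair counted once and each point also paired with itself.

```python
from itertools import combinations_with_replacement
from statistics import mean, median

def hl(data):
    """Hodges-Lehmann estimate: median of all pairwise averages."""
    averages = [(a + b) / 2 for a, b in combinations_with_replacement(data, 2)]
    return median(averages)

base = [11, 18, 6, 4, 8, 15]

results = {}
for last in (22, 72, 72000):
    data = base + [last]
    results[last] = (median(data), mean(data), hl(data))
    med, avg, est = results[last]
    print(f"last = {last:>5}   median = {med}   mean = {avg:9.2f}   HL = {est}")
```

The mean follows the outlier, while the median and HL stay where they were.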



2001-01-01