Set 1: 11 18 6 4 8 15 22
The median is 11. The data add up to 84 and there are 7 data points; hence, the sample mean is 84/7 = 12. You can use the summary module to obtain the sample mean.
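The arithmetic above can be checked in a few lines of Python. This is only a sketch using the standard library's statistics module; it is not the summary module mentioned in the text, and the variable names are chosen here for illustration.

```python
from statistics import mean, median

data = [11, 18, 6, 4, 8, 15, 22]

total = sum(data)              # the data add up to 84
n = len(data)                  # there are 7 data points
sample_mean = total / n        # 84 / 7
sample_median = median(data)   # middle value of the sorted data

print("sum:", total, "n:", n)
print("mean:", sample_mean, "median:", sample_median)
```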
What does the mean mean? The mean is the center of gravity of a histogram of the sample along its horizontal axis. Consider, yet again, the 25 Etruscan skull sizes:
126 132 138 140 141 141 142 143 144 144 144 145 146 147 148 148 149 149 150 150 150 154 155 158 158
Again enter these data into the data box and choose summary from the analysis menu.
The sample average is 145.68 while the median is 146. To get a histogram of the data, just click on the histogram button before submitting. The histogram is approximately symmetric, so it is not surprising that the mean and the median are similar. But for data sets that are asymmetric, these statistics can be quite different.
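One way to see the "center of gravity" idea is that the deviations of the data from the mean balance out: they sum to zero. The sketch below checks this for the skull data with Python's standard statistics module; the names skulls and balance are illustrative, and the sum is only zero up to floating-point rounding.

```python
from statistics import mean, median

skulls = [126, 132, 138, 140, 141, 141, 142, 143, 144, 144, 144, 145, 146,
          147, 148, 148, 149, 149, 150, 150, 150, 154, 155, 158, 158]

m = mean(skulls)       # sample average
med = median(skulls)   # 13th of the 25 ordered values

# Deviations to the left of the mean are negative, those to the right are
# positive; at the balance point they cancel.
balance = sum(x - m for x in skulls)

print("mean:", round(m, 2), "median:", med)
print("sum of deviations from the mean (about 0):", round(balance, 9))
```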
Furthermore, the mean is quite sensitive to outliers. Consider again the simple data set: 11, 18, 6, 4, 8, 15, 22. The median and mean are 11 and 12, respectively. Both statistics are in the center of the data, which is where they should be since they are measures of center. Now suppose that instead of 22 the last data point is 72. The median, of course, does not change, but the mean is now 134/7 = 19.14. That is, the median is still in the center of the data but the mean has moved beyond the sixth ordered data point, 18. The mean is no longer measuring the center of the data. If the last data point is 220 instead of 22, the mean changes to 40.3, well beyond the center of the data. Below is a table of data sets. The first row is the original set; in each subsequent row the data point 22 is replaced by a larger value. Another statistic given in the table is s, which we will discuss later.
Data                             median   mean    IQ   s
Set 1: 11 18 6 4 8 15 22         11       12      12   6.61
Set 2: 11 18 6 4 8 15 72         11       19.1    12   23.8
Set 3: 11 18 6 4 8 15 720        11       112     12   268
Set 4: 11 18 6 4 8 15 2200       11       323     12   828
Set 5: 11 18 6 4 8 15 7200       11       1037    12   2717
Set 6: 11 18 6 4 8 15 72000      11       10295   12   27210
Thus the mean is very sensitive to outliers while the median is not. Hence the mean is not a robust statistic.
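The median and mean columns of the table can be reproduced with a short loop. In this sketch, s is assumed to be the sample standard deviation with divisor n - 1 (statistics.stdev), which agrees with the tabled values for the first two sets; the text itself has not yet defined s. The IQ column is left out because quartile conventions differ between packages.

```python
from statistics import mean, median, stdev

base = [11, 18, 6, 4, 8, 15]                   # the six points that never change
last_values = [22, 72, 720, 2200, 7200, 72000]  # replacements for the last point

rows = []
for last in last_values:
    data = base + [last]
    # stdev uses the n - 1 divisor (an assumption about what s means here)
    rows.append((last, median(data), mean(data), stdev(data)))

print(f"{'last':>6} {'median':>7} {'mean':>10} {'s':>10}")
for last, med, avg, s in rows:
    print(f"{last:>6} {med:>7} {avg:>10.2f} {s:>10.2f}")
```

The median column stays at 11 in every row, while the mean and s grow with the outlier.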
Another measure of center which we use occasionally is the median of all the pairwise averages of the data. For the simple data set 11, 18, 6, 4, 8, 15, 22, just order the data and make a table with rows and columns labeled by these data points. Then compute the average of each pair formed by a row label and a column label. This is shown in the table below. These pairwise averages are called Walsh averages. Each data point is also paired with itself, which gives the diagonal entries. For a pair of data points we only compute the average once; hence, we only need the top half as shown.
        4     6     8     11    15    18    22
4       4     5     6     7.5   9.5   11    13
6             6     7     8.5   10.5  12    14
8                   8     9.5   11.5  13    15
11                        11    13    14.5  16.5
15                              15    16.5  18.5
18                                    18    20
22                                          22
Ignore the row and column labels and compute the median of the other entries in the table. There are 28 entries in the table, so the median is the average of the 14th and 15th ordered entries; i.e., the average of 11.5 and 12, which is 11.75. This estimate is often called the Hodges-Lehmann estimate, so we will denote it by HL. Okay, I realize it is not fun to compute this table, so you can also do it the easy way. Just enter these data into the data box, choose summary from the analysis menu and check the button for numerical summaries after submitting. As you see,
HL = 11.75.
The Hodges-Lehmann estimate is robust. If you change the last data point to 72000, the Hodges-Lehmann estimate remains at 11.75.
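The Walsh averages and the Hodges-Lehmann estimate can also be computed directly. The following is a minimal sketch in Python; the function names walsh_averages and hodges_lehmann are made up for this illustration and do not come from the summary module. It forms every pair of data points once, including each point paired with itself, exactly as in the top half of the table.

```python
from itertools import combinations_with_replacement
from statistics import median


def walsh_averages(data):
    """Averages (a + b) / 2 over all pairs, each pair counted once,
    including every point paired with itself (the diagonal)."""
    ordered = sorted(data)
    return [(a + b) / 2 for a, b in combinations_with_replacement(ordered, 2)]


def hodges_lehmann(data):
    """Median of the Walsh averages."""
    return median(walsh_averages(data))


data = [11, 18, 6, 4, 8, 15, 22]
w = walsh_averages(data)
hl = hodges_lehmann(data)

# Replace the last data point by an extreme outlier.
hl_outlier = hodges_lehmann([11, 18, 6, 4, 8, 15, 72000])

print("number of Walsh averages:", len(w))
print("HL:", hl)
print("HL with 72000 in place of 22:", hl_outlier)
```

For n data points there are n(n + 1)/2 Walsh averages, so this brute-force approach is fine for small samples but grows quadratically with the sample size.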