As with dotplots, boxplots lend themselves to comparisons. Just make sure that the same number scale is used for each boxplot. Then simply draw the boxplots in rows (or in columns). As an example, reconsider the subsample of Italian skull sizes given by:
133 128 136 140 127 136 131 131 128 132 125 133 134 136 134 129 132 139 143 138
Recall that the 5 basic descriptive statistics are: 125, 129, 133, 136, and 143. Hence, h = 1.5(136 - 129) = 10.5 and the fences are LIF = 129 - 10.5 = 118.5 and UIF = 136 + 10.5 = 146.5. The adjacent points are 125 and 143. Based on these statistics, the comparison boxplots are:
                                         --------------
Etruscan      *        -----------------I      +       I-------------
                                         --------------
                    -----------
Italian   ---------I      +    I-----------
                    -----------
            --+---------+---------+---------+---------+---------+----
            126.0     132.0     138.0     144.0     150.0     156.0
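The fence arithmetic for the Italian subsample can be checked with a short Python sketch. This is only an illustration: the function names `fences` and `adjacent_and_outliers` are made up here, and the quartiles 129 and 136 are taken from the text rather than recomputed, since software packages use differing quartile rules.

```python
from statistics import median

# Italian skull-size subsample, copied from the text.
italian = [133, 128, 136, 140, 127, 136, 131, 131, 128, 132,
           125, 133, 134, 136, 134, 129, 132, 139, 143, 138]


def fences(q1, q3):
    """Return (h, LIF, UIF), where h = 1.5 * (Q3 - Q1)."""
    h = 1.5 * (q3 - q1)
    return h, q1 - h, q3 + h


def adjacent_and_outliers(data, lif, uif):
    """Adjacent points are the most extreme values inside the fences;
    values outside the fences are potential outliers."""
    inside = [x for x in data if lif <= x <= uif]
    outside = sorted(x for x in data if x < lif or x > uif)
    return min(inside), max(inside), outside


# Quartiles as stated in the text (assumed, not recomputed).
q1, q3 = 129, 136
h, lif, uif = fences(q1, q3)
low_adj, high_adj, outliers = adjacent_and_outliers(italian, lif, uif)

print(median(italian))              # 133.0
print(h, lif, uif)                  # 10.5 118.5 146.5
print(low_adj, high_adj, outliers)  # 125 143 []
```

No Italian value falls outside the fences, so the whiskers of the Italian boxplot run all the way to the smallest and largest observations.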
Use the summary module to obtain:
126 132 138 140 141 141 142 143 144 144 144
123 324 145 156 265 143 221 322 133 233 142 144 244
A final remark on this example is in order. Notice that the scales (noise levels) of the two data sets are about the same; i.e., the interquartile ranges are about the same, 8 and 7, and, ignoring the outlier, the ranges are about the same. We do not have much data here to comment on the shapes of the distributions, but based on the comparison dotplots above, symmetry cannot be discounted.
In light of this, what catches your eye as you look at the boxplots? There is a shift; that is, the Etruscan data are shifted up from the Italian data. If you draw a line connecting the Etruscan and Italian lower quartiles and then a line connecting their upper quartiles, the two lines will be almost parallel. The line connecting the medians will also be almost parallel to these lines. In fact, it is tempting to summarize the data with one number, the difference in the medians. In this case the difference is 146 - 133 = 13. This is called a location problem. These problems are characterized by the samples having similar shapes and scales (noise levels). In such cases, a convenient summary is a difference in locations or centers. Here, that difference is 13; so the Etruscan head sizes are shifted up 13 mm from the Italian head sizes.
Be very careful, though. This number 13 is based on just two samples. We also need a measure of sampling error. If this measure turns out to be greater than 13, then our estimate of shift loses a lot of meaning. In later chapters we will say it is insignificant. If the sampling error is small (less than 13 here), then our estimate of shift is meaningful. In later chapters we will say it is significant.
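To see what a pure location shift looks like numerically, here is a minimal Python sketch. The Etruscan measurements are not listed in this section, so the second sample below is a hypothetical one built by adding 13 to every Italian value; the helper names `iqr` and `shift_estimate` are illustrative. Note that `statistics.quantiles` uses its own quartile rule, which need not match the quartiles quoted in the text.

```python
from statistics import median, quantiles

# Italian skull-size subsample, copied from the text.
italian = [133, 128, 136, 140, 127, 136, 131, 131, 128, 132,
           125, 133, 134, 136, 134, 129, 132, 139, 143, 138]

# Hypothetical second sample (NOT the real Etruscan data):
# every Italian value shifted up by 13 mm.
SHIFT = 13
shifted = [x + SHIFT for x in italian]


def iqr(data):
    """Interquartile range using Python's default quartile rule."""
    q1, _, q3 = quantiles(data, n=4)
    return q3 - q1


def shift_estimate(sample_a, sample_b):
    """Estimate the location shift as the difference in medians."""
    return median(sample_a) - median(sample_b)


print(shift_estimate(shifted, italian))  # 13.0
print(iqr(shifted) == iqr(italian))      # True
```

In an exact location problem the scale (here the interquartile range) is unchanged and only the center moves, which is why the difference in medians is a sensible one-number summary. With real samples the scales agree only approximately, and the estimate still has to be weighed against its sampling error.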
Group 1: 153 150 132 123 148 146 140 154
137 112
Group 2: 148 113 69 129 150 129 157 184
143 167 141 179 124 130 166
A: 41 289 214 102 38
94 179 87 116 155
B: 39 65 22 64 22
191 99 32 142 317
C: 24 95 139 122 41
360 318 34 43 18