Suppose the samples are:
X: 8 12 16
Y: 14 19 22
The nine differences Y_{j} - X_{i} are:
14-8=6, 14-12=2, 14-16=-2, 19-8=11, 19-12=7, 19-16=3, 22-8=14, 22-12=10, 22-16=6.
Here are the sorted differences:
-2 2 3 6 6 7 10 11 14
As our point estimate we will take the median of the differences, i.e., 6. Here are the X's and the unshifted Y's; i.e., Y-6:
X: 8 12 16
Y - 6: 8 13 16
Now compute the Wilcoxon test statistic on the X's and the unshifted Y's. You will get T=4.5, which is half of the 9 differences. This is what you expect T to be if there are no differences. Hence, the median of the differences has unshifted the Y's.
In general, the estimate of the shift in locations based on the Wilcoxon
is the median of the differences Y_{j} - X_{i}.
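As a check on the small example above, here is a minimal Python sketch of this estimate. The function name `shift_estimate` and the variable names are illustrative choices, not part of the class code; the sketch simply forms every difference Y_{j} - X_{i} and takes the median.

```python
from statistics import median

def shift_estimate(x, y):
    """Median of all pairwise differences y_j - x_i (hypothetical helper)."""
    diffs = [yj - xi for xi in x for yj in y]
    return median(diffs)

x = [8, 12, 16]
y = [14, 19, 22]

delta = shift_estimate(x, y)
print(delta)  # 6

# Shift the Y's back by the estimate; the differences are now centered at 0.
aligned_y = [yj - delta for yj in y]
print(aligned_y)                     # [8, 13, 16]
print(shift_estimate(x, aligned_y))  # 0
```

The last line shows what "unshifting" means: after subtracting the estimate from the Y's, the median of the differences is 0.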
Consider the battery example of the last chapter. Recall that we had two types of batteries, XX and YY, and we wanted to see if a typical YY lasts longer than a typical XX. Let's estimate the difference in lifetimes of typical YY and XX batteries. Here are the samples (lifetime in hours):
XX 49 53 74 111 113 335
YY 62 101 167 174 190
Here is the comparison dotplot:
[Comparison dotplot of the XX and YY lifetimes on a common axis running from 60 to 360 hours.]
It seems as though the YY's are beating the XX's. To estimate the shift we need all 30 differences of the form YY_{j}-XX_{i}. When we get this estimate by hand calculation, the table of differences discussed in the last chapter really helps. Sort the samples. Then the columns of the table are the sorted Y's and the rows of the table are the sorted X's. Then obtain the differences Y_{j} - X_{i}. As you will see, the median is easy to get.
XX \ YY | 62 | 101 | 167 | 174 | 190 |
49 | 13 | 52 | 118 | 125 | 141 |
53 | 9 | 48 | 114 | 121 | 137 |
74 | -12 | 27 | 93 | 100 | 116 |
111 | -49 | -10 | 56 | 63 | 79 |
113 | -51 | -12 | 54 | 61 | 77 |
335 | -273 | -234 | -168 | -161 | -145 |
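The table can also be built by machine. The following is a minimal sketch in plain Python (the variable names are illustrative, not from the class code): it forms one row per sorted XX value and one column per sorted YY value, then sorts the 30 entries to find the two middle ones.

```python
from statistics import median

xx = [49, 53, 74, 111, 113, 335]  # rows of the table
yy = [62, 101, 167, 174, 190]     # columns of the table

# Entry in row x, column y is the difference y - x.
table = [[y - x for y in sorted(yy)] for x in sorted(xx)]

print("XX\\YY" + "".join(f"{y:>6}" for y in sorted(yy)))
for x, row in zip(sorted(xx), table):
    print(f"{x:>5}" + "".join(f"{d:>6}" for d in row))

diffs = sorted(d for row in table for d in row)
print(len(diffs))            # 30
print(diffs[14], diffs[15])  # 52 54  (the 15th and 16th sorted differences)
print(median(diffs))         # 53.0
```

With an even number of differences, the median is the average of the two middle values, here 52 and 54.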
Our point estimate is the median, which is 53 (do a quick stem-leaf, then
compute the median). Could you guess it from the plot? (Take the YY's and
shift them back 53 units. Do these "aligned" samples seem about the same?)
So we estimate that a typical YY battery lasts 53 hours longer than a
typical XX battery. That takes care of that problem. What's that? Oh right,
it could just be sampling error. We need a confidence interval!
We will use percentile confidence intervals based on resampling. So it's old stuff! The steps for a general situation are:
Here are the 100 resampled medians of the differences, sorted:
-161.0 -105.0  -78.5  -78.5  -76.0  -49.0  -49.0  -45.5  -44.5  -12.0
 -11.0  -11.0  -11.0  -10.0  -10.0   -0.5    9.0    9.0    9.0    9.0
  11.0   11.0   13.0   13.0   13.0   13.0   22.0   27.0   27.0   27.0
  27.0   27.0   27.0   30.5   34.5   37.5   37.5   48.0   48.0   48.0
  48.0   48.0   51.0   51.0   51.0   52.0   52.0   52.0   52.0   53.0
  53.0   53.0   54.0   54.0   54.5   55.0   56.0   56.0   56.0   56.0
  56.0   56.0   56.5   57.5   58.5   61.0   61.0   61.0   61.0   63.0
  63.0   64.5   70.0   72.5   77.0   77.0   77.0   77.0   77.0   79.0
  79.0   85.0   86.0   93.0   93.0   93.0   93.0   93.0   93.0   93.0
  93.0   96.5  100.0  100.0  107.0  114.0  116.0  117.0  121.0  137.0
The confidence interval is (-78.5, 117). It contains 0; hence, the results are inconclusive. Remember we took differences of the form YY minus XX, so positive values in the CI mean YY beats XX, but negative values mean XX beats YY. Our conclusion would be: a typical YY battery has a lifetime anywhere from 78.5 hours shorter to 117 hours longer than a typical XX battery. Though correct, this sounds a bit odd. It is better to say the results were inconclusive. The value of 53 did not overcome the noise level. Note that on this data set, this is the same conclusion we came to in Chapter 8.
A picture is worth a thousand words, so here is a histogram of the 100 resampled medians of the differences. I have located the CI on it with [ ]'s.
[Histogram of the 100 resampled medians of the differences; the horizontal axis runs from -180 to 120 and the CI endpoints are marked with [ and ].]
Using 1000 resamples, I got the confidence interval (-105, 115). So the conclusion remains the same.
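The resampling behind such an interval can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the class code: it assumes each sample is resampled independently with replacement, that the level is 95% (which matches taking the 3rd and 98th of 100 ordered medians), and that the endpoints are picked by trimming an equal count from each end. The names `shift_estimate` and `percentile_ci` and the `seed` parameter are hypothetical. Because the resamples are random, the endpoints will differ somewhat from the ones quoted above.

```python
import random
from statistics import median

def shift_estimate(x, y):
    """Median of all pairwise differences y_j - x_i."""
    return median([yj - xi for xi in x for yj in y])

def percentile_ci(x, y, n_resamples=1000, level=0.95, seed=0):
    """Percentile interval for the shift (assumed procedure, see lead-in)."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_resamples):
        # Resample each sample with replacement, keeping its original size.
        xs = rng.choices(x, k=len(x))
        ys = rng.choices(y, k=len(y))
        estimates.append(shift_estimate(xs, ys))
    estimates.sort()
    # Number of ordered values trimmed from each end (2 of 100 at 95%).
    k = int(n_resamples * (1 - level) / 2)
    return estimates[k], estimates[n_resamples - 1 - k]

xx = [49, 53, 74, 111, 113, 335]
yy = [62, 101, 167, 174, 190]

lo, hi = percentile_ci(xx, yy, n_resamples=1000)
print(f"point estimate: {shift_estimate(xx, yy)}")
print(f"percentile CI: ({lo}, {hi})")
```

If the printed interval contains 0, the reading is the same as in the text: the estimated shift has not overcome the noise level.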
Now you try it, using the class code (Two-Sample hypothesis test and confidence interval for the location parameter based on the Wilcoxon). Simply bring up the class code in a second window, drop the XX sample in the first box (data set 1), drop the YY sample in the second box (data set 2), and submit.
X 12 15 18
Y 16 19 25 28
2 9 2 2 7 2 2 3 0 8 8 1 9 8 8 2 3 3 4 0 9 2 1 0 7 9 3 6 6 2 3 7 6 8 8 7 0 5 0 3 4 3 5 7 7 3 4 5 0 1
Switch .212 .218 .236 .242 .251 .251 .254 .261 .270 .282
Left .238 .271 .279 .283 .284 .290 .300 .303
Ital. 134 132 126 134 131 130 130 125 132 126
Etru. 141 145 145 146 142 126 144 146 154 149 143 131
Hitters: 155 155 160 160 160 166 170 175 175 175 180 185 185 185 185 185 185 185 190 190 190 190 190 195 195 195 195 200 205 207 210 211 230
Pitchers: 160 175 180 185 185 185 190 190 195 195 195 200 200 200 200 205 205 210 210 218 219 220 222 225 225 232