The test statistic that we have chosen, T, is simple: just count up the number of
times a YY battery beats (lasts longer than) an XX battery. This is called the two-sample Wilcoxon test statistic, which we will refer to as the Wilcoxon test statistic. Note that there
are $6 \times 5 = 30$ match-ups between the samples. Under the null hypothesis, H0,
you expect T to be $30/2 = 15$; i.e., under H0, in the 30 match-ups you expect the YY battery to last longer than the XX battery half the time and the XX battery to last longer than the YY battery the other half. You reject H0 in favor of HA if T is too large.
Suppose the data are:
XX: 49 53 74 111 113 335
YY: 62 101 167 174 190
To compute T, go through the YY data points one at a time and count the XX's each one beats:
62 beats 2 XX's, namely 49, 53
101 beats 3 XX's, namely 49, 53, 74
167 beats 5 XX's, namely 49, 53, 74, 111, 113
174 beats 5 XX's, namely 49, 53, 74, 111, 113
190 beats 5 XX's, namely 49, 53, 74, 111, 113
So T = 2 + 3 + 5 + 5 + 5 = 20, which is more than 15. The question is: is it enough
more? We will answer that after a few remarks and exercises.
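For readers who want to check the hand count by machine, here is a minimal Python sketch. The function name `wilcoxon_t` is our own (it is not the class code), and this version ignores ties (Y = X), which do not occur in the battery data.

```python
def wilcoxon_t(xs, ys):
    """Number of (x, y) match-ups in which y is strictly larger than x."""
    return sum(1 for x in xs for y in ys if y > x)

xx = [49, 53, 74, 111, 113, 335]  # XX battery lifetimes
yy = [62, 101, 167, 174, 190]     # YY battery lifetimes

# How many XX's each YY battery beats, in the order worked above.
per_y = [sum(1 for x in xx if y > x) for y in yy]
print(per_y)                    # [2, 3, 5, 5, 5]
print(wilcoxon_t(xx, yy))       # 20
print(len(xx) * len(yy) / 2)    # 15.0, the value expected under H0
```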
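Exercises. Compute T for each pair of samples below (in each pair, treat the first sample as the X's and the second as the Y's).
X: 78 108 121 123 127 140 141
Y: 104 107 119 124 135 136
(Ans: T = 18).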
Switch: .212 .218 .236 .242 .251 .251 .254 .261 .270 .282
Left: .238 .271 .279 .283 .284 .290 .300 .303
(Ans: T = 71).
Ital.: 134 132 126 134 131 130 125 132 126
Etru.: 141 145 145 146 142 126 144 146 154 149 143 131
Right: .225 .238 .239 .243 .244 .245 .262 .271 .271 .274 .274 .276 .282 .286 .286
Left: .238 .271 .279 .283 .284 .290 .300 .303 .240
First: 5.9 6.8 6.4 7.0 6.6 7.7 7.2 6.9 6.2
Fourth: 5.3 5.6 5.5 5.1 6.2 5.8 5.8
General Case.
We need a little notation. In general (not just the battery example), let
X1, X2, ..., Xm denote the random sample from the first population and let
Y1, Y2, ..., Yn denote the random sample from the second population. Denote the Wilcoxon test statistic by
$$T = \sum_{i=1}^{m} \sum_{j=1}^{n} I(Y_j > X_i),$$
where $I(Y_j > X_i)$ is 1 if $Y_j > X_i$ and 0 otherwise.
Read: T is the number of match-ups between a Y and an X in which the Y is larger than the X.
There are $mn$ match-ups. If H0 is true, we expect T to be $mn/2$.
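A one-line justification of $mn/2$, under an assumption the text leaves implicit (both populations are continuous, so a tie Y = X has probability zero): write T as the sum of indicators $T = \sum_{i=1}^{m} \sum_{j=1}^{n} I(Y_j > X_i)$. Under H0, each $Y_j$ and $X_i$ are independent draws from the same distribution, so by symmetry $P(Y_j > X_i) = 1/2$, and taking expectations term by term gives

$$
E_{H_0}(T) = \sum_{i=1}^{m} \sum_{j=1}^{n} P_{H_0}(Y_j > X_i) = \sum_{i=1}^{m} \sum_{j=1}^{n} \frac{1}{2} = \frac{mn}{2}.
$$

For the battery data, m = 6 and n = 5, which gives 15.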
A table that proves useful here and in the next chapter is the table of differences. Sort each sample. The columns of the table are labeled by the sorted Y's and the rows by the sorted X's; the entry in row i and column j is the difference Yj - Xi. The statistic T is just the number of positive differences. Here is the table for the battery data.
| X \ Y | 62 | 101 | 167 | 174 | 190 |
|---|---|---|---|---|---|
| 49 | 13 | 52 | 118 | 125 | 141 |
| 53 | 9 | 48 | 114 | 121 | 137 |
| 74 | -12 | 27 | 93 | 100 | 116 |
| 111 | -49 | -10 | 56 | 63 | 79 |
| 113 | -51 | -12 | 54 | 61 | 77 |
| 335 | -273 | -234 | -168 | -161 | -145 |
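The table of differences is easy to build programmatically. Here is a minimal Python sketch for the battery data using only built-in functions; the variable names are illustrative. Each row corresponds to one sorted X and each column to one sorted Y, and T is the count of positive entries.

```python
xx = [49, 53, 74, 111, 113, 335]
yy = [62, 101, 167, 174, 190]

xs, ys = sorted(xx), sorted(yy)
# Entry in row i, column j is Y_j - X_i.
table = [[y - x for y in ys] for x in xs]

for x, row in zip(xs, table):
    print(f"{x:>4} | " + " ".join(f"{d:>5}" for d in row))

# T is the number of positive differences (True counts as 1 in sum).
t = sum(d > 0 for row in table for d in row)
print("T =", t)  # T = 20
```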
In general, we need to know how large T must be before we reject H0
in favor of HA. The key is that very large values of T should be rare if H0 is true. So we calculate the probability
that T is greater than or equal to its observed value, assuming
that H0 is true. This is called the p-value,
or the observed significance level, of the test. Oh, oh! For that we need
the distribution of T assuming that H0 is true.
How do we get that? Resampling! That's right: we approximate
this distribution by resampling.
We need to resample assuming H0 is true. We can do
this by combining the samples into one large sample of size N = m + n. Then draw N values with replacement from this combined sample,
assigning the first m draws to be the new X's and the remaining n to be the new Y's. Note that the null hypothesis
is true for these new samples, since both come from the same big combined sample.
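One null resample, as just described, can be sketched in Python as follows. This is an illustration under our own assumptions, not the class code: `null_resample` is a hypothetical name, and the standard library's `random.choices` does the sampling with replacement. The seed is fixed only so the run is repeatable; the draws will not match any particular resample in these notes.

```python
import random

def null_resample(xs, ys, rng=random):
    """Draw one pair of samples under H0 from the combined data.

    All m + n values are pooled, N = m + n values are drawn with
    replacement, the first m draws become the new X's and the
    remaining n become the new Y's.
    """
    m, n = len(xs), len(ys)
    combined = list(xs) + list(ys)
    draws = rng.choices(combined, k=m + n)
    return draws[:m], draws[m:]

rng = random.Random(1)  # seeded only for reproducibility
new_x, new_y = null_resample([49, 53, 74, 111, 113, 335], [62, 101, 167, 174, 190], rng)
print("New X's:", new_x)
print("New Y's:", new_y)
```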
Battery Example
Let's try it on the battery data. Recall that the samples are:
XX: 49 53 74 111 113 335
YY: 62 101 167 174 190
Now combine them into one data set:
Null Population: 49 53 74 111 113 335 62 101 167 174 190
Resample 11 values with replacement from this data set and assign the first 6 to
be the new X's and the last 5 to be the new Y's. (I did this by mixing the
numbers together in a hat, drawing one out, recording it, putting it back
in, mixing them up again, and so on!) Here are the results:
New X's: 335 167 53 335 74 62
New Y's: 62 49 53 174 190
Now compute T. (Because we sample with replacement, some match-ups will have Y = X; we count each such tie as 1/2.) Taking the new Y's in order, starting with 62, T = 1.5 + 0 + 0.5 + 4 + 4 = 10. Recall that the value of T on the original sample was 20. Writing $T^*$ for the value of T on a resample, the event $T^* \ge 20$ did not occur for this resample.
Now do this 1000 times and count the number of times the event $T^* \ge 20$
occurs. Divide this count by 1000 and we have an estimate of the p-value of the
test.
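The whole procedure (compute T with ties counted as 1/2, resample under H0 many times, and report the fraction of resampled T's at or above the observed T) can be sketched in Python. The names `wilcoxon_t` and `resampled_p_value` are hypothetical and the class code may differ in details; because the resamples are random, the estimated p-value changes with the seed and the number of trials.

```python
import random

def wilcoxon_t(xs, ys):
    """T = number of (x, y) match-ups with y > x; each tie y == x counts 1/2."""
    t = 0.0
    for y in ys:
        for x in xs:
            if y > x:
                t += 1.0
            elif y == x:
                t += 0.5
    return t

def resampled_p_value(xs, ys, trials=1000, seed=None):
    """Estimate P(T* >= observed T) by resampling the combined sample under H0."""
    rng = random.Random(seed)
    m, n = len(xs), len(ys)
    combined = list(xs) + list(ys)
    t_obs = wilcoxon_t(xs, ys)
    hits = 0
    for _ in range(trials):
        draws = rng.choices(combined, k=m + n)  # sample N = m + n with replacement
        if wilcoxon_t(draws[:m], draws[m:]) >= t_obs:
            hits += 1
    return hits / trials

xx = [49, 53, 74, 111, 113, 335]
yy = [62, 101, 167, 174, 190]

# The resample drawn from the hat: new X's, then new Y's.
print(wilcoxon_t([335, 167, 53, 335, 74, 62], [62, 49, 53, 174, 190]))  # 10.0

# Estimated p-value from 1000 resamples (varies with the seed).
print(resampled_p_value(xx, yy, trials=1000, seed=2024))
```

The first print reproduces the hand computation T = 10 for the resample drawn from the hat; the second gives one Monte Carlo estimate of the p-value.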
The class code discussed below will do this. But for now, here are the results of doing it 100 times. These are the 100 sorted resampled test statistics:
0.5 2.0 3.5 4.0 4.5 5.0 6.0 7.0 7.5 8.0 8.5 9.0 9.0 9.5 9.5 *10.0 10.5 10.5 10.5 10.5 11.0 11.0 11.5 11.5 11.5 12.0 12.0 12.0 12.0 12.0 12.0 12.0 12.0 12.5 12.5 12.5 12.5 13.0 13.5 13.5 13.5 13.5 13.5 14.0 14.0 14.5 14.5 15.0 15.0 15.0 15.5 16.0 16.0 16.0 16.0 16.0 16.5 17.0 17.0 17.5 17.5 17.5 17.5 17.5 17.5 18.0 18.0 18.0 18.0 18.5 18.5 18.5 19.0 19.5 19.5 19.5 20.0 20.0 20.0 20.0 20.5 20.5 20.5 20.5 20.5 21.0 21.0 21.5 22.0 22.5 23.0 23.5 23.5 23.5 23.5 24.0 24.5 26.5 26.5 26.5
I put a * at the resample we just did (i.e., resampled T = 10). How many of the resampled T's were greater than or equal to 20? Just count them up: 24. So the estimated p-value of the test is 24/100 = .24. That's not too rare: about one time out of four. Hence, we would probably not reject H0. We would conclude: there is insufficient evidence to conclude that Battery YY lasts longer than Battery XX.
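The count can be double-checked with a few lines of Python over the 100 values listed above (the * marker dropped). Note that the 24 includes the four resampled T's equal to 20, matching the definition of the p-value as the probability that T is greater than or equal to the observed value; counting only values strictly above 20 would give 20 instead.

```python
# The 100 sorted resampled T's from the notes, ten per row.
resampled = [
    0.5, 2.0, 3.5, 4.0, 4.5, 5.0, 6.0, 7.0, 7.5, 8.0,
    8.5, 9.0, 9.0, 9.5, 9.5, 10.0, 10.5, 10.5, 10.5, 10.5,
    11.0, 11.0, 11.5, 11.5, 11.5, 12.0, 12.0, 12.0, 12.0, 12.0,
    12.0, 12.0, 12.0, 12.5, 12.5, 12.5, 12.5, 13.0, 13.5, 13.5,
    13.5, 13.5, 13.5, 14.0, 14.0, 14.5, 14.5, 15.0, 15.0, 15.0,
    15.5, 16.0, 16.0, 16.0, 16.0, 16.0, 16.5, 17.0, 17.0, 17.5,
    17.5, 17.5, 17.5, 17.5, 17.5, 18.0, 18.0, 18.0, 18.0, 18.5,
    18.5, 18.5, 19.0, 19.5, 19.5, 19.5, 20.0, 20.0, 20.0, 20.0,
    20.5, 20.5, 20.5, 20.5, 20.5, 21.0, 21.0, 21.5, 22.0, 22.5,
    23.0, 23.5, 23.5, 23.5, 23.5, 24.0, 24.5, 26.5, 26.5, 26.5,
]

t_obs = 20
count_at_least = sum(t >= t_obs for t in resampled)  # T* >= 20, as the p-value requires
count_above = sum(t > t_obs for t in resampled)      # strictly greater, for comparison
print(len(resampled), count_at_least, count_above)   # 100 24 20
print("estimated p-value:", count_at_least / len(resampled))  # 0.24
```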
There is nothing like a picture of a p-value. Here's a dotplot of the 100 resampled T's.
[Dotplot of the 100 resampled T's; horizontal axis C1 running from 0.0 to 25.0, with an X marking 20.0.]
I put an X on 20. If you count the dots from 20 on you will get 24. If
this were a histogram, .24 would be the shaded area to the right of 20.
Now you try it with the class code (Two-Sample bootstrap Wilcoxon statistic). Drop the XX and YY samples (printed below) into the boxes, enter 100 for the number of trials, and click submit. You will get back 100 sorted resampled Wilcoxon statistics. Determine the p-value, i.e., the proportion of resampled T's that are greater than or equal to 20. The data are:
XX: 49 53 74 111 113 335
YY: 62 101 167 174 190
X: 78 108 121 123 127 140 141
Y: 104 107 119 124 135 136
Switch: .212 .218 .236 .242 .251 .251 .254 .261 .270 .282
Left: .238 .271 .279 .283 .284 .290 .300 .303
Ital.: 134 132 126 134 131 130 130 125 132 126
Etru.: 141 145 145 146 142 126 144 146 154 149 143 131
Right: .225 .238 .239 .243 .244 .245 .262 .271 .271 .274 .274 .276 .282 .286 .286
Left: .238 .271 .279 .283 .284 .290 .300 .303 .240
First: 5.9 6.8 6.4 7.0 6.6 7.7 7.2 6.9 6.2
Fourth: 5.3 5.6 5.5 5.1 6.2 5.8 5.8