The test statistic that we have chosen, T, is simple: just count up the number of
times a YY battery beats (lasts longer than) an XX battery. This is called the two-sample Wilcoxon test statistic, which we will refer to as the Wilcoxon test statistic. Note that there
are 6 × 5 = 30 match ups between the samples. Under the null hypothesis, H_{0}, you expect T to be 30/2 = 15; i.e., under H_{0}, in the 30 match ups, you expect half the time that the YY battery will last longer
than the XX battery and half the time that the XX battery will last longer than the YY battery. You reject H_{0} in favor of H_{A} if T is too large.
Suppose the data are:

XX   49   53   74  111  113  335
YY   62  101  167  174  190

To compute T, just go through each YY data point:

 62 beats 2 XX's, namely 49, 53
101 beats 3 XX's, namely 49, 53, 74
167 beats 5 XX's, namely 49, 53, 74, 111, 113
174 beats 5 XX's, namely 49, 53, 74, 111, 113
190 beats 5 XX's, namely 49, 53, 74, 111, 113

So T = 2 + 3 + 5 + 5 + 5 = 20, which is more than 15. The question is: Is this enough
more? We will answer that after a few remarks and exercises.
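The hand count can be checked with a few lines of Python. This is only a minimal sketch: the function name wilcoxon_t is made up for illustration and is not the class code.

```python
# Battery lifetimes copied from the example.
xx = [49, 53, 74, 111, 113, 335]
yy = [62, 101, 167, 174, 190]

def wilcoxon_t(x, y):
    """Count the match ups in which the y value is strictly larger than the x value."""
    return sum(1 for yj in y for xi in x if yj > xi)

t_obs = wilcoxon_t(xx, yy)
match_ups = len(xx) * len(yy)

print(t_obs)          # 20
print(match_ups)      # 30
print(match_ups / 2)  # 15.0, the value expected under H_0
```

Ties are not handled here because the battery data have none.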
X  78 108 121 123 127 140 141
Y  104 107 119 124 135 136
(Ans: T = 18).

Switch .212 .218 .236 .242 .251 .251 .254 .261 .270 .282
Left   .238 .271 .279 .283 .284 .290 .300 .303
(Ans: T = 71).

Ital. 134 132 126 134 131 130 125 132 126
Etru. 141 145 145 146 142 126 144 146 154 149 143 131

Right .225 .238 .239 .243 .244 .245 .262 .271 .271 .274 .274 .276 .282 .286 .286
Left  .238 .271 .279 .283 .284 .290 .300 .303 .240

First:  5.9 6.8 6.4 7.0 6.6 7.7 7.2 6.9 6.2
Fourth: 5.3 5.6 5.5 5.1 6.2 5.8 5.8

General Case.
We need a little notation. In general (not just the battery example), let
X_{1}, X_{2}, ..., X_{m} denote the random sample from the first population and let
Y_{1}, Y_{2}, ..., Y_{n} denote the random sample from the second population. Denote the Wilcoxon test statistic by
T = #{Y_{j} > X_{i}}.
Read: T is the number of matches between Y and X in which Y is larger than X.
There are mn matches. If H_{0} is true, we expect T to be mn/2.
Actually, a table that proves useful here and in the next chapter is the table of differences. Sort each sample. The columns of the table are the sorted Y's and the rows are the sorted X's. The entries in the table are the differences Y_{j} - X_{i}. The statistic T is just the number of positive differences. Here is the table for the battery data.
X \ Y | 62 | 101 | 167 | 174 | 190 |
49 | 13 | 52 | 118 | 125 | 141 |
53 | 9 | 48 | 114 | 121 | 137 |
74 | -12 | 27 | 93 | 100 | 116 |
111 | -49 | -10 | 56 | 63 | 79 |
113 | -51 | -12 | 54 | 61 | 77 |
335 | -273 | -234 | -168 | -161 | -145 |
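The table of differences is easy to build by machine. The sketch below (variable names are illustrative only) rebuilds the battery table and counts the positive entries.

```python
# Sorted battery samples: X's label the rows, Y's label the columns.
xx = sorted([49, 53, 74, 111, 113, 335])
yy = sorted([62, 101, 167, 174, 190])

# Entry in row i, column j is Y_j - X_i.
diffs = [[yj - xi for yj in yy] for xi in xx]

# Print the table with a header row of Y values.
print("      " + "".join(f"{yj:6d}" for yj in yy))
for xi, row in zip(xx, diffs):
    print(f"{xi:6d}" + "".join(f"{d:6d}" for d in row))

# T is the number of positive differences.
t = sum(1 for row in diffs for d in row if d > 0)
print("T =", t)  # T = 20
```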
In general, we need to know how large T should be to reject H_{0}
in favor of H_{A}. The key is that very large values of T should
be rare if H_{0} is true. So we calculate the probability
that T is greater than or equal to the observed value of T, assuming
that H_{0} is true. This is called the p-value
or the observed significance level of the test. Oh, oh! We need
the distribution of T assuming that H_{0} is true.
How do we get that? What's that? Resampling! That's right. We approximate
this distribution by resampling.
We need to resample assuming H_{0} is true. We can do
this by combining the samples into one large sample of size N = m + n. Then draw N values with replacement from this combined sample,
assigning the first m of these values to be the new X's and the remaining
n of these values to be the new Y's. Note that the null hypothesis
is true for these new samples, since they all come from the same big combined sample.
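One resampling step can be sketched in Python as follows. The function name resample_under_null and the seed are assumptions made for illustration; the class code may do this differently.

```python
import random

def resample_under_null(x, y, rng):
    """Draw m + n values with replacement from the combined sample,
    then split them into new X's (first m) and new Y's (remaining n)."""
    combined = list(x) + list(y)
    draws = rng.choices(combined, k=len(combined))  # sampling with replacement
    m = len(x)
    return draws[:m], draws[m:]

rng = random.Random(0)  # arbitrary seed, only so the run is repeatable
xx = [49, 53, 74, 111, 113, 335]
yy = [62, 101, 167, 174, 190]

new_x, new_y = resample_under_null(xx, yy, rng)
print("New X's:", new_x)
print("New Y's:", new_y)
```

Because every new value is drawn from the same pool, the new X's and new Y's come from the same distribution, which is what H_{0} says.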
Battery Example
Let's try it on the battery data. Recall that the samples are:
XX   49   53   74  111  113  335
YY   62  101  167  174  190

Now combine them into one data set:

Null Population: 49 53 74 111 113 335 62 101 167 174 190

Resample with replacement from this data set and assign the first 6 to be X's and the last 5 to be Y's. (I did this by mixing the numbers together in a hat, drawing one out, recording it, putting it back in, mixing them up, etc.) Here are the results:

New X's: 335 167 53 335 74 62
New Y's: 62 49 53 174 190

Now compute T. (Here, we are going to get some Y = X, so we will count such a match as 1/2.) Hence, starting with 62, T = 1.5 + 0 + .5 + 4 + 4 = 10. Recall that the value of T on the original sample was 20. Since 10 is less than 20, the event (a resampled T at least as large as the observed T) did not occur.
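The tie rule can be checked on this one resample with a short Python sketch. The helper name wins_with_ties is made up for illustration.

```python
def wins_with_ties(x, yj):
    """Match ups won by one y value against all x's; a tie counts as 1/2."""
    return sum(1.0 if yj > xi else 0.5 if yj == xi else 0.0 for xi in x)

# The resample drawn from the hat.
new_x = [335, 167, 53, 335, 74, 62]
new_y = [62, 49, 53, 174, 190]

per_y = [wins_with_ties(new_x, yj) for yj in new_y]
t_resampled = sum(per_y)

print(per_y)              # [1.5, 0.0, 0.5, 4.0, 4.0]
print(t_resampled)        # 10.0
print(t_resampled >= 20)  # False: the event did not occur
```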
Now do this 1000 times and count the times the event
occurs. Divide this number by 1000 and we have the p-value of the
test.
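The whole procedure fits in a short Python sketch. It is not the class code: the function names, the default of 1000 trials, and the seed are assumptions made here for illustration, and the p-value it prints will vary from seed to seed.

```python
import random

def wilcoxon_t(x, y):
    """Number of match ups won by y; a tie counts as 1/2."""
    return sum(1.0 if yj > xi else 0.5 if yj == xi else 0.0
               for yj in y for xi in x)

def resampling_p_value(x, y, trials=1000, seed=None):
    """Estimate P(T >= observed T) under H_0 by resampling with
    replacement from the combined sample."""
    rng = random.Random(seed)
    combined = list(x) + list(y)
    m = len(x)
    t_obs = wilcoxon_t(x, y)
    hits = 0
    for _ in range(trials):
        draws = rng.choices(combined, k=len(combined))
        # First m draws are the new X's, the rest are the new Y's.
        if wilcoxon_t(draws[:m], draws[m:]) >= t_obs:
            hits += 1
    return hits / trials

xx = [49, 53, 74, 111, 113, 335]
yy = [62, 101, 167, 174, 190]
p = resampling_p_value(xx, yy, trials=1000, seed=1)
print("estimated p-value:", p)
```

The comparison uses >= so that it matches the definition of the p-value as the probability that T is greater than or equal to the observed value.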
The class code discussed below will do this. But for now, here are the results of doing it 100 times. These are the 100 sorted resampled test statistics:
 0.5  2.0  3.5  4.0  4.5  5.0  6.0  7.0  7.5  8.0
 8.5  9.0  9.0  9.5  9.5 *10.0 10.5 10.5 10.5 10.5
11.0 11.0 11.5 11.5 11.5 12.0 12.0 12.0 12.0 12.0
12.0 12.0 12.0 12.5 12.5 12.5 12.5 13.0 13.5 13.5
13.5 13.5 13.5 14.0 14.0 14.5 14.5 15.0 15.0 15.0
15.5 16.0 16.0 16.0 16.0 16.0 16.5 17.0 17.0 17.5
17.5 17.5 17.5 17.5 17.5 18.0 18.0 18.0 18.0 18.5
18.5 18.5 19.0 19.5 19.5 19.5 20.0 20.0 20.0 20.0
20.5 20.5 20.5 20.5 20.5 21.0 21.0 21.5 22.0 22.5
23.0 23.5 23.5 23.5 23.5 24.0 24.5 26.5 26.5 26.5

I put a * at the resample we just did (i.e., resampled T = 10). How many times was the resampled T at least 20? Well, just count them up: 24 times. So the p-value of the test is .24. That's not too rare! About one out of four times. Hence, we would probably not reject H_{0}. We would conclude: There is insufficient evidence to conclude that Battery YY lasts longer than Battery XX.
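Counting by eye is error prone, so here is a small Python check of the count, using the 100 resampled values exactly as listed. The variable names are illustrative only.

```python
# The 100 sorted resampled Wilcoxon statistics from the text.
t_values = [
    0.5, 2.0, 3.5, 4.0, 4.5, 5.0, 6.0, 7.0, 7.5, 8.0,
    8.5, 9.0, 9.0, 9.5, 9.5, 10.0, 10.5, 10.5, 10.5, 10.5,
    11.0, 11.0, 11.5, 11.5, 11.5, 12.0, 12.0, 12.0, 12.0, 12.0,
    12.0, 12.0, 12.0, 12.5, 12.5, 12.5, 12.5, 13.0, 13.5, 13.5,
    13.5, 13.5, 13.5, 14.0, 14.0, 14.5, 14.5, 15.0, 15.0, 15.0,
    15.5, 16.0, 16.0, 16.0, 16.0, 16.0, 16.5, 17.0, 17.0, 17.5,
    17.5, 17.5, 17.5, 17.5, 17.5, 18.0, 18.0, 18.0, 18.0, 18.5,
    18.5, 18.5, 19.0, 19.5, 19.5, 19.5, 20.0, 20.0, 20.0, 20.0,
    20.5, 20.5, 20.5, 20.5, 20.5, 21.0, 21.0, 21.5, 22.0, 22.5,
    23.0, 23.5, 23.5, 23.5, 23.5, 24.0, 24.5, 26.5, 26.5, 26.5,
]

t_obs = 20
count = sum(1 for t in t_values if t >= t_obs)  # resamples at least as large as 20
p_value = count / len(t_values)

print(len(t_values))  # 100
print(count)          # 24
print(p_value)        # 0.24
```

Note that the count of 24 includes the four resamples equal to 20.0; only 20 of the values are strictly greater than 20.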
There is nothing like a picture of a p-value. Here's a dotplot of the 100 resampled T's.

[Dotplot of the 100 resampled T's; the horizontal axis runs from 0.0 to 25.0, with an X marked at 20.0.]

I put an X on 20. If you count the dots from 20 on you will get 24. If this were a histogram, .24 would be the shaded area to the right of 20.
Now you try it with the class code (Two-Sample bootstrap Wilcoxon statistic). Drop the XX and YY samples into the boxes (they are printed below), enter 100 for the number of trials, and click submit. You will get back 100 sorted Wilcoxon statistics. Determine the p-value; i.e., the proportion of resampled T's that are at least 20. The data are:

XX   49   53   74  111  113  335
YY   62  101  167  174  190

X  78 108 121 123 127 140 141
Y  104 107 119 124 135 136

Switch .212 .218 .236 .242 .251 .251 .254 .261 .270 .282
Left   .238 .271 .279 .283 .284 .290 .300 .303

Ital. 134 132 126 134 131 130 130 125 132 126
Etru. 141 145 145 146 142 126 144 146 154 149 143 131

Right .225 .238 .239 .243 .244 .245 .262 .271 .271 .274 .274 .276 .282 .286 .286
Left  .238 .271 .279 .283 .284 .290 .300 .303 .240

First:  5.9 6.8 6.4 7.0 6.6 7.7 7.2 6.9 6.2
Fourth: 5.3 5.6 5.5 5.1 6.2 5.8 5.8