Next: Wilcoxon: Other Alternatives Up: Tests of Hypotheses Previous: A Testing Procedure

  
The Wilcoxon

Simple Example. Suppose our company makes batteries and, in particular, an expensive battery called the XX that is used in the space station. Suppose you have a bright idea for increasing the lifetime of the battery by changing one of the resources used in its manufacture. You call your new battery the YY. Your hypotheses are:

H0: XX batteries and YY batteries have the same lifetime distribution
HA: YY batteries tend to last longer than XX batteries

So you take a sample of XX batteries, say 6 of them, and a sample of 5 YY batteries. These 11 batteries are made under identical conditions (except for the new resource that goes into the YY's). They must also have been made independently of one another. (Good sampling is expensive, but you avoid GIGO.)

The test statistic that we have chosen, T, is simple: just count up the number of times a YY battery beats (lasts longer than) an XX battery. This is called the two-sample Wilcoxon test statistic, which we will refer to as the Wilcoxon test statistic. Note that there are $30 = 5 \times 6$ match-ups between the samples. Under the null hypothesis, H0, you expect T to be $(1/2) \times 30 = 15$; i.e., under H0, in the 30 match-ups you expect the YY battery to last longer than the XX battery half the time and the XX battery to last longer than the YY battery the other half. You reject H0 in favor of HA if T is too large.

Suppose the data are:

XX    49     53     74    111    113    335
YY    62    101    167    174    190
To compute T, just go through each YY data point:
62 beats 2 XX's, namely 49, 53
101 beats 3 XX's, namely 49, 53, 74
167 beats 5 XX's, namely 49, 53, 74, 111, 113
174 beats 5 XX's, namely 49, 53, 74, 111, 113
190 beats 5 XX's, namely 49, 53, 74, 111, 113

So T = 20, which is more than 15. The question is: Is this enough more? We will answer that after a few remarks and exercises.
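The count above can be checked with a few lines of Python. This is only a minimal sketch; the function name `wilcoxon_count` is made up for illustration and is not part of the class code.

```python
def wilcoxon_count(xs, ys):
    """Count the (x, y) match-ups in which the y value is strictly larger."""
    return sum(1 for y in ys for x in xs if y > x)

xx = [49, 53, 74, 111, 113, 335]
yy = [62, 101, 167, 174, 190]

t = wilcoxon_count(xx, yy)
expected_under_h0 = len(xx) * len(yy) / 2  # half of the 30 match-ups

print(t, expected_under_h0)  # 20 15.0
```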


Exercise 9.3.1  
1.
Obtain a comparison dotplot of the two samples (X and Y) below. Let T be the number of times a Y beats an X. Under the null hypothesis, what do you expect T to be? Next compute T.
      X    78    108    121    123    127    140    141
      Y   104    107    119    124    135    136
(Ans: T = 18).
2.
Below are the batting averages of the switch hitters and the left-handed hitters from the baseball data set. Obtain a comparison dotplot of the two samples. Let T be the number of times an average of a left-handed hitter is bigger than the average of a switch hitter. Under the null hypothesis, what do you expect T to be? Next compute T.
    Switch  .212  .218  .236  .242  .251  .251  .254  .261  .270 .282
     Left   .238  .271  .279  .283  .284  .290  .300  .303
(Ans: T = 71).
3.
Consider the following samples of Italian and Etruscan skull sizes. Let T be the number of times an Etruscan skull size is bigger than an Italian skull size. Under the null hypothesis, what do you expect T to be? Next compute T. It's easier if you sort the samples first!
    Ital.  134   132  126  134  131  130  125  132  126
    Etru.  141   145  145  146  142  126  144  146  154  149  143  131
4.
Below are the batting averages of the right-handed hitters and the left-handed hitters from the baseball data set. Dotplot the two samples. Let T be the number of times an average of a left-handed hitter is bigger than the average of a right-handed hitter. Under the null hypothesis, what do you expect T to be? Next compute T .
      Right   .225  .238  .239  .243  .244  .245  .262  .271  .271
              .274  .274  .276  .282  .286  .286

      Left    .238  .271  .279  .283  .284  .290  .300  .303  .240
5.
Did Manuel I shortchange the people by putting less silver in the later mintings? Try to answer this question by comparing the following two data sets (use comparison boxplots). Let T be the number of times a First minting has a higher percentage than a Fourth minting. Under the null hypothesis, what do you expect T to be? Next compute T.
      First:     5.9  6.8   6.4  7.0  6.6  7.7  7.2  6.9  6.2
      Fourth     5.3  5.6   5.5  5.1  6.2  5.8  5.8


General Case.

We need a little notation. In general (not just the battery example), let X1, X2, ..., Xm denote the random sample from the first population and let Y1, Y2, ..., Yn denote the random sample from the second population. Denote the Wilcoxon test statistic by

\begin{displaymath}
T = \#\{Y_j > X_i\}
\end{displaymath}

Read: T is the number of match-ups between a Y and an X in which the Y is larger than the X.

There are $m \times n$ matches. If H0 is true, we expect T to be $\frac{m \times n}{2}$.
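One way to see where $\frac{m \times n}{2}$ comes from is to write T as a sum over the match-ups. The indicator notation $I(\cdot)$ (1 if the event happens, 0 otherwise) is introduced here only for illustration, and the sketch assumes that H0 means the two populations have the same distribution and that ties do not occur, so that each match-up is won by the Y with probability 1/2:

\begin{displaymath}
T = \sum_{i=1}^{m} \sum_{j=1}^{n} I(Y_j > X_i), \qquad
E(T) = \sum_{i=1}^{m} \sum_{j=1}^{n} P(Y_j > X_i) = m \times n \times \frac{1}{2} = \frac{m \times n}{2}
\end{displaymath}

For the battery data, m = 6 and n = 5, which gives the 15 found earlier.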

A table that proves useful here and in the next chapter is the table of differences. Sort each sample. The columns of the table are the sorted Y's and the rows are the sorted X's. The entries in the table are the differences Yj - Xi. The statistic T is just the number of positive differences. Here is the table for the battery data.
          62    101    167    174    190
 49       13     52    118    125    141
 53        9     48    114    121    137
 74      -12     27     93    100    116
111      -49    -10     56     63     79
113      -51    -12     54     61     77
335     -273   -234   -168   -161   -145
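The table of differences is easy to build by machine. The following is a small illustrative sketch (the variable names are arbitrary); it sorts both samples, forms every difference Yj - Xi, and counts the positive ones.

```python
xx = sorted([49, 53, 74, 111, 113, 335])
yy = sorted([62, 101, 167, 174, 190])

# Rows are the sorted X's, columns are the sorted Y's, entries are Y_j - X_i.
diffs = [[y - x for y in yy] for x in xx]

# Print the table with the Y's as a header row and the X's down the side.
print("     " + "".join(f"{y:6d}" for y in yy))
for x, row in zip(xx, diffs):
    print(f"{x:5d}" + "".join(f"{d:6d}" for d in row))

# T is the number of positive differences.
t = sum(d > 0 for row in diffs for d in row)
print("T =", t)  # T = 20
```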

In general, we need to know how large T should be to reject H0 in favor of HA. The key is that very large values of T should be rare if H0 is true. So we calculate the probability that T is greater than or equal to its observed value, assuming that H0 is true. This is called the p-value or the observed significance level of the test. Oh, oh! We need the distribution of T assuming that H0 is true. How do we get that? What's that? Resampling! That's right. We approximate this distribution by resampling.

We need to resample assuming H0 is true. We can do this by combining the samples into one large sample of size N = m + n. Then draw N values with replacement from this combined sample, assigning the first m of them to be the new X's and the remaining n to be the new Y's. Note that the null hypothesis is true for these new samples, because both are drawn from the same big combined sample.

Battery Example
Let's try it on the battery data. Recall that the samples are:

XX    49     53     74    111    113    335
YY    62    101    167    174    190
Now combine them into one data set:
Null Population:   49     53     74    111    113    335  62
                  101    167    174    190
Resample with replacement from this data set and assign the first 6 draws to be X's and the last 5 to be Y's. (I did this by mixing the numbers together in a hat, drawing one out, recording it, putting it back in, mixing them up, ETC!!!) Here are the results:
New X's:   335    167     53    335     74     62
New Y's:    62     49     53    174    190
Now compute T. (Here, we are going to get some Y=X, so we will count such a match as 1/2). Hence starting with 62, T = 1.5 + 0 + .5 + 4 + 4 = 10. Recall that the value of T on the original sample was 20. So the event $T \geq 20$ did not occur.

Now do this 1000 times and count the times the event $T \geq 20$ occurs. Divide this number by 1000 and we have the p-value of the test.
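Here is a rough Python sketch of that procedure. It is not the class code; the function names and the `seed` parameter are assumptions made for this illustration. It counts a tie as 1/2, pools the two samples, resamples with replacement, and reports the fraction of resampled T's that are at least as large as the observed T.

```python
import random

def wilcoxon_t(xs, ys):
    """Number of match-ups won by a y, counting a tie (y == x) as 1/2."""
    total = 0.0
    for y in ys:
        for x in xs:
            if y > x:
                total += 1.0
            elif y == x:
                total += 0.5
    return total

def bootstrap_p_value(xs, ys, trials=1000, seed=None):
    """Approximate P(T >= observed T) under H0 by resampling the pooled data."""
    rng = random.Random(seed)
    combined = list(xs) + list(ys)
    m = len(xs)
    observed = wilcoxon_t(xs, ys)
    hits = 0
    for _ in range(trials):
        # Draw N = m + n values with replacement from the pooled sample.
        draw = rng.choices(combined, k=len(combined))
        new_x, new_y = draw[:m], draw[m:]
        if wilcoxon_t(new_x, new_y) >= observed:
            hits += 1
    return hits / trials

xx = [49, 53, 74, 111, 113, 335]
yy = [62, 101, 167, 174, 190]

# The hand-drawn resample from the text gives T = 10.
print(wilcoxon_t([335, 167, 53, 335, 74, 62], [62, 49, 53, 174, 190]))  # 10.0

# The estimate changes from run to run unless a seed is fixed.
print(bootstrap_p_value(xx, yy, trials=1000, seed=1))
```

Because the resamples are random, the estimated p-value will differ somewhat from one run to the next; more trials give a steadier answer.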

The class code discussed below will do this. But for now, here are the results of doing it 100 times. These are the 100 sorted resampled test statistics:

  0.5  2.0  3.5  4.0  4.5  5.0  6.0  7.0  7.5  8.0  8.5  9.0  9.0  9.5  9.5
*10.0 10.5 10.5 10.5 10.5 11.0 11.0 11.5 11.5 11.5 12.0 12.0 12.0 12.0 12.0
 12.0 12.0 12.0 12.5 12.5 12.5 12.5 13.0 13.5 13.5 13.5 13.5 13.5 14.0 14.0
 14.5 14.5 15.0 15.0 15.0 15.5 16.0 16.0 16.0 16.0 16.0 16.5 17.0 17.0 17.5
 17.5 17.5 17.5 17.5 17.5 18.0 18.0 18.0 18.0 18.5 18.5 18.5 19.0 19.5 19.5
 19.5 20.0 20.0 20.0 20.0 20.5 20.5 20.5 20.5 20.5 21.0 21.0 21.5 22.0 22.5
 23.0 23.5 23.5 23.5 23.5 24.0 24.5 26.5 26.5 26.5
I put a * at the resample we just did (i.e., resampled T = 10). How many times was the resampled T at least 20? Well, just count them up: 24 times. So the p-value of the test was .24. That's not too rare! About one out of four times. Hence, we would probably not reject H0. We would conclude: There is insufficient evidence to conclude that Battery YY lasts longer than Battery XX.
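The count can be double-checked by typing the 100 sorted values into a list and counting those that are at least 20 (the event $T \geq 20$). A short sketch, with arbitrary variable names:

```python
# The 100 sorted resampled test statistics listed above.
resampled = [
    0.5, 2.0, 3.5, 4.0, 4.5, 5.0, 6.0, 7.0, 7.5, 8.0, 8.5, 9.0, 9.0, 9.5, 9.5,
    10.0, 10.5, 10.5, 10.5, 10.5, 11.0, 11.0, 11.5, 11.5, 11.5,
    12.0, 12.0, 12.0, 12.0, 12.0,
    12.0, 12.0, 12.0, 12.5, 12.5, 12.5, 12.5, 13.0, 13.5, 13.5,
    13.5, 13.5, 13.5, 14.0, 14.0,
    14.5, 14.5, 15.0, 15.0, 15.0, 15.5, 16.0, 16.0, 16.0, 16.0,
    16.0, 16.5, 17.0, 17.0, 17.5,
    17.5, 17.5, 17.5, 17.5, 17.5, 18.0, 18.0, 18.0, 18.0, 18.5,
    18.5, 18.5, 19.0, 19.5, 19.5,
    19.5, 20.0, 20.0, 20.0, 20.0, 20.5, 20.5, 20.5, 20.5, 20.5,
    21.0, 21.0, 21.5, 22.0, 22.5,
    23.0, 23.5, 23.5, 23.5, 23.5, 24.0, 24.5, 26.5, 26.5, 26.5,
]

observed = 20  # T computed from the original battery samples

# Count the resampled T's that are at least as large as the observed T.
count = sum(t >= observed for t in resampled)
p_value = count / len(resampled)

print(len(resampled), count, p_value)  # 100 24 0.24
```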

There is nothing like a picture of a p-value . Here's a dotplot of the 100 resampled T's.

                                  :
                                  :  .    .  :     .
                               : .:: :  . :  ::. .::     :     .
           .  .  .... . ....::.:::::.::::.:.::::.::::....:..   :
          +---------+---------+---------+---------X---------+-------C1
        0.0       5.0      10.0      15.0      20.0      25.0
I put an X on 20. If you count the dots from 20 on you will get 24. If this were a histogram, .24 would be the shaded area to the right of 20.

Now you try it with the class code (Two-Sample bootstrap Wilcoxon statistic). Drop the XX and YY samples into the boxes (they are printed below), enter 100 for the number of trials, and click submit. You will get back 100 sorted Wilcoxon statistics. Determine the p-value; i.e., the proportion of resampled T's that are at least 20. The data are:

XX    49     53     74    111    113    335
YY    62    101    167    174    190


Exercise 9.3.2  
1.
In the last set of exercises, you obtained T, the number of times a Y beats an X. Now use the class code (Two-Sample bootstrap Wilcoxon statistic (Sorted)) to compute the p-value based on 100 trials.
      X    78    108    121    123    127    140    141
      Y   104    107    119    124    135    136
2.
Below are the batting averages of the switch hitters and the left-handed hitters from the baseball data set. Let T be the number of times an average of a left-handed hitter is bigger than the average of a switch hitter. Recall T = 71. Now use the class code (Two-Sample bootstrap Wilcoxon statistic (Sorted)) to compute the p-value based on 100 trials.
    Switch  .212  .218  .236  .242  .251  .251  .254  .261  .270  .282
     Left   .238  .271  .279  .283  .284  .290  .300  .303
3.
Consider the following samples of Italian and Etruscan skull sizes. Let T be the number of times an Etruscan skull size is bigger than an Italian skull size. You computed T in the last set of exercises. Now use the class code (Two-Sample bootstrap Wilcoxon statistic (Sorted)) to compute the p-value based on 100 trials.
    Ital.  134  132  126  134  131  130  130  125  132  126
    Etru.  141  145  145  146  142  126  144  146  154  149  143  131
4.
Below are the batting averages of the right-handed hitters and the left-handed hitters from the baseball data set. Let T be the number of times an average of a left-handed hitter is bigger than the average of a right-handed hitter. You computed T in the last set of exercises. Now use the class code (Two-Sample bootstrap Wilcoxon statistic (Sorted)) to compute the p-value based on 100 trials.
      Right   .225  .238  .239  .243  .244  .245  .262  .271  .271
              .274  .274  .276  .282  .286  .286

      Left    .238  .271  .279  .283  .284  .290  .300  .303  .240
5.
Did Manuel I shortchange the people by putting less silver in the later mintings? Try to answer this question by comparing the following two data sets (use comparison boxplots). Let T be the number of times a First minting has a higher percentage than a Fourth minting. You computed T in the last set of exercises. Now use the class code (Two-Sample bootstrap Wilcoxon statistic (Sorted)) to compute the p-value based on 100 trials.
      First:     5.9  6.8   6.4  7.0  6.6  7.7  7.2  6.9  6.2
      Fourth     5.3  5.6   5.5  5.1  6.2  5.8  5.8



2001-01-01