
Estimation and Confidence Intervals Based on Means and Medians

There are estimation schemes other than the one based on the Wilcoxon. One that is commonly used is the difference of the means. It is similar to the above discussion except that, instead of the median of the differences, we consider the difference of the sample means.

Again, suppose we have two populations which we assume differ by at most a shift in location. Let $\Delta$ be the difference in locations: Population Y minus Population X. A shift is a shift whichever measure of location we use, so here we can write $\Delta = \mu_2 - \mu_1$, where $\mu_1$ and $\mu_2$ are the true population means of Populations X and Y, respectively. We draw random samples from each population. Let $X_1, X_2, \ldots, X_m$ denote the sample of size $m$ from the first population. Let $Y_1, Y_2, \ldots, Y_n$ denote the sample of size $n$ from the second population. Our estimate of $\Delta$ is $\bar{Y} - \bar{X}$.
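
Written out in terms of the observations, and using $\hat{\Delta}$ as a label for the estimate (this symbol is our own shorthand here, not notation from the class code), the estimate is:

$$\hat{\Delta} = \bar{Y} - \bar{X} = \frac{1}{n}\sum_{j=1}^{n} Y_j \; - \; \frac{1}{m}\sum_{i=1}^{m} X_i .$$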

Suppose the samples are:

Then $\bar{X} = 12$ and $\bar{Y} = 19$. Hence the estimate of $\Delta$ is 19 - 12 = 7.

This is only an estimate, so once again we need a confidence interval. The algorithm discussed in the last section still works: simply replace the median of differences with the difference in means; i.e.,

1.
Resample m X's with replacement.

2.
Resample n Y 's with replacement.

3.
Obtain the difference in sample means of these resamples.

4.
Record this difference.

5.
Repeat steps (1) through (4) 1000 times.

6.
Sort the 1000 differences in means.

7.
Pick off the 25th and 976th sorted differences in means. This is our 95% confidence interval.

This becomes very tedious by hand, so again we have a class code, Two-Sample hypothesis test and confidence interval for the location parameter based on the mean, to obtain the point estimate and the confidence interval. It works just like the one in the last section.
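
For readers who want to see the resampling steps outside the class code, here is a minimal Python sketch of them. It is not the class code itself: the function name `bootstrap_diff_ci`, its parameters, and the fixed random seed are assumptions made for illustration, and NumPy is assumed to be available. The data are the X and Y samples from Exercise 10.3.1, problem 2.

```python
import numpy as np

def bootstrap_diff_ci(x, y, statistic=np.mean, n_boot=1000, seed=0):
    """Percentile bootstrap interval for statistic(Y) - statistic(X).

    Hypothetical helper for illustration; not the class code.
    """
    if n_boot < 40:
        raise ValueError("n_boot must be at least 40 for a 95% interval")
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)

    diffs = np.empty(n_boot)
    for b in range(n_boot):                               # step 5: repeat
        xs = rng.choice(x, size=x.size, replace=True)     # step 1: resample m X's
        ys = rng.choice(y, size=y.size, replace=True)     # step 2: resample n Y's
        diffs[b] = statistic(ys) - statistic(xs)          # steps 3 and 4
    diffs.sort()                                          # step 6

    # Step 7: with n_boot = 1000 these are the 25th and 976th sorted values
    # (1-based ranks), i.e. indices 24 and 975.
    lo_rank = int(round(0.025 * n_boot))
    hi_rank = n_boot - lo_rank + 1
    estimate = statistic(y) - statistic(x)
    return estimate, diffs[lo_rank - 1], diffs[hi_rank - 1]

# Data from Exercise 10.3.1, problem 2.
x = [31, 32, 33, 37, 37, 44, 44, 45, 45, 46,
     50, 50, 50, 57, 57, 58, 59, 59, 67, 67]
y = [40, 45, 45, 47, 50, 52, 53, 53, 54, 54,
     55, 61, 63, 66, 67, 68, 73, 73, 76, 83]

est, lo, hi = bootstrap_diff_ci(x, y)
print("estimate:", est, "95% interval:", (lo, hi))
```

The interval endpoints depend on the random resamples, so they change with the seed and will not match the class code exactly. Passing `statistic=np.median` gives the corresponding interval based on the difference in medians.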

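In the same way, we could use medians instead of means. Although this seems similar to the procedure using the Wilcoxon, it is quite different: the Wilcoxon estimate is the median of the differences, whereas this one is the difference of the two sample medians.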

Which procedure should we use in practice? That's a hard question to answer. The interval based on the means is not robust, so if outliers are present, we avoid using it. The other two intervals are robust. Of these two, I would choose the Wilcoxon: it offers protection against outliers, and it is also more powerful in most cases, giving shorter confidence intervals. The exercises will be helpful here.
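
The robustness claim can be checked numerically. The following Python sketch computes the three point estimates on the small data set from Exercise 10.3.1, problem 1, before and after one Y value is replaced by an outlier. The function name `three_estimates` is hypothetical, and we assume that "median of differences" means the median of all pairwise differences $Y_j - X_i$, which agrees with the answers quoted in the exercise.

```python
import numpy as np

def three_estimates(x, y):
    """Return the three point estimates of the shift (Y minus X).

    Hypothetical helper for illustration; not the class code.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # All n*m pairwise differences Y_j - X_i (assumed meaning of
    # "median of differences").
    pairwise = np.subtract.outer(y, x).ravel()
    return {
        "median_of_differences": float(np.median(pairwise)),
        "difference_in_means": float(y.mean() - x.mean()),
        "difference_in_medians": float(np.median(y) - np.median(x)),
    }

x = [12, 15, 18]
clean = three_estimates(x, [16, 19, 25, 28])
outlier = three_estimates(x, [16, 19, 25, 2800])

print(clean)    # all three estimates equal 7.0
print(outlier)  # difference_in_means becomes 700.0; the other two stay at 7.0
```

A single wild observation moves the difference in means from 7 to 700, while the two median-based estimates do not move at all.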


Exercise 10.3.1  
1.
To investigate the robustness of the three point estimates, consider the following data set:
      X   12 15 18
      Y   16 19 25 28
(a)
Obtain the three estimates: median of differences, difference in means, difference in medians. (Answers: 7, 7, 7).
(b)
Next replace the Y observation 28 by 2800. Obtain the three estimates: median of differences, difference in means, difference in medians. (Answers: 7, 700, 7).
2.
We will use the next two problems to investigate the robustness of the confidence intervals.
(a)
Obtain comparison dotplots of the following data:
   X:
    31   32   33   37   37   44   44   45   45   46   50   50   50 
    57   57   58   59   59   67   67

   Y:
    40   45   45   47   50   52   53   53   54   54   55   61    63 
    66   67   68   73   73   76   83
(b)
Using the class code (Two-Sample Hypothesis and CI (Wilcoxon)), obtain the estimate of $\Delta$ and the confidence interval for it using the Wilcoxon.
(c)
Using the class code (Two-Sample Hypothesis and CI (mean)), obtain the estimate of $\Delta$ and the confidence interval for it using the difference in means.
(d)
Using the class code (Two-Sample Hypothesis and CI (median)), obtain the estimate of $\Delta$ and the confidence interval for it using the difference in medians.
(e)
Compare the intervals.
3.
Consider the following samples. They are the same as in the last problem, except that the last X value, 67, turned out to be a typo; its true value, 670, has been put in:
   X:
    31   32   33   37   37   44   44   45   45   46   50   50   50 
    57   57   58   59   59   67   670

   Y:
    40   45   45   47   50   52   53   53   54   54   55   61   63 
    66   67   68   73   73   76   83
(a)
Using the class code (Two-Sample Hypothesis and CI (Wilcoxon)), obtain the estimate of $\Delta$ and the confidence interval for it using the Wilcoxon.
(b)
Using the class code (Two-Sample Hypothesis and CI (mean)), obtain the estimate of $\Delta$ and the confidence interval for it using the difference in means.
(c)
Using the class code (Two-Sample Hypothesis and CI (median)), obtain the estimate of $\Delta$ and the confidence interval for it using the difference in medians.
(d)
Compare the intervals.



2001-01-01