Next: Difference Between Proportions
Up: Estimation of Effect :
Previous: Estimation and Confidence Interval
There are estimation schemes other than the one based on the Wilcoxon.
A commonly used one is the difference of the means. The discussion parallels
the one above, except that instead of the median of the differences
we consider the difference of the sample means.
Again, suppose we have two populations which we assume differ by at
most a shift in location. Let $\Delta$ be the difference in locations:
Population Y - Population X. Remember, shift is shift. So here we write
$\Delta = \mu_Y - \mu_X$,
where $\mu_X$ and $\mu_Y$ are the true population means of Populations X and Y, respectively. We draw random samples from each population. Let
$X_1, X_2, \ldots, X_m$ denote the sample of size $m$ from the first population. Let
$Y_1, Y_2, \ldots, Y_n$ denote the sample of size $n$ from the second population. Our estimate
$\hat{\Delta}$ of $\Delta$ is
$\hat{\Delta} = \bar{Y} - \bar{X}$.
Suppose the samples are:
Then $\bar{X} = 12$ and $\bar{Y} = 19$.
Hence the estimate of $\Delta$ is
$\bar{Y} - \bar{X} = 19 - 12 = 7$.
This is only an estimate, so once again we need a confidence
interval. The algorithm discussed in the last section still works;
simply replace the median of differences with the difference in means:
1. Resample m X's with replacement.
2. Resample n Y's with replacement.
3. Obtain the difference in sample means of these resamples.
4. Record this difference.
5. Repeat steps (1) through (4) 1000 times.
6. Sort the 1000 differences in means.
7. Pick off the 25th and 976th sorted differences in means.
These two values are the endpoints of our 95% confidence interval.
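The seven steps above can be sketched in a few lines of Python. This is only an illustration, not the class code: the function name `bootstrap_mean_diff_ci`, the fixed seed, and the use of the small data set from problem 1 of Exercise 10.3.1 are assumptions made for the example.

```python
import random
from statistics import mean

def bootstrap_mean_diff_ci(x, y, n_resamples=1000, seed=0):
    """Percentile bootstrap interval for mean(Y) - mean(X), following steps 1-7."""
    rng = random.Random(seed)  # fixed seed (an assumption) so that runs are repeatable
    diffs = []
    for _ in range(n_resamples):                   # step 5: repeat 1000 times
        x_star = rng.choices(x, k=len(x))          # step 1: resample m X's with replacement
        y_star = rng.choices(y, k=len(y))          # step 2: resample n Y's with replacement
        diffs.append(mean(y_star) - mean(x_star))  # steps 3-4: compute and record
    diffs.sort()                                   # step 6
    k = n_resamples // 40                          # 2.5% of the resamples (25 when there are 1000)
    # step 7: the 25th and 976th sorted values (1-based) sit at indices 24 and 975
    return diffs[k - 1], diffs[n_resamples - k]

x = [12, 15, 18]
y = [16, 19, 25, 28]
print("point estimate:", mean(y) - mean(x))  # 22 - 15 = 7
print("approximate 95% interval:", bootstrap_mean_diff_ci(x, y))
```

The endpoints change a little from one seed to another, which is expected for a resampling procedure.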
Doing this by hand becomes very tedious, so again we have a class code, Two-Sample hypothesis test and confidence interval for the location parameter based on the mean, to obtain the point estimate and the confidence interval. It works just like the one in the last section.
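In the same way, we could use medians instead of means, taking the difference of the two sample medians as the estimate. Although this seems similar to the procedure using the Wilcoxon, it is quite different: the Wilcoxon estimate is the median of the differences, not the difference in medians.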
Which procedure should we use in practice? That's a hard question to
answer. The interval based on the means is not robust, so if outliers
are present we avoid using it. The other two intervals
are robust. Of these two, I would choose the Wilcoxon: it offers protection
against outliers, and it is also more powerful in most cases, giving shorter
confidence intervals. The exercises will be helpful here.
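The robustness claim can be checked numerically on the small data set from problem 1 of the exercise below. The following is a minimal sketch, assuming that "median of differences" means the median of all m x n pairwise differences Y_j - X_i; the function name `three_estimates` is hypothetical.

```python
from statistics import mean, median

def three_estimates(x, y):
    """Return (median of differences, difference in means, difference in medians), all Y - X."""
    pairwise = [yj - xi for xi in x for yj in y]  # all m*n differences Y_j - X_i
    return median(pairwise), mean(y) - mean(x), median(y) - median(x)

x = [12, 15, 18]
print(three_estimates(x, [16, 19, 25, 28]))    # original data
print(three_estimates(x, [16, 19, 25, 2800]))  # 28 replaced by the outlier 2800
```

On the original data all three estimates equal 7. With the outlier, the difference in means jumps to 700 while the other two stay at 7, which agrees with the answers quoted in the exercise.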
Exercise 10.3.1
1. To investigate the robustness of the three point estimates, consider the following data set:
   X: 12 15 18
   Y: 16 19 25 28
   (a) Obtain the three estimates: median of differences, difference in means, difference in medians. (Answers: 7, 7, 7).
   (b) Next replace the Y observation 28 by 2800. Obtain the three estimates: median of differences, difference in means, difference in medians. (Answers: 7, 700, 7).
2. We will use this problem and the next to investigate the robustness of the confidence intervals.
   (a) Obtain comparison dotplots of the following data:
       X:
       31 32 33 37 37 44 44 45 45 46 50 50 50
       57 57 58 59 59 67 67
       Y:
       40 45 45 47 50 52 53 53 54 54 55 61 63
       66 67 68 73 73 76 83
   (b) Using the class code (Two-Sample Hypothesis and CI (Wilcoxon)), obtain the estimate of $\Delta$ and the confidence interval for it using the Wilcoxon.
   (c) Using the class code (Two-Sample Hypothesis and CI (mean)), obtain the estimate of $\Delta$ and the confidence interval for it using the difference in means.
   (d) Using the class code (Two-Sample Hypothesis and CI (median)), obtain the estimate of $\Delta$ and the confidence interval for it using the difference in medians.
   (e) Compare the intervals.
3. Consider the following samples. They are the same as in the last problem, except that the last data point of the X's, 67, was discovered to be a typo and has been replaced by its true value of 670:
       X:
       31 32 33 37 37 44 44 45 45 46 50 50 50
       57 57 58 59 59 67 670
       Y:
       40 45 45 47 50 52 53 53 54 54 55 61 63
       66 67 68 73 73 76 83
   (a) Using the class code (Two-Sample Hypothesis and CI (Wilcoxon)), obtain the estimate of $\Delta$ and the confidence interval for it using the Wilcoxon.
   (b) Using the class code (Two-Sample Hypothesis and CI (mean)), obtain the estimate of $\Delta$ and the confidence interval for it using the difference in means.
   (c) Using the class code (Two-Sample Hypothesis and CI (median)), obtain the estimate of $\Delta$ and the confidence interval for it using the difference in medians.
   (d) Compare the intervals.
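For readers without access to the class codes, the comparison in problem 3 can be approximated with a short script. This is a rough stand-in, not the class code: it assumes that each interval is the resampling (percentile) interval described in the text, that "median of differences" is the median of all pairwise differences Y_j - X_i, and that the seed is fixed at an arbitrary value. All function names are hypothetical.

```python
import random
from statistics import mean, median

def median_of_differences(x, y):
    # Wilcoxon-type estimate: median of all pairwise differences Y_j - X_i (assumed definition)
    return median(yj - xi for xi in x for yj in y)

def diff_means(x, y):
    return mean(y) - mean(x)

def diff_medians(x, y):
    return median(y) - median(x)

def percentile_ci(stat, x, y, n_resamples=1000, seed=1):
    """Resample both samples with replacement, recompute stat, keep the middle 95%."""
    rng = random.Random(seed)  # arbitrary fixed seed so that runs are repeatable
    values = sorted(
        stat(rng.choices(x, k=len(x)), rng.choices(y, k=len(y)))
        for _ in range(n_resamples)
    )
    k = n_resamples // 40  # 25 when there are 1000 resamples
    return values[k - 1], values[n_resamples - k]  # 25th and 976th sorted values

x = [31, 32, 33, 37, 37, 44, 44, 45, 45, 46, 50, 50, 50,
     57, 57, 58, 59, 59, 67, 670]
y = [40, 45, 45, 47, 50, 52, 53, 53, 54, 54, 55, 61, 63,
     66, 67, 68, 73, 73, 76, 83]

for stat in (median_of_differences, diff_means, diff_medians):
    lo, hi = percentile_ci(stat, x, y)
    print(stat.__name__, "estimate:", stat(x, y), "interval:", (lo, hi), "length:", hi - lo)
```

Changing the last X value back to 67 gives the corresponding results for problem 2, so the two runs can be compared side by side.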
2001-01-01