There are many other two-sample problems that time does not allow us to
consider. You can always sign up for another stat class, and I'll be happy
to recommend some to you. But we would be remiss if we didn't discuss the
difference-in-proportions problem. Besides, it's a cinch with resampling.
So consider two population proportions. Examples are far too numerous
to list here. One that comes up all too often: the president's rating
this month versus his rating last month. Another, a sign of
the times: Candidate A is worried about his financial backing. He needs
to show that his popularity (i.e., the proportion of voters who will vote for him) is
rising in order to attract more money. So he decides to come out
strongly for (or against) an issue that is very popular with certain segments
of the population. He then talks about it nonstop on morning, afternoon,
and evening talk shows. Population I consists of the voters before this takes place, and Population II consists of the voters after this change and the subsequent push on the talk shows; in each, the quantity of interest is the proportion who favor him. He must convince his backers that this proportion has increased.
Ah, notation, but it's simple here. Just let p1 and p2
be the population proportions of Populations I and II, respectively. We
are interested in estimating p2 - p1 and determining a confidence interval
for it. So we draw random samples of sizes m and n from Populations I and II,
respectively. Our estimate of p2 - p1 is just the difference in sample proportions. We will
do the CI next, but first let's look at an example:
There are two different treatments (Drug I and Drug II) for a certain disease. Which is better? A scientist comes up with the following plan. He selects 100 patients who have the disease and assigns each of them to Drug I or Drug II by a preassigned random scheme (in particular, he does not decide!). The patients are treated by doctors who do not know which drug each patient is getting. At the end of the treatment period, the proportion cured by each drug is tabulated. This is called a double-blind study. Suppose the results are:
             Cured   Not Cured   Total
Drug I         39        13        52
Drug II        26        22        48
The estimate of p2 - p1 is
26/48 - 39/52 = .54 - .75 = -.21. So it looks like Drug I is better. What's that?
Oh yes. How could I forget? Small samples, sampling error, etc. We need
a confidence interval.
Resampling to the rescue. Recall that sample proportions are sample means: code each cured patient as 1 and each patient not cured as 0, and the mean of the 0's and 1's is the proportion cured. So we can use the algorithm of the last section, but we do need the samples themselves. These are not the tabled counts above but the raw data that produced them. The first sample consists of 39 1's and 13 0's; the second consists of 26 1's and 22 0's. Here they are:
X Drug I:
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0
0 0 0 0 0 0 0 0 0 0 0 0
Y Drug II:
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
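Going from the table to the samples is mechanical. As a minimal sketch (Python; the helper names `counts_to_sample` and `mean` are hypothetical and not part of the class code), expanding the counts into 0/1 lists confirms that each sample mean is the corresponding sample proportion and reproduces the estimate of p2 - p1:

```python
def counts_to_sample(successes: int, failures: int) -> list[int]:
    """Expand a (successes, failures) count pair into a 0/1 sample."""
    return [1] * successes + [0] * failures


def mean(sample: list[int]) -> float:
    """Ordinary sample mean; for 0/1 data this is the sample proportion."""
    return sum(sample) / len(sample)


# Counts from the drug table: (cured, not cured)
x = counts_to_sample(39, 13)  # X: Drug I, m = 52
y = counts_to_sample(26, 22)  # Y: Drug II, n = 48

p1_hat = mean(x)  # 39/52
p2_hat = mean(y)  # 26/48
estimate = p2_hat - p1_hat

print(len(x), len(y))                      # 52 48
print(round(p1_hat, 4), round(p2_hat, 4))  # 0.75 0.5417
print(round(estimate, 4))                  # -0.2083
```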
Now just call up the class code for a difference in means, drop these samples into the X and Y boxes, and submit. To fix ideas, here are 100 resampled differences in proportions, sorted from smallest to largest:
-0.47115385 -0.45352564 -0.43269231 -0.42948718 -0.41346154 -0.39583333 -0.34455128 -0.33012821 -0.33012821 -0.32852564
-0.32532051 -0.31089744 -0.31089744 -0.30929487 -0.30769231 -0.30769231 -0.30608974 -0.30608974 -0.30608974 -0.30608974
-0.30608974 -0.29647436 -0.29326923 -0.29006410 -0.29006410 -0.29006410 -0.28846154 -0.28525641 -0.27243590 -0.27083333
-0.26923077 -0.26923077 -0.26762821 -0.26602564 -0.26442308 -0.25480769 -0.25160256 -0.25160256 -0.25000000 -0.24839744
-0.24679487 -0.24679487 -0.24519231 -0.23717949 -0.23237179 -0.23237179 -0.23237179 -0.23237179 -0.23237179 -0.22756410
-0.22756410 -0.22756410 -0.22115385 -0.21153846 -0.21153846 -0.21153846 -0.21153846 -0.20993590 -0.20833333 -0.20512821
-0.20352564 -0.20192308 -0.20032051 -0.19230769 -0.19070513 -0.19070513 -0.18910256 -0.18910256 -0.18910256 -0.18910256
-0.18750000 -0.18108974 -0.17628205 -0.17628205 -0.17467949 -0.16506410 -0.16506410 -0.15544872 -0.15064103 -0.15064103
-0.14903846 -0.14903846 -0.14743590 -0.14583333 -0.12500000 -0.11378205 -0.10897436 -0.10737179 -0.10576923 -0.10576923
-0.09935897 -0.09455128 -0.09134615 -0.08814103 -0.08493590 -0.06410256 -0.04807692 -0.03365385 -0.01442308  0.05608974

Hence our estimate is -.21 and our confidence interval is (-.43, -.03), read off as the 3rd smallest and 3rd largest of the 100 sorted values (roughly the 2.5th and 97.5th percentiles). The interval does not include 0, so we would conclude that Drug I is better. In practice we want the CI based on at least 1000 bootstrap resamples. I did this and got the interval (-.39, -.02), so I reach the same conclusion. Now you try it:
                   Will Contribute   Will Not
Before Decision          68             13
After Decision           38             20
[Figure: dotplot of the resampled differences (column C1); horizontal axis from -0.50 to 0.00]
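If the class code is not at hand, the resampling step can be sketched in Python. This is a minimal sketch under stated assumptions, not the class code: `bootstrap_diff_ci` is a hypothetical name, each group is resampled separately with replacement at its original size, and the endpoints are the 2.5th and 97.5th percentiles as interpolated by the standard library's `statistics.quantiles`, which may differ slightly from reading off order statistics by hand. Because resampling is random, the interval you get will differ somewhat from (-.39, -.02).

```python
import random
import statistics


def bootstrap_diff_ci(x, y, n_boot=1000, seed=None):
    """Approximate 95% percentile bootstrap interval for mean(y) - mean(x).

    Hypothetical helper (not the class code). With 0/1 samples the means
    are sample proportions, so this targets p2 - p1
    (x from Population I, y from Population II).
    """
    rng = random.Random(seed)
    m, n = len(x), len(y)
    diffs = []
    for _ in range(n_boot):
        # Resample each group separately, with replacement,
        # keeping the original sample sizes m and n.
        x_star = rng.choices(x, k=m)
        y_star = rng.choices(y, k=n)
        diffs.append(sum(y_star) / n - sum(x_star) / m)
    # 40 equal slices give 39 cut points; the first is the 2.5th
    # percentile and the last is the 97.5th percentile.
    cuts = statistics.quantiles(diffs, n=40, method="inclusive")
    return cuts[0], cuts[-1]


# 1 = cured, 0 = not cured, rebuilt from the drug table.
drug_1 = [1] * 39 + [0] * 13  # X: Drug I, m = 52
drug_2 = [1] * 26 + [0] * 22  # Y: Drug II, n = 48

estimate = sum(drug_2) / len(drug_2) - sum(drug_1) / len(drug_1)
lower, upper = bootstrap_diff_ci(drug_1, drug_2, n_boot=1000, seed=1)
print(f"estimate {estimate:.2f}, CI ({lower:.2f}, {upper:.2f})")
```

For the exercise, the same function applies once X is built from the Before Decision counts (68, 13) and Y from the After Decision counts (38, 20), assuming, as in the candidate example, that "before" plays the role of Population I.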