There are many other two-sample problems that time does not allow us to
consider. You can always sign up for another stat class, and I'll be happy
to recommend some to you. But we would be remiss if we didn't discuss the
difference-in-proportions problem. Also, it's a cinch with resampling.

So consider two population proportions. Examples are far too numerous
to list here. One that occurs all too often is: the president's rating
this month versus his rating last month. Another one that is a sign of
the times is: Candidate A is worried about his financial backing. He needs
to show that his popularity (i.e. proportion that will vote for him) is
rising in order to attract more money. So he decides that he will come out
strongly for (or against) an issue that is very popular with certain segments
of the population. He then talks about this nonstop on morning, afternoon,
and evening talk shows. Population I is the voters before this takes place
and Population II is the voters after this change and the subsequent push
on the talk shows; in each, the proportion of interest is the proportion
who favor him. He must convince his backers that there has been an increase
in the proportion of voters who favor him.

Ah, notation, but it's simple here. Just let *p*_{1} and *p*_{2}
be the population proportions of Populations I and II, respectively. We
are interested in estimating, and determining a confidence interval for,
*p*_{2} - *p*_{1}. So we draw random samples from Populations I and II
of sizes *m* and *n*, respectively. Our estimate of *p*_{2} - *p*_{1} is just
the difference in sample proportions (the second sample's proportion minus
the first's). We will do the CI next, but first let's look at an example:

There are two different treatments (Drug I and Drug II) for a certain
disease. Which is better? A scientist comes up with the following plan:
He selects 100 patients who have the disease. He randomly assigns them
to Drug I or II by a preassigned random scheme (in particular he does not
decide!). The patients are treated by doctors who do not know which drug
the patient is getting. At the end of the treatment period the proportion
cured by each drug is tabulated. This is called a **double-blind study**.
Suppose the results are:

| | Cured | Not Cured |
|---|---|---|
| Drug I | 39 | 13 |
| Drug II | 26 | 22 |

The estimate of *p*_{2} - *p*_{1} is 26/48 - 39/52 = -.21.
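The arithmetic behind the point estimate can be checked with a few lines of Python. This is only a sketch: the variable names are illustrative and are not taken from the class code.

```python
# Counts from the table above: (cured, not cured) for each drug.
drug1_cured, drug1_not = 39, 13
drug2_cured, drug2_not = 26, 22

m = drug1_cured + drug1_not   # sample size for Drug I
n = drug2_cured + drug2_not   # sample size for Drug II

p1_hat = drug1_cured / m      # sample proportion cured, Drug I
p2_hat = drug2_cured / n      # sample proportion cured, Drug II

estimate = p2_hat - p1_hat    # estimate of p2 - p1
print(round(p1_hat, 4), round(p2_hat, 4), round(estimate, 4))
# prints: 0.75 0.5417 -0.2083
```

Note that *m* + *n* = 100, matching the 100 patients in the study.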

Resampling to the rescue. Recall that sample proportions are sample
means. So we can use the algorithm of the last section, but we do need
the samples. These are not the tabled values above, but what produced the
tabled results. The first sample consists of 39 *1*'s and 13 *0*'s.
The second sample consists of 26 *1*'s and 22 *0*'s. Here they
are:

X Drug I: 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0

Y Drug II: 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

Now just call up the class code for difference in means, drop these samples in the X and Y boxes, and submit. To set ideas, here are the results of 100 resampled differences in proportions, sorted from smallest to largest:

-0.47115385 -0.45352564 -0.43269231 -0.42948718 -0.41346154
-0.39583333 -0.34455128 -0.33012821 -0.33012821 -0.32852564
-0.32532051 -0.31089744 -0.31089744 -0.30929487 -0.30769231
-0.30769231 -0.30608974 -0.30608974 -0.30608974 -0.30608974
-0.30608974 -0.29647436 -0.29326923 -0.29006410 -0.29006410
-0.29006410 -0.28846154 -0.28525641 -0.27243590 -0.27083333
-0.26923077 -0.26923077 -0.26762821 -0.26602564 -0.26442308
-0.25480769 -0.25160256 -0.25160256 -0.25000000 -0.24839744
-0.24679487 -0.24679487 -0.24519231 -0.23717949 -0.23237179
-0.23237179 -0.23237179 -0.23237179 -0.23237179 -0.22756410
-0.22756410 -0.22756410 -0.22115385 -0.21153846 -0.21153846
-0.21153846 -0.21153846 -0.20993590 -0.20833333 -0.20512821
-0.20352564 -0.20192308 -0.20032051 -0.19230769 -0.19070513
-0.19070513 -0.18910256 -0.18910256 -0.18910256 -0.18910256
-0.18750000 -0.18108974 -0.17628205 -0.17628205 -0.17467949
-0.16506410 -0.16506410 -0.15544872 -0.15064103 -0.15064103
-0.14903846 -0.14903846 -0.14743590 -0.14583333 -0.12500000
-0.11378205 -0.10897436 -0.10737179 -0.10576923 -0.10576923
-0.09935897 -0.09455128 -0.09134615 -0.08814103 -0.08493590
-0.06410256 -0.04807692 -0.03365385 -0.01442308 0.05608974

Hence our estimate is -.21 and our confidence interval is (-.43, -.03); the endpoints are the 3rd smallest and 3rd largest of the 100 values above. The interval does not include 0 and lies entirely below it, so we would conclude that Drug I is better. Here we will want the CI based on at least 1000 bootstraps. I did this and got the interval (-.39, -.02). Hence, I get the same conclusion. Now you try it.
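If the class code is not at hand, the resampling step might be sketched in Python as follows. This is a rough sketch under stated assumptions, not the class code itself: it assumes a percentile bootstrap in which each sample is resampled with replacement on its own, and the function name `bootstrap_diff_ci`, its parameters, and the rule for picking the interval endpoints are my own choices.

```python
import random

def bootstrap_diff_ci(x, y, n_boot=1000, level=0.95, seed=None):
    """Percentile bootstrap CI for mean(y) - mean(x).

    Hypothetical helper for illustration; for 0/1 data the means
    are sample proportions, so this gives a CI for p2 - p1.
    """
    rng = random.Random(seed)
    m, n = len(x), len(y)
    diffs = []
    for _ in range(n_boot):
        xs = rng.choices(x, k=m)  # resample X with replacement
        ys = rng.choices(y, k=n)  # resample Y with replacement
        diffs.append(sum(ys) / n - sum(xs) / m)
    diffs.sort()
    # Assumed endpoint rule: trim k values from each end of the sorted list.
    k = int((1.0 - level) / 2 * n_boot)
    estimate = sum(y) / n - sum(x) / m
    return estimate, diffs[k], diffs[n_boot - 1 - k], diffs

# The two samples behind the table: 1 = cured, 0 = not cured.
x = [1] * 39 + [0] * 13   # Drug I
y = [1] * 26 + [0] * 22   # Drug II

est, lo, hi, diffs = bootstrap_diff_ci(x, y, n_boot=1000, seed=1)
print(f"estimate = {est:.2f}, 95% CI = ({lo:.2f}, {hi:.2f})")
```

The interval changes a little from run to run (or seed to seed), which is why a larger number of bootstraps is preferred. With 100 resamples, the endpoint rule used here picks the 3rd smallest and 3rd largest values, which is consistent with the interval (-.43, -.03) read off the sorted list above.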

1. If you didn't do it, use the class code to obtain a 95% confidence interval for the true difference in proportions for the above example of the two different drugs.

2. Should the president be worried? Polls of wealthy financial backers before and after she made a controversial decision were tabulated and given to her. What do you think? Base your answer on a 95% confidence interval.

   | | Will Contribute | Will Not |
   |---|---|---|
   | Before Decision | 68 | 13 |
   | After Decision | 38 | 20 |

3. In the example of the two different drugs found above, the sorted resampled differences in proportions were given. Here's a dot plot of them. Locate the estimate and the confidence interval on it.

   [Dot plot of the 100 resampled differences in proportions (C1); horizontal axis running from -0.50 to 0.00.]