Next: Design of Experiments Up: Estimation of Effect : Previous: Estimation and Confidence Intervals

# Difference Between Proportions

There are many other two-sample problems that time does not allow us to consider. You can always sign up for another stat class, and I'll be happy to recommend some to you. But we would be remiss if we didn't discuss the difference-in-proportions problem. Also, it's a cinch with resampling.

So consider two population proportions. Examples are far too numerous to list here. One that occurs all too often is the president's rating this month versus his rating last month. Another, a sign of the times: Candidate A is worried about his financial backing. He needs to show that his popularity (i.e., the proportion of voters who will vote for him) is rising in order to attract more money. So he decides to come out strongly for (or against) an issue that is very popular with certain segments of the population. He then talks about this nonstop on morning, afternoon, and evening talk shows. Population I is the voters before this takes place, and Population II is the voters after this change and the subsequent push on the talk shows; in each case the proportion of interest is the proportion who favor him. He must convince his backers that this proportion has increased.

Ah, notation, but it's simple here. Just let p1 and p2 be the population proportions of Populations I and II, respectively. We are interested in estimating p2 - p1 and determining a confidence interval for it. So we draw random samples of size m and n from Populations I and II, respectively. Our estimate of p2 - p1 is just the difference in sample proportions. We will do the CI next, but first let's look at an example:

There are two different treatments (Drug I and Drug II) for a certain disease. Which is better? A scientist comes up with the following plan: He selects 100 patients who have the disease. He assigns them to Drug I or Drug II by a preassigned random scheme (in particular, he does not decide!). The patients are treated by doctors who do not know which drug each patient is getting, and the patients are not told either. At the end of the treatment period the proportion cured by each drug is tabulated. This is called a double-blind study. Suppose the results are:

```
            Cured   Not Cured
Drug I        39       13
Drug II       26       22
```
The estimate of p2 - p1 is 26/48 - 39/52 = .54 - .75 = -.21. So it looks like Drug I is better. What's that? Oh yes. How could I forget? Small samples, sampling error, etc. We need a confidence interval.
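The arithmetic above can be checked with a few lines of Python. This is only a sketch: the names `drug1`, `drug2`, and `proportion` are made up for illustration and are not part of the class code.

```python
# Counts from the table above, as (cured, not cured).
# Variable and function names here are illustrative assumptions.
drug1 = (39, 13)
drug2 = (26, 22)

def proportion(cured, not_cured):
    """Sample proportion cured in one treatment group."""
    return cured / (cured + not_cured)

p1_hat = proportion(*drug1)   # 39/52
p2_hat = proportion(*drug2)   # 26/48
estimate = p2_hat - p1_hat    # estimate of p2 - p1

print(round(p1_hat, 2), round(p2_hat, 2), round(estimate, 2))
# prints: 0.75 0.54 -0.21
```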

Resampling to the rescue. Recall that sample proportions are sample means. So we can use the algorithm of the last section, but we do need the samples. These are not the tabled values above, but what produced the tabled results. The first sample consists of 39 1's and 13 0's. The second sample consists of 26 1's and 22 0's. Here they are:

```
X Drug I:
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0
0 0 0 0 0 0 0 0 0 0 0 0

Y Drug II:
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
```
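To see why a sample proportion is a sample mean, the two 0/1 samples can be rebuilt from the table counts and averaged. A minimal sketch in Python, with `x` and `y` chosen to match the X and Y labels above:

```python
from statistics import mean

# Rebuild the raw 0/1 samples from the counts: 1 = cured, 0 = not cured.
x = [1] * 39 + [0] * 13   # Drug I, 52 patients
y = [1] * 26 + [0] * 22   # Drug II, 48 patients

# The mean of a 0/1 sample is the proportion of 1's in it.
print(len(x), len(y))                # prints: 52 48
print(mean(x), round(mean(y), 4))    # prints: 0.75 0.5417
print(round(mean(y) - mean(x), 2))   # prints: -0.21
```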
Now just call up the class code for the difference in means, drop these samples in the X and Y boxes, and submit. To set ideas, here are the sorted results of 100 resampled differences in proportions:
```
-0.47115385 -0.45352564 -0.43269231 -0.42948718 -0.41346154 -0.39583333
-0.34455128 -0.33012821 -0.33012821 -0.32852564 -0.32532051 -0.31089744
-0.31089744 -0.30929487 -0.30769231 -0.30769231 -0.30608974 -0.30608974
-0.30608974 -0.30608974 -0.30608974 -0.29647436 -0.29326923 -0.29006410
-0.29006410 -0.29006410 -0.28846154 -0.28525641 -0.27243590 -0.27083333
-0.26923077 -0.26923077 -0.26762821 -0.26602564 -0.26442308 -0.25480769
-0.25160256 -0.25160256 -0.25000000 -0.24839744 -0.24679487 -0.24679487
-0.24519231 -0.23717949 -0.23237179 -0.23237179 -0.23237179 -0.23237179
-0.23237179 -0.22756410 -0.22756410 -0.22756410 -0.22115385 -0.21153846
-0.21153846 -0.21153846 -0.21153846 -0.20993590 -0.20833333 -0.20512821
-0.20352564 -0.20192308 -0.20032051 -0.19230769 -0.19070513 -0.19070513
-0.18910256 -0.18910256 -0.18910256 -0.18910256 -0.18750000 -0.18108974
-0.17628205 -0.17628205 -0.17467949 -0.16506410 -0.16506410 -0.15544872
-0.15064103 -0.15064103 -0.14903846 -0.14903846 -0.14743590 -0.14583333
-0.12500000 -0.11378205 -0.10897436 -0.10737179 -0.10576923 -0.10576923
-0.09935897 -0.09455128 -0.09134615 -0.08814103 -0.08493590 -0.06410256
-0.04807692 -0.03365385 -0.01442308  0.05608974
```
Hence our estimate is -.21 and our 95% confidence interval is (-.43, -.03); these endpoints are the 3rd smallest and 3rd largest of the 100 sorted values. The interval does not include 0, so we would conclude that Drug I is better. In practice we want the CI based on at least 1000 bootstrap resamples. I did this and got the interval (-.39, -.02). Hence, I get the same conclusion. Now you try it.
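The class code itself is not shown here, so the following is only a sketch of one way to do the resampling, assuming a percentile bootstrap: resample each group with replacement at its own size, record the difference in sample means, sort, and trim the same number of values from each tail. The function name `bootstrap_diff_ci` and its parameters are hypothetical. Since resampling is random, the endpoints will differ a little from run to run and from the intervals quoted above.

```python
import random
from statistics import mean

def bootstrap_diff_ci(x, y, n_boot=1000, level=0.95, seed=0):
    """Percentile bootstrap CI for mean(y) - mean(x) (hypothetical helper)."""
    rng = random.Random(seed)  # fixed seed so a run can be repeated
    diffs = []
    for _ in range(n_boot):
        # Resample each group separately, with replacement, at its own size.
        x_star = rng.choices(x, k=len(x))
        y_star = rng.choices(y, k=len(y))
        diffs.append(mean(y_star) - mean(x_star))
    diffs.sort()
    # Trim k values from each tail; with n_boot=100 and level=0.95 this
    # keeps the 3rd smallest and 3rd largest sorted values.
    k = int(n_boot * (1 - level) / 2)
    return diffs[k], diffs[n_boot - 1 - k]

x = [1] * 39 + [0] * 13   # Drug I
y = [1] * 26 + [0] * 22   # Drug II
lo, hi = bootstrap_diff_ci(x, y)
print(f"95% CI for p2 - p1: ({lo:.2f}, {hi:.2f})")
```

Changing `seed` or raising `n_boot` is a quick way to see how much the endpoints move between runs.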

Exercise 10.4.1
1. If you didn't do it, use class code to obtain a 95% confidence interval for the true difference in proportions for the above example of the two different drugs.
2. Should the president be worried? Polls of wealthy financial backers before and after she made a controversial decision were tabulated and given to her. What do you think? Base your answer on a 95% confidence interval.
```
                   Will Contribute   Will Not
Before Decision          68             13
After Decision           38             20
```
3. In the example of the two different drugs found above, the sorted resampled differences in proportions were given. Here's a dot plot of them. Locate the estimate and the confidence interval on it.
```
:
:     : : . :
: : : : : : :   :   .
: : : : : : :.:.. :   : .
. . : ..     .: :.: :.:.:.:::::.: . :.:. .. . .      .
+---------+---------+---------+---------+---------+-------C1
-0.50     -0.40     -0.30     -0.20     -0.10      0.00
```


2001-01-01