Difference Between Proportions: Dependent Samples

As an example, suppose a national poll was conducted in which voters were asked whom they would vote for if the election were held today: Bush, Gore, Nader, and so on, with further options such as "I don't know" and "I never vote". Let $p_B$ and $p_G$ be the proportions of citizens who will vote for Bush and Gore, respectively. Both proportions are estimated from the same sample of voters, so the two estimates are dependent. The difference of interest is $p_B - p_G$. The estimate is, of course, $\hat{p}_B - \hat{p}_G$, the difference between the sample proportions in our poll.

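We need a confidence interval for it. We could get a confidence interval by resampling, but in this case we will use the CLT and give a formula. The main reason for doing this is that these types of polls occur all the time, so in the future you may want to impress your friends by working out the error margin of a poll yourself. Suppose *n* voters were sampled in the poll (at random!). Writing $\hat{p}_B$ and $\hat{p}_G$ for the sample proportions, the error in the poll is

$$\text{Error} = 1.96\sqrt{\frac{\hat{p}_B + \hat{p}_G - (\hat{p}_B - \hat{p}_G)^2}{n}},$$

and the 95% confidence interval for $p_B - p_G$ is

$$\hat{p}_B - \hat{p}_G \pm \text{Error}.$$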
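
The standard error behind this error margin can be derived in a few lines. The sketch below assumes the *n* voters are a simple random sample, so that the counts for the answer categories follow a multinomial distribution; the hat notation for sample proportions is introduced here for illustration. The negative covariance term is what distinguishes this dependent-samples case from the case of two independent samples.

$$
\begin{aligned}
\operatorname{Var}(\hat{p}_B) &= \frac{p_B(1-p_B)}{n}, \qquad
\operatorname{Var}(\hat{p}_G) = \frac{p_G(1-p_G)}{n}, \qquad
\operatorname{Cov}(\hat{p}_B, \hat{p}_G) = -\frac{p_B\,p_G}{n}, \\[4pt]
\operatorname{Var}(\hat{p}_B - \hat{p}_G)
&= \operatorname{Var}(\hat{p}_B) + \operatorname{Var}(\hat{p}_G) - 2\operatorname{Cov}(\hat{p}_B, \hat{p}_G) \\
&= \frac{p_B(1-p_B) + p_G(1-p_G) + 2\,p_B\,p_G}{n} \\
&= \frac{p_B + p_G - (p_B - p_G)^2}{n}.
\end{aligned}
$$

Replacing the unknown proportions by the sample proportions and taking the square root gives the estimated standard error; multiplying it by 1.96, the 95% normal critical value from the CLT, gives the error margin.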

Conclusions based on the confidence interval:

1. If 0 is in the confidence interval, then the results are inconclusive. The paper might use the term "too close to call".
2. If the confidence interval consists entirely of negative values, then the result is significant and the poll is predicting that Gore will win. Remember the poll begins with, "If the election was held today, ...". The poll is only good for "this time". Things can change, but at the moment the poll is predicting that Gore will win, with 95% confidence in that prediction.
3. If the confidence interval consists entirely of positive values, then the poll is predicting that Bush will win, with 95% confidence in that prediction.
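
The interval and the three conclusions above can be sketched in a few lines of Python. This is a minimal illustration, not part of the original notes: the function names `diff_proportion_ci` and `verdict` are hypothetical, the critical value 1.96 is the usual normal value for 95% confidence, and the counts in the usage lines are made up.

```python
import math


def diff_proportion_ci(count_a, count_b, n, z=1.96):
    """Return (lower, upper) for p_a - p_b, both estimated from ONE sample of size n."""
    p_a = count_a / n
    p_b = count_b / n
    diff = p_a - p_b
    # Standard error for two categories of the same (multinomial) sample
    se = math.sqrt((p_a + p_b - diff ** 2) / n)
    margin = z * se
    return diff - margin, diff + margin


def verdict(lower, upper, first="Bush", second="Gore"):
    """Translate the interval for p_first - p_second into one of the three conclusions."""
    if lower > 0:
        return f"{first} predicted to win"
    if upper < 0:
        return f"{second} predicted to win"
    return "too close to call"


# Hypothetical poll: 700 votes for the first candidate, 500 for the second, n = 1500
lower, upper = diff_proportion_ci(700, 500, 1500)
print(verdict(lower, upper))
```

An interval with 0 exactly on its boundary is treated here as containing 0, which is a choice of this sketch rather than something the notes specify.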

**Example:** A poll sampled 1500 voters. The results are

| Bush | Gore | All Others |
|------|------|------------|
| 580  | 595  | 325        |

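Then $\hat{p}_B = 580/1500 = 0.3867$ and $\hat{p}_G = 595/1500 = 0.3967$; hence the error is

$$1.96\sqrt{\frac{0.3867 + 0.3967 - (0.3867 - 0.3967)^2}{1500}} = 0.0448.$$

So the 95% confidence interval is

$$-0.0100 \pm 0.0448 = (-0.0548,\ 0.0348).$$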

Based on this confidence interval (0 is in it), we would say the election is too close to call.
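
The resampling route mentioned earlier can serve as a rough cross-check on this example. The sketch below is one possible approach, a percentile bootstrap, and is not taken from the notes: the function name `bootstrap_diff_ci`, the number of resamples, and the seed are all assumptions made for illustration. It redraws 1500 voters with replacement from the observed poll many times and reads the interval off the middle 95% of the resampled differences.

```python
import random


def bootstrap_diff_ci(counts, n_boot=2000, level=0.95, seed=0):
    """Percentile bootstrap interval for (proportion of category 0) - (proportion of category 1).

    counts: observed poll counts per answer category; the first two
    entries are the candidates being compared.
    """
    rng = random.Random(seed)
    n = sum(counts)
    categories = range(len(counts))
    diffs = []
    for _ in range(n_boot):
        # Resample n voters with replacement from the observed poll
        sample = rng.choices(categories, weights=counts, k=n)
        diffs.append((sample.count(0) - sample.count(1)) / n)
    diffs.sort()
    alpha = 1 - level
    lower = diffs[int(n_boot * alpha / 2)]
    upper = diffs[int(n_boot * (1 - alpha / 2)) - 1]
    return lower, upper


# Poll from the example: Bush 580, Gore 595, all others 325
lower, upper = bootstrap_diff_ci([580, 595, 325])
print(f"bootstrap 95% interval: ({lower:.4f}, {upper:.4f})")
```

The endpoints vary a little with the seed and the number of resamples, but they should land close to the CLT interval and, like it, contain 0.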