Next: Regression : Second Pass Up: Design of Experiments Previous: Signed-Rank Wilcoxon

  
Difference Between Proportions : Dependent Samples

In Chapter 9 we discussed a difference in proportions based on two independent samples. Another difference in proportions, one computed from a single sample, occurs quite frequently, and we would be remiss if we didn't discuss it.

As an example, suppose a national poll was conducted in which voters were asked whom they would vote for if the election were held today: Bush, Gore, Nader, etc., I don't know, I never vote, etc. Let $p_B$ and $p_G$ be the proportions of citizens who will vote for Bush and Gore, respectively. Then the difference of interest is $p_B - p_G$. The estimate is, of course, $\hat{p}_B - \hat{p}_G$, the difference of the sample proportions in our poll. Because both proportions come from the same sample, they are dependent: every respondent who picks one candidate is a respondent who did not pick the other.

We need a confidence interval for this difference. We could get one by resampling, but in this case we will use the CLT and give a formula. The main reason for doing this is that these types of polls occur all the time, so in the future you may want to impress your friends by working out the error margin of a poll yourself. Suppose that n voters were sampled in the poll (at random!!). The margin of error of the poll is

\begin{displaymath}\textrm{Error } = 1.96 \sqrt{\frac{\hat{p}_B + \hat{p}_G - (\hat{p}_B - \hat{p}_G)^2}{n}}
\end{displaymath}

and the 95% confidence interval for $p_B - p_G$ is

\begin{displaymath}\hat{p}_B - \hat{p}_G \pm 1.96 \sqrt{\frac{\hat{p}_B + \hat{p}_G - (\hat{p}_B - \hat{p}_G)^2}{n}}
\end{displaymath}
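
The quantity under the square root can be traced to the covariance between the two sample proportions. What follows is only a sketch, assuming the n responses are independent draws so that the counts are multinomial (an assumption in keeping with random sampling, though not spelled out above). Under that assumption,

\begin{displaymath}\textrm{Var}(\hat{p}_B) = \frac{p_B(1-p_B)}{n}, \quad \textrm{Var}(\hat{p}_G) = \frac{p_G(1-p_G)}{n}, \quad \textrm{Cov}(\hat{p}_B, \hat{p}_G) = -\frac{p_B \, p_G}{n}
\end{displaymath}

so that

\begin{displaymath}\textrm{Var}(\hat{p}_B - \hat{p}_G) = \frac{p_B(1-p_B) + p_G(1-p_G) + 2 p_B \, p_G}{n} = \frac{p_B + p_G - (p_B - p_G)^2}{n}
\end{displaymath}

Replacing the unknown proportions by their estimates and multiplying the square root by 1.96 gives the error term in the interval.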

Conclusions based on the confidence interval:

1. If 0 is in the confidence interval then the results are inconclusive. The paper might use the term "too close to call".
2. If the confidence interval consists entirely of negative values then the result is significant and the poll is predicting that Gore will win. Remember the poll begins with, "If the election was held today, ... ". The poll is only good for "this time". Things can change, but at the moment the poll is predicting that Gore will win, with 95% confidence in that prediction.
3. If the confidence interval consists entirely of positive values then the poll is predicting that Bush will win, with 95% confidence in that prediction.
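
The formula and the three conclusions above can be sketched in a few lines of Python. This is only an illustration: the function names `dependent_proportion_interval` and `read_interval` are made up for this sketch, and the counts in the usage lines (450 and 300 out of 1000) are hypothetical numbers, not results from any real poll.

```python
import math


def dependent_proportion_interval(count_a, count_b, n, z=1.96):
    """Interval for p_a - p_b when both counts come from the same sample of size n.

    Returns (difference, margin, lower, upper). z=1.96 gives 95% confidence.
    """
    p_a = count_a / n
    p_b = count_b / n
    diff = p_a - p_b
    # Error term from the text: z * sqrt((p_a + p_b - (p_a - p_b)^2) / n)
    margin = z * math.sqrt((p_a + p_b - diff ** 2) / n)
    return diff, margin, diff - margin, diff + margin


def read_interval(lower, upper):
    """Translate the interval into one of the three conclusions."""
    if lower > 0:
        return "first candidate leads"
    if upper < 0:
        return "second candidate leads"
    return "too close to call"


# Hypothetical counts, for illustration only.
diff, margin, lower, upper = dependent_proportion_interval(450, 300, 1000)
print(f"difference = {diff:.4f}, margin = {margin:.4f}")
print(f"interval = ({lower:.4f}, {upper:.4f}): {read_interval(lower, upper)}")
```

The first argument plays the role of Bush and the second the role of Gore in the discussion above, so an entirely positive interval corresponds to conclusion 3 and an entirely negative one to conclusion 2.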

Example: A poll surveyed 1500 voters. The results are

    Bush    Gore    All Others
    580     595     325

Then $\hat{p}_B = 580/1500 = 0.3867$ and $\hat{p}_G = 595/1500 = 0.3967$ ; hence the error is

\begin{displaymath}\textrm{Error } = 1.96 \sqrt{\frac{\hat{p}_B + \hat{p}_G - (\hat{p}_B - \hat{p}_G)^2}{n}} = 1.96 \sqrt{\frac{.7834 - .0001}{1500}} = .0448
\end{displaymath}

So the 95% confidence interval is

\begin{displaymath}(.3867 - .3967) \pm .0448
\end{displaymath}


\begin{displaymath}(-.0548 \, , \, .0348)
\end{displaymath}

Based on this confidence interval (0 is in it), we would say the election is too close to call.
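
The resampling route mentioned earlier can also be tried on these counts as a rough cross-check of the formula. The sketch below is one possible percentile bootstrap, not a procedure prescribed in these notes: the function name `bootstrap_interval`, the number of replicates (2000) and the seed are all arbitrary choices made for illustration.

```python
import random


def bootstrap_interval(bush, gore, others, reps=2000, seed=0):
    """Percentile bootstrap interval for p_B - p_G from one poll's counts."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    n = bush + gore + others
    diffs = []
    for _ in range(reps):
        # Redraw n responses with replacement from the observed poll.
        sample = rng.choices("BGO", weights=(bush, gore, others), k=n)
        diffs.append((sample.count("B") - sample.count("G")) / n)
    diffs.sort()
    # Cut off 2.5% of the replicates in each tail.
    lower = diffs[int(0.025 * reps)]
    upper = diffs[int(0.975 * reps) - 1]
    return lower, upper


lower, upper = bootstrap_interval(580, 595, 325)
print(f"bootstrap 95% interval: ({lower:.4f}, {upper:.4f})")
```

The endpoints vary a little with the seed and the number of replicates, but they should land close to the interval obtained from the formula and lead to the same "too close to call" reading.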

2001-01-01