Next: Confidence Intervals for Proportions Up: Confidence Intervals Previous: Introduction

  
Confidence Intervals for Means

Let's pick on the mean, $\mu$. That is, we have a population with unknown mean $\mu$. We take a random sample of size $n$ from this population, say $X_1, X_2, \ldots, X_n$. Our estimate of $\mu$ is then the sample average $\bar{X}$.

Income Example: Suppose we take a sample of 25 students from Smith University and record their family incomes. The incomes (in thousands of dollars) are:

28     29     35     42     42     44     50     52     54     56     59
78     84     90     95    101    108    116    121    122    133    150
158    167    235
The data have been sorted, so the lowest income is $28,000 and the highest income is $235,000. The average is 89.96 (either add up all the numbers and divide by 25, or use the summary module), i.e., about $90,000. Now we need to determine by how much our estimate might miss $\mu$.
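As a quick check of the summary values above, here is a minimal Python sketch using the standard `statistics` module; the variable names are illustrative only.

```python
from statistics import mean

# Family incomes (in thousands of dollars) for the 25 sampled students
incomes = [28, 29, 35, 42, 42, 44, 50, 52, 54, 56, 59,
           78, 84, 90, 95, 101, 108, 116, 121, 122, 133, 150,
           158, 167, 235]

xbar = mean(incomes)  # sample average, our estimate of mu
print(len(incomes), min(incomes), max(incomes), round(xbar, 2))
```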

In general, our estimate of $\mu$ is $\bar{X}$, and we know something about the distribution of $\bar{X}$. The Central Limit Theorem tells us that the distribution of $\bar{X}$ is approximately normal with mean $\mu$ (the population mean) and standard deviation $\sigma/\sqrt{n}$, where $\sigma$ is the population standard deviation. By the empirical rule, 95% of the time $\bar{X}$ falls in the interval $\mu - 1.96 \frac{\sigma}{\sqrt{n}}$ to $\mu + 1.96\frac{\sigma}{\sqrt{n}}$ (the value 1.96, taken from the normal table, is more accurate than the 2 we have been using). A picture of this is shown in Figure 7.1.
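Where does 1.96 come from? It is the 97.5th percentile of the standard normal distribution, so 95% of a normal distribution lies within 1.96 standard deviations of its mean. A minimal sketch using Python's built-in `statistics.NormalDist` (available in Python 3.8 and later):

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, standard deviation 1

z = std_normal.inv_cdf(0.975)  # 97.5th percentile of the standard normal
print(round(z, 4))

# Probability of falling within z standard deviations of the mean
central = std_normal.cdf(z) - std_normal.cdf(-z)
print(round(central, 4))

# The empirical rule's "2" covers slightly more than 95%
print(round(std_normal.cdf(2) - std_normal.cdf(-2), 4))
```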


  
Figure 7.1: A 95% confidence interval

We need an interval that we are fairly confident contains $\mu$. The sample average $\bar{X}$ falls in the interval shown in the plot above, $(\mu - 1.96 \frac{\sigma}{\sqrt{n}}, \mu + 1.96 \frac{\sigma}{\sqrt{n}})$, 95% of the time. Its endpoints are the 2.5 and 97.5 percentiles of the distribution of $\bar{X}$. But we can't use this interval because we don't know $\mu$. Well, if you don't know it, estimate it: replace $\mu$ by $\bar{X}$. Setting aside for the moment that $\sigma$ is also unknown, consider the interval

\begin{displaymath}(\bar{X} - 1.96 \frac{\sigma}{\sqrt{n}}, \bar{X} + 1.96 \frac{\sigma}{\sqrt{n}})
\end{displaymath}

Oddly enough, this interval works. When will it fail to cover $\mu$? If $\bar{X} < \mu - 1.96 \frac{\sigma}{\sqrt{n}}$, then the right endpoint of the interval, $\bar{X} + 1.96 \frac{\sigma}{\sqrt{n}}$, will be less than $\mu$. This happens 2.5% of the time. If $\bar{X} > \mu + 1.96 \frac{\sigma}{\sqrt{n}}$, then the left endpoint, $\bar{X} - 1.96 \frac{\sigma}{\sqrt{n}}$, will be greater than $\mu$. This also happens 2.5% of the time. If neither of these occurs, then the interval $(\bar{X} - 1.96 \frac{\sigma}{\sqrt{n}}, \bar{X} + 1.96 \frac{\sigma}{\sqrt{n}})$ contains $\mu$. That is, this interval will contain $\mu$ 95% of the time.
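The argument above can be checked by simulation. This is a minimal sketch under stated assumptions: a hypothetical normal population with $\mu = 100$ and $\sigma = 15$ (chosen only for illustration), samples of size 25, and $\sigma$ treated as known. Because the population is normal here, $\bar{X}$ is exactly normal, so the fraction of intervals that trap $\mu$ should come out close to 0.95. The helper name `coverage_known_sigma` is hypothetical.

```python
import math
import random

def coverage_known_sigma(mu, sigma, n, reps, seed=1):
    """Fraction of simulated intervals xbar +/- 1.96*sigma/sqrt(n) that contain mu."""
    rng = random.Random(seed)  # seeded so the run is reproducible
    half_width = 1.96 * sigma / math.sqrt(n)
    hits = 0
    for _ in range(reps):
        xbar = sum(rng.gauss(mu, sigma) for _ in range(n)) / n
        if xbar - half_width <= mu <= xbar + half_width:
            hits += 1  # this interval trapped mu
    return hits / reps

# Hypothetical population: normal with mu = 100, sigma = 15; samples of size 25
cov = coverage_known_sigma(mu=100, sigma=15, n=25, reps=20000)
print(cov)
```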

What's that? We don't know $\sigma$, so we can't use this interval either! That's right. We will replace $\sigma$ by the sample standard deviation $s$. Thus the interval we will use is:

\begin{displaymath}(\bar{X} - 1.96 \frac{s}{\sqrt{n}}, \bar{X} + 1.96 \frac{s}{\sqrt{n}})
\end{displaymath}

Income Example: Let's apply this interval to the income example. Recall that the data are:

28     29     35     42     42     44     50     52     54     56     59
78     84     90     95    101    108    116    121    122    133    150
158    167    235
Recall that the average income is 89.96. The sample standard deviation (either compute it by hand or check the numerical summaries button in the summary module) is:
Rweb:> # STANDARD DEVIATION of x  
Rweb:> var(x)^.5  
[1]    51.68
Hence $s = 51.68$. For the interval we actually need $s/\sqrt{n}$, which is called the Standard Error of the Mean: $s/\sqrt{n} = 51.68/5 \approx 10.34$. So the interval we want is:
           (89.96 - 1.96*10.34, 89.96 + 1.96*10.34)
                        (69.69, 110.23)
So we estimate the mean family income of Smith University students to be between $69,690 and $110,230. Our error of estimation is $1.96 \times 10.34 = 20.27$, i.e., about $20,270. That seems like a lot. How can we reduce the error of estimation? Take a larger sample: as $n$ gets larger, $s/\sqrt{n}$ gets smaller.
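The same calculation can be sketched in Python. This is an illustration, not the summary module's own code: `large_sample_ci` is a hypothetical helper, and `statistics.stdev` uses the $n-1$ divisor, matching R's `var()`. Carrying full precision instead of the rounded standard error gives about (69.70, 110.22), which differs from the hand calculation only in the last digit. The last loop is also hypothetical: it assumes $s$ would stay near 51.68 and shows that quadrupling $n$ halves the error of estimation.

```python
import math
from statistics import mean, stdev

# Family incomes (thousands of dollars) from the income example
incomes = [28, 29, 35, 42, 42, 44, 50, 52, 54, 56, 59,
           78, 84, 90, 95, 101, 108, 116, 121, 122, 133, 150,
           158, 167, 235]

def large_sample_ci(data, z=1.96):
    """Approximate CI for mu: xbar +/- z * s / sqrt(n). Returns (lower, upper, se)."""
    n = len(data)
    xbar = mean(data)
    se = stdev(data) / math.sqrt(n)  # standard error of the mean, s / sqrt(n)
    return xbar - z * se, xbar + z * se, se

lower, upper, se = large_sample_ci(incomes)
print(f"SE = {se:.2f}, interval = ({lower:.2f}, {upper:.2f})")

# Hypothetical: if s stayed near 51.68, the margin of error for two sample sizes
for n in (25, 100):
    print(n, round(1.96 * 51.68 / math.sqrt(n), 2))
```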

Interpretation. What is this interval? One way of thinking about it is: the probability that the random interval $(\bar{X} - 1.96 \frac{s}{\sqrt{n}}, \bar{X} + 1.96 \frac{s}{\sqrt{n}})$ traps $\mu$ is .95. What the heck does this mean? Think of it this way. Computing the interval is a Bernoulli trial in which "success" means the interval traps $\mu$, and the probability of success is .95. In practice, we have only one sample and hence one interval. It either catches $\mu$ or it doesn't, and since $\mu$ is unknown we can't tell which. But it is the outcome of a Bernoulli trial with probability of success .95, so we are fairly confident of success. That is why we call it a 95% confidence interval. 
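The Bernoulli-trial view can be made concrete with a small simulation. This sketch assumes a hypothetical normal population with $\mu = 100$ and $\sigma = 15$ (known here only because we are simulating) and samples of size 25. Each of 20 samples produces one interval using $s$, and each interval is recorded as a success or failure at trapping $\mu$; in any single run most, but typically not all, of the intervals succeed. The helper `one_interval` is hypothetical.

```python
import math
import random
from statistics import mean, stdev

def one_interval(rng, mu, sigma, n, z=1.96):
    """Draw one sample and return the interval xbar +/- z*s/sqrt(n)."""
    sample = [rng.gauss(mu, sigma) for _ in range(n)]
    half = z * stdev(sample) / math.sqrt(n)
    xbar = mean(sample)
    return xbar - half, xbar + half

rng = random.Random(2)  # seeded so the run is reproducible
mu = 100  # hypothetical true mean
results = []
for _ in range(20):
    lo, hi = one_interval(rng, mu, sigma=15, n=25)
    results.append(lo <= mu <= hi)  # one Bernoulli trial: True if mu is trapped
print(sum(results), "of", len(results), "intervals trapped mu")
```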

Other Remarks. There are two approximations in our confidence interval:

1.
It is based on the Central Limit Theorem, which says the distribution of $\bar{X}$ is approximately normal, but we treated it as exactly normal.
2.
We estimated $\sigma$ by $s$.
So our confidence interval is really an approximate confidence interval. It's close enough in most applications.
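The second approximation can be seen in a small simulation. This sketch assumes a hypothetical standard normal population, so $\bar{X}$ is exactly normal and the only approximation left is replacing $\sigma$ by $s$; the sample sizes, seed, and number of repetitions are arbitrary choices, and `coverage_with_s` is a hypothetical helper. For $n = 5$ the simulated coverage falls noticeably short of 95% (below 90%), while for $n = 100$ it is close to 95%.

```python
import math
import random

def coverage_with_s(n, reps=10000, mu=0.0, sigma=1.0, z=1.96, seed=3):
    """Simulated coverage of xbar +/- z*s/sqrt(n) when sigma is replaced by s."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        xbar = sum(sample) / n
        # sample standard deviation with the n - 1 divisor
        s = math.sqrt(sum((x - xbar) ** 2 for x in sample) / (n - 1))
        half = z * s / math.sqrt(n)
        if xbar - half <= mu <= xbar + half:
            hits += 1
    return hits / reps

cov = {n: coverage_with_s(n) for n in (5, 25, 100)}
for n, c in cov.items():
    print(n, round(c, 3))
```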

A final remark of considerable importance: the endpoints of our confidence interval are estimates of the 2.5 and 97.5 percentiles of the distribution of $\bar{X}$, the estimator. This will be very important in the section after next.
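A short sketch of this remark, assuming the normal approximation for $\bar{X}$ and plugging in the income example's summary values $\bar{X} = 89.96$, $s = 51.68$, $n = 25$. The 2.5 and 97.5 percentiles of a normal distribution with mean $\bar{X}$ and standard deviation $s/\sqrt{n}$ are exactly $\bar{X} \pm 1.95996\, s/\sqrt{n}$, i.e., our interval up to the rounding of 1.96.

```python
import math
from statistics import NormalDist

xbar, s, n = 89.96, 51.68, 25  # income example summary values

# Estimated sampling distribution of Xbar: normal with mean xbar and sd s/sqrt(n)
est = NormalDist(mu=xbar, sigma=s / math.sqrt(n))
p025, p975 = est.inv_cdf(0.025), est.inv_cdf(0.975)
print(round(p025, 2), round(p975, 2))
```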


Exercise 8.2.1  
1.
To set ideas, obtain a 95% confidence interval for $\mu$ if the data are:
       10   12  16  18  24
Do this one by hand. The sample mean and standard deviation are easy to get and $\sqrt{5} = 2.24$.
2.
Obtain a 95% confidence interval for $\mu$ if the data are:
      76  87  98 102 111 114 115 115 120 126
First boxplot the data. Next mark the sample average and the endpoints of the confidence interval on the plot. Here's some output from the summary module to do the confidence interval:
Rweb:> summary(variables)  
        x         
 Min.   : 76.0   
 1st Qu.: 99.0   
 Median :112.5   
 Mean   :106.4   
 3rd Qu.:115.0   
 Max.   :126.0   

Rweb:> # STANDARD DEVIATION of x  
Rweb:> var(x)^.5  
[1] 15.5863
3.
Obtain a 95% confidence interval for $\mu$ if the data are:
     6   8  14  30  31  32  51  57  87  87 109 145 156 171 342
First boxplot the data. Next mark the sample average and the endpoints of the confidence interval on the plot. Here's the output from the summary module to do the confidence interval:
Rweb:> summary(variables)  
        x         
 Min.   :  6.0   
 1st Qu.: 30.5   
 Median : 57.0   
 Mean   : 88.4   
 3rd Qu.:127.0   
 Max.   :342.0

Rweb:> # STANDARD DEVIATION of x  
Rweb:> var(x)^.5  
[1] 88.8005
4.
Consider the following sample of Etruscan skull sizes. Obtain a 95% confidence interval for $\mu$.
  141  145  145  146  142  126  144  146  154  149  143  131
(Ans: $142.67 \pm 4.25$).
5.
Same as the last question for a sample of size 10 of Italian skull sizes:
   134  132  126  134  131  130  130  125  132  126
(Ans: $130 \pm 2.05$).
6.
Plot the confidence intervals from the last two problems on a line. What do you conclude about the true mean skull sizes of Etruscans and Italians based on this comparison?
7.
Now use all the Etruscan and Italian data (Appendix A) to do the last three exercises.
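To check your work on the skull-size exercises, here is a minimal sketch that computes $\bar{X} \pm 1.96\, s/\sqrt{n}$ for the two small samples listed above. The helper `ci` is hypothetical and uses `statistics.stdev` (the $n-1$ divisor). It gives about $142.67 \pm 4.25$ for the Etruscan sample and $130 \pm 2.05$ for the Italian sample (the unrounded Italian margin is about 2.045). The full Appendix A data are not reproduced here.

```python
import math
from statistics import mean, stdev

def ci(data, z=1.96):
    """Return (xbar, margin) for the interval xbar +/- z*s/sqrt(n)."""
    return mean(data), z * stdev(data) / math.sqrt(len(data))

etruscan = [141, 145, 145, 146, 142, 126, 144, 146, 154, 149, 143, 131]
italian = [134, 132, 126, 134, 131, 130, 130, 125, 132, 126]

for name, data in (("Etruscan", etruscan), ("Italian", italian)):
    xbar, m = ci(data)
    print(f"{name}: {xbar:.2f} +/- {m:.2f}  ->  ({xbar - m:.2f}, {xbar + m:.2f})")
```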



2001-01-01