Confidence Intervals for Means

Let's pick on the mean, $\mu$.
That is, we have a population with **unknown**
mean $\mu$.
So we take a random sample of size $n$ from this distribution,
say,
$X_1, X_2, \ldots, X_n$. Then our estimate of $\mu$
is the sample average $\overline{X}$.

**Income Example**: Suppose we take a sample of 25 students from
Smith University and record their family incomes. Suppose the incomes (in
thousands of dollars) are:

28 29 35 42 42 44 50 52 54 56 59 78 84 90 95 101 108 116 121 122 133 150 158 167 235

The data have been sorted. So the lowest income is \$28,000 and the highest income is \$235,000. The average (either add up all the numbers or use the summary module) is 89.96, i.e., about \$90,000. So now we need to determine how much our estimate missed by.

In general, our estimate of $\mu$ is $\overline{X}$. And we know something about the distribution of $\overline{X}$. The Central Limit Theorem tells us that the distribution of $\overline{X}$ is approximately normal with mean $\mu$ (the population mean) and standard deviation $\sigma/\sqrt{n}$ ($\sigma$ is the population standard deviation). By the empirical rule, 95% of the time $\overline{X}$ falls in the interval $\mu - 1.96\,\sigma/\sqrt{n}$ to $\mu + 1.96\,\sigma/\sqrt{n}$ (1.96 is more accurate than 2, which we have been using). A picture of it is seen in Figure 7.1.
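The Central Limit Theorem claim can be checked with a small simulation. The following is a minimal sketch, not part of the original notes: the exponential population, its mean of 10, the seed, and all variable names are assumptions chosen for illustration. It draws many samples of size 25, computes each sample average, and compares the spread of those averages with $\sigma/\sqrt{n}$.

```r
# Sketch: simulate the sampling distribution of the sample mean.
# The population (exponential with mean 10) and all names are assumptions.
set.seed(1)
n     <- 25      # sample size
mu    <- 10      # population mean of the assumed population
sigma <- 10      # population sd (equals the mean for an exponential)
reps  <- 10000   # number of simulated samples

# Each entry of xbar is the average of one random sample of size n.
xbar <- replicate(reps, mean(rexp(n, rate = 1 / mu)))

se     <- sigma / sqrt(n)                     # theoretical sd of the sample mean
inside <- mean(abs(xbar - mu) <= 1.96 * se)   # fraction within mu +/- 1.96 * se

cat("mean of xbar:", mean(xbar), "\n")
cat("sd of xbar:  ", sd(xbar), "(theory:", se, ")\n")
cat("fraction inside the interval:", inside, "\n")
```

Even though the assumed population is skewed, the fraction of sample averages inside the interval should come out close to 95%.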

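We need an interval which we are fairly confident contains
$\mu$.
The interval in Figure 7.1
occurs 95% of the time. **Its endpoints are the 2.5 and 97.5 percentiles
of the distribution of $\overline{X}$.** But we can't use it because we don't know $\mu$.
Well, if you don't know it, estimate it. Ignoring for the moment that $\sigma$ is also unknown,
consider the interval

$$\left(\overline{X} - 1.96\,\frac{\sigma}{\sqrt{n}},\ \overline{X} + 1.96\,\frac{\sigma}{\sqrt{n}}\right)$$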

Oddly enough, this interval works. When will this interval not cover $\mu$? If
$\overline{X} < \mu - 1.96\,\sigma/\sqrt{n}$ then the right side of the interval
will be less than $\mu$.
This will happen 2.5% of the time. If
$\overline{X} > \mu + 1.96\,\sigma/\sqrt{n}$ then the left side of the interval
will
be greater than $\mu$.
This will happen 2.5% of the time. If these
two things don't occur then the interval
will contain $\mu$.
That is, this interval will contain $\mu$
95% of the time.
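The two ways the interval can miss can be counted separately in a simulation. This is only a sketch under stated assumptions: a normal population with $\mu = 90$ and $\sigma = 50$ (values picked for illustration, treated as known here), with hypothetical variable names.

```r
# Sketch: how often does the known-sigma interval miss mu, and on which side?
# The normal population and its parameters are assumptions for illustration.
set.seed(2)
mu <- 90; sigma <- 50; n <- 25; reps <- 10000

half <- 1.96 * sigma / sqrt(n)   # half-width of the interval
xbar <- replicate(reps, mean(rnorm(n, mean = mu, sd = sigma)))

miss_low  <- mean(xbar + half < mu)   # right endpoint falls below mu
miss_high <- mean(xbar - half > mu)   # left endpoint falls above mu
covered   <- 1 - miss_low - miss_high

cat("missed below:", miss_low, "\n")
cat("missed above:", miss_high, "\n")
cat("covered mu:  ", covered, "\n")
```

Each miss rate should land near 2.5%, leaving a coverage rate near 95%.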

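What's that? We don't know $\sigma$,
so we can't use the interval!
That's right. We will replace $\sigma$
by the sample standard deviation $s$.
Thus the interval we will use is:

$$\left(\overline{X} - 1.96\,\frac{s}{\sqrt{n}},\ \overline{X} + 1.96\,\frac{s}{\sqrt{n}}\right)$$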

**Income Example**: Let's apply this to the income example. Recall that
the data are:

28 29 35 42 42 44 50 52 54 56 59 78 84 90 95 101 108 116 121 122 133 150 158 167 235

Recall the average income is 89.96. The sample standard deviation is (either do it by hand or check the Rweb output):

    Rweb:> # STANDARD DEVIATION of x
    Rweb:> var(x)^.5
    [1] 51.68

Hence $s/\sqrt{n} = 51.68/\sqrt{25} \approx 10.33$, and the interval is

$$(89.96 - 1.96 \times 10.33,\ 89.96 + 1.96 \times 10.33) = (69.71,\ 110.21)$$

So we estimate the mean family income of a Smith University student to be between \$69,710 and \$110,210. Our error of estimation is $1.96\,s/\sqrt{n} \approx 20.25$; i.e., \$20,250. That seems like a lot. How can we reduce the error of estimation? A larger sample size; i.e., as $n$ increases, the error of estimation decreases.
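The income calculation can be reproduced in plain R. This is a sketch with hypothetical variable names (`income`, `err`, `ci`, `errors_by_n`). The last lines assume, purely for illustration, that $s$ stays about the same as the sample grows, to show how the error of estimation shrinks with $n$.

```r
# Income data from the example, in thousands of dollars.
income <- c(28, 29, 35, 42, 42, 44, 50, 52, 54, 56, 59, 78, 84, 90, 95,
            101, 108, 116, 121, 122, 133, 150, 158, 167, 235)

n    <- length(income)
xbar <- mean(income)        # sample average
s    <- sd(income)          # sample standard deviation, same as var(income)^.5
err  <- 1.96 * s / sqrt(n)  # error of estimation
ci   <- c(lower = xbar - err, upper = xbar + err)

print(round(c(mean = xbar, sd = s, error = err), 2))
print(round(ci, 2))

# Assumption for illustration: s stays near its current value as n grows.
# Quadrupling the sample size then halves the error of estimation.
errors_by_n <- 1.96 * s / sqrt(c(25, 100, 400))
print(round(errors_by_n, 2))
```

Because R carries full precision rather than rounding $s/\sqrt{n}$ to 10.33, the endpoints may differ from the hand calculation in the second decimal place.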

**Interpretation**. What is this interval? One way of thinking about
it is: the probability that the random interval
$\left(\overline{X} - 1.96\,s/\sqrt{n},\ \overline{X} + 1.96\,s/\sqrt{n}\right)$
traps $\mu$
is .95. What the heck does this mean? Think of it this way. This interval is a result of a Bernoulli
trial with probability of success .95. In practice, we have only one sample
and one interval. It will either catch $\mu$
or not. But it is the
outcome of a Bernoulli trial with probability of success .95. Hence, we
are fairly confident of success. So we call it a 95% confidence interval.
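The Bernoulli-trial view can be acted out in a simulation where, unlike in practice, $\mu$ is known. This is a minimal sketch: the normal population, its parameters, the seed, and the helper name `ci95` are all assumptions for illustration.

```r
# Sketch only: the population and the helper ci95() are assumptions.
set.seed(3)
mu <- 90; sigma <- 50; n <- 25

# 95% interval from one sample, using s in place of sigma.
ci95 <- function(x) {
  half <- 1.96 * sd(x) / sqrt(length(x))
  c(lower = mean(x) - half, upper = mean(x) + half)
}

# One sample, one interval: it either catches mu or it does not.
one <- ci95(rnorm(n, mean = mu, sd = sigma))
caught_once <- one[["lower"]] <= mu && mu <= one[["upper"]]
print(one)
print(caught_once)

# Repeating the same trial many times gives the long-run success rate.
hits <- replicate(10000, {
  ci <- ci95(rnorm(n, mean = mu, sd = sigma))
  ci[["lower"]] <= mu && mu <= ci[["upper"]]
})
success_rate <- mean(hits)
print(success_rate)
```

The single interval is a success or a failure. Only the long-run proportion of successes is close to .95.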

**Other Remarks.** There are two approximations in our confidence
interval:

1. It is based on the Central Limit Theorem, which says the distribution of $\overline{X}$ is approximately normal, and we used it as exactly normal.
2. We estimated $\sigma$ by $s$.
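The effect of these two approximations shows up when the sample is small. The sketch below estimates the actual coverage of the interval for several sample sizes. It assumes a skewed (exponential) population with mean 10, and the function name `coverage` is hypothetical.

```r
# Sketch: actual coverage of the 1.96 * s / sqrt(n) interval at several n.
# The exponential population is an assumption chosen because it is skewed.
set.seed(4)

coverage <- function(n, reps = 5000, mu = 10) {
  hits <- replicate(reps, {
    x    <- rexp(n, rate = 1 / mu)       # one sample of size n
    half <- 1.96 * sd(x) / sqrt(n)       # half-width using s
    abs(mean(x) - mu) <= half            # TRUE if the interval contains mu
  })
  mean(hits)
}

sizes <- c(5, 25, 100)
rates <- sapply(sizes, coverage)
names(rates) <- paste0("n=", sizes)
print(round(rates, 3))
```

For very small samples the coverage falls noticeably short of 95%, and it moves toward 95% as $n$ grows and both approximations improve.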

**A final remark of considerable importance: The end points of our
confidence interval are estimates of the 2.5 and 97.5 percentiles of the
distribution of $\overline{X}$, the estimator. This will be very important
in the section after next.**

**Exercises.**

1. To set ideas, obtain a 95% confidence interval for $\mu$ if the data are:

   10 12 16 18 24

   Do this one by hand. The sample mean and standard deviation are easy to get: $\overline{x} = 16$ and $s = \sqrt{30} \approx 5.48$.

2. Obtain a 95% confidence interval for $\mu$ if the data are:

   76 87 98 102 111 114 115 115 120 126

   First boxplot the data. Next mark the sample average and the endpoints of the confidence interval on the plot. Here's some output from the summary module to do the confidence interval:

       Rweb:> summary(variables)
              x
        Min.   : 76.0
        1st Qu.: 99.0
        Median :112.5
        Mean   :106.4
        3rd Qu.:115.0
        Max.   :126.0
       Rweb:> # STANDARD DEVIATION of x
       Rweb:> var(x)^.5
       [1] 15.5863

3. Obtain a 95% confidence interval for $\mu$ if the data are:

   6 8 14 30 31 32 51 57 87 87 109 145 156 171 342

   First boxplot the data. Next mark the sample average and the endpoints of the confidence interval on the plot. Here's the output from the summary module to do the confidence interval:

       Rweb:> summary(variables)
              x
        Min.   :  6.0
        1st Qu.: 30.5
        Median : 57.0
        Mean   : 88.4
        3rd Qu.:127.0
        Max.   :342.0
       Rweb:> # STANDARD DEVIATION of x
       Rweb:> var(x)^.5
       [1] 88.8005

4. Consider the following sample of Etruscan skull sizes. Obtain a 95% confidence interval for $\mu$.

   141 145 145 146 142 126 144 146 154 149 143 131

   (Ans: approximately $(138.42,\ 146.92)$.)

5. Same as the last question for a sample of size 10 of Italian skull sizes:

   134 132 126 134 131 130 130 125 132 126

   (Ans: approximately $(127.95,\ 132.05)$.)

6. Plot the confidence intervals from the last two problems on a line. What do you conclude about the true mean skull sizes of Etruscans and Italians based on this comparison?

7. Now use all the Etruscan and Italian data (Appendix A) to do the last three exercises.