Next: Confidence Intervals for Proportions Up: Confidence Intervals Previous: Introduction

# Confidence Intervals for Means

Let's pick on the mean, $\mu$. That is, we have a population with unknown mean $\mu$. So we take a random sample of size $n$ from this distribution, say, $X_1, X_2, \ldots, X_n$. Then our estimate of $\mu$ is the sample average $\bar{X}$.

Income Example : Suppose we take a sample of 25 students from Smith University and record their family incomes. Suppose the incomes (in thousands of dollars) are:

28     29     35     42     42     44     50     52     54     56     59
78     84     90     95    101    108    116    121    122    133    150
158    167    235

The data have been sorted, so the lowest income is \$28,000 and the highest income is \$235,000. The average (either add up all the numbers or use the summary module) is 89.96, i.e., about \$90,000. So now we need to determine how much our estimate missed by.

In general, our estimate of $\mu$ is $\bar{X}$, and we know something about the distribution of $\bar{X}$. The Central Limit Theorem tells us that the distribution of $\bar{X}$ is approximately normal with mean $\mu$ (the population mean) and standard deviation $\sigma/\sqrt{n}$, where $\sigma$ is the population standard deviation. By the empirical rule, 95% of the time $\bar{X}$ falls in the interval from $\mu - 1.96\,\sigma/\sqrt{n}$ to $\mu + 1.96\,\sigma/\sqrt{n}$ (1.96 is more accurate than the 2 we have been using). A picture of it is seen in Figure 7.1.

We need an interval which we are fairly confident contains $\mu$. The interval in Figure 7.1 occurs 95% of the time. Its endpoints are the 2.5 and 97.5 percentiles of the distribution of $\bar{X}$. But we can't use it because we don't know $\mu$. Well, if you don't know it, estimate it. Setting aside for the moment that $\sigma$ is also unknown, consider the interval

$$\left(\bar{X} - 1.96\,\frac{\sigma}{\sqrt{n}},\ \bar{X} + 1.96\,\frac{\sigma}{\sqrt{n}}\right).$$

Oddly enough, this interval works. When will this interval not cover $\mu$? If $\bar{X} < \mu - 1.96\,\sigma/\sqrt{n}$ then the right side of the interval will be less than $\mu$. This will happen 2.5% of the time. If $\bar{X} > \mu + 1.96\,\sigma/\sqrt{n}$ then the left side of the interval will be greater than $\mu$. This will happen 2.5% of the time. If these two things don't occur then the interval will contain $\mu$. That is, this interval will contain $\mu$ 95% of the time.

What's that? We don't know $\sigma$ so we can't use the interval! That's right. We will replace $\sigma$ by the sample standard deviation $s$. Thus the interval we will use is:

$$\left(\bar{X} - 1.96\,\frac{s}{\sqrt{n}},\ \bar{X} + 1.96\,\frac{s}{\sqrt{n}}\right).$$

Income Example : Let's apply this to the income example. Recall that the data are:

28     29     35     42     42     44     50     52     54     56     59
78     84     90     95    101    108    116    121    122    133    150
158    167    235

Recall the average income is 89.96. The sample standard deviation is (either do it by hand or check the numerical summaries button in the summary module):

Rweb:> # STANDARD DEVIATION of x
Rweb:> var(x)^.5
[1] 51.68

Hence $s = 51.68$. Note that for the interval we actually need $s/\sqrt{n}$, which is called the Standard Error of the Mean: $s/\sqrt{n} = 51.68/\sqrt{25} \approx 10.33$. So the interval we want is:

(89.96 - 1.96*10.33, 89.96 + 1.96*10.33)
(69.71, 110.21)

So we estimate the mean family income of a Smith University student to be between \$69,710 and \$110,210. Our error of estimation is $1.96\,s/\sqrt{n} = 20.25$; i.e., \$20,250. That seems like a lot. How can we reduce the error of estimation? A larger sample size; i.e., as $n$ gets larger, $s/\sqrt{n}$ gets smaller.
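The income calculation can also be checked directly in R. The following is a minimal sketch, not the output of the summary module; the variable names (`income`, `xbar`, `se`, `margin`, `ci`) are chosen here for illustration.

```r
# Family incomes (in thousands of dollars) from the income example
income <- c(28, 29, 35, 42, 42, 44, 50, 52, 54, 56, 59,
            78, 84, 90, 95, 101, 108, 116, 121, 122, 133, 150,
            158, 167, 235)

n    <- length(income)   # sample size
xbar <- mean(income)     # sample average
s    <- sd(income)       # sample standard deviation, same as var(income)^.5
se   <- s / sqrt(n)      # standard error of the mean

# Approximate 95% confidence interval: xbar +/- 1.96 * se
margin <- 1.96 * se      # error of estimation
ci     <- c(lower = xbar - margin, upper = xbar + margin)

print(round(c(mean = xbar, sd = s, se = se, margin = margin), 2))
print(round(ci, 2))
```

Because the hand calculation rounds the standard error to 10.33 before multiplying by 1.96, the unrounded endpoints printed here can differ from (69.71, 110.21) in the second decimal place.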

Interpretation. What is this interval? One way of thinking about it is: the probability that the random interval $(\bar{X} - 1.96\,s/\sqrt{n},\ \bar{X} + 1.96\,s/\sqrt{n})$ traps $\mu$ is .95. What the heck does this mean? Think of it this way. This interval is the result of a Bernoulli trial with probability of success .95. In practice, we have only one sample and one interval. It will either catch $\mu$ or not. But it is the outcome of a Bernoulli trial with probability of success .95. Hence, we are fairly confident of success. So we call it a 95% confidence interval.
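The Bernoulli-trial interpretation can be illustrated by simulation. The sketch below assumes a normal population with made-up values `mu = 90` and `sigma = 50` (neither comes from the text), draws many samples of size 25, and records how often the interval traps `mu`.

```r
set.seed(1)      # arbitrary seed so the run is reproducible

mu    <- 90      # assumed true mean (hypothetical)
sigma <- 50      # assumed true standard deviation (hypothetical)
n     <- 25      # sample size for each trial
reps  <- 10000   # number of repeated samples

# Each replication is one trial: draw a sample, build the interval,
# and record whether the interval traps mu
covers <- replicate(reps, {
  x     <- rnorm(n, mean = mu, sd = sigma)
  se    <- sd(x) / sqrt(n)
  lower <- mean(x) - 1.96 * se
  upper <- mean(x) + 1.96 * se
  lower <= mu && mu <= upper
})

coverage <- mean(covers)   # proportion of intervals that trapped mu
print(coverage)
```

The proportion should come out close to .95. With a sample as small as 25 it tends to fall slightly below .95, because the sample standard deviation is standing in for the unknown population value.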

Other Remarks. There are two approximations in our confidence interval:

1. It is based on the Central Limit Theorem, which says the distribution of $\bar{X}$ is approximately normal, and we used it as exactly normal.
2. We estimated $\sigma$ by $s$.

So our confidence interval is really an approximate confidence interval. It's close enough in most applications.

A final remark of considerable importance: The endpoints of our confidence interval are estimates of the 2.5 and 97.5 percentiles of the distribution of $\bar{X}$, the estimator. This will be very important in the section after next.
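To see the percentile remark numerically, here is a small sketch using the income data. It treats the estimated distribution of the sample average as normal, centered at the sample average with standard deviation equal to the standard error, and compares its 2.5 and 97.5 percentiles with the interval endpoints. The names `pct` and `ci` are illustrative.

```r
# Family incomes (in thousands of dollars) from the income example
income <- c(28, 29, 35, 42, 42, 44, 50, 52, 54, 56, 59,
            78, 84, 90, 95, 101, 108, 116, 121, 122, 133, 150,
            158, 167, 235)

xbar <- mean(income)
se   <- sd(income) / sqrt(length(income))

# 2.5 and 97.5 percentiles of a normal distribution centered at xbar
# with standard deviation se (the estimated distribution of the sample average)
pct <- qnorm(c(0.025, 0.975), mean = xbar, sd = se)

# The confidence interval endpoints, for comparison
ci <- xbar + c(-1, 1) * 1.96 * se

print(round(pct, 2))
print(round(ci, 2))
```

The two pairs of numbers agree to within rounding, since 1.96 is the 97.5 percentile of the standard normal distribution rounded to two decimals.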

Exercise 8.2.1
1. To set ideas, obtain a 95% confidence interval for $\mu$ if the data are:
       10   12  16  18  24

Do this one by hand. The sample mean and standard deviation are easy to get: $\bar{x} = 16$ and $s = \sqrt{30} \approx 5.48$.
2. Obtain a 95% confidence interval for $\mu$ if the data are:
      76  87  98 102 111 114 115 115 120 126

First boxplot the data. Next mark the sample average and the endpoints of the confidence interval on the plot. Here's some output from the summary module to do the confidence interval:
Rweb:> summary(variables)
x
Min.   : 76.0
1st Qu.: 99.0
Median :112.5
Mean   :106.4
3rd Qu.:115.0
Max.   :126.0

Rweb:> # STANDARD DEVIATION of x
Rweb:> var(x)^.5
[1] 15.5863

3. Obtain a 95% confidence interval for $\mu$ if the data are:
     6   8  14  30  31  32  51  57  87  87 109 145 156 171 342

First boxplot the data. Next mark the sample average and the endpoints of the confidence interval on the plot. Here's the output from the summary module to do the confidence interval:
Rweb:> summary(variables)
x
Min.   :  6.0
1st Qu.: 30.5
Median : 57.0
Mean   : 88.4
3rd Qu.:127.0
Max.   :342.0

Rweb:> # STANDARD DEVIATION of x
Rweb:> var(x)^.5
[1] 88.8005

4. Obtain a 95% confidence interval for $\mu$ based on the following sample of Etruscan skull sizes:
  141  145  145  146  142  126  144  146  154  149  143  131

(Ans: ).
5. Same as the last question for a sample of size 10 of Italian skull sizes:
   134  132  126  134  131  130  130  125  132  126

(Ans: ).
6. Plot the confidence intervals from the last two problems on a line. What do you conclude about the true mean skull sizes of Etruscans and Italians based on this comparison?
7. Now use all the Etruscan and Italian data (Appendix A) to do the last three exercises.


2001-01-01