Comparing the Averages of Two Independent Samples

Is there "grade inflation" at WMU? How does the average GPA of WMU students today compare with that of, say, 10 years ago? Suppose a random sample of 100 student records from 10 years ago yields a sample average GPA of 2.90 with a standard deviation of .40. A random sample of 100 current students yields a sample average of 2.98 with a standard deviation of .45. The difference between the two sample means is 2.98 - 2.90 = .08. Is this proof that GPAs are higher today than 10 years ago? Well, first we need to account for the fact that 2.98 and 2.90 are not the true averages, but are computed from random samples. Therefore, .08 is not the true difference, but simply an estimate of the true difference. Can this estimate miss by much? Fortunately, statistics has a way of measuring the expected size of the "miss" (or error of estimation). For our example, it is .06 (we show how to calculate this later). Therefore, we can state the bottom line of the study as follows: "The average GPA of WMU students today is .08 higher than 10 years ago, give or take .06 or so."

We now show how to calculate the .06, the standard error of the
estimate. But first, a note on terminology. The estimate .08=2.98-2.90 is
a difference between averages (or means) of two independent random samples.
"Independent" refers to the sampling luck-of-the-draw:
the luck of the second sample is unaffected by the first sample.
In other words, there were two independent chances to have gotten lucky
or unlucky with the sampling.
The likely size of the error of estimation in the .08 is called
the *standard error of the difference* between independent means.
We calculate it using the following formula:

$$SE(\bar{X}_1 - \bar{X}_2) = \sqrt{SE_1^2 + SE_2^2} \tag{7.6}$$

where $SE_1 = S_1/\sqrt{n_1}$ and $SE_2 = S_2/\sqrt{n_2}$.

Note that $SE_1$ and $SE_2$ are the SE's of $\bar{X}_1$ and $\bar{X}_2$, respectively. The formula looks easier without the notation and the subscripts. 2.98 is a sample mean, and has standard error $.45/\sqrt{100} = .045$ (since SE = $S/\sqrt{n}$). Similarly, 2.90 is a sample mean and has standard error $.40/\sqrt{100} = .040$. Summarizing, we write the two mean estimates (and their SE's in parentheses) as

2.98 (SE=.045)

2.90 (SE=.040)

If two independent estimates are subtracted, formula (7.6) shows how to compute the SE of the difference:

2.98 - 2.90 (SE = $\sqrt{.045^2 + .040^2} = .06$), or .08 $\pm$ .06.

Remember the Pythagorean Theorem in geometry?
Think of the two SE's as the lengths of the two shorter sides
of a right triangle (call them *a* and *b*). The SE of the difference
then equals the length of the hypotenuse (SE of difference =
$\sqrt{a^2 + b^2}$).
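The arithmetic for the grade inflation example can be checked with a minimal Python sketch. The function names `se_of_mean` and `se_of_difference` are illustrative choices, not part of the text; `math.hypot` is used because it computes exactly the hypotenuse length described above.

```python
import math

def se_of_mean(sd, n):
    """Standard error of a sample mean: SD divided by sqrt(n)."""
    return sd / math.sqrt(n)

def se_of_difference(se_a, se_b):
    """SE of the difference between two independent estimates.

    This is the hypotenuse rule: sqrt(se_a**2 + se_b**2).
    """
    return math.hypot(se_a, se_b)

# Grade inflation example: SD .45 today, SD .40 ten years ago, n = 100 each.
se_today = se_of_mean(0.45, 100)
se_past = se_of_mean(0.40, 100)
se_diff = se_of_difference(se_today, se_past)

print(f"SE today = {se_today:.3f}, SE past = {se_past:.3f}, SE diff = {se_diff:.2f}")
```

The rule applies only because the two samples are independent; it would not be appropriate for paired or otherwise dependent samples.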

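We are now ready to state a confidence interval for the difference between two independent means. A *z*-confidence interval for $\mu_1 - \mu_2$, the difference between the two population means, is

$$(\bar{X}_1 - \bar{X}_2) \pm z\sqrt{\frac{S_1^2}{n_1} + \frac{S_2^2}{n_2}} \tag{7.7}$$

The correct *z* critical value for a 95% confidence interval is *z*=1.96. Therefore
a 95% *z*-confidence interval for $\mu_1 - \mu_2$
is

$$.08 \pm 1.96(.06) = .08 \pm .12$$

or (-.04, .20).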
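As a rough check on this interval, here is a small Python sketch. The function name `z_interval` and its argument order are assumptions made for illustration; the critical value comes from the standard library's `statistics.NormalDist` rather than from a printed table.

```python
import math
from statistics import NormalDist

def z_interval(mean1, sd1, n1, mean2, sd2, n2, level=0.95):
    """z-confidence interval for the difference between two independent means."""
    diff = mean1 - mean2
    # SE of the difference between independent sample means
    se = math.sqrt(sd1**2 / n1 + sd2**2 / n2)
    # Two-sided critical value, about 1.96 when level = 0.95
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return diff - z * se, diff + z * se

# Today: mean 2.98, SD .45; ten years ago: mean 2.90, SD .40; n = 100 each.
low, high = z_interval(2.98, 0.45, 100, 2.90, 0.40, 100)
print(f"({low:.2f}, {high:.2f})")
```

Because the interval contains 0, the data in this example do not rule out the possibility that the true average GPA has not changed.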

There is a second procedure that is preferable when either *n*_{1} or *n*_{2} or both
are small. However, this method needs additional requirements to be satisfied
(at least approximately):

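Requirement R1: Both samples follow a normal-shaped histogram.

Requirement R2: The population SD's $\sigma_1$ and $\sigma_2$ are equal.

The following confidence interval is called a "Pooled SD" or "Pooled Variance" confidence interval. Let

$$S_p = \sqrt{\frac{(n_1-1)S_1^2 + (n_2-1)S_2^2}{n_1+n_2-2}}$$

be the pooled SD. Then a *t*-confidence interval for $\mu_1 - \mu_2$ is

$$(\bar{X}_1 - \bar{X}_2) \pm t\sqrt{\frac{S_p^2}{n_1} + \frac{S_p^2}{n_2}} \tag{7.8}$$

where *t* is the critical value from the *t* table with $n_1+n_2-2$ degrees of freedom.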

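Returning to the grade inflation example, the pooled SD is

$$S_p = \sqrt{\frac{(100-1)(.45)^2 + (100-1)(.40)^2}{100+100-2}} = \sqrt{.18125} = .426$$

Therefore, $\bar{X}_1 - \bar{X}_2 = .08$, $\sqrt{S_p^2/n_1 + S_p^2/n_2} = \sqrt{.18125/100 + .18125/100} = .06$, and the difference between means is estimated as

$$.08 \pm .06$$

where the second term is the standard error. For a 95% confidence interval, the appropriate value from the *t* table (with $100+100-2 = 198$ degrees of freedom) is approximately *t*=1.97, and the interval is

$$.08 \pm 1.97(.06) = .08 \pm .12$$

or (-.04, .20).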
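The pooled-SD calculation for this example can be sketched in Python as follows. The names `pooled_sd` and `pooled_t_interval` are hypothetical, and the critical value is passed in as `t_crit` because the standard library has no *t* distribution; the value 1.97 used here is an approximate table value for 198 degrees of freedom.

```python
import math

def pooled_sd(sd1, n1, sd2, n2):
    """Pooled SD: weighted combination of the two sample variances."""
    pooled_var = ((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2)
    return math.sqrt(pooled_var)

def pooled_t_interval(mean1, sd1, n1, mean2, sd2, n2, t_crit):
    """Pooled-SD t-confidence interval for the difference between two means.

    t_crit must be looked up by the caller (t table, n1 + n2 - 2 df).
    """
    sp = pooled_sd(sd1, n1, sd2, n2)
    se = sp * math.sqrt(1 / n1 + 1 / n2)
    diff = mean1 - mean2
    return diff - t_crit * se, diff + t_crit * se

sp = pooled_sd(0.45, 100, 0.40, 100)
# t_crit = 1.97 is an assumed approximate table value for 198 df.
low, high = pooled_t_interval(2.98, 0.45, 100, 2.90, 0.40, 100, t_crit=1.97)
print(f"S_p = {sp:.3f}, interval = ({low:.2f}, {high:.2f})")
```

With equal sample sizes, as here, the pooled standard error coincides with the unpooled one, which is why the *t* and *z* intervals agree to two decimal places.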

Note that the *t*-confidence interval (7.8) with pooled SD looks like
the *z*-confidence interval (7.7),
except that
*S*_{1} and *S*_{2} are replaced by *S*_{p}, and *z* is replaced by *t*.
We present a summary of the situations
under which each method is recommended.

| | R1 and R2 are both satisfied | R1 or R2 or both not satisfied |
|---|---|---|
| Both samples are large | Use z or t | Use z |
| One or both samples small | Use t | Consult a statistician |