The Excel Printout

Following is the printout for running regression in Excel.

********************************************************************************* SUMMARY OUTPUT Regression Statistics Multiple R 0.640955792 R Square 0.410824328 0.351906761 Standard Error 3283.378116 Observations 12 ANOVA df SS MS F Significance F Regression 1 75171487.75 75171488 6.972866 0.024707725 Residual 10 107805718.5 10780572 total 11 182977206.3 Coefficients Standard Error t Stat P-value Lower 95% Upper 95% Intercept 8136.150298 1519.8802 5.353152 0.000322 4749.645588 11522.65501 miles -0.051269039 0.0194155 -2.64062 0.024708 -0.09452957 -0.00800851 *********************************************************************************

In the previous section, we noted that 41% of this total price
variability is explained by the car's mileage (since *R*^{2}=SSR/SSTo=.41).
This `R Square' is reported on the second line of the printout under `Regression Statistics'.

The correlation between Miles and Price (*X* and *Y*) is
*r*=.640955792. The square of this value is
*r*^{2}=(.640955792)^{2}=.410824328.
This number should look familiar, because it is ALSO the value of
SSR/SSTo, which explains why SSR/SSTo is called `R Square'. Note the
use of upper case `R' instead of lower case `r' in the Excel printout.
Use of the upper case R is standard notation in regression studies,
while lower case r tends to be used in correlation studies.

Now move down to the Analysis of Variance (ANOVA) table . It contains three rows; "Regression" (or R for short), "Residual" (or E), and "Total" (or To). The sum of squares column (denoted "SS") gives SSR, SSE, and SSTo respectively. Mathematically, SSR and SSE are not comparable in size because they contain different amounts of information, also called "degrees of freedom" or simply "df". This is where the degrees of freedom column comes in. We have SSE=107.8M, but it contains 10 pieces of information, which averages to 10.78M per piece. On the other hand, SSR=75.1M contains only 1 piece of information, so the average is 75.1M per piece. These "average sum of squares per df" are called "mean squares" and reported under the MS column. Note that MS for Regression is 6.97 times the MS for Residual. This may be interpreted as follows. ``Prices of cars vary partly because they have different mileage, right? How much of the variability in price is due to (or explained by) mileage? Answer: the average variation EXPLAINED by Miles is 6.97 times larger than UNEXPLAINED .'' Is 6.97 statistically significant? The answer is Yes, because there is only a 2.47% chance of getting a ratio that large from pure chance variation. Since this P-value is smaller than 5%, the result is statistically significant.

A generic representation of the ANOVA table for simple regression is given below.
As usual, the sample size is denoted by *n*. (For the Saturn price data,
*n*=12.) The formulas for SSR, SSE and SSTo have already been discussed previously.
MSR and MSE are obtained by division. F is the ratio between MSR and MSE.
The P-value, or observed significance level of the F-ratio, is obtained from
an F-table (which is beyond the scope of this class).

***************************************************************** ANOVA df SS MS F Significance F Regression 1 SSR MSR=SSR/dfR F=MSR/MSE P-value Residual n-2 SSE MSE=SSE/dfE total n-1 SSTo *****************************************************************

Below the ANOVA table in the Excel regression printout are the estimates of intercept and slope (under the column "coefficients"). Their standard errors are also reported along with a t-ratio, p-value for the t-ratio, and 95% confidence interval. The standard errors are interpreted the usual way, as follows. If another random sample of 12 cars were selected, the computed slope of the regression line would probably change. By how much? Answer: by approximately .019. The t-ratio is the estimate divided by its standard error, and the P-value measures how likely it is from chance alone to get a ratio that large.

Consider the following two statements:

- 1.
- The price of a used Saturn car should be around
__$4999__give or take__$4079__or so. - 2.
- The price of a used Saturn car with 80,000 miles should be around
__$4034__give or take__$3283__or so.

The first statement gives no information on mileage of the car. In the absence of an *X*-value,
our best guess is the sample mean
.
The standard error for this prediction
is the *standard deviation from the mean*, which is simply the regular
sample standard deviation
.
The second statement, on the other hand, tells us the car mileage.
The predicted price for a car with 80,000 miles is obtained
from the regression equation as
.
The standard error for this prediction
is the *standard deviation from predicted values*, which turns out to be
the squareroot of the MSE
.
This is reported in the Excel
printout as "Standard Error" (4th line of printout under `Regression Statistics').
To summarize: if a prediction is based on the average, the appropriate give-or-take
is the sample SD. If a prediction is based on a regression line, the
appropriate give-or-take is the squareroot of the MSE.