The Excel Printout

Following is the printout for running regression in Excel.



*********************************************************************************
SUMMARY OUTPUT
Regression Statistics


Multiple R         0.640955792
R Square           0.410824328
Adjusted R Square  0.351906761
Standard Error     3283.378116
Observations       12


                                                        
 ANOVA
             df     SS        MS          F        Significance F
Regression   1  75171487.75 75171488 6.972866        0.024707725
Residual    10  107805718.5 10780572 
Total       11  182977206.3

          Coefficients  Standard Error  t Stat   P-value  Lower 95%   Upper 95%
Intercept 8136.150298   1519.8802   5.353152   0.000322 4749.645588   11522.65501
miles    -0.051269039   0.0194155   -2.64062   0.024708 -0.09452957   -0.00800851
*********************************************************************************



In the previous section, we noted that 41% of the total price variability is explained by the car's mileage (since $R^2=\mbox{SSR}/\mbox{SSTo}=.41$). This `R Square' is reported on the second line of the printout under `Regression Statistics'.

The correlation between Miles and Price (X and Y) is $r=.640955792$. The square of this value is $r^2=(.640955792)^2=.410824328$. This number should look familiar, because it is ALSO the value of SSR/SSTo, which explains why SSR/SSTo is called `R Square'. Note the use of upper case `R' instead of lower case `r' in the Excel printout. Use of the upper case R is standard notation in regression studies, while lower case r tends to be used in correlation studies.
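If you want to check this equivalence yourself, here is a short Python sketch (plain Python, no Excel needed); the numbers are simply copied from the printout above.

   # Checking that `R Square' is both the squared correlation and SSR/SSTo,
   # using numbers copied from the Excel printout above.
   r = 0.640955792            # Multiple R (correlation between Miles and Price)
   SSR = 75171487.75          # Regression SS from the ANOVA table
   SSTo = 182977206.3         # Total SS from the ANOVA table

   print(r**2)                # about 0.410824328
   print(SSR / SSTo)          # the same value, up to rounding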

Now move down to the Analysis of Variance (ANOVA) table. It contains three rows: "Regression" (or R for short), "Residual" (or E), and "Total" (or To). The sum of squares column (denoted "SS") gives SSR, SSE, and SSTo respectively. Mathematically, SSR and SSE are not directly comparable in size because they contain different amounts of information, also called "degrees of freedom" or simply "df". This is where the degrees of freedom column comes in. We have SSE=107.8M, but it contains 10 pieces of information, which averages to 10.78M per piece. On the other hand, SSR=75.1M contains only 1 piece of information, so the average is 75.1M per piece. These "average sums of squares per df" are called "mean squares" and are reported under the MS column.

Note that the MS for Regression is 6.97 times the MS for Residual. This may be interpreted as follows: ``Prices of cars vary partly because they have different mileage, right? How much of the variability in price is due to (or explained by) mileage? Answer: the average variation EXPLAINED by Miles is 6.97 times larger than the average variation left UNEXPLAINED.'' Is 6.97 statistically significant? The answer is Yes, because there is only a 2.47% chance of getting a ratio that large from pure chance variation. Since this P-value is smaller than 5%, the result is statistically significant.
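The mean squares and the F-ratio can be recomputed by hand, or with the short Python sketch below, straight from the SS and df columns of the ANOVA table.

   # Recomputing the mean squares and the F-ratio from the SS and df columns.
   SSR, dfR = 75171487.75, 1      # Regression row
   SSE, dfE = 107805718.5, 10     # Residual row

   MSR = SSR / dfR                # about 75.17M per piece of information
   MSE = SSE / dfE                # about 10.78M per piece of information
   F = MSR / MSE                  # about 6.97

   print(MSR, MSE, F)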

A generic representation of the ANOVA table for simple regression is given below. As usual, the sample size is denoted by n. (For the Saturn price data, n=12.) The formulas for SSR, SSE and SSTo were discussed previously. MSR and MSE are obtained by division. F is the ratio of MSR to MSE. The P-value, or observed significance level of the F-ratio, is obtained from an F-table (which is beyond the scope of this class).



   *****************************************************************
    ANOVA
                df     SS        MS          F        Significance F
   Regression    1    SSR   MSR=SSR/dfR  F=MSR/MSE      P-value    
   Residual    n-2    SSE   MSE=SSE/dfE
   Total       n-1    SSTo

   *****************************************************************
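The generic table can also be filled in by computer. Below is a small Python sketch (this assumes the scipy package is available; the helper name simple_regression_anova is ours, purely for illustration) that builds the table from SSR, SSTo and n, using scipy's F distribution in place of an F-table.

   from scipy import stats

   def simple_regression_anova(SSR, SSTo, n):
       """Fill in the generic ANOVA table from SSR, SSTo and the sample size n."""
       SSE = SSTo - SSR
       dfR, dfE = 1, n - 2
       MSR, MSE = SSR / dfR, SSE / dfE
       F = MSR / MSE
       p_value = stats.f.sf(F, dfR, dfE)   # area to the right of F under F(1, n-2)
       return {"MSR": MSR, "MSE": MSE, "F": F, "Significance F": p_value}

   # Saturn price data: SSR and SSTo from the printout, n = 12.
   print(simple_regression_anova(75171487.75, 182977206.3, 12))
   # F is about 6.97 and `Significance F' about 0.0247, as in the printout.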



Below the ANOVA table in the Excel regression printout are the estimates of the intercept and slope (under the column "Coefficients"). Their standard errors are also reported, along with a t-ratio, a P-value for the t-ratio, and a 95% confidence interval. The standard errors are interpreted the usual way, as follows. If another random sample of 12 cars were selected, the computed slope of the regression line would probably change. By how much? Answer: by approximately .019. The t-ratio is the estimate divided by its standard error, and the P-value measures how likely it is, from chance alone, to get a ratio that large in magnitude.
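For instance, the `miles' row of the printout can be reproduced from just the coefficient and its standard error. The Python sketch below (again assuming scipy, with df = n-2 = 10) computes the t-ratio, the two-sided P-value, and the 95% confidence interval.

   from scipy import stats

   # Slope estimate and standard error from the `miles' row; df = n - 2 = 10.
   b1, se_b1, df = -0.051269039, 0.0194155, 10

   t_ratio = b1 / se_b1                         # about -2.64
   p_value = 2 * stats.t.sf(abs(t_ratio), df)   # two-sided, about 0.0247
   t_crit = stats.t.ppf(0.975, df)              # about 2.23 for a 95% interval
   lower, upper = b1 - t_crit * se_b1, b1 + t_crit * se_b1

   print(t_ratio, p_value, lower, upper)        # matches the `miles' row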

Consider the following two statements:

1. The price of a used Saturn car should be around $4999 give or take $4079 or so.
2. The price of a used Saturn car with 80,000 miles should be around $4034 give or take $3283 or so.

The first statement gives no information on the mileage of the car. In the absence of an X-value, our best guess is the sample mean $\overline{Y}=\$4999$. The standard error for this prediction is the standard deviation about the mean, which is simply the regular sample standard deviation $\mbox{SD}_Y= \$4079$. The second statement, on the other hand, tells us the car's mileage. The predicted price for a car with 80,000 miles is obtained from the regression equation as $\mbox{Pred Y}=8136-.05127 (80000)= \$4034$. The standard error for this prediction is the standard deviation about the predicted values, which turns out to be the square root of the MSE: $\sqrt{10780572}=\$3283$. This is reported in the Excel printout as "Standard Error" (4th line of the printout under `Regression Statistics'). To summarize: if a prediction is based on the average, the appropriate give-or-take is the sample SD. If a prediction is based on a regression line, the appropriate give-or-take is the square root of the MSE.
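Here is a small Python sketch of both calculations; the intercept, slope and MSE are copied from the printout, and the sample mean and SD are the values quoted in the two statements above.

   import math

   # Statement 1: no mileage given, so use the sample mean and sample SD.
   mean_price, sd_price = 4999, 4079            # sample summaries quoted above
   print(mean_price, "give or take", sd_price)

   # Statement 2: mileage is 80,000, so use the regression line and sqrt(MSE).
   intercept, slope = 8136.150298, -0.051269039 # Coefficients from the printout
   MSE = 10780572                               # Residual MS from the ANOVA table
   pred = intercept + slope * 80000             # about $4034
   give_or_take = math.sqrt(MSE)                # about $3283, the `Standard Error'
   print(pred, "give or take", give_or_take)    # about $4034 give or take $3283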
