Next: Optional: The Multicollinearity Problem Up: Linear Regression Previous: The Excel Printout

# Multiple Linear Regression

The term multiple attached to linear regression means that there are two or more X-variables used to predict Y. Recall the Saturn price data again, but this time with the Model year included.

             The mileage, model year, and price (highest bid)
for 12 Saturn cars on eBay in July 2002:

Car    Miles   Year   Price

1    73676   1996    2000
2    77006   1998    2750
3    10565   2000   15500
4   146088   1995     960
5    15000   2001    4400
6    65940   2000    8800
7     9300   2000    7100
8    93739   1996    2550
9   153260   1994    1025
10    17764   2002    5900
11    57000   1998    4600
12    15000   2000    4400


This time, consider the prediction problem:

Problem: How much would I expect to get for a model 1996 Saturn car that has been driven 60000 miles?

Unlike the previous section, we now know the model year of the car, not just the mileage. Since we want to predict PRICE given YEAR and MILES, we want to fit a regression model that looks like this:

PREDICTED Y = a + b1 X1 + b2 X2

where Y=PRICE, X1=MILES, and X2=YEAR. The fitted regression line, in terms of variable names, is

 PREDICTED PRICE = - 800615 - 0.0327 (MILES) + 404 (YEAR) (11.1)

How do we interpret the coefficients in ( 11.1)? For Saturn cars within the same model year, every additional mile driven corresponds to a price drop of about 3 cents (or a drop of about $327 for every 10000 additional miles). Similarly, mileage being equal, a more recent model year will tend to have a higher price, on the average about$404 per year difference.

CAUTION: If the X variables are highly correlated, the coefficients may have a meaningless interpretation. This is called the multicollinearity problem (see optional section below).

We can now use the fitted model ( 11.1) to estimate the selling price of a model year 1996 Saturn car with 60000 miles. Plugging in the desired values for MILES and YEAR, we get

PREDICTED PRICE = - 800615 - 0.0327 (60000) + 404 (1996) = 3807.

A car with the same mileage 60000 miles from model year 1999 (i.e. 3 years more recent) would have what estimated price? Answer: Since it is 3 years more recent, its estimated price should be higher than $3807 by about 3(404)=$1212, which comes out to $3807+$1212=\$5019.

This can be verified by using the fitted model ( 11.1) on the new numbers

PREDICTED PRICE = - 800615 - 0.0327 (60000) + 404 (1999) = 5019.

Next: Optional: The Multicollinearity Problem Up: Linear Regression Previous: The Excel Printout

2003-09-08