Next: Using Excel Up: Linear Regression Previous: Multiple Linear Regression

# Optional: The Multicollinearity Problem

When the X-variables are highly correlated, the individual regression coefficients can lose their meaning. Consider the following hypothetical data:

| X1 | X2 | Y |
|---:|---:|---:|
| 10 | 20 | 140 |
| 20 | 40 | 180 |
| 30 | 60 | 220 |
| 40 | 80 | 260 |
| 50 | 100 | 300 |


The data is fit perfectly by a linear model in X1 and X2, for example

Model A: Y = 100 + 2 (X1) + 1 (X2)

However, note that X2 = 2 (X1); that is, X2 is an exact linear function of X1. Because of this, the following models, each with different coefficients, also fit perfectly:

Model B: Y = 100 + 4 (X1) + 0 (X2)

Model C: Y = 100 + 0 (X1) + 2 (X2)

Model D: Y = 100 - 6 (X1) + 5 (X2)

Model E: Y = 100 + 10 (X1) - 3 (X2)
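A quick way to confirm this claim is to plug the table into each model and check that every residual (observed Y minus fitted Y) is zero. The Python sketch below does that for Models B through E using the table's values; the names `models` and `residuals` are illustrative choices, not part of the original notes.

```python
# Hypothetical data from the table above: X2 is exactly 2 * X1.
X1 = [10, 20, 30, 40, 50]
X2 = [20, 40, 60, 80, 100]
Y = [140, 180, 220, 260, 300]

# Each model is stored as (intercept, coefficient on X1, coefficient on X2).
models = {
    "B": (100, 4, 0),
    "C": (100, 0, 2),
    "D": (100, -6, 5),
    "E": (100, 10, -3),
}

def residuals(b0, b1, b2):
    """Return observed Y minus fitted Y for every row of the data."""
    return [y - (b0 + b1 * x1 + b2 * x2) for x1, x2, y in zip(X1, X2, Y)]

for name, coefs in models.items():
    res = residuals(*coefs)
    # Every model should report a largest absolute residual of 0.
    print(name, coefs, "max |residual| =", max(abs(r) for r in res))
```

All four models print a maximum absolute residual of 0, even though their coefficients on X1 range from -6 to 10.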

Since Model D (coefficient -6 on X1) fits just as well as Model E (coefficient +10 on X1), the individual beta coefficients have no real meaningful interpretation: the data cannot even determine the sign of the effect of X1. In statistics, this coexistence of multiple correct models is called the nonidentifiability problem. Here it arises because one of the X-variables is an exact linear combination of the others, a situation called multicollinearity.
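One way to see the nonidentifiability algebraically: least squares solves the normal equations (X'X) b = X'Y, where X is the design matrix with a column of 1s, a column for X1, and a column for X2. When X2 = 2 (X1), the third column of X is twice the second, so X'X is singular and the equations have infinitely many solutions. The sketch below, a plain-Python illustration assuming the table's data (the helper names `xtx` and `det3` are hypothetical), computes X'X, shows its determinant is 0, and then traces the whole family of perfectly fitting models: any intercept-100 model with b1 + 2 (b2) = 4.

```python
X1 = [10, 20, 30, 40, 50]
X2 = [20, 40, 60, 80, 100]
Y = [140, 180, 220, 260, 300]

# Design matrix: a column of 1s for the intercept, then X1 and X2.
X = [[1, x1, x2] for x1, x2 in zip(X1, X2)]

def xtx(rows):
    """Compute X'X, the matrix on the left side of the normal equations."""
    k = len(rows[0])
    return [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]

def det3(m):
    """Determinant of a 3x3 matrix by cofactor expansion along the first row."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

M = xtx(X)
print(M)        # [[5, 150, 300], [150, 5500, 11000], [300, 11000, 22000]]
print(det3(M))  # 0, so X'X cannot be inverted and b is not unique

# b1*X1 + b2*X2 = (b1 + 2*b2)*X1 when X2 = 2*X1, so every pair with
# b1 + 2*b2 = 4 reproduces Y exactly. b2 = 0, 2, 5, -3 give Models B, C, D, E.
for b2 in [0, 2, 5, -3, 1.5]:
    b1 = 4 - 2 * b2
    fitted = [100 + b1 * x1 + b2 * x2 for x1, x2 in zip(X1, X2)]
    print(b1, b2, fitted == Y)
```

Every line of the final loop prints `True`, which is exactly the "multiple correct models" situation described above.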

Moral: Avoid models with X-variables which are too strongly correlated with each other.
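In practice, following this advice starts with checking the pairwise correlations among the candidate X-variables before fitting. The sketch below is a minimal Python version of that check. The cutoff of 0.9 is an assumption chosen for illustration, since there is no universal threshold, and `pearson_r` and `flag_correlated` are hypothetical helper names. Applied to X1 and X2 from the table, it reports a correlation of exactly 1.

```python
import math
from itertools import combinations

def pearson_r(a, b):
    """Sample Pearson correlation between two equal-length lists."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    sab = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    saa = sum((x - ma) ** 2 for x in a)
    sbb = sum((y - mb) ** 2 for y in b)
    return sab / math.sqrt(saa * sbb)

def flag_correlated(predictors, threshold=0.9):
    """Return predictor pairs whose |r| meets or exceeds the (assumed) threshold."""
    flagged = []
    for (name_a, a), (name_b, b) in combinations(predictors.items(), 2):
        r = pearson_r(a, b)
        if abs(r) >= threshold:
            flagged.append((name_a, name_b, r))
    return flagged

# X-variables from the hypothetical table.
predictors = {"X1": [10, 20, 30, 40, 50], "X2": [20, 40, 60, 80, 100]}
print(flag_correlated(predictors))
```

A pairwise check only catches two-variable relationships; a variable that is a linear combination of several others can slip past it, which is why diagnostics based on regressing each X-variable on all the others are also used.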

Of course, sometimes multicollinearity is unavoidable. In the Saturn price example, the two X-variables MILES and YEAR have correlation r = -0.91, so the multicollinearity problem exists to some extent, and the coefficients must be interpreted very carefully. One way to reduce the problem is to include cars that are new but have high mileage, or old but have low mileage. Such cars weaken the link between MILES and YEAR and so reduce their correlation.
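To put "to some extent" on a scale, one common diagnostic (not used in the original notes) is the variance inflation factor, VIF. With exactly two X-variables, regressing one on the other gives R² = r², so each coefficient has VIF = 1 / (1 - r²), and its standard error is inflated by a factor of about √VIF relative to uncorrelated predictors (other things equal). The sketch below assumes only the r = -0.91 quoted above; the smaller correlations in the loop are hypothetical values showing how adding new-but-high-mileage and old-but-low-mileage cars would help.

```python
import math

def vif_two_predictors(r):
    """VIF for either coefficient in a model with exactly two X-variables.

    Regressing one predictor on the other gives R^2 = r^2, so VIF = 1 / (1 - r^2).
    """
    if abs(r) >= 1:
        raise ValueError("perfect collinearity: VIF is infinite")
    return 1.0 / (1.0 - r * r)

# Correlation between MILES and YEAR quoted in the text.
vif = vif_two_predictors(-0.91)
print(round(vif, 2))  # 5.82
print("standard-error inflation:", round(math.sqrt(vif), 2))

# Hypothetical weaker correlations, e.g. after adding less typical cars.
for r in [-0.91, -0.7, -0.5, 0.0]:
    print(r, round(vif_two_predictors(r), 2))
```

Note that `vif_two_predictors(1.0)` raises an error: the perfectly collinear table above, where r = 1, corresponds to an infinite VIF.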


2003-09-08