The following table contains the winning bid prices for 12 Saturn cars on eBay in July 2002. The mileage of each car is also given, and the cars are arranged in increasing order of Miles.
Car Miles Price ($)
1 9300 7100
2 10565 15500
3 15000 4400
4 15000 4400
5 17764 5900
6 57000 4600
7 65940 8800
8 73676 2000
9 77006 2750
10 93739 2550
11 146088 960
12 153260 1025
Problem: Based on the data, how much should I expect to get for a Saturn car that has been driven 60000 miles?
An initial analysis might go like this: "Car 7 has about 66000 miles and a high bid of $8800. My car has fewer miles, so I should expect to get a little more for it, maybe $9000(?). However, Car 6 has only 57000 miles, yet its high bid is only $4600. Based on this observation, I should expect to get a little less than $4600, maybe $4400 (?)." This type of ad hoc data analysis looks at only a few observations (Cars 6 and 7), which here give conflicting answers, without considering the rest of the data.
Simple linear regression is a data analysis technique that tries to find a linear pattern in the data. In linear regression, we use all of the data to calculate a straight line that can be used to predict Price from Miles. Since Miles is used to predict Price, Miles is called the "explanatory variable" and Price is called the "response variable". Figure 11.1 shows a scatterplot of Price (on the Y-axis) versus Miles (on the X-axis).
Notice that the points seem to fall around a straight line sloping downwards. Can you draw this line? We will discuss one way to do this, called the least squares (LS) method. For now, suppose that the LS line has already been computed (we will do this later). The LS line overlaid on the scatterplot is shown in Figure 11.2.
The formula for this line, in the form Y = a + bX, is
We can now use the line to predict the selling price of a car with 60000 miles. What is the height, or Y value, of the line at X = 60000? The answer is
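The LS line and its height at X = 60000 can be recomputed directly from the table. The following is a minimal sketch in Python, assuming the standard closed-form least squares formulas for a line (slope b = Sxy / Sxx, intercept a = mean of Y minus b times mean of X), which the text only develops later. The names `least_squares_line` and `predict` are illustrative and do not come from the text.

```python
# Mileage and winning bid ($) for the 12 cars, copied from the table above.
miles = [9300, 10565, 15000, 15000, 17764, 57000,
         65940, 73676, 77006, 93739, 146088, 153260]
price = [7100, 15500, 4400, 4400, 5900, 4600,
         8800, 2000, 2750, 2550, 960, 1025]


def least_squares_line(x, y):
    """Return (a, b) for the line Y = a + bX that minimizes the
    sum of squared vertical distances from the points to the line."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Sxy: how X and Y vary together; Sxx: how much X varies on its own.
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b = sxy / sxx            # slope: change in Price per extra mile
    a = y_bar - b * x_bar    # intercept: the line passes through (x_bar, y_bar)
    return a, b


def predict(a, b, x_new):
    """Height of the line Y = a + bX at X = x_new."""
    return a + b * x_new


a, b = least_squares_line(miles, price)
predicted_price = predict(a, b, 60000)

print(f"LS line: Y = {a:.0f} + ({b:.5f}) X")
print(f"Predicted price at 60000 miles: ${predicted_price:.0f}")
```

With the twelve cars above, this sketch gives a slope of about -0.051 dollars per mile, an intercept near $8136, and a predicted price of roughly $5060 for a car with 60000 miles. That value lies between the two ad hoc guesses based on Cars 6 and 7, and it uses all twelve observations rather than two.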