
Simple Linear Regression

Consider the following exercise.

Suppose that the management of a chain of package delivery stores would like to develop a model for predicting the weekly sales (in thousands of dollars) for individual stores based on the number of customers who made purchases. A random sample of 20 stores was selected from among all the stores in the chain. Since we wish to predict Sales from the number of Customers, Sales is the dependent, response, or "Y" variable, and number of Customers is the independent, explanatory, or "X" variable.

Customers    Weekly Sales ($1000s)
---------    ---------------------
   907          11.20
   926          11.05
   506           6.84
   741           9.21
   789           9.42
   889          10.08
   874           9.45
   510           6.73
   529           7.24
   420           6.12
   679           7.63
   872           9.43
   924           9.46
   607           7.64
   452           6.92
   729           8.95
   794           9.33
   844          10.23
  1010          11.77
   621           7.41

Part (a) asks for a scatter diagram, part (b) asks for the regression coefficients, part (c) asks for an interpretation of the slope, and part (d) asks for the predicted value of Y when X = 600. Parts (a) and (b) require Excel; part (d) can be done either by hand or on the computer.

Enter the Package data set into Excel.  
Choose Tools | Data Analysis | Regression.
Y Variable Cell Range is b1:b21, since Sales is the response variable.
X Variable Cell Range is a1:a21, since Customers is the explanatory variable.
Make sure the "First cells in both ranges contain label" box is checked.
Check the Residual plot box, even though this isn't asked for in this problem.
Check the Line Fit Plot box.
Click OK.

Excel dialog box

Excel will create several new worksheets containing the output and plots.

(a) Set up a scatter diagram.
Here is the scatter plot from Excel:

Excel scatter plot

Notice the increasing relationship.   As the number of Customers increases, Sales increase.

Here is the regression output:

Excel output

(b) Assuming a linear relationship, use the least-squares method to find the regression coefficients b0 and b1.  

From the output, we see that b0 = 2.423, and b1 = 0.00873.
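
For readers without Excel at hand, the coefficients can be cross-checked with the usual least-squares formulas $b_1 = S_{XY}/S_{XX}$ and $b_0 = \bar{Y} - b_1 \bar{X}$. The following is only a minimal Python sketch and is not part of the original exercise; the names `customers`, `sales`, and `fit_line` are made up for illustration.

```python
# Package delivery data from the table above (sales in thousands of dollars).
customers = [907, 926, 506, 741, 789, 889, 874, 510, 529, 420,
             679, 872, 924, 607, 452, 729, 794, 844, 1010, 621]
sales = [11.20, 11.05, 6.84, 9.21, 9.42, 10.08, 9.45, 6.73, 7.24, 6.12,
         7.63, 9.43, 9.46, 7.64, 6.92, 8.95, 9.33, 10.23, 11.77, 7.41]


def fit_line(x, y):
    """Return (b0, b1) for the least-squares line y = b0 + b1 * x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Sum of cross-products and sum of squares about the means.
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = s_xy / s_xx            # slope
    b0 = y_bar - b1 * x_bar     # intercept
    return b0, b1


b0, b1 = fit_line(customers, sales)
print(f"b0 = {b0:.3f}, b1 = {b1:.5f}")
```

The printed values should agree with the Excel output to the number of digits quoted above.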

(c) Interpret the meaning of the slope b1 in this problem.

For each additional customer, predicted weekly Sales increase by 0.00873 thousand dollars, that is, by about $8.73. Remember that the Sales numbers are in thousands of dollars.

(d) Predict the average weekly Sales (in thousands of dollars) for stores that have 600 customers.

We can plug into the regression equation for this by hand:

\begin{eqnarray*}\hat{Y} & = & 2.423 + 0.00873 X \\
\emph{Sales} & = & 2.423 + 0.00873 (600)\\
& = & 7.661
\end{eqnarray*}


So, the predicted average weekly Sales for stores with 600 Customers is 7.661 thousand dollars, or about $7,661.
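
The same plug-in calculation can be done on the computer. Below is a small Python sketch that uses the rounded coefficients reported in part (b); the function name `predict_sales` is hypothetical and chosen only for this example.

```python
# Coefficients as reported in the regression output, part (b).
B0 = 2.423      # intercept, in thousands of dollars
B1 = 0.00873    # slope, in thousands of dollars per customer


def predict_sales(n_customers):
    """Predicted mean weekly sales (thousands of dollars) for a customer count."""
    return B0 + B1 * n_customers


y_hat = predict_sales(600)
print(f"Predicted sales: {y_hat:.3f} thousand dollars (${y_hat * 1000:,.0f})")
```

Note that X = 600 lies inside the range of customer counts in the sample (420 to 1010), so this prediction does not extrapolate beyond the data.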

(g) How much variation in Sales is explained by number of Customers?
Answer: $R^2 = 91.19\%$, so about 91% of the variation in weekly Sales is explained by the number of Customers.

(h) What is the standard error of the estimated regression line?
Answer: s = .50150, or $501.50 (remember that s is always in Y units!)
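
Both $R^2$ and s come from the same sums of squares: $R^2 = 1 - SSE/SST$ and $s = \sqrt{SSE/(n-2)}$. The Python sketch below recomputes them from the raw data as a check on the Excel output; it is an illustration only, and all variable names are assumptions of this example.

```python
import math

# Package delivery data from the table above (sales in thousands of dollars).
customers = [907, 926, 506, 741, 789, 889, 874, 510, 529, 420,
             679, 872, 924, 607, 452, 729, 794, 844, 1010, 621]
sales = [11.20, 11.05, 6.84, 9.21, 9.42, 10.08, 9.45, 6.73, 7.24, 6.12,
         7.63, 9.43, 9.46, 7.64, 6.92, 8.95, 9.33, 10.23, 11.77, 7.41]

n = len(customers)
x_bar = sum(customers) / n
y_bar = sum(sales) / n

# Least-squares slope and intercept.
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(customers, sales))
s_xx = sum((x - x_bar) ** 2 for x in customers)
b1 = s_xy / s_xx
b0 = y_bar - b1 * x_bar

# Total sum of squares and error (residual) sum of squares.
sst = sum((y - y_bar) ** 2 for y in sales)
sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(customers, sales))

r_squared = 1 - sse / sst
std_error = math.sqrt(sse / (n - 2))    # n - 2 = 18 degrees of freedom

print(f"R^2 = {r_squared:.4f}")
print(f"s   = {std_error:.4f} thousand dollars")
```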

(i) Based on the residual plot, does the linear fit look okay?
Note that the residual plot shows up on the SLR sheet, over to the right.

Excel/PHStat residual plot

Answer: Yes, since there isn't any kind of obvious pattern here.

(j) Using $\alpha =.05$, is there evidence of a linear relationship between Sales and number of Customers?
Answer: We are testing $H_0: \beta_1 = 0$ vs. $H_1: \beta_1 \neq 0$
Test statistic: t = 13.6462
p-value $= 6.206 \times 10^{-11} < .05$
Reject $H_0$. There is evidence of a significant linear relationship between Sales and number of Customers.
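
The test statistic is the slope divided by its standard error, $t = b_1 / S_{b_1}$, and the critical-value approach leads to the same decision as the p-value. The Python sketch below uses the rounded $b_1$ from part (b) together with the standard error (.00063969) and critical value (2.1009, for 18 degrees of freedom) quoted in part (k). Because $b_1$ is rounded, the result matches Excel's 13.6462 only to about two decimal places. Variable names are illustrative.

```python
b1 = 0.00873          # slope from the regression output (rounded)
s_b1 = 0.00063969     # standard error of the slope, as quoted in part (k)
t_crit = 2.1009       # two-sided critical value, alpha = .05, 18 df, from part (k)

t_stat = b1 / s_b1
reject_h0 = abs(t_stat) > t_crit    # two-sided test of H0: beta_1 = 0

print(f"t = {t_stat:.2f}, reject H0: {reject_h0}")
```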

(k) Give a 95% confidence interval for the true slope.
Answer: We can do this (partially) by hand:

$b_1 \pm t_{crit} S_{b_1}$
$.00873 \pm 2.1009(.00063969)$
$.00873 \pm .0013439$
(.00739, .01007)
Also, this is included in the output (can you find it?)
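
The hand calculation above can also be scripted. This is a short Python sketch that uses only the three numbers from the calculation; the variable names are assumptions of the example.

```python
b1 = 0.00873          # estimated slope (thousands of dollars per customer)
s_b1 = 0.00063969     # standard error of the slope
t_crit = 2.1009       # critical t value for 95% confidence, 18 df

margin = t_crit * s_b1
lower, upper = b1 - margin, b1 + margin

print(f"95% CI for the slope: ({lower:.5f}, {upper:.5f})")
# Same interval expressed in dollars per additional customer.
print(f"In dollars: (${lower * 1000:.2f}, ${upper * 1000:.2f})")
```

The interval does not contain 0, which agrees with the decision to reject $H_0$ in part (j).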



 

21 August 2003