Stat 216 Final Exam Review Problems

Note: The final exam will also include some exam 2 topics not covered here.

 

For the next 20 problems, consider the Emeadow data listed on page F-4 of Appendix F. Appraised Value (in thousands of dollars), number of Bedrooms, number of Bathrooms, and Age of the house are given, among other variables, for 74 homes in East Meadow, New York.

For now, let's try to predict appraised Value with the number of Bathrooms.

1.) Which of these two variables is the response, or Y, variable? What does that make the other variable?

2.) What does the following scatterplot tell us about the relationship between Value and Bathrooms?

scatterplot

Note: Excel/PhStat automatically and incorrectly put Value on the horizontal axis and Bathrooms on the vertical axis. When doing regressions, you should look out for this: take a close look at the units on each axis and decide whether the plot has the variables on the appropriate axes. If not, it can be fixed as follows. Click on the sheet for the scatterplot. Pull down the Chart menu and choose Source Data. Click on Series. Now change X values from a2:a75 to d2:d75 (the cell range for Bathrooms), and change Y values from d2:d75 to a2:a75 (the cell range for Value). Now the plot will be correct.

3.) Here is the regression output from Excel/PhStat. What is the regression equation? Use it to predict the average Value of houses with two Bathrooms.

regression output

4.) Interpret the slope.

5.) How much variation in Value is explained by Bathrooms?

6.) What is the standard error of the estimated regression line?

7.) Is there evidence at α = .05 that Bathrooms is a significant linear predictor of Value?

8.) Give a 95% confidence interval for the true slope.

9.) Suppose we are interested in houses with one Bathroom. See the following estimation output. What is a 99% interval estimate for the average Value of such homes?

estimation output

10.) Now suppose we are interested in a house with three Bathrooms. See the following estimation output. What is a 99% interval estimate for the Value of a house with that many bathrooms?

estimation output

11.) Here is the residual plot for this regression. What does this say about the fit of our model?

residual plot

Now, let's try to improve our predictive power by including a few more explanatory variables. Let's predict Value using number of Bedrooms, number of Bathrooms, and Age of the house. The Excel/PhStat output is below.

multiple regression output

Note: Once again, there is a bit of a "trick" to get what you want from the software. When you specify the X variables cell range, you'll notice that all of the explanatory variables need to be in consecutive columns. Unused variables (in this case, Rooms) need to be moved or temporarily deleted. So just select the Rooms column (E), and choose Delete from the Edit menu. (You won't be allowed to save it this way, so don't worry about contaminating the data set for the next person.) Then you can specify c1:e75 for the X variables. This will pick Bedrooms, Bathrooms, and Age as the explanatory variables.

12.) What is the regression equation?

13.) Interpret the slope estimate for the Age variable.

14.) Predict the average Value of 20-year-old houses with 3 Bedrooms and 1½ Bathrooms. Is this extrapolation?

15.) How much variation is accounted for by the three X variables?

16.) What is the standard error of the estimate?

17.) Determine if there is a significant relationship between Value and the three explanatory variables at α = .05.

18.) At α = .05, determine whether each of the three X variables makes a significant contribution to the regression model or not. Be sure to write out your conclusions for all three X's and what results (i.e., test statistics, p-values) you used to make each conclusion.

19.) Give a 90% confidence interval for the true slope associated with the Bathroom variable.

20.) Consider the residual plots for this regression problem. How does the model appear to fit the data?

residual plots

residual plot

Note: This last plot, residuals vs. fitted values, was not automatically output by Excel/PhStat with the Multiple regression procedure. This plot had to be generated with the "Chart Wizard", and the details won't be provided here.

 

21.) Before commercials are placed on national television, they undergo testing and modification. Marketing researchers often show one version of a commercial to half the broadcasting audience and a second version to the other half. Then a follow-up telephone survey is conducted to measure the impact of the ad. For the following example, are the two versions of the commercial equally remembered? Use α = .05.

Commercial    Don't remember    Remember seeing    Remember key point
Version A     19                24                 37
Version B     24                28                 18

 

22.) Imported goods can be challenged for infringement of a U.S. patent, copyright, or trademark under Section 337 of a tariff act. Once a Section 337 challenge is brought to the International Trade Commission, it results in one of three decisions. The results for 190 challenges, involving three countries, are given below, along with the Excel/PhStat χ² test output. Is there evidence that trade violation results depend on the country in question? Use α = .10.

chi-square output

 

23.) The Equal Credit Opportunity Act forbids lenders in the US from asking the marital status of women who are applying for personal loans. Many women feel that this act should be extended to include business loans, citing instances where women received business loans only after the lender determined that they were married to men who had good credit ratings (Business Week, 27 May 1985). Suppose that a women's group has collected data on the business loan applications of 600 women, and that the results are as summarized below. Is there evidence of bias on the part of lenders regarding marital status? Use α = .05.

                  Loan
Marital Status    Granted    Denied
Single            253        119
Married           181        47

 

24.) A sample of 400 union labor contracts was selected and classified according to two characteristics: duration of contract and type of industry. Based on the Excel/PhStat output, is the duration of union contracts independent of type of industry? Use α = .01.

chi-square output


ANSWERS

1.) Value is the response, or Y variable.
So Bathrooms is the explanatory, or X variable.

2.) We have a positive (or increasing) relationship here. As the number of Bathrooms increases, Value increases.

3.) The regression equation is Value = 135.97 + 43.93Bathrooms.
Yhat = 135.97 + 43.93(2) = 223.83, or $223,830.

4.) For each additional Bathroom, the predicted Value increases by 43.93 units on average (that is, by $43,930, since Value is in thousands of dollars).

5.) R² = 48.9%

6.) s = 27.7 ($27,700)

7.)
H0: β1 = 0
H1: β1 ≠ 0
t = 8.30
p-value = approximately zero < .05
Reject H0. Yes, Bathrooms is a significant predictor of Value.

8.) b1 ± t(.025; n-2) × s(b1), with n - 2 = 72 degrees of freedom
43.93 ± 1.9935(5.2924)
43.93 ± 10.55
(33.38, 54.48)
This is also provided in the output.

9.) This is a confidence interval. From the output, we have
(165.783, 194.025), or ($165,783, $194,025).

10.) This is a prediction interval. From the output, we have
(192.086, 343.446), or ($192,086, $343,446).

11.) The plot does not show any curves or patterns, so the model fits the data adequately.

12.) Value = 154.067 + 6.591Bedrooms + 37.484Bathrooms - 0.914Age

13.) As Age increases by one year, the predicted Value decreases by 0.914 units ($914) on average, as long as we hold Bedrooms and Bathrooms constant.

14.) Yhat = 154.067 + 6.591(3) + 37.484(1.5) - 0.914(20) = 211.786 ($211,786)
To determine if this is extrapolation or not, we need to look at the range of values for the three X variables. One way is to get the descriptive statistics for Age, Bedrooms and Bathrooms, in order to see what the minimum and maximum values are for each.

descriptive statistics output

As you can see, Bedrooms range from 2 to 6, so a value of 3 is perfectly legal to plug in. In other words, it's not extrapolation. Bathrooms range from 1 to 3.5, so 1.5 is allowed. Age ranges from 14 to 50, so plugging in 20 is okay. None of the three values is outside its valid range, so this prediction is not extrapolation. Note that it only takes one X value outside its respective range to make the whole prediction extrapolation. All of the X values must be within the range of data for that variable for the prediction to be valid.
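The prediction and the range check in problem 14 can be sketched in a few lines of Python. This is only an illustration: the coefficients come from the answer to problem 12 and the minimum and maximum values from the descriptive statistics quoted above, while the function names and the 10-year-old example house are made up for this sketch.

```python
# Coefficients from the answer to problem 12 (Value is in thousands of dollars).
INTERCEPT = 154.067
SLOPES = {"Bedrooms": 6.591, "Bathrooms": 37.484, "Age": -0.914}

# Observed (min, max) of each X variable, as quoted from the descriptive statistics.
RANGES = {"Bedrooms": (2, 6), "Bathrooms": (1, 3.5), "Age": (14, 50)}


def predict_value(house):
    """Return the predicted Value (in $1000s) for a dict of X values."""
    return INTERCEPT + sum(SLOPES[name] * house[name] for name in SLOPES)


def extrapolated_variables(house):
    """Return the names of X variables whose value lies outside the observed range."""
    return [
        name
        for name, (low, high) in RANGES.items()
        if not low <= house[name] <= high
    ]


house = {"Bedrooms": 3, "Bathrooms": 1.5, "Age": 20}
print(round(predict_value(house), 3))   # 211.786
print(extrapolated_variables(house))    # [] -> not extrapolation

# A hypothetical 10-year-old house falls below the minimum Age of 14.
young_house = {"Bedrooms": 3, "Bathrooms": 1.5, "Age": 10}
print(extrapolated_variables(young_house))   # ['Age'] -> extrapolation
```

An empty list means every X value is inside its observed range. A single name in the list is enough to make the whole prediction extrapolation.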

15.) R² = 51.86%

16.) s = 27.275 ($27,275)

17.) F = 25.14
p-value = approximately zero < .05
Reject H0: β1 = β2 = β3 = 0.
There is a significant relationship between Value and the three X variables.
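As a cross-check that is not part of the Excel/PhStat output, the overall F statistic can be recovered from R² using the standard identity F = (R²/k) / ((1 - R²)/(n - k - 1)). The sketch below plugs in R² = .5186 from problem 15, n = 74 houses and k = 3 explanatory variables. The function name is just for illustration, and because R² is rounded the result agrees with the output only to about two decimals.

```python
def overall_f(r_squared, n, k):
    """Overall F statistic for a regression with k predictors and n observations."""
    explained_per_df = r_squared / k                    # numerator, k degrees of freedom
    unexplained_per_df = (1 - r_squared) / (n - k - 1)  # denominator, n - k - 1 degrees of freedom
    return explained_per_df / unexplained_per_df


n = 74               # houses in the data set
k = 3                # Bedrooms, Bathrooms, Age
r_squared = 0.5186   # from the answer to problem 15

f_stat = overall_f(r_squared, n, k)
print(round(f_stat, 2))   # 25.14
```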

18.)

Bedrooms:
H0: β1 = 0
H1: β1 ≠ 0
t = 1.52
p-value = .1337 > .05
Do not reject H0. No, Bedrooms is not a significant predictor of Value.

Bathrooms:
H0: β2 = 0
H1: β2 ≠ 0
t = 6.10
p-value = approx. zero < .05
Reject H0. Yes, Bathrooms is a significant predictor of Value.

Age:
H0: β3 = 0
H1: β3 ≠ 0
t = -1.54
p-value = .1276 > .05
Do not reject H0. No, Age is not a significant predictor of Value.

19.) The output gives only the 95% interval, so we'll have to do this one by hand:
b2 ± t(.05; n-4) × s(b2), with n - 4 = 70 degrees of freedom
37.48 ± 1.6669(6.1418)
37.48 ± 10.238
(27.242, 47.718)
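Both slope intervals (problems 8 and 19) follow the same recipe: estimate ± critical t × standard error. Here is a minimal Python sketch of that arithmetic. The estimates, standard errors and critical values are copied from the answers above rather than computed, and the function names are illustrative. If SciPy happens to be installed, scipy.stats.t.ppf(0.975, 72) and scipy.stats.t.ppf(0.95, 70) should give the same critical values.

```python
def slope_interval(estimate, std_error, t_critical):
    """Confidence interval for a slope: estimate +/- t_critical * std_error."""
    margin = t_critical * std_error
    return estimate - margin, estimate + margin


def t_statistic(estimate, std_error):
    """Test statistic for H0: slope = 0."""
    return estimate / std_error


# Problem 8: simple regression, 95% interval, df = n - 2 = 72.
interval8 = slope_interval(43.93, 5.2924, 1.9935)
print(tuple(round(x, 2) for x in interval8))    # (33.38, 54.48)

# Problem 19: multiple regression, 90% interval, df = n - 4 = 70.
interval19 = slope_interval(37.48, 6.1418, 1.6669)
print(tuple(round(x, 3) for x in interval19))   # (27.242, 47.718)

# The t statistics in problems 7 and 18 (Bathrooms) are estimate / standard error.
print(f"{t_statistic(43.93, 5.2924):.2f}")      # 8.30
print(f"{t_statistic(37.48, 6.1418):.2f}")      # 6.10
```

Notice that only the critical value changes with the confidence level and the degrees of freedom. The estimate and its standard error come straight from the regression output.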

20.) None of the four residual plots shows any pattern, so the fit of the model seems to be adequate.

21.) Here is the table with totals and expected frequencies added:

Commercial    Don't remember    Remember seeing    Remember key point    Total
Version A     19 (22.93)        24 (27.73)         37 (29.33)            80
Version B     24 (20.07)        28 (24.27)         18 (25.67)            70
Total         43                52                 55                    150

Test statistic:
χ² = (19 - 22.93)²/22.93 + (24 - 20.07)²/20.07 + (24 - 27.73)²/27.73 + (28 - 24.27)²/24.27 + (37 - 29.33)²/29.33 + (18 - 25.67)²/25.67 = 6.816,
which has df = (2-1)(3-1) = 2.
p-value = P(χ² > 6.816) = .03311 < .05.
Reject H0. The two commercials are not remembered the same.
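The expected frequencies and the test statistic for problem 21 can be reproduced with a short Python sketch. Each expected count is (row total × column total) / grand total, and the statistic sums (observed - expected)²/expected over all six cells. The function names are illustrative. The p-value line relies on the fact that, for df = 2 only, the chi-square upper-tail area equals exp(-x/2), so it does not generalize to other table sizes.

```python
import math

# Observed counts: columns are Don't remember, Remember seeing, Remember key point.
observed = [
    [19, 24, 37],   # Version A
    [24, 28, 18],   # Version B
]


def expected_counts(table):
    """Expected count for each cell: row total * column total / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


def chi_square_statistic(table):
    """Sum of (observed - expected)^2 / expected over all cells."""
    expected = expected_counts(table)
    return sum(
        (obs - exp) ** 2 / exp
        for obs_row, exp_row in zip(table, expected)
        for obs, exp in zip(obs_row, exp_row)
    )


chi2 = chi_square_statistic(observed)
df = (len(observed) - 1) * (len(observed[0]) - 1)

# Valid for df = 2 only: P(chi-square > x) = exp(-x / 2).
p_value = math.exp(-chi2 / 2)

print([round(e, 2) for e in expected_counts(observed)[0]])   # [22.93, 27.73, 29.33]
print(round(chi2, 3), df)                                    # 6.816 2
print(round(p_value, 3))                                     # 0.033
```

Because the code keeps the expected counts unrounded, the p-value can differ from the hand calculation in the fifth decimal place.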

22.)
H0: Decision & Country are independent
H1: Decision depends on Country
χ² = 18.9223
p-value = .00081 < .10
Reject H0. Trade violation results seem to depend on which country is involved.

23.) Here is the table with totals and expected frequencies added:

                  Loan
Marital Status    Granted         Denied          Total
Single            253 (269.08)    119 (102.92)    372
Married           181 (164.92)    47 (63.08)      228
Total             434             166             600

Test statistic:
χ² = (253 - 269.08)²/269.08 + (181 - 164.92)²/164.92 + (119 - 102.92)²/102.92 + (47 - 63.08)²/63.08 = 9.140,
which has df = (2-1)(2-1) = 1.
p-value = P(χ² > 9.140) = .00250 < .05.
Reject H0. There is evidence of a bias.
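For a 2×2 table like this one, there is a well-known shortcut that gives the same statistic without writing out the expected frequencies: χ² = n(ad - bc)² / (product of the two row totals and the two column totals). The Python sketch below uses it only as a cross-check on the hand calculation. It is not the method used above, and the variable names are illustrative. With df = 1, the p-value can be obtained from the standard normal tail through math.erfc.

```python
import math

# Observed counts: rows are Single, Married; columns are Granted, Denied.
a, b = 253, 119
c, d = 181, 47

n = a + b + c + d

# Shortcut for a 2x2 table; algebraically the same as summing (O - E)^2 / E.
chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))

# With df = 1, chi-square is the square of a standard normal variable,
# so P(chi-square > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
p_value = math.erfc(math.sqrt(chi2 / 2))

print(round(chi2, 2))      # 9.14
print(round(p_value, 4))   # 0.0025
```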

24.)
H0: p1 = p2 = p3
H1: At least one pair of proportions differs (pi ≠ pj for some i, j)
χ² = 4.1566
p-value = .125144 > .01
Do not reject H0. The duration of contracts seems to be independent of industry type.