Controlled Regression Design. We want to investigate a response
over several different levels of an independent variable. Randomly select
n experimental units and randomly assign a preassigned number of them to each
level of the independent variable. Keep all other variables that could influence
the response at predetermined fixed levels. At the end of the experimental
time period, measure the responses.
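The random assignment step can be sketched in Python as follows. This is a minimal illustration and not part of the course software: the function name `assign_units`, the unit labels, the level names, and the fixed seed are all hypothetical, and the layout (9 units, 3 levels, 3 units per level) is chosen only for illustration.

```python
import random


def assign_units(units, levels, per_level, seed=None):
    """Randomly assign units to levels, with a preassigned count per level."""
    if len(levels) != len(per_level):
        raise ValueError("need one count per level")
    if sum(per_level) != len(units):
        raise ValueError("counts must add up to the number of units")
    rng = random.Random(seed)      # the seed only makes the sketch repeatable
    shuffled = list(units)
    rng.shuffle(shuffled)          # put the experimental units in random order
    assignment, start = {}, 0
    for level, count in zip(levels, per_level):
        # The next `count` units in the random order go to this level.
        assignment[level] = shuffled[start:start + count]
        start += count
    return assignment


# Hypothetical layout: 9 units, 3 levels, 3 units per level.
units = [f"unit{i}" for i in range(1, 10)]
design = assign_units(units, ["low", "medium", "high"], [3, 3, 3], seed=1)
for level, assigned in design.items():
    print(level, assigned)
```

Shuffling once and then cutting the shuffled list into blocks guarantees that each level receives exactly its preassigned number of units.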
Suds Example. Here is another simple example (from Draper and Smith (1966), Applied Regression Analysis, New York: Wiley). For a manufacturer of dishwasher detergent, the height of soap suds in the dishpan is important, even though it is a psychological factor. The suds height should depend on the amount of detergent used. So 7 pans of water were prepared. To each (by random assignment) an amount of dishwasher detergent was added. Then the dishpan was agitated for a set amount of time and the height of the suds was measured. Some of the variables controlled here were: temperature of the water, time of agitation, type of dishpan, and the way in which the height was measured. The data are:
Grams of Product (X)   : 4.0 4.5 5.0 5.5 6.0 6.5 7.0
Height of Suds, mm (Y) : 33  42  45  51  53  61  62
The plot of interest is a scatter plot of Height versus Grams:
[Scatter plot: Height of suds (mm, vertical axis, 30 to 60) versus Grams of detergent (horizontal axis, 4.20 to 7.20)]
There is an increasing relationship between height of suds and grams of
detergent. It looks fairly linear except it seems to taper off for the
high suds levels. Using the regression module, we fit the linear model:
Height of Suds = a + b*(Grams of detergent) + error
We used the Wilcoxon option. The prediction equation is
Predicted Height = -3.33 + 9.67*(Grams of detergent)
The estimate of the slope is 9.67; that is, we estimate the height of suds to increase by 9.67 mm for each additional gram of detergent. We can also use the equation to predict the height of suds for a given amount of detergent. For instance, for 6 g of detergent we predict the suds height to be
Predicted Height = -3.33 + 9.67*6 = 54.69
Inference. The only inference we will consider is a confidence interval for the slope parameter. The estimate of the slope is just that, an estimate, so we need to assess by how much it may miss the true slope. We will also use this confidence interval to test the hypotheses:
We will use a Central Limit Theorem confidence interval for b. Besides the estimates, the class code prints out their standard errors. These appear in the table that follows the regression equation: the first numerical column gives the estimate and the second column gives the estimated standard deviation of the estimate (i.e., the standard error). Our confidence interval is then of the form:
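As a rough sketch of such an interval, assume the usual large-sample form, estimate ± z × (standard error), with z taken from the standard normal distribution. The 95% level, and hence the multiplier of about 1.96, is an assumption (the multiplier used in the course is not shown here), as is the function name `slope_confidence_interval`. The illustration uses the suds slope of 9.67 and the standard error of 1.21 reported by the class code, and it takes the hypotheses to be H0: b = 0 versus HA: b ≠ 0, which is also an assumption.

```python
from statistics import NormalDist


def slope_confidence_interval(estimate, std_error, confidence=0.95):
    """Large-sample interval: estimate +/- z * std_error."""
    # z is the standard normal quantile; about 1.96 for a 95% interval.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    margin = z * std_error
    return estimate - margin, estimate + margin


# Suds example: slope estimate 9.67, standard error 1.21.
low, high = slope_confidence_interval(9.67, 1.21)
print(f"95% interval for the slope: ({low:.2f}, {high:.2f})")

# Assumed hypotheses: H0: b = 0 versus HA: b != 0.
# H0 is rejected when 0 lies outside the interval.
reject_h0 = not (low <= 0 <= high)
print("Reject H0: b = 0?", reject_h0)
```

Under these assumptions the interval is roughly (7.30, 12.04) mm per gram. It excludes 0, so the sketch would reject H0.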
Suds Example, continued. From the class code, the estimated slope was 9.67 with Stdev = 1.21. Hence the confidence interval is:
Concrete Example. (From Vardeman (1994), Statistics for Engineering Problem Solving, Boston: PWS.) A study was performed to investigate the relationship between the strength (psi) of concrete and the water/cement ratio. Three settings of the water/cement ratio were chosen (0.45, 0.50, 0.55). For each setting, 3 batches of concrete were made. Each batch was measured for strength 14 days later. All other variables were kept constant (mix time, quantity of batch, same mixer used (which was cleaned after every use), etc.). Here's the data:
Water/cement (X)  : 0.45 0.45 0.45 0.50 0.50 0.50 0.55 0.55 0.55
Strength, psi (Y) : 2954 2913 2923 2743 2779 2739 2652 2607 2583
Here's a scatter plot:
[Scatter plot: Strength (psi, vertical axis, 2550 to 3000) versus water/cement ratio (horizontal axis, 0.460 to 0.540); a plotted "2" marks two overlapping points]
The plot indicates a decreasing relationship between the strength of concrete
and the water/cement ratio; i.e., the more water one uses, the weaker the
concrete. Using the regression module with the Wilcoxon estimate, we obtain the
prediction equation:
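The class code is not shown in these notes. As a stand-in, here is a minimal Python sketch of one common way to compute a Wilcoxon (rank-based) fit of a straight line: the slope is the weighted median of the pairwise slopes, with weights |x_j - x_i|, and the intercept is the median of the residuals. The function name `wilcoxon_line`, the use of the median of residuals for the intercept, and the rule for breaking an exact tie in the weighted median are assumptions, and the class code may differ in these details.

```python
from fractions import Fraction
from statistics import median


def wilcoxon_line(x, y):
    """Rank-based (Wilcoxon-score) fit of y = a + b*x; returns (a, b)."""
    # Exact rational arithmetic so that ties in cumulative weight are
    # detected reliably (0.45, 0.50, ... are not exact in floating point).
    xs = [Fraction(str(v)) for v in x]
    ys = [Fraction(str(v)) for v in y]
    pairs = []
    for i in range(len(xs)):
        for j in range(i + 1, len(xs)):
            dx = xs[j] - xs[i]
            if dx != 0:                      # pairs with equal x carry no slope
                pairs.append(((ys[j] - ys[i]) / dx, abs(dx)))
    if not pairs:
        raise ValueError("need at least two distinct x values")
    pairs.sort()                             # ascending pairwise slopes
    half = sum(w for _, w in pairs) / 2
    cumulative = 0
    for slope, weight in pairs:
        cumulative += weight
        # Assumption: on an exact tie at half the total weight, move on to
        # the next larger slope (the upper end of the minimizing interval).
        if cumulative > half:
            break
    intercept = median(yi - slope * xi for xi, yi in zip(xs, ys))
    return float(intercept), float(slope)


ratio = [0.45, 0.45, 0.45, 0.50, 0.50, 0.50, 0.55, 0.55, 0.55]
strength = [2954, 2913, 2923, 2743, 2779, 2739, 2652, 2607, 2583]
a, b = wilcoxon_line(ratio, strength)
print(f"Concrete: intercept = {a:.0f}, slope = {b:.0f}, per 0.1 = {b / 10:.0f}")

# Same sketch applied to the suds data from earlier in the section.
grams = [4.0, 4.5, 5.0, 5.5, 6.0, 6.5, 7.0]
height = [33, 42, 45, 51, 53, 61, 62]
a_suds, b_suds = wilcoxon_line(grams, height)
print(f"Suds: intercept = {a_suds:.2f}, slope = {b_suds:.2f}")
```

With the concrete data this sketch gives an intercept of 4345 and a slope of -3160 psi per unit of water/cement ratio, that is, -316 psi per 0.1. Applied to the suds data it gives about -3.33 and 9.67, in agreement with the prediction equation reported for that example.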
What does the estimate of the slope mean?
Keeping the range of x in mind (0.1), it is best to phrase this as follows: for each additional 0.1 in the water/cement ratio, we estimate the strength of the concrete to drop by 316 psi. From the class code, we form a confidence interval for the slope by:
There is a lot more to experimental design than we have covered in
this chapter. The effects on the response of changing more than one variable
at a time can be analyzed. These variables are set at certain values
(the design of the experiment) and other variables are controlled. If they
cannot be controlled, then they are recorded and used as covariates to adjust
the analysis. These topics are beyond this course. In fact, there are several
courses you can take at Western on experimental design.
There are many situations, though, where we cannot design an experiment (that is, set the levels of the independent variables). These are basically observational studies, which we discuss in the next section.
Speed (X) : 20 20 30 30 30 40 40 50 50 60
Distance (Y): 16.3 26.7 39.2 63.5 51.3 98.4 65.7 104.1 155.6 217.2
[Residual plot: residuals (Ehat, vertical axis, -50 to 25) versus fitted values (Yhat, horizontal axis, 0 to 175)]
It is not a random scatter. Sometimes a simple transformation will help. Consider the square root of the stopping distances. These are given by:
Speed (X) : 20 20 30 30 30 40 40 50 50 60
SqrtDistance 4.03 5.16 6.26 7.96 7.16 9.91 8.10 10.20 12.47 14.73
Repeat the last problem using these responses. Notice that the interpretation changes. As you will see, the residual plot improves considerably, but there are still problems with it.
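The transformation and the residual check can be sketched as follows. Least squares is used here only as a simple stand-in for the class code's fit, and the helper name `least_squares_line` is hypothetical. The sketch takes square roots of the stopping distances, fits a line to them, and lists the residuals against the fitted values, which are the quantities a residual plot displays.

```python
from math import sqrt

speed = [20, 20, 30, 30, 30, 40, 40, 50, 50, 60]
distance = [16.3, 26.7, 39.2, 63.5, 51.3, 98.4, 65.7, 104.1, 155.6, 217.2]


def least_squares_line(x, y):
    """Return (intercept, slope) of the least-squares line (a stand-in fit)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


# Transformed responses: square roots of the stopping distances.
sqrt_distance = [sqrt(d) for d in distance]

intercept, slope = least_squares_line(speed, sqrt_distance)
fitted = [intercept + slope * s for s in speed]               # Yhat
residuals = [y - f for y, f in zip(sqrt_distance, fitted)]    # Ehat

for f, e in zip(fitted, residuals):
    print(f"Yhat = {f:6.2f}   Ehat = {e:6.2f}")
```

The slope now describes the change in the square root of the stopping distance per unit of speed, which is why the interpretation changes. Plotting the printed residuals against the fitted values gives the residual plot for the transformed responses.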
Jet Size 76 68 70 72 74 76
Time 15.08 14.60 14.50 14.53 14.79 15.02