
Regression Experimental Designs

Consider a response that is related to an independent variable. Returning to the last example, suppose we measure the LDL level of a quail given a specific dose level of the drug, and then vary the dose level across different quail. This would be an example of a regression experimental design. Instead of introducing a lot of notation, here's a simple definition of such a design. Read through it and then read the examples that follow.

Controlled Regression Design. We want to investigate a response over several different levels of an independent variable. Randomly select n experimental units and randomly assign a predetermined number of them to each level of the independent variable. Keep all other variables which could influence the response at predetermined fixed levels. At the end of the experimental time period, measure the responses.
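The random assignment step in this definition can be sketched in Python. This is only an illustration under stated assumptions: the function name assign_units, the quail labels, the dose levels, and the seed are all hypothetical and are not part of the class code.

```python
import random

def assign_units(units, levels, per_level, seed=None):
    """Randomly assign per_level experimental units to each level."""
    if len(units) < len(levels) * per_level:
        raise ValueError("not enough experimental units for this design")
    rng = random.Random(seed)
    shuffled = list(units)
    rng.shuffle(shuffled)  # a random order gives a random assignment
    design = {}
    for k, level in enumerate(levels):
        design[level] = shuffled[k * per_level:(k + 1) * per_level]
    return design

# Hypothetical example: 12 quail, 4 dose levels, 3 quail per level.
quail = ["quail%02d" % i for i in range(1, 13)]
doses = [0, 10, 20, 30]  # made-up dose levels, for illustration only
design = assign_units(quail, doses, per_level=3, seed=1)
for dose, group in design.items():
    print(dose, group)
```

Shuffling once and cutting the shuffled list into consecutive blocks guarantees that every unit is used at most once and that each level receives exactly the preassigned number of units.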

Suds Example. Here is another simple example (from Draper and Smith (1966), Applied Regression Analysis, New York: Wiley): For a manufacturer of dishwasher detergent, the height of soap suds in the dishpan is important, even though it is a psychological factor. The suds height should depend on the amount of detergent used. So 7 pans of water were prepared. To each (by random assignment) an amount of dishwasher detergent was added. Then the dishpan was agitated for a set amount of time and the height of the suds was measured. Some of the variables controlled here were: temperature of the water, time of agitation, type of dishpan, and the way in which the height was measured. The data are:

Grams of Product (X):      4.0   4.5   5.0   5.5   6.0   6.5   7.0
Height of Suds, mm (Y):     33    42    45    51    53    61    62
The plot of interest is a scatter plot of Height versus Grams:
          -                                             *        *
        60+
          -
  Height  -
          -                                     *
          -                             *
        50+
          -
          -                    *
          -
          -            *
        40+
          -
          -
          -    *
          -
        30+
            ------+---------+---------+---------+---------+---------+ Grams
               4.20      4.80      5.40      6.00      6.60      7.20
There is an increasing relationship between height of suds and grams of detergent. It looks fairly linear except it seems to taper off for the high suds levels. Using the regression module, we fit the linear model:
Height of Suds = a + b*(Grams of detergent) + error
We used the Wilcoxon option. The prediction equation is
Predict Height = -3.33 + 9.67*(Grams of detergent)
The estimate of the slope is 9.67; that is, we estimate the height of suds to increase 9.67 mm for each additional gram of detergent. We could also use the equation to predict the height of the suds for given amounts of detergent. For instance, for 6 g of detergent we predict the suds height to be
Predicted height = -3.33 + 9.67*6 = 54.69
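For readers who want to see how a fit like this can be computed, here is a minimal Python sketch of a rank-based (Wilcoxon-score) line for a single predictor. It is an assumption, not a description of the regression module: the slope is taken as a weighted median of the pairwise slopes (weights are the horizontal gaps), the upper value is taken when the minimizer is not unique, and the intercept is the median of the residuals. The function name wilcoxon_line is hypothetical.

```python
from itertools import combinations
from statistics import median

grams = [4.0, 4.5, 5.0, 5.5, 6.0, 6.5, 7.0]
height = [33, 42, 45, 51, 53, 61, 62]

def wilcoxon_line(x, y):
    """Rank-based fit of y = a + b*x; returns (intercept, slope)."""
    # Every pair of points gives a slope, weighted by its gap |xj - xi|.
    pairs = sorted(
        ((y[j] - y[i]) / (x[j] - x[i]), abs(x[j] - x[i]))
        for i, j in combinations(range(len(x)), 2)
        if x[j] != x[i]
    )
    half = sum(w for _, w in pairs) / 2
    cumulative = 0.0
    slope = pairs[-1][0]
    for s, w in pairs:
        cumulative += w
        if cumulative > half:  # assumed convention: upper weighted median
            slope = s
            break
    # Assumed convention: intercept is the median of y - slope*x.
    intercept = median(yi - slope * xi for xi, yi in zip(x, y))
    return intercept, slope

intercept, slope = wilcoxon_line(grams, height)
print("Predict Height = %.2f + %.2f*(Grams)" % (intercept, slope))
print("Prediction at 6 g: %.2f" % (intercept + slope * 6))
```

On the suds data this sketch gives a slope of about 9.67 and an intercept of about -3.33. With unrounded coefficients the prediction at 6 g is about 54.67; the 54.69 in the text comes from using the rounded coefficients.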
Inference. The only inference we will consider is a confidence interval for the slope parameter b. The estimate of the slope is just that, an estimate, so we need to gauge how far it may be from the true slope. We will also use this confidence interval to test the hypotheses H0: b = 0 versus HA: b ≠ 0. Our decision rule is simple: we reject H0 in favor of HA if 0 is not in the confidence interval for b.

We will use a Central Limit Theorem confidence interval for b. Besides the estimates, the class code prints out the standard errors of the estimates. These are in the table which follows the regression equation. The first numerical column gives the estimate and the second column gives the estimated standard deviation of the estimate (i.e., the standard error). Our confidence interval is then of the form:

(estimate of b) +/- z*(standard error of the estimate of b),

where z is the normal critical value for the chosen confidence level (about 2 in the examples that follow).

Suds Example, continued. From the class code, the estimated slope was 9.67 with Stdev = 1.21. Taking about two standard errors on either side of the estimate, the confidence interval is 9.67 +/- 2*(1.21), or roughly (7.25, 12.09). Hence we estimate the height of the suds to increase by 7 to 12 mm for every additional gram of detergent. The confidence interval does not include 0, so we reject H0 in favor of HA and we conclude that there is a positive linear relationship between the height of suds and the amount of detergent.
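The interval and the decision rule can be written out as a short Python sketch. The multiplier of 2 is an assumption (it is consistent with the "7 to 12 mm" conclusion above), and the function names slope_interval and rejects_zero_slope are hypothetical.

```python
def slope_interval(estimate, std_error, multiplier=2.0):
    """Return (lower, upper) for estimate +/- multiplier*std_error."""
    margin = multiplier * std_error
    return estimate - margin, estimate + margin

def rejects_zero_slope(interval):
    """Decision rule: reject H0 (b = 0) when 0 is outside the interval."""
    lower, upper = interval
    return not (lower <= 0.0 <= upper)

# Values reported by the class code for the suds data.
suds_ci = slope_interval(9.67, 1.21)
print("Confidence interval for slope: (%.2f, %.2f)" % suds_ci)
print("Reject H0:", rejects_zero_slope(suds_ci))
```

Using 1.96 in place of 2 changes the endpoints only in the second decimal place, so the conclusion is the same either way.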

Concrete Example. (From Vardeman (1994), Statistics for Engineering Problem Solving, Boston: PWS.) A study was performed to investigate the relationship between the strength (psi) of concrete and the water/cement ratio. Three settings of the water to cement ratio were chosen (.45, .50, .55). For each setting, 3 batches of concrete were made. Each batch was measured for strength 14 days later. All other variables were kept constant (mix time, quantity of batch, same mixer used (which was cleaned after every use), etc.). Here's the data:

Water/cement  0.45   0.45   0.45   0.50   0.50   0.50   0.55   0.55   0.55
Strength      2954   2913   2923   2743   2779   2739   2652   2607   2583
Here's a scatter plot:
      3000+
          -
 Strength -    *
          -    2
          -
      2850+
          -
          -                             *
          -
          -                             2
      2700+
          -
          -                                                      *
          -                                                      *
          -                                                      *
      2550+
          -
            --------+---------+---------+---------+---------+-------- water/cement
                0.460     0.480     0.500     0.520     0.540
The plot indicates a decreasing relationship between strength of concrete and water to cement ratio; i.e., the more water one uses, the weaker the concrete. Clicking on the regression module, and using the Wilcoxon estimate, we obtain a prediction equation whose estimated slope is about -3160 psi per unit of water to cement ratio.

What does the estimate of the slope mean?

Keeping the range of x in mind (0.1), it is best to phrase this per tenth of a unit: for each additional tenth in the water to cement ratio, we estimate the strength of the concrete to drop by 316 psi. From the class code, we form a confidence interval for the slope from the estimate and its standard error, just as in the Suds Example.

Since 0 is not in the confidence interval, we reject H0. One way of concluding would be: for each additional tenth in the water to cement ratio, we estimate the strength of the concrete to drop by between 262 and 370 psi.
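As a rough check on the slope and its rescaling to "per tenth", here is a Python sketch that fits an ordinary least-squares line to the concrete data. Note the swap: this is least squares, not the Wilcoxon estimate used by the regression module, and the function name least_squares_line is hypothetical. On these data the least-squares slope agrees with the 316 psi per tenth quoted above.

```python
ratio = [0.45, 0.45, 0.45, 0.50, 0.50, 0.50, 0.55, 0.55, 0.55]
strength = [2954, 2913, 2923, 2743, 2779, 2739, 2652, 2607, 2583]

def least_squares_line(x, y):
    """Ordinary least-squares fit of y = a + b*x; returns (a, b)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope

intercept, slope = least_squares_line(ratio, strength)

# The ratios only span 0.45 to 0.55, so report the change per 0.1
# of water to cement ratio rather than per whole unit.
per_tenth = slope * 0.1
print("Slope per unit of ratio: %.0f psi" % slope)
print("Slope per tenth of ratio: %.0f psi" % per_tenth)
```

Rescaling the slope does not change the fit; it only restates the same rate of change over a step size that actually occurs in the experiment.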

There is a lot more to experimental designs than we have covered in this chapter. The effect on the response of changing more than one variable at a time can be analyzed. These variables are set at certain values (the design of the experiment) and other variables are controlled. If they cannot be controlled, then they are recorded and used as covariates to adjust the analysis. These topics are beyond this course. In fact, there are several courses you can take at Western on experimental design.

There are many situations, though, where we cannot design an experiment (that is, set the levels of the independent variables). These are basically observational studies, which we discuss in the next section.


Exercise 12.3.1  
1.
(From Bhattacharyya and Johnson (1977), Statistical Concepts and Methods, New York: Wiley.) A study was performed to investigate the relationship between speed and stopping distance for an automobile. Ten cars were selected (same year, model, etc.). Each was driven at a preassigned speed and, when the driver attained that speed, he applied the brakes. The distance to a complete stop was then measured. The data are:
  Speed (X)   :    20     20     30     30     30     40     40     50     50     60
  Distance (Y):  16.3   26.7   39.2   63.5   51.3   98.4   65.7  104.1  155.6  217.2
(a)
Assuming this was a designed experiment, what other variables besides car were controlled?
(b)
Scatter plot this data (Y versus X). Comment on the plot. Does it look linear?
(c)
Regardless of your discussion in the last part, use the regression module to fit the model. Predict the stopping distance for an initial speed of 35. Predict the stopping distance for an initial speed of 55.
(d)
Use your predictions in the last part to plot your fit on the scatter plot. Comment? Interpret the estimate of slope.
(e)
Obtain a confidence interval for the slope parameter. What does it mean in terms of the problem? Use it to test H0. Conclude in terms of the problem.
(f)
Determine the fit and the residual for the response 98.4 at x = 40.
(g)
Next obtain the residual plot. Does the observation (40, 98.4) seem to be an outlier? Is the scatter random? See the next problem for the answer.
2.
Here is the residual plot for the last problem:
   
          -
        25+                                                       *
          -   *
  Ehat    -
          -   *            *                         *
          -
         0+                *            *
          -
          -
          -                *
          -
       -25+
          -
          -                             *
          -                                          *
          -
       -50+
            +---------+---------+---------+---------+---------+------ Yhat
            0        35        70       105       140       175
It is not a random scatter. Sometimes a simple transformation will help. Consider the square root of the stopping distances. These are given by:
    Speed (X)    :    20     20     30     30     30     40     40     50     50     60
    SqrtDistance :  4.03   5.16   6.26   7.96   7.16   9.91   8.10  10.20  12.47  14.73

Repeat the last problem using these responses. Notice that the interpretation changes. As you will see, the residual plot improves considerably, but there are still problems with it.

3.
(From Vardeman (1994), Statistics for Engineering Problem Solving, Boston: PWS.) A study was performed to investigate the relationship between the carburetor jetting size and the time of a Camaro for a quarter-mile run. The data are:
     Jet Size     76      68      70      72      74     76
     Time        15.08   14.60   14.50   14.53   14.79   15.02
(a)
Assuming this was a designed experiment, what other variables besides car model were controlled?
(b)
Scatter plot this data. Comment on the plot. Does it look linear?
(c)
Regardless of your discussion in the last part, use the regression module to fit the model. Predict the time for a jet size of 76. Predict the time for a jet size of 68.
(d)
Use your predictions in the last part to plot your fit on the scatter plot. Comment? Interpret the estimate of slope.
(e)
Obtain a confidence interval for the slope parameter. What does it mean in terms of the problem? Use it to test H0. Conclude in terms of the problem.
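The square-root transformation used in Problem 2 is easy to reproduce. The Python sketch below recomputes the SqrtDistance row from the stopping distances in Problem 1. The helper name truncate is hypothetical, and the choice to truncate rather than round to two decimals is an assumption made because it matches the tabled values.

```python
import math

distance = [16.3, 26.7, 39.2, 63.5, 51.3, 98.4, 65.7, 104.1, 155.6, 217.2]

def truncate(value, places=2):
    """Drop (do not round) digits beyond the given number of decimals."""
    factor = 10 ** places
    return math.floor(value * factor) / factor

# Transformed responses for the square-root analysis in Problem 2.
sqrt_distance = [truncate(math.sqrt(d)) for d in distance]
print(sqrt_distance)
```

Rounding instead of truncating would change several entries in the second decimal place (for example 4.04 instead of 4.03), which is too small a difference to affect the conclusions of the exercise.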



2001-01-01