Let's begin with an example of a completely randomized design and set it
up as a regression study. Then the generalization is easy.
Recall the cholesterol study on quail discussed in the last chapter.

The Experiment: 30 quail (the experimental units) were randomly selected from a reference population. 20 were randomly assigned to Treatment 1 (a placebo) and the other 10 to Treatment 2. For those on Treatment 2, an active drug compound was mixed with their diet. Those on Treatment 1 had the same diet without the drug compound. Over the course of the experiment the quail were otherwise treated the same: same amount of exercise, same types of pens, etc. At the end of the time period their LDL cholesterol levels were measured. The data are:

Placebo:     64 49 54 64 97 66 76 44 71 89 70 72 71 55 60 62 46 77 86 71
Treatment 2: 40 31 50 48 152 44 74 38 81 64

This doesn't look like a regression problem, but it is. Set the independent variable to x=0 if the response (LDL level) is from a quail in the placebo group, and set it to x=1 if the response is from a quail in the active drug group. Thus we have 20 x's set at 0 and 10 x's set at 1. Our scatter plot would be 64 versus 0, ..., 71 versus 0, 40 versus 1, ..., 64 versus 1. Hence, the plot is

       -                                                  *
 LDL   -
       -
       -
    120+
       -
       -
       -          *
       -          2
     80+          2                                       *
       -          5                                       *
       -          5                                       *
       -          2
       -          3                                       3
     40+                                                  2
       -                                                  *
       -
        +---------+---------+---------+---------+---------+------ X
                  0                                       1

The numbers stand for how many points are at that location; i.e., the 5 means that there are 5 points at that location. The * means that there is one point at that location. So count them to see that indeed there are 20 points over x=0 and 10 points over x=1. Note the huge outlier in the treated group that we talked about in the last chapter.
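
The coding described above can be sketched in a few lines of Python. This is only an illustration of the setup, not part of the original study; the variable names (placebo, treatment2, x, y, points) are made up here. It builds the 0/1 independent variable, pairs it with the LDL responses, and checks the counts over x=0 and x=1.

```python
# LDL cholesterol levels copied from the data listing above.
placebo = [64, 49, 54, 64, 97, 66, 76, 44, 71, 89,
           70, 72, 71, 55, 60, 62, 46, 77, 86, 71]
treatment2 = [40, 31, 50, 48, 152, 44, 74, 38, 81, 64]

# Indicator variable: x = 0 for a placebo quail, x = 1 for an active-drug quail.
x = [0] * len(placebo) + [1] * len(treatment2)
y = placebo + treatment2

# The scatter plot is just the list of (x, y) pairs: 64 versus 0, ..., 64 versus 1.
points = list(zip(x, y))

# Count how many points sit over each value of x.
n_at_0 = sum(1 for xi in x if xi == 0)
n_at_1 = sum(1 for xi in x if xi == 1)

print(n_at_0, n_at_1)           # number of points over x=0 and over x=1
print(points[0], points[-1])    # first and last (x, y) pair
print(max(points, key=lambda p: p[1]))  # the outlier in the treated group
```

The last line picks out the point with the largest LDL level, which is the outlier over x=1 noted above.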
Now eyeball a line through the points, ignoring the outlier (a robust
eyeball fit). Here's what I did: I chose the line that goes through the point (0,77) (that's between the 2 and the top 5 over x=0) and the point (1,64) (that's the * above the 3 over x=1). NOW TRACE THIS LINE IN!
What's the slope of my eyeball fit? That's easy. The change in y over the change in x is (77-64)/(0-1) = -13. Now, more importantly, what does this slope mean? Because x changes by exactly 1 in going from the placebo group to the treated group, the slope is an estimate of the difference in centers of the two groups. That is, it is an estimate of the effect between the two treatment groups. Recall from the last chapter that the Wilcoxon estimate of the effect was -14. In this way, completely randomized designs can be put into a regression context. This is true of paired designs too, but we will not go into it (you can always take additional statistics courses).
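
The slope arithmetic can be checked with a short Python sketch. Two assumptions are made here that the notes do not spell out: that the Wilcoxon estimate of the effect is the median of all pairwise differences (treated minus placebo), often called the Hodges-Lehmann estimate, and that a least squares comparison is of interest at all (it is included only to show what the outlier does to a non-robust fit). The helper name slope_through is made up for this illustration.

```python
from statistics import mean, median

# LDL cholesterol levels copied from the data listing above.
placebo = [64, 49, 54, 64, 97, 66, 76, 44, 71, 89,
           70, 72, 71, 55, 60, 62, 46, 77, 86, 71]
treatment2 = [40, 31, 50, 48, 152, 44, 74, 38, 81, 64]


def slope_through(p1, p2):
    """Slope of the line through two (x, y) points."""
    (x1, y1), (x2, y2) = p1, p2
    return (y2 - y1) / (x2 - x1)


# Eyeball fit: the line through (0, 77) and (1, 64).
eyeball_slope = slope_through((0, 77), (1, 64))

# Assumed form of the Wilcoxon estimate: median of the 20 * 10 = 200
# pairwise differences (treated response minus placebo response).
pairwise = [t - p for t in treatment2 for p in placebo]
wilcoxon_shift = median(pairwise)

# With a 0/1 independent variable, the least squares slope equals the
# difference in sample means, so the outlier (152) pulls it toward zero.
ls_slope = mean(treatment2) - mean(placebo)

print("eyeball slope: ", eyeball_slope)
print("Wilcoxon shift:", wilcoxon_shift)
print("LS slope:      ", round(ls_slope, 2))
```

The eyeball slope and the pairwise-median estimate land close to each other, while the least squares slope is noticeably smaller in magnitude because the single large value in the treated group raises that group's mean.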