# One-Way ANOVA, #1

## using R

> rust <- data.frame(y=scan(), brand=rep(LETTERS[1:4],c(10,10,10,10)),

+ replication=rep(1:10,4))

1: 43.9 39.0 46.7 43.8 44.2 47.7 43.6 38.9 43.6 40.0

11: 89.8 87.1 92.7 90.6 87.7 92.4 86.1 88.1 90.8 89.1

21: 68.4 69.3 68.5 66.4 70.0 68.1 70.6 65.2 63.8 69.2

31: 36.2 45.2 40.7 40.5 39.3 40.3 43.2 38.7 40.9 39.7

41:

```Read 40 items
```

> # Note that brand is naturally of factor mode (default behavior:

> # R converts character columns in a data frame to factor).

> # Variable with factor mode can be used in ANOVA.

> rust[sample(40,5),] # randomly shows 5 obs.

```      y brand replication
15 87.7     B           5
36 40.3     D           6
9  43.6     A           9
28 65.2     C           8
39 40.9     D           9
```

>

> windows(width=6,height=5,pointsize=12)

> with(data=rust,

+ stripchart(y~brand, method="stack", vertical=TRUE, pch=1,

+ cex=1.5, xlab="brand", ylab="y",

+ main="Dotplots by Treatments\nrust inhibitor example")

+ )

> dev.off()

```null device
1
```

>

>

> # To get test result for equal means (variances not equal)

> oneway.test(y~brand, data=rust)

```        One-way analysis of means (not assuming equal variances)

data:  y and brand
F = 863.3314, num df = 3.000, denom df = 19.884, p-value < 2.2e-16
```

> # For test under equal variance assumption

> oneway.test(y~brand, data=rust, var.equal=TRUE)

```        One-way analysis of means

data:  y and brand
F = 866.1183, num df = 3, denom df = 36, p-value < 2.2e-16
```

> # The results are quite consistent with/without equal variance

> # assumption.

>

> anova( rust.lm <- lm(y~brand, data=rust) )

```Analysis of Variance Table

Response: y
Df  Sum Sq Mean Sq F value    Pr(>F)
brand      3 15953.5  5317.8  866.12 < 2.2e-16 ***
Residuals 36   221.0     6.1
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
```

>

> # Note that, if brand is not a factor mode object (i.e.,

> # if it is numeric), then use lm(y~factor(brand), data=rust).

>

> summary(rust.lm)

```Call:
lm(formula = y ~ brand, data = rust)

Residuals:
Min     1Q Median     3Q    Max
-4.270 -1.597  0.395  1.275  4.730

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept)  43.1400     0.7836  55.056   <2e-16 ***
brandB       46.3000     1.1081  41.782   <2e-16 ***
brandC       24.8100     1.1081  22.389   <2e-16 ***
brandD       -2.6700     1.1081  -2.409   0.0212 *
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 2.478 on 36 degrees of freedom
Multiple R-Squared: 0.9863,     Adjusted R-squared: 0.9852
F-statistic: 866.1 on 3 and 36 DF,  p-value: < 2.2e-16
```

>

> # Note that, the first level (here A) is the 'reference'

> # level. That is, indicator variables are created toward

> # all other levels. Hence, the intercept estimate above gives the

> # mean of A (i.e., 43.1400). The brandB estimate gives the

> # difference (B - A). Other terms are like-wise.

>

>

>

> windows(width=6,height=6,pointsize=10)

> par(mfrow=c(2,2))

> plot(rust.lm)

> dev.off()

```null device
1
```

> # The plot above generate default 4-pack residual plots.

> # For more detail, see help(plot.lm)

>

> # One can also produce custom residual plots. Below, 3

> # plots will be generated. Can use par(mfrow=c(1,3))

> # to generate 1 by 3 multiple figures. Here I use

> # different approach to show the use of layout

> windows(width=7,height=4,pointsize=10)

>

> # The command below create multiple figures of 3 (plus 2

> # used for mimicking title and footnote) with figure 4

> # occupying entire top row, figure 5 entire bottom row,

> # and the actual figures 1, 2, and 3 occupying the

> # middle row. The respective widths and heights are

> # as specified.

> layout(matrix(c(4,4,4,1,2,3,5,5,5),3,3,byrow=T),

+ width=c(.36,.32,.32),height=c(.2,.6,.2))

>

> # start 1st plot

> par(mar=c(4,4,4,0)) # 4 lines on bottom, left, top; 0 on right

> plot(residuals(rust.lm)~fitted(rust.lm),xlab="fitted values",

+ ylab="residuals",main="Residuals vs Fits")

> abline(h=0,lty=3,col="red") # horizontal reference line

>

> # start 2nd plot

> par(mar=c(4,0,4,0), yaxt="n") # margins as set, and no y-axis

> stripchart(residuals(rust.lm)~rust\$brand,method="stack",

+ vertical=TRUE,jitter=0,xlab="brand",ylab="",pch=1,cex=1.5,

+ main="Residuals by Treatments")

> abline(h=0,lty=3,col="red")

>

> # start 3rd plot

> qqnorm(resid(rust.lm),ylab="",yaxt="n",

+ main="Residual Q-Q Plot")

> qqline(resid(rust.lm),lty=3,col="red")

>

> # First set zero top and bottom margins, and restore

> # user coordinates to default.

> par(mar=c(0,4,0,0), usr=c(0,1,0,1))

> # Now start 'figure 4'.

> frame() # move to new plot region, i.e., figure 4.

> text(.5,.5,label="Residual Plots",cex=2) # 'title'

>

> frame() # move to new plot region, i.e., figure 5.

> text(0,.5,label="rust inhibitor example",pos=4,cex=1) # 'footnote'

> dev.off()

```null device
1
```

>

>

>

>

> ## Bartlett Test

> bartlett.test(y~brand, data=rust)

```        Bartlett test of homogeneity of variances

data:  y by brand
Bartlett's K-squared = 1.199, df = 3, p-value = 0.7533
```

> # Require that package car (i.e., cLASSIFICATION aND rEGRESSION)

> # to reside in your R library. Use library()

> # to see if it's there.

> car::leveneTest(y~brand, data=rust)

```Levene's Test for Homogeneity of Variance (center = median)
Df F value Pr(>F)
group  3  0.2262 0.8775
36
```

> # An approximate procedure exists by:

> oneway.test(abs(residuals(rust.lm)) ~ rust\$brand)

```        One-way analysis of means (not assuming equal variances)

data:  abs(residuals(rust.lm)) and rust\$brand
F = 0.3079, num df = 3.000, denom df = 19.663, p-value = 0.8194
```

>

> # It appears that the variance is constant.

>

>

>

>

> ## Tukey Honest Significant Differences

> rust.Tukey <- TukeyHSD(aov(y~brand, data=rust),conf.level=.90)

> rust.Tukey

```  Tukey multiple comparisons of means
90% family-wise confidence level

Fit: aov(formula = y ~ brand, data = rust)

\$brand
B-A  46.30  43.6666  48.93339972 0.0000000
C-A  24.81  22.1766  27.44339972 0.0000000
D-A  -2.67  -5.3034  -0.03660028 0.0933303
C-B -21.49 -24.1234 -18.85660028 0.0000000
D-B -48.97 -51.6034 -46.33660028 0.0000000
D-C -27.48 -30.1134 -24.84660028 0.0000000
```

>

> # The comparisons can be graphed.

> windows(width=5,height=3,pointsize=10)

> par(las=1) # request that axis labels be horizontal

> plot(rust.Tukey, sub="Tukey Honest Significant Differences")

> dev.off()

```null device
1
```

> ## Fisher LSD, P values only

> with(data=rust,

+ )

```        Pairwise comparisons using t tests with pooled SD

data:  y and brand

A      B      C
B <2e-16 -      -
C <2e-16 <2e-16 -
D 0.021  <2e-16 <2e-16

```

> ## Fisher LSD, P values only, no pooling SD

> with(data=rust,

+ )

```        Pairwise comparisons using t tests with non-pooled SD

data:  y and brand

A       B       C
B < 2e-16 -       -
C 2.4e-13 2.0e-14 -
D 0.043   < 2e-16 9.0e-16

```

> ## Bonferroni, P values only

> with(data=rust,

+ )

```        Pairwise comparisons using t tests with pooled SD

data:  y and brand

A      B      C
B <2e-16 -      -
C <2e-16 <2e-16 -
D 0.13   <2e-16 <2e-16

```

> ## Holm's, the default

> with(data=rust,

+ )

```        Pairwise comparisons using t tests with pooled SD

data:  y and brand

A      B      C
B <2e-16 -      -
C <2e-16 <2e-16 -
D 0.021  <2e-16 <2e-16

```

>

>

>

>

> ## Constructing contrasts. Note: R comes with some commonly

> # used contrast methods.

> rust\$Named.vs.Generic <- c(A=-.5,B=.5,C=.5,D=-.5)[rust\$brand]

> rust\$A.vs.D <- c(A=1,B=0,C=0,D=-1)[rust\$brand]

> rust\$B.vs.C <- c(A=0,B=1,C=-1,D=0)[rust\$brand]

>

> ## Check orthogonality

> crossprod(data.matrix(rust[,4:6]))

```                 Named.vs.Generic A.vs.D B.vs.C
Named.vs.Generic               10      0      0
A.vs.D                          0     20      0
B.vs.C                          0      0     20
```

>

> # Note that the contrasts are orthogonal. The 2nd and 3rd

> # contrasts have squared length of 2 × n, where

> # n is the common sample size, 10. Hence, in the output

> # below, the coefficients to these two contrasts give half

> # of their respective estimates. (Hint: think about the

> # coefficient as rise per unit change while these contrasts

> # measure twice the rise (from -1 to 1: 2 units change).

>

> rust.oc <- lm(y~Named.vs.Generic+A.vs.D+B.vs.C, data=rust)

> anova(rust.oc)

```Analysis of Variance Table

Response: y
Df  Sum Sq Mean Sq   F value  Pr(>F)
Named.vs.Generic  1 13608.7 13608.7 2216.4642 < 2e-16 ***
A.vs.D            1    35.6    35.6    5.8055 0.02122 *
B.vs.C            1  2309.1  2309.1  376.0852 < 2e-16 ***
Residuals        36   221.0     6.1
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
```

>

> # Sums of squares above for contrasts are Components.

>

> summary(rust.oc)

```Call:
lm(formula = y ~ Named.vs.Generic + A.vs.D + B.vs.C, data = rust)

Residuals:
Min     1Q Median     3Q    Max
-4.270 -1.597  0.395  1.275  4.730

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept)       60.2500     0.3918 153.783   <2e-16 ***
Named.vs.Generic  36.8900     0.7836  47.079   <2e-16 ***
A.vs.D             1.3350     0.5541   2.409   0.0212 *
B.vs.C            10.7450     0.5541  19.393   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 2.478 on 36 degrees of freedom
Multiple R-Squared: 0.9863,     Adjusted R-squared: 0.9852
F-statistic: 866.1 on 3 and 36 DF,  p-value: < 2.2e-16
```

>

> # Hence the estimates of the contrasts are

> # 36.8900, 2.6700 (= 2 × 1.3350), and

> # 21.4900 (= 2 × 10.7450). See the reasoning above.

>

> ## Can also manually compute estimates for contrasts.

> sum(rust\$y*rust[[4]])/10

```[1] 36.89
```

> sum(rust\$y*rust[[5]])/10

```[1] 2.67
```

> sum(rust\$y*rust[[6]])/10

```[1] 21.49
```