Build Factorial Designs for ANOVA

Provides a helper function that builds a factorial design so that ANOVA can be computed easily with the Anova() function from the car package. This is particularly useful for repeated measures ANOVA, which is hard to set up with car directly.

factorial_design(data, dv, wid, between, within, covariate)

Arguments

data: a data frame containing the variables.
dv: (numeric) dependent variable name.
wid: (factor) column name containing the individual/subject identifier. Should be unique per individual.
between: (optional) between-subjects factor variables.
within: (optional) within-subjects factor variables.
covariate: (optional) covariate names (for ANCOVA).
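
The requirement on wid can be checked before building the design. The following is a minimal base R sketch, assuming the ToothGrowth layout used in the examples on this page (10 subjects, each observed once per supp x dose cell). The names cell_counts and balanced are illustrative only and are not part of the package.

```r
# Assumed setup: same id column as in the repeated measures example
df <- ToothGrowth
df$id <- factor(rep(1:10, 6))

# Count observations per subject and within-subjects cell
cell_counts <- table(df$id, interaction(df$supp, df$dose))

# A complete repeated measures layout has exactly one observation per cell
balanced <- all(cell_counts == 1)
balanced
```

If balanced is FALSE, some subject is missing a cell or has duplicated rows, and the data probably need cleaning before a repeated measures design is built.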

Value

a list with the following components:

the specified arguments: dv, wid, between, within
data: the original data (long format) for independent measures ANOVA. For repeated measures ANOVA, the wide-format data used to fit the model is returned in lm_data.
idata: an optional data frame giving the levels of factors defining the intra-subject model for multivariate repeated-measures data.
idesign: a one-sided model formula using the “data” in idata and specifying the intra-subject design.
repeated: logical. TRUE when the design includes repeated measures (within-subjects factors).
lm_formula: the formula used to build the lm model.
lm_data: the data used to build the lm model. Either in long format (the original data, for independent measures ANOVA) or in wide format (for repeated measures ANOVA).
model: the lm model
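
To make the lm_data, idata and model components more concrete, here is a rough base R sketch of the same long-to-wide idea, assuming the ToothGrowth setup from the examples. It is not the package's internal code, and wide, idata_sketch and fit are illustrative names.

```r
# Assumed setup: 10 subjects, each measured in all 6 supp x dose cells
df <- ToothGrowth
df$id <- factor(rep(1:10, 6))
df$dose <- factor(paste0("X", df$dose))  # level names as shown in the examples

# One column per supp x dose cell, one row per subject (compare lm_data)
cell <- interaction(df$supp, df$dose, sep = "_", lex.order = TRUE)
wide <- tapply(df$len, list(df$id, cell), mean)

# One row per response column, giving its factor levels (compare idata)
idata_sketch <- expand.grid(dose = levels(df$dose),
                            supp = levels(df$supp))[, c("supp", "dose")]

# Multivariate intercept-only model on the wide responses (compare model)
fit <- lm(wide ~ 1)
coef(fit)
```

The rows of idata_sketch are ordered to match the columns of wide, which is what lets an intra-subject formula such as ~supp * dose be mapped onto the response columns.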

Author

Alboukadel Kassambara, alboukadel.kassambara@gmail.com

Examples

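# Load required packages and data
#:::::::::::::::::::::::::::::::::::::::
library(rstatix) # provides factorial_design()
library(car)     # provides Anova()
data("ToothGrowth")
df <- ToothGrowth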
head(df)
#>    len supp dose
#> 1  4.2   VC  0.5
#> 2 11.5   VC  0.5
#> 3  7.3   VC  0.5
#> 4  5.8   VC  0.5
#> 5  6.4   VC  0.5
#> 6 10.0   VC  0.5

# Repeated measures designs
#:::::::::::::::::::::::::::::::::::::::::
# Prepare the data
df$id <- rep(1:10, 6) # Add a subject id: 10 subjects, each measured in all 6 supp x dose cells
head(df)
#>    len supp dose id
#> 1  4.2   VC  0.5  1
#> 2 11.5   VC  0.5  2
#> 3  7.3   VC  0.5  3
#> 4  5.8   VC  0.5  4
#> 5  6.4   VC  0.5  5
#> 6 10.0   VC  0.5  6
# Build factorial designs
design <- factorial_design(df, dv = len, wid = id, within = c(supp, dose))
design
#> $dv
#> [1] "len"
#> 
#> $wid
#> [1] "id"
#> 
#> $within
#> [1] "supp" "dose"
#> 
#> $data
#> # A tibble: 60 × 4
#>      len supp  dose  id   
#>    <dbl> <fct> <fct> <fct>
#>  1   4.2 VC    X0.5  1    
#>  2  11.5 VC    X0.5  2    
#>  3   7.3 VC    X0.5  3    
#>  4   5.8 VC    X0.5  4    
#>  5   6.4 VC    X0.5  5    
#>  6  10   VC    X0.5  6    
#>  7  11.2 VC    X0.5  7    
#>  8  11.2 VC    X0.5  8    
#>  9   5.2 VC    X0.5  9    
#> 10   7   VC    X0.5  10   
#> # … with 50 more rows
#> 
#> $idata
#>   supp dose
#> 1   OJ X0.5
#> 2   OJ   X1
#> 3   OJ   X2
#> 4   VC X0.5
#> 5   VC   X1
#> 6   VC   X2
#> 
#> $idesign
#> ~supp * dose
#> <environment: 0x7fb8b3117708>
#> 
#> $lm_data
#> # A tibble: 10 × 7
#>    id    OJ_X0.5 OJ_X1 OJ_X2 VC_X0.5 VC_X1 VC_X2
#>    <fct>   <dbl> <dbl> <dbl>   <dbl> <dbl> <dbl>
#>  1 1        15.2  19.7  25.5     4.2  16.5  23.6
#>  2 2        21.5  23.3  26.4    11.5  16.5  18.5
#>  3 3        17.6  23.6  22.4     7.3  15.2  33.9
#>  4 4         9.7  26.4  24.5     5.8  17.3  25.5
#>  5 5        14.5  20    24.8     6.4  22.5  26.4
#>  6 6        10    25.2  30.9    10    17.3  32.5
#>  7 7         8.2  25.8  26.4    11.2  13.6  26.7
#>  8 8         9.4  21.2  27.3    11.2  14.5  21.5
#>  9 9        16.5  14.5  29.4     5.2  18.8  23.3
#> 10 10        9.7  27.3  23       7    15.5  29.5
#> 
#> $repeated
#> [1] TRUE
#> 
#> $lm_formula
#> cbind(OJ_X0.5, OJ_X1, OJ_X2, VC_X0.5, VC_X1, VC_X2) ~ 1
#> <environment: 0x7fb8a693d6d0>
#> 
#> $model
#> 
#> Call:
#> stats::lm(formula = lm_formula, data = data)
#> 
#> Coefficients:
#>              OJ_X0.5  OJ_X1  OJ_X2  VC_X0.5  VC_X1  VC_X2
#> (Intercept)  13.23    22.70  26.06   7.98    16.77  26.14
#> 
#> 
# Easily perform repeated measures ANOVA using the car package
res.anova <- Anova(design$model, idata = design$idata, idesign = design$idesign, type = 3)
summary(res.anova, multivariate = FALSE)
#> Warning: HF eps > 1 treated as 1
#> 
#> Univariate Type III Repeated-Measures ANOVA Assuming Sphericity
#> 
#>              Sum Sq num Df Error SS den Df   F value    Pr(>F)    
#> (Intercept) 21236.5      1    69.33      9 2756.9514 1.656e-12 ***
#> supp          205.3      1    53.01      9   34.8664 0.0002277 ***
#> dose         2426.4      2   205.11     18  106.4698 1.062e-10 ***
#> supp:dose     108.3      2   384.66     18    2.5343 0.1072129    
#> ---
#> Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#> 
#> 
#> Mauchly Tests for Sphericity
#> 
#>           Test statistic p-value
#> dose             0.80739 0.42495
#> supp:dose        0.93390 0.76068
#> 
#> 
#> Greenhouse-Geisser and Huynh-Feldt Corrections
#>  for Departure from Sphericity
#> 
#>           GG eps Pr(>F[GG])    
#> dose      0.8385   2.79e-09 ***
#> supp:dose 0.9380     0.1115    
#> ---
#> Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#> 
#>             HF eps   Pr(>F[HF])
#> dose      1.008462 1.061617e-10
#> supp:dose 1.176302 1.072129e-01

# Independent measures designs
#:::::::::::::::::::::::::::::::::::::::::
# Build factorial designs
df$id <- seq_len(nrow(df)) # Each row is a distinct subject
design <- factorial_design(df, dv = len, wid = id, between = c(supp, dose))
design
#> $dv
#> [1] "len"
#> 
#> $between
#> [1] "supp" "dose"
#> 
#> $wid
#> [1] "id"
#> 
#> $data
#> # A tibble: 60 × 4
#>      len supp  dose  id   
#>    <dbl> <fct> <fct> <fct>
#>  1   4.2 VC    0.5   1    
#>  2  11.5 VC    0.5   2    
#>  3   7.3 VC    0.5   3    
#>  4   5.8 VC    0.5   4    
#>  5   6.4 VC    0.5   5    
#>  6  10   VC    0.5   6    
#>  7  11.2 VC    0.5   7    
#>  8  11.2 VC    0.5   8    
#>  9   5.2 VC    0.5   9    
#> 10   7   VC    0.5   10   
#> # … with 50 more rows
#> 
#> $lm_data
#> # A tibble: 60 × 4
#>      len supp  dose  id   
#>    <dbl> <fct> <fct> <fct>
#>  1   4.2 VC    0.5   1    
#>  2  11.5 VC    0.5   2    
#>  3   7.3 VC    0.5   3    
#>  4   5.8 VC    0.5   4    
#>  5   6.4 VC    0.5   5    
#>  6  10   VC    0.5   6    
#>  7  11.2 VC    0.5   7    
#>  8  11.2 VC    0.5   8    
#>  9   5.2 VC    0.5   9    
#> 10   7   VC    0.5   10   
#> # … with 50 more rows
#> 
#> $repeated
#> [1] FALSE
#> 
#> $lm_formula
#> len ~ supp * dose
#> <environment: 0x7fb8b65ceb00>
#> 
#> $model
#> 
#> Call:
#> stats::lm(formula = lm_formula, data = data)
#> 
#> Coefficients:
#> (Intercept)        supp1        dose1        dose2  supp1:dose1  supp1:dose2  
#>     18.8133       1.8500      -8.2083       0.9217       0.7750       1.1150  
#> 
#> 
# Perform ANOVA
Anova(design$model, type = 3)
#> Anova Table (Type III tests)
#> 
#> Response: len
#>              Sum Sq Df  F value    Pr(>F)    
#> (Intercept) 21236.5  1 1610.393 < 2.2e-16 ***
#> supp          205.4  1   15.572 0.0002312 ***
#> dose         2426.4  2   92.000 < 2.2e-16 ***
#> supp:dose     108.3  2    4.107 0.0218603 *  
#> Residuals     712.1 54                       
#> ---
#> Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
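
The univariate repeated measures table above can be cross-checked with base R alone. This is a sketch under the same assumed id layout (10 subjects, all 6 cells each) and is not part of the package. For a balanced design like this one, the F tests from aov() with an Error() term are expected to agree with the sphericity-assumed Type III tests.

```r
# Assumed setup: same subject ids as in the repeated measures example
df <- ToothGrowth
df$id <- factor(rep(1:10, 6))
df$dose <- factor(df$dose)

# Both factors vary within subject, so each gets its own error stratum
fit_aov <- aov(len ~ supp * dose + Error(id / (supp * dose)), data = df)
aov_summary <- summary(fit_aov)
aov_summary
```

The supp, dose and supp:dose rows appear in separate error strata, and their F values can be compared with the corresponding rows of the Anova() output.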
