Performs one or more mean comparisons.

compare_means(
  formula,
  data,
  method = "wilcox.test",
  paired = FALSE,
  group.by = NULL,
  ref.group = NULL,
  symnum.args = list(),
  p.adjust.method = "holm",
  ...
)

Arguments

formula

a formula of the form x ~ group where x is a numeric variable giving the data values and group is a factor with one or multiple levels giving the corresponding groups. For example, formula = TP53 ~ cancer_group.

It's also possible to perform the test for multiple response variables at the same time. For example, formula = c(TP53, PTEN) ~ cancer_group.
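
As a rough base R illustration of what testing several response variables against one grouping variable amounts to, the sketch below loops over two columns and runs the same test on each. The mtcars dataset and the chosen columns (mpg, hp, am) are assumptions made for the example, and this is not necessarily how compare_means is implemented internally.

```r
# Illustration only: test two response variables against the same grouping
# variable. The dataset and column names are assumptions for this sketch.
responses <- c("mpg", "hp")

p_values <- sapply(responses, function(v) {
  # Split the response by the two levels of the grouping variable `am`
  groups <- split(mtcars[[v]], mtcars$am)
  # exact = FALSE requests the normal approximation (the data contain ties)
  wilcox.test(groups[["0"]], groups[["1"]], exact = FALSE)$p.value
})

p_values  # one unadjusted p-value per response variable
```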

data

a data.frame containing the variables in the formula.

method

the type of test. Default is wilcox.test. Allowed values include:

  • t.test (parametric) and wilcox.test (non-parametric). Perform comparison between two groups of samples. If the grouping variable contains more than two levels, then a pairwise comparison is performed.

  • anova (parametric) and kruskal.test (non-parametric). Perform a one-way comparison of multiple groups (one-way ANOVA or Kruskal-Wallis test, respectively).
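
The method names above correspond to tests available in base R. The following sketch calls them directly on ToothGrowth to show which test applies to which situation. Converting dose to a factor and passing exact = FALSE are choices made for this example, not documented behavior of compare_means.

```r
data("ToothGrowth")
df <- ToothGrowth
df$dose <- factor(df$dose)  # treat dose as a grouping factor (3 levels)

# Two groups (supp has two levels): parametric and non-parametric tests
t_p <- t.test(len ~ supp, data = df)$p.value
w_p <- wilcox.test(len ~ supp, data = df, exact = FALSE)$p.value

# More than two groups (dose): one-way ANOVA and Kruskal-Wallis
aov_p <- summary(aov(len ~ dose, data = df))[[1]][["Pr(>F)"]][1]
kw_p  <- kruskal.test(len ~ dose, data = df)$p.value

c(t.test = t_p, wilcox.test = w_p, anova = aov_p, kruskal.test = kw_p)
```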

paired

a logical indicating whether you want a paired test. Used only in t.test and in wilcox.test.

group.by

a character vector containing the names of the grouping variables.
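
Conceptually, grouping means the data are split by the grouping variable and the test is run within each subset. A minimal base R sketch of that idea, assuming ToothGrowth with dose as the grouping variable (exact = FALSE is an assumption that avoids tie warnings):

```r
data("ToothGrowth")

# Split the data by dose, then compare the two supp levels within each subset
by_dose <- split(ToothGrowth, ToothGrowth$dose)

p_by_dose <- sapply(by_dose, function(d) {
  wilcox.test(len ~ supp, data = d, exact = FALSE)$p.value
})

p_by_dose  # one unadjusted p-value per dose level
```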

ref.group

a character string specifying the reference group. If specified, for a given grouping variable, each of the group levels will be compared to the reference group (i.e. control group).

ref.group can also be ".all.". In this case, each of the grouping variable levels is compared to all (i.e. basemean).
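
The two modes can be sketched in base R as follows. This is an illustration under assumptions: it uses ToothGrowth with "0.5" as the reference level, and it interprets "compared to all" as a test of each level against the pooled data, which may differ in detail from what compare_means does.

```r
data("ToothGrowth")
len_by_dose <- split(ToothGrowth$len, ToothGrowth$dose)

# Reference-group mode: every other level is compared to the reference level
ref <- "0.5"
others <- setdiff(names(len_by_dose), ref)
p_vs_ref <- sapply(others, function(g) {
  wilcox.test(len_by_dose[[ref]], len_by_dose[[g]], exact = FALSE)$p.value
})

# ".all." mode (assumed interpretation): each level against the pooled data
p_vs_all <- sapply(names(len_by_dose), function(g) {
  wilcox.test(ToothGrowth$len, len_by_dose[[g]], exact = FALSE)$p.value
})

p_vs_ref
p_vs_all
```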

symnum.args

a list of arguments to pass to the function symnum for symbolic number coding of p-values. For example, symnum.args <- list(cutpoints = c(0, 0.0001, 0.001, 0.01, 0.05, Inf), symbols = c("****", "***", "**", "*", "ns")).

In other words, we use the following convention for symbols indicating statistical significance:

  • ns: p > 0.05

  • *: p <= 0.05

  • **: p <= 0.01

  • ***: p <= 0.001

  • ****: p <= 0.0001
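
To see how this convention maps p-values to symbols, the sketch below passes the argument list shown above to stats::symnum. The p-values are made up for illustration.

```r
symnum.args <- list(
  cutpoints = c(0, 0.0001, 0.001, 0.01, 0.05, Inf),
  symbols   = c("****", "***", "**", "*", "ns")
)

# Hypothetical p-values, one per significance band
p <- c(0.00005, 0.0005, 0.005, 0.03, 0.2)

# symnum() returns a noquote object; as.character() keeps only the symbols
p.signif <- as.character(do.call(stats::symnum, c(list(x = p), symnum.args)))
p.signif
#> [1] "****" "***"  "**"   "*"    "ns"
```

The intervals are closed on the right, so a p-value exactly equal to a cutpoint receives the more significant symbol, in line with the "<=" convention listed above.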

p.adjust.method

method for adjusting p-values (see p.adjust). It has an effect only when multiple pairwise tests are performed or when there are multiple grouping variables. Allowed values include "holm", "hochberg", "hommel", "bonferroni", "BH", "BY", "fdr", "none". If you don't want to adjust the p-values (not recommended), use p.adjust.method = "none".

Note that, when the formula contains multiple variables, the p-value adjustment is done independently for each variable.
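
The effect of the adjustment can be checked with stats::p.adjust directly. The sketch below uses the three unadjusted p-values from the pairwise dose comparison in the Examples section, rounded as displayed there, so the results are approximate.

```r
# Unadjusted p-values for 0.5 vs 1, 0.5 vs 2, and 1 vs 2 (rounded as printed)
p <- c(7.02e-6, 8.41e-8, 1.77e-4)

# Holm: the smallest p-value is multiplied by 3, the next by 2, the largest by 1
p_holm <- p.adjust(p, method = "holm")

# Bonferroni: every p-value is multiplied by the number of tests (capped at 1)
p_bonf <- p.adjust(p, method = "bonferroni")

# "none" leaves the p-values unchanged
p_none <- p.adjust(p, method = "none")

signif(p_holm, 2)
```

With these inputs the Holm-adjusted values are roughly 1.4e-05, 2.5e-07 and 1.8e-04, which agrees with the p.adj column printed for that example.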

...

Other arguments to be passed to the test function.

Value

Returns a data frame with the following columns:

  • .y.: the y variable used in the test.

  • group1, group2: the compared groups in the pairwise tests. Available only when method = "t.test" or method = "wilcox.test".

  • p: the p-value.

  • p.adj: the adjusted p-value. By default, computed with p.adjust.method = "holm".

  • p.format: the formatted p-value.

  • p.signif: the significance level.

  • method: the statistical test used to compare groups.

Examples

# Load data
#:::::::::::::::::::::::::::::::::::::::
data("ToothGrowth")
df <- ToothGrowth

# One-sample test
#:::::::::::::::::::::::::::::::::::::::::
compare_means(len ~ 1, df, mu = 0)
#> # A tibble: 1 × 8
#>   .y.   group1 group2            p   p.adj p.format p.signif method  
#>   <chr>  <dbl> <chr>         <dbl>   <dbl> <chr>    <chr>    <chr>   
#> 1 len        1 null model 1.66e-11 1.7e-11 1.7e-11  ****     Wilcoxon

# Two-sample unpaired test
#:::::::::::::::::::::::::::::::::::::::::
compare_means(len ~ supp, df)
#> # A tibble: 1 × 8
#>   .y.   group1 group2      p p.adj p.format p.signif method  
#>   <chr> <chr>  <chr>   <dbl> <dbl> <chr>    <chr>    <chr>   
#> 1 len   OJ     VC     0.0645 0.064 0.064    ns       Wilcoxon

# Two-sample paired test
#:::::::::::::::::::::::::::::::::::::::::
compare_means(len ~ supp, df, paired = TRUE)
#> # A tibble: 1 × 8
#>   .y.   group1 group2       p  p.adj p.format p.signif method  
#>   <chr> <chr>  <chr>    <dbl>  <dbl> <chr>    <chr>    <chr>   
#> 1 len   OJ     VC     0.00431 0.0043 0.0043   **       Wilcoxon

# Compare supp levels after grouping the data by "dose"
#::::::::::::::::::::::::::::::::::::::::
compare_means(len ~ supp, df, group.by = "dose")
#> # A tibble: 3 × 9
#>    dose .y.   group1 group2       p p.adj p.format p.signif method  
#>   <dbl> <chr> <chr>  <chr>    <dbl> <dbl> <chr>    <chr>    <chr>   
#> 1   0.5 len   OJ     VC     0.0232  0.046 0.023    *        Wilcoxon
#> 2   1   len   OJ     VC     0.00403 0.012 0.004    **       Wilcoxon
#> 3   2   len   OJ     VC     1       1     1.000    ns       Wilcoxon

# Pairwise comparisons
#::::::::::::::::::::::::::::::::::::::::
# As dose contains more than two levels ==>
# pairwise test is automatically performed.
compare_means(len ~ dose, df)
#> # A tibble: 3 × 8
#>   .y.   group1 group2            p      p.adj p.format p.signif method  
#>   <chr> <chr>  <chr>         <dbl>      <dbl> <chr>    <chr>    <chr>   
#> 1 len   0.5    1      0.00000702   0.000014   7.0e-06  ****     Wilcoxon
#> 2 len   0.5    2      0.0000000841 0.00000025 8.4e-08  ****     Wilcoxon
#> 3 len   1      2      0.000177     0.00018    0.00018  ***      Wilcoxon

# Comparison against reference group
#::::::::::::::::::::::::::::::::::::::::
compare_means(len ~ dose, df, ref.group = "0.5")
#> # A tibble: 2 × 8
#>   .y.   group1 group2            p      p.adj p.format p.signif method  
#>   <chr> <chr>  <chr>         <dbl>      <dbl> <chr>    <chr>    <chr>   
#> 1 len   0.5    1      0.00000702   0.000007   7.0e-06  ****     Wilcoxon
#> 2 len   0.5    2      0.0000000841 0.00000017 8.4e-08  ****     Wilcoxon

# Comparison against all
#::::::::::::::::::::::::::::::::::::::::
compare_means(len ~ dose, df, ref.group = ".all.")
#> # A tibble: 3 × 8
#>   .y.   group1 group2         p   p.adj p.format p.signif method  
#>   <chr> <chr>  <chr>      <dbl>   <dbl> <chr>    <chr>    <chr>   
#> 1 len   .all.  0.5    0.0000508 0.00015 5.1e-05  ****     Wilcoxon
#> 2 len   .all.  1      0.764     0.76    0.76404  ns       Wilcoxon
#> 3 len   .all.  2      0.000179  0.00036 0.00018  ***      Wilcoxon

# Anova and kruskal.test
#::::::::::::::::::::::::::::::::::::::::
compare_means(len ~ dose, df, method = "anova")
#> # A tibble: 1 × 6
#>   .y.          p   p.adj p.format p.signif method
#>   <chr>    <dbl>   <dbl> <chr>    <chr>    <chr> 
#> 1 len   9.53e-16 9.5e-16 9.5e-16  ****     Anova 
compare_means(len ~ dose, df, method = "kruskal.test")
#> # A tibble: 1 × 6
#>   .y.               p        p.adj p.format p.signif method        
#>   <chr>         <dbl>        <dbl> <chr>    <chr>    <chr>         
#> 1 len   0.00000000148 0.0000000015 1.5e-09  ****     Kruskal-Wallis