Performs the Games-Howell test, which compares all possible pairs of groups when the assumption of homogeneity of variances is violated. This post hoc test provides confidence intervals for the differences between group means and shows whether the differences are statistically significant.

The test is based on Welch’s degrees of freedom correction and uses Tukey’s studentized range distribution to compute the p-values. It compares the difference between each pair of means with an appropriate adjustment for multiple testing, so there is no need to apply additional p-value corrections.
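The description above can be written out as formulas. The notation here is an assumption introduced for illustration and does not come from the package documentation: k is the number of groups, and group i has mean \bar{x}_i, variance s_i^2 and size n_i. Q_{k,\nu} denotes a studentized range variable for k means with \nu degrees of freedom, and q_{1-\alpha;\,k,\nu} is its 1-\alpha quantile.

```latex
\begin{align*}
t_{ij} &= \frac{\lvert \bar{x}_i - \bar{x}_j \rvert}
               {\sqrt{\dfrac{s_i^2}{n_i} + \dfrac{s_j^2}{n_j}}}
  && \text{(Welch $t$ statistic)} \\[6pt]
\nu_{ij} &= \frac{\left(\dfrac{s_i^2}{n_i} + \dfrac{s_j^2}{n_j}\right)^{2}}
                 {\dfrac{(s_i^2/n_i)^2}{n_i - 1} + \dfrac{(s_j^2/n_j)^2}{n_j - 1}}
  && \text{(Welch degrees of freedom)} \\[6pt]
p_{ij} &= P\!\left(Q_{k,\nu_{ij}} > \sqrt{2}\, t_{ij}\right)
  && \text{(adjusted $p$-value)} \\[6pt]
\mathrm{CI}_{ij} &= (\bar{x}_j - \bar{x}_i) \pm q_{1-\alpha;\,k,\nu_{ij}}
   \sqrt{\frac{1}{2}\left(\frac{s_i^2}{n_i} + \frac{s_j^2}{n_j}\right)}
  && \text{(confidence interval)}
\end{align*}
```

Because the studentized range distribution depends on the number of groups k, the p-value already accounts for all pairwise comparisons.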

games_howell_test(data, formula, conf.level = 0.95, detailed = FALSE)



  • data: a data.frame containing the variables in the formula.

  • formula: a formula of the form x ~ group, where x is a numeric variable giving the data values and group is a factor with one or more levels giving the corresponding groups. For example, formula = TP53 ~ cancer_group.

  • conf.level: confidence level of the interval. Default is 0.95.

  • detailed: logical value. Default is FALSE. If TRUE, a detailed result is shown.


Returns a data frame with some of the following columns:

  • .y.: the y (outcome) variable used in the test.

  • group1, group2: the compared groups in the pairwise tests.

  • n1, n2: sample sizes of the two compared groups.

  • estimate, conf.low, conf.high: mean difference and its confidence interval.

  • statistic: test statistic (t-value) used to compute the p-value.

  • df: degrees of freedom calculated using Welch’s correction.

  • p.adj: adjusted p-value using Tukey's method.

  • method: the statistical test used to compare groups.

  • p.adj.signif: the significance level of p-values.

The returned object has an attribute called args, which is a list holding the test arguments.


The Games-Howell method is an improved version of the Tukey-Kramer method and is applicable when the assumption of equal variances is violated. It is a t-test using Welch’s degrees of freedom. The method controls the type I error across the entire set of comparisons and is known to maintain the preset significance level even when the sample sizes differ. However, the smaller the number of samples in each group, the more lenient the type I error control becomes. Thus, this method can be applied when the number of samples in each group is six or more.
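To make the procedure concrete, here is a minimal base-R sketch that computes a single Games-Howell comparison by hand: a Welch t statistic, Welch degrees of freedom, and a p-value and confidence interval from the studentized range distribution. The helper `games_howell_pair()` is a hypothetical name introduced for illustration. It is not part of the package and is not presented as the package's internal implementation.

```r
# Hypothetical helper (not part of any package): Games-Howell comparison
# of two groups x1 and x2 out of k groups in total.
games_howell_pair <- function(x1, x2, k, conf.level = 0.95) {
  n1 <- length(x1)
  n2 <- length(x2)
  v1 <- var(x1) / n1  # squared standard error of the mean, group 1
  v2 <- var(x2) / n2  # squared standard error of the mean, group 2

  estimate <- mean(x2) - mean(x1)

  # Welch t statistic and Welch-Satterthwaite degrees of freedom
  statistic <- abs(estimate) / sqrt(v1 + v2)
  df <- (v1 + v2)^2 / (v1^2 / (n1 - 1) + v2^2 / (n2 - 1))

  # Adjusted p-value from the studentized range distribution (q = t * sqrt(2))
  p.adj <- ptukey(statistic * sqrt(2), nmeans = k, df = df, lower.tail = FALSE)

  # Half-width of the simultaneous confidence interval
  half.width <- qtukey(conf.level, nmeans = k, df = df) * sqrt((v1 + v2) / 2)

  data.frame(
    estimate,
    conf.low = estimate - half.width,
    conf.high = estimate + half.width,
    statistic,
    df,
    p.adj
  )
}

# ToothGrowth ships with base R; compare dose 0.5 against dose 1.
len_by_dose <- split(ToothGrowth$len, ToothGrowth$dose)
res <- games_howell_pair(
  len_by_dose[["0.5"]],
  len_by_dose[["1"]],
  k = length(len_by_dose)
)
print(res)
```

Passing the total number of groups as `k` is what gives the family-wise adjustment: the same pair of groups yields a larger p-value and a wider interval as more groups enter the comparison.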


  • Aaron Schlege,

  • Sangseok Lee, Dong Kyu Lee. What is the proper way to apply the multiple comparison test? Korean J Anesthesiol. 2018;71(5):353-360.


# Load rstatix, which provides games_howell_test() and the %>% pipe
library(rstatix)

ToothGrowth %>% games_howell_test(len ~ dose)
#> # A tibble: 3 x 8
#>   .y.   group1 group2 estimate conf.low conf.high       p.adj p.adj.signif
#> * <chr> <chr>  <chr>     <dbl>    <dbl>     <dbl>       <dbl> <chr>
#> 1 len   0.5    1          9.13     5.69     12.6  0.000000376 ****
#> 2 len   0.5    2         15.5     12.3      18.7  0           ****
#> 3 len   1      2          6.37     3.19      9.54 0.0000557   ****