Specifying weights in Log-rank comparisons
Marcin Kosinski
created 29-01-2017, revised 22-08-2018
Source: vignettes/Specifiying_weights_in_logrank_comparisons.Rmd
This vignette covers the changes between versions 0.2.4 and 0.2.5 for specifying weights in the Log-rank comparisons done in ggsurvplot().
Log-rank statistic for 2 groups
As stated in the literature, the Log-rank test for comparing survival (estimates of survival curves) in 2 groups ($A$ and $B$) is based on the statistic
$LR = \frac{U^2}{V} \sim \chi^2(1)$ (asymptotically, under the null hypothesis of equal survival in both groups),
where $U = \sum_{i=1}^{T}w_{t_i}(o_{t_i}^A-e_{t_i}^A), \ \ \ \ \ \ \ \ V = Var(U) = \sum_{i=1}^{T}\left(w_{t_i}^2\frac{n_{t_i}^An_{t_i}^Bo_{t_i}(n_{t_i}-o_{t_i})}{n_{t_i}^2(n_{t_i}-1)}\right)$ and
- $t_i$ for $i=1, \dots, T$ are the distinct event times,
- $n_{t_i}$ is the overall risk set size at time $t_i$ ($n_{t_i} = n_{t_i}^A+n_{t_i}^B$),
- $n_{t_i}^A$ is the risk set size at time $t_i$ in group $A$,
- $n_{t_i}^B$ is the risk set size at time $t_i$ in group $B$,
- $o_{t_i}$ is the overall number of observed events at time $t_i$ ($o_{t_i} = o_{t_i}^A+o_{t_i}^B$),
- $o_{t_i}^A$ is the number of observed events at time $t_i$ in group $A$,
- $o_{t_i}^B$ is the number of observed events at time $t_i$ in group $B$,
- $e_{t_i}$ is the overall number of expected events at time $t_i$ ($e_{t_i} = e_{t_i}^A+e_{t_i}^B$),
- $e_{t_i}^A$ is the number of expected events at time $t_i$ in group $A$,
- $e_{t_i}^B$ is the number of expected events at time $t_i$ in group $B$,
- $w_{t_i}$ is the weight at time $t_i$.
Also note that
$e_{t_i}^A = n_{t_i}^A \frac{o_{t_i}}{n_{t_i}}, \ \ \ \ \ \ \ \ \ \ e_{t_i}^B = n_{t_i}^B \frac{o_{t_i}}{n_{t_i}}, \ \ \ \ \ \ \ \ \ \ e_{t_i}^A + e_{t_i}^B = o_{t_i}^A + o_{t_i}^B,$
so $o_{t_i}^B - e_{t_i}^B = -(o_{t_i}^A - e_{t_i}^A)$. That is why we can substitute group $A$ with $B$ in $U$: only the sign of $U$ changes, so $U^2$, and hence $LR$, stays the same.
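To make the definitions concrete, here is a minimal R sketch that computes $U$, $V$ and $LR$ term by term. The function weighted_logrank() is a hypothetical helper written only for this illustration. It is not a survminer or survMisc function, and it is not optimized. The sketch assumes the lung data from the survival package, where status == 2 marks a death. It checks the unweighted case ($w_{t_i} = 1$) against survival::survdiff(), then swaps the groups to show that only the sign of $U$ changes.

```r
library("survival")

# Hypothetical helper (not part of survminer or survMisc): two-group weighted
# Log-rank statistic LR = U^2 / V, computed term by term from the formulas above.
# `event` is TRUE/1 for an observed event; `weight_fun(n, o)` returns w_{t_i}
# from the overall risk set sizes n_{t_i} and overall event counts o_{t_i}.
weighted_logrank <- function(time, event, group,
                             weight_fun = function(n, o) rep(1, length(n))) {
  event <- as.logical(event)
  lev <- sort(unique(group))
  stopifnot(length(lev) == 2)
  t_i <- sort(unique(time[event]))                  # distinct event times
  at_risk <- function(g) sapply(t_i, function(t) sum(time >= t & group == g))
  events  <- function(g) sapply(t_i, function(t) sum(time == t & event & group == g))
  nA <- at_risk(lev[1]); nB <- at_risk(lev[2])      # risk set sizes per group
  oA <- events(lev[1]);  oB <- events(lev[2])       # observed events per group
  n <- nA + nB                                      # overall risk set size
  o <- oA + oB                                      # overall observed events
  eA <- nA * o / n                                  # expected events in group A
  w <- weight_fun(n, o)
  U <- sum(w * (oA - eA))
  # Variance terms; when n_{t_i} = 1 one group is empty, so the term is 0
  v <- ifelse(n > 1, w^2 * nA * nB * o * (n - o) / (n^2 * (n - 1)), 0)
  V <- sum(v)
  LR <- U^2 / V
  list(U = U, V = V, LR = LR, p.value = pchisq(LR, df = 1, lower.tail = FALSE))
}

lung <- survival::lung                              # status: 1 = censored, 2 = dead
res <- weighted_logrank(lung$time, lung$status == 2, lung$sex)

# Unweighted case (w = 1) should agree with survival::survdiff()
sdiff <- survdiff(Surv(time, status) ~ sex, data = lung)
c(LR = res$LR, survdiff = sdiff$chisq)

# Swapping the roles of A and B flips the sign of U but leaves LR unchanged
res_swapped <- weighted_logrank(lung$time, lung$status == 2, 3 - lung$sex)
c(U = res$U, U_swapped = res_swapped$U)
```

Other weights can be passed through weight_fun, because n and o arrive ordered by event time. For example, weight_fun = function(n, o) sqrt(n) gives square-root risk set weights, and weight_fun = function(n, o) cumprod(1 - o / (n + 1)) gives Peto-Peto style weights.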
Weighted Log-rank extensions
The regular Log-rank comparison uses $w_{t_i} = 1$, but many modifications to that approach have been proposed. The most popular ones, called weighted Log-rank tests, are available in survMisc::comp (see ?survMisc::comp):

- n: Gehan and Breslow proposed to use $w_{t_i} = n_{t_i}$ (this is also called the generalized Wilcoxon test),
- sqrtN: Tarone and Ware proposed to use $w_{t_i} = \sqrt{n_{t_i}}$,
- S1: Peto-Peto’s modified survival estimate $w_{t_i} = S1({t_i}) = \prod_{j=1}^{i}\left(1-\frac{e_{t_j}}{n_{t_j}+1}\right)$,
- S2: modified Peto-Peto (by Andersen) $w_{t_i} = S2({t_i}) = \frac{S1({t_i})n_{t_i}}{n_{t_i}+1}$,
- FH: Fleming-Harrington $w_{t_i} = S(t_i)^p(1 - S(t_i))^q$, where $S$ is the Kaplan-Meier survival estimate in the pooled sample and $p, q \ge 0$.
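The options above can be written as short R expressions. The sketch below is an illustration under stated assumptions, not the survMisc implementation. logrank_weights() is a hypothetical helper that takes the overall risk set sizes n and event counts o at the ordered event times. It uses $e_{t_i} = o_{t_i}$, which holds for the overall counts because $e_{t_i}^A + e_{t_i}^B = o_{t_i}^A + o_{t_i}^B$. For FH it assumes $S$ is the pooled Kaplan-Meier estimate at $t_i$. Implementations differ on this last point, and some use the value just before $t_i$. The counts are made-up toy numbers.

```r
# Hypothetical helper (not the survMisc implementation): weight vectors w_{t_i}
# for the options listed above, given overall risk set sizes n and overall
# event counts o at the ordered event times t_1 < ... < t_T.
logrank_weights <- function(n, o, p = 1, q = 1) {
  S1 <- cumprod(1 - o / (n + 1))  # Peto-Peto's modified survival estimate (e = o overall)
  S  <- cumprod(1 - o / n)        # pooled Kaplan-Meier estimate at t_i (assumption for FH)
  data.frame(
    "1"   = rep(1, length(n)),    # regular Log-rank
    n     = n,                    # Gehan-Breslow
    sqrtN = sqrt(n),              # Tarone-Ware
    S1    = S1,                   # Peto-Peto
    S2    = S1 * n / (n + 1),     # modified Peto-Peto (Andersen)
    FH    = S^p * (1 - S)^q,      # Fleming-Harrington
    check.names = FALSE
  )
}

# Toy risk set sizes and event counts at four event times (made-up numbers)
n <- c(20, 17, 12, 6)
o <- c(2, 3, 2, 1)
w <- logrank_weights(n, o)
round(w, 3)
```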
Watch out for FH: I submitted an issue on the survMisc repository because I think its mathematical notation for Fleming-Harrington is misleading.
Why are they useful?
The regular Log-rank test is more sensitive to differences at late survival times, whereas the Gehan-Breslow and Tarone-Ware weights can be used if one is interested in early differences in survival times, since their weights $n_{t_i}$ and $\sqrt{n_{t_i}}$ shrink as the risk set shrinks over time. The Peto-Peto modifications also emphasize early differences and are more robust than Tarone-Ware or Gehan-Breslow when many observations are censored. The most flexible is the Fleming-Harrington family of weights, where a high p emphasizes early differences and a high q emphasizes differences at late survival times. There is, however, no general rule for choosing p and q.
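One way to see the roles of p and q is to evaluate the Fleming-Harrington weight $S^p(1-S)^q$ at a few survival values. In this sketch fh_weight() is a hypothetical helper. The three values of $S$ are illustrative stand-ins for early, middle and late times, because survival estimates decrease over time. They are not estimates from data.

```r
# Hypothetical helper: Fleming-Harrington weight as a function of S(t_i)
fh_weight <- function(S, p, q) S^p * (1 - S)^q

# Illustrative survival values: close to 1 early, close to 0 late (not real data)
S <- c(early = 0.95, middle = 0.5, late = 0.05)

W <- rbind(
  "p=1, q=0" = fh_weight(S, 1, 0),  # largest weight early
  "p=0, q=1" = fh_weight(S, 0, 1),  # largest weight late
  "p=1, q=1" = fh_weight(S, 1, 1),  # largest weight in the middle
  "p=0, q=0" = fh_weight(S, 0, 0)   # all weights 1: regular Log-rank
)
W
```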
Remember that the test should be selected at the research design stage, not after looking at the dataset.
Plots
library("survival")
library("survminer")  # provides ggsurvplot()
data("lung")
fit <- survfit(Surv(time, status) ~ sex, data = lung)
After implementing the functionality requested in the GitHub issues "Other tests than logrank for testing survival curves" and "Logrank test for trend", we can now compute p-values for various Log-rank tests in the survminer package. The examples below run all available tests.
Log-rank (survdiff)
ggsurvplot(fit, data = lung, pval = TRUE, pval.method = TRUE)
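The heading above indicates that this p-value comes from survival::survdiff(). As a hedged cross-check, the same statistic can be computed directly. This assumes survdiff is indeed the method ggsurvplot() uses in this call. The p-value is the upper tail of a $\chi^2$ distribution with (number of groups - 1) degrees of freedom.

```r
library("survival")

# Log-rank comparison of the two sexes, computed directly with survdiff()
sdiff <- survdiff(Surv(time, status) ~ sex, data = survival::lung)

# Upper-tail chi-squared p-value with (number of groups - 1) degrees of freedom
pval <- pchisq(sdiff$chisq, df = length(sdiff$n) - 1, lower.tail = FALSE)
pval
```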
Log-rank (comp)
ggsurvplot(fit, data = lung, pval = TRUE, pval.method = TRUE,
log.rank.weights = "1")
Gehan-Breslow (generalized Wilcoxon)
ggsurvplot(fit, data = lung, pval = TRUE, pval.method = TRUE,
log.rank.weights = "n", pval.method.coord = c(5, 0.1),
pval.method.size = 3)
Tarone-Ware
ggsurvplot(fit, data = lung, pval = TRUE, pval.method = TRUE,
log.rank.weights = "sqrtN", pval.method.coord = c(3, 0.1),
pval.method.size = 4)
Peto-Peto’s modified survival estimate
ggsurvplot(fit, data = lung, pval = TRUE, pval.method = TRUE,
log.rank.weights = "S1", pval.method.coord = c(5, 0.1),
pval.method.size = 3)
Modified Peto-Peto’s (by Andersen)
ggsurvplot(fit, data = lung, pval = TRUE, pval.method = TRUE,
log.rank.weights = "S2", pval.method.coord = c(5, 0.1),
pval.method.size = 3)
Fleming-Harrington (p=1, q=1)
ggsurvplot(fit, data = lung, pval = TRUE, pval.method = TRUE,
log.rank.weights = "FH_p=1_q=1",
pval.method.coord = c(5, 0.1),
pval.method.size = 4)
References
Gehan EA (1965). A Generalized Wilcoxon Test for Comparing Arbitrarily Singly-Censored Samples. Biometrika 52(1/2):203-23.
Tarone RE, Ware J (1977). On Distribution-Free Tests for Equality of Survival Distributions. Biometrika 64(1):156-60.
Peto R, Peto J (1972). Asymptotically Efficient Rank Invariant Test Procedures. J Royal Statistical Society 135(2):186-207.
Fleming TR, Harrington DP, O’Sullivan M (1987). Supremum Versions of the Log-Rank and Generalized Wilcoxon Statistics. J American Statistical Association 82(397):312-20.
Billingsley P (1999). Convergence of Probability Measures. New York: John Wiley & Sons.