Create Survival Curves — surv

Wrapper around the standard survfit() function to create survival curves. Compared to the standard survfit() function, it also supports:

a list of data sets and/or a list of formulas,
grouped data sets as generated by the function surv_group_by(),
the group.by option.

There are many cases where this function might be useful:

Case 1: One formula and One data set. Example: You want to fit the survival curves of one biomarker/gene in a given data set. This is the same as the standard survfit() function. Returns one survfit object.
Case 2: List of formulas and One data set. Example: You want to fit the survival curves of a list of biomarkers/genes in the same data set. Returns a named list of survfit objects in the same order as the formulas.
Case 3: One formula and List of data sets. Example: You want to fit the survival curves of one biomarker/gene in multiple cohorts of patients (colon, lung, breast). Returns a named list of survfit objects in the same order as the data sets.
Case 4: List of formulas and List of data sets. Example: You want to fit the survival curves of multiple biomarkers/genes in multiple cohorts of patients (colon, lung, breast). Each formula is applied to each data set in the data list. Returns a named list of survfit objects.
Case 5: One formula and data grouped by one or two variables. Example: You might want to plot the survival curves of patients treated with drug A vs. patients treated with drug B in a data set grouped by TP53 and/or RAS mutations. In this case, use the group.by argument. Returns a named list of survfit objects.
Case 6: In rare cases, you might have a list of formulas and a list of data sets and want to apply each formula to the matching data set at the same index/position in the list. For example, formula 1 is applied to data 1, formula 2 is applied to data 2, and so on. In this case, the formula and data lists must have the same length, and you should specify the argument match.fd = TRUE (which stands for "match formula and data"). Returns a named list of survfit objects.
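Cases 2 to 4 and Case 6 differ only in how formulas are paired with data sets: every formula with every data set, or formula i with data set i. The sketch below approximates both pairings with plain survival::survfit() calls. It assumes that surv_fit() behaves like a loop over survfit() and is not survminer's implementation; the helpers fit_all_pairs() and fit_matched_pairs() are hypothetical, and the "data::formula" naming used for the matched case is an assumption.

```r
library(survival)

# Hypothetical helper: apply every formula to every data set
# (Cases 2 to 4, match.fd = FALSE). Data sets vary fastest within each
# formula, mirroring the order of the Case 4 example output.
fit_all_pairs <- function(formulas, datasets) {
  fits <- list()
  for (f in names(formulas)) {
    for (d in names(datasets)) {
      fits[[paste(d, f, sep = "::")]] <-
        survfit(formulas[[f]], data = datasets[[d]])
    }
  }
  fits
}

# Hypothetical helper: apply formula i to data set i only
# (Case 6, match.fd = TRUE). Both lists must have the same length.
fit_matched_pairs <- function(formulas, datasets) {
  stopifnot(length(formulas) == length(datasets))
  fits <- Map(function(f, d) survfit(f, data = d), formulas, datasets)
  names(fits) <- paste(names(datasets), names(formulas), sep = "::")
  fits
}

# Two formulas and two data sets: the colon data holds one record per
# patient for recurrence (etype == 1) and one for death (etype == 2).
formulas <- list(sex = Surv(time, status) ~ sex,
                 rx  = Surv(time, status) ~ rx)
datasets <- list(recurrence = subset(colon, etype == 1),
                 death      = subset(colon, etype == 2))

all_fits <- fit_all_pairs(formulas, datasets)      # 2 x 2 = 4 fits
matched  <- fit_matched_pairs(formulas, datasets)  # 2 fits
names(all_fits)
names(matched)
```

The cross product grows as (number of formulas) x (number of data sets), while the matched pairing always yields one fit per list position.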

The output of the surv_fit() function can be passed directly to survminer helper functions such as surv_pvalue(), which is used in the examples below.

These functions return one element or a list of elements depending on the format of the input.

Usage

surv_fit(formula, data, group.by = NULL, match.fd = FALSE, ...)

Arguments

formula: survival formula. See survfit.formula. Can be a list of formulas. Named lists are recommended.
data: a data frame in which to interpret the variables named in the formula. Can be a list of data sets. Named lists are recommended. Can also be a grouped data set as generated by the function surv_group_by().
group.by: grouping variables to group the data set by. A character vector containing the names of the grouping variables. Should be of length <= 2.
match.fd: logical value. Default is FALSE. Stands for "match formula and data". Useful only when you have a list of formulas and a list of data sets and want to apply each formula to the matching data set at the same index/position in the list. For example, formula 1 is applied to data 1, formula 2 is applied to data 2, and so on. In this case, use match.fd = TRUE.
...: Other arguments passed to the survfit.formula function.
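Conceptually, group.by subsets the data by the levels of the grouping variable and fits one set of curves per subset. A rough equivalent with split() and survfit() is sketched below. It assumes a single grouping variable and reproduces the "rx.Obs::sex" style of names seen in the grouped example; it is not survminer's implementation, and fit_by_group() and its formula_label argument are hypothetical.

```r
library(survival)

# Hypothetical helper: split `data` by one grouping column and fit
# `formula` on each subset, similar in spirit to surv_fit(..., group.by = ).
fit_by_group <- function(formula, data, group_var, formula_label) {
  # One data frame per level of the grouping variable, in level order
  subsets <- split(data, data[[group_var]])
  fits <- lapply(subsets, function(d) survfit(formula, data = d))
  # Names follow the "<variable>.<level>::<formula label>" pattern
  names(fits) <- paste0(group_var, ".", names(subsets), "::", formula_label)
  fits
}

grouped <- fit_by_group(Surv(time, status) ~ sex, colon,
                        group_var = "rx", formula_label = "sex")
names(grouped)
```

Because split() follows the factor levels of rx (Obs, Lev, Lev+5FU), the list order matches the grouped example output.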

Value

Returns an object of class survfit if one formula and one data set are provided.
Returns a named list of survfit objects when the input is a list of formulas and/or data sets. The same holds true when grouped data sets are provided or when the argument group.by is specified.
- If the names of the formula and data lists are available, the names of the resulting survfit objects are obtained by collapsing the names of the data and formula lists (for example, colon1::sex).
- If the formula names are not available, the variables in the formulas are extracted and used to build the names of the survfit objects.
- In the case of grouped data sets, the names of the survfit objects are obtained by collapsing the levels of the grouping variables and the names of the variables in the survival curve formulas (for example, rx.Obs::sex).
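The naming rules can be illustrated with outer() and paste(). This sketch only reproduces the "data::formula" names and their order observed in the Case 4 example output (the data set varies fastest within each formula), plus a label derived from an unnamed formula's right-hand side; it makes no claim about how survminer builds the names internally.

```r
# Names of the data and formula lists used in the Case 4 example
data_names    <- c("colon1", "colon2")
formula_names <- c("sex", "adhere", "rx")

# outer() builds a data-by-formula matrix of "data::formula" labels;
# as.vector() reads it column by column, so the data set varies fastest.
fit_names <- as.vector(outer(data_names, formula_names, paste, sep = "::"))
print(fit_names)

# Unnamed formula: derive a label from the right-hand-side variables.
# Joining several variables with "+" is an arbitrary choice for this sketch.
f <- Surv(time, status) ~ sex
rhs_label <- paste(all.vars(f[[3]]), collapse = "+")
print(rhs_label)
```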

Examples


library("survival")
library("survminer") # provides surv_fit(), surv_pvalue(), surv_group_by()
library("magrittr")  # provides the %>% pipe

# Case 1: One formula and One data set
#:::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
fit <- surv_fit(Surv(time, status) ~ sex,
               data = colon)
surv_pvalue(fit)
#>   variable      pval   method pval.txt
#> 1      sex 0.6107936 Log-rank p = 0.61


# Case 2: List of formulas and One data set.
#   - Different formulas are applied to the same data set
#   - Returns a (named) list of survfit objects
#:::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
# Create a named list of formulas
formulas <- list(
 sex = Surv(time, status) ~ sex,
 rx = Surv(time, status) ~ rx
)

# Fit survival curves for each formula
fit <- surv_fit(formulas, data = colon)
surv_pvalue(fit)
#> $`colon::sex`
#>   variable      pval   method pval.txt
#> 1      sex 0.6107936 Log-rank p = 0.61
#> 
#> $`colon::rx`
#>   variable         pval   method   pval.txt
#> 1       rx 4.990735e-08 Log-rank p < 0.0001
#> 

# Case 3: One formula and List of data sets
#:::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
fit <- surv_fit(Surv(time, status) ~ sex,
               data = list(colon, lung))
surv_pvalue(fit)
#> $`colon::sex`
#>   variable      pval   method pval.txt
#> 1      sex 0.6107936 Log-rank p = 0.61
#> 
#> $`lung::sex`
#>   variable        pval   method   pval.txt
#> 1      sex 0.001311165 Log-rank p = 0.0013
#> 


# Case 4: List of formulas and List of data sets
#  - Each formula is applied to each of the data in the data list
#  - argument: match.fd = FALSE
#:::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::

# Create two data sets
set.seed(123)
colon1 <- dplyr::sample_frac(colon, 1/2)
set.seed(1234)
colon2 <- dplyr::sample_frac(colon, 1/2)

# Create a named list of formulas
formula.list <- list(
 sex = Surv(time, status) ~ sex,
 adhere = Surv(time, status) ~ adhere,
 rx = Surv(time, status) ~ rx
)

# Fit survival curves
fit <- surv_fit(formula.list, data = list(colon1, colon2),
               match.fd = FALSE)
#> Warning: `combine()` was deprecated in dplyr 1.0.0.
#> ℹ Please use `vctrs::vec_c()` instead.
#> ℹ The deprecated feature was likely used in the survminer package.
#>   Please report the issue at <https://github.com/kassambara/survminer/issues>.
surv_pvalue(fit)
#> $`colon1::sex`
#>   variable      pval   method pval.txt
#> 1      sex 0.8372769 Log-rank p = 0.84
#> 
#> $`colon2::sex`
#>   variable      pval   method pval.txt
#> 1      sex 0.3901548 Log-rank p = 0.39
#> 
#> $`colon1::adhere`
#>   variable      pval   method  pval.txt
#> 1   adhere 0.0125047 Log-rank p = 0.013
#> 
#> $`colon2::adhere`
#>   variable       pval   method  pval.txt
#> 1   adhere 0.02104745 Log-rank p = 0.021
#> 
#> $`colon1::rx`
#>   variable        pval   method   pval.txt
#> 1       rx 0.001173476 Log-rank p = 0.0012
#> 
#> $`colon2::rx`
#>   variable         pval   method   pval.txt
#> 1       rx 4.449283e-05 Log-rank p < 0.0001
#> 


# Grouped survfit
#:::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
# - Group by the treatment "rx" and fit survival curves on each subset
# - Returns a list of survfit objects
fit <- surv_fit(Surv(time, status) ~ sex,
               data = colon, group.by = "rx")

# Alternatively, do this
fit <- colon %>%
 surv_group_by("rx") %>%
 surv_fit(Surv(time, status) ~ sex, data = .)

surv_pvalue(fit)
#> $`rx.Obs::sex`
#>   variable      pval   method pval.txt
#> 1      sex 0.5337304 Log-rank p = 0.53
#> 
#> $`rx.Lev::sex`
#>   variable      pval   method pval.txt
#> 1      sex 0.2928911 Log-rank p = 0.29
#> 
#> $`rx.Lev+5FU::sex`
#>   variable         pval   method    pval.txt
#> 1      sex 0.0005623961 Log-rank p = 0.00056
#>