
The function surv_adjustedcurves() calculates, and the function ggadjustedcurves() plots, adjusted survival curves for a coxph model. The main idea is to present expected survival curves, calculated from a Cox model, separately for subpopulations. A detailed description and discussion of adjusted curves is given in 'Adjusted Survival Curves' by Terry Therneau, Cynthia Crowson and Elizabeth Atkinson (2015), https://cran.r-project.org/web/packages/survival/vignettes/adjcurve.pdf, which covers many approaches. Currently four approaches (two unbalanced, one conditional and one marginal) are implemented in the ggadjustedcurves() function. See the Details section.

Usage

ggadjustedcurves(
  fit,
  variable = NULL,
  data = NULL,
  reference = NULL,
  method = "conditional",
  fun = NULL,
  palette = "hue",
  ylab = "Survival rate",
  size = 1,
  ggtheme = theme_survminer(),
  ...
)

surv_adjustedcurves(
  fit,
  variable = NULL,
  data = NULL,
  reference = NULL,
  method = "conditional",
  size = 1,
  ...
)

Arguments

fit

an object of class coxph.object, created with the coxph() function.

variable

a character, the name of the grouping variable to be plotted. If not supplied, it is extracted from the strata() component of the model formula. If there is no strata() component, only a single curve is plotted: the average for the whole population.

data

a dataset for predictions. If not supplied then data will be extracted from the fit object.

reference

a dataset for the reference population, to which dependent variables should be balanced. If not specified, data is used instead. Note that the reference dataset should contain all variables used in the fit object.

method

a character, describes how the expected survival curves shall be calculated. Possible options: 'single' (average for population), 'average' (averages for subpopulations), 'marginal', 'conditional' (averages for subpopulations after rebalancing). See the Details section for further information.

fun

an arbitrary function defining a transformation of the survival curve. Commonly used transformations can be specified with a character argument: "event" plots cumulative events (f(y) = 1-y), "cumhaz" plots the cumulative hazard function (f(y) = -log(y)), and "pct" plots survival probability as a percentage.

palette

the color palette to be used. Allowed values include "hue" for the default hue color scale; "grey" for grey color palettes; brewer palettes, e.g. "RdBu", "Blues", ...; a custom color palette, e.g. c("blue", "red"); and scientific journal palettes from the ggsci R package, e.g. "npg", "aaas", "lancet", "jco", "ucscgb", "uchicago", "simpsons" and "rickandmorty". See details section for more information. Can also be a numeric vector of length(groups); in this case a basic color palette is created using the function palette.

ylab

a label for the y axis.

size

the curve size.

ggtheme

function, ggplot2 theme name. Allowed values include ggplot2 official themes: see theme.

...

further arguments passed to the function ggpar for customizing the plot.
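The character shortcuts described for the fun argument correspond to simple transformations of a survival probability y. The sketch below writes them out as plain R functions for illustration only. The list name transforms is hypothetical, and reading "pct" as 100 * y is an assumption based on the description "survival probability as a percentage".

```r
# Illustrative transformations matching the descriptions of the `fun` shortcuts.
# `transforms` is a hypothetical name, not an object exported by any package.
transforms <- list(
  event  = function(y) 1 - y,    # cumulative events
  cumhaz = function(y) -log(y),  # cumulative hazard
  pct    = function(y) 100 * y   # survival in percent (assumed scaling)
)

# Apply each transformation to a few survival probabilities.
surv <- c(1, 0.8, 0.5)
transformed <- sapply(transforms, function(f) f(surv))
print(transformed)
```

Any function of the same shape (a numeric vector in, a numeric vector of equal length out) should be usable where an arbitrary function is accepted.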

Value

Returns an object of class gg.

Details

Currently four approaches are implemented in the ggadjustedcurves() function.

For method = "single" a single survival curve is calculated and plotted. The curve presents the expected survival for the whole population in data, calculated from the Cox model fit.

For method = "average" a separate survival curve is plotted for each level of the variable given in the variable argument. If this argument is not specified, it is extracted from the strata component of the fit argument. Each curve presents the expected survival for a subpopulation of data, calculated from the Cox model fit. Note that in this method subpopulations are NOT balanced.

For method = "marginal" a survival curve is plotted for each level of the grouping variable selected by the variable argument. If this argument is not specified, it is extracted from the strata component of the fit object. Subpopulations are balanced with respect to the variables in the fit formula so that their distributions stay similar to those in the reference population. If no reference population is specified, the whole data is used as the reference population. The balancing is performed in the following way: (1) for each subpopulation a logistic regression model is fitted to model the odds of being in the subpopulation against the reference population given the other variables listed in the fit object, (2) inverse probabilities of belonging to the specified subpopulation are used as weights in the Cox model, (3) the Cox model is refitted with the weights taken into account, (4) expected survival curves are calculated for each subpopulation based on the refitted weighted model.
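The weighting idea behind the marginal method can be sketched with the survival package alone. This is a simplified illustration under stated assumptions, not the code used by ggadjustedcurves(): it assumes two groups with the whole data as reference, so a single logistic model yields both membership probabilities, and the names d, in_rx2, ps_model, w and wfit are hypothetical.

```r
library(survival)

d <- bladder
d$in_rx2 <- as.integer(d$rx == 2)  # indicator of belonging to the rx = 2 group

# (1) Logistic regression for group membership given the other model covariate.
ps_model <- glm(in_rx2 ~ size, data = d, family = binomial())
p <- fitted(ps_model)              # estimated P(rx = 2 | size)

# (2) Inverse probability of belonging to the group each row is actually in.
d$w <- ifelse(d$in_rx2 == 1, 1 / p, 1 / (1 - p))

# (3) Refit the Cox model with these weights.
wfit <- coxph(Surv(stop, event) ~ size + strata(rx), data = d, weights = w)

# (4) One curve per stratum from the weighted fit (at the mean of `size`).
wcurves <- survfit(wfit)
print(wcurves)
```

With more than two groups or a separate reference dataset, one model per subpopulation against the reference would be needed, as the description above states.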

For method = "conditional" a separate survival curve is plotted for each level of the grouping variable selected by the variable argument. If this argument is not specified, it is extracted from the strata component of the fit object. Subpopulations are balanced in the following way: (1) the data is replicated as many times as there are subpopulations (say k), (2) for each row in the original data a set of k copies is created and each copy is assigned a different value of the grouping variable, which creates a new dataset balanced in terms of the grouping variable, (3) expected survival is calculated for each subpopulation based on the new artificial dataset. Here the model is not refitted.
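The replicate-and-reassign idea can be sketched with the survival package alone. This is an illustration under stated assumptions rather than the package's own implementation: rx enters the model as an ordinary covariate instead of a strata() term so that all predicted curves share one time grid, and the names lev, curves and adjusted are hypothetical.

```r
library(survival)

# Assumption: rx as a plain covariate, so survfit() returns a common time grid.
fit <- coxph(Surv(stop, event) ~ size + rx, data = bladder)
lev <- sort(unique(bladder$rx))

curves <- lapply(lev, function(g) {
  nd <- bladder
  nd$rx <- g                           # every row is assigned to group g
  sf <- survfit(fit, newdata = nd)     # one predicted curve per row of nd
  data.frame(variable = g,
             time = sf$time,
             surv = rowMeans(sf$surv)) # average the curves over all rows
})
adjusted <- do.call(rbind, curves)
head(adjusted)
```

Because every copy of the data keeps the original covariate values and only the group label changes, the averaged curves differ only through the modelled group effect. The model itself is fitted once and never refitted.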

Note that the surv_adjustedcurves() function only calculates the survival curves; its output can be used to compute summaries such as median survival.
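As a minimal sketch of deriving median survival from a computed curve, the helper below returns the first time at which the survival estimate drops to 0.5 or below. The helper median_from_curve and the toy data are hypothetical, and the long format with columns variable, time and surv is an assumption about the shape of the curve data; inspect the actual output with str() before relying on these names.

```r
# Hypothetical helper: first time at which survival is at or below 0.5.
median_from_curve <- function(time, surv) {
  ord <- order(time)
  hit <- which(surv[ord] <= 0.5)
  if (length(hit) == 0) NA_real_ else time[ord][hit[1]]
}

# Toy curve in long format; column names are assumed for illustration.
toy <- data.frame(
  variable = rep(c("a", "b"), each = 4),
  time     = rep(c(1, 2, 3, 4), times = 2),
  surv     = c(0.9, 0.6, 0.4, 0.2, 0.95, 0.8, 0.7, 0.6)
)

# One median per group; NA means the curve never reaches 0.5.
medians <- sapply(split(toy, toy$variable),
                  function(d) median_from_curve(d$time, d$surv))
print(medians)
```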

Author

Przemyslaw Biecek, przemyslaw.biecek@gmail.com

Examples


library(survival)
fit2 <- coxph( Surv(stop, event) ~ size, data = bladder )
# single curve
ggadjustedcurves(fit2, data = bladder)

curve <- surv_adjustedcurves(fit2, data = bladder)

fit2 <- coxph( Surv(stop, event) ~ size + strata(rx), data = bladder )
# average in groups
ggadjustedcurves(fit2, data = bladder, method = "average", variable = "rx")

curve <- surv_adjustedcurves(fit2, data = bladder, method = "average", variable = "rx")

# marginal balancing in groups
ggadjustedcurves(fit2, data = bladder, method = "marginal", variable = "rx")

curve <- surv_adjustedcurves(fit2, data = bladder, method = "marginal", variable = "rx")

# selected reference population
ggadjustedcurves(fit2, data = bladder, method = "marginal", variable = "rx",
    reference = bladder[bladder$rx == "1",])


# conditional balancing in groups
ggadjustedcurves(fit2, data = bladder, method = "conditional", variable = "rx")

curve <- surv_adjustedcurves(fit2, data = bladder, method = "conditional", variable = "rx")

if (FALSE) { # \dontrun{
# this will take some time
fdata <- flchain[flchain$futime >=7,]
fdata$age2 <- cut(fdata$age, c(0,54, 59,64, 69,74,79, 89, 110),
                  labels = c(paste(c(50,55,60,65,70,75,80),
                                   c(54,59,64,69,74,79,89), sep='-'), "90+"))
fdata$group <- factor(1+ 1*(fdata$flc.grp >7) + 1*(fdata$flc.grp >9),
                      levels=1:3,
                      labels=c("FLC < 3.38", "3.38 - 4.71", "FLC > 4.71"))
# single curve
fit <- coxph( Surv(futime, death) ~ age*sex, data = fdata)
ggadjustedcurves(fit, data = fdata, method = "single")

# average in groups
fit <- coxph( Surv(futime, death) ~ age*sex + strata(group), data = fdata)
ggadjustedcurves(fit, data = fdata, method = "average")

# conditional balancing in groups
ggadjustedcurves(fit, data = fdata, method = "conditional")

# marginal balancing in groups
ggadjustedcurves(fit, data = fdata, method = "marginal", reference = fdata)
} # }