Skip to contents

Determine the optimal cutpoint for one or multiple continuous variables at once, using the maximally selected rank statistics from the 'maxstat' R package. This is an outcome-oriented methods providing a value of a cutpoint that correspond to the most significant relation with outcome (here, survival).

  • surv_cutpoint(): Determine the optimal cutpoint for each variable using 'maxstat'.

  • surv_categorize(): Divide each variable values based on the cutpoint returned by surv_cutpoint().

Usage

surv_cutpoint(
  data,
  time = "time",
  event = "event",
  variables,
  minprop = 0.1,
  progressbar = TRUE
)

surv_categorize(x, variables = NULL, labels = c("low", "high"))

# S3 method for class 'surv_cutpoint'
summary(object, ...)

# S3 method for class 'surv_cutpoint'
print(x, ...)

# S3 method for class 'surv_cutpoint'
plot(x, variables = NULL, ggtheme = theme_classic(), bins = 30, ...)

# S3 method for class 'plot_surv_cutpoint'
print(x, ..., newpage = TRUE)

Arguments

data

a data frame containing survival information (time, event) and continuous variables (e.g.: gene expression data).

time, event

column names containing time and event data, respectively. Event values sould be 0 or 1.

variables

a character vector containing the names of variables of interest, for wich we want to estimate the optimal cutpoint.

minprop

the minimal proportion of observations per group.

progressbar

logical value. If TRUE, show progress bar. Progressbar is shown only, when the number of variables > 5.

x, object

an object of class surv_cutpoint

labels

labels for the levels of the resulting category.

...

other arguments. For plots, see ?ggpubr::ggpar

ggtheme

function, ggplot2 theme name. Default value is theme_classic. Allowed values include ggplot2 official themes. See theme().

bins

Number of bins for histogram. Defaults to 30.

newpage

open a new page. See grid.arrange.

Value

  • surv_cutpoint(): returns an object of class 'surv_cutpoint', which is a list with the following components:

    • maxstat results for each variable (see ?maxstat::maxstat)

    • cutpoint: a data frame containing the optimal cutpoint of each variable. Rows are variable names and columns are c("cutpoint", "statistic").

    • data: a data frame containing the survival data and the original data for the specified variables.

    • minprop: the minimal proportion of observations per group.

    • not_numeric: contains data for non-numeric variables, in the context where the user provided categorical variable names in the argument variables.

    Methods defined for surv_cutpoint object are summary, print and plot.
  • surv_categorize(): returns an object of class 'surv_categorize', which is a data frame containing the survival data and the categorized variables.

Author

Alboukadel Kassambara, alboukadel.kassambara@gmail.com

Examples

# 0. Load some data
data(myeloma)
head(myeloma)
#>          molecular_group chr1q21_status treatment event  time   CCND1 CRIM1
#> GSM50986      Cyclin D-1       3 copies       TT2     0 69.24  9908.4 420.9
#> GSM50988      Cyclin D-2       2 copies       TT2     0 66.43 16698.8  52.0
#> GSM50989           MMSET       2 copies       TT2     0 66.50   294.5 617.9
#> GSM50990           MMSET       3 copies       TT2     1 42.67   241.9  11.9
#> GSM50991             MAF           <NA>       TT2     0 65.00   472.6  38.8
#> GSM50992    Hyperdiploid       2 copies       TT2     0 65.20   664.1  16.9
#>          DEPDC1    IRF4   TP53   WHSC1
#> GSM50986  523.5 16156.5   10.0   261.9
#> GSM50988   21.1 16946.2 1056.9   363.8
#> GSM50989  192.9  8903.9 1762.8 10042.9
#> GSM50990  184.7 11894.7  946.8  4931.0
#> GSM50991  212.0  7563.1  361.4   165.0
#> GSM50992  341.6 16023.4 2096.3   569.2

# 1. Determine the optimal cutpoint of variables
res.cut <- surv_cutpoint(myeloma, time = "time", event = "event",
   variables = c("DEPDC1", "WHSC1", "CRIM1"))

summary(res.cut)
#>        cutpoint statistic
#> DEPDC1    279.8  4.275452
#> WHSC1    3205.6  3.361330
#> CRIM1      82.3  1.968317

# 2. Plot cutpoint for DEPDC1
# palette = "npg" (nature publishing group), see ?ggpubr::ggpar
plot(res.cut, "DEPDC1", palette = "npg")
#> $DEPDC1

#> 

# 3. Categorize variables
res.cat <- surv_categorize(res.cut)
head(res.cat)
#>           time event DEPDC1 WHSC1 CRIM1
#> GSM50986 69.24     0   high   low  high
#> GSM50988 66.43     0    low   low   low
#> GSM50989 66.50     0    low  high  high
#> GSM50990 42.67     1    low  high   low
#> GSM50991 65.00     0    low   low   low
#> GSM50992 65.20     0   high   low   low

# 4. Fit survival curves and visualize
library("survival")
fit <- survfit(Surv(time, event) ~DEPDC1, data = res.cat)
ggsurvplot(fit, data = res.cat, risk.table = TRUE, conf.int = TRUE)