
Creating good looking survival curves – the ‘ggsurv’ function

This is a guest post by Edwin Thoen

Currently I am working on my master’s thesis on multi-state models. Survival analysis was my favourite course in the master’s program, partly because of the great survival package, which is maintained by Terry Therneau. The only thing I am not so keen on is the default plots created by this package with plot.survfit. Although the plots are very easy to produce, they are not that attractive (like most R default plots), and legends have to be added manually. I come across them all the time in the literature and wondered whether there was a better way to display survival.

Since I was getting to grips with ggplot2 recently, I decided to write my own function, with the same functionality as plot.survfit but with a much better looking result. I stuck to the defaults of plot.survfit as much as possible, for instance by plotting confidence intervals by default for single-stratum survival curves but not for multi-stratum curves. The code of the ggsurv function is at the end of this post. Just like plot.survfit, it only requires a fitted survival object to produce a default plot. First we load the function into the console.

Once the function is loaded, we can get going. We use the lung data set from the survival package for illustration.

library(survival)
data(lung)
lung.surv <- survfit(Surv(time,status) ~ 1, data = lung)
ggsurv(lung.surv)

[Figure: survival curve for the lung data produced by ggsurv(lung.surv)]

Censored observations are denoted by red crosses; by default a confidence interval is plotted and the axes are labeled. Everything can be easily adjusted by setting the function parameters. Now let’s look at differences in survival between men and women by creating a multi-stratum survival curve.

lung.surv2 <- survfit(Surv(time,status) ~ sex, data = lung)
(pl2 <- ggsurv(lung.surv2))

[Figure: survival curves by sex produced by ggsurv(lung.surv2)]

By default the multi-stratum curves get different colors, the standard ggplot colors. You can set them to your favourite colors of course. As always with ggplots, a legend is created by default. However, the levels of the variable sex are called 1 and 2, which is not very informative. Fortunately, the output of ggsurv is just an ordinary ggplot object, so it can still be modified by adding layers after calling the function.

(pl2 <- pl2 + guides(linetype = 'none') +
 scale_colour_discrete(name = 'Sex', breaks = c(1,2), labels=c('Male', 'Female')))

[Figure: the same plot with a relabelled 'Sex' legend]

That’s better. Note that the function had also created a legend for linetype, which was uninformative in this case because the linetypes are the same. We removed the legend for linetype before adjusting the one for color.
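
An alternative to relabelling the legend afterwards is to give sex informative labels before fitting, so that the strata names already read Male and Female. The sketch below assumes the coding 1 = male, 2 = female used in the legend above; lung2 and lung.surv3 are names introduced here for illustration.

```r
library(survival)

# Recode sex as a labelled factor (assumed coding: 1 = male, 2 = female).
lung2 <- transform(survival::lung,
                   sex = factor(sex, levels = c(1, 2),
                                labels = c('Male', 'Female')))

lung.surv3 <- survfit(Surv(time, status) ~ sex, data = lung2)

# The strata names now carry the labels instead of 1 and 2.
names(lung.surv3$strata)
```

Passing lung.surv3 to ggsurv should then give a legend with these labels directly, although the order of the legend entries may differ because the labels are sorted as factor levels inside the function.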

Finally we can also adjust the plot itself. Maybe the oncologist is very interested in the median survival of men and women. Let’s help her by showing this on the plot.

lung.surv2
med.surv <- data.frame(time = c(270,270, 426,426), quant = c(.5,0,.5,0),
                       sex = c('M', 'M', 'F', 'F'))
pl2 + geom_line(data = med.surv, aes(time, quant, group = sex), 
      col = 'darkblue', linetype = 3) +
      geom_point(data = med.surv, aes(time, quant, group =sex), col = 'darkblue')

[Figure: the plot with median survival marked for each sex]
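
The medians 270 and 426 in med.surv appear to have been read off the printed lung.surv2 object. As a small sketch, they can also be extracted programmatically, which avoids typing them by hand; med and med.surv2 are names introduced here for illustration.

```r
library(survival)

lung.surv2 <- survfit(Surv(time, status) ~ sex, data = survival::lung)

# summary()$table holds one row per stratum; the 'median' column is the
# median survival time of that stratum.
med <- summary(lung.surv2)$table[, 'median']

# Two points per stratum: one on the curve (quant = 0.5) and one on the
# x-axis (quant = 0), so that a vertical line can be drawn between them.
med.surv2 <- data.frame(time  = rep(unname(med), each = 2),
                        quant = rep(c(.5, 0), times = length(med)),
                        sex   = rep(names(med), each = 2))
med.surv2
```

med.surv2 has the same columns as med.surv, so it can be used in the geom_line and geom_point calls above in its place.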

I hope survival researchers will make the effort to produce better looking plots after reading this post, although copying and pasting the code won’t be too much of an effort, I guess.

 
ggsurv <- function(s, CI = 'def', plot.cens = T, surv.col = 'gg.def',
                   cens.col = 'red', lty.est = 1, lty.ci = 2,
                   cens.shape = 3, back.white = F, xlab = 'Time',
                   ylab = 'Survival', main = ''){
 
  library(ggplot2)  
  strata <- ifelse(is.null(s$strata) ==T, 1, length(s$strata))
  stopifnot(length(surv.col) == 1 | length(surv.col) == strata)
  stopifnot(length(lty.est) == 1 | length(lty.est) == strata)
 
  ggsurv.s <- function(s, CI = 'def', plot.cens = T, surv.col = 'gg.def',
                       cens.col = 'red', lty.est = 1, lty.ci = 2,
                       cens.shape = 3, back.white = F, xlab = 'Time',
                       ylab = 'Survival', main = ''){
 
    dat <- data.frame(time = c(0, s$time),
                      surv = c(1, s$surv),
                      up = c(1, s$upper),
                      low = c(1, s$lower),
                      cens = c(0, s$n.censor))
    dat.cens <- subset(dat, cens != 0)
 
    col <- ifelse(surv.col == 'gg.def', 'black', surv.col)
 
    pl <- ggplot(dat, aes(x = time, y = surv)) + 
      xlab(xlab) + ylab(ylab) + ggtitle(main) + 
      geom_step(col = col, lty = lty.est)
 
    pl <- if(CI == T | CI == 'def') {
      pl + geom_step(aes(y = up), color = col, lty = lty.ci) +
        geom_step(aes(y = low), color = col, lty = lty.ci)
    } else (pl)
 
    # nrow() counts censored observations; length() of a data frame counts columns
    pl <- if(plot.cens == T & nrow(dat.cens) > 0){
      pl + geom_point(data = dat.cens, aes(y = surv), shape = cens.shape,
                       col = cens.col)
    } else if (plot.cens == T & nrow(dat.cens) == 0){
      stop ('There are no censored observations') 
    } else(pl)
 
    pl <- if(back.white == T) {pl + theme_bw()
    } else (pl)
    pl
  }
 
  ggsurv.m <- function(s, CI = 'def', plot.cens = T, surv.col = 'gg.def',
                       cens.col = 'red', lty.est = 1, lty.ci = 2,
                       cens.shape = 3, back.white = F, xlab = 'Time',
                       ylab = 'Survival', main = '') {
    n <- s$strata
 
    groups <- factor(unlist(strsplit(names(s$strata), '='))[seq(2, 2*strata, by = 2)])
    gr.name <-  unlist(strsplit(names(s$strata), '='))[1]
    gr.df <- vector('list', strata)
    ind <- vector('list', strata)
    n.ind <- c(0,n); n.ind <- cumsum(n.ind)
    for(i in 1:strata) ind[[i]] <- (n.ind[i]+1):n.ind[i+1]
 
    for(i in 1:strata){
      gr.df[[i]] <- data.frame(
        time = c(0, s$time[ ind[[i]] ]),
        surv = c(1, s$surv[ ind[[i]] ]),
        up = c(1, s$upper[ ind[[i]] ]), 
        low = c(1, s$lower[ ind[[i]] ]),
        cens = c(0, s$n.censor[ ind[[i]] ]),
        group = rep(groups[i], n[i] + 1)) 
    }
 
    dat <- do.call(rbind, gr.df)
    dat.cens <- subset(dat, cens != 0)
 
    pl <- ggplot(dat, aes(x = time, y = surv, group = group)) + 
      xlab(xlab) + ylab(ylab) + ggtitle(main) + 
      geom_step(aes(col = group, lty = group))
 
    col <- if(length(surv.col) == 1){
      scale_colour_manual(name = gr.name, values = rep(surv.col, strata))
    } else{
      scale_colour_manual(name = gr.name, values = surv.col)
    }
 
    pl <- if(surv.col[1] != 'gg.def'){
      pl + col
    } else {pl + scale_colour_discrete(name = gr.name)}
 
    line <- if(length(lty.est) == 1){
      scale_linetype_manual(name = gr.name, values = rep(lty.est, strata))
    } else {scale_linetype_manual(name = gr.name, values = lty.est)}
 
    pl <- pl + line
 
    pl <- if(CI == T) {
      if(length(surv.col) > 1 && length(lty.est) > 1){
        stop('Either surv.col or lty.est should be of length 1 in order
             to plot 95% CI with multiple strata')
      }else if((length(surv.col) > 1 | surv.col == 'gg.def')[1]){
        pl + geom_step(aes(y = up, color = group), lty = lty.ci) +
          geom_step(aes(y = low, color = group), lty = lty.ci)
      } else{pl +  geom_step(aes(y = up, lty = group), col = surv.col) +
               geom_step(aes(y = low,lty = group), col = surv.col)}   
    } else {pl}
 
 
    # nrow() counts censored observations; length() of a data frame counts columns
    pl <- if(plot.cens == T & nrow(dat.cens) > 0){
      pl + geom_point(data = dat.cens, aes(y = surv), shape = cens.shape,
                      col = cens.col)
    } else if (plot.cens == T & nrow(dat.cens) == 0){
      stop ('There are no censored observations') 
    } else(pl)
 
    pl <- if(back.white == T) {pl + theme_bw()
    } else (pl) 
    pl
  } 
  pl <- if(strata == 1) {ggsurv.s(s, CI , plot.cens, surv.col ,
                                  cens.col, lty.est, lty.ci,
                                  cens.shape, back.white, xlab,
                                  ylab, main) 
  } else {ggsurv.m(s, CI, plot.cens, surv.col ,
                   cens.col, lty.est, lty.ci,
                   cens.shape, back.white, xlab,
                   ylab, main)}
  pl
}
  • majom

    Great post. Is it also possible to use ggsurv if I use glm or glmer to estimate a (multi-level) discrete-time hazard model (see http://www.ats.ucla.edu/stat/r/examples/alda/ch12.htm)?

    • Edwin Thoen

      The function only works on an object of class survfit. If you want it to work on a different object, you will have to tweak the code a bit. Note that the first part of the function creates data frames that are fed to the ggplot code below it. If you can turn your fit into a data frame of the same form, you can readily use the code that produces the plots. Good luck!
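
To make the reply above concrete, here is a minimal sketch of turning discrete-time hazards into a data frame with the same time and surv columns. The hazards are made-up numbers standing in for predictions from a glm or glmer fit; haz, dat and pl are names introduced here, and the plotting code is a simplified stand-in for the single-stratum part of ggsurv, not the function itself.

```r
library(ggplot2)

# Hypothetical fitted hazards for periods 1 to 5
# (in practice these might come from predict() on a glm fit).
haz <- c(0.10, 0.15, 0.20, 0.15, 0.10)

# Discrete-time survival: S(t) is the product of (1 - hazard) up to period t.
# A starting row (time 0, survival 1) is added, as ggsurv does.
dat <- data.frame(time = c(0, seq_along(haz)),
                  surv = c(1, cumprod(1 - haz)))

pl <- ggplot(dat, aes(x = time, y = surv)) +
  geom_step() +
  xlab('Time') + ylab('Survival')
pl
```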

  • Tim Churches

    See also slide 68 onwards in http://timchurches.github.io/ggplot2er/ which illustrates the use of a similar approach by Ramon Saccilotto of the Basel Institute for Clinical Epidemiology and Biostatistics (links to Ramon’s work are in the presentation).

    • Edwin Thoen

      Didn’t see that one before, thanks. Decided to write the function because I couldn’t find any function or code. Nice slides by the way, Lung data set is popular!

      • Fr.

        Either your function or Ramon’s (no idea which one works best) should get submitted to the GGally package, or perhaps even to the autoplot package.

        • Edwin Thoen

          Thanks for the suggestion, I will look into the options.

          • Fr.

            No worries. I’ll suggest little things if you submit it to GGally, like vectorizing the strata loops or leaving out the xlab and ylab arguments to encourage the use of the labs() function in ggplot2.
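
As a rough sketch of what vectorizing the strata loops could look like: s$strata is a named vector holding the number of time points per stratum, so rep() can label every row in one go. This is one possible approach and not necessarily what ended up in GGally; fit, group, start and dat are names introduced here.

```r
library(survival)

fit <- survfit(Surv(time, status) ~ sex, data = survival::lung)

# Strip the 'sex=' prefix and repeat each label once per time point.
group <- rep(sub('^.*=', '', names(fit$strata)), times = fit$strata)

dat <- data.frame(time = fit$time, surv = fit$surv,
                  up = fit$upper, low = fit$lower,
                  cens = fit$n.censor, group = factor(group))

# One starting row (time 0, survival 1) per stratum, as in the loop version.
start <- data.frame(time = 0, surv = 1, up = 1, low = 1, cens = 0,
                    group = factor(levels(dat$group)))
dat <- rbind(start, dat)
dat <- dat[order(dat$group, dat$time), ]
head(dat)
```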

  • andrew beckerman

    While these are a nice example of using ggplot syntax, you might want to also look at the features associated with survplot() in the rms package by Frank Harrell – all based in base graphics. The confidence bands are worth their weight in gold.

  • Sandy

    The survival rate in my data set is 20%. Following your syntax, the range of the y-axis was constrained between 0.75 and 1.00. What if I want the y-axis to range from 0 to 1? Thank you!

    • Edwin Thoen

      You can just use the ggplot2 function ylim to adjust the y-axis

      ggsurv(my.survfit) + ylim(0, 1)

  • le_hk

    Although non-R, there is also a Windows-based tool to draw survival time plots:
    http://www.plosone.org/article/info:doi/10.1371/journal.pone.0038960

    Holger

  • Bob McDonald

    Hoping to solicit some help re: two issues with ggsurv. My plot contains two strata similar to the second example where you stratify by sex.

    1: How can you redefine the linetype of the KM curves using ggsurv? I want to make them dashed (for example). Using linetype to modify these curves does not seem to work.

    2: I can overlay a smoothed line on top of the KM step-plot by adding: + geom_line() to the code. However, the colors do not correspond to the colors of the step curve. How do I force ggplot to make them the same color as their respective step curve?

    Thanks for your help,

    Bob

    • Edwin Thoen

      Hi Bob,

      Thanks for your questions.

      1) There is a built in option for the line type of the estimates

      ggsurv(s, lty.est = 2)

      would give a dashed line for the survival curves.

      2) Within the ggsurv function the different strata levels are stored in the variable called group. This variable is used to create the different colors for the strata. You can use this variable also if you want to make additions to the plot. (I would suggest to use the smooth geom instead of the line geom, if you want to add a smoother to the plot)

      survPlot <- ggsurv(s)
      survPlot + geom_line(aes(color = group))
      survPlot + geom_smooth(aes(color = group), se = F)

      Good luck,
      Edwin

  • Jin

    Can you help me with adjusting the size of curves?
    Also, how can I set the colors of multi-stratum curves?

    Thank you for your help.

    Jin.

    • Edwin Thoen

      Hi Jin,

      You can specify the colors with the surv.col argument. Just enter a vector of color names with the same length as the number of strata, for example c("red", "green"). There is no argument to specify line width, but you are free to add that to the function in the code above of course!

      Good luck,
      Edwin

      • Jin

        Thank you very much for your help.

  • Roman

    Thanks for sharing this general solution for plotting survival curves with multiple strata. I just want to suggest a couple things about the code. It seems cleaner to separate the helper functions ggsurv.s and ggsurv.m instead of redefining them every time ggsurv is called; this can be done cleanly without relying on the currently shared variable “strata” in the parent function (e.g., just define it again in ggsurv.m where it’s used). It also seems cleaner to use a consistent type for function arguments (e.g., “CI” could be logical or character) to simplify checks (e.g., “if(CI)”). It’s completely unnecessary to compare logical variables with truth (e.g., “if(CI)” is simpler and clearer than “if(CI == TRUE)”). It seems you are mixing curly braces and parentheses in your if-then statements. I think it’s easier to read assignments inside if-then statements instead of assigning the result of the entire if-then statement to an object. It’s simpler and less error-prone to rely on data.frame in-built mechanism to repeat an atomic value the appropriate number of times (e.g., “data.frame(x=1:10, group=groups[i])” instead of “data.frame(x=1:10, group=rep(groups[i], n[i]+1))”). Using “with” avoids repeatedly accessing various slots from within the same object (e.g., “s$time”, “s$surv”, “s$upper”, etc.). It seems preferable to write out “TRUE” and “FALSE” instead of relying on “T” and “F” which can be overridden in some environments. And I think you mean “black.white” instead of “back.white”. Again, thanks for sharing your code. I intend to use it following a bit of clean up.
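
Two of the points above are easy to check in a few lines of base R: data.frame() recycles a single value to the length of the other columns, and T is an ordinary variable that can be reassigned while TRUE cannot. The object names are made up for this illustration.

```r
# A length-one value is recycled to match the other columns,
# so no explicit rep() is needed.
df <- data.frame(x = 1:10, group = 'a')
nrow(df)            # 10 rows, all with group 'a'

# T is just a variable bound to TRUE by default and can be overwritten ...
f <- function(flag = T) isTRUE(flag)
f()                 # TRUE
T <- 0
f()                 # FALSE, because the default T now evaluates to 0
rm(T)               # remove the override again

# ... whereas TRUE is a reserved word: `TRUE <- 0` is an error.
```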

    • thekalaban

      Hello! Can you share your cleaned up code with us? Thank you!

  • Edwin Thoen

    Thanks for your suggestions Roman, I certainly don’t regard myself as a professional R programmer, so all suggested improvements to my coding are more than welcome! Currently I am working on getting the function to the GGally package and adjustments to the original code already have been made. I will keep your comments in mind when further improving the code. Thanks again!

  • thekalaban

    How do I make the axes’ values more discrete?
    I have my y-axis only display 1.0, 0.9, 0.8 and I would like it to be more specific ie. 1.0, 0.95, 0.9, 0.85, 0.8 at least.

    • Edwin Thoen

      I am sorry I didn’t notice your comment earlier. The plots produced by ggsurv() are just ggplot objects, so you can apply the ggplot function scale_y_continuous() in this case. Assuming that you have saved your plot in an object called p:

      p + scale_y_continuous(breaks = seq(0, 1, by = .05))

      Good luck,
      Edwin

  • Bryant

    Is there a way to add shading to each of the strata’s confidence intervals? Perhaps, taking it a bit further, a different shade could be applied to the regions of overlap, i.e. if one stratum is blue and another red, the overlap region could be purple. Just some thoughts.

    • Edwin Thoen

      Thanks for the suggestion, I would imagine we would use the “ribbon” geom for this. Unfortunately I have no time to look into it right now but you are absolutely welcome to tweak the code to do this.
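
A minimal sketch of the ribbon idea, written independently of ggsurv: the confidence limits are expanded into step-shaped coordinates by hand (geom_ribbon connects points with straight lines), and a semi-transparent fill lets overlapping bands blend into a mixed shade. The helper to_steps and the objects fit, dat, steps and pl are names introduced here; this is one possible approach, not part of the function above.

```r
library(survival)
library(ggplot2)

fit <- survfit(Surv(time, status) ~ sex, data = survival::lung)

dat <- data.frame(time  = fit$time,
                  surv  = fit$surv,
                  low   = fit$lower,
                  up    = fit$upper,
                  group = rep(names(fit$strata), times = fit$strata))

# Turn (time, value) pairs into the corner points of a step function:
# each value is held until the next time point.
to_steps <- function(d) {
  n <- nrow(d)
  t_idx <- rep(seq_len(n), each = 2)[-1]        # 1, 2, 2, 3, 3, ..., n, n
  y_idx <- rep(seq_len(n), each = 2)[-(2 * n)]  # 1, 1, 2, 2, ..., n
  data.frame(time = d$time[t_idx], low = d$low[y_idx],
             up = d$up[y_idx], group = d$group[y_idx])
}
steps <- do.call(rbind, lapply(split(dat, dat$group), to_steps))

# With alpha < 1 the blue and red bands mix where they overlap.
pl <- ggplot(dat, aes(x = time, group = group)) +
  geom_ribbon(data = steps, aes(ymin = low, ymax = up, fill = group),
              alpha = 0.3) +
  geom_step(aes(y = surv, colour = group)) +
  scale_fill_manual(values = c('blue', 'red')) +
  scale_colour_manual(values = c('blue', 'red')) +
  xlab('Time') + ylab('Survival')
pl
```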

  • Sophia

    Hi, quick question: why do you have red dots on your male line (which is green)?

    • Edwin Thoen

      The default color for the censored observations is red, irrespective of the color of the line. This can be adjusted by entering the color of your liking at cens.col.

  • Edwin Thoen

    I am happy to inform you all that the ggsurv function is now available in the GGally package.

  • dscience

    I installed GGally but ggsurv is still not available. Any reason?

    • Edwin Thoen

      Thanks for letting me know. I just contacted the package admin Barret; something went wrong with the publication of the latest package version. It will be back in the next version, which will be released soon. Sorry for this…