Introduction to the R package rsurv

In this document we show how to use the rsurv package to simulate different types of right-censored survival data.

Type I right-censored survival data

In the code below we show how to simulate a sample of Type I right-censored survival data, assuming that the failure times are generated from an accelerated failure time model with loglogistic baseline distribution. In addition, assume that we wish to consider two exploratory variables, say age and sex, and we want to include an interaction effect between them. Such a task can be easily accomplished by using the function raftreg() along with the function qllogis() available in the package flexsurv.

library(rsurv)
library(dplyr)
library(survstan)

set.seed(1234567890)

n <-  1000
tau <- 10  # maximum follow up time
simdata <- data.frame(
  age = rnorm(n),
  sex = sample(c("f", "m"), size = n, replace = TRUE)
) %>%
  mutate(
    t = raftreg(runif(n), ~ age*sex, beta = c(1, 2, -0.5), 
                dist = "llogis", shape = 1.5, scale = 1, package = "flexsurv"),
  ) %>%
  rowwise() %>%
  mutate(
    time = min(t, tau),
    status = as.numeric(time == t)
  ) 

# visualizing the simulated data:
glimpse(simdata)
#> Rows: 1,000
#> Columns: 5
#> Rowwise: 
#> $ age    <dbl> 1.34592454, 0.99527131, 0.54622688, -1.91272392, 1.92128431, 1.…
#> $ sex    <chr> "m", "f", "f", "m", "m", "m", "m", "f", "f", "m", "f", "m", "m"…
#> $ t      <dbl> 15.2363453, 1.5259533, 2.1783746, 2.4354995, 58.7932958, 16.714…
#> $ time   <dbl> 10.0000000, 1.5259533, 2.1783746, 2.4354995, 10.0000000, 10.000…
#> $ status <dbl> 0, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, …

fit <- aftreg(
  Surv(time, status) ~ age*sex,
  data = simdata, dist = "loglogistic"
)
estimates(fit)
#>        age       sexm   age:sexm      alpha      gamma 
#>  0.9494630  2.0094422 -0.4812641  1.4946497  1.0226847

Type II right-censored survival data

Suppose one wishes to simulate from accelerated failure test. In the example presented below we consider n = 20 test units, with a pre-specified number of m = 15 failures (Type II right-censoring). A lognormal distributions with parameters μ = −1 and σ = 1 is assumed for the baseline distribution. Temperature is considered as the acceleration factor. Specifically, we consider 4 levels of temperature (30, 60, 90, 120 oC), and the following linear predictor is: xβ = −0.05 × temperature.

library(rsurv)
library(dplyr)

set.seed(1234567890)
n <- 20 # number of test units
m <- 15 # number of failures

#simulating the failure times:
simdata <- data.frame(
  temperature = rep(c(30, 60, 90, 120), each = 5)    
) |>                                                                   
  mutate(                                                            
    t = raftreg(runif(n), ~ temperature, beta = -0.05, 
                dist = "lnorm", meanlog = -1, sdlog = 1)                                          
  ) |>
  arrange(t) # sorting the generated data

# obtaining 15-th failure time:
tau <- simdata$t[m]

# including type II censoring:
simdata <- simdata |>
  rowwise() |>
  mutate(
    time = min(t, tau),
    status = as.numeric(time == t)
  )
simdata
#> # A tibble: 20 × 4
#> # Rowwise: 
#>    temperature        t     time status
#>          <dbl>    <dbl>    <dbl>  <dbl>
#>  1         120 0.000314 0.000314      1
#>  2         120 0.00101  0.00101       1
#>  3          90 0.00104  0.00104       1
#>  4         120 0.00110  0.00110       1
#>  5         120 0.00169  0.00169       1
#>  6          60 0.00195  0.00195       1
#>  7          90 0.00214  0.00214       1
#>  8          90 0.00253  0.00253       1
#>  9          60 0.00268  0.00268       1
#> 10          60 0.00367  0.00367       1
#> 11         120 0.00503  0.00503       1
#> 12          60 0.00913  0.00913       1
#> 13          90 0.0125   0.0125        1
#> 14          90 0.0143   0.0143        1
#> 15          30 0.0214   0.0214        1
#> 16          30 0.0303   0.0214        0
#> 17          30 0.0422   0.0214        0
#> 18          30 0.0475   0.0214        0
#> 19          30 0.110    0.0214        0
#> 20          60 0.124    0.0214        0

Type III (random censoring) right-censored survival data

library(rsurv)
library(dplyr)

set.seed(1234567890)
n <- 20
covariates <- data.frame(
  stage = sample(c("I", "II", "III", "IV"), size = n, replace = TRUE)
) 

simdata1 <- covariates |>
  mutate(
    time = rphreg(runif(n), ~stage, beta = c(0.2, 0.35, 1.2), dist = "weibull", shape = 2.3, scale = 1)
  )
glimpse(simdata1)
#> Rows: 20
#> Columns: 2
#> $ stage <chr> "I", "III", "IV", "II", "II", "I", "IV", "III", "IV", "III", "II…
#> $ time  <dbl> 1.4704064, 0.5454478, 0.6579358, 0.7999922, 0.8717459, 0.6200115…

att <- attributes(simdata1$time)
str(att)
#> List of 3
#>  $ call        : language rphreg(u = runif(n), formula = ~stage, beta = c(0.2, 0.35, 1.2), dist = "weibull",      shape = 2.3, scale = 1)
#>  $ model.matrix: num [1:20, 1:3] 0 0 0 1 1 0 0 0 0 0 ...
#>   ..- attr(*, "dimnames")=List of 2
#>   .. ..$ : chr [1:20] "1" "2" "3" "4" ...
#>   .. ..$ : chr [1:3] "stageII" "stageIII" "stageIV"
#>  $ beta        : num [1:3] 0.2 0.35 1.2
unique(att$model.matrix)
#>   stageII stageIII stageIV
#> 1       0        0       0
#> 2       0        1       0
#> 3       0        0       1
#> 4       1        0       0