Package 'rsurv'

Title: Random Generation of Survival Data
Description: Random generation of survival data from a wide range of regression models, including accelerated failure time (AFT), proportional hazards (PH), proportional odds (PO), accelerated hazard (AH), Yang and Prentice (YP), and extended hazard (EH) models. The package 'rsurv' also stands out by its ability to generate survival data from an unlimited number of baseline distributions provided that an implementation of the quantile function of the chosen baseline distribution is available in R. Another nice feature of the package 'rsurv' lies in the fact that linear predictors are specified via a formula-based approach, facilitating the inclusion of categorical variables and interaction terms. The functions implemented in the package 'rsurv' can also be employed to simulate survival data with more complex structures, such as survival data with different types of censoring mechanisms, left-, right-, and double-truncated survival data, survival data with cure fraction, survival data with random effects (frailties), multivariate survival data, and competing risks survival data. Details about the R package 'rsurv' can be found in Demarqui (2024) <doi:10.48550/arXiv.2406.01750>.
Authors: Fabio Demarqui [aut, cre, cph]
Maintainer: Fabio Demarqui <[email protected]>
License: GPL (>= 3)
Version: 0.0.3
Built: 2026-05-10 08:54:53 UTC
Source: https://github.com/fndemarqui/rsurv

Help Index


The 'rsurv' package

Description

Random generation of survival data from survival regression models available in the literature, including the accelerated failure time (AFT), proportional hazards (PH), proportional odds (PO), accelerated hazard (AH), Yang and Prentice (YP), and extended hazard (EH) models.

References

Demarqui FN, Mayrink VD (2021). “Yang and Prentice model with piecewise exponential baseline distribution for modeling lifetime data with crossing survival curves.” Brazilian Journal of Probability and Statistics, 35(1), 172-186. doi:10.1214/20-BJPS471.

Yang S, Prentice RL (2005). “Semiparametric analysis of short-term and long-term hazard ratios with two-sample survival data.” Biometrika, 92(1), 1-17.


Implemented link functions for the mixture cure rate model

Description

This function is used to specify different link functions for the count component of the mixture cure rate model.

Usage

bernoulli(link = "logit")

Arguments

link

desired link function; currently implemented links are: logit, probit, cloglog and cauchy.

Value

A list containing the codes associated with the count distribution assumed for the latent variable N and the chosen link.
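
For orientation, each of the four links listed above maps a linear predictor to a probability. The sketch below uses only base R's stats::make.link() and does not call rsurv. It assumes that the link called cauchy here is the one base R calls "cauchit", which the text above does not state.

```r
# Base-R sketch (not rsurv code): inverse links mapping a linear predictor
# eta to a probability in (0, 1).
eta <- c(-2, 0, 1.5)

# Assumption: rsurv's "cauchy" link corresponds to base R's "cauchit".
links <- c(logit = "logit", probit = "probit",
           cloglog = "cloglog", cauchy = "cauchit")

# One row per link, one column per value of eta.
probs <- t(sapply(links, function(l) stats::make.link(l)$linkinv(eta)))
colnames(probs) <- paste0("eta=", eta)
print(round(probs, 4))
```

The symmetric links (logit, probit, cauchy) give probability 0.5 at eta = 0, whereas cloglog gives 1 - exp(-1).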


The Birnbaum-Saunders (Fatigue Life) Distribution

Description

Density, distribution function, quantile function and random generation for the Birnbaum-Saunders distribution with shape parameter alpha, scale parameter gamma, and location parameter mu.

Usage

dfatigue(x, alpha, gamma, mu = 0, log = FALSE)

pfatigue(q, alpha, gamma, mu = 0, lower.tail = TRUE, log.p = FALSE)

qfatigue(p, alpha, gamma, mu = 0, lower.tail = TRUE, log.p = FALSE)

rfatigue(n, alpha, gamma, mu = 0)

Arguments

x, q

vector of quantiles.

alpha

shape parameter. Must be positive.

gamma

scale parameter. Must be positive.

mu

location parameter (default is 0).

log, log.p

logical; if TRUE, probabilities/densities p are returned as log(p).

lower.tail

logical; if TRUE (default), probabilities are $P[X \le x]$; otherwise, $P[X > x]$.

p

vector of probabilities.

n

number of observations. If length(n) > 1, the length is taken to be the number required.

Details

The Birnbaum-Saunders distribution, also known as the fatigue life distribution, is commonly used in reliability and survival analysis. It was originally proposed to model fatigue failure times of materials subjected to cyclic stress.

The probability density function is given by:

$$f(x) = \frac{\sqrt{(x-\mu)/\gamma} + \sqrt{\gamma/(x-\mu)}}{2\alpha(x-\mu)} \phi\left(\frac{\sqrt{(x-\mu)/\gamma} - \sqrt{\gamma/(x-\mu)}}{\alpha}\right)$$

for $x > \mu$, where $\phi$ is the standard normal density function.
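
As a sanity check on the density formula, the following base-R sketch codes it directly and integrates it numerically. It is not the package's implementation, and bs_density() is a hypothetical helper name used only for illustration.

```r
# Base-R sketch of the displayed density (not the package's implementation).
# bs_density() is a hypothetical helper name.
bs_density <- function(x, alpha, gamma, mu = 0) {
  z <- x - mu
  out <- numeric(length(z))          # the density is 0 for x <= mu
  ok <- z > 0
  a <- sqrt(z[ok] / gamma)
  b <- sqrt(gamma / z[ok])
  # Work on the log scale to avoid Inf * 0 when z is very close to 0.
  log_f <- log(a + b) - log(2 * alpha * z[ok]) +
    stats::dnorm((a - b) / alpha, log = TRUE)
  out[ok] <- exp(log_f)
  out
}

# The density should integrate to 1 over (mu, Inf).
total <- stats::integrate(bs_density, lower = 0, upper = Inf,
                          alpha = 0.5, gamma = 1)$value

# Half of the mass lies below mu + gamma, so mu + gamma is the median.
below <- stats::integrate(bs_density, lower = 0, upper = 1,
                          alpha = 0.5, gamma = 1)$value
c(total = total, below_median = below)
```

The median follows from the formula itself: at x = mu + gamma the argument of the normal density is zero, and the corresponding distribution function is the standard normal distribution function evaluated at that same argument.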

Value

dfatigue gives the density, pfatigue gives the distribution function, qfatigue gives the quantile function, and rfatigue generates random deviates.

The length of the result is determined by n for rfatigue, and is the maximum of the lengths of the numerical arguments for the other functions.

References

Birnbaum, Z. W., & Saunders, S. C. (1969). A new family of life distributions. Journal of Applied Probability, 6(2), 319-327.

Examples

# Density at x = 2 with alpha = 0.5 and gamma = 1
dfatigue(2, alpha = 0.5, gamma = 1)

# CDF at x = 2
pfatigue(2, alpha = 0.5, gamma = 1)

# Quantile for p = 0.5 (median)
qfatigue(0.5, alpha = 0.5, gamma = 1)

# Generate 10 random values
rfatigue(10, alpha = 0.5, gamma = 1)

Inverse of the probability generating function

Description

This function evaluates the inverse of the probability generating function of the count component (incidence sub-model) of a cure rate model.

Usage

inv_pgf(formula, incidence = "bernoulli", kappa = NULL, zeta = NULL, data, ...)

Arguments

formula

formula specifying the linear predictor for the incidence sub-model.

incidence

the desired incidence model.

kappa

vector of regression coefficients associated with the incidence sub-model.

zeta

extra negative-binomial parameter.

data

a data.frame containing the explanatory covariates passed to the formula.

...

further arguments passed to other methods.

Value

A vector with the values of the inverse of the desired probability generating function.


The Log-Logistic Distribution

Description

Density, distribution function, quantile function and random generation for the log-logistic distribution with shape and scale parameters.

Usage

dloglogistic(x, shape, scale, log = FALSE)

ploglogistic(q, shape, scale, lower.tail = TRUE, log.p = FALSE)

qloglogistic(p, shape, scale, lower.tail = TRUE, log.p = FALSE)

rloglogistic(n, shape, scale)

Arguments

x, q

vector of quantiles.

shape

shape parameter. Must be positive.

scale

scale parameter. Must be positive.

log, log.p

logical; if TRUE, probabilities/densities p are returned as log(p).

lower.tail

logical; if TRUE (default), probabilities are $P[X \le x]$; otherwise, $P[X > x]$.

p

vector of probabilities.

n

number of observations. If length(n) > 1, the length is taken to be the number required.

Details

The log-logistic distribution is a continuous probability distribution for a non-negative random variable. It is the probability distribution of a random variable whose logarithm has a logistic distribution.

The probability density function is given by:

$$f(x) = \frac{(\alpha/\gamma)(x/\gamma)^{\alpha-1}}{(1+(x/\gamma)^\alpha)^2}$$

for $x \geq 0$, where $\alpha$ is the shape parameter and $\gamma$ is the scale parameter.

The cumulative distribution function is:

$$F(x) = \frac{1}{1+(x/\gamma)^{-\alpha}}$$
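
Solving F(x) = p for x gives the quantile function x = gamma * (p / (1 - p))^(1 / alpha). The base-R sketch below checks this inversion numerically. The helper names ll_cdf() and ll_quantile() are hypothetical and are not part of rsurv.

```r
# Base-R sketch (hypothetical helper names, not the package's code).
ll_cdf <- function(x, shape, scale) 1 / (1 + (x / scale)^(-shape))
ll_quantile <- function(p, shape, scale) scale * (p / (1 - p))^(1 / shape)

p <- c(0.1, 0.5, 0.9)
x <- ll_quantile(p, shape = 2, scale = 1)
x                                   # the median (p = 0.5) equals the scale
ll_cdf(x, shape = 2, scale = 1)     # recovers p up to rounding error

# Equivalent view: log(X) is logistic with location log(scale), scale 1/shape.
stats::plogis(log(x), location = log(1), scale = 1 / 2)
```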

Value

dloglogistic gives the density, ploglogistic gives the distribution function, qloglogistic gives the quantile function, and rloglogistic generates random deviates.

The length of the result is determined by n for rloglogistic, and is the maximum of the lengths of the numerical arguments for the other functions.

Examples

# Density at x = 1 with shape = 2 and scale = 1
dloglogistic(1, shape = 2, scale = 1)

# CDF at x = 1
ploglogistic(1, shape = 2, scale = 1)

# Quantile for p = 0.5 (median)
qloglogistic(0.5, shape = 2, scale = 1)

# Generate 10 random values
rloglogistic(10, shape = 2, scale = 1)

Linear predictors

Description

Function to construct linear predictors.

Usage

lp(formula, coefs, data, ...)

Arguments

formula

formula specifying the linear predictors.

coefs

vector of regression coefficients.

data

data frame containing the covariates used to construct the linear predictors.

...

further arguments passed to other methods.

Value

a vector containing the linear predictors.

Examples

library(rsurv)
library(dplyr)

n <- 100
coefs <- c(1, 0.7, 2.3)

simdata <- data.frame(
  age = rnorm(n),
  sex = sample(c("male", "female"), size = n, replace = TRUE)
) %>%
  mutate(
    lp = lp(~age+sex, coefs)
  )
glimpse(simdata)
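
To make the role of formula and coefs concrete, the sketch below builds a linear predictor in base R as a design matrix times a coefficient vector. It is not the package's implementation. It assumes that the intercept column is kept, which the three coefficients in the example above suggest but the text does not state.

```r
# Base-R sketch (not the package's implementation): a linear predictor as the
# product of a design matrix and a coefficient vector.
set.seed(1)
n <- 5
dat <- data.frame(
  age = rnorm(n),
  sex = factor(sample(c("male", "female"), size = n, replace = TRUE),
               levels = c("female", "male"))
)
coefs <- c(1, 0.7, 2.3)

# Assumption: the intercept column is kept, so coefs has one entry per column.
X <- stats::model.matrix(~ age + sex, data = dat)
colnames(X)                      # "(Intercept)" "age" "sexmale"
eta <- drop(X %*% coefs)
eta
```

Because the formula interface expands factors into dummy columns, categorical covariates and interaction terms need no manual coding.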

Implemented link functions for the promotion time cure rate model with negative binomial distribution

Description

This function is used to specify different link functions for the count component of the promotion time cure rate model.

Usage

negbin(zeta = stop("'theta' must be specified"), link = "log")

Arguments

zeta

The known value of the additional parameter.

link

desired link function; currently implemented links are: log, identity and sqrt.

Value

A list containing the codes associated with the count distribution assumed for the latent variable N and the chosen link.


Random generation from accelerated failure time models

Description

Function to generate a random sample of survival data from accelerated failure time models.

Usage

raftreg(
  u,
  formula,
  baseline,
  beta,
  dist = NULL,
  package = NULL,
  lwr = 0,
  upr = Inf,
  data,
  ...
)

Arguments

u

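a numeric vector of probabilities in (0, 1), typically uniform draws such as runif(n), that are mapped to survival times through the baseline quantile function.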

formula

formula specifying the linear predictors.

baseline

the name of the baseline survival distribution.

beta

vector of regression coefficients.

dist

an alternative way to specify the baseline survival distribution.

package

the name of the package where the assumed quantile function is implemented.

lwr

left-truncation time (defaults to 0 in the absence of left-truncation).

upr

right-truncation time (defaults to Inf in the absence of right-truncation).

data

data frame containing the covariates used to generate the survival times.

...

further arguments passed to other methods.

Value

a numeric vector containing the generated random sample.

Examples

library(rsurv)
library(dplyr)
set.seed(123)
n <-  1000
simdata <- data.frame(
  age = rnorm(n),
  sex = sample(c("f", "m"), size = n, replace = TRUE)
) %>%
  mutate(
    t = raftreg(runif(n), ~ age+sex, beta = c(1, 2),
                dist = "weibull", shape = 1.5, scale = 1),
    c = runif(n, 0, 10)
  ) %>%
  rowwise() %>%
  mutate(
    time = min(t, c),
    status = as.numeric(time == t)
  )
glimpse(simdata)
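
The idea behind generating AFT data by inverse transform can be sketched in base R as follows. This is an illustration under stated assumptions, not rsurv's code: it assumes the log-linear convention in which covariates multiply the baseline time by exp(lp), and the package's actual sign convention is not given in the text above.

```r
# Base-R sketch of inverse-transform sampling under an AFT model with a
# Weibull baseline. Not rsurv's code; the sign convention is an assumption.
set.seed(123)
n <- 1000
x <- rnorm(n)
beta <- 1
lp <- beta * x

u <- runif(n)
t0 <- stats::qweibull(u, shape = 1.5, scale = 1)  # baseline draw Q0(u)
t_aft <- t0 * exp(lp)                             # covariates rescale time

# log(T) is linear in x: log(T) = lp + log(T0), with log(T0) independent of x.
fit <- stats::lm(log(t_aft) ~ x)
coef(fit)
```

The fitted slope should be close to beta, since the covariate shifts log(T) without changing the shape of its distribution.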

Random generation from accelerated hazard models

Description

Function to generate a random sample of survival data from accelerated hazard models.

Usage

rahreg(
  u,
  formula,
  baseline,
  beta,
  dist = NULL,
  package = NULL,
  lwr = 0,
  upr = Inf,
  data,
  ...
)

Arguments

u

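a numeric vector of probabilities in (0, 1), typically uniform draws such as runif(n), that are mapped to survival times through the baseline quantile function.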

formula

formula specifying the linear predictors.

baseline

the name of the baseline survival distribution.

beta

vector of regression coefficients.

dist

an alternative way to specify the baseline survival distribution.

package

the name of the package where the assumed quantile function is implemented.

lwr

left-truncation time (defaults to 0 in the absence of left-truncation).

upr

right-truncation time (defaults to Inf in the absence of right-truncation).

data

data frame containing the covariates used to generate the survival times.

...

further arguments passed to other methods.

Value

a numeric vector containing the generated random sample.

Examples

library(rsurv)
library(dplyr)
set.seed(123)
n <-  1000
simdata <- data.frame(
  age = rnorm(n),
  sex = sample(c("f", "m"), size = n, replace = TRUE)
) %>%
  mutate(
    t = rahreg(runif(n), ~ age+sex, beta = c(1, 2),
                dist = "weibull", shape = 1.5, scale = 1),
    c = runif(n, 0, 10)
  ) %>%
  rowwise() %>%
  mutate(
    time = min(t, c),
    status = as.numeric(time == t)
  )
glimpse(simdata)

The Rayleigh Distribution

Description

Density, distribution function, quantile function and random generation for the Rayleigh distribution with scale parameter sigma.

Usage

drayleigh(x, sigma, log = FALSE)

prayleigh(q, sigma, lower.tail = TRUE, log.p = FALSE)

qrayleigh(p, sigma, lower.tail = TRUE, log.p = FALSE)

rrayleigh(n, sigma)

Arguments

x, q

vector of quantiles.

sigma

scale parameter. Must be positive.

log, log.p

logical; if TRUE, probabilities/densities p are returned as log(p).

lower.tail

logical; if TRUE (default), probabilities are $P[X \le x]$; otherwise, $P[X > x]$.

p

vector of probabilities.

n

number of observations. If length(n) > 1, the length is taken to be the number required.

Details

The Rayleigh distribution is a continuous probability distribution for non-negative random variables. It arises as the distribution of the magnitude of a two-dimensional vector whose components are independent, identically distributed Gaussian random variables with zero mean.

The probability density function is given by:

$$f(x) = \frac{x}{\sigma^2} \exp\left(-\frac{x^2}{2\sigma^2}\right)$$

for $x \geq 0$ and $\sigma > 0$.

The cumulative distribution function is:

$$F(x) = 1 - \exp\left(-\frac{x^2}{2\sigma^2}\right)$$
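
Inverting the distribution function gives the quantile function x = sigma * sqrt(-2 * log(1 - p)). The base-R sketch below checks this, together with the fact that a Rayleigh variable is a Weibull variable with shape 2 and scale sigma * sqrt(2). The helper names ray_cdf() and ray_quantile() are hypothetical and are not part of rsurv.

```r
# Base-R sketch (hypothetical helper names, not the package's code).
ray_cdf <- function(x, sigma) 1 - exp(-x^2 / (2 * sigma^2))
ray_quantile <- function(p, sigma) sigma * sqrt(-2 * log(1 - p))

sigma <- 2
ray_quantile(0.5, sigma)            # median: sigma * sqrt(2 * log(2))

x <- c(0.5, 1, 3)
# A Rayleigh(sigma) variable is Weibull with shape 2 and scale sigma * sqrt(2).
all.equal(ray_cdf(x, sigma),
          stats::pweibull(x, shape = 2, scale = sigma * sqrt(2)))
```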

Value

drayleigh gives the density, prayleigh gives the distribution function, qrayleigh gives the quantile function, and rrayleigh generates random deviates.

The length of the result is determined by n for rrayleigh, and is the maximum of the lengths of the numerical arguments for the other functions.

References

Rayleigh, Lord (1880). On the resultant of a large number of vibrations of the same pitch and of arbitrary phase. Philosophical Magazine, 10(60), 73-78.

Examples

# Density at x = 1 with sigma = 1
drayleigh(1, sigma = 1)

# CDF at x = 1
prayleigh(1, sigma = 1)

# Quantile for p = 0.5 (median)
qrayleigh(0.5, sigma = 1)

# Generate 10 random values
rrayleigh(10, sigma = 1)

Random generation from extended hazard models

Description

Function to generate a random sample of survival data from extended hazard models.

Usage

rehreg(
  u,
  formula,
  baseline,
  beta,
  phi,
  dist = NULL,
  package = NULL,
  lwr = 0,
  upr = Inf,
  data,
  ...
)

Arguments

u

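a numeric vector of probabilities in (0, 1), typically uniform draws such as runif(n), that are mapped to survival times through the baseline quantile function.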

formula

formula specifying the linear predictors.

baseline

the name of the baseline survival distribution.

beta

vector of regression coefficients.

phi

vector of regression coefficients.

dist

an alternative way to specify the baseline survival distribution.

package

the name of the package where the assumed quantile function is implemented.

lwr

left-truncation time (defaults to 0 in the absence of left-truncation).

upr

right-truncation time (defaults to Inf in the absence of right-truncation).

data

data frame containing the covariates used to generate the survival times.

...

further arguments passed to other methods.

Value

a numeric vector containing the generated random sample.

Examples

library(rsurv)
library(dplyr)
set.seed(123)
n <-  1000
simdata <- data.frame(
  age = rnorm(n),
  sex = sample(c("f", "m"), size = n, replace = TRUE)
) %>%
  mutate(
    t = rehreg(runif(n), ~ age+sex, beta = c(1, 2), phi = c(-1, 2),
                dist = "weibull", shape = 1.5, scale = 1),
    c = runif(n, 0, 10)
  ) %>%
  rowwise() %>%
  mutate(
    time = min(t, c),
    status = as.numeric(time == t)
  )
glimpse(simdata)
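
The extended hazard model nests several simpler models, which the following base-R sketch illustrates for a Weibull baseline. It is not rsurv's code. It assumes the parametrization h(t | x) = h0(t * exp(lp_beta)) * exp(lp_phi), which may differ in sign from the package's convention, and eh_times() is a hypothetical helper name.

```r
# Base-R sketch (hypothetical helper, assumed parametrization).
eh_times <- function(u, lp_beta, lp_phi, shape, scale) {
  # Assumed model: h(t | x) = h0(t * exp(lp_beta)) * exp(lp_phi), so that
  # H(t | x) = H0(t * exp(lp_beta)) * exp(lp_phi - lp_beta).
  # Set H(t | x) = -log(u) and invert H0 through the baseline quantile.
  h0_target <- -log(u) * exp(lp_beta - lp_phi)
  stats::qweibull(exp(-h0_target), shape = shape, scale = scale,
                  lower.tail = FALSE) * exp(-lp_beta)
}

u <- c(0.2, 0.5, 0.9)
b <- c(-0.3, 0, 0.4)

t_ph  <- eh_times(u, lp_beta = 0, lp_phi = b, shape = 1.5, scale = 1)  # PH
t_aft <- eh_times(u, lp_beta = b, lp_phi = b, shape = 1.5, scale = 1)  # AFT
t_ah  <- eh_times(u, lp_beta = b, lp_phi = 0, shape = 1.5, scale = 1)  # AH
cbind(t_ph, t_aft, t_ah)
```

Under this assumed parametrization, setting lp_beta = 0 gives proportional hazards, lp_beta = lp_phi gives an accelerated failure time model in which time enters the baseline survival function as t * exp(lp_beta), and lp_phi = 0 gives the accelerated hazard model.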

Frailties random generation

Description

Function to generate frailties, used to add a simple random-effects term to the linear predictor of a given survival regression model.

Usage

rfrailty(
  cluster,
  frailty = c("gamma", "gaussian", "ps"),
  sigma = 1,
  alpha = NULL,
  ...
)

Arguments

cluster

a vector determining the grouping of subjects (always converted to a factor object internally).

frailty

the frailty distribution; the current implementation includes the gamma (default), lognormal (the "gaussian" option in Usage) and positive stable ("ps") distributions.

sigma

standard deviation assumed for the frailty distribution; sigma = 1 by default; this value is ignored for the positive stable (ps) distribution.

alpha

stability parameter of the positive stable distribution; alpha must lie in the interval (0, 1), and NA is returned otherwise.

...

further arguments passed to other methods.

Value

a vector with the generated frailties.
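
The general mechanism of shared frailties, one draw per cluster copied to every subject in that cluster, can be sketched in base R as follows. This is not rsurv's code. The gamma parametrization with mean 1 and standard deviation sigma is an assumption, and the scale on which rfrailty() returns its values is not stated above.

```r
# Base-R sketch of shared gamma frailties (not the package's implementation).
set.seed(42)
cluster <- factor(rep(c("a", "b", "c"), times = c(2, 3, 1)))
sigma <- 1

# Assumed parametrization: mean 1 and standard deviation sigma,
# obtained with shape = rate = 1 / sigma^2.
z <- stats::rgamma(nlevels(cluster), shape = 1 / sigma^2, rate = 1 / sigma^2)

# Expand the cluster-level draws to one value per subject.
frailty <- z[as.integer(cluster)]
data.frame(cluster, frailty)
```

Subjects in the same cluster share the same frailty value, which is what induces dependence among their survival times.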


Random generation of type I and type II interval censored survival data

Description

Function to generate a random sample of type I and type II interval censored survival data.

Usage

rinterval(time, tau, type = c("I", "II"), prob)

Arguments

time

a numeric vector of survival times.

tau

either a vector of censoring times (for type I interval-censored survival data) or a time grid of scheduled visits (for type II interval-censored survival data).

type

type of interval-censored survival data (I or II).

prob

attendance probability of a scheduled visit (for example, 0.5); ignored when type = "I".

Value

a data.frame containing the generated random sample.
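
As an illustration of interval censoring with a single inspection time per subject, the base-R sketch below records only whether the event occurred before that time. It is not rsurv's code: it assumes that type I refers to this single-inspection scheme, and the column names left and right are hypothetical, not the names returned by rinterval().

```r
# Base-R sketch (assumed reading of "type I", hypothetical column names).
set.seed(1)
time <- stats::rexp(6, rate = 1)
tau  <- stats::runif(6, 0, 2)       # one inspection time per subject

# Only whether the event precedes tau is known:
# event before tau -> interval (0, tau]; otherwise -> interval (tau, Inf).
ic <- data.frame(
  left  = ifelse(time <= tau, 0, tau),
  right = ifelse(time <= tau, tau, Inf)
)
ic
```

Each true survival time lies in its interval (left, right], but the exact value is never observed.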


Random generation from proportional hazards models

Description

Function to generate a random sample of survival data from proportional hazards models.

Usage

rphreg(
  u,
  formula,
  baseline,
  beta,
  dist = NULL,
  package = NULL,
  lwr = 0,
  upr = Inf,
  data,
  ...
)

Arguments

u

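a numeric vector of probabilities in (0, 1), typically uniform draws such as runif(n), that are mapped to survival times through the baseline quantile function.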

formula

formula specifying the linear predictors.

baseline

the name of the baseline survival distribution.

beta

vector of regression coefficients.

dist

an alternative way to specify the baseline survival distribution.

package

the name of the package where the assumed quantile function is implemented.

lwr

left-truncation time (defaults to 0 in the absence of left-truncation).

upr

right-truncation time (defaults to Inf in the absence of right-truncation).

data

data frame containing the covariates used to generate the survival times.

...

further arguments passed to other methods.

Value

a numeric vector containing the generated random sample.

Examples

library(rsurv)
library(dplyr)
set.seed(123)
n <-  1000
simdata <- data.frame(
  age = rnorm(n),
  sex = sample(c("f", "m"), size = n, replace = TRUE)
) %>%
  mutate(
    t = rphreg(runif(n), ~ age+sex, beta = c(1, 2),
                dist = "weibull", shape = 1.5, scale = 1),
    c = runif(n, 0, 10)
  ) %>%
  rowwise() %>%
  mutate(
    time = min(t, c),
    status = as.numeric(time == t)
  )
glimpse(simdata)
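
Under proportional hazards, S(t | x) = S0(t)^exp(lp), so a survival time can be obtained by inverting the baseline survival function. The base-R sketch below shows this for a Weibull baseline and compares it with the closed form. It is an illustration and not rsurv's code.

```r
# Base-R sketch of inverse-transform sampling under a PH model
# with a Weibull baseline (not the package's implementation).
set.seed(123)
n <- 1000
lp <- 0.5 * rnorm(n)
u <- runif(n)
shape <- 1.5
scale <- 1

# Solve S0(t)^exp(lp) = u for t: S0(t) = u^exp(-lp), then invert S0.
s0 <- u^exp(-lp)
t_ph <- stats::qweibull(s0, shape = shape, scale = scale, lower.tail = FALSE)

# Closed form for the Weibull baseline, where H0(t) = (t / scale)^shape.
t_closed <- scale * (-log(u) * exp(-lp))^(1 / shape)
all.equal(t_ph, t_closed)
```

Only the baseline quantile function is needed, which is why any baseline with a quantile function available in R can be used in the same way.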

Random generation from proportional odds models

Description

Function to generate a random sample of survival data from proportional odds models.

Usage

rporeg(
  u,
  formula,
  baseline,
  beta,
  dist = NULL,
  package = NULL,
  lwr = 0,
  upr = Inf,
  data,
  ...
)

Arguments

u

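a numeric vector of probabilities in (0, 1), typically uniform draws such as runif(n), that are mapped to survival times through the baseline quantile function.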

formula

formula specifying the linear predictors.

baseline

the name of the baseline survival distribution.

beta

vector of regression coefficients.

dist

an alternative way to specify the baseline survival distribution.

package

the name of the package where the assumed quantile function is implemented.

lwr

left-truncation time (defaults to 0 in the absence of left-truncation).

upr

right-truncation time (defaults to Inf in the absence of right-truncation).

data

data frame containing the covariates used to generate the survival times.

...

further arguments passed to other methods.

Value

a numeric vector containing the generated random sample.

Examples

library(rsurv)
library(dplyr)
set.seed(123)
n <-  1000
simdata <- data.frame(
  age = rnorm(n),
  sex = sample(c("f", "m"), size = n, replace = TRUE)
) %>%
  mutate(
    t = rporeg(runif(n), ~ age+sex, beta = c(1, 2),
                dist = "weibull", shape = 1.5, scale = 1),
    c = runif(n, 0, 10)
  ) %>%
  rowwise() %>%
  mutate(
    time = min(t, c),
    status = as.numeric(time == t)
  )
glimpse(simdata)
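
A proportional odds model can be sampled by inverse transform in much the same way, as the base-R sketch below shows for a Weibull baseline. It is not rsurv's code. It assumes the convention in which the failure odds F/S are multiplied by exp(lp), and the package may use the opposite sign.

```r
# Base-R sketch of inverse-transform sampling under a PO model
# (assumed convention, not the package's implementation).
set.seed(123)
n <- 1000
lp <- 0.5 * rnorm(n)
u <- runif(n)

# Assumed convention: failure odds F/S are multiplied by exp(lp).
# Setting S(t | x) = u gives baseline odds R0 = exp(-lp) * (1 - u) / u.
r0 <- exp(-lp) * (1 - u) / u
s0 <- 1 / (1 + r0)
t_po <- stats::qweibull(s0, shape = 1.5, scale = 1, lower.tail = FALSE)

# Check: the conditional failure odds at t_po equal exp(lp) times baseline odds.
f0 <- stats::pweibull(t_po, shape = 1.5, scale = 1)
all.equal(exp(lp) * f0 / (1 - f0), (1 - u) / u)
```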

Random generation from Yang and Prentice models

Description

Function to generate a random sample of survival data from Yang and Prentice models.

Usage

rypreg(
  u,
  formula,
  baseline,
  beta,
  phi,
  dist = NULL,
  package = NULL,
  lwr = 0,
  upr = Inf,
  data,
  ...
)

Arguments

u

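a numeric vector of probabilities in (0, 1), typically uniform draws such as runif(n), that are mapped to survival times through the baseline quantile function.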

formula

formula specifying the linear predictors.

baseline

the name of the baseline survival distribution.

beta

vector of short-term regression coefficients.

phi

vector of long-term regression coefficients.

dist

an alternative way to specify the baseline survival distribution.

package

the name of the package where the assumed quantile function is implemented.

lwr

left-truncation time (defaults to 0 in the absence of left-truncation).

upr

right-truncation time (defaults to Inf in the absence of right-truncation).

data

data frame containing the covariates used to generate the survival times.

...

further arguments passed to other methods.

Value

a numeric vector containing the generated random sample.

Examples

library(rsurv)
library(dplyr)
set.seed(123)
n <-  1000
simdata <- data.frame(
  age = rnorm(n),
  sex = sample(c("f", "m"), size = n, replace = TRUE)
) %>%
  mutate(
    t = rypreg(runif(n), ~ age+sex, beta = c(1, 2), phi = c(-1, 2),
                dist = "weibull", shape = 1.5, scale = 1),
    c = runif(n, 0, 10)
  ) %>%
  rowwise() %>%
  mutate(
    time = min(t, c),
    status = as.numeric(time == t)
  )
glimpse(simdata)
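
The roles of the short-term and long-term coefficients can be illustrated with a base-R sketch of inverse-transform sampling from the Yang and Prentice model. It is not rsurv's code. It assumes the survival function S(t | x) = (1 + (lambda / theta) * R0(t))^(-theta), where lambda = exp(short-term predictor), theta = exp(long-term predictor) and R0 = F0 / S0 is the baseline odds. The helper name yp_times() is hypothetical.

```r
# Base-R sketch (hypothetical helper, assumed form of the YP survival function).
yp_times <- function(u, lp_short, lp_long, shape, scale) {
  lambda <- exp(lp_short)   # short-term hazard ratio
  theta  <- exp(lp_long)    # long-term hazard ratio
  # Solve (1 + (lambda / theta) * R0)^(-theta) = u for the baseline odds R0,
  # convert the odds to a baseline survival probability, then invert S0.
  r0 <- (theta / lambda) * (u^(-1 / theta) - 1)
  s0 <- 1 / (1 + r0)
  stats::qweibull(s0, shape = shape, scale = scale, lower.tail = FALSE)
}

u <- c(0.2, 0.5, 0.9)
b <- c(-0.3, 0, 0.4)

# Equal short- and long-term effects reduce to proportional hazards.
t_ph <- yp_times(u, lp_short = b, lp_long = b, shape = 1.5, scale = 1)
# A zero long-term effect reduces to proportional odds.
t_po <- yp_times(u, lp_short = b, lp_long = 0, shape = 1.5, scale = 1)
cbind(t_ph, t_po)
```

When the two predictors differ, the hazard ratio moves from lambda at early times to theta at late times, which is what allows survival curves to cross.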