
Survival Analysis and the Cox Model

Survival analysis studies time-to-event data when some observations are censored, meaning the event is not seen within the study window. The Kaplan-Meier product-limit estimator gives a nonparametric survival curve, the Nelson-Aalen estimator gives the cumulative hazard, and the Cox proportional-hazards model regresses the hazard on covariates through a partial likelihood that does not require specifying the baseline hazard.

Censoring And The Survival And Hazard Functions

Right-censoring occurs when a subject is observed for a while and the event of interest has not yet happened by the end of observation, so the recorded time is a lower bound on the true event time. Survival analysis is the set of methods that use censored observations without discarding them. Klein and Moeschberger, Survival Analysis, 2nd ed., Ch. 3 to 4 is the standard reference for the actuarial and biostatistical treatment.

The survival function S(t) gives the probability that the event occurs after time t. The hazard function h(t) is the instantaneous event rate given survival to t, that is, the density f(t) of the event time divided by S(t); it is the same object actuaries call the force of mortality. The cumulative hazard H(t) is the integral of the hazard from 0 to t, and survival equals the exponential of the negative cumulative hazard.

Survival, hazard, cumulative hazard
S(t)=P(T>t),\qquad h(t)=\frac{f(t)}{S(t)},\qquad S(t)=e^{-\int_{0}^{t} h(u)\,du}
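The last identity follows because f(t) is minus the derivative of S(t), so h(t) is minus the derivative of log S(t). A minimal numerical check of that identity is sketched below. The Weibull hazard, its shape and scale values, and all function names are assumptions chosen for illustration; they do not come from the text above.

```python
import math

def weibull_hazard(t, shape=1.5, scale=10.0):
    """Hypothetical example hazard: h(t) = (k/lam) * (t/lam)**(k-1)."""
    return (shape / scale) * (t / scale) ** (shape - 1)

def weibull_survival(t, shape=1.5, scale=10.0):
    """Closed-form Weibull survival, S(t) = exp(-(t/lam)**k)."""
    return math.exp(-((t / scale) ** shape))

def cumulative_hazard(hazard, t, steps=10_000):
    """Trapezoidal approximation of the integral of hazard over [0, t]."""
    width = t / steps
    total = 0.5 * (hazard(0.0) + hazard(t))
    total += sum(hazard(k * width) for k in range(1, steps))
    return total * width

def survival_from_hazard(hazard, t):
    """S(t) = exp(-H(t)), with H(t) obtained by numerical integration."""
    return math.exp(-cumulative_hazard(hazard, t))

if __name__ == "__main__":
    for t in (1.0, 5.0, 20.0):
        print(t, survival_from_hazard(weibull_hazard, t), weibull_survival(t))
```

The two printed columns should agree to several decimal places, which is the relation S(t) = exp(-H(t)) seen numerically rather than algebraically.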

Kaplan-Meier And Nelson-Aalen Estimators

The Kaplan-Meier product-limit estimator multiplies, over each observed event time, the conditional probability of surviving that instant given survival up to it. At an event time with d deaths among n at risk, that conditional factor is one minus d over n. Censored observations leave the curve flat but reduce the risk set for later times.

The Nelson-Aalen estimator sums d over n across event times to estimate the cumulative hazard, and in small samples it is often preferred to the alternative of taking the negative logarithm of the Kaplan-Meier estimate. Greenwood's formula supplies the variance of the Kaplan-Meier estimator for pointwise confidence intervals. Kaplan and Meier introduced the product-limit estimator in 1958; Nelson proposed the cumulative-hazard estimator in 1969 and 1972, and Aalen gave its counting-process theory in 1978.

Kaplan-Meier product-limit estimator
\hat{S}(t)=\prod_{t_i\le t}\left(1-\frac{d_i}{n_i}\right)
Nelson-Aalen cumulative hazard
\hat{H}(t)=\sum_{t_i\le t}\frac{d_i}{n_i}
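Both estimators can be tabulated in one pass over the distinct event times. The sketch below is one way to do it, not a reference implementation. The function name, the dictionary layout, and the demo data are hypothetical. It assumes the usual convention that a subject censored exactly at an event time still counts as at risk at that time, and it reports the Greenwood variance as NaN at a step where everyone at risk fails.

```python
from collections import Counter
import math

def kaplan_meier_nelson_aalen(times, events):
    """Tabulate both estimators at each distinct event time.

    times  : observed times (event or censoring)
    events : 1 if the time is an event, 0 if it is censored
    Returns a list of dicts with keys t, n, d, S, H, var_S.
    """
    deaths = Counter(t for t, e in zip(times, events) if e)
    rows = []
    s_hat, h_hat, greenwood_sum = 1.0, 0.0, 0.0
    for t in sorted(deaths):
        # At risk just before t; a time censored exactly at t still counts.
        n = sum(1 for u in times if u >= t)
        d = deaths[t]
        s_hat *= 1.0 - d / n          # Kaplan-Meier factor (1 - d/n)
        h_hat += d / n                # Nelson-Aalen increment d/n
        if n > d:
            # Greenwood: Var[S(t)] ~ S(t)^2 * sum of d / (n (n - d))
            greenwood_sum += d / (n * (n - d))
            var_s = s_hat ** 2 * greenwood_sum
        else:
            var_s = math.nan          # term undefined when all at risk fail
        rows.append({"t": t, "n": n, "d": d,
                     "S": s_hat, "H": h_hat, "var_S": var_s})
    return rows

if __name__ == "__main__":
    # Hypothetical data: four subjects, one censored at time 4.
    for row in kaplan_meier_nelson_aalen([2, 4, 4, 7], [1, 0, 1, 1]):
        print(row)
```

Censored times never create a row of their own; they enter only through the risk-set count n, which is why they flatten neither estimate directly but shrink later denominators.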

The Cox Proportional-Hazards Model

The Cox model writes the hazard for a subject with covariate vector z as a common baseline hazard multiplied by the exponential of a linear predictor. The proportional-hazards assumption is that the ratio of hazards between two subjects is constant in time. The model is semiparametric because the baseline hazard is left unspecified.

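Cox proposed estimating the coefficients by maximizing a partial likelihood that conditions on the ordered event times and so cancels the baseline hazard. Each event contributes the ratio of the failing subject's relative risk to the sum of relative risks over the risk set R(t_i), the subjects still under observation just before that event time. The exponentiated coefficient is a hazard ratio, the multiplicative change in the hazard for a one-unit increase in a covariate with the other covariates held fixed. Cox introduced the model and the partial likelihood in 1972.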

Cox proportional-hazards model
h(t\mid z)=h_0(t)\,e^{\beta^{\top} z}
Partial likelihood
L(\beta)=\prod_{i:\,\text{event}}\frac{e^{\beta^{\top} z_i}}{\sum_{j\in R(t_i)} e^{\beta^{\top} z_j}}
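To make the partial likelihood concrete, the sketch below evaluates its logarithm for a single covariate and maximizes it with Newton-Raphson steps. It assumes no tied event times and takes the risk set at an event time to be every subject whose recorded time is at least that time. The function names and the six-subject data are hypothetical, and Newton-Raphson is used here as a common choice rather than as the method of the 1972 paper.

```python
import math

def cox_log_partial_likelihood(beta, times, events, z):
    """Log partial likelihood for one covariate, assuming no tied event times."""
    total = 0.0
    for i, (t_i, e_i) in enumerate(zip(times, events)):
        if not e_i:
            continue  # censored subjects enter only through risk sets
        # Risk set R(t_i): subjects still under observation just before t_i.
        denom = sum(math.exp(beta * z[j])
                    for j, t_j in enumerate(times) if t_j >= t_i)
        total += beta * z[i] - math.log(denom)
    return total

def cox_score_and_information(beta, times, events, z):
    """First derivative and negative second derivative of the log partial likelihood."""
    score, info = 0.0, 0.0
    for i, (t_i, e_i) in enumerate(zip(times, events)):
        if not e_i:
            continue
        risk = [j for j, t_j in enumerate(times) if t_j >= t_i]
        weights = [math.exp(beta * z[j]) for j in risk]
        s0 = sum(weights)
        mean = sum(w * z[j] for w, j in zip(weights, risk)) / s0
        mean_sq = sum(w * z[j] ** 2 for w, j in zip(weights, risk)) / s0
        score += z[i] - mean            # observed minus risk-weighted mean
        info += mean_sq - mean ** 2     # risk-weighted variance of z
    return score, info

def fit_cox_newton(times, events, z, iterations=25):
    """Newton-Raphson from beta = 0; a sketch without step-size safeguards."""
    beta = 0.0
    for _ in range(iterations):
        score, info = cox_score_and_information(beta, times, events, z)
        if info <= 0.0:
            break
        beta += score / info
    return beta

if __name__ == "__main__":
    # Hypothetical data: z is a 0/1 group indicator.
    times = [2, 4, 5, 7, 9, 11]
    events = [1, 1, 0, 1, 1, 0]
    z = [1, 0, 1, 1, 0, 0]
    beta_hat = fit_cox_newton(times, events, z)
    print("beta_hat:", beta_hat, "hazard ratio:", math.exp(beta_hat))
```

The baseline hazard h_0(t) never appears in the code: it multiplies the numerator and every term of the denominator at each event time and so cancels, which is why the coefficients can be estimated without specifying it.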

Worked Example: Kaplan-Meier On Six Lives

Six lives are observed. Event times in months are 3, 5, and 8. A plus sign marks a censored time: the data are 3, 5, 5 plus, 8, 8 plus, and 10 plus. At month 3, six are at risk and one event occurs, so the factor is 1 minus 1 over 6, which is 5 over 6, about 0.833.

At month 5, five are at risk and one event occurs, factor 1 minus 1 over 5, so the survival estimate becomes 0.833 times 0.8, about 0.667; the 5 plus censored life then leaves the risk set. At month 8, three are at risk and one event occurs, factor 1 minus 1 over 3, so survival becomes 0.667 times two-thirds, about 0.444. The estimate stays at 0.444 after month 8 because the remaining observations are censored, which is why a product-limit curve can end above zero.

Stepwise Kaplan-Meier estimate
\hat{S}(3)=\tfrac{5}{6}\approx 0.833,\quad \hat{S}(5)\approx 0.667,\quad \hat{S}(8)\approx 0.444
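
For comparison, the Nelson-Aalen sum can be worked on the same six lives using the same risk sets of 6, 5, and 3. The figures below are derived from the data in this example rather than quoted from a source. Exponentiating the negative cumulative hazard at month 8 gives roughly 0.497, somewhat above the product-limit value of 0.444, as expected because 1 minus x never exceeds the exponential of minus x.

Stepwise Nelson-Aalen estimate
\hat{H}(3)=\tfrac{1}{6}\approx 0.167,\quad \hat{H}(5)=\tfrac{1}{6}+\tfrac{1}{5}=\tfrac{11}{30}\approx 0.367,\quad \hat{H}(8)=\tfrac{11}{30}+\tfrac{1}{3}=0.7,\quad e^{-\hat{H}(8)}\approx 0.497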
