Generalized Linear Models in Actuarial Practice

GLMs extend ordinary linear regression to non-Gaussian response distributions in the exponential family. In actuarial work, the three standard cases are Poisson with a log link for claim frequency, Gamma with a log link (or a Lognormal fit on the log scale) for severity, and Tweedie compound Poisson-Gamma with 1 < p < 2 for modeling pure premium directly. Estimation is by iteratively reweighted least squares, inference uses deviance and quasi-deviance, and diagnostics rely on deviance residuals and Cook-style influence measures. The framework underlies modern ratemaking and most of the predictive-analytics syllabus on PA, ATPA, and ASTAM.

Page Contract
Role: Concept
Level: Advanced
Time: Reference
Freshness: Stable
Search Intent: generalized linear models actuarial

Plain-English Definition

Ordinary linear regression assumes a normal response and an identity link. GLMs relax both. The response distribution can be any member of the exponential family, the mean of the response is connected to the linear predictor through a link function g, and estimation is by maximum likelihood (or quasi-likelihood when dispersion is unknown).

The three pieces of a GLM are the random component (response distribution), the systematic component (linear predictor X beta), and the link function (the function g that ties them together by g(mu) = X beta). Choosing all three turns a GLM into a specific actuarial model.

GLMs are the standard ratemaking tool because the random component handles the actual distribution of insurance data (counts for frequency, positive skewed continuous for severity, point-mass-at-zero plus positive continuous for pure premium), the link function gives multiplicative rating relativities under a log link, and the framework supports straightforward inference and diagnostics.

The Exponential Family

A distribution is in the exponential family in canonical form if its density can be written as f(y; theta, phi) = exp((y theta - b(theta)) / a(phi) + c(y, phi)). Here theta is the natural parameter, phi is the dispersion parameter, and b, a, c are known functions.

Two cumulant identities follow from this form. The mean is E[Y] = b'(theta) = mu and the variance is Var[Y] = b''(theta) a(phi). The function V(mu) = b''(theta(mu)) is the variance function, and it characterizes the family up to scale.

Common cases: Normal with theta = mu, b(theta) = theta^2/2, V(mu) = 1; Poisson with theta = log mu, b(theta) = exp(theta), V(mu) = mu; Gamma with theta = -1/mu, b(theta) = -log(-theta), V(mu) = mu^2; Tweedie with V(mu) = mu^p, where p in (1, 2) gives the compound Poisson-Gamma case and other values of p give other members of the family.

Exponential-family density
f(y; \theta, \phi) = \exp\Bigl(\frac{y \theta - b(\theta)}{a(\phi)} + c(y, \phi)\Bigr)
Mean and variance
E[Y] = b'(\theta) = \mu, \quad \operatorname{Var}(Y) = b''(\theta)\, a(\phi) = V(\mu)\, a(\phi)
Variance functions
V_{\text{Normal}} = 1, \; V_{\text{Poisson}} = \mu, \; V_{\text{Gamma}} = \mu^2, \; V_{\text{Tweedie}} = \mu^p
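
As a quick numeric sanity check of the cumulant identities above, the sketch below verifies by simulation that the sample variance matches V(mu) a(phi) for the Poisson and Gamma cases. The sample sizes and parameters are arbitrary illustrative choices, not values from the text.

```python
import numpy as np

# Moment check of the variance functions V(mu) = mu (Poisson) and
# V(mu) = mu^2 (Gamma).  Parameters are hypothetical.
rng = np.random.default_rng(42)
n = 200_000

# Poisson: a(phi) = 1 and V(mu) = mu, so Var(Y) = mu.
mu_p = 3.0
y_p = rng.poisson(mu_p, n)
assert abs(y_p.var() - mu_p) < 0.1          # Var ≈ V(mu) = mu

# Gamma with shape alpha and mean mu: Var(Y) = mu^2 / alpha,
# i.e. V(mu) = mu^2 with dispersion a(phi) = 1 / alpha.
alpha, mu_g = 4.0, 10.0
y_g = rng.gamma(alpha, mu_g / alpha, n)
assert abs(y_g.var() - mu_g**2 / alpha) < 0.5
print("variance-function checks passed")
```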

Link Functions and the Linear Predictor

The linear predictor is eta = X beta. The link function g connects eta to the mean by g(mu) = eta, equivalently mu = g^{-1}(eta). Common choices: identity link for Gaussian models, log link for Poisson and Gamma, logit link for logistic regression (Bernoulli random component), and inverse link for some Gamma applications.

The canonical link is the function such that theta = eta. For Poisson the canonical link is log; for Gamma it is the reciprocal; for Normal it is identity. The canonical link gives convenient mathematical properties (sufficient statistic X^T y, orthogonality of the score, simpler Newton-Raphson updates), but it is not always the best modeling choice.

Actuarial work almost universally uses the log link for Poisson frequency and Gamma severity GLMs because the log link gives multiplicative rating relativities (exp of a coefficient is the rate ratio), which is what regulators and rating manuals expect.

Linear predictor and link
\eta_i = X_i^{\top} \beta, \quad g(\mu_i) = \eta_i \iff \mu_i = g^{-1}(\eta_i)
Log link gives multiplicative relativities
\log \mu_i = \beta_0 + \beta_1 x_{i,1} + \dots \implies \frac{\mu(x_1+1)}{\mu(x_1)} = e^{\beta_1}
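
The multiplicative property is easy to verify numerically. The intercept and slope below are hypothetical, chosen only to illustrate the ratio:

```python
import numpy as np

# Under a log link, mu = exp(eta), so a one-unit increase in a covariate
# multiplies the mean by exp(beta).  Coefficients are made up.
beta0, beta1 = -2.0, 0.25

def mu(x1):
    return np.exp(beta0 + beta1 * x1)

ratio = mu(4.0) / mu(3.0)            # rate ratio for x1 -> x1 + 1
assert np.isclose(ratio, np.exp(beta1))  # exp(0.25) ≈ 1.2840
print(round(ratio, 4))
```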

IRLS Estimation

GLM coefficients are estimated by maximum likelihood. The Newton-Raphson algorithm applied to the GLM log-likelihood reduces to iteratively reweighted least squares (IRLS): each iteration constructs a working response z_i and a working weight w_i, then runs an ordinary weighted least-squares regression of z on X to update beta.

The working response at iteration t is z_i^{(t)} = eta_i^{(t)} + (y_i - mu_i^{(t)}) g'(mu_i^{(t)}). The working weight is w_i^{(t)} = 1 / (g'(mu_i^{(t)})^2 V(mu_i^{(t)})).

IRLS converges quickly (often within five to ten iterations) for well-specified actuarial GLMs because the log-likelihood is concave under the canonical link and remains well-behaved for the non-canonical log link on Gamma. Standard implementations (R glm(), Python statsmodels, SAS GENMOD) all use IRLS.

IRLS working response
z_i = \eta_i + (y_i - \mu_i)\, g'(\mu_i)
IRLS working weight
w_i = \frac{1}{g'(\mu_i)^2\, V(\mu_i)}
WLS update
\widehat\beta^{(t+1)} = (X^{\top} W^{(t)} X)^{-1} X^{\top} W^{(t)} z^{(t)}
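
A minimal from-scratch IRLS loop for a Poisson GLM with log link, on simulated data with a hypothetical true coefficient vector, might look like the sketch below. This is illustrative only; production work should use a fitted library such as R glm() or statsmodels.

```python
import numpy as np

# IRLS for Poisson with log link (canonical): g'(mu) = 1/mu and
# V(mu) = mu, so z = eta + (y - mu)/mu and w = mu.
rng = np.random.default_rng(0)
n = 5000
X = np.column_stack([np.ones(n), rng.normal(size=n)])
beta_true = np.array([0.5, -0.3])          # hypothetical truth
y = rng.poisson(np.exp(X @ beta_true))

beta = np.zeros(X.shape[1])
for _ in range(25):
    eta = X @ beta
    mu = np.exp(eta)
    z = eta + (y - mu) / mu                # working response
    W = mu                                 # working weight: 1/(g'^2 V) = mu
    XtW = X.T * W
    beta_new = np.linalg.solve(XtW @ X, XtW @ z)   # WLS update
    if np.max(np.abs(beta_new - beta)) < 1e-10:
        beta = beta_new
        break
    beta = beta_new

print(beta)  # should land close to beta_true
```

Note how each iteration is literally a weighted least-squares regression of z on X, which is the sense in which IRLS "reduces to" WLS.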

Frequency Models: Poisson with Log Link

Claim counts per policy-year are typically modeled as Poisson with a log link and an offset for exposure: log mu_i = log E_i + X_i^T beta. The offset turns the Poisson rate per unit exposure into a Poisson mean per policy.

Over-dispersion (Var > mean) is common in real frequency data because of heterogeneity in policyholder risk that is not captured by the covariates. The fix is either a quasi-Poisson model (Var = phi mu, with phi estimated) or a Negative Binomial GLM, which adds a gamma-distributed random effect to the Poisson rate.

Exam treatment usually starts with quasi-Poisson. The dispersion phi is estimated from the Pearson chi-square divided by residual degrees of freedom, and Wald standard errors are inflated by sqrt(phi). Negative Binomial appears as the next step when phi differs from 1 by a wide margin.

Poisson GLM with exposure offset
Y_i \sim \operatorname{Poisson}(\mu_i), \quad \log \mu_i = \log E_i + X_i^{\top} \beta
Quasi-Poisson dispersion
\widehat\phi = \frac{1}{n-p} \sum_{i=1}^{n} \frac{(Y_i - \widehat\mu_i)^2}{\widehat\mu_i}
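
The dispersion estimate can be sketched on simulated over-dispersed counts. The gamma frailty mechanism and all parameter values below are illustrative assumptions, and the fitted means are taken as known for brevity:

```python
import numpy as np

# Pearson-based dispersion estimate: phi_hat = X^2_Pearson / (n - p).
rng = np.random.default_rng(1)
n, p = 10_000, 3                 # n observations, p fitted parameters
mu = np.full(n, 2.0)             # stand-in for fitted means mu_hat
frailty = rng.gamma(2.0, 0.5, n) # mean-1 gamma heterogeneity (NB mechanism)
y = rng.poisson(mu * frailty)    # over-dispersed counts

phi_hat = np.sum((y - mu) ** 2 / mu) / (n - p)
print(round(phi_hat, 2))         # > 1 signals over-dispersion
```

With this setup Var(Y) = mu + 0.5 mu^2 = 2 mu, so the estimate should land near 2, and Wald standard errors would be inflated by sqrt(phi_hat).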

Severity Models: Gamma and Lognormal

Claim sizes (severity) are typically modeled as Gamma with a log link, with one record per claim. The Gamma family handles positive, right-skewed continuous data well. The log link gives multiplicative relativities, matching the rating-factor format used in tariffs.

An alternative is a Lognormal model, which corresponds to a Gaussian GLM on log Y with identity link. The Lognormal mean on the original scale is exp(mu + sigma^2/2), which means the linear predictor must be back-transformed with the bias correction. Forgetting this correction underestimates the mean.

Choosing between Gamma and Lognormal: Gamma fits well when the coefficient of variation is constant across covariate levels (Var proportional to mean squared). Lognormal fits well when the log-scale variance is constant. In practice both are tried and chosen on AIC or deviance grounds.

Gamma GLM (log link)
Y_i \sim \operatorname{Gamma}(\alpha, \beta_i), \quad \log \mu_i = X_i^{\top} \beta, \quad V(\mu) = \mu^2
Lognormal mean back-transform
E[Y] = \exp(\mu + \sigma^2 / 2)
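
The size of the error from omitting the bias correction is easy to demonstrate by simulation; the log-scale parameters below are hypothetical:

```python
import numpy as np

# Lognormal mean is exp(mu + sigma^2/2); back-transforming with
# exp(mu) alone underestimates it.
rng = np.random.default_rng(7)
mu_log, sigma = 8.0, 1.2
y = rng.lognormal(mu_log, sigma, 500_000)

naive = np.exp(mu_log)                     # missing bias correction
corrected = np.exp(mu_log + sigma**2 / 2)  # true mean
print(naive, corrected, y.mean())          # sample mean tracks `corrected`
```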

Pure Premium: Tweedie Compound Poisson-Gamma

Pure premium per exposure is the product of frequency and severity. Modeling it directly with a single GLM avoids the two-step frequency-times-severity workflow and gives a single likelihood for inference. The Tweedie compound Poisson-Gamma family is the standard choice: Y is a Poisson sum of iid Gamma claim sizes, with V(mu) = mu^p for some p in (1, 2).

The Tweedie family handles the point mass at zero (no claims) and the continuous positive distribution (positive claim totals) in one model. The parameter p indexes the shape: p close to 1 is mostly Poisson-like, p close to 2 is mostly Gamma-like. Real auto-liability data is often near p = 1.5.

Estimation uses IRLS with the Tweedie variance function. Evaluating the full likelihood requires special handling (y = 0 carries a point mass, and the density at positive y involves an infinite series), but standard software (Python tweedie package, R statmod and glmnet, SAS GENMOD with DIST=TWEEDIE) handles it transparently.

Tweedie compound Poisson-Gamma
Y = \sum_{j=1}^{N} X_j, \quad N \sim \operatorname{Poisson}, \; X_j \sim \operatorname{Gamma}, \; V(\mu) = \mu^p, \; 1 < p < 2
Tweedie GLM mean
\log \mu_i = \log E_i + X_i^{\top} \beta, \quad E[Y_i] = E_i e^{X_i^{\top} \beta}
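
The compound Poisson-Gamma construction can be simulated directly, which makes the point mass at zero visible. All parameters below are illustrative assumptions:

```python
import numpy as np

# Each policy's loss is a Poisson number of iid Gamma claims.
rng = np.random.default_rng(3)
n = 100_000
lam, shape, scale = 0.1, 2.0, 500.0   # frequency and severity parameters

counts = rng.poisson(lam, n)
losses = np.array([rng.gamma(shape, scale, k).sum() for k in counts])

zero_share = (losses == 0).mean()     # point mass: ≈ exp(-lam) ≈ 0.905
mean_loss = losses.mean()             # ≈ lam * shape * scale = 100
print(zero_share, mean_loss)
```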

Deviance and Residual Diagnostics

Deviance is the GLM analogue of the residual sum of squares. It is D = 2 (l_sat - l_fit), where l_sat is the log-likelihood of the saturated model (one parameter per observation) and l_fit is the log-likelihood of the fitted model. For Poisson, D = 2 sum (y log(y / mu_hat) - (y - mu_hat)). For Gamma, D = 2 sum ((y - mu_hat) / mu_hat - log(y / mu_hat)).

Nested GLMs are compared with a likelihood-ratio test for the inclusion of covariates: the test statistic is the difference in deviances, and under the null it is asymptotically chi-square with degrees of freedom equal to the difference in parameter counts. When the dispersion is unknown, the quasi-likelihood analogue is an F test on scaled deviance differences.

Deviance residuals d_i = sign(y_i - mu_hat_i) sqrt(d_i^*), where d_i^* is the unit deviance contribution, are the GLM analogue of standardized residuals. Plot them against fitted values, against linear-predictor levels, and against omitted candidate covariates to surface model misfit.

Poisson deviance
D_{\text{Poisson}} = 2 \sum_{i} \Bigl(y_i \log \frac{y_i}{\widehat\mu_i} - (y_i - \widehat\mu_i)\Bigr)
Gamma deviance
D_{\text{Gamma}} = 2 \sum_{i} \Bigl(\frac{y_i - \widehat\mu_i}{\widehat\mu_i} - \log \frac{y_i}{\widehat\mu_i}\Bigr)
Deviance residual
d_i = \operatorname{sign}(y_i - \widehat\mu_i) \sqrt{d_i^{*}}
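
A sketch of Poisson deviance residuals, with the y = 0 case handled explicitly by the convention 0 log 0 = 0. The observations and fitted means are made-up stand-ins:

```python
import numpy as np

def poisson_deviance_residuals(y, mu):
    # Unit deviance d_i^* = 2 (y log(y/mu) - (y - mu)), with 0 log 0 = 0.
    with np.errstate(divide="ignore", invalid="ignore"):
        term = np.where(y > 0, y * np.log(y / mu), 0.0)
    unit_dev = 2.0 * (term - (y - mu))
    return np.sign(y - mu) * np.sqrt(unit_dev)

y = np.array([0.0, 1.0, 2.0, 5.0])
mu = np.array([0.5, 1.0, 2.5, 3.0])
d = poisson_deviance_residuals(y, mu)
print(d, (d ** 2).sum())   # residuals; squares sum to the total deviance
```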

Worked Example: Frequency GLM on a Small Portfolio

Fit a Poisson GLM with log link to a 5000-policy auto portfolio. Response is claim count per policy-year; exposure is policy-year. Covariates: vehicle age (continuous), driver age band (categorical with three levels), territory (categorical with two levels).

Coefficient interpretation under the log link is multiplicative. If the fitted coefficient on vehicle age is -0.04, then each additional year of vehicle age multiplies the claim rate by exp(-0.04) ≈ 0.961, a 3.9 percent reduction. If the coefficient on the urban-territory indicator is +0.30, urban policies have rate exp(0.30) ≈ 1.35 times the rural rate.
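
The relativity arithmetic in the paragraph above checks out numerically:

```python
import numpy as np

# Fitted coefficients from the worked example: -0.04 on vehicle age,
# +0.30 on the urban-territory indicator.
vehicle_age_factor = np.exp(-0.04)   # per extra year of vehicle age
urban_relativity = np.exp(0.30)      # urban vs rural rate ratio

print(round(vehicle_age_factor, 3))  # 0.961, a 3.9% reduction per year
print(round(urban_relativity, 2))    # 1.35
```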

Dispersion check: compute Pearson chi-square divided by residual df. A value near 1 is consistent with the Poisson assumption; values between roughly 2 and 4 indicate moderate over-dispersion, for which a quasi-Poisson treatment is appropriate. Above 4, switch to Negative Binomial or revisit the linear predictor for omitted heterogeneity.

Diagnostic plot: deviance residuals against vehicle age. A funnel shape that widens with vehicle age is a signal that the variance function does not match the data (in which case Negative Binomial or a different link helps), or that an interaction with another covariate is missing.

Worked Example: Tweedie Pure-Premium GLM

Fit a Tweedie GLM with log link and p = 1.5 (a common starting point for auto-liability pure premium) to the same 5000-policy portfolio. Response is annualized loss per policy; exposure is policy-year (as offset).

Tweedie handles the zero observations (about 90 percent of policies report no loss in a year) and the positive continuous observations in one fit. The estimated coefficients are directly the pure-premium relativities used for rating.

Validation: compute the Tweedie deviance on a held-out fold and compare to a Frequency × Severity two-step model. Tweedie is competitive when the joint frequency-severity surface is well-approximated by a single linear predictor; the two-step model wins when frequency and severity respond differently to covariates.
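
A held-out Tweedie deviance can be computed from the unit-deviance formula for 1 < p < 2, which is well defined at exact zeros (the y^(2-p) term vanishes). The held-out fold below is a made-up stand-in:

```python
import numpy as np

def tweedie_deviance(y, mu, p=1.5):
    # Unit deviance for Tweedie with 1 < p < 2:
    # d(y, mu) = 2 [ y^(2-p)/((1-p)(2-p)) - y mu^(1-p)/(1-p) + mu^(2-p)/(2-p) ]
    a = y ** (2 - p) / ((1 - p) * (2 - p))
    b = y * mu ** (1 - p) / (1 - p)
    c = mu ** (2 - p) / (2 - p)
    return 2.0 * np.sum(a - b + c)

y = np.array([0.0, 0.0, 120.0, 800.0])    # mostly-zero held-out losses
mu = np.array([60.0, 45.0, 150.0, 500.0]) # model predictions
print(tweedie_deviance(y, mu))            # lower is better across models
```

The same function applied to the two-step Frequency × Severity predictions gives a like-for-like comparison on the held-out fold.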

Tariff Structure and Ratemaking Applications

Personal-lines ratemaking historically used one-way and minimum-bias methods. GLMs replaced both in modern practice because they handle correlated covariates, give standard errors, and produce diagnostic residuals.

A typical tariff workflow fits a frequency GLM and a severity GLM with a log link to historical experience, combines them into a pure-premium relativity grid, applies expense and profit loadings, and then runs a regulatory filing with supporting deviance and lift-curve exhibits. ATPA and the legacy CAS Exam 5 cover this workflow end-to-end.

Constraints arise. Regulators may forbid certain covariates (e.g. some jurisdictions restrict the use of credit score or gender), require monotone relativities on certain factors, or impose maximum and minimum relativity caps. These constraints can be enforced inside the GLM via penalized estimation, offsets, or post-hoc smoothing.

Common Misconceptions

GLMs are not the same as ordinary linear regression. The response is non-Gaussian in general, the link function maps the mean to the linear predictor, and the residuals are not iid Gaussian. Standard errors copied from linear-regression formulas are wrong whenever the response distribution is non-Gaussian or the link is not the identity.

The dispersion parameter phi is not always 1. Poisson with phi = 1 is the textbook case, but real frequency data is over-dispersed and a quasi-Poisson or Negative Binomial fit gives the right inference.

Coefficient sign and magnitude under a log link describe multiplicative effects, not additive. A coefficient of 0.5 means a 1.65x rate ratio, not a 0.5 rate change. This is a common misread in client-facing summaries.

Deviance residuals are not the same as Pearson residuals. Each has its own diagnostic strength. Deviance residuals are closer to symmetric for non-Gaussian fits; Pearson residuals are easier to interpret as standardized errors but can be skewed for Gamma or Tweedie.

Tweedie p is not identifiable from data with too few observations or weak coverage in the moderate-loss range. Estimating p alongside beta requires a profile likelihood or a separate likelihood maximization and is unstable on small portfolios. Standard practice is to fix p from prior experience (1.5 is a common default for auto liability) and check sensitivity post-fit.

Cross-Exam Map

GLMs sit at the intersection of traditional actuarial mathematics, statistics, and machine learning, which is why they appear on multiple exams and across the FSA tracks.

  • PA (SOA Predictive Analytics): full GLM workflow on a real dataset, including model selection, regularization, and communication. Closed-book project-style exam.
  • ATPA (SOA Advanced Topics in Predictive Analytics): deeper GLM theory and extensions (penalized regression, GAMs, generalized boosted models), plus the documented modeling write-up.
  • ASTAM: GLMs as one tool among several for severity and aggregate-loss modeling. Less depth than PA on GLM mechanics; more depth on the connection to compound distributions and risk measures.
  • GI-301 (SOA FSA General Insurance track 301): ratemaking with GLMs at production-relevant complexity, including territorial smoothing and tariff structure.
  • CAS Exams 5 and 6 (US, Canada, International): GLMs in personal-lines and commercial ratemaking, primarily in the context of regulatory filings and loss-development integration.

Textbook Citations and Further Reading

de Jong and Heller 2008: Piet de Jong and Gillian Z. Heller, Generalized Linear Models for Insurance Data, Cambridge University Press. The standard actuarial GLM textbook. Chapters 4 through 7 cover Poisson, Gamma, and Tweedie applications with real datasets.

McCullagh and Nelder 1989: Peter McCullagh and John Nelder, Generalized Linear Models, 2nd edition, Chapman and Hall / CRC. The foundational theory text. Chapter 2 (exponential family) and Chapter 6 (binary data) are the canonical references.

Klugman, Panjer, and Willmot 2019: Loss Models: From Data to Decisions, 5th edition, Wiley. Chapter 20 covers GLM-style models in the actuarial register used on ASTAM.

Frees 2009: Edward Frees, Regression Modeling with Actuarial and Financial Applications, Cambridge University Press. Covers GLMs alongside other regression methods in an actuarial framing; chapters 11 and 12 are the GLM core.

References And Official Sources