Geometric and Negative Binomial Distributions
The geometric distribution counts failures before the first success; the negative binomial counts failures before the r-th success. The SOA Loss Models parameterization (r, β) is convenient for the gamma-mixed-Poisson identity, which is the cleanest route into overdispersed count models.
- Role: Concept
- Level: Core
- Time: Reference
- Freshness: Stable
Two Parameterizations To Know
Probability courses usually introduce the geometric and negative binomial in their classical forms: X is the trial on which the first (or r-th) success occurs. The SOA Loss Models tables use a re-parameterized form in which N counts the number of failures before the r-th success, and the success probability is rewritten as 1 / (1 + β).
The two forms describe the same experiment but have different supports and PMFs: if X is the trial on which the r-th success occurs, the SOA count is N = X − r. Always confirm which form a problem expects. Klugman's Loss Models uses (r, β) throughout the (a, b, 0) class material, so FAM and ASTAM expect that form; Exam P often uses the classical (p, r) form.
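A quick way to keep the two forms straight is to code both PMFs and check that they agree after the support shift. A minimal sketch in plain Python (the parameter values are illustrative):

```python
from math import comb, isclose

r, p = 3, 0.25                  # illustrative: 3 successes, success prob 0.25
beta = (1 - p) / p              # SOA reparameterization: p = 1 / (1 + beta)

def pmf_classical(k, r, p):
    """P(X = k): X is the trial on which the r-th success occurs, k >= r."""
    return comb(k - 1, r - 1) * p**r * (1 - p)**(k - r)

def pmf_soa(n, r, beta):
    """P(N = n): N is the number of failures before the r-th success, n >= 0."""
    return comb(n + r - 1, n) * (1 + beta)**(-r) * (beta / (1 + beta))**n

# The two PMFs describe the same experiment shifted by r: N = X - r.
for n in range(8):
    assert isclose(pmf_soa(n, r, beta), pmf_classical(n + r, r, p))
```

Note that 1/(1 + β) = p and β/(1 + β) = 1 − p, which is the whole translation between the two forms.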
Geometric Distribution
The geometric is the simplest waiting-time count: how many trials until the first success, or in the SOA form, how many failures occur before the first success. It is the discrete analogue of the exponential distribution and inherits a memoryless property.
In the classical form, the mean is 1 / p and the variance is (1 - p) / p^2. In the SOA form, the mean is β and the variance is β(1 + β). Note that variance exceeds the mean whenever β > 0, which is what makes the geometric, and more generally the negative binomial, useful for overdispersed counts.
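The SOA-form moments and the memoryless property can both be confirmed by brute-force summation of the geometric PMF; a small check with β chosen arbitrarily:

```python
beta = 0.8
p = 1 / (1 + beta)                        # success probability
pmf = lambda n: p * (1 - p)**n            # P(N = n): failures before first success

mean = sum(n * pmf(n) for n in range(2000))
ex2 = sum(n * n * pmf(n) for n in range(2000))
var = ex2 - mean**2

assert abs(mean - beta) < 1e-9                 # mean = beta
assert abs(var - beta * (1 + beta)) < 1e-9     # variance = beta * (1 + beta)

# Memorylessness: P(N >= m + k | N >= m) = P(N >= k).
surv = lambda k: (1 - p)**k                    # P(N >= k)
assert abs(surv(5 + 3) / surv(5) - surv(3)) < 1e-12
```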
Negative Binomial Distribution
The negative binomial counts the number of failures before the r-th success. In the SOA form, r need not be an integer; non-integer r matters in actuarial practice because the gamma-mixed-Poisson identity sets r equal to the gamma shape parameter, which is rarely a whole number.
The negative binomial nests the geometric as r = 1. As r grows large with rβ held fixed, β shrinks to zero and the variance rβ(1 + β) collapses to the mean rβ, so the negative binomial converges to the Poisson; this is the limit of vanishing overdispersion.
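The Poisson limit can be seen numerically. The PMF below is computed through log-gamma so that large and non-integer r are handled; holding rβ = 0.8 fixed and pushing r up shrinks the gap to Poisson(0.8). A sketch:

```python
from math import lgamma, log, exp, factorial

def negbin_pmf(n, r, beta):
    """NegBin(r, beta) PMF via log-gamma; r may be large or non-integer."""
    logp = (lgamma(n + r) - lgamma(r) - lgamma(n + 1)
            - r * log(1 + beta) + n * log(beta / (1 + beta)))
    return exp(logp)

lam = 0.8
for r in (1.0, 10.0, 1e6):
    beta = lam / r                            # hold the mean r * beta fixed
    gap = max(abs(negbin_pmf(k, r, beta)
                  - exp(-lam) * lam**k / factorial(k)) for k in range(10))
    print(f"r = {r:>9}: max pmf gap vs Poisson = {gap:.2e}")
```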
The Gamma-Mixed-Poisson Identity
If the count N given a latent rate Λ is Poisson with mean Λ, and Λ itself is gamma-distributed with shape α and scale θ, then the unconditional distribution of N is negative binomial with r = α and β = θ.
This is the cleanest derivation of overdispersion in count data. Heterogeneity in the underlying claim rate across policies, where Λ has a gamma distribution, produces an unconditional NegBin even though every individual policy follows a Poisson. It is also why credibility-style mixed-effects models for counts so often default to the negative binomial.
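The identity is easy to verify by simulation: draw a gamma rate per policy, then a Poisson count at that rate, and compare the zero-count frequency with the closed-form NegBin value (1 + θ)^{-α}. A sketch with illustrative parameters:

```python
import random
from math import exp

random.seed(2024)
alpha, theta = 2.0, 0.4              # gamma shape and scale for the latent rate

def poisson_draw(lam):
    """Poisson sample by CDF inversion; adequate for small lam."""
    u, k, p = random.random(), 0, exp(-lam)
    cdf = p
    while u > cdf:
        k += 1
        p *= lam / k
        cdf += p
    return k

n_sim = 200_000
zeros = sum(poisson_draw(random.gammavariate(alpha, theta)) == 0
            for _ in range(n_sim))

empirical = zeros / n_sim
closed_form = (1 + theta) ** (-alpha)     # NegBin(r=alpha, beta=theta) at zero
print(empirical, round(closed_form, 4))   # both should sit near 0.5102
```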
Maximum Likelihood Estimation
For the negative binomial with r known and β unknown, the MLE for β is the sample mean divided by r. This is rare in practice; usually both r and β must be estimated.
With both parameters free, the likelihood equations have no closed form. Numerical optimization or the method of moments is used in practice. On ASTAM, the method of moments solution sets the sample mean equal to rβ and the sample variance equal to rβ(1 + β), which yields β̂ = (s^2 − x̄) / x̄ and r̂ = x̄ / β̂ when s^2 > x̄. If s^2 ≤ x̄ the negative binomial is the wrong family and Poisson should be considered instead.
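The method-of-moments recipe is short enough to code directly; a sketch (the function name is mine):

```python
def negbin_mom(xbar, s2):
    """Method-of-moments fit for NegBin(r, beta) from sample mean and variance.

    Requires overdispersion (s2 > xbar); otherwise the moment equations
    have no valid NegBin solution and Poisson should be considered.
    """
    if s2 <= xbar:
        raise ValueError("s2 <= xbar: no overdispersion, consider Poisson")
    beta = (s2 - xbar) / xbar
    r = xbar / beta
    return r, beta

# Round trip: the moments of NegBin(r, beta) are (r*beta, r*beta*(1 + beta)).
r, beta = 1.5, 0.6
r_hat, beta_hat = negbin_mom(r * beta, r * beta * (1 + beta))
assert abs(r_hat - r) < 1e-9 and abs(beta_hat - beta) < 1e-9
```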
Worked Example: Overdispersed Claim Counts
A portfolio reports 200 policy-years with sample mean of 0.45 claims and sample variance of 0.78. Because the variance exceeds the mean, a Poisson fit is inadequate. The method of moments for negative binomial gives β̂ = (0.78 − 0.45) / 0.45 = 0.733 and r̂ = 0.45 / 0.733 ≈ 0.614.
The fitted model has non-integer r̂, which is fine; in the gamma-mixed-Poisson interpretation this says the latent rate is gamma-distributed with shape 0.614 and scale 0.733, so the underlying rate is highly heterogeneous across policies.
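Plugging the portfolio numbers in and checking the implied gamma mixture, a quick verification:

```python
xbar, s2 = 0.45, 0.78            # sample mean and variance per policy-year

beta_hat = (s2 - xbar) / xbar    # 0.7333...
r_hat = xbar / beta_hat          # 0.6136...

# Implied latent-rate gamma: shape r_hat, scale beta_hat. Its mean
# r_hat * beta_hat recovers the observed claim rate, and its variance
# r_hat * beta_hat**2 measures how heterogeneous the rate is.
assert abs(r_hat * beta_hat - xbar) < 1e-12
print(round(beta_hat, 3), round(r_hat, 3), round(r_hat * beta_hat**2, 3))
```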
Worked Example: Probability Of Zero Claims
Under NegBin(r, β), the probability of zero claims is (1 + β)^{-r}. With r = 2 and β = 0.4, P(N = 0) = (1.4)^{-2} ≈ 0.510. Under Poisson with mean rβ = 0.8 this would be e^{-0.8} ≈ 0.449.
The two models agree on the mean but disagree at zero and in the right tail, which is the whole point of choosing between them. ASTAM goodness-of-fit testing (for example, the chi-square test) exists to make that choice once the data have been observed.
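The zero-claim comparison in the example takes two lines to verify; the NegBin places more mass at zero (and in the right tail) than a Poisson with the same mean:

```python
from math import exp

r, beta = 2, 0.4
p0_negbin = (1 + beta) ** (-r)         # (1.4)^(-2) ≈ 0.5102
p0_poisson = exp(-r * beta)            # e^(-0.8)  ≈ 0.4493

assert p0_negbin > p0_poisson          # overdispersion piles mass at zero
print(round(p0_negbin, 3), round(p0_poisson, 3))  # 0.51 0.449
```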