Chi-Squared Goodness-of-Fit Test
The Pearson chi-squared goodness-of-fit test compares observed counts in grouped cells against counts predicted by a fitted distribution. It is the simplest test to set up, the easiest to misuse on small expected cells, and the test whose degrees of freedom must be adjusted for estimated parameters.
- Role: Concept
- Level: Core
- Time: Reference
- Freshness: Stable
Statistic And Null Distribution
Group the data into k mutually exclusive bins with observed counts O_1, ..., O_k summing to n. Under the null hypothesis that the data come from a specified distribution F with probability p_i for bin i, the expected count is E_i = n × p_i.
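The Pearson chi-squared statistic sums the squared gap between observed and expected counts in each bin, scaled by the expected count: Q = Σ_{i=1}^{k} (O_i − E_i)^2 / E_i. Under the null and with large enough expected counts, Q approximately follows a chi-squared distribution. The degrees of freedom depend on whether parameters were estimated from the data.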
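A minimal sketch of the statistic in Python, using only the standard library. The function name pearson_q and the die-roll counts are illustrative assumptions, not part of the source material.

```python
def pearson_q(observed, probs):
    """Pearson statistic: sum over cells of (O_i - E_i)^2 / E_i, with E_i = n * p_i."""
    if len(observed) != len(probs):
        raise ValueError("observed and probs must have the same length")
    n = sum(observed)
    expected = [n * p for p in probs]
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical data: 60 rolls of a die tested against a fair die (E_i = 10 per face).
observed = [8, 9, 12, 11, 10, 10]
probs = [1 / 6] * 6
q = pearson_q(observed, probs)  # (4 + 1 + 4 + 1 + 0 + 0) / 10, i.e. about 1.0
```

Because the null here is fully specified (no parameters estimated), this Q would be compared against a chi-squared distribution with k − 1 = 5 degrees of freedom.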
Degrees-of-Freedom Adjustment
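If the null distribution is fully specified (every parameter pre-set), then df = k − 1. If r parameters were estimated from the same data being tested, then df = k − 1 − r. The adjustment matters because each estimated parameter pulls the expected counts toward the observed counts, so Q tends to be smaller than it would be under a fully specified null and must be judged against a distribution with fewer degrees of freedom.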
Failing to subtract r is the most common ASTAM error on this test. A Pareto fit with two estimated parameters and 10 bins has df = 7, not 9. Using df = 9 would overstate the p-value, because the same Q sits further into the tail of χ²_7 than of χ²_9, and could lead to incorrectly failing to reject a poor fit.
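To see the effect of the adjustment numerically, the sketch below computes the upper-tail probability of a chi-squared distribution for integer degrees of freedom and evaluates the same Q at df = 7 and df = 9. The helper chi2_sf and the value Q = 15.0 are hypothetical choices for illustration. The helper relies on the standard recurrence for the regularized upper incomplete gamma function rather than on any external library.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for chi-squared with integer df >= 1.

    Uses Q(a + 1, z) = Q(a, z) + z**a * exp(-z) / Gamma(a + 1) with z = x / 2,
    starting from the closed forms for df = 1 (a = 0.5) and df = 2 (a = 1).
    """
    if df < 1 or int(df) != df:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    z = x / 2.0
    if df % 2 == 0:
        a, sf = 1.0, math.exp(-z)              # base case df = 2
    else:
        a, sf = 0.5, math.erfc(math.sqrt(z))   # base case df = 1
    while a < df / 2.0:
        sf += z ** a * math.exp(-z) / math.gamma(a + 1.0)
        a += 1.0
    return sf


q = 15.0                         # hypothetical test statistic
p_correct = chi2_sf(q, 10 - 1 - 2)   # df = 7, roughly 0.036
p_wrong = chi2_sf(q, 10 - 1)         # df = 9, roughly 0.091
```

With this hypothetical Q, the correct df = 7 rejects at α = 0.05 while the unadjusted df = 9 does not, which is the direction of the error described above.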
Expected-Cell-Count Rule
The chi-squared approximation is valid only when every expected cell count is large enough. The standard rule of thumb is E_i ≥ 5 in every cell. A common relaxation (often attributed to Cochran) requires E_i ≥ 1 in every cell and allows at most 20% of cells to fall below 5.
If a cell has too small an expected count, merge it with an adjacent cell. Merging reduces k, which reduces df, so the test loses some power but stays valid. The alternative is to use Kolmogorov-Smirnov or Anderson-Darling, which do not require binning.
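The merging step can be sketched as a small routine that pools an offending cell with a neighbour until every expected count meets the threshold. The function name merge_small_cells and the choice to pool with the left neighbour (or the right one for the first cell) are assumptions. Other pooling orders are equally defensible as long as only adjacent cells are combined.

```python
def merge_small_cells(observed, expected, min_expected=5.0):
    """Pool adjacent cells until every expected count is at least min_expected.

    Returns new (observed, expected) lists; the inputs are not modified.
    """
    obs, exp = list(observed), list(expected)
    while len(exp) > 1:
        small = [i for i, e in enumerate(exp) if e < min_expected]
        if not small:
            break
        i = small[-1]                     # rightmost cell below the threshold
        j = i - 1 if i > 0 else i + 1     # neighbour to pool with
        lo, hi = min(i, j), max(i, j)
        obs[lo:hi + 1] = [obs[lo] + obs[hi]]
        exp[lo:hi + 1] = [exp[lo] + exp[hi]]
    return obs, exp


# Counts taken from the Poisson worked example in this section.
obs_merged, exp_merged = merge_small_cells([60, 25, 10, 5], [52.2, 33.9, 11.0, 2.9])
k = len(obs_merged)   # 3 cells remain, so df is computed from k = 3
```

The degrees of freedom must be computed from the number of cells after merging, not before.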
Decision Rule
Reject the null distribution at significance α when Q exceeds the (1 − α) quantile of the chi-squared distribution with df degrees of freedom. Equivalently, report the p-value as 1 − F_{χ²_{df}}(Q) and reject when p < α.
Large Q means observed counts deviate from expected by more than chance would produce. Small Q is consistent with the null and does not prove the null is correct — it only fails to reject it. This is the standard frequentist asymmetry and matters when describing a fit as adequate.
Worked Example: Poisson Fit With Estimated Mean
An auto portfolio reports 100 policy-years with the following claim counts: 0 claims in 60 years, 1 claim in 25 years, 2 claims in 10 years, 3+ claims in 5 years. Sample mean is 0.65. Fitted Poisson probabilities at λ̂ = 0.65 are 0.522, 0.339, 0.110, 0.029 (after pooling 3+).
Expected counts at n = 100 are 52.2, 33.9, 11.0, 2.9. The last cell falls below 5, so merge bins 2 and 3+ to get observed counts 60, 25, 15 and expected 52.2, 33.9, 13.9. Pearson Q = (60−52.2)^2/52.2 + (25−33.9)^2/33.9 + (15−13.9)^2/13.9 = 1.166 + 2.337 + 0.087 = 3.59. df = 3 − 1 − 1 = 1 (because λ was estimated). The 95th percentile of χ²_1 is 3.84, and 3.59 < 3.84, so do not reject the Poisson fit at α = 0.05.
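The calculation above can be reproduced with a short Python sketch that uses only the standard library. It works from unrounded Poisson probabilities, so Q comes out near 3.6 rather than exactly the 3.59 obtained from the rounded figures; the conclusion is unchanged. The p-value line uses the identity P(χ²_1 > q) = erfc(√(q/2)), which applies to one degree of freedom only.

```python
import math

observed = [60, 25, 10, 5]     # policy-years with 0, 1, 2, 3+ claims
n = sum(observed)              # 100 policy-years
lam = 0.65                     # sample mean as reported in the text

# Poisson probabilities for 0, 1, 2 claims; the 3+ cell takes the remainder.
probs = [math.exp(-lam) * lam ** k / math.factorial(k) for k in range(3)]
probs.append(1.0 - sum(probs))
expected = [n * p for p in probs]

# The 3+ expected count is below 5, so pool it with the 2-claim cell.
obs_m = observed[:2] + [observed[2] + observed[3]]
exp_m = expected[:2] + [expected[2] + expected[3]]

q = sum((o - e) ** 2 / e for o, e in zip(obs_m, exp_m))
df = len(obs_m) - 1 - 1        # three cells, one estimated parameter
p_value = math.erfc(math.sqrt(q / 2.0))   # upper tail for df = 1 only

reject = p_value < 0.05        # False: the Poisson fit is not rejected
```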
Worked Example: Severity Fit With Two Estimated Parameters
Group 200 claim amounts into 8 size bins. Under the fitted lognormal, one merge of adjacent bins is needed to bring every expected count to at least 5, leaving k = 7 bins. Two parameters (μ, σ) were estimated, so df = 7 − 1 − 2 = 4.
Suppose Q = 11.2. The 95th percentile of χ²_4 is 9.49, so reject the lognormal fit at α = 0.05. The 99th percentile is 13.28, so the rejection holds at α = 0.05 but not at α = 0.01.
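For an exact p-value rather than a table lookup, an even number of degrees of freedom has a simple closed form: P(χ²_df > x) = exp(−x/2) · Σ_{j=0}^{df/2−1} (x/2)^j / j!. The sketch below applies it to this example. The helper name chi2_sf_even is an assumption, and the formula does not apply to odd df.

```python
import math


def chi2_sf_even(x, df):
    """Upper-tail probability of chi-squared for even df, via the closed form."""
    if df < 2 or df % 2 != 0:
        raise ValueError("closed form requires even df >= 2")
    if x <= 0:
        return 1.0
    z = x / 2.0
    return math.exp(-z) * sum(z ** j / math.factorial(j) for j in range(df // 2))


q = 11.2
df = 7 - 1 - 2                     # k = 7 cells, two estimated parameters
p_value = chi2_sf_even(q, df)      # roughly 0.024

reject_at_5pct = p_value < 0.05    # True
reject_at_1pct = p_value < 0.01    # False
```

A p-value between 0.01 and 0.05 matches the table-based conclusion: the fit is rejected at the 5% level but not at the 1% level.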
When Chi-Squared Is The Wrong Test
Chi-squared throws away information by grouping continuous data into bins. If the data are continuous and not naturally grouped, Kolmogorov-Smirnov or Anderson-Darling extract more signal. Chi-squared also treats every cell alike: each squared deviation is scaled only by its own expected count, so body deviations and tail deviations contribute on the same footing. For actuarial severity fits, where the tail usually matters most, that is often the wrong trade-off.
Choose chi-squared when the data are naturally categorical or already grouped (claim-count cells, age buckets) and when no continuous test is available. Choose K-S or A-D for un-binned continuous data; see /concepts/kolmogorov-smirnov-anderson-darling/.