Chi-Squared Goodness-of-Fit Test

The Pearson chi-squared goodness-of-fit test compares observed counts in grouped cells against counts predicted by a fitted distribution. It is the simplest test to set up, the easiest to misuse on small expected cells, and the test whose degrees of freedom must be adjusted for estimated parameters.

Statistic And Null Distribution

Group the data into k mutually exclusive bins with observed counts O_1, ..., O_k summing to n. Under the null hypothesis that the data come from a specified distribution F with probability p_i for bin i, the expected count is E_i = n × p_i.

The Pearson chi-squared statistic Q sums, over all bins, the squared difference between observed and expected counts divided by the expected count. Under the null and with large enough expected counts, Q approximately follows a chi-squared distribution. The degrees of freedom depend on whether parameters were estimated from the data.

Pearson statistic
$$Q=\sum_{i=1}^{k}\frac{(O_i-E_i)^{2}}{E_i}$$
Degrees of freedom
$$\mathrm{df}=k-1-r,\quad r=\text{number of parameters estimated from the data}$$
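
As a minimal sketch of the two formulas above, the Python below computes Q and df from lists of counts. The function names `pearson_q` and `chi2_df` and the die-roll counts are made up for illustration; they are not part of the source.

```python
def pearson_q(observed, expected):
    """Pearson statistic: sum over cells of (O_i - E_i)^2 / E_i."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same number of cells")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def chi2_df(k, r=0):
    """Degrees of freedom: k cells, minus 1, minus r estimated parameters."""
    df = k - 1 - r
    if df < 1:
        raise ValueError("need at least one degree of freedom")
    return df


# Hypothetical illustration: 60 rolls of a die tested against a fully
# specified fair-die null, so p_i = 1/6, E_i = 60 * (1/6) = 10, and r = 0.
observed = [8, 12, 9, 11, 10, 10]
expected = [60 / 6] * 6
q = pearson_q(observed, expected)
df = chi2_df(k=len(observed), r=0)
print(q, df)
```

Here Q comes out at about 1.0 on 5 degrees of freedom. Passing r explicitly keeps the parameter adjustment visible at the call site instead of burying it inside the statistic.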

Degrees-of-Freedom Adjustment

If the null distribution is fully specified (every parameter pre-set), then df = k − 1. If r parameters were estimated from the same data being tested, then df = k − 1 − r. This adjustment matters because estimated parameters pull the expected counts toward the observed counts, so Q tends to be smaller than it would be against a fully specified null.

Failing to subtract r is the most common ASTAM error on this test. A Pareto fit with two estimated parameters and 10 bins has df = 7, not 9. Using df = 9 raises the critical value and overstates the p-value, so a poor fit can incorrectly escape rejection.
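
To see how the wrong df can flip the decision, the sketch below evaluates the Pareto case at a hypothetical Q = 15. That value is chosen purely for illustration, since the source gives no Q for this fit. The critical values are the standard tabulated 95th percentiles of chi-squared for 7 and 9 degrees of freedom.

```python
# Standard tabulated 95th percentiles of the chi-squared distribution.
CRITICAL_95 = {7: 14.067, 9: 16.919}

k, r = 10, 2            # 10 bins, two estimated Pareto parameters
q = 15.0                # hypothetical test statistic, for illustration only

correct_df = k - 1 - r  # subtracts the two estimated parameters
wrong_df = k - 1        # forgets the estimated parameters

reject_correct = q > CRITICAL_95[correct_df]
reject_wrong = q > CRITICAL_95[wrong_df]
print(correct_df, reject_correct)
print(wrong_df, reject_wrong)
```

With the correct df the fit is rejected at α = 0.05; with the unadjusted df the same Q falls below the critical value and the fit is not rejected.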

Expected-Cell-Count Rule

The chi-squared approximation is reliable only when every expected cell count is large enough. The standard rule of thumb is E_i ≥ 5 in every cell. A looser version, usually attributed to Cochran, requires E_i ≥ 1 in every cell and allows no more than 20% of cells to have E_i < 5.

If a cell has too small an expected count, merge it with an adjacent cell. Merging reduces k, which reduces df, so the test loses some power but stays valid. The alternative is to use Kolmogorov-Smirnov or Anderson-Darling, which do not require binning.
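
One possible way to automate the merge is a greedy left-to-right pooling, sketched below. The helper `merge_small_cells` and its `min_expected` parameter are hypothetical names, and greedy pooling is only one of several reasonable merging strategies. The counts are the ones used in the Poisson worked example on this page.

```python
def merge_small_cells(observed, expected, min_expected=5.0):
    """Pool adjacent cells, left to right, until each pooled cell has
    expected count >= min_expected. A short leftover at the right end
    is folded into the last pooled cell."""
    merged_obs, merged_exp = [], []
    acc_obs, acc_exp, pending = 0, 0.0, False
    for o, e in zip(observed, expected):
        acc_obs += o
        acc_exp += e
        pending = True
        if acc_exp >= min_expected:
            merged_obs.append(acc_obs)
            merged_exp.append(acc_exp)
            acc_obs, acc_exp, pending = 0, 0.0, False
    if pending:
        if merged_exp:
            # Leftover tail is too small on its own: fold it into the last cell.
            merged_obs[-1] += acc_obs
            merged_exp[-1] += acc_exp
        else:
            # Everything together is still below the threshold: one cell.
            merged_obs.append(acc_obs)
            merged_exp.append(acc_exp)
    return merged_obs, merged_exp


observed = [60, 25, 10, 5]
expected = [52.2, 33.9, 11.0, 2.9]
obs_m, exp_m = merge_small_cells(observed, expected)
k = len(obs_m)  # use the post-merge k when computing df
print(obs_m, [round(e, 1) for e in exp_m], k)
```

The point to carry forward is that k in the df formula is the number of cells after merging, not before.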

Decision Rule

Reject the null distribution at significance α when Q exceeds the (1 − α) quantile of the chi-squared distribution with df degrees of freedom. Equivalently, report the p-value as 1 − F_{χ²_{df}}(Q) and reject when p < α.

Large Q means observed counts deviate from expected by more than chance would produce. Small Q is consistent with the null and does not prove the null is correct — it only fails to reject it. This is the standard frequentist asymmetry and matters when describing a fit as adequate.
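
The decision rule can be sketched with the standard library alone. The helper `chi2_sf` below is an illustrative name; it builds the upper-tail probability for integer df from the standard recurrence for the regularized incomplete gamma function. In practice a statistics library such as SciPy (`scipy.stats.chi2.sf`) would normally be used instead. The inputs Q = 3.59 and df = 1 are taken from the Poisson worked example on this page.

```python
import math


def chi2_sf(q, df):
    """Upper-tail probability P(X > q) for chi-squared with integer df >= 1."""
    if df < 1 or q < 0:
        raise ValueError("need df >= 1 and q >= 0")
    h = q / 2.0
    if df % 2 == 1:
        a, tail = 0.5, math.erfc(math.sqrt(h))  # df = 1 base case
    else:
        a, tail = 1.0, math.exp(-h)             # df = 2 base case
    # Each step raises the gamma shape by 1, i.e. adds 2 degrees of freedom.
    while a < df / 2.0:
        tail += h ** a * math.exp(-h) / math.gamma(a + 1.0)
        a += 1.0
    return tail


def reject_null(q, df, alpha=0.05):
    """Return (p_value, reject) for the chi-squared goodness-of-fit test."""
    p_value = chi2_sf(q, df)
    return p_value, p_value < alpha


p_value, reject = reject_null(q=3.59, df=1, alpha=0.05)
print(round(p_value, 3), reject)
```

The p-value comes out at roughly 0.058, so the null is not rejected at α = 0.05, which matches the comparison of Q against the 95th percentile.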

Worked Example: Poisson Fit With Estimated Mean

An auto portfolio reports 100 policy-years with the following claim counts: 0 claims in 60 years, 1 claim in 25 years, 2 claims in 10 years, 3+ claims in 5 years. Sample mean is 0.65. Fitted Poisson probabilities at λ̂ = 0.65 for 0, 1, 2, and 3+ claims are 0.522, 0.339, 0.110, and 0.029, with the 3+ probability taken as the remainder 1 − 0.971.

Expected counts at n = 100 are 52.2, 33.9, 11.0, 2.9. The last cell falls below 5, so merge bins 2 and 3+ to get observed counts 60, 25, 15 and expected 52.2, 33.9, 13.9. Pearson Q = (60−52.2)^2/52.2 + (25−33.9)^2/33.9 + (15−13.9)^2/13.9 = 1.166 + 2.336 + 0.087 = 3.59. df = 3 − 1 − 1 = 1 (because λ was estimated). 95th percentile of χ²_1 is 3.84, so do not reject the Poisson fit at α = 0.05.
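
The arithmetic above can be checked with a short script. This is a sketch under the stated inputs (λ̂ = 0.65 and the four observed counts); the variable names are illustrative. It uses unrounded Poisson probabilities, so the result differs slightly from the hand calculation, which rounds each probability to three decimals.

```python
import math

counts = {0: 60, 1: 25, 2: 10}   # policy-years with exactly j claims
tail_count = 5                   # policy-years with 3 or more claims
n = sum(counts.values()) + tail_count
lam = 0.65                       # fitted mean, as stated in the text

# Poisson probabilities for 0, 1, 2 claims; the 3+ cell takes the remainder.
probs = [math.exp(-lam) * lam ** j / math.factorial(j) for j in range(3)]
probs.append(1.0 - sum(probs))

expected = [n * p for p in probs]
observed = [counts[0], counts[1], counts[2], tail_count]

# The 3+ cell has expected count below 5, so pool it with the 2-claim cell.
observed = observed[:2] + [observed[2] + observed[3]]
expected = expected[:2] + [expected[2] + expected[3]]

q = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(observed) - 1 - 1       # one estimated parameter (lambda)
reject = q > 3.841               # 95th percentile of chi-squared, df = 1
print(round(q, 2), df, reject)
```

With unrounded probabilities Q is about 3.61 rather than 3.59. Both values sit below 3.84, so the conclusion is unchanged.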

Worked Example: Severity Fit With Two Estimated Parameters

Group 200 claim amounts into 8 size bins. Under the fitted lognormal, one bin has an expected count below 5; merging it with a neighbor leaves k = 7 bins, all with expected counts of at least 5. Two parameters (μ, σ) were estimated, so df = 7 − 1 − 2 = 4.

Suppose Q = 11.2. The 95th percentile of χ²_4 is 9.49, so reject the lognormal fit at α = 0.05. The 99th percentile is 13.28, so the rejection holds at α = 0.05 but not at α = 0.01.
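
For even degrees of freedom the chi-squared tail probability has a closed form; for df = 4 it is exp(−Q/2)·(1 + Q/2). The sketch below uses that identity to attach a p-value to this example. The function name `chi2_sf_df4` is illustrative, and the formula applies only to df = 4.

```python
import math


def chi2_sf_df4(q):
    """Upper-tail probability P(X > q) for chi-squared with 4 df (closed form)."""
    half = q / 2.0
    return math.exp(-half) * (1.0 + half)


k_after_merge, r = 7, 2          # 7 bins after merging, (mu, sigma) estimated
df = k_after_merge - 1 - r
q = 11.2
p_value = chi2_sf_df4(q)
print(df, round(p_value, 4))
print(p_value < 0.05, p_value < 0.01)
```

The p-value is about 0.024, which lies between 0.01 and 0.05 and so agrees with the percentile comparison: reject at α = 0.05, do not reject at α = 0.01.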

When Chi-Squared Is The Wrong Test

Chi-squared throws away information by grouping continuous data into bins. If the data are continuous and not naturally grouped, Kolmogorov-Smirnov or Anderson-Darling extract more signal. Chi-squared also treats every cell the same way: a body cell and a tail cell each contribute (O_i − E_i)²/E_i, with no extra emphasis on the tail. For actuarial severity fits, that is often the wrong trade-off.

Choose chi-squared when the data are naturally categorical or already grouped (claim-count cells, age buckets) and when no continuous test is available. Choose K-S or A-D for un-binned continuous data; see /concepts/kolmogorov-smirnov-anderson-darling/.
