Maximum Likelihood Estimation
Maximum likelihood estimation chooses the parameter values that make the observed data look most plausible under a model. It is one of the main bridges from actuarial exam statistics into modern inference and machine learning.
- Role: Concept
- Level: Core
- Time: Reference
- Freshness: Stable
Plain-English Definition
Maximum likelihood estimation chooses model parameters by asking a direct question: for which parameter values would the data we actually observed be most likely to appear?
That makes MLE a fitting procedure, not a guarantee that the model is true. You are selecting the best parameter values inside a model family you already decided to use.
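That question can be made concrete with a tiny sketch. Assuming a hypothetical run of 10 coin flips with 7 heads, the likelihood of the observed data is higher at p = 0.7 than at p = 0.5, which is exactly the comparison MLE automates:

```python
# Hypothetical data: 10 coin flips, 7 heads (1) and 3 tails (0).
flips = [1, 1, 1, 1, 1, 1, 1, 0, 0, 0]

def likelihood(p):
    """Probability of the observed flips if the heads-probability is p."""
    out = 1.0
    for x in flips:
        out *= p if x == 1 else (1 - p)
    return out

# MLE asks which candidate parameter makes the observed data most plausible.
print(likelihood(0.5))
print(likelihood(0.7))  # larger: p = 0.7 explains 7-of-10 heads better
```

Maximizing this function over all p in (0, 1) would return p-hat = 0.7, the sample proportion.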
Why Actuaries Use It
Actuaries use MLE whenever a parametric distribution or statistical model has to be fitted to data: claim counts, severity distributions, GLMs, survival-related models, and many estimation problems in SRM and ASTAM-style material.
Its value goes beyond computation: MLE supplies one consistent language for estimation, standard errors, likelihood ratio tests, AIC or BIC, and model comparison.
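As one small illustration of that shared language, AIC is built directly from the maximized log-likelihood. The numbers below are invented for illustration, not from any real fit:

```python
# AIC = 2k - 2 * (maximized log-likelihood); lower is better.
def aic(k, loglik):
    return 2 * k - 2 * loglik

# Hypothetical fits of two models to the same data (values invented):
#   model A: 3 parameters, maximized log-likelihood -120.5
#   model B: 5 parameters, maximized log-likelihood -119.8
aic_a = aic(3, -120.5)  # 247.0
aic_b = aic(5, -119.8)  # 249.6
# Model A wins: the small likelihood gain does not justify two extra parameters.
```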
Worked Example
Suppose monthly claim counts for five similar periods are 2, 1, 3, 0, and 4, and we model them as independent Poisson observations with parameter lambda. Setting the derivative of the Poisson log-likelihood to zero shows that the MLE for lambda is the sample mean.
The sample mean here is (2 + 1 + 3 + 0 + 4) / 5 = 2. So the maximum likelihood estimate is lambda-hat = 2. Interpreted actuarially, the fitted model says the typical count rate is about two claims per period for this segment.
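A quick numerical check of this result: maximizing the Poisson log-likelihood over a fine grid of candidate lambda values (a crude stand-in for the calculus) lands on the sample mean.

```python
import math

counts = [2, 1, 3, 0, 4]  # the worked example's claim counts

def log_likelihood(lam):
    # Poisson log-likelihood; the log(x!) term is constant in lam and dropped.
    return sum(x * math.log(lam) - lam for x in counts)

# Grid search over candidate lambda values from 0.01 to 10.00.
grid = [k / 100 for k in range(1, 1001)]
lam_hat = max(grid, key=log_likelihood)
print(lam_hat)  # 2.0, matching the sample mean
```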
Common Mistakes
One common mistake is confusing the likelihood with a probability distribution in theta. For fixed observed data, the likelihood is a function of the parameter, not a probability law over parameter values.
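One way to see this concretely: integrating the likelihood over the parameter does not give 1, so it is not a density in lambda. A sketch using the worked example's data:

```python
import math

counts = [2, 1, 3, 0, 4]

def likelihood(lam):
    # Joint Poisson probability of the observed data, viewed as a function of lam.
    out = 1.0
    for x in counts:
        out *= math.exp(-lam) * lam ** x / math.factorial(x)
    return out

# Integrate the likelihood over lam in (0, 20] by a simple Riemann sum.
dx = 0.001
total = sum(likelihood(k * dx) * dx for k in range(1, 20001))
print(total)  # far from 1: the likelihood is not a probability law over lambda
```

For each fixed lam, the joint Poisson probabilities do sum to 1 over the data; it is only as a function of the parameter that the normalization fails.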
Another mistake is maximizing mechanically without checking whether the fitted model itself is sensible. A clean MLE inside a bad model is still a bad modeling decision.
Exam Relevance
SRM uses maximum-likelihood ideas in regression, model comparison, and information criteria. ASTAM uses MLE more directly in parametric estimation for frequency and severity distributions. FAM touches the early parametric-estimation side before the advanced short-term material goes deeper.
ML And Statistics Connection
Many familiar machine-learning losses are negative log-likelihoods in disguise. Logistic regression, Poisson regression, and other GLMs all live naturally inside the MLE framework, which is why likelihood-based thinking travels well between actuarial exams and modern statistical modeling.
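The logistic case can be verified in a few lines: the average cross-entropy ("log loss") used to train a classifier is exactly the average Bernoulli negative log-likelihood. The labels and predicted probabilities below are hypothetical:

```python
import math

# Hypothetical labels and model-predicted probabilities of the positive class.
y = [1, 0, 1, 1, 0]
p = [0.8, 0.3, 0.6, 0.9, 0.2]

# Average cross-entropy ("log loss"), the usual ML training objective.
log_loss = -sum(
    yi * math.log(pi) + (1 - yi) * math.log(1 - pi) for yi, pi in zip(y, p)
) / len(y)

# Average Bernoulli negative log-likelihood of the same labels.
nll = -sum(math.log(pi if yi == 1 else 1 - pi) for yi, pi in zip(y, p)) / len(y)

print(math.isclose(log_loss, nll))  # True: the two objectives are identical
```

Minimizing the loss and maximizing the likelihood are therefore the same optimization, just stated in two dialects.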