Scoring algorithm

In statistics, the scoring algorithm, also known as Fisher's scoring,<ref>{{cite journal |first=Nicholas T. |last=Longford |title=A fast scoring algorithm for maximum likelihood estimation in unbalanced mixed models with nested random effects |journal=Biometrika |volume=74 |issue=4 |year=1987 |pages=817–827 |doi=10.1093/biomet/74.4.817 }}</ref> is a form of Newton's method used to solve maximum likelihood equations numerically; it is named after Ronald Fisher.

==Sketch of derivation==

Let <math>Y_1,\ldots,Y_n</math> be independent and identically distributed random variables with twice-differentiable probability density function <math>f(y; \theta)</math>, and suppose we wish to calculate the maximum likelihood estimator (MLE) <math>\theta^*</math> of <math>\theta</math>. First, suppose we have a starting point for our algorithm <math>\theta_0</math>, and consider a Taylor expansion of the score function <math>V(\theta)</math> (the gradient of the log-likelihood) about <math>\theta_0</math>:

:<math>V(\theta) \approx V(\theta_0) - \mathcal{J}(\theta_0)(\theta - \theta_0),</math>

where

:<math>\mathcal{J}(\theta_0) = - \sum_{i=1}^n \left. \nabla \nabla^{\top} \right|_{\theta=\theta_0} \log f(Y_i ; \theta)</math>

is the observed information matrix at <math>\theta_0</math>. Now, setting <math>\theta = \theta^*</math>, using that <math>V(\theta^*) = 0</math> and rearranging gives us:

:<math>\theta^* \approx \theta_{0} + \mathcal{J}^{-1}(\theta_{0})V(\theta_{0}).</math>

We therefore use the algorithm

:<math>\theta_{m+1} = \theta_{m} + \mathcal{J}^{-1}(\theta_{m})V(\theta_{m}),</math>

and under certain regularity conditions, it can be shown that <math>\theta_m \rightarrow \theta^*</math>.
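
As a concrete illustration, the following Python sketch applies this Newton-type iteration to the location parameter of a Cauchy sample, a standard case in which the likelihood equations have no closed-form solution. The function name <code>newton_scoring</code> and the choice of the sample median as the starting point <math>\theta_0</math> are illustrative assumptions, not part of the algorithm itself.

<syntaxhighlight lang="python">
import numpy as np

def newton_scoring(y, theta0, n_iter=25, tol=1e-10):
    # Newton-type scoring for the location of a unit-scale Cauchy sample:
    #   theta_{m+1} = theta_m + J(theta_m)^{-1} V(theta_m)
    theta = theta0
    for _ in range(n_iter):
        u = y - theta
        score = np.sum(2.0 * u / (1.0 + u**2))                    # V(theta)
        obs_info = np.sum((2.0 - 2.0 * u**2) / (1.0 + u**2)**2)   # J(theta)
        step = score / obs_info
        theta = theta + step
        if abs(step) < tol:
            break
    return theta

rng = np.random.default_rng(0)
y = rng.standard_cauchy(200) + 3.0   # simulated data, true location 3.0
# The sample median is a consistent starting point that keeps the
# iteration near the relevant root of the score equation.
print(newton_scoring(y, theta0=np.median(y)))
</syntaxhighlight>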

==Fisher scoring==

In practice, <math>\mathcal{J}(\theta)</math> is usually replaced by <math>\mathcal{I}(\theta)= \mathrm{E}[\mathcal{J}(\theta)]</math>, the Fisher information, thus giving us the Fisher scoring algorithm:

:<math>\theta_{m+1} = \theta_{m} + \mathcal{I}^{-1}(\theta_{m})V(\theta_{m}).</math>

Under some regularity conditions, if <math>\theta_m</math> is a consistent estimator, then <math>\theta_{m+1}</math> (the correction after a single step) is 'optimal' in the sense that its error distribution is asymptotically identical to that of the true maximum likelihood estimate.<ref>{{Citation |last1=Li |first1=Bing |title=Bayesian Inference |date=2019 |url=http://dx.doi.org/10.1007/978-1-4939-9761-9_6 |work=Springer Texts in Statistics |place=New York, NY |publisher=Springer New York |isbn=978-1-4939-9759-6 |access-date=2023-01-03 |last2=Babu |first2=G. Jogesh |doi=10.1007/978-1-4939-9761-9_6 |s2cid=239322258 |at=Theorem 9.4|url-access=subscription }}</ref>
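
Continuing the Cauchy example above (with the same illustrative assumptions), the per-observation Fisher information for a unit-scale Cauchy location parameter is <math>1/2</math> regardless of <math>\theta</math>, so Fisher scoring replaces the data-dependent observed information with the constant <math>n/2</math>. Running the loop below for a single iteration from a consistent starting point, such as the sample median, yields the 'one-step' estimator described by the theorem; this is a minimal sketch, not a definitive implementation.

<syntaxhighlight lang="python">
import numpy as np

def fisher_scoring(y, theta0, n_iter=25, tol=1e-10):
    # Fisher scoring for the Cauchy location model: the observed
    # information J(theta) is replaced by the expected (Fisher)
    # information, which here is the constant I(theta) = n / 2.
    n = len(y)
    theta = theta0
    for _ in range(n_iter):
        u = y - theta
        score = np.sum(2.0 * u / (1.0 + u**2))   # V(theta)
        step = score / (n / 2.0)                 # I(theta)^{-1} V(theta)
        theta = theta + step
        if abs(step) < tol:
            break
    return theta

rng = np.random.default_rng(0)
y = rng.standard_cauchy(200) + 3.0
start = np.median(y)                              # consistent starting estimator
one_step = fisher_scoring(y, start, n_iter=1)     # asymptotically efficient already
full = fisher_scoring(y, start)                   # iterate to convergence
print(one_step, full)
</syntaxhighlight>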

==References==

{{Reflist}}

==Further reading==

* {{cite journal |last1=Jennrich |first1=R. I. |last2=Sampson |first2=P. F. |name-list-style=amp |year=1976 |title=Newton-Raphson and Related Algorithms for Maximum Likelihood Variance Component Estimation |journal=Technometrics |volume=18 |issue=1 |pages=11–17 |doi=10.1080/00401706.1976.10489395 |doi-broken-date=1 November 2024 |jstor=1267911 |url=https://www.tandfonline.com/doi/abs/10.1080/00401706.1976.10489395 }}

{{Optimization algorithms|unconstrained}}

Category:Maximum likelihood estimation