Hellinger distance
{{Short description|Metric used in probability and statistics}}
In probability and statistics, the Hellinger distance (closely related to, although different from, the Bhattacharyya distance) is used to quantify the similarity between two probability distributions. It is a type of f-divergence. The Hellinger distance is defined in terms of the Hellinger integral, which was introduced by Ernst Hellinger in 1909.{{SpringerEOM|title=Hellinger distance|id=h/h046890|first=M.S. |last=Nikulin}}{{Citation
| last = Hellinger
| first = Ernst
| author-link = Ernst Hellinger
| title = Neue Begründung der Theorie quadratischer Formen von unendlichvielen Veränderlichen
| url = http://resolver.sub.uni-goettingen.de/purl?GDZPPN002166941
| year = 1909
| journal = Journal für die reine und angewandte Mathematik
| language = de
| volume = 1909
| issue = 136
| pages = 210–271
| jfm = 40.0393.01
| doi=10.1515/crll.1909.136.210
| s2cid = 121150138
}}
It is sometimes called the Jeffreys distance.{{Cite web |title=Jeffreys distance - Encyclopedia of Mathematics |url=https://encyclopediaofmath.org/wiki/Jeffreys_distance |access-date=2022-05-24 |website=encyclopediaofmath.org |language=en}}{{Cite journal |date=1946-09-24 |title=An invariant form for the prior probability in estimation problems |url=http://dx.doi.org/10.1098/rspa.1946.0056 |journal=Proceedings of the Royal Society of London. Series A. Mathematical and Physical Sciences |volume=186 |issue=1007 |pages=453–461 |doi=10.1098/rspa.1946.0056 |pmid=20998741 |bibcode=1946RSPSA.186..453J |issn=0080-4630|last1=Jeffreys |first1=Harold |s2cid=19490929 |doi-access=free }}
== Definition ==
=== Measure theory ===
To define the Hellinger distance in terms of measure theory, let <math>P</math> and <math>Q</math> denote two probability measures on a measure space <math>\mathcal{X}</math> that are absolutely continuous with respect to an auxiliary measure <math>\lambda</math>. Such a measure always exists, e.g. <math>\lambda = (P + Q)</math>. The square of the Hellinger distance between <math>P</math> and <math>Q</math> is defined as the quantity
:<math>H^2(P,Q) = \frac{1}{2} \int_{\mathcal{X}} \left(\sqrt{p(x)} - \sqrt{q(x)}\right)^2 \, \lambda(dx).</math>
Here, <math>p = \frac{dP}{d\lambda}</math> and <math>q = \frac{dQ}{d\lambda}</math>, i.e. <math>p</math> and <math>q</math> are the Radon–Nikodym derivatives of ''P'' and ''Q'' respectively with respect to <math>\lambda</math>. This definition does not depend on <math>\lambda</math>, i.e. the Hellinger distance between ''P'' and ''Q'' does not change if <math>\lambda</math> is replaced with a different probability measure with respect to which both ''P'' and ''Q'' are absolutely continuous. For compactness, the above formula is often written as
:<math>H^2(P,Q) = \frac{1}{2} \int_{\mathcal{X}} \left(\sqrt{dP} - \sqrt{dQ}\right)^2.</math>
=== Probability theory using Lebesgue measure ===
To define the Hellinger distance in terms of elementary probability theory, we take <math>\lambda</math> to be the Lebesgue measure, so that <math>dP/d\lambda</math> and <math>dQ/d\lambda</math> are simply probability density functions. If we denote the densities as <math>f</math> and <math>g</math>, respectively, the squared Hellinger distance can be expressed as a standard calculus integral
:<math>H^2(f,g) = \frac{1}{2}\int \left(\sqrt{f(x)} - \sqrt{g(x)}\right)^2 \, dx = 1 - \int \sqrt{f(x)\, g(x)} \, dx,</math>
where the second form can be obtained by expanding the square and using the fact that the integral of a probability density over its domain equals 1.
The Hellinger distance <math>H(P, Q)</math> satisfies the property (derivable from the Cauchy–Schwarz inequality)
:<math>0 \le H(P,Q) \le 1.</math>
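As a purely illustrative sketch (not part of the standard treatment), the integral form above can be evaluated by numerical quadrature. The example below assumes NumPy and SciPy, and the two densities are arbitrary choices:
<syntaxhighlight lang="python">
# Minimal sketch: approximate the Hellinger distance between two
# continuous densities by numerically integrating 1 - \int sqrt(f g) dx.
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm

f = norm(loc=0.0, scale=1.0).pdf  # density of N(0, 1), arbitrary example
g = norm(loc=1.0, scale=2.0).pdf  # density of N(1, 4), arbitrary example

# integral of sqrt(f(x) g(x)) over the whole real line
bc, _ = quad(lambda x: np.sqrt(f(x) * g(x)), -np.inf, np.inf)
h = np.sqrt(1.0 - bc)

print(f"H = {h:.6f}")  # lies between 0 and 1, as the bound above requires
</syntaxhighlight>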
=== Discrete distributions ===
For two discrete probability distributions <math>P=(p_1, \ldots, p_k)</math> and <math>Q=(q_1, \ldots, q_k)</math>,
their Hellinger distance is defined as
:<math>H(P, Q) = \frac{1}{\sqrt{2}} \; \sqrt{\sum_{i=1}^k (\sqrt{p_i} - \sqrt{q_i})^2},</math>
which is directly related to the Euclidean norm of the difference of the square root vectors, i.e.
:<math>H(P, Q) = \frac{1}{\sqrt{2}} \; \bigl\|\sqrt{P} - \sqrt{Q} \bigr\|_2 .</math>
Also,
:<math>1 - H^2(P,Q) = \sum_{i=1}^k \sqrt{p_i q_i}.</math>
{{Citation needed|date=September 2024}}
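A minimal sketch of the discrete formula in Python (assuming NumPy; the two example distributions are arbitrary):
<syntaxhighlight lang="python">
# Minimal sketch of the discrete definition: 1/sqrt(2) times the
# Euclidean norm of the difference of the square-root vectors.
import numpy as np

def hellinger(p, q):
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    return np.linalg.norm(np.sqrt(p) - np.sqrt(q)) / np.sqrt(2.0)

p = [0.1, 0.4, 0.5]  # arbitrary example distributions
q = [0.2, 0.2, 0.6]
print(hellinger(p, q))             # about 0.1674
# The identity above: 1 - H^2 equals the sum of sqrt(p_i * q_i)
print(1.0 - hellinger(p, q) ** 2)  # about 0.9720
print(np.sum(np.sqrt(np.asarray(p) * np.asarray(q))))  # same value
</syntaxhighlight>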
== Properties ==
The Hellinger distance forms a bounded metric on the space of probability distributions over a given probability space.
The maximum distance 1 is achieved when P assigns probability zero to every set to which Q assigns a positive probability, and vice versa.
Sometimes the factor <math>1/2</math> in front of the integral is omitted, in which case the Hellinger distance ranges from zero to the square root of two.
The Hellinger distance is related to the Bhattacharyya coefficient <math>BC(P,Q)</math>, as it can be defined as
:<math>H(P,Q) = \sqrt{1 - BC(P,Q)}.</math>
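Both this relation and the maximum-distance property can be checked numerically; a small sketch (assuming NumPy, with arbitrary example vectors):
<syntaxhighlight lang="python">
# Sketch: H(P, Q) = sqrt(1 - BC(P, Q)), and the distance attains its
# maximum value 1 when the supports of P and Q are disjoint.
import numpy as np

def bhattacharyya(p, q):
    return np.sum(np.sqrt(np.asarray(p) * np.asarray(q)))

def hellinger(p, q):
    return np.sqrt(1.0 - bhattacharyya(p, q))

print(hellinger([1.0, 0.0], [0.0, 1.0]))  # 1.0: disjoint supports
print(hellinger([0.5, 0.5], [0.5, 0.5]))  # 0.0: identical distributions
</syntaxhighlight>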
Hellinger distances are used in the theory of sequential and asymptotic statistics.{{cite book |first=Erik |last=Torgerson |year=1991 |chapter=Comparison of Statistical Experiments |volume=36 |title=Encyclopedia of Mathematics |publisher=Cambridge University Press }}{{cite book
|author1=Liese, Friedrich |author2=Miescke, Klaus-J.
| title = Statistical Decision Theory: Estimation, Testing, and Selection
| year = 2008
| publisher = Springer
| isbn = 978-0-387-73193-3
}}
The squared Hellinger distance between two normal distributions <math>P \sim \mathcal{N}(\mu_1,\sigma_1^2)</math> and <math>Q \sim \mathcal{N}(\mu_2,\sigma_2^2)</math> is:
:<math>H^2(P, Q) = 1 - \sqrt{\frac{2\sigma_1\sigma_2}{\sigma_1^2+\sigma_2^2}} \, e^{-\frac{1}{4}\frac{(\mu_1-\mu_2)^2}{\sigma_1^2+\sigma_2^2}}.</math>
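A short sketch checking this closed form against direct numerical integration of the definition (assuming NumPy and SciPy; the parameter values are arbitrary):
<syntaxhighlight lang="python">
# Closed form for two univariate normals, checked against quadrature.
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm

def h2_normal(mu1, s1, mu2, s2):
    return 1.0 - np.sqrt(2.0 * s1 * s2 / (s1 ** 2 + s2 ** 2)) * np.exp(
        -0.25 * (mu1 - mu2) ** 2 / (s1 ** 2 + s2 ** 2))

mu1, s1, mu2, s2 = 0.0, 1.0, 1.0, 2.0  # arbitrary parameters
f, g = norm(mu1, s1).pdf, norm(mu2, s2).pdf
bc, _ = quad(lambda x: np.sqrt(f(x) * g(x)), -np.inf, np.inf)
print(h2_normal(mu1, s1, mu2, s2))  # closed form
print(1.0 - bc)                     # numerical integration, same value
</syntaxhighlight>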
The squared Hellinger distance between two multivariate normal distributions <math>P \sim \mathcal{N}(\mu_1,\Sigma_1)</math> and <math>Q \sim \mathcal{N}(\mu_2,\Sigma_2)</math> is{{cite book |last=Pardo |first=L. |year=2006 |title=Statistical Inference Based on Divergence Measures |location=New York |publisher=Chapman and Hall/CRC |page=51 |isbn=1-58488-600-5 }}
:<math>H^2(P, Q) = 1 - \frac{ \det (\Sigma_1)^{1/4} \det (\Sigma_2)^{1/4}}{ \det \left( \frac{\Sigma_1 + \Sigma_2}{2}\right)^{1/2} }
\exp\left\{-\frac{1}{8}(\mu_1 - \mu_2)^T
\left(\frac{\Sigma_1 + \Sigma_2}{2}\right)^{-1}
(\mu_1 - \mu_2)
\right\}.</math>
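A sketch of the multivariate closed form in NumPy (the means and covariance matrices below are arbitrary examples; the covariances must be symmetric positive-definite):
<syntaxhighlight lang="python">
# Closed form for two multivariate normals N(mu1, Sigma1), N(mu2, Sigma2).
import numpy as np

def h2_mvn(mu1, sigma1, mu2, sigma2):
    avg = (sigma1 + sigma2) / 2.0
    diff = mu1 - mu2
    prefactor = (np.linalg.det(sigma1) ** 0.25
                 * np.linalg.det(sigma2) ** 0.25
                 / np.sqrt(np.linalg.det(avg)))
    exponent = -0.125 * diff @ np.linalg.solve(avg, diff)
    return 1.0 - prefactor * np.exp(exponent)

mu1, mu2 = np.array([0.0, 0.0]), np.array([1.0, 0.0])
sigma1 = np.eye(2)
sigma2 = np.array([[2.0, 0.3],
                   [0.3, 1.0]])
print(h2_mvn(mu1, sigma1, mu2, sigma2))
</syntaxhighlight>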
The squared Hellinger distance between two exponential distributions <math>P \sim \mathrm{Exp}(\alpha)</math> and <math>Q \sim \mathrm{Exp}(\beta)</math> is:
:<math>H^2(P, Q) = 1 - \frac{2\sqrt{\alpha\beta}}{\alpha+\beta}.</math>
The squared Hellinger distance between two Weibull distributions <math>P \sim \mathrm{W}(k,\beta_1)</math> and <math>Q \sim \mathrm{W}(k,\beta_2)</math> (where <math>k</math> is a common shape parameter and <math>\beta_1, \beta_2</math> are the scale parameters respectively) is:
:<math>H^2(P, Q) = 1 - \frac{2(\beta_1\beta_2)^{k/2}}{\beta_1^k+\beta_2^k}.</math>
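Both closed forms are one-liners in code. A sketch (plain Python with NumPy, arbitrary parameters) that also illustrates that the exponential case is the Weibull case with shape <math>k=1</math> and scale <math>1/\alpha</math>:
<syntaxhighlight lang="python">
# Closed forms for the exponential and Weibull cases.
import numpy as np

def h2_exponential(alpha, beta):
    return 1.0 - 2.0 * np.sqrt(alpha * beta) / (alpha + beta)

def h2_weibull(k, beta1, beta2):
    return 1.0 - 2.0 * (beta1 * beta2) ** (k / 2.0) / (beta1 ** k + beta2 ** k)

print(h2_exponential(1.0, 3.0))         # rates 1 and 3
# Exp(rate a) is Weibull with shape k = 1, scale 1/a: same value.
print(h2_weibull(1.0, 1.0, 1.0 / 3.0))
</syntaxhighlight>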
The squared Hellinger distance between two Poisson distributions with rate parameters <math>\alpha</math> and <math>\beta</math>, so that <math>P \sim \mathrm{Poisson}(\alpha)</math> and <math>Q \sim \mathrm{Poisson}(\beta)</math>, is:
:<math>H^2(P,Q) = 1 - e^{-\frac{1}{2}(\sqrt{\alpha} - \sqrt{\beta})^2}.</math>
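A sketch checking the Poisson closed form against a truncated sum of <math>\sqrt{p_i q_i}</math> over the two probability mass functions (assuming NumPy and SciPy; the rate values are arbitrary):
<syntaxhighlight lang="python">
# Poisson closed form, checked against a truncated sum over the pmfs.
import numpy as np
from scipy.stats import poisson

def h2_poisson(alpha, beta):
    return 1.0 - np.exp(-0.5 * (np.sqrt(alpha) - np.sqrt(beta)) ** 2)

alpha, beta = 2.0, 5.0   # arbitrary rates
ks = np.arange(200)      # truncation; the tail contribution is negligible
bc = np.sum(np.sqrt(poisson.pmf(ks, alpha) * poisson.pmf(ks, beta)))
print(h2_poisson(alpha, beta))  # closed form
print(1.0 - bc)                 # truncated sum, same value
</syntaxhighlight>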
The squared Hellinger distance between two beta distributions <math>P \sim \text{Beta}(a_1,b_1)</math> and <math>Q \sim \text{Beta}(a_2,b_2)</math> is:
:<math>H^2(P, Q) = 1 - \frac{B\left(\frac{a_1+a_2}{2}, \frac{b_1+b_2}{2}\right)}{\sqrt{B(a_1,b_1)\, B(a_2,b_2)}},</math>
where <math>B</math> is the beta function.
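A direct transcription of this formula (assuming SciPy's beta function; the parameters are arbitrary):
<syntaxhighlight lang="python">
# Beta-distribution closed form, using scipy.special.beta.
import numpy as np
from scipy.special import beta as beta_fn

def h2_beta(a1, b1, a2, b2):
    num = beta_fn((a1 + a2) / 2.0, (b1 + b2) / 2.0)
    den = np.sqrt(beta_fn(a1, b1) * beta_fn(a2, b2))
    return 1.0 - num / den

print(h2_beta(2.0, 3.0, 4.0, 5.0))  # arbitrary parameters
</syntaxhighlight>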
The squared Hellinger distance between two gamma distributions <math>P \sim \text{Gamma}(a_1,b_1)</math> and <math>Q \sim \text{Gamma}(a_2,b_2)</math> (with shape parameters <math>a_i</math> and rate parameters <math>b_i</math>) is:
:<math>H^2(P, Q) = 1 - \Gamma\left(\frac{a_1+a_2}{2}\right) \left(\frac{b_1+b_2}{2}\right)^{-\frac{a_1+a_2}{2}} \sqrt{\frac{b_1^{a_1}\, b_2^{a_2}}{\Gamma(a_1)\,\Gamma(a_2)}},</math>
where <math>\Gamma</math> is the gamma function.
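A sketch of the gamma closed form (assuming the shape–rate parametrization above and SciPy; it works in log space with <code>gammaln</code>, since the gamma function overflows for large shapes):
<syntaxhighlight lang="python">
# Gamma-distribution closed form (shape a, rate b), computed in log
# space with gammaln for numerical stability at large shapes.
import numpy as np
from scipy.special import gammaln

def h2_gamma(a1, b1, a2, b2):
    log_bc = (gammaln((a1 + a2) / 2.0)
              - (a1 + a2) / 2.0 * np.log((b1 + b2) / 2.0)
              + 0.5 * (a1 * np.log(b1) + a2 * np.log(b2)
                       - gammaln(a1) - gammaln(a2)))
    return 1.0 - np.exp(log_bc)

print(h2_gamma(2.0, 1.0, 3.0, 2.0))  # arbitrary parameters
</syntaxhighlight>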
== Connection with total variation distance ==
The Hellinger distance and the total variation distance (or statistical distance) are related as follows:{{cite web |url=https://www.tcs.tifr.res.in/~prahladh/teaching/2011-12/comm/lectures/l12.pdf |title=Lecture notes on communication complexity |date=September 23, 2011 |first=Prahladh |last=Harsha }}
:<math>H^2(P,Q) \leq \delta(P,Q) \leq \sqrt{2}\,H(P,Q)\,.</math>
The constants in this inequality may change depending on which renormalization is chosen (<math>1/2</math> or <math>1/\sqrt{2}</math>).
These inequalities follow immediately from the inequalities between the 1-norm and the 2-norm.
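A quick numerical check of the two bounds in the discrete case, with <math>\delta(P,Q) = \tfrac{1}{2}\sum_i |p_i - q_i|</math> (assuming NumPy; the example vectors are arbitrary):
<syntaxhighlight lang="python">
# Numerical check of the sandwich H^2 <= delta <= sqrt(2) H, with
# delta(P, Q) = (1/2) sum_i |p_i - q_i| the total variation distance.
import numpy as np

p = np.array([0.1, 0.4, 0.5])  # arbitrary example distributions
q = np.array([0.3, 0.3, 0.4])

h = np.linalg.norm(np.sqrt(p) - np.sqrt(q)) / np.sqrt(2.0)
delta = 0.5 * np.sum(np.abs(p - q))

print(h ** 2 <= delta <= np.sqrt(2.0) * h)  # True
</syntaxhighlight>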
== See also ==
* Statistical distance
* Bhattacharyya distance
* Total variation distance
* f-divergence
== Notes ==
{{reflist}}
== References ==
* {{cite book |author1=Yang, Grace Lo |author1-link=Grace Yang |author2=Le Cam, Lucien M. |title=Asymptotics in Statistics: Some Basic Concepts |publisher=Springer |location=Berlin |year=2000 |isbn=0-387-95036-2 }}
* {{cite book |author=Vaart, A. W. van der |title=Asymptotic Statistics (Cambridge Series in Statistical and Probabilistic Mathematics) |date=19 June 2000 |publisher=Cambridge University Press |location=Cambridge, UK |isbn=0-521-78450-6 }}
* {{cite book |author=Pollard, David E. |title=A user's guide to measure theoretic probability |publisher=Cambridge University Press |location=Cambridge, UK |year=2002 |isbn=0-521-00289-3 }}