
{{Short description|Probability distribution}}

{{Infobox probability distribution

| name = Projected normal distribution

| type = density

| notation = \mathcal{P N}_n(\boldsymbol\mu, \boldsymbol\Sigma)

| parameters = \boldsymbol\mu\in\R^n (location)
\boldsymbol\Sigma\in\R^{n \times n} (scale)

| support = \boldsymbol \theta \in [0, \pi]^{n - 2} \times [0, 2 \pi)

| pdf = complicated, see text

}}

In directional statistics, the projected normal distribution (also known as the offset normal distribution, angular normal distribution or angular Gaussian distribution){{sfn|Wang|Gelfand|2013}}{{sfn|Pukkila|Rao|1988}} is a probability distribution over directions, obtained by radially projecting a random variable with an n-variate normal distribution onto the unit (n-1)-sphere.

Definition and properties

Given a random variable \boldsymbol X \in \R^n that follows a multivariate normal distribution \mathcal{N}_n(\boldsymbol\mu,\, \boldsymbol\Sigma), the projected normal distribution \mathcal{PN}_n(\boldsymbol\mu, \boldsymbol\Sigma) is the distribution of the random variable \boldsymbol Y = \frac{\boldsymbol X}{\lVert \boldsymbol X \rVert} obtained by projecting \boldsymbol X onto the unit sphere. In the general case, the projected normal distribution can be asymmetric and multimodal. If \boldsymbol \mu is parallel to an eigenvector of \boldsymbol \Sigma, the distribution is symmetric.{{sfn|Hernandez-Stumpfhauser|Breidt|van der Woerd|2017|p=115}} The first version of this distribution was introduced by Pukkila and Rao (1988).{{sfn|Pukkila|Rao|1988|p=381}}
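The definition translates directly into a sampling procedure: draw a normal vector and divide it by its norm. The following is a minimal sketch using only the Python standard library. The function name `sample_projected_normal` and the convention of passing a lower-triangular factor `chol` of \boldsymbol\Sigma (so that \boldsymbol\Sigma equals `chol` times its transpose) are assumptions made for illustration and do not come from the cited sources.

```python
import math
import random

def sample_projected_normal(mu, chol, rng):
    """Draw one direction Y = X / ||X|| with X ~ N(mu, chol * chol^T).

    `chol` is a lower-triangular factor of the covariance, given as a list of rows
    (an illustrative convention, not taken from the references).
    """
    n = len(mu)
    z = [rng.gauss(0.0, 1.0) for _ in range(n)]  # z ~ N(0, I)
    # X = mu + chol z; only the lower triangle of chol is used
    x = [mu[i] + sum(chol[i][j] * z[j] for j in range(i + 1)) for i in range(n)]
    norm = math.sqrt(sum(xi * xi for xi in x))
    return [xi / norm for xi in x]

rng = random.Random(0)                    # fixed seed for reproducibility
mu = [1.0, 0.5]                           # example location
chol = [[1.0, 0.0], [0.3, 0.8]]           # example factor of the scale matrix
samples = [sample_projected_normal(mu, chol, rng) for _ in range(1000)]
```

Every element of `samples` is a point on the unit circle; only its direction carries information.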

Density function

The density of the projected normal distribution \mathcal{P N}_n(\boldsymbol\mu, \boldsymbol\Sigma) can be constructed from the density of its generator n-variate normal distribution \mathcal{N}_n(\boldsymbol\mu, \boldsymbol\Sigma) by re-parametrising to n-dimensional spherical coordinates and then integrating over the radial coordinate.

In spherical coordinates with radial component r \in [0, \infty) and angles \boldsymbol \theta = (\theta_1, \dots, \theta_{n-1}) \in [0, \pi]^{n - 2} \times [0, 2 \pi), a point \boldsymbol x = (x_1, \dots, x_n) \in \R^n can be written as \boldsymbol x = r \boldsymbol v, with \lVert \boldsymbol v \rVert = 1. The joint density becomes

:

p(r, \boldsymbol \theta | \boldsymbol \mu, \boldsymbol \Sigma) = \frac{r^{n-1}}{\sqrt{|\boldsymbol \Sigma|} (2 \pi)^{\frac{n}{2}}} e^{-\frac{1}{2} (r \boldsymbol v - \boldsymbol \mu)^\top \boldsymbol \Sigma^{-1} (r \boldsymbol v - \boldsymbol \mu)}

and the density of \mathcal{P N}_n(\boldsymbol\mu, \boldsymbol\Sigma) can then be obtained as{{sfn|Hernandez-Stumpfhauser|Breidt|van der Woerd|2017|p=117}}

:

p(\boldsymbol \theta | \boldsymbol \mu, \boldsymbol \Sigma) = \int_0^\infty p(r, \boldsymbol \theta | \boldsymbol \mu, \boldsymbol \Sigma) dr .

The same density had been previously obtained in Pukkila and Rao (1988, Eq. (2.4)){{sfn|Pukkila|Rao|1988|p=381}} using a different notation.
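The radial integral can also be approximated numerically, which gives a simple sanity check on the construction. The sketch below assumes n = 2, truncates the integral at a hypothetical cut-off `r_max`, and uses a midpoint rule. The function names, the example parameters and the truncation are illustrative choices, not part of the cited derivation.

```python
import math

def joint_density(r, v, mu, prec, det_sigma):
    """r^(n-1) times the N_n(mu, Sigma) density at the point r*v.

    `prec` is the inverse of Sigma (list of rows); `det_sigma` is |Sigma|.
    """
    n = len(mu)
    d = [r * v[i] - mu[i] for i in range(n)]
    quad = sum(d[i] * prec[i][j] * d[j] for i in range(n) for j in range(n))
    norm_const = math.sqrt(det_sigma) * (2.0 * math.pi) ** (n / 2.0)
    return r ** (n - 1) * math.exp(-0.5 * quad) / norm_const

def angular_density(v, mu, prec, det_sigma, r_max=12.0, steps=600):
    """Midpoint-rule approximation of the integral over r in [0, r_max]."""
    h = r_max / steps
    return h * sum(joint_density((k + 0.5) * h, v, mu, prec, det_sigma)
                   for k in range(steps))

# Example parameters for n = 2 (chosen only for illustration)
sigma = [[1.0, 0.3], [0.3, 0.5]]
det_sigma = sigma[0][0] * sigma[1][1] - sigma[0][1] * sigma[1][0]
prec = [[sigma[1][1] / det_sigma, -sigma[0][1] / det_sigma],
        [-sigma[1][0] / det_sigma, sigma[0][0] / det_sigma]]
mu = [1.0, 0.5]

# Integrate the angular density over the circle; the result should be close to 1
m = 180
total = (2.0 * math.pi / m) * sum(
    angular_density([math.cos(t), math.sin(t)], mu, prec, det_sigma)
    for t in ((k + 0.5) * 2.0 * math.pi / m for k in range(m)))
```

The cut-off `r_max` must be large compared with the spread of the generating normal distribution; for the example parameters the neglected tail is negligible.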

= Circular distribution =

Parametrising the position on the unit circle in polar coordinates as \boldsymbol v = (\cos\theta, \sin\theta) , the density function can be written with respect to the parameters \boldsymbol\mu and \boldsymbol\Sigma of the initial normal distribution as

:

p(\theta | \boldsymbol\mu, \boldsymbol\Sigma) = \frac{e^{-\frac{1}{2} \boldsymbol \mu^\top \boldsymbol \Sigma^{-1} \boldsymbol \mu}}{2 \pi \sqrt{|\boldsymbol \Sigma|} \, \boldsymbol v^\top \boldsymbol \Sigma^{-1} \boldsymbol v} \left( 1 + T(\theta) \frac{\Phi(T(\theta))}{\phi(T(\theta))} \right) I_{[0, 2\pi)}(\theta)

where \phi and \Phi are the density and cumulative distribution function of a standard normal distribution, T(\theta) = \frac{\boldsymbol v^\top \boldsymbol \Sigma^{-1} \boldsymbol \mu}{\sqrt{\boldsymbol v^\top \boldsymbol \Sigma^{-1} \boldsymbol v}}, and I is the indicator function.{{sfn|Hernandez-Stumpfhauser|Breidt|van der Woerd|2017|p=115}}
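The circular density is straightforward to evaluate. The following sketch implements the formula with the Python standard library; the helper names (`quad_form`, `circular_pn_density`) are hypothetical. It evaluates \Phi/\phi directly, which is adequate for moderate values of T(\theta) but would need a more careful treatment for very large |T(\theta)|, where \phi underflows.

```python
import math

def std_normal_pdf(t):
    return math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)

def std_normal_cdf(t):
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))

def quad_form(a, m, b):
    """Bilinear form a' m b for small dense matrices given as lists of rows."""
    return sum(a[i] * m[i][j] * b[j] for i in range(len(a)) for j in range(len(b)))

def circular_pn_density(theta, mu, sigma):
    """Closed-form density of PN_2(mu, sigma) at an angle theta in [0, 2*pi)."""
    det = sigma[0][0] * sigma[1][1] - sigma[0][1] * sigma[1][0]
    prec = [[sigma[1][1] / det, -sigma[0][1] / det],
            [-sigma[1][0] / det, sigma[0][0] / det]]   # inverse of sigma
    v = [math.cos(theta), math.sin(theta)]
    a = quad_form(v, prec, v)                          # v' Sigma^-1 v
    t = quad_form(v, prec, mu) / math.sqrt(a)          # T(theta)
    ratio = std_normal_cdf(t) / std_normal_pdf(t)      # Phi(T) / phi(T)
    front = math.exp(-0.5 * quad_form(mu, prec, mu)) / (
        2.0 * math.pi * math.sqrt(det) * a)
    return front * (1.0 + t * ratio)
```

With zero mean and identity covariance, T(\theta) = 0 and the expression reduces to the uniform density 1/(2\pi) on the circle.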

In the circular case, if the mean vector \boldsymbol \mu is parallel to the eigenvector associated with the largest eigenvalue of the covariance, the distribution is symmetric and has a mode at \theta = \alpha and either a mode or an antimode at \theta = \alpha + \pi, where \alpha is the polar angle of \boldsymbol \mu = (r \cos\alpha, r \sin\alpha). If the mean is instead parallel to the eigenvector associated with the smallest eigenvalue, the distribution is also symmetric but has either a mode or an antimode at \theta = \alpha and an antimode at \theta = \alpha + \pi.{{sfn|Hernandez-Stumpfhauser|Breidt|van der Woerd|2017|ps=, Supplementary material, p. 1.}}

= Spherical distribution =

Parametrising the position on the unit sphere in spherical coordinates as \boldsymbol v = (\cos\theta_1 \sin\theta_2, \sin\theta_1 \sin\theta_2, \cos\theta_2) where \boldsymbol \theta = (\theta_1, \theta_2) are the azimuth \theta_1 \in [0, 2\pi) and inclination \theta_2 \in [0, \pi] angles respectively, the density function becomes

:

p(\boldsymbol \theta | \boldsymbol\mu, \boldsymbol\Sigma) = \frac{e^{-\frac{1}{2} \boldsymbol \mu^\top \boldsymbol \Sigma^{-1} \boldsymbol \mu}}{\sqrt{|\boldsymbol \Sigma|} \left( 2 \pi \boldsymbol v^\top \boldsymbol \Sigma^{-1} \boldsymbol v \right)^{\frac{3}{2}}} \left(\frac{\Phi(T(\boldsymbol \theta))}{\phi(T(\boldsymbol \theta))} + T(\boldsymbol \theta) \left( 1 + T(\boldsymbol \theta) \frac{\Phi(T(\boldsymbol \theta))}{\phi(T(\boldsymbol \theta))} \right) \right) I_{[0, 2\pi)}(\theta_1) I_{[0, \pi]}(\theta_2)

where \phi, \Phi, T, and I have the same meaning as in the circular case.{{sfn|Hernandez-Stumpfhauser|Breidt|van der Woerd|2017|p=123}}
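A minimal sketch of the spherical density follows. To avoid a 3-by-3 matrix inversion in plain Python, the hypothetical function `spherical_pn_density` takes the precision matrix \boldsymbol\Sigma^{-1} and the determinant |\boldsymbol\Sigma| as inputs; this interface is an assumption of the sketch. With \boldsymbol\mu = \mathbf 0 and \boldsymbol\Sigma = \mathbf I the formula evaluates to 1/(4\pi), the uniform density per unit surface area, so the expression as written is a density with respect to surface area and a numerical integral over the angles needs the usual \sin\theta_2 area factor.

```python
import math

def std_normal_pdf(t):
    return math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)

def std_normal_cdf(t):
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))

def quad_form(a, m, b):
    """Bilinear form a' m b for small dense matrices given as lists of rows."""
    return sum(a[i] * m[i][j] * b[j] for i in range(len(a)) for j in range(len(b)))

def spherical_pn_density(theta1, theta2, mu, prec, det_sigma):
    """Closed-form PN_3 density at azimuth theta1 and inclination theta2.

    `prec` is Sigma^-1 and `det_sigma` is |Sigma| (interface chosen for this sketch).
    """
    v = [math.cos(theta1) * math.sin(theta2),
         math.sin(theta1) * math.sin(theta2),
         math.cos(theta2)]
    a = quad_form(v, prec, v)                      # v' Sigma^-1 v
    t = quad_form(v, prec, mu) / math.sqrt(a)      # T(theta)
    ratio = std_normal_cdf(t) / std_normal_pdf(t)  # Phi(T) / phi(T)
    front = math.exp(-0.5 * quad_form(mu, prec, mu)) / (
        math.sqrt(det_sigma) * (2.0 * math.pi * a) ** 1.5)
    return front * (ratio + t * (1.0 + t * ratio))

identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
uniform_value = spherical_pn_density(0.3, 1.1, [0.0, 0.0, 0.0], identity, 1.0)
```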

Angular central Gaussian distribution

In the special case \boldsymbol\mu=\mathbf 0, the projected normal distribution with n\ge2 is known as the angular central Gaussian (ACG) distribution,{{sfn|Tyler|1987}} and its density function can be obtained in closed form as a function of Cartesian coordinates. Let \mathbf x\sim\mathcal N_n(\mathbf 0, \boldsymbol\Sigma) and project radially: \mathbf v = \lVert\mathbf x\rVert^{-1}\mathbf x, so that \mathbf v\in\mathbb S^{n-1}=\{\mathbf z\in\mathbb R^n:\lVert \mathbf z\rVert=1\} (the unit hypersphere). We write \mathbf v\sim\operatorname{ACG}(\boldsymbol\Sigma), which, as explained above, has density (with respect to Lebesgue measure pulled back to \mathbb S^{n-1}):

:

p_{\text{ACG}}(\mathbf v\mid\boldsymbol\Sigma)

= \int_0^\infty r^{n-1}\mathcal N_n(r\mathbf v\mid\mathbf 0, \boldsymbol\Sigma)\,dr

= \frac{\Gamma(\frac n2)}{2\pi^{\frac n2}}\left|\boldsymbol\Sigma\right|^{-\frac12}(\mathbf v'\boldsymbol\Sigma^{-1}\mathbf v)^{-\frac n2}

where the integral can be solved by a change of variables and then using the standard definition of the gamma function. Notice that:

  • For any k>0 there is the parameter indeterminacy:

:p_{\text{ACG}}(\mathbf v\mid k\boldsymbol\Sigma) = p_{\text{ACG}}(\mathbf v\mid\boldsymbol\Sigma).

  • If \boldsymbol\Sigma=k\mathbf I_n, the uniform distribution \operatorname{ACG}(\mathbf I_n) results, with constant density equal to the reciprocal of the surface area of \mathbb S^{n-1}:

:

p_\text{ACG}(\mathbf v\mid k\mathbf I_n)=p_\text{uniform}=\frac{\Gamma(\frac n2)}{2\pi^\frac n2}
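The closed-form ACG density and the two properties listed above can be checked numerically. The sketch below works for any n and uses a hand-written Cholesky factorisation so that it depends only on the Python standard library. The function names are hypothetical, and \boldsymbol\Sigma is assumed to be symmetric positive definite.

```python
import math

def cholesky(sigma):
    """Lower-triangular L with L L' = sigma (sigma symmetric positive definite)."""
    n = len(sigma)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sigma[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            low[i][j] = math.sqrt(s) if i == j else s / low[j][j]
    return low

def acg_density(v, sigma):
    """ACG density at a unit vector v, with respect to surface area on the sphere."""
    n = len(v)
    low = cholesky(sigma)
    # Forward substitution solves L y = v, so that y'y = v' Sigma^-1 v
    y = []
    for i in range(n):
        y.append((v[i] - sum(low[i][k] * y[k] for k in range(i))) / low[i][i])
    quad = sum(yi * yi for yi in y)
    sqrt_det = math.prod(low[i][i] for i in range(n))   # |Sigma|^(1/2)
    const = math.gamma(n / 2.0) / (2.0 * math.pi ** (n / 2.0))
    return const / sqrt_det * quad ** (-n / 2.0)

v = [1.0 / 3.0, 2.0 / 3.0, 2.0 / 3.0]                   # a unit vector in R^3
sigma = [[2.0, 0.5, 0.0], [0.5, 1.0, 0.2], [0.0, 0.2, 1.5]]
scaled = [[3.0 * entry for entry in row] for row in sigma]
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]

p_sigma = acg_density(v, sigma)
p_scaled = acg_density(v, scaled)      # equals p_sigma up to rounding (scale indeterminacy)
p_uniform = acg_density(v, identity)   # equals Gamma(n/2) / (2 pi^(n/2)) up to rounding
```

Because of the scale indeterminacy, a fitted \boldsymbol\Sigma is only determined up to a positive factor, so implementations typically fix a normalisation such as the trace or the determinant.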

= ACG via transformation of normal or uniform variates =

Let \mathbf T be any n-by-n invertible matrix such that \mathbf T\mathbf T'=\boldsymbol\Sigma. Let \mathbf u\sim\operatorname{ACG}(\mathbf I_n) (uniform) and s\sim\chi(n) (chi distribution), so that: \mathbf x=s\mathbf{Tu}\sim\mathcal N_n(\mathbf 0, \boldsymbol\Sigma) (multivariate normal). Now consider:

:

\mathbf v = \frac{\mathbf{Tu}}{\lVert\mathbf{Tu}\rVert} = \frac{\mathbf x}{\lVert\mathbf x\rVert}\sim\operatorname{ACG}(\boldsymbol\Sigma)

which shows that the ACG distribution also results from applying, to uniform variates, the normalized linear transform:{{sfn|Tyler|1987}}

:f_{\mathbf T}(\mathbf u)=\frac{\mathbf{Tu}}{\lVert\mathbf{Tu}\rVert}

Some further explanation of these two ways to obtain \mathbf v\sim\operatorname{ACG}(\boldsymbol\Sigma) may be helpful:

  • If we start with \mathbf x\in\mathbb R^n, sampled from a multivariate normal, we can project radially onto \mathbb S^{n-1} to obtain ACG variates. To derive the ACG density, we first do a change of variables: \mathbf x\mapsto(r,\mathbf v), which is still an n-dimensional representation, and this transformation induces the differential volume change factor, r^{n-1}, which is proportional to volume in the (n-1)-dimensional tangent space perpendicular to \mathbf x. Then, to finally obtain the ACG density on the (n-1)-dimensional unit sphere, we need to marginalize over r.
  • If we start with \mathbf u\in\mathbb S^{n-1}, sampled from the uniform distribution, we do not need to marginalize, because we are already in n-1 dimensions. Instead, to obtain ACG variates (and the associated density), we can directly do the change of variables, \mathbf v=f_{\mathbf T}(\mathbf u), for which further details are given in the next subsection.
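The equivalence of the two constructions comes down to the radius s cancelling under normalisation. The following sketch illustrates this for n = 3 with the Python standard library. The helper names and the example matrix \mathbf T are assumptions, and the uniform variate is itself generated by normalising a standard normal vector, which is one common way to sample uniformly on the sphere.

```python
import math
import random

def normalize(x):
    norm = math.sqrt(sum(c * c for c in x))
    return [c / norm for c in x]

def matvec(mat, x):
    return [sum(row[j] * x[j] for j in range(len(x))) for row in mat]

def sample_uniform_sphere(n, rng):
    """Uniform point on the unit sphere, by normalising a standard normal vector."""
    return normalize([rng.gauss(0.0, 1.0) for _ in range(n)])

rng = random.Random(1)
T = [[1.0, 0.0, 0.0], [0.4, 0.9, 0.0], [-0.2, 0.3, 1.2]]   # example invertible T

u = sample_uniform_sphere(3, rng)                            # u ~ ACG(I), i.e. uniform
s = math.sqrt(sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(3)))  # s ~ chi(3)
x = [s * c for c in matvec(T, u)]                            # x = s T u ~ N(0, T T')

v_from_normal = normalize(x)                 # radial projection of the normal variate
v_from_uniform = normalize(matvec(T, u))     # normalized linear transform of u
```

The two results agree up to floating-point rounding, since multiplying \mathbf{Tu} by the positive scalar s does not change its direction.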

Caveat: when \boldsymbol\mu is nonzero, although s\mathbf{Tu}+\boldsymbol\mu\sim\mathcal N_n(\boldsymbol\mu,\boldsymbol\Sigma), a similar duality does not hold:

:

\frac{\mathbf {Tu} + \boldsymbol\mu}{\lVert\mathbf {Tu} + \boldsymbol\mu\rVert}

\ne\frac{s\mathbf {Tu} + \boldsymbol\mu}{\lVert s\mathbf {Tu} + \boldsymbol\mu\rVert}\sim\mathcal{PN}_n(\boldsymbol{\mu,\Sigma})

Although we can radially project affine-transformed normal variates to get \mathcal{PN}_n variates, this does not work for uniform variates.

= Wider application of the normalized linear transform =

The normalized linear transform, \mathbf v=f_{\mathbf T}(\mathbf u), is a bijection from the unit sphere to itself; the inverse is \mathbf u=f_{\mathbf T^{-1}}(\mathbf v). This transform is of independent interest, as it may be applied as a probabilistic flow on the hypersphere (similar to a normalizing flow) to generalize other (non-uniform) distributions on hyperspheres, for example the von Mises-Fisher distribution. Because the ACG density is available in closed form, the differential volume change induced by this transform can also be recovered in closed form.

For the change of variables, \mathbf v=f_{\mathbf T}(\mathbf u) on the manifold, \mathbb S^{n-1}, the uniform and ACG densities are related as:{{sfn|Sorrenson|Draxler|Rousselot|Hummerich|2024|ps=, Appendix A.}}

:

p_{\text{ACG}}(\mathbf v\mid\boldsymbol\Sigma) = \frac{p_{\text{uniform}}}{R(\mathbf v,\boldsymbol\Sigma)}

where the (constant) uniform density is p_{\text{uniform}}=\frac{\Gamma(n/2)}{2\pi^{n/2}} and where R(\mathbf v,\boldsymbol\Sigma) is the differential volume change factor from the input to the output of the transformation; specifically, it is given by the absolute value of the determinant of an (n-1)-by-(n-1) matrix:

:

R(\mathbf v,\boldsymbol\Sigma) = \operatorname{abs}\left|\mathbf Q_{\mathbf v}'\mathbf J_{\mathbf u}\mathbf Q_{\mathbf u}\right|

where \mathbf J_{\mathbf u} is the n-by-n Jacobian matrix of the transformation in Euclidean space, f_{\mathbf T}:\mathbb R^n\to\mathbb R^n, evaluated at \mathbf u. In Euclidean space, the transformation and its Jacobian are non-invertible, but when the domain and co-domain are restricted to \mathbb S^{n-1}, f_{\mathbf T}:\mathbb S^{n-1}\to\mathbb S^{n-1} is a bijection, and the induced differential volume ratio, R(\mathbf v,\boldsymbol\Sigma), is obtained by projecting \mathbf J_{\mathbf u} onto the (n-1)-dimensional tangent spaces at the transformation input and output: \mathbf Q_{\mathbf u} and \mathbf Q_{\mathbf v} are n-by-(n-1) matrices whose orthonormal columns span the respective tangent spaces. Although the above determinant formula is relatively easy to evaluate numerically on a software platform equipped with linear algebra and automatic differentiation, a simple closed form is hard to derive directly. However, since we already have p_{\text{ACG}}, we can recover:

:

R(\mathbf v, \boldsymbol\Sigma) = \left|\boldsymbol\Sigma\right|^{\frac12}(\mathbf v'\boldsymbol\Sigma^{-1}\mathbf v)^{\frac n2}

= \frac{\operatorname{abs}\left|\mathbf T\right|}{\lVert\mathbf{Tu}\rVert^n}

where in the final RHS it is understood that \boldsymbol\Sigma=\mathbf T\mathbf T' and \mathbf u=f_{\mathbf T^{-1}}(\mathbf v).
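For n = 2 the tangent spaces are one-dimensional, so the determinant formula reduces to a single scalar and can be compared directly with the closed form. The sketch below does this with the Python standard library. It assumes the Euclidean Jacobian \mathbf J_{\mathbf u} = (\mathbf I - \mathbf v\mathbf v')\mathbf T/\lVert\mathbf{Tu}\rVert, which follows from differentiating \mathbf{Tu}/\lVert\mathbf{Tu}\rVert; the function names and the example matrix are hypothetical.

```python
import math

def matvec(mat, x):
    return [sum(row[j] * x[j] for j in range(len(x))) for row in mat]

def normalized_linear(T, u):
    """Return v = T u / ||T u|| together with the norm ||T u||."""
    w = matvec(T, u)
    norm = math.sqrt(sum(c * c for c in w))
    return [c / norm for c in w], norm

def volume_factor_closed_form(T, u):
    """R = abs|T| / ||T u||^n for n = 2."""
    det = T[0][0] * T[1][1] - T[0][1] * T[1][0]
    _, norm_tu = normalized_linear(T, u)
    return abs(det) / norm_tu ** 2

def volume_factor_tangent(T, u):
    """R = |Q_v' J_u Q_u| for n = 2, where Q_u and Q_v are single unit tangent vectors."""
    v, norm_tu = normalized_linear(T, u)
    q_u = [-u[1], u[0]]          # unit tangent to the circle at u
    q_v = [-v[1], v[0]]          # unit tangent to the circle at v
    # q_v is orthogonal to v, so q_v' (I - v v') T q_u = q_v' T q_u
    t_qu = matvec(T, q_u)
    return abs(q_v[0] * t_qu[0] + q_v[1] * t_qu[1]) / norm_tu

T = [[1.0, 0.5], [0.2, 1.5]]     # example invertible matrix
angles = [0.3 + 0.7 * k for k in range(8)]
pairs = [(volume_factor_closed_form(T, [math.cos(a), math.sin(a)]),
          volume_factor_tangent(T, [math.cos(a), math.sin(a)])) for a in angles]
```

Each pair in `pairs` holds the closed-form and tangent-space values of R at one input direction; the two agree up to rounding.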

The normalized linear transform can now be used, for example, to give a closed-form density for a more flexible distribution on the hypersphere that generalizes the von Mises-Fisher distribution. Let \mathbf x\sim\text{VMF}(\boldsymbol\mu,\kappa) and \mathbf v = f_{\mathbf T}(\mathbf x); the resulting density is:

:

p(\mathbf v\mid\boldsymbol\mu,\kappa,\mathbf T) = \frac{p_\text{VMF}\bigl(f_{\mathbf T^{-1}}(\mathbf v)\mid\boldsymbol\mu,\kappa\bigr)}{R(\mathbf v,\mathbf T\mathbf T')}
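A minimal sketch of this density for n = 3 follows, where the von Mises-Fisher normalising constant has the elementary form \kappa/(4\pi\sinh\kappa). To keep the example within the Python standard library, \mathbf T is assumed to be lower-triangular so that \mathbf T^{-1}\mathbf v can be computed by forward substitution; this restriction and the function names are assumptions of the sketch. The code uses the identity \lVert\mathbf{Tu}\rVert = 1/\lVert\mathbf T^{-1}\mathbf v\rVert for \mathbf u=f_{\mathbf T^{-1}}(\mathbf v), so that R = \operatorname{abs}|\mathbf T|\,\lVert\mathbf T^{-1}\mathbf v\rVert^n.

```python
import math

def vmf_density_s2(x, mu, kappa):
    """von Mises-Fisher density on the unit sphere in R^3, for kappa > 0."""
    c = kappa / (4.0 * math.pi * math.sinh(kappa))
    return c * math.exp(kappa * sum(m * xi for m, xi in zip(mu, x)))

def solve_lower(T, v):
    """Solve T y = v by forward substitution (T lower-triangular, nonzero diagonal)."""
    y = []
    for i in range(len(v)):
        y.append((v[i] - sum(T[i][k] * y[k] for k in range(i))) / T[i][i])
    return y

def transformed_vmf_density(v, mu, kappa, T):
    """Density of v = f_T(x) with x ~ VMF(mu, kappa), for lower-triangular T and n = 3."""
    n = len(v)
    y = solve_lower(T, v)                               # y = T^-1 v
    norm_y = math.sqrt(sum(c * c for c in y))
    u = [c / norm_y for c in y]                         # u = f_{T^-1}(v)
    abs_det = abs(math.prod(T[i][i] for i in range(n))) # abs|T| for triangular T
    r = abs_det * norm_y ** n                           # R = abs|T| / ||T u||^n
    return vmf_density_s2(u, mu, kappa) / r

mu = [0.0, 0.0, 1.0]
kappa = 2.0
T = [[1.0, 0.0, 0.0], [0.4, 0.9, 0.0], [-0.2, 0.3, 1.2]]
v = [1.0 / 3.0, 2.0 / 3.0, 2.0 / 3.0]
density_at_v = transformed_vmf_density(v, mu, kappa, T)
```

When \mathbf T is the identity, R = 1 and the function reduces to the ordinary von Mises-Fisher density.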

References

{{reflist}}

Sources

  • {{cite journal|title=Pattern recognition based on scale invariant discriminant functions|year=1988|journal=Information Sciences|volume=45|pages=379–389|issue=3|last1=Pukkila|first1=Tarmo M.|last2=Rao|first2=C. Radhakrishna|doi=10.1016/0020-0255(88)90012-6 }}
  • {{cite journal|title=The General Projected Normal Distribution of Arbitrary Dimension: Modeling and Bayesian Inference|year=2017|journal=Bayesian Analysis|volume=12|pages=113–133|issue=1|last1=Hernandez-Stumpfhauser|first1=Daniel|last2=Breidt|first2=F. Jay|last3=van der Woerd|first3=Mark J.|doi=10.1214/15-BA989 |doi-access=free}}
  • {{cite journal|title=Directional data analysis under the general projected normal distribution|last1=Wang|first1=Fangpo|last2=Gelfand|first2=Alan E|journal=Statistical Methodology|volume=10|number=1|pages=113–127|year=2013|publisher=Elsevier|doi=10.1016/j.stamet.2012.07.005 |pmid=24046539 |pmc=3773532 }}
  • {{cite journal|title=Statistical analysis for the angular central Gaussian distribution on the sphere|last1=Tyler|first1=David E|journal=Biometrika|volume=74|number=3|pages=579–589|year=1987|doi=10.2307/2336697}}
  • {{cite arxiv

| title = Learning Distributions on Manifolds with Free-Form Flows

| first1 = Peter | last1 = Sorrenson

| first2 = Felix | last2 = Draxler

| first3 = Armand | last3 = Rousselot

| first4 = Sander | last4 = Hummerich

| first5 = Ullrich | last5 = Köthe

| eprint = 2312.09852

| year = 2024

| class = cs.LG

}}

{{DEFAULTSORT:Projected normal distribution}}

Category:Normal distribution

Category:Continuous distributions

Category:Directional statistics