Projected normal distribution

{{Infobox probability distribution

| name = Projected normal distribution

| type = density

| notation = $\mathcal{P N}_n(\boldsymbol\mu, \boldsymbol\Sigma)$

| parameters = $\boldsymbol\mu\in\R^n$ (location)
$\boldsymbol\Sigma\in\R^{n \times n}$ (scale)

| support = $\boldsymbol \theta \in [0, \pi]^{n - 2} \times [0, 2 \pi)$

| pdf = complicated, see text

}}

In directional statistics, the projected normal distribution (also known as offset normal distribution, angular normal distribution or angular Gaussian distribution){{sfn|Wang|Gelfand|2013}}{{sfn|Pukkila|Rao|1988}} is a probability distribution over directions that describes the radial projection of a random variable with an n-variate normal distribution onto the unit (n-1)-sphere.

Definition and properties

Given a random variable $\boldsymbol X \in \R^n$ that follows a multivariate normal distribution $\mathcal{N}_n(\boldsymbol\mu,\, \boldsymbol\Sigma)$, the projected normal distribution $\mathcal{PN}_n(\boldsymbol\mu, \boldsymbol\Sigma)$ is the distribution of the random variable $\boldsymbol Y = \frac{\boldsymbol X}{\lVert \boldsymbol X \rVert}$ obtained by projecting $\boldsymbol X$ onto the unit sphere. In the general case, the projected normal distribution can be asymmetric and multimodal. If $\boldsymbol \mu$ is parallel to an eigenvector of $\boldsymbol \Sigma$, the distribution is symmetric.{{sfn|Hernandez-Stumpfhauser|Breidt|van der Woerd|2017|p=115}} The first version of this distribution was introduced by Pukkila and Rao (1988).{{sfn|Pukkila|Rao|1988|p=381}}
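The definition translates directly into a sampler. The following Python sketch is illustrative only: the helper names `cholesky` and `sample_projected_normal` and the example values of $\boldsymbol\mu$ and $\boldsymbol\Sigma$ are assumptions introduced here, not taken from the cited sources. It draws $\boldsymbol X$ using a Cholesky factor of $\boldsymbol\Sigma$ and then divides by the norm.

```python
import math
import random

def cholesky(sigma):
    """Lower-triangular L with L L^T = sigma (sigma symmetric positive definite)."""
    n = len(sigma)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(sigma[i][i] - s)
            else:
                L[i][j] = (sigma[i][j] - s) / L[j][j]
    return L

def sample_projected_normal(mu, sigma, rng):
    """Draw X ~ N(mu, sigma) and return Y = X / ||X||."""
    n = len(mu)
    L = cholesky(sigma)
    z = [rng.gauss(0.0, 1.0) for _ in range(n)]
    # X = mu + L z has mean mu and covariance L L^T = sigma.
    x = [mu[i] + sum(L[i][k] * z[k] for k in range(i + 1)) for i in range(n)]
    norm = math.sqrt(sum(xi * xi for xi in x))
    return [xi / norm for xi in x]

# Hypothetical example parameters (n = 3).
rng = random.Random(0)
mu = [1.0, 0.5, -0.3]
sigma = [[2.0, 0.3, 0.0], [0.3, 1.0, 0.2], [0.0, 0.2, 0.5]]
samples = [sample_projected_normal(mu, sigma, rng) for _ in range(1000)]
```

Every element of `samples` is a unit vector, that is, a point on the unit 2-sphere.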

Density function

The density of the projected normal distribution $\mathcal{P N}_n(\boldsymbol\mu, \boldsymbol\Sigma)$ can be constructed from the density of its generator n-variate normal distribution $\mathcal{N}_n(\boldsymbol\mu, \boldsymbol\Sigma)$ by re-parametrising to n-dimensional spherical coordinates and then integrating over the radial coordinate.

In spherical coordinates with radial component $r \in [0, \infty)$ and angles $\boldsymbol \theta = (\theta_1, \dots, \theta_{n-1}) \in [0, \pi]^{n - 2} \times [0, 2 \pi)$ , a point $\boldsymbol x = (x_1, \dots, x_n) \in \R^n$ can be written as $\boldsymbol x = r \boldsymbol v$ , with $\lVert \boldsymbol v \rVert = 1$ . The joint density becomes

: $p(r, \boldsymbol \theta | \boldsymbol \mu, \boldsymbol \Sigma) = \frac{r^{n-1}}{\sqrt{|\boldsymbol \Sigma|} (2 \pi)^{\frac{n}{2}}} e^{-\frac{1}{2} (r \boldsymbol v - \boldsymbol \mu)^\top \boldsymbol \Sigma^{-1} (r \boldsymbol v - \boldsymbol \mu)}$

and the density of $\mathcal{P N}_n(\boldsymbol\mu, \boldsymbol\Sigma)$ can then be obtained as{{sfn|Hernandez-Stumpfhauser|Breidt|van der Woerd|2017|p=117}}

: $p(\boldsymbol \theta | \boldsymbol \mu, \boldsymbol \Sigma) = \int_0^\infty p(r, \boldsymbol \theta | \boldsymbol \mu, \boldsymbol \Sigma) dr .$

The same density had been previously obtained in Pukkila and Rao (1988, Eq. (2.4)){{sfn|Pukkila|Rao|1988|p=381}} using a different notation.

= Circular distribution =

Parametrising the position on the unit circle in polar coordinates as $\boldsymbol v = (\cos\theta, \sin\theta)$ , the density function can be written with respect to the parameters $\boldsymbol\mu$ and $\boldsymbol\Sigma$ of the initial normal distribution as

: $p(\theta | \boldsymbol\mu, \boldsymbol\Sigma) = \frac{e^{-\frac{1}{2} \boldsymbol \mu^\top \boldsymbol \Sigma^{-1} \boldsymbol \mu}}{2 \pi \sqrt{|\boldsymbol \Sigma|} \, \boldsymbol v^\top \boldsymbol \Sigma^{-1} \boldsymbol v} \left( 1 + T(\theta) \frac{\Phi(T(\theta))}{\phi(T(\theta))} \right) I_{[0, 2\pi)}(\theta)$

where $\phi$ and $\Phi$ are the density and cumulative distribution of a standard normal distribution, $T(\theta) = \frac{\boldsymbol v^\top \boldsymbol \Sigma^{-1} \boldsymbol \mu}{\sqrt{\boldsymbol v^\top \boldsymbol \Sigma^{-1} \boldsymbol v}}$ , and $I$ is the indicator function.{{sfn|Hernandez-Stumpfhauser|Breidt|van der Woerd|2017|p=115}}
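As a sanity check on the circular density, the closed form can be compared with a direct numerical integration of the joint density over $r$. The Python sketch below is a minimal illustration under stated assumptions: all function names, the example parameters, and the integration settings (`r_max`, `steps`) are hypothetical choices made here, and the midpoint rule stands in for whatever quadrature one prefers.

```python
import math

def std_normal_pdf(t):
    return math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)

def std_normal_cdf(t):
    # erfc keeps precision in the lower tail.
    return 0.5 * math.erfc(-t / math.sqrt(2.0))

def inv2(s):
    """Inverse and determinant of a 2x2 matrix."""
    det = s[0][0] * s[1][1] - s[0][1] * s[1][0]
    inv = [[s[1][1] / det, -s[0][1] / det], [-s[1][0] / det, s[0][0] / det]]
    return inv, det

def quad_form(a, m, b):
    """a^T m b for 2-vectors a, b and a 2x2 matrix m."""
    return sum(a[i] * m[i][j] * b[j] for i in range(2) for j in range(2))

def circular_pn_pdf(theta, mu, sigma):
    """Closed-form circular density p(theta | mu, sigma) for theta in [0, 2*pi)."""
    inv, det = inv2(sigma)
    v = (math.cos(theta), math.sin(theta))
    a = quad_form(v, inv, v)                      # v^T Sigma^{-1} v
    t = quad_form(v, inv, mu) / math.sqrt(a)      # T(theta)
    c = math.exp(-0.5 * quad_form(mu, inv, mu)) / (2.0 * math.pi * math.sqrt(det) * a)
    return c * (1.0 + t * std_normal_cdf(t) / std_normal_pdf(t))

def radial_integral_pdf(theta, mu, sigma, r_max=40.0, steps=20000):
    """The same density, by midpoint-rule integration of the joint density over r."""
    inv, det = inv2(sigma)
    v = (math.cos(theta), math.sin(theta))
    h = r_max / steps
    total = 0.0
    for k in range(steps):
        r = (k + 0.5) * h
        d = (r * v[0] - mu[0], r * v[1] - mu[1])
        total += r * math.exp(-0.5 * quad_form(d, inv, d))   # r^{n-1} with n = 2
    return total * h / (2.0 * math.pi * math.sqrt(det))

# Hypothetical example parameters.
mu = (1.0, 0.5)
sigma = [[1.0, 0.3], [0.3, 0.5]]
closed = circular_pn_pdf(1.0, mu, sigma)
numeric = radial_integral_pdf(1.0, mu, sigma)
```

For these parameters the two values should agree to within the quadrature error, and the closed form should integrate to one over $[0, 2\pi)$.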

In the circular case, if the mean vector $\boldsymbol \mu$ is parallel to the eigenvector associated with the largest eigenvalue of the covariance, the distribution is symmetric and has a mode at $\theta = \alpha$ and either a mode or an antimode at $\theta = \alpha + \pi$, where $\alpha$ is the polar angle of $\boldsymbol \mu = (r \cos\alpha, r \sin\alpha)$. If the mean is instead parallel to the eigenvector associated with the smallest eigenvalue, the distribution is also symmetric but has either a mode or an antimode at $\theta = \alpha$ and an antimode at $\theta = \alpha + \pi$.{{sfn|Hernandez-Stumpfhauser|Breidt|van der Woerd|2017|ps=, Supplementary material, p. 1.}}

= Spherical distribution =

Parametrising the position on the unit sphere in spherical coordinates as $\boldsymbol v = (\cos\theta_1 \sin\theta_2, \sin\theta_1 \sin\theta_2, \cos\theta_2)$ where $\boldsymbol \theta = (\theta_1, \theta_2)$ are the azimuth $\theta_1 \in [0, 2\pi)$ and inclination $\theta_2 \in [0, \pi]$ angles respectively, the density function becomes

: $p(\boldsymbol \theta | \boldsymbol\mu, \boldsymbol\Sigma) = \frac{e^{-\frac{1}{2} \boldsymbol \mu^\top \boldsymbol \Sigma^{-1} \boldsymbol \mu}}{\sqrt{|\boldsymbol \Sigma|} \left( 2 \pi \boldsymbol v^\top \boldsymbol \Sigma^{-1} \boldsymbol v \right)^{\frac{3}{2}}} \left(\frac{\Phi(T(\boldsymbol \theta))}{\phi(T(\boldsymbol \theta))} + T(\boldsymbol \theta) \left( 1 + T(\boldsymbol \theta) \frac{\Phi(T(\boldsymbol \theta))}{\phi(T(\boldsymbol \theta))} \right) \right) I_{[0, 2\pi)}(\theta_1) I_{[0, \pi]}(\theta_2)$

where $\phi$ , $\Phi$ , $T$ , and $I$ have the same meaning as the circular case.{{sfn|Hernandez-Stumpfhauser|Breidt|van der Woerd|2017|p=123}}

Angular Central Gaussian Distribution

In the special case $\boldsymbol\mu=\mathbf 0$, the projected normal distribution with $n\ge2$ is known as the angular central Gaussian (ACG) distribution,{{sfn|Tyler|1987}} and its density function can be obtained in closed form as a function of Cartesian coordinates. Let $\mathbf x\sim\mathcal N_n(\mathbf 0, \boldsymbol\Sigma)$ and project radially, $\mathbf v = \lVert\mathbf x\rVert^{-1}\mathbf x$, so that $\mathbf v\in\mathbb S^{n-1}=\{\mathbf z\in\mathbb R^n:\lVert \mathbf z\rVert=1\}$ (the unit hypersphere). We write $\mathbf v\sim\operatorname{ACG}(\boldsymbol\Sigma)$, which, as explained above, has density (with respect to Lebesgue measure pulled back to $\mathbb S^{n-1}$):

: $p_{\text{ACG}}(\mathbf v\mid\boldsymbol\Sigma)
= \int_0^\infty r^{n-1}\mathcal N_n(r\mathbf v\mid\mathbf 0, \boldsymbol\Sigma)\,dr
= \frac{\Gamma(\frac n2)}{2\pi^{\frac n2}}\left|\boldsymbol\Sigma\right|^{-\frac12}(\mathbf v'\boldsymbol\Sigma^{-1}\mathbf v)^{-\frac n2}$

where the integral can be solved by a change of variables and then using the standard definition of the gamma function. Notice that:

For any $k>0$ there is the parameter indeterminacy:

: $p_{\text{ACG}}(\mathbf v\mid k\boldsymbol\Sigma) = p_{\text{ACG}}(\mathbf v\mid\boldsymbol\Sigma)$ .

If $\boldsymbol\Sigma=k\mathbf I_n$, the uniform distribution $\operatorname{ACG}(\mathbf I_n)$ results, with constant density equal to the reciprocal of the surface area of $\mathbb S^{n-1}$:

: $p_\text{ACG}(\mathbf v\mid k\mathbf I_n)=p_\text{uniform}=\frac{\Gamma(\frac n2)}{2\pi^\frac n2}$
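Both properties of the ACG density noted above are easy to confirm numerically. The following Python sketch is a hedged illustration: `solve_with_det` and `acg_pdf` are hypothetical helper names, the example $\boldsymbol\Sigma$ and $\mathbf v$ are arbitrary, and plain Gaussian elimination is used only to keep the example self-contained.

```python
import math

def solve_with_det(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting; return (x, det a)."""
    n = len(a)
    m = [list(row) + [bi] for row, bi in zip(a, b)]
    det = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if pivot != col:
            m[col], m[pivot] = m[pivot], m[col]
            det = -det                      # a row swap flips the sign
        det *= m[col][col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x, det

def acg_pdf(v, sigma):
    """ACG density at the unit vector v for a positive-definite sigma."""
    n = len(v)
    w, det = solve_with_det(sigma, v)             # w = sigma^{-1} v
    q = sum(vi * wi for vi, wi in zip(v, w))      # v' sigma^{-1} v
    const = math.gamma(n / 2.0) / (2.0 * math.pi ** (n / 2.0))
    return const * det ** -0.5 * q ** (-n / 2.0)

# Hypothetical example values (n = 3).
sigma = [[2.0, 0.3, 0.0], [0.3, 1.0, 0.2], [0.0, 0.2, 0.5]]
v = [1.0 / 3.0, 2.0 / 3.0, 2.0 / 3.0]
k = 3.7
scaled = [[k * s for s in row] for row in sigma]
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]

p_sigma = acg_pdf(v, sigma)
p_scaled = acg_pdf(v, scaled)        # should match p_sigma (scale indeterminacy)
p_uniform = acg_pdf(v, identity)     # should be the reciprocal of the sphere's area
```

For $n = 3$ the uniform value is $1/(4\pi)$, the reciprocal of the surface area of the unit 2-sphere.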

=ACG via transformation of normal or uniform variates=

Let $\mathbf T$ be any $n$-by-$n$ invertible matrix such that $\mathbf T\mathbf T'=\boldsymbol\Sigma$. Let $\mathbf u\sim\operatorname{ACG}(\mathbf I_n)$ (uniform) and $s\sim\chi(n)$ (chi distribution) be independent, so that $\mathbf x=s\mathbf{Tu}\sim\mathcal N_n(\mathbf 0, \boldsymbol\Sigma)$ (multivariate normal). Now consider:

: $\mathbf v = \frac{\mathbf{Tu}}{\lVert\mathbf{Tu}\rVert} = \frac{\mathbf x}{\lVert\mathbf x\rVert}\sim\operatorname{ACG}(\boldsymbol\Sigma)$

which shows that the ACG distribution also results from applying, to uniform variates, the normalized linear transform:{{sfn|Tyler|1987}}

: $f_{\mathbf T}(\mathbf u)=\frac{\mathbf{Tu}}{\lVert\mathbf{Tu}\rVert}$

Some further explanation of these two ways to obtain $\mathbf v\sim\operatorname{ACG}(\boldsymbol\Sigma)$ may be helpful:

If we start with $\mathbf x\in\mathbb R^n$, sampled from a multivariate normal, we can project radially onto $\mathbb S^{n-1}$ to obtain ACG variates. To derive the ACG density, we first do a change of variables, $\mathbf x\mapsto(r,\mathbf v)$, which is still an $n$-dimensional representation. This transformation induces the differential volume change factor $r^{n-1}$, which is proportional to volume in the $(n-1)$-dimensional tangent space perpendicular to $\mathbf x$. Then, to finally obtain the ACG density on the $(n-1)$-dimensional unit sphere, we need to marginalize over $r$.
If we start with $\mathbf u\in\mathbb S^{n-1}$ , sampled from the uniform distribution, we do not need to marginalize, because we are already in $n-1$ dimensions. Instead, to obtain ACG variates (and the associated density), we can directly do the change of variables, $\mathbf v=f_{\mathbf T}(\mathbf u)$ , for which further details are given in the next subsection.

Caveat: when $\boldsymbol\mu$ is nonzero, although $s\mathbf{Tu}+\boldsymbol\mu\sim\mathcal N_n(\boldsymbol\mu,\boldsymbol\Sigma)$, a similar duality does not hold:

: $\frac{\mathbf {Tu} + \boldsymbol\mu}{\lVert\mathbf {Tu} + \boldsymbol\mu\rVert}
\ne\frac{s\mathbf {Tu} + \boldsymbol\mu}{\lVert s\mathbf {Tu} + \boldsymbol\mu\rVert}\sim\mathcal{PN}_n(\boldsymbol{\mu,\Sigma})$

Although we can radially project affine-transformed normal variates to get $\mathcal{PN}_n$ variates, this does not work for uniform variates.
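The two constructions of ACG variates described in this subsection, and the caveat for nonzero $\boldsymbol\mu$, can be illustrated with a short Python sketch. Everything named here (`mat_vec`, `normalize`, `normalized_linear_transform`, the two samplers, and the example values of $\mathbf T$, $\boldsymbol\mu$, $\mathbf u$ and $s$) is a hypothetical choice for illustration. The samplers rely on the standard facts that a normalized standard normal vector is uniform on the sphere and that its norm is chi-distributed.

```python
import math
import random

def mat_vec(t, u):
    return [sum(tij * uj for tij, uj in zip(row, u)) for row in t]

def normalize(x):
    norm = math.sqrt(sum(xi * xi for xi in x))
    return [xi / norm for xi in x]

def normalized_linear_transform(t, u):
    """f_T(u) = T u / ||T u||."""
    return normalize(mat_vec(t, u))

def sample_uniform_sphere(n, rng):
    """Uniform point on S^{n-1}: a normalized standard normal vector."""
    return normalize([rng.gauss(0.0, 1.0) for _ in range(n)])

def sample_chi(n, rng):
    """chi(n) variate: the norm of an n-vector of standard normals."""
    return math.sqrt(sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(n)))

# Fixed, hypothetical values so that the comparison is deterministic.
T = [[1.5, 0.0, 0.0], [0.4, 1.0, 0.0], [-0.2, 0.3, 0.7]]
mu = [0.5, -1.0, 0.25]
u = normalize([1.0, 2.0, 2.0])
s = 2.0
tu = mat_vec(T, u)

# Zero mean: the positive radius s cancels under normalization.
v_from_uniform = normalized_linear_transform(T, u)
v_from_normal = normalize([s * c for c in tu])

# Nonzero mean: s no longer cancels, so the two projections differ.
shifted_without_s = normalize([c + m for c, m in zip(tu, mu)])
shifted_with_s = normalize([s * c + m for c, m in zip(tu, mu)])
```

In the zero-mean case the two results coincide for every $s > 0$, which is why the radius can be dropped; with a nonzero mean they point in different directions.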

=Wider application of the normalized linear transform=

The normalized linear transform, $\mathbf v=f_{\mathbf T}(\mathbf u)$, is a bijection from the unit sphere to itself; the inverse is $\mathbf u=f_{\mathbf T^{-1}}(\mathbf v)$. This transform is of independent interest, as it may be applied as a probabilistic flow on the hypersphere (similar to a normalizing flow) to generalize other (non-uniform) distributions on hyperspheres, for example the von Mises–Fisher distribution. Because the ACG density is available in closed form, the differential volume change induced by this transform can also be recovered in closed form.

For the change of variables, $\mathbf v=f_{\mathbf T}(\mathbf u)$ on the manifold, $\mathbb S^{n-1}$ , the uniform and ACG densities are related as:{{sfn|Sorrenson|Draxler|Rousselot|Hummerich|2024|ps=, Appendix A.}}

: $p_{\text{ACG}}(\mathbf v\mid\boldsymbol\Sigma) = \frac{p_{\text{uniform}}}{R(\mathbf v,\boldsymbol\Sigma)}$

where the (constant) uniform density is $p_{\text{uniform}}=\frac{\Gamma(n/2)}{2\pi^{n/2}}$ and where $R(\mathbf v,\boldsymbol\Sigma)$ is the differential volume change factor from the input to the output of the transformation; specifically, it is given by the absolute value of the determinant of an $(n-1)$ -by- $(n-1)$ matrix:

: $R(\mathbf v,\boldsymbol\Sigma) = \operatorname{abs}\left|\mathbf Q_{\mathbf v}'\mathbf J_{\mathbf u}\mathbf Q_{\mathbf u}\right|$

where $\mathbf J_{\mathbf u}$ is the $n$ -by- $n$ Jacobian matrix of the transformation in Euclidean space, $f_{\mathbf T}:\mathbb R^n\to\mathbb R^n$ , evaluated at $\mathbf u$ . In Euclidean space, the transformation and its Jacobian are non-invertible, but when the domain and co-domain are restricted to $\mathbb S^{n-1}$ , then $f_{\mathbf T}:\mathbb S^{n-1}\to\mathbb S^{n-1}$ is a bijection and the induced differential volume ratio, $R(\mathbf v,\boldsymbol\Sigma)$ is obtained by projecting $\mathbf J_{\mathbf u}$ onto the $(n-1)$ -dimensional tangent spaces at the transformation input and output: $\mathbf Q_{\mathbf u}, \mathbf Q_{\mathbf v}$ are $n$ -by- $(n-1)$ matrices whose orthonormal columns span the tangent spaces. Although the above determinant formula is relatively easy to evaluate numerically on a software platform equipped with linear algebra and automatic differentiation, a simple closed form is hard to derive directly. However, since we already have $p_{\text{ACG}}$ , we can recover:

: $R(\mathbf v, \boldsymbol\Sigma) = \left|\boldsymbol\Sigma\right|^{\frac12}(\mathbf v'\boldsymbol\Sigma^{-1}\mathbf v)^{\frac n2}
= \frac{\operatorname{abs}\left|\mathbf T\right|}{\lVert\mathbf{Tu}\rVert^n}$

where in the final RHS it is understood that $\boldsymbol\Sigma=\mathbf T\mathbf T'$ and $\mathbf u=f_{\mathbf T^{-1}}(\mathbf v)$ .
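For $n = 2$ the closed form for $R$ can be checked without any linear algebra, because the tangent-space determinant reduces to the derivative of the output angle with respect to the input angle. The Python sketch below makes that comparison with a central finite difference. The function names, the example matrix, and the step size `h` are assumptions made for this illustration.

```python
import math

def f_T(t, u):
    """Return (T u / ||T u||, ||T u||) for a 2x2 matrix t and a 2-vector u."""
    w = (t[0][0] * u[0] + t[0][1] * u[1], t[1][0] * u[0] + t[1][1] * u[1])
    norm = math.hypot(w[0], w[1])
    return (w[0] / norm, w[1] / norm), norm

def volume_ratio_closed_form(t, a):
    """R = abs|T| / ||T u||^n with n = 2 and u = (cos a, sin a)."""
    u = (math.cos(a), math.sin(a))
    _, norm_tu = f_T(t, u)
    det_t = t[0][0] * t[1][1] - t[0][1] * t[1][0]
    return abs(det_t) / norm_tu ** 2

def volume_ratio_finite_difference(t, a, h=1e-5):
    """On the circle, R = |d(angle of v) / d(angle of u)|, estimated by central differences."""
    def out_angle(b):
        v, _ = f_T(t, (math.cos(b), math.sin(b)))
        return math.atan2(v[1], v[0])
    d = out_angle(a + h) - out_angle(a - h)
    d = (d + math.pi) % (2.0 * math.pi) - math.pi   # unwrap across the atan2 branch cut
    return abs(d) / (2.0 * h)

# Hypothetical example matrix and input angle.
T = [[1.5, 0.4], [-0.2, 0.7]]
r_closed = volume_ratio_closed_form(T, 0.7)
r_numeric = volume_ratio_finite_difference(T, 0.7)
```

The two estimates should agree up to the finite-difference error. In higher dimensions the same check would need explicit tangent bases $\mathbf Q_{\mathbf u}$ and $\mathbf Q_{\mathbf v}$.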

The normalized linear transform can now be used, for example, to give a closed-form density for a more flexible distribution on the hypersphere that generalizes the von Mises–Fisher distribution. Let $\mathbf x\sim\text{VMF}(\boldsymbol\mu,\kappa)$ and $\mathbf v = f_{\mathbf T}(\mathbf x)$; the resulting density is:

: $p(\mathbf v\mid\boldsymbol\mu,\kappa,\mathbf T) = \frac{p_\text{VMF}\bigl(f_{\mathbf T^{-1}}(\mathbf v)\mid\boldsymbol\mu,\kappa\bigr)}{R(\mathbf v,\mathbf T\mathbf T')}$
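A minimal Python sketch of this transformed density on the 2-sphere ($n = 3$) follows. It is an illustration under stated assumptions: the function names and example parameters are hypothetical, and the von Mises–Fisher normalizing constant $\kappa / (4\pi \sinh\kappa)$ for $n = 3$ is a standard result supplied here rather than taken from this article.

```python
import math

def solve_with_det(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting; return (x, det a)."""
    n = len(a)
    m = [list(row) + [bi] for row, bi in zip(a, b)]
    det = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if pivot != col:
            m[col], m[pivot] = m[pivot], m[col]
            det = -det
        det *= m[col][col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x, det

def vmf_pdf_s2(x, mu, kappa):
    """von Mises-Fisher density on the 2-sphere (n = 3); mu must have unit norm."""
    dot = sum(xi * mi for xi, mi in zip(x, mu))
    return kappa / (4.0 * math.pi * math.sinh(kappa)) * math.exp(kappa * dot)

def transformed_vmf_pdf(v, mu, kappa, t):
    """Density of v = f_T(x) for x ~ VMF(mu, kappa): p_VMF(f_{T^{-1}}(v)) / R."""
    n = len(v)
    w, det_t = solve_with_det(t, v)               # w = T^{-1} v
    norm_w = math.sqrt(sum(wi * wi for wi in w))
    u = [wi / norm_w for wi in w]                 # u = f_{T^{-1}}(v)
    # T u = v / ||T^{-1} v||, so ||T u|| = 1 / norm_w and R = abs|T| / ||T u||^n.
    r = abs(det_t) * norm_w ** n
    return vmf_pdf_s2(u, mu, kappa) / r

# Hypothetical example parameters.
mu = [0.6, 0.0, 0.8]
kappa = 2.0
T = [[1.5, 0.0, 0.0], [0.4, 1.0, 0.0], [-0.2, 0.3, 0.7]]
density_at_pole = transformed_vmf_pdf([0.0, 0.0, 1.0], mu, kappa, T)
```

With $\mathbf T = \mathbf I$ the function reduces to the plain von Mises–Fisher density, and for a general invertible $\mathbf T$ it should still integrate to one over the sphere.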

References

Sources

{{cite journal|title=Pattern recognition based on scale invariant discriminant functions|year=1988|journal=Information Sciences|volume=45|pages=379–389|issue=3|last1=Pukkila|first1=Tarmo M.|last2=Rao|first2=C. Radhakrishna|doi=10.1016/0020-0255(88)90012-6 }}
{{cite journal|title=The General Projected Normal Distribution of Arbitrary Dimension: Modeling and Bayesian Inference|year=2017|journal=Bayesian Analysis|volume=12|pages=113–133|issue=1|last1=Hernandez-Stumpfhauser|first1=Daniel|last2=Breidt|first2=F. Jay|last3=van der Woerd|first3=Mark J.|doi=10.1214/15-BA989 |doi-access=free}}
{{cite journal|title=Directional data analysis under the general projected normal distribution|last1=Wang|first1=Fangpo|last2=Gelfand|first2=Alan E|journal=Statistical Methodology|volume=10|number=1|pages=113–127|year=2013|publisher=Elsevier|doi=10.1016/j.stamet.2012.07.005 |pmid=24046539 |pmc=3773532 }}
{{cite journal|title=Statistical analysis for the angular central Gaussian distribution on the sphere|last1=Tyler|first1=David E|journal=Biometrika|volume=74|number=3|pages=579–589|year=1987|doi=10.2307/2336697}}
{{cite arxiv

| title = Learning Distributions on Manifolds with Free-Form Flows

| first1 = Peter | last1 = Sorrenson

| first2 = Felix | last2 = Draxler

| first3 = Armand | last3 = Rousselot

| first4 = Sander | last4 = Hummerich

| first5 = Ullrich | last5 = Köthe

| eprint = 2312.09852

| year = 2024

| class = cs.LG

}}

Category:Normal distribution

Category:Continuous distributions

Category:Directional statistics
