Scaled inverse chi-squared distribution

{{short description|Probability distribution}}

{{Probability distribution|

name =Scaled inverse chi-squared|

type =density|

pdf_image =|

cdf_image =|

parameters =\nu > 0\,
\tau^2 > 0\, |

support =x \in (0, \infty)|

pdf =\frac{(\tau^2\nu/2)^{\nu/2}}{\Gamma(\nu/2)}~

\frac{\exp\left[ \frac{-\nu \tau^2}{2 x}\right]}{x^{1+\nu/2}} |

cdf =\Gamma\left(\frac{\nu}{2},\frac{\tau^2\nu}{2x}\right)

\left/\Gamma\left(\frac{\nu}{2}\right)\right.|

mean =\frac{\nu \tau^2}{\nu-2} for \nu >2\,|

median =|

mode =\frac{\nu \tau^2}{\nu+2}|

variance =\frac{2 \nu^2 \tau^4}{(\nu-2)^2 (\nu-4)} for \nu >4\,|

skewness =\frac{4}{\nu-6}\sqrt{2(\nu-4)} for \nu >6\,|

kurtosis =\frac{12(5\nu-22)}{(\nu-6)(\nu-8)} for \nu >8\,|

entropy =\frac{\nu}{2}

\!+\!\ln\left(\frac{\tau^2\nu}{2}\Gamma\left(\frac{\nu}{2}\right)\right)

\!-\!\left(1\!+\!\frac{\nu}{2}\right)\psi\left(\frac{\nu}{2}\right)|

mgf =\frac{2}{\Gamma(\frac{\nu}{2})}\left(\frac{-\tau^2\nu t}{2}\right)^{\!\!\frac{\nu}{4}}\!\!K_{\frac{\nu}{2}}\left(\sqrt{-2\tau^2\nu t}\right)|

char =\frac{2}{\Gamma(\frac{\nu}{2})}\left(\frac{-i\tau^2\nu t}{2}\right)^{\!\!\frac{\nu}{4}}\!\!K_{\frac{\nu}{2}}\left(\sqrt{-2i\tau^2\nu t}\right)|

}}

The scaled inverse chi-squared distribution \psi \, \mbox{inv-} \chi^2(\nu), where \psi is the scale parameter, equals the univariate inverse Wishart distribution

\mathcal{W}^{-1}(\psi,\nu) with degrees of freedom \nu.

This family of scaled inverse chi-squared distributions is linked to the inverse-chi-squared distribution and to the chi-squared distribution:

If X \sim \psi \, \mbox{inv-} \chi^2(\nu) then X/\psi \sim \mbox{inv-} \chi^2(\nu) as well as \psi/X \sim \chi^2(\nu) and 1/X \sim \psi^{-1}\chi^2(\nu) .

Instead of \psi, however, the scaled inverse chi-squared distribution is most frequently

parametrized by the scale parameter \tau^2 = \psi/\nu, and the distribution \nu \tau^2 \, \mbox{inv-} \chi^2(\nu) is denoted by \mbox{Scale-inv-}\chi^2(\nu, \tau^2).

In terms of \tau^2 the above relations can be written as follows:

If X \sim \mbox{Scale-inv-}\chi^2(\nu, \tau^2) then \frac{X}{\nu \tau^2} \sim \mbox{inv-} \chi^2(\nu) as well as \frac{\nu \tau^2}{X} \sim \chi^2(\nu) and 1/X \sim \frac{1}{\nu \tau^2}\chi^2(\nu) .

This family of scaled inverse chi-squared distributions is a reparametrization of the inverse-gamma distribution.

Specifically, if

:X \sim \psi \, \mbox{inv-} \chi^2(\nu) = \mbox{Scale-inv-}\chi^2(\nu, \tau^2)   then   X \sim \textrm{Inv-Gamma}\left(\frac{\nu}{2}, \frac{\psi}{2}\right) = \textrm{Inv-Gamma}\left(\frac{\nu}{2}, \frac{\nu\tau^2}{2}\right)

Either form may be used to represent the maximum entropy distribution for a fixed first inverse moment (E(1/X)) and first logarithmic moment (E(\ln(X))).
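
As a numerical illustration (not part of the original text), the equivalence between the two parametrizations can be checked with SciPy: the sketch below draws from \mbox{Scale-inv-}\chi^2(\nu, \tau^2) via the chi-squared relation X = \nu\tau^2/\chi^2_\nu and from the matching inverse-gamma form; the parameter values and sample sizes are arbitrary.

<syntaxhighlight lang="python">
# Minimal sketch: Scale-inv-chi2(nu, tau2) == Inv-Gamma(nu/2, nu*tau2/2).
import numpy as np
from scipy import stats

nu, tau2 = 5.0, 2.0                      # example degrees of freedom and scale
rng = np.random.default_rng(0)

# Draw from Scale-inv-chi2(nu, tau2) via the relation X = nu*tau2 / chi2_nu
x_chi2 = nu * tau2 / rng.chisquare(nu, size=100_000)

# Draw from the equivalent inverse-gamma parametrization
x_ig = stats.invgamma(a=nu / 2, scale=nu * tau2 / 2).rvs(size=100_000, random_state=1)

# The two samples should agree in distribution (compare a few quantiles)
print(np.quantile(x_chi2, [0.25, 0.5, 0.75]))
print(np.quantile(x_ig,   [0.25, 0.5, 0.75]))
</syntaxhighlight>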

The scaled inverse chi-squared distribution also has a particular use in Bayesian statistics. Specifically, the scaled inverse chi-squared distribution can be used as a conjugate prior for the variance parameter of a normal distribution.

The same prior in an alternative parametrization is given by

the inverse-gamma distribution.

Characterization

The probability density function of the scaled inverse chi-squared distribution extends over the domain x>0 and is

:

f(x; \nu, \tau^2)=

\frac{(\tau^2\nu/2)^{\nu/2}}{\Gamma(\nu/2)}~

\frac{\exp\left[ \frac{-\nu \tau^2}{2 x}\right]}{x^{1+\nu/2}}

where \nu is the degrees of freedom parameter and \tau^2 is the scale parameter. The cumulative distribution function is

:F(x; \nu, \tau^2)=

\Gamma\left(\frac{\nu}{2},\frac{\tau^2\nu}{2x}\right)

\left/\Gamma\left(\frac{\nu}{2}\right)\right.

:=Q\left(\frac{\nu}{2},\frac{\tau^2\nu}{2x}\right)

where \Gamma(a,x) is the upper incomplete gamma function, \Gamma(x) is the gamma function and Q(a,x) is the regularized upper incomplete gamma function. The characteristic function is

:\varphi(t;\nu,\tau^2)=

:\frac{2}{\Gamma(\frac{\nu}{2})}\left(\frac{-i\tau^2\nu t}{2}\right)^{\!\!\frac{\nu}{4}}\!\!K_{\frac{\nu}{2}}\left(\sqrt{-2i\tau^2\nu t}\right) ,

where K_{\frac{\nu}{2}}(z) is the modified Bessel function of the second kind.
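
For readers who want to verify these expressions numerically, the following sketch (an illustration, not part of the article's sources) evaluates the density from the closed form above and the CDF as the regularized upper incomplete gamma function, and cross-checks both against SciPy's invgamma via the equivalence \mbox{Scale-inv-}\chi^2(\nu, \tau^2) = \textrm{Inv-Gamma}(\nu/2, \nu\tau^2/2); the chosen \nu, \tau^2 and grid are arbitrary.

<syntaxhighlight lang="python">
# Hedged sketch: evaluate the pdf and CDF given in the text and cross-check them.
import numpy as np
from scipy import stats, special

nu, tau2 = 4.0, 1.5
x = np.linspace(0.1, 10, 5)

# pdf from the closed form given in the text
pdf = ((tau2 * nu / 2) ** (nu / 2) / special.gamma(nu / 2)
       * np.exp(-nu * tau2 / (2 * x)) / x ** (1 + nu / 2))

# CDF as the regularized upper incomplete gamma function Q(nu/2, nu*tau2/(2x))
cdf = special.gammaincc(nu / 2, nu * tau2 / (2 * x))

# Cross-check against the equivalent Inv-Gamma(nu/2, nu*tau2/2)
ig = stats.invgamma(a=nu / 2, scale=nu * tau2 / 2)
assert np.allclose(pdf, ig.pdf(x)) and np.allclose(cdf, ig.cdf(x))
</syntaxhighlight>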

Parameter estimation

The maximum likelihood estimate of \tau^2 is

:\tau^2 = n/\sum_{i=1}^n \frac{1}{x_i}.

The maximum likelihood estimate of \frac{\nu}{2} can be found using Newton's method on:

:\ln\left(\frac{\nu}{2}\right) - \psi\left(\frac{\nu}{2}\right) = \frac{1}{n} \sum_{i=1}^n \ln\left(x_i\right) - \ln\left(\tau^2\right) ,

where \psi(x) is the digamma function. An initial estimate can be found by taking the formula for the mean and solving it for \nu. Let \bar{x} = \frac{1}{n}\sum_{i=1}^n x_i be the sample mean. Then an initial estimate for \nu is given by:

:\frac{\nu}{2} = \frac{\bar{x}}{\bar{x} - \tau^2}.
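
A minimal sketch of this fitting procedure is given below, assuming simulated data; it uses SciPy's digamma and trigamma functions, and the stopping tolerance and iteration cap are illustrative choices rather than a reference implementation.

<syntaxhighlight lang="python">
# Sketch of the maximum likelihood fit described above, on simulated data.
import numpy as np
from scipy import special

rng = np.random.default_rng(42)
nu_true, tau2_true = 6.0, 2.5
x = nu_true * tau2_true / rng.chisquare(nu_true, size=2000)   # sample data
n = x.size

# MLE of tau^2 (harmonic mean of the data)
tau2_hat = n / np.sum(1.0 / x)

# Solve ln(a) - psi(a) = mean(ln x) - ln(tau2_hat) for a = nu/2 by Newton's method
rhs = np.mean(np.log(x)) - np.log(tau2_hat)
a = np.mean(x) / (np.mean(x) - tau2_hat)          # initial estimate of nu/2
for _ in range(50):
    g = np.log(a) - special.digamma(a) - rhs
    g_prime = 1.0 / a - special.polygamma(1, a)   # 1/a - trigamma(a)
    step = g / g_prime
    a = max(a - step, 1e-8)                       # keep the iterate positive
    if abs(step) < 1e-10:
        break

print("nu_hat   =", 2 * a)        # should be close to nu_true
print("tau2_hat =", tau2_hat)     # should be close to tau2_true
</syntaxhighlight>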

Bayesian estimation of the variance of a normal distribution

The scaled inverse chi-squared distribution has a second important application, in the Bayesian estimation of the variance of a Normal distribution.

According to Bayes' theorem, the posterior probability distribution for quantities of interest is proportional to the product of a prior distribution for the quantities and a likelihood function:

:p(\sigma^2|D,I) \propto p(\sigma^2|I) \; p(D|\sigma^2)

where D represents the data and I represents any initial information about σ² that we may already have.

The simplest scenario arises if the mean μ is already known; or, alternatively, if it is the conditional distribution of σ² that is sought, for a particular assumed value of μ.

Then the likelihood term L(σ²|D, μ) = p(D|σ², μ) has the familiar form

:\mathcal{L}(\sigma^2|D,\mu) = \frac{1}{\left(\sqrt{2\pi}\sigma\right)^n} \; \exp \left[ -\frac{\sum_i^n(x_i-\mu)^2}{2\sigma^2} \right]

Combining this with the rescaling-invariant prior p(σ²|I) ∝ 1/σ², which can be argued (e.g. following Jeffreys) to be the least informative possible prior for σ² in this problem, gives a combined posterior probability

:p(\sigma^2|D, I, \mu) \propto \frac{1}{\sigma^{n+2}} \; \exp \left[ -\frac{\sum_i^n(x_i-\mu)^2}{2\sigma^2} \right]

This form can be recognised as that of a scaled inverse chi-squared distribution, with parameters ν = n and τ² = s² = (1/n) Σ (x_i − μ)².

Gelman and co-authors remark that the re-appearance of this distribution, previously seen in a sampling context, may seem remarkable; but given the choice of prior "this result is not surprising."{{cite book |first=Andrew |last=Gelman |first2=John B. |last2=Carlin |first3=Hal S. |last3=Stern |first4=David B. |last4=Dunson |first5=Aki |last5=Vehtari |first6=Donald B. |last6=Rubin |display-authors=1 |page=65 |title=Bayesian Data Analysis |edition=Third |publisher=CRC Press |location=Boca Raton |year=2014 |isbn=978-1-4398-4095-5 }}

In particular, the choice of a rescaling-invariant prior for σ² has the result that the probability for the ratio σ²/s² has the same form (independent of the conditioning variable) when conditioned on s² as when conditioned on σ²:

:p(\tfrac{\sigma^2}{s^2}|s^2) = p(\tfrac{\sigma^2}{s^2}|\sigma^2)

In the sampling-theory case, conditioned on σ², the probability distribution for (1/s²) is a scaled inverse chi-squared distribution; and so the probability distribution for σ² conditioned on s², given a scale-agnostic prior, is also a scaled inverse chi-squared distribution.
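
This symmetry can be checked by simulation. The sketch below is illustrative (the sample size, σ², and fixed s² values are arbitrary): it compares the sampling distribution of σ²/s² for fixed σ² with the posterior distribution of σ²/s² for fixed s² under the 1/σ² prior; both should follow Scale-inv-χ²(n, 1).

<syntaxhighlight lang="python">
# Simulation sketch of the identity p(sigma^2/s^2 | s^2) = p(sigma^2/s^2 | sigma^2).
import numpy as np
from scipy import stats

rng = np.random.default_rng(11)
n, mu, sigma2 = 20, 0.0, 4.0
reps = 100_000

# Sampling-theory side: fix sigma^2, simulate datasets, record sigma^2 / s^2
x = rng.normal(mu, np.sqrt(sigma2), size=(reps, n))
s2 = np.mean((x - mu) ** 2, axis=1)
ratio_sampling = sigma2 / s2

# Bayesian side: fix s^2, draw sigma^2 from Scale-inv-chi2(n, s^2), take the ratio
s2_fixed = 4.0                                     # any fixed value of s^2
post = stats.invgamma(a=n / 2, scale=n * s2_fixed / 2)
ratio_posterior = post.rvs(size=reps, random_state=12) / s2_fixed

# Both ratios follow Scale-inv-chi2(n, 1); compare a few quantiles
print(np.quantile(ratio_sampling,  [0.1, 0.5, 0.9]))
print(np.quantile(ratio_posterior, [0.1, 0.5, 0.9]))
</syntaxhighlight>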

= Use as an informative prior =

If more is known about the possible values of σ², a distribution from the scaled inverse chi-squared family, such as Scale-inv-χ²(n₀, s₀²), can be a convenient form to represent a more informative prior for σ², as if from the result of n₀ previous observations (though n₀ need not necessarily be a whole number):

:p(\sigma^2|I^\prime, \mu) \propto \frac{1}{\sigma^{n_0+2}} \; \exp \left[ -\frac{n_0 s_0^2}{2\sigma^2} \right]

Such a prior would lead to the posterior distribution

:p(\sigma^2|D, I^\prime, \mu) \propto \frac{1}{\sigma^{n+n_0+2}} \; \exp \left[ -\frac{ns^2 + n_0 s_0^2}{2\sigma^2} \right]

which is itself a scaled inverse chi-squared distribution, with parameters ν = n + n₀ and τ² = (ns² + n₀s₀²)/(n + n₀). The scaled inverse chi-squared distributions are thus a convenient conjugate prior family for σ² estimation.
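
The conjugate update can be written in a few lines; the sketch below is illustrative only (the function name posterior_variance_params and the example numbers are assumptions, not from the article) and uses the inverse-gamma equivalence so that SciPy can represent the posterior.

<syntaxhighlight lang="python">
# Sketch of the conjugate update for sigma^2 with known mean mu:
# prior Scale-inv-chi2(n0, s0_sq)  ->  posterior Scale-inv-chi2(n0 + n, (n0*s0_sq + n*s_sq)/(n0 + n)).
import numpy as np
from scipy import stats

def posterior_variance_params(data, mu, n0, s0_sq):
    """Return (nu_post, tau2_post) of the scaled inverse chi-squared posterior."""
    n = data.size
    s_sq = np.mean((data - mu) ** 2)          # s^2 of the likelihood term
    nu_post = n0 + n
    tau2_post = (n0 * s0_sq + n * s_sq) / nu_post
    return nu_post, tau2_post

# Example: data with known mean 0, prior "worth" n0 = 4 observations at s0^2 = 1
rng = np.random.default_rng(3)
data = rng.normal(loc=0.0, scale=2.0, size=50)
nu_post, tau2_post = posterior_variance_params(data, mu=0.0, n0=4.0, s0_sq=1.0)

# Posterior is Scale-inv-chi2(nu_post, tau2_post) = Inv-Gamma(nu_post/2, nu_post*tau2_post/2)
posterior = stats.invgamma(a=nu_post / 2, scale=nu_post * tau2_post / 2)
print("posterior mean of sigma^2:", posterior.mean())
</syntaxhighlight>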

= Estimation of variance when mean is unknown =

If the mean is not known, the most uninformative prior that can be taken for it is arguably the translation-invariant prior p(μ|I) ∝ const., which gives the following joint posterior distribution for μ and σ²,

:

\begin{align}

p(\mu, \sigma^2 \mid D, I) & \propto \frac{1}{\sigma^{n+2}} \exp \left[ -\frac{\sum_i^n(x_i-\mu)^2}{2\sigma^2} \right] \\

& = \frac{1}{\sigma^{n+2}} \exp \left[ -\frac{\sum_i^n(x_i-\bar{x})^2}{2\sigma^2} \right] \exp \left[ -\frac{n(\mu -\bar{x})^2}{2\sigma^2} \right]

\end{align}

The marginal posterior distribution for σ² is obtained from the joint posterior distribution by integrating out μ,

:\begin{align}

p(\sigma^2|D, I) \; \propto \; & \frac{1}{\sigma^{n+2}} \; \exp \left[ -\frac{\sum_i^n(x_i-\bar{x})^2}{2\sigma^2} \right] \; \int_{-\infty}^{\infty} \exp \left[ -\frac{n(\mu -\bar{x})^2}{2\sigma^2} \right] d\mu\\

= \; & \frac{1}{\sigma^{n+2}} \; \exp \left[ -\frac{\sum_i^n(x_i-\bar{x})^2}{2\sigma^2} \right] \; \sqrt{2 \pi \sigma^2 / n} \\

\propto \; & (\sigma^2)^{-(n+1)/2} \; \exp \left[ -\frac{(n-1)s^2}{2\sigma^2} \right]

\end{align}

This is again a scaled inverse chi-squared distribution, with parameters \nu = n-1 and \tau^2 = s^2 = \sum_i (x_i - \bar{x})^2/(n-1).
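
As a sketch of how this marginal posterior might be used in practice (the data and numbers below are illustrative assumptions, not from the article), it can be handled through SciPy's invgamma to obtain, for example, an equal-tailed credible interval for σ².

<syntaxhighlight lang="python">
# Sketch: with a flat prior on mu and p(sigma^2) ∝ 1/sigma^2,
# sigma^2 | D is Scale-inv-chi2(n - 1, s^2), with s^2 the usual sample variance.
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
data = rng.normal(loc=10.0, scale=3.0, size=40)

n = data.size
s_sq = np.var(data, ddof=1)                       # s^2 = sum((x - xbar)^2) / (n - 1)

# Scale-inv-chi2(n - 1, s^2) expressed as an inverse-gamma distribution
posterior = stats.invgamma(a=(n - 1) / 2, scale=(n - 1) * s_sq / 2)

# 95% equal-tailed credible interval for sigma^2 (true value here is 9)
print(posterior.ppf([0.025, 0.975]))
</syntaxhighlight>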

Related distributions

  • If X \sim \mbox{Scale-inv-}\chi^2(\nu, \tau^2) then k X \sim \mbox{Scale-inv-}\chi^2(\nu, k \tau^2)\,
  • If X \sim \mbox{inv-}\chi^2(\nu) \, (Inverse-chi-squared distribution) then X \sim \mbox{Scale-inv-}\chi^2(\nu, 1/\nu) \,
  • If X \sim \mbox{Scale-inv-}\chi^2(\nu, \tau^2) then \frac{X}{\tau^2 \nu} \sim \mbox{inv-}\chi^2(\nu) \, (Inverse-chi-squared distribution)
  • If X \sim \mbox{Scale-inv-}\chi^2(\nu, \tau^2) then X \sim \textrm{Inv-Gamma}\left(\frac{\nu}{2}, \frac{\nu\tau^2}{2}\right) (Inverse-gamma distribution)
  • The scaled inverse chi-squared distribution is a special case of the type 5 Pearson distribution.

References

  • {{cite book |first=Andrew |last=Gelman |first2=John B. |last2=Carlin |first3=Hal S. |last3=Stern |first4=David B. |last4=Dunson |first5=Aki |last5=Vehtari |first6=Donald B. |last6=Rubin |display-authors=1 |page=583 |title=Bayesian Data Analysis |edition=Third |publisher=CRC Press |location=Boca Raton |year=2014 |isbn=978-1-4398-4095-5 }}

{{reflist}}

{{ProbDistributions|continuous-semi-infinite}}

{{DEFAULTSORT:Scaled-Inverse-Chi-Squared Distribution}}

Category:Continuous distributions

Category:Exponential family distributions