Integrated nested Laplace approximations

{{Short description|Bayesian inference method}}

{{Bayesian statistics}}

Integrated nested Laplace approximations (INLA) is a method for approximate Bayesian inference based on Laplace's method.{{cite journal |last1=Rue |first1=Håvard |last2=Martino |first2=Sara |last3=Chopin |first3=Nicolas |title=Approximate Bayesian inference for latent Gaussian models by using integrated nested Laplace approximations |journal=J. R. Statist. Soc. B |date=2009 |volume=71 |issue=2 |pages=319–392 |doi=10.1111/j.1467-9868.2008.00700.x|s2cid=1657669 |hdl=2066/75507 |hdl-access=free }} It is designed for a class of models called latent Gaussian models (LGMs), for which it can be a fast and accurate alternative to Markov chain Monte Carlo methods for computing posterior marginal distributions.{{cite journal |last1=Taylor |first1=Benjamin M. |last2=Diggle |first2=Peter J. |title=INLA or MCMC? A tutorial and comparative evaluation for spatial prediction in log-Gaussian Cox processes |journal=Journal of Statistical Computation and Simulation |date=2014 |volume=84 |issue=10 |pages=2266–2284 |doi=10.1080/00949655.2013.788653|arxiv=1202.1738 |s2cid=88511801 }}{{cite journal |last1=Teng |first1=M. |last2=Nathoo |first2=F. |last3=Johnson |first3=T. D. |title=Bayesian computation for Log-Gaussian Cox processes: a comparative analysis of methods |journal=Journal of Statistical Computation and Simulation |date=2017 |volume=87 |issue=11 |pages=2227–2252 |doi=10.1080/00949655.2017.1326117|pmid=29200537 |pmc=5708893 }}{{cite book |last1=Wang |first1=Xiaofeng |last2=Yue |first2=Yu Ryan |last3=Faraway |first3=Julian J. |title=Bayesian Regression Modeling with INLA |date=2018 |publisher=Chapman and Hall/CRC |isbn=9781498727259}} Because it is relatively fast even for large data sets in certain problems and models, INLA has been a popular inference method in applied statistics, in particular in spatial statistics, ecology, and epidemiology.{{cite book |last1=Blangiardo |first1=Marta |last2=Cameletti |first2=Michela |title=Spatial and Spatio-temporal Bayesian Models with R-INLA |date=2015 |publisher=John Wiley & Sons, Ltd |isbn=9781118326558}}{{cite journal |last1=Opitz |first1=T. |title=Latent Gaussian modeling and INLA: A review with focus on space-time applications |journal=Journal de la Société Française de Statistique |date=2017 |volume=158 |pages=62–85|arxiv=1708.02723 }}{{cite book |last1=Moraga |first1=Paula |title=Geospatial Health Data: Modeling and Visualization with R-INLA and Shiny |date=2019 |publisher=Chapman and Hall/CRC |isbn=9780367357955}} INLA can also be combined with a finite element method solution of a stochastic partial differential equation to study, for example, spatial point processes and species distribution models.{{cite journal |last1=Lindgren |first1=Finn |last2=Rue |first2=Håvard |last3=Lindström |first3=Johan |title=An explicit link between Gaussian fields and Gaussian Markov random fields: the stochastic partial differential equation approach |journal=J. R. Statist. Soc. B |date=2011 |volume=73 |issue=4 |pages=423–498 |doi=10.1111/j.1467-9868.2011.00777.x|s2cid=120949984 |hdl=20.500.11820/1084d335-e5b4-4867-9245-ec9c4f6f4645 |hdl-access=free }}{{cite journal |last1=Lezama-Ochoa |first1=N. |last2=Grazia Pennino |first2=M. |last3=Hall |first3=M. A. |last4=Lopez |first4=J. |last5=Murua |first5=H. |title=Using a Bayesian modelling approach (INLA-SPDE) to predict the occurrence of the Spinetail Devil Ray (Mobula mobular) |journal=Scientific Reports |date=2020 |volume=10 |issue=1 |pages=18822 |doi=10.1038/s41598-020-73879-3|pmid=33139744 |pmc=7606447 |bibcode=2020NatSR..1018822L }} The INLA method is implemented in the R-INLA R package.{{cite web |title=R-INLA Project |url=https://www.r-inla.org |access-date=21 April 2022}}

Latent Gaussian models

Let \boldsymbol{y}=(y_1,\dots,y_n) denote the response variable (that is, the observations), which is assumed to belong to an exponential family, with the mean \mu_i (of y_i) linked to a linear predictor \eta_i via an appropriate link function. The linear predictor can take the form of a (Bayesian) additive model. All latent effects (the linear predictor, the intercept, coefficients of possible covariates, and so on) are collectively denoted by the vector \boldsymbol{x}. The hyperparameters of the model are denoted by \boldsymbol{\theta}. In the Bayesian framework, \boldsymbol{x} and \boldsymbol{\theta} are treated as random variables with prior distributions.
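For example, in a Poisson count model with a log link, y_i | \eta_i \sim \mathrm{Poisson}(\exp(\eta_i)), the linear predictor might be \eta_i = \beta_0 + \beta_1 z_i + u_i, where z_i is a covariate and u_i is a random effect. The latent field is then \boldsymbol{x} = (\boldsymbol{\eta}, \beta_0, \beta_1, \boldsymbol{u}), and \boldsymbol{\theta} could consist of the precision of the random effects u_i.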

The observations are assumed to be conditionally independent given \boldsymbol{x} and \boldsymbol{\theta}:

\pi(\boldsymbol{y} |\boldsymbol{x}, \boldsymbol{\theta}) = \prod_{i \in \mathcal{I}}\pi(y_i | \eta_i, \boldsymbol{\theta}),

where \mathcal{I} is the set of indices for observed elements of \boldsymbol{y} (some elements may be unobserved, and for these INLA computes a posterior predictive distribution). Note that the linear predictor \boldsymbol{\eta} is part of \boldsymbol{x}.

For the model to be a latent Gaussian model, it is assumed that \boldsymbol{x}|\boldsymbol{\theta} is a Gaussian Markov random field (GMRF) (that is, a multivariate Gaussian with additional conditional independence properties) with probability density

\pi(\boldsymbol{x} | \boldsymbol{\theta}) \propto \left| \boldsymbol{Q_{\theta}} \right|^{1/2} \exp \left( -\frac{1}{2} \boldsymbol{x}^T \boldsymbol{Q_{\theta}} \boldsymbol{x} \right),

where \boldsymbol{Q_{\theta}} is a \boldsymbol{\theta}-dependent precision matrix, which is sparse as a consequence of the GMRF assumption, and \left| \boldsymbol{Q_{\theta}} \right| is its determinant. The prior distribution \pi(\boldsymbol{\theta}) for the hyperparameters need not be Gaussian. However, the number of hyperparameters, m=\mathrm{dim}(\boldsymbol{\theta}), is assumed to be small (say, less than 15).
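To illustrate the sparsity, consider the first-order random walk prior, a common latent effect in LGMs, whose precision matrix is tridiagonal. A minimal Python sketch (for illustration only; this is not part of the R-INLA implementation) is:

<syntaxhighlight lang="python">
import numpy as np
import scipy.sparse as sp

def rw1_precision(n, tau):
    """Precision matrix Q = tau * D^T D of a first-order random walk,
    where D is the (n-1) x n first-difference matrix.  Q is tridiagonal,
    so each x_i interacts only with its neighbours, a canonical GMRF.
    (Q is rank-deficient, as usual for an intrinsic GMRF prior.)"""
    D = sp.diags([-np.ones(n - 1), np.ones(n - 1)],
                 offsets=[0, 1], shape=(n - 1, n))
    return tau * (D.T @ D)

print(rw1_precision(5, tau=2.0).toarray())  # only the three central diagonals are nonzero
</syntaxhighlight>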

Approximate Bayesian inference with INLA

In Bayesian inference, one wants to solve for the posterior distribution of the latent variables \boldsymbol{x} and \boldsymbol{\theta}. Applying Bayes' theorem

\pi(\boldsymbol{x}, \boldsymbol{\theta} | \boldsymbol{y}) = \frac{\pi(\boldsymbol{y} | \boldsymbol{x}, \boldsymbol{\theta})\pi(\boldsymbol{x} | \boldsymbol{\theta}) \pi(\boldsymbol{\theta}) }{\pi(\boldsymbol{y})},

the joint posterior distribution of \boldsymbol{x} and \boldsymbol{\theta} is given by

\begin{align}

\pi(\boldsymbol{x}, \boldsymbol{\theta} | \boldsymbol{y}) & \propto \pi(\boldsymbol{\theta})\pi(\boldsymbol{x}|\boldsymbol{\theta}) \prod_i \pi(y_i | \eta_i, \boldsymbol{\theta}) \\

& \propto \pi(\boldsymbol{\theta}) \left| \boldsymbol{Q_{\theta}} \right|^{1/2} \exp \left( -\frac{1}{2} \boldsymbol{x}^T \boldsymbol{Q_{\theta}} \boldsymbol{x} + \sum_i \log \left[\pi(y_i | \eta_i, \boldsymbol{\theta}) \right] \right).

\end{align}

Obtaining the exact posterior is generally a very difficult problem. In INLA, the main aim is to approximate the posterior marginals

\begin{array}{rcl}

\pi(x_i | \boldsymbol{y}) &=& \int \pi(x_i | \boldsymbol{\theta}, \boldsymbol{y}) \pi(\boldsymbol{\theta} | \boldsymbol{y}) d\boldsymbol{\theta} \\

\pi(\theta_j | \boldsymbol{y}) &=& \int \pi(\boldsymbol{\theta} | \boldsymbol{y}) d \boldsymbol{\theta}_{-j} ,

\end{array}

where \boldsymbol{\theta}_{-j} = \left(\theta_1, \dots, \theta_{j-1}, \theta_{j+1}, \dots, \theta_m \right).

A key idea of INLA is to construct nested approximations given by

\begin{array}{rcl}

\widetilde{\pi}(x_i | \boldsymbol{y}) &=& \int \widetilde{\pi}(x_i | \boldsymbol{\theta}, \boldsymbol{y}) \widetilde{\pi}(\boldsymbol{\theta} | \boldsymbol{y}) d\boldsymbol{\theta} \\

\widetilde{\pi}(\theta_j | \boldsymbol{y}) &=& \int \widetilde{\pi}(\boldsymbol{\theta} | \boldsymbol{y}) d \boldsymbol{\theta}_{-j} ,

\end{array}

where \widetilde{\pi}(\cdot | \cdot) is an approximated posterior density. The approximation to the marginal density \pi(x_i | \boldsymbol{y}) is obtained in a nested fashion by first approximating \pi(\boldsymbol{\theta} | \boldsymbol{y}) and \pi(x_i | \boldsymbol{\theta}, \boldsymbol{y}), and then numerically integrating out \boldsymbol{\theta} as

\begin{align}

\widetilde{\pi}(x_i | \boldsymbol{y}) = \sum_k \widetilde{\pi}\left( x_i | \boldsymbol{\theta}_k, \boldsymbol{y} \right) \times \widetilde{\pi}( \boldsymbol{\theta}_k | \boldsymbol{y}) \times \Delta_k,

\end{align}

where the summation runs over a finite set of hyperparameter values \boldsymbol{\theta}_k, with integration weights given by \Delta_k. The approximation of \pi(\theta_j | \boldsymbol{y}) is computed by numerically integrating \boldsymbol{\theta}_{-j} out from \widetilde{\pi}(\boldsymbol{\theta} | \boldsymbol{y}).
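Schematically, this numerical integration is a weighted sum over the hyperparameter configurations. The following Python sketch illustrates the idea; the callables approx_x_marginal and approx_theta_posterior are hypothetical stand-ins for the approximations described below, not functions of any actual INLA implementation:

<syntaxhighlight lang="python">
import numpy as np

def nested_marginal(x_grid, theta_points, weights,
                    approx_x_marginal, approx_theta_posterior):
    """Approximate pi(x_i | y) by the weighted sum
    sum_k pi~(x_i | theta_k, y) * pi~(theta_k | y) * Delta_k."""
    # Evaluate the (unnormalised) hyperparameter posterior at each
    # grid point and normalise with respect to the integration weights.
    theta_dens = np.array([approx_theta_posterior(t) for t in theta_points])
    theta_dens /= np.sum(theta_dens * weights)
    post = np.zeros_like(x_grid, dtype=float)
    for theta_k, w_k, d_k in zip(theta_points, weights, theta_dens):
        post += approx_x_marginal(x_grid, theta_k) * d_k * w_k
    return post
</syntaxhighlight>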

To get the approximate distribution \widetilde{\pi}(\boldsymbol{\theta} | \boldsymbol{y}), one can use the relation

\begin{align}

{\pi}( \boldsymbol{\theta} | \boldsymbol{y}) = \frac{\pi\left(\boldsymbol{x}, \boldsymbol{\theta}, \boldsymbol{y} \right)}{\pi\left(\boldsymbol{x} | \boldsymbol{\theta}, \boldsymbol{y} \right) \pi(\boldsymbol{y})},

\end{align}

as the starting point. Then \widetilde{\pi}( \boldsymbol{\theta} | \boldsymbol{y}) is obtained at a specific value of the hyperparameters \boldsymbol{\theta} = \boldsymbol{\theta}_k with Laplace's approximation

\begin{align}

\widetilde{\pi}( \boldsymbol{\theta}_k | \boldsymbol{y}) &\propto \left . \frac{\pi\left(\boldsymbol{x}, \boldsymbol{\theta}_k, \boldsymbol{y} \right)}{\widetilde{\pi}_G\left(\boldsymbol{x} | \boldsymbol{\theta}_k, \boldsymbol{y} \right)} \right \vert_{\boldsymbol{x} = \boldsymbol{x}^{*}(\boldsymbol{\theta}_k)}, \\

& \propto \left . \frac{\pi(\boldsymbol{y} | \boldsymbol{x}, \boldsymbol{\theta}_k)\pi(\boldsymbol{x} | \boldsymbol{\theta}_k) \pi(\boldsymbol{\theta}_k)}{\widetilde{\pi}_G\left(\boldsymbol{x} | \boldsymbol{\theta}_k, \boldsymbol{y} \right)} \right \vert_{\boldsymbol{x} = \boldsymbol{x}^{*}(\boldsymbol{\theta}_k)},

\end{align}

where \widetilde{\pi}_G\left(\boldsymbol{x} | \boldsymbol{\theta}_k, \boldsymbol{y} \right) is the Gaussian approximation to {\pi}\left(\boldsymbol{x} | \boldsymbol{\theta}_k, \boldsymbol{y} \right), whose mode at a given \boldsymbol{\theta}_k is \boldsymbol{x}^{*}(\boldsymbol{\theta}_k). The mode can be found numerically, for example with the Newton–Raphson method.
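For illustration, a minimal Python sketch of this step (assuming a Poisson likelihood with log link and identifying the latent field with the linear predictor; this is not the R-INLA implementation) computes the mode \boldsymbol{x}^{*}(\boldsymbol{\theta}_k) and the precision of the Gaussian approximation by Newton iterations:

<syntaxhighlight lang="python">
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import spsolve

def gaussian_approximation(Q, y, n_iter=20):
    """Newton-Raphson search for the mode x* of pi(x | theta, y) with
    y_i ~ Poisson(exp(x_i)) and GMRF prior precision Q.  The negative
    log-posterior is 0.5 x^T Q x - sum_i (y_i x_i - exp(x_i)) + const,
    with gradient Q x - (y - exp(x)) and Hessian Q + diag(exp(x))."""
    x = np.zeros(Q.shape[0])
    for _ in range(n_iter):
        mu = np.exp(x)
        grad = Q @ x - (y - mu)
        H = (Q + sp.diags(mu)).tocsc()  # sparse Hessian: prior precision plus a diagonal
        x -= spsolve(H, grad)           # Newton step
    # Gaussian approximation: mean x*, precision Q + diag(exp(x*))
    return x, (Q + sp.diags(np.exp(x))).tocsc()
</syntaxhighlight>

Because the Hessian inherits the sparsity of \boldsymbol{Q}_{\boldsymbol{\theta}_k}, each Newton step can use sparse linear algebra; the sketch above only illustrates the structure of the computation.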

The key to the Laplace approximation above is that the Gaussian approximation is applied to the full conditional of \boldsymbol{x} in the denominator, which is usually close to a Gaussian due to the GMRF property of \boldsymbol{x}. Applying the approximation here improves the accuracy of the method: the posterior {\pi}( \boldsymbol{\theta} | \boldsymbol{y}) itself need not be close to a Gaussian, so the Gaussian approximation is not applied to it directly. The second important property of a GMRF, the sparsity of the precision matrix \boldsymbol{Q}_{\boldsymbol{\theta}_k}, is what allows \widetilde{\pi}( \boldsymbol{\theta}_k | \boldsymbol{y}) to be computed efficiently for each value {\boldsymbol{\theta}_k}.

Obtaining the approximate distribution \widetilde{\pi}\left( x_i | \boldsymbol{\theta}_k, \boldsymbol{y} \right) is more involved, and the INLA method provides three options for this: a Gaussian approximation, a Laplace approximation, or the simplified Laplace approximation. For the numerical integration used to obtain \widetilde{\pi}(x_i | \boldsymbol{y}), three options are likewise available: a grid search, a central composite design, or an empirical Bayes approach.

References

{{reflist}}

Further reading

  • {{cite book |first=Virgilio |last=Gomez-Rubio |title=Bayesian inference with INLA |location= |publisher=Chapman and Hall/CRC |year=2021 |isbn=978-1-03-217453-2 }}

Category:Computational statistics

Category:Bayesian inference