Integrated nested Laplace approximations

{{Short description|Bayesian inference method}}

{{Bayesian statistics}}

Integrated nested Laplace approximations (INLA) is a method for approximate Bayesian inference based on Laplace's method.{{cite journal |last1=Rue |first1=Håvard |last2=Martino |first2=Sara |last3=Chopin |first3=Nicolas |title=Approximate Bayesian inference for latent Gaussian models by using integrated nested Laplace approximations |journal=J. R. Statist. Soc. B |date=2009 |volume=71 |issue=2 |pages=319–392 |doi=10.1111/j.1467-9868.2008.00700.x|s2cid=1657669 |hdl=2066/75507 |hdl-access=free }} It is designed for a class of models called latent Gaussian models (LGMs), for which it can be a fast and accurate alternative to Markov chain Monte Carlo methods for computing posterior marginal distributions.{{cite journal |last1=Taylor |first1=Benjamin M. |last2=Diggle |first2=Peter J. |title=INLA or MCMC? A tutorial and comparative evaluation for spatial prediction in log-Gaussian Cox processes |journal=Journal of Statistical Computation and Simulation |date=2014 |volume=84 |issue=10 |pages=2266–2284 |doi=10.1080/00949655.2013.788653|arxiv=1202.1738 |s2cid=88511801 }}{{cite journal |last1=Teng |first1=M. |last2=Nathoo |first2=F. |last3=Johnson |first3=T. D. |title=Bayesian computation for Log-Gaussian Cox processes: a comparative analysis of methods |journal=Journal of Statistical Computation and Simulation |date=2017 |volume=87 |issue=11 |pages=2227–2252 |doi=10.1080/00949655.2017.1326117|pmid=29200537 |pmc=5708893 }}{{cite book |last1=Wang |first1=Xiaofeng |last2=Yue |first2=Yu Ryan |last3=Faraway |first3=Julian J. |title=Bayesian Regression Modeling with INLA |date=2018 |publisher=Chapman and Hall/CRC |isbn=9781498727259}} Because it is relatively fast even for large data sets in certain problems and models, INLA has been a popular inference method in applied statistics, in particular in spatial statistics, ecology, and epidemiology.{{cite book |last1=Blangiardo |first1=Marta |last2=Cameletti |first2=Michela |title=Spatial and Spatio-temporal Bayesian Models with R-INLA |date=2015 |publisher=John Wiley & Sons, Ltd |isbn=9781118326558}}{{cite journal |last1=Opitz |first1=T. |title=Latent Gaussian modeling and INLA: A review with focus on space-time applications |journal=Journal de la Société Française de Statistique |date=2017 |volume=158 |pages=62–85|arxiv=1708.02723 }}{{cite book |last1=Moraga |first1=Paula |title=Geospatial Health Data: Modeling and Visualization with R-INLA and Shiny |date=2019 |publisher=Chapman and Hall/CRC |isbn=9780367357955}} INLA can also be combined with a finite element method solution of a stochastic partial differential equation to study, for example, spatial point processes and species distribution models.{{cite journal |last1=Lindgren |first1=Finn |last2=Rue |first2=Håvard |last3=Lindström |first3=Johan |title=An explicit link between Gaussian fields and Gaussian Markov random fields: the stochastic partial differential equation approach |journal=J. R. Statist. Soc. B |date=2011 |volume=73 |issue=4 |pages=423–498 |doi=10.1111/j.1467-9868.2011.00777.x|s2cid=120949984 |hdl=20.500.11820/1084d335-e5b4-4867-9245-ec9c4f6f4645 |hdl-access=free }}{{cite journal |last1=Lezama-Ochoa |first1=N. |last2=Grazia Pennino |first2=M. |last3=Hall |first3=M. A. |last4=Lopez |first4=J. |last5=Murua |first5=H. |title=Using a Bayesian modelling approach (INLA-SPDE) to predict the occurrence of the Spinetail Devil Ray (Mobula mobular) |journal=Scientific Reports |date=2020 |volume=10 |issue=1 |pages=18822 |doi=10.1038/s41598-020-73879-3|pmid=33139744 |pmc=7606447 |bibcode=2020NatSR..1018822L }} The INLA method is implemented in the R-INLA R package.{{cite web |title=R-INLA Project |url=https://www.r-inla.org |access-date=21 April 2022}}

Latent Gaussian models

Let \boldsymbol{y}=(y_1,\dots,y_n) denote the response variable (that is, the observations), which is assumed to belong to an exponential family, with the mean \mu_i (of y_i) linked to a linear predictor \eta_i via an appropriate link function. The linear predictor can take the form of a (Bayesian) additive model. All latent effects (the linear predictor, the intercept, coefficients of possible covariates, and so on) are collectively denoted by the vector \boldsymbol{x}. The hyperparameters of the model are denoted by \boldsymbol{\theta}. In the Bayesian framework, \boldsymbol{x} and \boldsymbol{\theta} are treated as random variables with prior distributions.
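For example, in a Poisson count model with a log link, y_i | \eta_i \sim \mathrm{Poisson}(\exp(\eta_i)), the linear predictor might be \eta_i = \beta_0 + \beta_1 z_i + u_i, where z_i is a covariate and u_i is a random effect. The latent field is then \boldsymbol{x} = (\boldsymbol{\eta}, \beta_0, \beta_1, \boldsymbol{u}), and \boldsymbol{\theta} could consist of the precision of the random effects u_i.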

The observations are assumed to be conditionally independent given \boldsymbol{x} and \boldsymbol{\theta}:

\pi(\boldsymbol{y} |\boldsymbol{x}, \boldsymbol{\theta}) = \prod_{i \in \mathcal{I}}\pi(y_i | \eta_i, \boldsymbol{\theta}),

where \mathcal{I} is the set of indices for observed elements of \boldsymbol{y} (some elements may be unobserved, and for these INLA computes a posterior predictive distribution). Note that the linear predictor \boldsymbol{\eta} is part of \boldsymbol{x}.

For the model to be a latent Gaussian model, it is assumed that \boldsymbol{x}|\boldsymbol{\theta} is a Gaussian Markov random field (GMRF) (that is, a multivariate Gaussian with additional conditional independence properties) with probability density

\pi(\boldsymbol{x} | \boldsymbol{\theta}) \propto \left| \boldsymbol{Q_{\theta}} \right|^{1/2} \exp \left( -\frac{1}{2} \boldsymbol{x}^T \boldsymbol{Q_{\theta}} \boldsymbol{x} \right),

where \boldsymbol{Q_{\theta}} is a \boldsymbol{\theta}-dependent precision matrix, which is sparse as a consequence of the GMRF assumption, and \left| \boldsymbol{Q_{\theta}} \right| is its determinant. The prior distribution \pi(\boldsymbol{\theta}) for the hyperparameters need not be Gaussian. However, the number of hyperparameters, m=\mathrm{dim}(\boldsymbol{\theta}), is assumed to be small (say, less than 15).
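To illustrate the sparsity, consider the first-order random walk prior, a common latent effect in LGMs, whose precision matrix is tridiagonal. A minimal Python sketch (for illustration only; this is not part of the R-INLA implementation) is:

<syntaxhighlight lang="python">
import numpy as np
import scipy.sparse as sp

def rw1_precision(n, tau):
    """Precision matrix Q = tau * D^T D of a first-order random walk,
    where D is the (n-1) x n first-difference matrix.  Q is tridiagonal,
    so each x_i interacts only with its neighbours, a canonical GMRF.
    (Q is rank-deficient, as usual for an intrinsic GMRF prior.)"""
    D = sp.diags([-np.ones(n - 1), np.ones(n - 1)],
                 offsets=[0, 1], shape=(n - 1, n))
    return tau * (D.T @ D)

print(rw1_precision(5, tau=2.0).toarray())  # only the three central diagonals are nonzero
</syntaxhighlight>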

Approximate Bayesian inference with INLA

In Bayesian inference, one wants to solve for the posterior distribution of the latent variables \boldsymbol{x} and \boldsymbol{\theta}. Applying Bayes' theorem

\pi(\boldsymbol{x}, \boldsymbol{\theta} | \boldsymbol{y}) = \frac{\pi(\boldsymbol{y} | \boldsymbol{x}, \boldsymbol{\theta})\pi(\boldsymbol{x} | \boldsymbol{\theta}) \pi(\boldsymbol{\theta}) }{\pi(\boldsymbol{y})},

the joint posterior distribution of \boldsymbol{x} and \boldsymbol{\theta} is given by

\begin{align}

\pi(\boldsymbol{x}, \boldsymbol{\theta} | \boldsymbol{y}) & \propto \pi(\boldsymbol{\theta})\pi(\boldsymbol{x}|\boldsymbol{\theta}) \prod_i \pi(y_i | \eta_i, \boldsymbol{\theta}) \\

& \propto \pi(\boldsymbol{\theta}) \left| \boldsymbol{Q_{\theta}} \right|^{1/2} \exp \left( -\frac{1}{2} \boldsymbol{x}^T \boldsymbol{Q_{\theta}} \boldsymbol{x} + \sum_i \log \left[\pi(y_i | \eta_i, \boldsymbol{\theta}) \right] \right).

\end{align}

Obtaining the exact posterior is generally a very difficult problem. In INLA, the main aim is to approximate the posterior marginals

\begin{array}{rcl}

\pi(x_i | \boldsymbol{y}) &=& \int \pi(x_i | \boldsymbol{\theta}, \boldsymbol{y}) \pi(\boldsymbol{\theta} | \boldsymbol{y}) d\boldsymbol{\theta} \\

\pi(\theta_j | \boldsymbol{y}) &=& \int \pi(\boldsymbol{\theta} | \boldsymbol{y}) d \boldsymbol{\theta}_{-j} ,

\end{array}

where \boldsymbol{\theta}_{-j} = \left(\theta_1, \dots, \theta_{j-1}, \theta_{j+1}, \dots, \theta_m \right).

A key idea of INLA is to construct nested approximations given by

\begin{array}{rcl}

\widetilde{\pi}(x_i | \boldsymbol{y}) &=& \int \widetilde{\pi}(x_i | \boldsymbol{\theta}, \boldsymbol{y}) \widetilde{\pi}(\boldsymbol{\theta} | \boldsymbol{y}) d\boldsymbol{\theta} \\

\widetilde{\pi}(\theta_j | \boldsymbol{y}) &=& \int \widetilde{\pi}(\boldsymbol{\theta} | \boldsymbol{y}) d \boldsymbol{\theta}_{-j} ,

\end{array}

where \widetilde{\pi}(\cdot | \cdot) is an approximated posterior density. The approximation to the marginal density \pi(x_i | \boldsymbol{y}) is obtained in a nested fashion by first approximating \pi(\boldsymbol{\theta} | \boldsymbol{y}) and \pi(x_i | \boldsymbol{\theta}, \boldsymbol{y}), and then numerically integrating out \boldsymbol{\theta} as

\begin{align}

\widetilde{\pi}(x_i | \boldsymbol{y}) = \sum_k \widetilde{\pi}\left( x_i | \boldsymbol{\theta}_k, \boldsymbol{y} \right) \times \widetilde{\pi}( \boldsymbol{\theta}_k | \boldsymbol{y}) \times \Delta_k,

\end{align}

where the summation runs over a finite set of hyperparameter values \boldsymbol{\theta}_k, with integration weights given by \Delta_k. The approximation of \pi(\theta_j | \boldsymbol{y}) is computed by numerically integrating \boldsymbol{\theta}_{-j} out from \widetilde{\pi}(\boldsymbol{\theta} | \boldsymbol{y}).
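Schematically, this numerical integration is a weighted sum over the hyperparameter configurations. The following Python sketch illustrates the idea; the callables approx_x_marginal and approx_theta_posterior are hypothetical stand-ins for the approximations described below, not functions of any actual INLA implementation:

<syntaxhighlight lang="python">
import numpy as np

def nested_marginal(x_grid, theta_points, weights,
                    approx_x_marginal, approx_theta_posterior):
    """Approximate pi(x_i | y) by the weighted sum
    sum_k pi~(x_i | theta_k, y) * pi~(theta_k | y) * Delta_k."""
    # Evaluate the (unnormalised) hyperparameter posterior at each
    # grid point and normalise with respect to the integration weights.
    theta_dens = np.array([approx_theta_posterior(t) for t in theta_points])
    theta_dens /= np.sum(theta_dens * weights)
    post = np.zeros_like(x_grid, dtype=float)
    for theta_k, w_k, d_k in zip(theta_points, weights, theta_dens):
        post += approx_x_marginal(x_grid, theta_k) * d_k * w_k
    return post
</syntaxhighlight>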

To get the approximate distribution \widetilde{\pi}(\boldsymbol{\theta} | \boldsymbol{y}), one can use the relation

\begin{align}

{\pi}( \boldsymbol{\theta} | \boldsymbol{y}) = \frac{\pi\left(\boldsymbol{x}, \boldsymbol{\theta}, \boldsymbol{y} \right)}{\pi\left(\boldsymbol{x} | \boldsymbol{\theta}, \boldsymbol{y} \right) \pi(\boldsymbol{y})},

\end{align}

as the starting point. Then \widetilde{\pi}( \boldsymbol{\theta} | \boldsymbol{y}) is obtained at a specific value of the hyperparameters \boldsymbol{\theta} = \boldsymbol{\theta}_k with Laplace's approximation

\begin{align}

\widetilde{\pi}( \boldsymbol{\theta}_k | \boldsymbol{y}) &\propto \left . \frac{\pi\left(\boldsymbol{x}, \boldsymbol{\theta}_k, \boldsymbol{y} \right)}{\widetilde{\pi}_G\left(\boldsymbol{x} | \boldsymbol{\theta}_k, \boldsymbol{y} \right)} \right \vert_{\boldsymbol{x} = \boldsymbol{x}^{*}(\boldsymbol{\theta}_k)}, \\

& \propto \left . \frac{\pi(\boldsymbol{y} | \boldsymbol{x}, \boldsymbol{\theta}_k)\pi(\boldsymbol{x} | \boldsymbol{\theta}_k) \pi(\boldsymbol{\theta}_k)}{\widetilde{\pi}_G\left(\boldsymbol{x} | \boldsymbol{\theta}_k, \boldsymbol{y} \right)} \right \vert_{\boldsymbol{x} = \boldsymbol{x}^{*}(\boldsymbol{\theta}_k)},

\end{align}

where \widetilde{\pi}_G\left(\boldsymbol{x} | \boldsymbol{\theta}_k, \boldsymbol{y} \right) is the Gaussian approximation to {\pi}\left(\boldsymbol{x} | \boldsymbol{\theta}_k, \boldsymbol{y} \right), whose mode at a given \boldsymbol{\theta}_k is \boldsymbol{x}^{*}(\boldsymbol{\theta}_k). The mode can be found numerically, for example with the Newton–Raphson method.
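For illustration, a minimal Python sketch of this step (assuming a Poisson likelihood with log link and identifying the latent field with the linear predictor; this is not the R-INLA implementation) computes the mode \boldsymbol{x}^{*}(\boldsymbol{\theta}_k) and the precision of the Gaussian approximation by Newton iterations:

<syntaxhighlight lang="python">
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import spsolve

def gaussian_approximation(Q, y, n_iter=20):
    """Newton-Raphson search for the mode x* of pi(x | theta, y) with
    y_i ~ Poisson(exp(x_i)) and GMRF prior precision Q.  The negative
    log-posterior is 0.5 x^T Q x - sum_i (y_i x_i - exp(x_i)) + const,
    with gradient Q x - (y - exp(x)) and Hessian Q + diag(exp(x))."""
    x = np.zeros(Q.shape[0])
    for _ in range(n_iter):
        mu = np.exp(x)
        grad = Q @ x - (y - mu)
        H = (Q + sp.diags(mu)).tocsc()  # sparse Hessian: prior precision plus a diagonal
        x -= spsolve(H, grad)           # Newton step
    # Gaussian approximation: mean x*, precision Q + diag(exp(x*))
    return x, (Q + sp.diags(np.exp(x))).tocsc()
</syntaxhighlight>

Because the Hessian inherits the sparsity of \boldsymbol{Q}_{\boldsymbol{\theta}_k}, each Newton step can use sparse linear algebra; the sketch above only illustrates the structure of the computation.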

The key to the Laplace approximation above is that the Gaussian approximation is applied to the full conditional of \boldsymbol{x} in the denominator, which is usually close to a Gaussian due to the GMRF property of \boldsymbol{x}. Applying the approximation here improves the accuracy of the method: the posterior {\pi}( \boldsymbol{\theta} | \boldsymbol{y}) itself need not be close to a Gaussian, so the Gaussian approximation is not applied to it directly. The second important property of a GMRF, the sparsity of the precision matrix \boldsymbol{Q}_{\boldsymbol{\theta}_k}, is what allows \widetilde{\pi}( \boldsymbol{\theta}_k | \boldsymbol{y}) to be computed efficiently for each value {\boldsymbol{\theta}_k}.

Obtaining the approximate distribution \widetilde{\pi}\left( x_i | \boldsymbol{\theta}_k, \boldsymbol{y} \right) is more involved, and the INLA method provides three options for this: a Gaussian approximation, a Laplace approximation, or the simplified Laplace approximation. For the numerical integration used to obtain \widetilde{\pi}(x_i | \boldsymbol{y}), three options are likewise available: a grid search, a central composite design, or an empirical Bayes approach.

References

{{reflist}}

Further reading

  • {{cite book |first=Virgilio |last=Gomez-Rubio |title=Bayesian inference with INLA |location= |publisher=Chapman and Hall/CRC |year=2021 |isbn=978-1-03-217453-2 }}

Category:Computational statistics

Category:Bayesian inference