Gaussian process approximations
In statistics and machine learning, Gaussian process approximations are computational methods that accelerate inference tasks for Gaussian process models, most commonly likelihood evaluation and prediction. Like approximations of other models, they can often be expressed as additional assumptions imposed on the model that do not correspond to any actual feature of the phenomenon but retain the model's key properties while simplifying calculations. Many of these approximation methods can be expressed in purely linear algebraic or functional analytic terms as matrix or function approximations. Others are purely algorithmic and cannot easily be rephrased as a modification of a statistical model.
Basic ideas
In statistical modeling, it is often convenient to assume that <math>y(x)</math>, the phenomenon under investigation, is a Gaussian process indexed by <math>x \in \mathcal{X}</math> which has mean function <math>\mu(x)</math> and covariance function <math>K(x, x')</math>.
One can also assume that data <math>\mathbf{y} = (y_1, \ldots, y_n)</math> are values of a particular realization of this process for indices <math>x_1, \ldots, x_n</math>.
Consequently, the joint distribution of the data can be expressed as
:
\mathbf{y} \sim \mathcal{N}(\mathbf{\mu}, \mathbf{\Sigma}),
where <math>\mathbf{\Sigma}_{ij} = K(x_i, x_j)</math> and <math>\mathbf{\mu}_i = \mu(x_i)</math>, i.e. respectively a matrix with the covariance function values and a vector with the mean function values at the corresponding (pairs of) indices.
The negative log-likelihood of the data then takes the form
:
-\log\ell(\mathbf{y}) = \frac{n}{2}\log(2\pi) + \frac{1}{2}\log\det(\mathbf{\Sigma}) + \frac{1}{2}\left(\mathbf{y}-\mathbf{\mu}\right)^\top\mathbf{\Sigma}^{-1}\left(\mathbf{y}-\mathbf{\mu}\right)
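In practice this likelihood is evaluated through a Cholesky factorization rather than an explicit inverse or determinant. A minimal numpy sketch (the function name and variable names are illustrative):

```python
import numpy as np

def gaussian_nll(y, mu, Sigma):
    """Negative log-likelihood of y ~ N(mu, Sigma).

    A Cholesky factor L (Sigma = L L^T) gives log det(Sigma) as
    2 * sum(log(diag(L))) and turns the quadratic form into a single
    triangular solve, so no explicit inverse is ever formed.
    """
    n = y.shape[0]
    L = np.linalg.cholesky(Sigma)        # O(n^3): the dominant cost
    z = np.linalg.solve(L, y - mu)       # z^T z = (y-mu)^T Sigma^{-1} (y-mu)
    logdet = 2.0 * np.sum(np.log(np.diag(L)))
    return 0.5 * (n * np.log(2.0 * np.pi) + logdet + z @ z)
```

The cubic cost of the factorization is exactly the bottleneck that the approximations below aim to remove.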
Similarly, the best predictor of <math>\mathbf{y}^*</math>, the values of <math>y(x)</math> for indices <math>x^*_1, \ldots, x^*_m</math>, given the data <math>\mathbf{y}</math>, has the form
:
\mathbf{\mu}^*_\mathbf{y} = \mathbb{E}\left[\mathbf{y}^*|\mathbf{y}\right] = \mathbf{\mu}^* + \mathbf{\Sigma}_{\mathbf{y}^*\mathbf{y}} \mathbf{\Sigma}^{-1}\left(\mathbf{y} - \mathbf{\mu}\right)
In the context of Gaussian models, especially in geostatistics, prediction using the best predictor, i.e. mean conditional on the data, is also known as kriging.
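As a concrete sketch of this predictor, the following numpy snippet performs simple kriging on a one-dimensional index set; the squared-exponential covariance, the jitter term, and the function names are illustrative choices, not part of any particular method:

```python
import numpy as np

def sq_exp(a, b, ell=0.5):
    """Squared-exponential covariance K(x, x') = exp(-(x - x')^2 / (2 ell^2))."""
    return np.exp(-(a[:, None] - b[None, :]) ** 2 / (2.0 * ell ** 2))

def kriging_mean(x, y, x_star, mean=0.0):
    """Best predictor mu* + Sigma_{y*y} Sigma^{-1} (y - mu) for a
    constant-mean process; the inverse is replaced by a linear solve."""
    S_yy = sq_exp(x, x) + 1e-10 * np.eye(len(x))   # jitter for numerical stability
    S_sy = sq_exp(x_star, x)
    return mean + S_sy @ np.linalg.solve(S_yy, y - mean)
```

Evaluated at the data locations themselves, the predictor interpolates the observations, as expected of a noise-free kriging mean.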
The most computationally expensive component of the best predictor formula is inverting the covariance matrix <math>\mathbf{\Sigma}</math>, which has cubic complexity <math>\mathcal{O}(n^3)</math>. Similarly, evaluating the likelihood involves calculating both <math>\mathbf{\Sigma}^{-1}</math> and the determinant <math>\det(\mathbf{\Sigma})</math>, which has the same cubic complexity.
Gaussian process approximations can often be expressed in terms of assumptions on <math>K</math> under which <math>\mathbf{\Sigma}^{-1}</math> and <math>\det(\mathbf{\Sigma})</math> can be calculated with much lower complexity. Since these assumptions are generally not believed to reflect reality, the likelihood and the best predictor obtained in this way are not exact, but they are meant to be close to their original values.
Model-based methods
This class of approximations is expressed through a set of assumptions which are imposed on the original process and which, typically, imply some special structure of the covariance matrix. Although most of these methods were developed independently, they can typically be expressed as special cases of the sparse general Vecchia approximation.
= Sparse covariance methods =
These methods approximate the true model in a way that makes the covariance matrix sparse. Typically, each method proposes its own algorithm that takes full advantage of the sparsity pattern in the covariance matrix. Two prominent members of this class of approaches are covariance tapering and domain partitioning. The first method generally requires a metric <math>d</math> over <math>\mathcal{X}</math> and assumes that for <math>x_1, x_2 \in \mathcal{X}</math> we have <math>K(x_1, x_2) \neq 0</math> only if <math>d(x_1, x_2) \leq r</math> for some range <math>r</math>; this is typically achieved by multiplying the original covariance function by a compactly supported taper function. The second assumes that <math>K(x_1, x_2) \neq 0</math> only if <math>x_1</math> and <math>x_2</math> belong to the same element of a partition of <math>\mathcal{X}</math>, which makes the covariance matrix block-diagonal.
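A toy illustration of covariance tapering, with an exponential covariance and a Wendland-type taper as example choices (neither is prescribed by the method):

```python
import numpy as np

def wendland_taper(D, r):
    """Compactly supported taper: positive definite, exactly zero for D >= r."""
    t = np.maximum(0.0, 1.0 - D / r)
    return t ** 4 * (1.0 + 4.0 * D / r)

# 200 points on a line and their pairwise distances
x = np.linspace(0.0, 10.0, 200)
D = np.abs(x[:, None] - x[None, :])

# Element-wise product of an exponential covariance with the taper:
# every entry at distance >= r vanishes, so the matrix is sparse
# (banded for points on a line), while positive definiteness is kept
# because the Schur product of positive definite matrices is positive definite.
S = np.exp(-D) * wendland_taper(D, r=1.0)
sparsity = np.mean(S == 0.0)   # fraction of exactly-zero entries
```

Sparse matrix routines can then exploit the zero pattern for factorization and solves.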
= Sparse precision methods =
This family of methods assumes that the precision matrix <math>\mathbf{\Lambda} = \mathbf{\Sigma}^{-1}</math> is sparse and specifies which of its elements are non-zero. Since the zeros of a Gaussian precision matrix encode conditional independences, this amounts to assuming that each process value is conditionally independent of most of the others given a small conditioning set; the Vecchia approximation and the stochastic partial differential equation (SPDE) approach are prominent examples.
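A classic concrete case: for the exponential (AR(1)-type) covariance <math>\mathbf{\Sigma}_{ij} = \rho^{|i-j|}</math> on a one-dimensional grid, the precision matrix is exactly tridiagonal, reflecting the Markov property of the process. The snippet below just verifies this numerically:

```python
import numpy as np

n, rho = 8, 0.6
idx = np.arange(n)
Sigma = rho ** np.abs(idx[:, None] - idx[None, :])   # AR(1) covariance
Q = np.linalg.inv(Sigma)                             # precision matrix

# Entries more than one step off the diagonal are (numerically) zero:
# each y_i is conditionally independent of the rest given y_{i-1}, y_{i+1}.
off_band = Q[np.abs(idx[:, None] - idx[None, :]) > 1]
```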
= Sparse Cholesky factor methods =
In many practical applications, calculating <math>\mathbf{\Sigma}^{-1}</math> is replaced with computing a Cholesky factor. This family of methods assumes that the Cholesky factor of <math>\mathbf{\Sigma}</math>, or of the precision matrix <math>\mathbf{\Sigma}^{-1}</math>, is sparse. Since the log-determinant equals twice the sum of the logarithms of the factor's diagonal entries, and triangular systems can be solved cheaply, a sparse factor makes both likelihood evaluation and prediction fast.
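To illustrate, a Cholesky factor inherits the band structure of a banded matrix: for a tridiagonal precision (as arises for a Markov process on a line, here built from an AR(1)-type covariance purely as an example), the factor is lower bidiagonal:

```python
import numpy as np

n, rho = 8, 0.6
idx = np.arange(n)
Q = np.linalg.inv(rho ** np.abs(idx[:, None] - idx[None, :]))  # tridiagonal precision
L = np.linalg.cholesky(Q)

# Cholesky factorization preserves the lower bandwidth, so everything
# below the first subdiagonal of L is zero ...
below_band = L[idx[:, None] - idx[None, :] > 1]
# ... and the log-determinant is read off the diagonal of the factor.
logdet_Q = 2.0 * np.sum(np.log(np.diag(L)))
```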
= Low-rank methods =
While this approach encompasses many methods, the common assumption underlying them all is that <math>\mathbf{\Sigma}</math>, the covariance matrix, is low-rank, possibly up to a diagonal term, i.e.
:
\mathbf{\Sigma} = \mathbf{U}\mathbf{A}\mathbf{U}^\top + \mathbf{D},
where <math>\mathbf{U}</math> is an <math>n \times p</math> matrix with <math>p \ll n</math>, <math>\mathbf{A}</math> is a <math>p \times p</math> matrix and <math>\mathbf{D}</math> is diagonal. Under this assumption <math>\mathbf{\Sigma}^{-1}</math> and <math>\det(\mathbf{\Sigma})</math> can be computed using the Woodbury matrix identity and the matrix determinant lemma at <math>\mathcal{O}(np^2)</math> cost.
More generally, on top of selecting the rank <math>p</math>, one can also choose the basis functions or pseudo-inputs (knots) that define the low-rank component; the predictive process and fixed rank kriging are examples of this approach.
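Under a low-rank-plus-diagonal assumption <math>\mathbf{\Sigma} = \mathbf{U}\mathbf{A}\mathbf{U}^\top + \mathbf{D}</math>, the Woodbury identity reduces the solve to small <math>p \times p</math> systems. A numpy sketch (the function name and argument shapes are illustrative):

```python
import numpy as np

def lowrank_solve(U, A, d, b):
    """Solve (U A U^T + diag(d)) x = b via the Woodbury identity.

    Only p x p systems are formed and inverted, so the cost is
    O(n p^2) instead of the O(n^3) of a dense solve.
    """
    Dinv_b = b / d                                 # diag(d)^{-1} b
    Dinv_U = U / d[:, None]                        # diag(d)^{-1} U
    core = np.linalg.inv(A) + U.T @ Dinv_U         # p x p "capacitance" matrix
    return Dinv_b - Dinv_U @ np.linalg.solve(core, U.T @ Dinv_b)
```

The matrix determinant lemma plays the analogous role for <math>\det(\mathbf{\Sigma})</math> in likelihood evaluation.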
= Hierarchical methods =
The general principle of hierarchical approximations consists of repeatedly applying some other method, with each consecutive application refining the quality of the approximation. Even though they can be expressed as a set of statistical assumptions, they are often described in terms of a hierarchical matrix approximation (HODLR) or basis function expansion (LatticeKrig, MRA, wavelets). The hierarchical matrix approach can often be represented as a repeated application of a low-rank approximation to successively smaller subsets of the index set.
= Unified framework =
Probabilistic graphical models provide a convenient framework for comparing model-based approximations. In this context, the value of the process at each index can be represented by a vertex of a graph, with edges encoding the dependence structure implied by the model. Many model-based approximations can then be described as removing edges from this graph, i.e. imposing additional conditional independences between process values, which makes the relationships between different methods easy to compare.
Methods without a statistical model
This class of methods does not specify a statistical model or impose assumptions on an existing one. Three major members of this group are the meta-kriging algorithm, the gapfill algorithm, and the Local Approximate Gaussian Process (laGP) approach. The first one partitions the set of indices into components, computes the conditional distribution on each of those components separately, and then combines the results using the geometric median of the subset posterior distributions. The second is based on quantile regression using values of the process at indices close to the predicted location. laGP constructs, for each prediction location, a small local design of nearby data points selected by a sequential criterion and fits a Gaussian process to that subset only.
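A stripped-down sketch of the local idea behind laGP: predict at each location from only a handful of nearby data points. Here plain nearest neighbours and an exponential covariance stand in for laGP's sequential design criterion, and all names are illustrative:

```python
import numpy as np

def local_gp_mean(x, y, x0, m=8, ell=0.1):
    """Kriging mean at a single point x0 built from its m nearest data
    points only, so each prediction costs O(m^3) rather than O(n^3)."""
    near = np.argsort(np.abs(x - x0))[:m]          # local subset of the data
    xs, ys = x[near], y[near]
    K = np.exp(-np.abs(xs[:, None] - xs[None, :]) / ell) + 1e-9 * np.eye(m)
    k0 = np.exp(-np.abs(xs - x0) / ell)
    return k0 @ np.linalg.solve(K, ys)
```

Because each prediction depends on a different local subset, such methods parallelize trivially across prediction locations.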
References
- {{cite journal|arxiv=1807.01065|last1=Liu|first1=Haitao|last2=Ong|first2=Yew-Soon|last3=Shen|first3=Xiaobo|last4=Cai|first4=Jianfei|title=When Gaussian Process Meets Big Data: A Review of Scalable GPS|journal=IEEE Transactions on Neural Networks and Learning Systems|year=2020|volume=PP|issue=11 |pages=1–19|doi=10.1109/TNNLS.2019.2957109|pmid=31944966}}
- {{cite journal|last1=Heaton|first1=Matthew J.|last2=Datta|first2=Abhirup|last3=Finley|first3=Andrew O.|last4=Furrer|first4=Reinhard|last5=Guinness|first5=Joseph|last6=Guhaniyogi|first6=Rajarshi|last7=Gerber|first7=Florian|last8=Gramacy|first8=Robert B.|last9=Hammerling|first9=Dorit|last10=Katzfuss|first10=Matthias|last11=Lindgren|first11=Finn|last12=Nychka|first12=Douglas W.|last13=Sun|first13=Furong|last14=Zammit-Mangion|first14=Andrew|title=A Case Study Competition Among Methods for Analyzing Large Spatial Data|journal=Journal of Agricultural, Biological and Environmental Statistics|volume=24|issue=3|year=2018|pages=398–425|issn=1085-7117|doi=10.1007/s13253-018-00348-w|doi-access=free|pmid=31496633 |pmc=6709111}}
- {{cite journal|last1=Banerjee|first1=Sudipto|title=High-Dimensional Bayesian Geostatistics|journal=Bayesian Analysis|volume=12|issue=2|year=2017|pages=583–614|doi=10.1214/17-BA1056R|pmid=29391920|doi-access=free|pmc=5790125}}