Watanabe–Akaike information criterion
{{Short description|Generalized version of the Akaike information criterion}}
In [[statistics]], the '''widely applicable information criterion''' ('''WAIC'''), also known as the '''Watanabe–Akaike information criterion''', is a generalization of the [[Akaike information criterion]] (AIC) to singular [[statistical model]]s.<ref>{{cite journal |authorlink=Sumio Watanabe |first=Sumio |last=Watanabe |year=2010 |title=Asymptotic Equivalence of Bayes Cross Validation and Widely Applicable Information Criterion in Singular Learning Theory |journal=Journal of Machine Learning Research |volume=11 |pages=3571–3594 }}</ref> It is used as a measure of how well a model will predict data it was not trained on, and it is asymptotically equivalent to the [[cross-validation (statistics)|cross-validation]] loss.<ref>{{Citation |last=Watanabe |first=Sumio |title=Higher Order Equivalence of Bayes Cross Validation and WAIC |date=2018 |work=Information Geometry and Its Applications |volume=252 |pages=47–73 |editor-last=Ay |editor-first=Nihat |url=http://link.springer.com/10.1007/978-3-319-97798-0_3 |access-date=2024-11-14 |place=Cham |publisher=Springer International Publishing |doi=10.1007/978-3-319-97798-0_3 |isbn=978-3-319-97797-3 |editor2-last=Gibilisco |editor2-first=Paolo |editor3-last=Matúš |editor3-first=František}}</ref> Lower values of WAIC correspond to better predictive performance.
If we take the log pointwise predictive density:
:<math>\text{lppd} = \sum_{i=1}^{n} \log \left( \frac{1}{S} \sum_{s=1}^{S} p(y_i \mid \theta_s) \right)</math>
Then:
:<math>\text{WAIC} = -2 \left( \text{lppd} - \sum_{i=1}^{n} \operatorname{Var}_{\theta} \left[ \log p(y_i \mid \theta) \right] \right)</math>
where <math>y_i</math> is the output being predicted for the <math>i</math>-th point in the training data, <math>\Theta</math> is the model's posterior distribution, <math>\theta_s</math> (<math>s = 1, \dots, S</math>) are samples drawn from <math>\Theta</math>, and <math>i</math> iterates over the training data.
In other words, in Bayesian statistics the posterior is represented by a set of samples drawn from it. The WAIC penalty is then the variance of the pointwise log-likelihood across these samples, computed for each data point and summed over the whole dataset.<ref>{{Cite book |last1=McElreath |first1=Richard |author-link1=Richard McElreath |title=Statistical Rethinking : A Bayesian Course with Examples in R and Stan |publisher=Chapman and Hall/CRC |year=2020 |isbn=978-0-367-13991-9 |edition=2nd}}</ref>
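As an illustrative sketch (not taken from the sources above), the calculation can be written in a few lines of Python, assuming a hypothetical array <code>log_lik</code> of shape <code>(S, n)</code> holding <math>\log p(y_i \mid \theta_s)</math> for every posterior sample and data point:
<syntaxhighlight lang="python">
import numpy as np
from scipy.special import logsumexp

def waic(log_lik):
    """Compute WAIC from an (S, n) array of pointwise log-likelihoods,
    where log_lik[s, i] = log p(y_i | theta_s) for posterior sample s."""
    S = log_lik.shape[0]
    # lppd: average the likelihood (not the log-likelihood) over posterior
    # samples for each data point, in log space for numerical stability,
    # then sum the resulting logs over data points.
    lppd = np.sum(logsumexp(log_lik, axis=0) - np.log(S))
    # Penalty ("effective number of parameters"): variance of the pointwise
    # log-likelihood across posterior samples, summed over data points.
    p_waic = np.sum(np.var(log_lik, axis=0, ddof=1))
    return -2.0 * (lppd - p_waic)
</syntaxhighlight>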
The penalty term is often referred to as the "effective number of parameters". This terminology stems from historical conventions, as a similar term is used in the Akaike Information Criterion.
In practice, Watanabe recommends calculating both WAIC and PSIS (Pareto-smoothed importance sampling). Both are approximations of leave-one-out cross-validation; if they disagree, then at least one of them is not reliable. Similarly, PSIS can sometimes detect that its own estimate is unreliable, namely when the estimated shape parameter of the Pareto distribution satisfies <math>\hat{k} > 0.7</math>.<ref>{{Cite book |last=Watanabe |first=Sumio |title=Mathematical theory of Bayesian statistics |date=2020 |publisher=CRC Press, Taylor & Francis Group |isbn=978-1-4822-3806-8 |edition=First issued in paperback |location=Boca Raton London New York}}</ref>
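For example, with the ArviZ library both estimates can be obtained from the same posterior draws; the sketch below assumes an <code>InferenceData</code> object <code>idata</code> that already contains a log-likelihood group (the variable names are illustrative):
<syntaxhighlight lang="python">
import arviz as az

# idata: an arviz.InferenceData object with a "log_likelihood" group,
# e.g. as produced by PyMC or CmdStanPy after sampling.
waic_res = az.waic(idata, pointwise=True)  # WAIC estimate
loo_res = az.loo(idata, pointwise=True)    # PSIS-LOO estimate

print(waic_res)
print(loo_res)

# PSIS reliability check: Pareto shape values above 0.7 indicate that the
# importance-sampling estimate for the corresponding data point is unreliable.
n_bad = int((loo_res.pareto_k > 0.7).sum())
print(f"data points with pareto_k > 0.7: {n_bad}")
</syntaxhighlight>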
Some textbooks of Bayesian statistics recommend WAIC over other information criteria, especially for multilevel and mixture models.<ref>{{Cite book |last1=Gelman |first1=Andrew |author-link1=Andrew Gelman |title=Bayesian Data Analysis |last2=Carlin |first2=John B. |author-link2=John Carlin (professor) |last3=Stern |first3=Hal S. |last4=Dunson |first4=David B. |last5=Vehtari |first5=Aki |last6=Rubin |first6=Donald B. |author-link6=Donald Rubin |publisher=Chapman and Hall/CRC |year=2013 |isbn=978-1-4398-4095-5 |edition=Third}}</ref>
The widely applicable Bayesian information criterion (WBIC) is a generalized version of the [[Bayesian information criterion]] (BIC) applicable to singular statistical models.<ref>{{cite journal |first=Sumio |last=Watanabe |year=2013 |url=http://www.jmlr.org/papers/volume14/watanabe13a/watanabe13a.pdf |title=A Widely Applicable Bayesian Information Criterion |journal=Journal of Machine Learning Research |volume=14 |pages=867–897 }}</ref>
WBIC is the average of the negative log likelihood over the posterior distribution taken at inverse temperature <math>\beta = 1/\log n</math>, where <math>n</math> is the sample size.
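Concretely, writing <math>\varphi(w)</math> for the prior and <math>p(X_i \mid w)</math> for the likelihood of the <math>i</math>-th observation (notation following the cited paper), the definition can be written as
:<math>\mathrm{WBIC} = \frac{\displaystyle\int \left( -\sum_{i=1}^{n} \log p(X_i \mid w) \right) \prod_{i=1}^{n} p(X_i \mid w)^{\beta}\, \varphi(w)\, \mathrm{d}w}{\displaystyle\int \prod_{i=1}^{n} p(X_i \mid w)^{\beta}\, \varphi(w)\, \mathrm{d}w}, \qquad \beta = \frac{1}{\log n},</math>
that is, the ordinary posterior is replaced by the posterior tempered with <math>\beta</math>.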
Both WAIC and WBIC can be numerically calculated without any information about the true data-generating distribution.
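As a minimal sketch of the WBIC estimate, assume a hypothetical array <code>log_lik_tempered</code> of shape <code>(S, n)</code> containing <math>\log p(X_i \mid w_s)</math> evaluated at draws <math>w_s</math> from the posterior tempered with <math>\beta = 1/\log n</math> (for example, obtained by raising the likelihood to the power <math>\beta</math> during MCMC):
<syntaxhighlight lang="python">
import numpy as np

def wbic(log_lik_tempered):
    """Estimate WBIC from pointwise log-likelihoods evaluated at draws
    from the tempered posterior (beta = 1 / log(n)).

    log_lik_tempered[s, i] = log p(X_i | w_s), shape (S, n).
    """
    # Negative total log likelihood per draw, averaged over draws;
    # as with BIC, lower values indicate a better model.
    return np.mean(-log_lik_tempered.sum(axis=1))
</syntaxhighlight>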
== See also ==
* [[Akaike information criterion]]
* [[Bayesian information criterion]]
* [[Cross-validation (statistics)]]
== References ==
{{Reflist}}
{{DEFAULTSORT:Watanabe-Akaike information criterion}}
{{statistics-stub}}