Whitening transformation
{{Short description|Decorrelation method that converts a covariance matrix of a set of samples into an identity matrix}}
A whitening transformation or sphering transformation is a linear transformation that transforms a vector of random variables with a known covariance matrix into a set of new variables whose covariance is the identity matrix, meaning that they are uncorrelated and each have variance 1.{{cite journal|last1=Koivunen|first1=A.C.|last2=Kostinski|first2=A.B.|title=The Feasibility of Data Whitening to Improve Performance of Weather Radar|year=1999|doi=10.1175/1520-0450(1999)038<0741:TFODWT>2.0.CO;2|journal=Journal of Applied Meteorology|volume=38|issue=6|pages=741–749|issn=1520-0450|bibcode=1999JApMe..38..741K|url=https://digitalcommons.mtu.edu/cgi/viewcontent.cgi?article=1279&context=physics-fp|doi-access=free}} The transformation is called "whitening" because it changes the input vector into a white noise vector.
Several other transformations are closely related to whitening:
- the decorrelation transform removes only the correlations but leaves variances intact,
- the standardization transform sets variances to 1 but leaves correlations intact,
- a coloring transformation transforms a vector of white random variables into a random vector with a specified covariance matrix.{{cite web|last1=Hossain|first1=Miliha|title=Whitening and Coloring Transforms for Multivariate Gaussian Random Variables|url=https://www.projectrhea.org/rhea/index.php/ECE662_Whitening_and_Coloring_Transforms_S14_MH|publisher=Project Rhea|access-date=21 March 2016}}
Definition
Suppose <math>X</math> is a random (column) vector with non-singular covariance matrix <math>\Sigma</math> and mean <math>0</math>. Then the transformation <math>Y = W X</math> with a whitening matrix <math>W</math> satisfying the condition <math>W^\mathsf{T} W = \Sigma^{-1}</math> yields the whitened random vector <math>Y</math> with unit diagonal covariance.
If <math>X</math> has non-zero mean <math>\mu</math>, then whitening can instead be performed by <math>Y = W(X - \mu)</math>.
There are infinitely many possible whitening matrices <math>W</math> that all satisfy the above condition. Commonly used choices are <math>W = \Sigma^{-1/2}</math> (Mahalanobis or ZCA whitening), <math>W = L^\mathsf{T}</math> where <math>L</math> is the Cholesky decomposition of <math>\Sigma^{-1}</math> (Cholesky whitening),{{cite journal|last1=Kessy|first1=A.|last2=Lewin|first2=A.|last3=Strimmer|first3=K.|title=Optimal whitening and decorrelation|year=2018|journal=The American Statistician| volume=72|issue=4| pages=309–314|doi=10.1080/00031305.2016.1277159|arxiv=1512.00809|s2cid=55075085 }} or the eigen-system of <math>\Sigma</math> (PCA whitening).{{cite journal|last1=Friedman|first1=J.|title=Exploratory Projection Pursuit|journal=Journal of the American Statistical Association|volume=82|issue=397|pages=249–266|jstor=2289161|year=1987|issn=0162-1459|doi=10.1080/01621459.1987.10478427|osti=1447861 |url=https://www.slac.stanford.edu/cgi-bin/getdoc/slac-pub-3841.pdf}}
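As an illustration (not drawn from the cited sources), the three common whitening matrices can be computed and checked numerically. The covariance matrix below is a made-up example; any symmetric positive-definite matrix would do:

```python
import numpy as np

# Hypothetical example covariance matrix (assumed known and non-singular).
Sigma = np.array([[4.0, 1.2],
                  [1.2, 1.0]])

# Eigendecomposition Sigma = U diag(d) U^T (Sigma is symmetric).
d, U = np.linalg.eigh(Sigma)

# ZCA / Mahalanobis whitening: W = Sigma^{-1/2} (symmetric inverse square root).
W_zca = U @ np.diag(d ** -0.5) @ U.T

# PCA whitening: rotate onto the eigenbasis, then rescale each component.
W_pca = np.diag(d ** -0.5) @ U.T

# Cholesky whitening: W = L^T, where L is the Cholesky factor of Sigma^{-1}.
L = np.linalg.cholesky(np.linalg.inv(Sigma))
W_chol = L.T

# Each choice satisfies W Sigma W^T = I (equivalently W^T W = Sigma^{-1}).
for W in (W_zca, W_pca, W_chol):
    assert np.allclose(W @ Sigma @ W.T, np.eye(2))
```

All three matrices whiten the same vector; they differ only by a rotation applied after whitening, which is why infinitely many valid choices exist.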
Optimal whitening transforms can be singled out by investigating the cross-covariance and cross-correlation of <math>X</math> and <math>Y</math>. For example, the unique optimal whitening transformation achieving maximal component-wise correlation between the original <math>X</math> and the whitened <math>Y</math> is produced by the whitening matrix <math>W = P^{-1/2} V^{-1/2}</math>, where <math>P</math> is the correlation matrix and <math>V</math> the diagonal variance matrix.
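A minimal numerical sketch of this correlation-preserving construction (the "ZCA-cor" whitening of Kessy et al.), using the factorization of the covariance into variances and correlations; the covariance matrix is again a made-up example:

```python
import numpy as np

Sigma = np.array([[4.0, 1.2],
                  [1.2, 1.0]])

# Factor Sigma into a diagonal variance part and a correlation matrix:
# Sigma = V^{1/2} P V^{1/2}.
v_inv_sqrt = np.diag(np.diag(Sigma) ** -0.5)
P = v_inv_sqrt @ Sigma @ v_inv_sqrt

# Symmetric inverse square root of P via its eigendecomposition.
d, U = np.linalg.eigh(P)
P_inv_sqrt = U @ np.diag(d ** -0.5) @ U.T

# ZCA-cor whitening matrix: W = P^{-1/2} V^{-1/2}
# (standardize each variable first, then decorrelate symmetrically).
W = P_inv_sqrt @ v_inv_sqrt

# W is a valid whitening matrix: W Sigma W^T = I.
assert np.allclose(W @ Sigma @ W.T, np.eye(2))
```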
Whitening a data matrix
Whitening a data matrix follows the same transformation as for random variables. An empirical whitening transform is obtained by estimating the covariance (e.g. by maximum likelihood) and subsequently constructing a corresponding estimated whitening matrix (e.g. by Cholesky decomposition).
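The empirical procedure can be sketched on simulated data; the mixing matrix, sample size, and choice of Cholesky whitening below are arbitrary illustration choices:

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated data matrix: n samples (rows) of a correlated 2-d variable.
n = 10_000
A = np.array([[2.0, 0.0],
              [0.6, 0.8]])              # arbitrary mixing matrix
X = rng.standard_normal((n, 2)) @ A.T

# Estimate mean and covariance, then build an empirical whitening matrix
# (here via the Cholesky factor of the inverse estimated covariance).
mu = X.mean(axis=0)
Sigma_hat = np.cov(X, rowvar=False)
L = np.linalg.cholesky(np.linalg.inv(Sigma_hat))
W_hat = L.T

# Whiten: center the data, then apply W to each sample.
Z = (X - mu) @ W_hat.T

# By construction the sample covariance of Z is the identity matrix.
assert np.allclose(np.cov(Z, rowvar=False), np.eye(2))
```

Note that only the *sample* covariance of Z is exactly the identity; the population covariance of new data whitened with this estimated matrix matches the identity only approximately.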
High-dimensional whitening
This modality generalizes the pre-whitening procedure to more general spaces, where <math>X</math> is usually assumed to be a random function or another random object in a Hilbert space <math>H</math>. One of the main issues of extending whitening to infinite dimensions is that the covariance operator has an unbounded inverse in <math>H</math>. Nevertheless, if one assumes that the Picard condition holds for <math>X</math> in the range space of the covariance operator, whitening becomes possible.{{cite journal|last1=Vidal|first1=M.|last2=Aguilera|first2=A.M.|title=Novel whitening approaches in functional settings|journal=STAT|volume=12|issue=1|pages=e516|year=2022|doi=10.1002/sta4.516|doi-access=free|hdl=1854/LU-8770510|hdl-access=free}} A whitening operator can then be defined from a factorization of the Moore–Penrose inverse of the covariance operator, which acts effectively on Karhunen–Loève-type expansions of <math>X</math>. The advantage of these whitening transformations is that they can be optimized according to the underlying topological properties of the data, thus producing more robust whitened representations. High-dimensional features of the data can be exploited through kernel regressors or basis function systems.{{cite book|last1=Ramsay|first1=J.O.|last2=Silverman|first2=B.W.|date=2005|title=Functional Data Analysis|url=https://link.springer.com/book/10.1007/b98888|publisher=Springer New York, NY|doi=10.1007/b98888 |isbn=978-0-387-40080-8}}
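A finite-dimensional analogue conveys the idea: when the estimated covariance is rank-deficient (as for discretized functional data with more grid points than samples), its ordinary inverse does not exist, but whitening can be restricted to the numerically nonzero part of the spectrum via the Moore–Penrose inverse. This is an illustrative sketch, not the cited papers' method:

```python
import numpy as np

rng = np.random.default_rng(1)

# n samples of a p-dimensional variable with p > n, so the empirical
# covariance matrix is singular (rank at most n - 1 after centering).
n, p = 20, 50
X = rng.standard_normal((n, p))
X = X - X.mean(axis=0)

Sigma_hat = X.T @ X / (n - 1)

# Keep only the numerically nonzero eigenvalues (a discrete stand-in for
# the Picard-type condition in the text) and build W from the square
# root of the Moore-Penrose inverse restricted to that subspace.
d, U = np.linalg.eigh(Sigma_hat)
keep = d > 1e-10 * d.max()
W = np.diag(d[keep] ** -0.5) @ U[:, keep].T

# On the retained subspace the whitened components have identity covariance.
Z = X @ W.T
assert np.allclose(np.cov(Z, rowvar=False), np.eye(int(keep.sum())), atol=1e-6)
```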
R implementation
An implementation of several whitening procedures in R, including ZCA-whitening and PCA whitening but also CCA whitening, is available in the "whitening" R package {{cite web|url=https://cran.r-project.org/package=whitening|title=whitening R package|access-date=2018-11-25}} published on CRAN. The R package "pfica"{{cite web|url=https://cran.r-project.org/web/packages/pfica|title=pfica R package|date=6 January 2023 |access-date=2023-02-11}} allows the computation of high-dimensional whitening representations using basis function systems (B-splines, Fourier basis, etc.).
See also
- Decorrelation
- Principal component analysis
- Weighted least squares
- Canonical correlation
- Mahalanobis distance (which reduces to ordinary Euclidean distance after a whitening transformation)
References
{{reflist}}
External links
- http://courses.media.mit.edu/2010fall/mas622j/whiten.pdf
- [http://www.cs.toronto.edu/~kriz/learning-features-2009-TR.pdf The ZCA whitening transformation]. Appendix A of Learning Multiple Layers of Features from Tiny Images by A. Krizhevsky.