Whitening transformation
{{Short description|Decorrelation method that converts a covariance matrix of a set of samples into an identity matrix}}
A whitening transformation or sphering transformation is a linear transformation that transforms a vector of random variables with a known covariance matrix into a set of new variables whose covariance is the identity matrix, meaning that they are uncorrelated and each have variance 1.{{cite journal|last1=Koivunen|first1=A.C.|last2=Kostinski|first2=A.B.|title=The Feasibility of Data Whitening to Improve Performance of Weather Radar|year=1999|doi=10.1175/1520-0450(1999)038<0741:TFODWT>2.0.CO;2|journal=Journal of Applied Meteorology|volume=38|issue=6|pages=741–749|issn=1520-0450|bibcode=1999JApMe..38..741K|url=https://digitalcommons.mtu.edu/cgi/viewcontent.cgi?article=1279&context=physics-fp|doi-access=free}} The transformation is called "whitening" because it changes the input vector into a white noise vector.
Several other transformations are closely related to whitening:
- the decorrelation transform removes only the correlations but leaves variances intact,
- the standardization transform sets variances to 1 but leaves correlations intact,
- a coloring transformation transforms a vector of white random variables into a random vector with a specified covariance matrix.{{cite web|last1=Hossain|first1=Miliha|title=Whitening and Coloring Transforms for Multivariate Gaussian Random Variables|url=https://www.projectrhea.org/rhea/index.php/ECE662_Whitening_and_Coloring_Transforms_S14_MH|publisher=Project Rhea|access-date=21 March 2016}}
Definition
Suppose <math>X</math> is a random (column) vector with non-singular covariance matrix <math>\Sigma</math> and mean <math>0</math>. Then the transformation <math>Y = W X</math> with a whitening matrix <math>W</math> satisfying the condition <math>W^\mathsf{T} W = \Sigma^{-1}</math> yields the whitened random vector <math>Y</math> with unit diagonal covariance.
If <math>X</math> has non-zero mean <math>\mu</math>, then whitening can instead be performed by <math>Y = W(X - \mu)</math>.
There are infinitely many possible whitening matrices <math>W</math> that all satisfy the above condition. Commonly used choices are <math>W = \Sigma^{-1/2}</math> (Mahalanobis or ZCA whitening), <math>W = L^\mathsf{T}</math> where <math>L</math> is the Cholesky decomposition of <math>\Sigma^{-1}</math> (Cholesky whitening),{{cite journal|last1=Kessy|first1=A.|last2=Lewin|first2=A.|last3=Strimmer|first3=K.|title=Optimal whitening and decorrelation|year=2018|journal=The American Statistician| volume=72|issue=4| pages=309–314|doi=10.1080/00031305.2016.1277159|arxiv=1512.00809|s2cid=55075085 }} or the eigen-system of <math>\Sigma</math> (PCA whitening).{{cite journal|last1=Friedman|first1=J.|title=Exploratory Projection Pursuit|journal=Journal of the American Statistical Association|volume=82|issue=397|pages=249–266|jstor=2289161|year=1987|issn=0162-1459|doi=10.1080/01621459.1987.10478427|osti=1447861 |url=https://www.slac.stanford.edu/cgi-bin/getdoc/slac-pub-3841.pdf}}
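As an illustration (not drawn from the cited sources), the three common whitening matrices can be computed and checked numerically. The covariance matrix below is a made-up example; any symmetric positive-definite matrix would do:

```python
import numpy as np

# Hypothetical example covariance matrix (assumed known and non-singular).
Sigma = np.array([[4.0, 1.2],
                  [1.2, 1.0]])

# Eigendecomposition Sigma = U diag(d) U^T (Sigma is symmetric).
d, U = np.linalg.eigh(Sigma)

# ZCA / Mahalanobis whitening: W = Sigma^{-1/2} (symmetric inverse square root).
W_zca = U @ np.diag(d ** -0.5) @ U.T

# PCA whitening: rotate onto the eigenbasis, then rescale each component.
W_pca = np.diag(d ** -0.5) @ U.T

# Cholesky whitening: W = L^T, where L is the Cholesky factor of Sigma^{-1}.
L = np.linalg.cholesky(np.linalg.inv(Sigma))
W_chol = L.T

# Each choice satisfies W Sigma W^T = I (equivalently W^T W = Sigma^{-1}).
for W in (W_zca, W_pca, W_chol):
    assert np.allclose(W @ Sigma @ W.T, np.eye(2))
```

All three matrices whiten the same vector; they differ only by a rotation applied after whitening, which is why infinitely many valid choices exist.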
Optimal whitening transforms can be singled out by investigating the cross-covariance and cross-correlation of <math>X</math> and <math>Y</math>. For example, the unique optimal whitening transformation achieving maximal component-wise correlation between the original <math>X</math> and the whitened <math>Y</math> is produced by the whitening matrix <math>W = P^{-1/2} V^{-1/2}</math>, where <math>P</math> is the correlation matrix and <math>V</math> the diagonal variance matrix.
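A minimal numerical sketch of this correlation-preserving construction (the "ZCA-cor" whitening of Kessy et al.), using the factorization of the covariance into variances and correlations; the covariance matrix is again a made-up example:

```python
import numpy as np

Sigma = np.array([[4.0, 1.2],
                  [1.2, 1.0]])

# Factor Sigma into a diagonal variance part and a correlation matrix:
# Sigma = V^{1/2} P V^{1/2}.
v_inv_sqrt = np.diag(np.diag(Sigma) ** -0.5)
P = v_inv_sqrt @ Sigma @ v_inv_sqrt

# Symmetric inverse square root of P via its eigendecomposition.
d, U = np.linalg.eigh(P)
P_inv_sqrt = U @ np.diag(d ** -0.5) @ U.T

# ZCA-cor whitening matrix: W = P^{-1/2} V^{-1/2}
# (standardize each variable first, then decorrelate symmetrically).
W = P_inv_sqrt @ v_inv_sqrt

# W is a valid whitening matrix: W Sigma W^T = I.
assert np.allclose(W @ Sigma @ W.T, np.eye(2))
```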
Whitening a data matrix
Whitening a data matrix follows the same transformation as for random variables. An empirical whitening transform is obtained by estimating the covariance (e.g. by maximum likelihood) and subsequently constructing a corresponding estimated whitening matrix (e.g. by Cholesky decomposition).
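The empirical procedure can be sketched on simulated data; the mixing matrix, sample size, and choice of Cholesky whitening below are arbitrary illustration choices:

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated data matrix: n samples (rows) of a correlated 2-d variable.
n = 10_000
A = np.array([[2.0, 0.0],
              [0.6, 0.8]])              # arbitrary mixing matrix
X = rng.standard_normal((n, 2)) @ A.T

# Estimate mean and covariance, then build an empirical whitening matrix
# (here via the Cholesky factor of the inverse estimated covariance).
mu = X.mean(axis=0)
Sigma_hat = np.cov(X, rowvar=False)
L = np.linalg.cholesky(np.linalg.inv(Sigma_hat))
W_hat = L.T

# Whiten: center the data, then apply W to each sample.
Z = (X - mu) @ W_hat.T

# By construction the sample covariance of Z is the identity matrix.
assert np.allclose(np.cov(Z, rowvar=False), np.eye(2))
```

Note that only the *sample* covariance of Z is exactly the identity; the population covariance of new data whitened with this estimated matrix matches the identity only approximately.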
High-dimensional whitening
This modality generalizes the pre-whitening procedure to more general spaces, where <math>X</math> is usually assumed to be a random function or another random object in a Hilbert space <math>H</math>. One of the main issues of extending whitening to infinite dimensions is that the covariance operator has an unbounded inverse in <math>H</math>. Nevertheless, if one assumes that the Picard condition holds for <math>X</math> in the range space of the covariance operator, whitening becomes possible.{{cite journal|last1=Vidal|first1=M.|last2=Aguilera|first2=A.M.|title=Novel whitening approaches in functional settings|journal=STAT|volume=12|issue=1|pages=e516|year=2022|doi=10.1002/sta4.516|doi-access=free|hdl=1854/LU-8770510|hdl-access=free}} A whitening operator can then be defined from a factorization of the Moore–Penrose inverse of the covariance operator, which acts effectively on Karhunen–Loève-type expansions of <math>X</math>. The advantage of these whitening transformations is that they can be optimized according to the underlying topological properties of the data, thus producing more robust whitened representations. High-dimensional features of the data can be exploited through kernel regressors or basis function systems.{{cite book|last1=Ramsay|first1=J.O.|last2=Silverman|first2=B.W.|date=2005|title=Functional Data Analysis|url=https://link.springer.com/book/10.1007/b98888|publisher=Springer New York, NY|doi=10.1007/b98888 |isbn=978-0-387-40080-8}}
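A finite-dimensional analogue conveys the idea: when the estimated covariance is rank-deficient (as for discretized functional data with more grid points than samples), its ordinary inverse does not exist, but whitening can be restricted to the numerically nonzero part of the spectrum via the Moore–Penrose inverse. This is an illustrative sketch, not the cited papers' method:

```python
import numpy as np

rng = np.random.default_rng(1)

# n samples of a p-dimensional variable with p > n, so the empirical
# covariance matrix is singular (rank at most n - 1 after centering).
n, p = 20, 50
X = rng.standard_normal((n, p))
X = X - X.mean(axis=0)

Sigma_hat = X.T @ X / (n - 1)

# Keep only the numerically nonzero eigenvalues (a discrete stand-in for
# the Picard-type condition in the text) and build W from the square
# root of the Moore-Penrose inverse restricted to that subspace.
d, U = np.linalg.eigh(Sigma_hat)
keep = d > 1e-10 * d.max()
W = np.diag(d[keep] ** -0.5) @ U[:, keep].T

# On the retained subspace the whitened components have identity covariance.
Z = X @ W.T
assert np.allclose(np.cov(Z, rowvar=False), np.eye(int(keep.sum())), atol=1e-6)
```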
R implementation
An implementation of several whitening procedures in R, including ZCA-whitening and PCA whitening but also CCA whitening, is available in the "whitening" R package {{cite web|url=https://cran.r-project.org/package=whitening|title=whitening R package|access-date=2018-11-25}} published on CRAN. The R package "pfica"{{cite web|url=https://cran.r-project.org/web/packages/pfica|title=pfica R package|date=6 January 2023 |access-date=2023-02-11}} allows the computation of high-dimensional whitening representations using basis function systems (B-splines, Fourier basis, etc.).
See also
- Decorrelation
- Principal component analysis
- Weighted least squares
- Canonical correlation
- Mahalanobis distance (which reduces to ordinary Euclidean distance after a whitening transformation)
References
{{reflist}}
External links
- http://courses.media.mit.edu/2010fall/mas622j/whiten.pdf
- [http://www.cs.toronto.edu/~kriz/learning-features-2009-TR.pdf The ZCA whitening transformation]. Appendix A of Learning Multiple Layers of Features from Tiny Images by A. Krizhevsky.