Kernel-independent component analysis

In statistics, kernel-independent component analysis (kernel ICA) is an efficient algorithm for independent component analysis that estimates source components by optimizing a generalized-variance contrast function based on representations in a reproducing kernel Hilbert space.{{Cite journal | last1 = Bach | first1 = Francis R. | last2 = Jordan | first2 = Michael I. | doi = 10.1162/153244303768966085 | title = Kernel independent component analysis | journal = The Journal of Machine Learning Research | volume = 3 | pages = 1–48 | year = 2003 | url = https://www.di.ens.fr/~fbach/kernelICA-jmlr.pdf}}{{Cite book | last1 = Bach | first1 = Francis R. | last2 = Jordan | first2 = Michael I. | title = 2003 IEEE International Conference on Acoustics, Speech, and Signal Processing, 2003. Proceedings. (ICASSP '03) | chapter = Kernel independent component analysis | doi = 10.1109/icassp.2003.1202783 | volume = 4 | pages = IV-876-9 | year = 2003 | url = https://www.di.ens.fr/~fbach/kernelICA-icassp03.pdf| isbn = 978-0-7803-7663-2 | s2cid = 7691428 }} Such contrast functions draw on the notion of mutual information as a measure of statistical independence.

Main idea

Kernel ICA is based on the idea that correlations between two random variables can be represented in a reproducing kernel Hilbert space (RKHS), denoted by \mathcal{F}, associated with a feature map x \mapsto L_x \in \mathcal{F} defined for each x \in \mathbb{R}. The \mathcal{F}-correlation between two random variables X and Y is defined as

: \rho_{\mathcal{F}}(X,Y) = \max_{f, g \in \mathcal{F}} \operatorname{corr}( \langle L_X,f \rangle, \langle L_Y,g \rangle)

where the functions f,g: \mathbb{R} \to \mathbb{R} range over \mathcal{F} and

: \operatorname{corr}( \langle L_X,f \rangle, \langle L_Y,g \rangle) := \frac{\operatorname{cov}(f(X), g(Y)) }{\operatorname{var}(f(X))^{1/2} \operatorname{var}(g(Y))^{1/2} }

for fixed f,g \in \mathcal{F}. Note that the reproducing property implies that f(x) = \langle L_x, f \rangle for fixed x \in \mathbb{R} and f \in \mathcal{F}.{{cite book |last=Saitoh |first=Saburou | title=Theory of Reproducing Kernels and Its Applications |publisher=Longman |year=1988|isbn = 978-0582035645}} It then follows that the \mathcal{F}-correlation between two independent random variables is zero; conversely, when \mathcal{F} is sufficiently rich (for example, the RKHS of a Gaussian kernel), a vanishing \mathcal{F}-correlation implies that X and Y are independent, which is what makes \rho_{\mathcal{F}} suitable as a contrast function for independence.
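
In practice, the maximization over f and g is carried out with regularized kernel canonical correlation analysis on the centered Gram matrices of the samples. The following minimal sketch (in Python with NumPy; the function names, the Gaussian kernel width sigma, and the regularization kappa are illustrative choices rather than values prescribed by the algorithm) estimates the \mathcal{F}-correlation of two one-dimensional samples in this way:

```python
import numpy as np

def gaussian_gram(x, sigma):
    # Gram matrix K_ij = exp(-(x_i - x_j)^2 / (2 sigma^2)) of a 1-D sample.
    d = x[:, None] - x[None, :]
    return np.exp(-d ** 2 / (2.0 * sigma ** 2))

def f_correlation(x, y, sigma=1.0, kappa=1e-2):
    # Empirical F-correlation of the samples x and y, estimated as the first
    # kernel canonical correlation with ridge regularization n * kappa.
    # sigma and kappa are illustrative hyperparameter choices.
    n = len(x)
    H = np.eye(n) - np.ones((n, n)) / n      # centering matrix
    K = H @ gaussian_gram(x, sigma) @ H      # centered Gram matrix of x
    L = H @ gaussian_gram(y, sigma) @ H      # centered Gram matrix of y
    reg = n * kappa * np.eye(n)
    # The maximal correlation over f and g equals the largest singular
    # value of (K + reg)^{-1} K L (L + reg)^{-1}.
    Rk = np.linalg.solve(K + reg, K)
    Rl = np.linalg.solve(L + reg, L)
    return np.linalg.svd(Rk @ Rl, compute_uv=False)[0]

rng = np.random.default_rng(0)
x = rng.standard_normal(300)
print(f_correlation(x, rng.standard_normal(300)))  # small: independent samples
print(f_correlation(x, x ** 2))                    # larger: dependent, though uncorrelated
```

The regularization keeps the finite-sample problem well posed; without it, overfitting drives the empirical kernel canonical correlation to one regardless of the data.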

This notion of \mathcal{F}-correlation is used for defining contrast functions that are optimized in the kernel ICA algorithm. Specifically, if \mathbf{X} := (x_{ij}) \in \mathbb{R}^{n \times m} is a prewhitened data matrix, that is, the sample mean of each column is zero and the sample covariance of the rows is the m \times m identity matrix, kernel ICA estimates an m \times m orthogonal matrix \mathbf{A} so as to minimize the finite-sample \mathcal{F}-correlations between the columns of \mathbf{S} := \mathbf{X} \mathbf{A}^{\prime}.
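
For m = 2 this optimization can be made concrete: after whitening, every admissible \mathbf{A} is, up to permutation and sign of the sources, a rotation by a single angle. The sketch below reuses f_correlation from above and is again only illustrative; a simple grid search over the angle stands in for the derivative-based optimization over the orthogonal group used in the actual algorithm.

```python
import numpy as np

def whiten(X):
    # Center the columns of X and transform them to unit sample covariance.
    Xc = X - X.mean(axis=0)
    evals, evecs = np.linalg.eigh(Xc.T @ Xc / len(Xc))
    return (Xc @ evecs) / np.sqrt(evals)

def kernel_ica_2d(X, n_angles=90):
    # Grid search over rotations A(theta): keep the unmixing whose columns
    # have the smallest empirical F-correlation.
    Xw = whiten(X)
    best_rho, best_S = np.inf, None
    for theta in np.linspace(0.0, np.pi / 2, n_angles):
        c, s = np.cos(theta), np.sin(theta)
        A = np.array([[c, -s], [s, c]])
        S = Xw @ A.T                         # candidate source matrix S = X A'
        rho = f_correlation(S[:, 0], S[:, 1])
        if rho < best_rho:
            best_rho, best_S = rho, S
    return best_S

# Two independent uniform sources, linearly mixed and then recovered
# (as in any ICA, only up to order, sign, and scale):
rng = np.random.default_rng(1)
S_true = rng.uniform(-1.0, 1.0, size=(300, 2))
X = S_true @ np.array([[1.0, 0.6], [0.4, 1.0]]).T
S_hat = kernel_ica_2d(X)
```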

References

{{Reflist}}

{{Statistics-stub}}

Category:Statistical algorithms