Kernel-independent component analysis

In statistics, kernel-independent component analysis (kernel ICA) is an efficient algorithm for independent component analysis that estimates source components by optimizing a generalized-variance contrast function based on representations in a reproducing kernel Hilbert space.{{Cite journal | last1 = Bach | first1 = Francis R. | last2 = Jordan | first2 = Michael I. | doi = 10.1162/153244303768966085 | title = Kernel independent component analysis | journal = The Journal of Machine Learning Research | volume = 3 | pages = 1–48 | year = 2003 | url = https://www.di.ens.fr/~fbach/kernelICA-jmlr.pdf}}{{Cite book | last1 = Bach | first1 = Francis R. | last2 = Jordan | first2 = Michael I. | title = 2003 IEEE International Conference on Acoustics, Speech, and Signal Processing, 2003. Proceedings. (ICASSP '03) | chapter = Kernel independent component analysis | doi = 10.1109/icassp.2003.1202783 | volume = 4 | pages = IV-876-9 | year = 2003 | url = https://www.di.ens.fr/~fbach/kernelICA-icassp03.pdf| isbn = 978-0-7803-7663-2 | s2cid = 7691428 }} Such contrast functions draw on the notion of mutual information as a measure of statistical independence.

Main idea

Kernel ICA is based on the idea that correlations between two random variables can be represented in a reproducing kernel Hilbert space (RKHS), denoted by \mathcal{F}, associated with a feature map x \mapsto L_x \in \mathcal{F} defined for each x \in \mathbb{R}. The \mathcal{F}-correlation between two random variables X and Y is defined as

: \rho_{\mathcal{F}}(X,Y) = \max_{f, g \in \mathcal{F}} \operatorname{corr}( \langle L_X,f \rangle, \langle L_Y,g \rangle)

where the functions f,g: \mathbb{R} \to \mathbb{R} range over \mathcal{F} and

: \operatorname{corr}( \langle L_X,f \rangle, \langle L_Y,g \rangle) := \frac{\operatorname{cov}(f(X), g(Y)) }{\operatorname{var}(f(X))^{1/2} \operatorname{var}(g(Y))^{1/2} }

for fixed f,g \in \mathcal{F}. Note that the reproducing property implies that f(x) = \langle L_x, f \rangle for fixed x \in \mathbb{R} and f \in \mathcal{F}.{{cite book |last=Saitoh |first=Saburou | title=Theory of Reproducing Kernels and Its Applications |publisher=Longman |year=1988|isbn = 978-0582035645}} It then follows that the \mathcal{F}-correlation between two independent random variables is zero; conversely, when \mathcal{F} is sufficiently rich (for example, the RKHS of a Gaussian kernel), a vanishing \mathcal{F}-correlation implies that X and Y are independent, which is what makes \rho_{\mathcal{F}} suitable as a contrast function for independence.
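
In practice, the maximization over f and g is carried out with regularized kernel canonical correlation analysis on the centered Gram matrices of the samples. The following minimal sketch (in Python with NumPy; the function names, the Gaussian kernel width sigma, and the regularization kappa are illustrative choices rather than values prescribed by the algorithm) estimates the \mathcal{F}-correlation of two one-dimensional samples in this way:

```python
import numpy as np

def gaussian_gram(x, sigma):
    # Gram matrix K_ij = exp(-(x_i - x_j)^2 / (2 sigma^2)) of a 1-D sample.
    d = x[:, None] - x[None, :]
    return np.exp(-d ** 2 / (2.0 * sigma ** 2))

def f_correlation(x, y, sigma=1.0, kappa=1e-2):
    # Empirical F-correlation of the samples x and y, estimated as the first
    # kernel canonical correlation with ridge regularization n * kappa.
    # sigma and kappa are illustrative hyperparameter choices.
    n = len(x)
    H = np.eye(n) - np.ones((n, n)) / n      # centering matrix
    K = H @ gaussian_gram(x, sigma) @ H      # centered Gram matrix of x
    L = H @ gaussian_gram(y, sigma) @ H      # centered Gram matrix of y
    reg = n * kappa * np.eye(n)
    # The maximal correlation over f and g equals the largest singular
    # value of (K + reg)^{-1} K L (L + reg)^{-1}.
    Rk = np.linalg.solve(K + reg, K)
    Rl = np.linalg.solve(L + reg, L)
    return np.linalg.svd(Rk @ Rl, compute_uv=False)[0]

rng = np.random.default_rng(0)
x = rng.standard_normal(300)
print(f_correlation(x, rng.standard_normal(300)))  # small: independent samples
print(f_correlation(x, x ** 2))                    # larger: dependent, though uncorrelated
```

The regularization keeps the finite-sample problem well posed; without it, overfitting drives the empirical kernel canonical correlation to one regardless of the data.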

This notion of \mathcal{F}-correlation is used for defining contrast functions that are optimized in the kernel ICA algorithm. Specifically, if \mathbf{X} := (x_{ij}) \in \mathbb{R}^{n \times m} is a prewhitened data matrix, that is, the sample mean of each column is zero and the sample covariance of the rows is the m \times m identity matrix, kernel ICA estimates an m \times m orthogonal matrix \mathbf{A} so as to minimize the finite-sample \mathcal{F}-correlations between the columns of \mathbf{S} := \mathbf{X} \mathbf{A}^{\prime}.
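
For m = 2 this optimization can be made concrete: after whitening, every admissible \mathbf{A} is, up to permutation and sign of the sources, a rotation by a single angle. The sketch below reuses f_correlation from above and is again only illustrative; a simple grid search over the angle stands in for the derivative-based optimization over the orthogonal group used in the actual algorithm.

```python
import numpy as np

def whiten(X):
    # Center the columns of X and transform them to unit sample covariance.
    Xc = X - X.mean(axis=0)
    evals, evecs = np.linalg.eigh(Xc.T @ Xc / len(Xc))
    return (Xc @ evecs) / np.sqrt(evals)

def kernel_ica_2d(X, n_angles=90):
    # Grid search over rotations A(theta): keep the unmixing whose columns
    # have the smallest empirical F-correlation.
    Xw = whiten(X)
    best_rho, best_S = np.inf, None
    for theta in np.linspace(0.0, np.pi / 2, n_angles):
        c, s = np.cos(theta), np.sin(theta)
        A = np.array([[c, -s], [s, c]])
        S = Xw @ A.T                         # candidate source matrix S = X A'
        rho = f_correlation(S[:, 0], S[:, 1])
        if rho < best_rho:
            best_rho, best_S = rho, S
    return best_S

# Two independent uniform sources, linearly mixed and then recovered
# (as in any ICA, only up to order, sign, and scale):
rng = np.random.default_rng(1)
S_true = rng.uniform(-1.0, 1.0, size=(300, 2))
X = S_true @ np.array([[1.0, 0.6], [0.4, 1.0]]).T
S_hat = kernel_ica_2d(X)
```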

References

{{Reflist}}

{{Statistics-stub}}

Category:Statistical algorithms