FastICA


FastICA is an efficient and popular algorithm for independent component analysis invented by Aapo Hyvärinen at Helsinki University of Technology.{{Cite journal | last1 = Hyvärinen | first1 = A. | last2 = Oja | first2 = E. | doi = 10.1016/S0893-6080(00)00026-5 | title = Independent component analysis: Algorithms and applications | journal = Neural Networks | volume = 13 | issue = 4–5 | pages = 411–430 | year = 2000 | pmid = 10946390| url = http://www.cs.helsinki.fi/u/ahyvarin/papers/NN00new.pdf| citeseerx = 10.1.1.79.7003 }}{{Cite journal | last1 = Hyvarinen | first1 = A. | title = Fast and robust fixed-point algorithms for independent component analysis | doi = 10.1109/72.761722 | journal = IEEE Transactions on Neural Networks | volume = 10 | issue = 3 | pages = 626–634 | year = 1999 | pmid = 18252563| url = http://www.cs.helsinki.fi/u/ahyvarin/papers/TNN99new.pdf| citeseerx = 10.1.1.297.8229 }} Like most ICA algorithms, FastICA seeks an orthogonal rotation of prewhitened data, through a fixed-point iteration scheme, that maximizes a measure of non-Gaussianity of the rotated components. Non-Gaussianity serves as a proxy for statistical independence, which is a very strong condition and requires infinite data to verify. FastICA can alternatively be derived as an approximate Newton iteration.

Algorithm

= ''Prewhitening'' the data =

Let \mathbf{X} := (x_{ij}) \in \mathbb{R}^{N \times M} denote the input data matrix, with M the number of columns corresponding to the number of samples of mixed signals and N the number of rows corresponding to the number of independent source signals. The input data matrix \mathbf{X} must be prewhitened, or centered and whitened, before applying the FastICA algorithm to it.

  • Centering the data entails demeaning each component of the input data \mathbf{X}, that is,

x_{ij} \leftarrow x_{ij} - \frac{1}{M} \sum_{j^{\prime}} x_{ij^{\prime}}

:for each i =1,\ldots,N and j = 1, \ldots, M . After centering, each row of \mathbf{X} has an expected value of 0.

  • Whitening the data requires a linear transformation \mathbf{L}: \mathbb{R}^{N \times M} \to \mathbb{R}^{N \times M} of the centered data so that the components of \mathbf{L}(\mathbf{X}) are uncorrelated and have variance one. More precisely, if \mathbf{X} is a centered data matrix, the covariance of \mathbf{L}_{\mathbf{x}} := \mathbf{L}(\mathbf{X}) is the (N \times N)-dimensional identity matrix, that is,

\mathrm{E}\left \{ \mathbf{L}_{\mathbf{x}} \mathbf{L}_{\mathbf{x}}^{T} \right \} = \mathbf{I}_N

: A common method for whitening is to perform an eigenvalue decomposition on the covariance matrix of the centered data \mathbf{X}, E\left \{ \mathbf{X} \mathbf{X}^{T} \right \} = \mathbf{E}\mathbf{D}\mathbf{E}^T, where \mathbf{E} is the matrix of eigenvectors and \mathbf{D} is the diagonal matrix of eigenvalues. The whitened data matrix is then given by

\mathbf{X} \leftarrow \mathbf{D}^{-1/2}\mathbf{E}^T\mathbf{X}.
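As a concrete illustration, the following Python sketch (using NumPy; the function name prewhiten and the array X are illustrative, not part of the original description) carries out the centering and eigenvalue-based whitening steps above on an N × M data matrix.

import numpy as np

def prewhiten(X):
    """Center and whiten an (N, M) array X of N mixed signals and M samples."""
    # Centering: subtract the mean of each row (each mixed signal)
    X = X - X.mean(axis=1, keepdims=True)

    # Sample covariance of the centered data, an (N, N) matrix
    cov = X @ X.T / X.shape[1]

    # Eigenvalue decomposition: cov = E D E^T
    d, E = np.linalg.eigh(cov)

    # Whitening transform: X <- D^{-1/2} E^T X
    return np.diag(d ** -0.5) @ E.T @ X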

= Single component extraction =

The iterative algorithm finds the direction for the weight vector \mathbf{w} \in \mathbb{R}^N that maximizes a measure of non-Gaussianity of the projection \mathbf{w}^T \mathbf{X}, with \mathbf{X} \in \mathbb{R}^{N \times M} denoting a prewhitened data matrix as described above.

Note that \mathbf{w} is a column vector. To measure non-Gaussianity, FastICA relies on a nonquadratic nonlinear function f(u), its first derivative g(u), and its second derivative g^{\prime}(u). Hyvärinen states that the functions

f(u) = \log \cosh (u), \quad g(u) = \tanh (u), \quad \text{and} \quad {g}'(u) = 1-\tanh^2(u),

are useful for general purposes, while

f(u) = -e^{-u^2/2}, \quad g(u) = u e^{-u^2/2}, \quad \text{and} \quad {g}'(u) = (1-u^2) e^{-u^2/2}

may be highly robust. The steps for extracting the weight vector \mathbf{w} for a single component in FastICA are the following (a Python sketch of these steps is given after the list):

  1. Randomize the initial weight vector \mathbf{w}
  2. Let \mathbf{w}^+ \leftarrow E\left\{\mathbf{X} g(\mathbf{w}^T \mathbf{X})^T\right\} - E\left\{g'(\mathbf{w}^T \mathbf{X})\right\}\mathbf{w}, where E\left\{...\right\} denotes averaging over all column vectors of the matrix \mathbf{X}
  3. Let \mathbf{w} \leftarrow \mathbf{w}^+ / \|\mathbf{w}^+\|
  4. If not converged, go back to step 2
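
A minimal sketch of this single-component iteration in Python (NumPy assumed; X is a prewhitened N × M array and the name fastica_one_unit is illustrative), using the general-purpose contrast function g(u) = tanh(u) from above:

import numpy as np

def fastica_one_unit(X, max_iter=200, tol=1e-6):
    """Estimate one weight vector w by the FastICA fixed-point iteration."""
    N, M = X.shape
    # Step 1: random initial weight vector, normalized
    w = np.random.randn(N)
    w /= np.linalg.norm(w)

    for _ in range(max_iter):
        wx = w @ X                        # projections w^T X, shape (M,)
        g = np.tanh(wx)                   # g(u) = tanh(u)
        g_prime = 1.0 - np.tanh(wx) ** 2  # g'(u) = 1 - tanh^2(u)

        # Step 2: fixed-point update; E{...} becomes an average over the M samples
        w_new = (X * g).mean(axis=1) - g_prime.mean() * w

        # Step 3: renormalize
        w_new /= np.linalg.norm(w_new)

        # Step 4: stop once the direction stops changing (up to sign)
        converged = np.abs(np.abs(w_new @ w) - 1.0) < tol
        w = w_new
        if converged:
            break
    return w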

= Multiple component extraction =

The single unit iterative algorithm estimates only one weight vector which extracts a single component. Estimating additional components that are mutually "independent" requires repeating the algorithm to obtain linearly independent projection vectors - note that the notion of independence here refers to maximizing non-Gaussianity in the estimated components. Hyvärinen provides several ways of extracting multiple components with the simplest being the following. Here, \mathbf{1_{M}} is a column vector of 1's of dimension M.

Algorithm FastICA

:Input: C Number of desired components

:Input: \mathbf{X} \in \mathbb{R}^{N \times M} Prewhitened matrix, where each column represents an N-dimensional sample, with C \leq N

:Output: \mathbf{W} \in \mathbb{R}^{N \times C} Un-mixing matrix, where each column projects \mathbf{X} onto an independent component

:Output: \mathbf{S} \in \mathbb{R}^{C \times M} Independent components matrix, with each of the M columns representing a sample with C dimensions

for p in 1 to C:

\mathbf{w_p} \leftarrow Random vector of length N

while \mathbf{w_p} changes

\mathbf{w_p} \leftarrow \frac{1}{M}\mathbf{X} g(\mathbf{w_p}^T \mathbf{X})^T - \frac{1}{M}g'(\mathbf{w_p}^T\mathbf{X})\mathbf{1_{M}} \mathbf{w_p}

\mathbf{w_p} \leftarrow \mathbf{w_p} - \sum_{j = 1}^{p-1} (\mathbf{w_p}^T\mathbf{w_j})\mathbf{w_j}

\mathbf{w_p} \leftarrow \frac{\mathbf{w_p}}{\|\mathbf{w_p}\|}

output \mathbf{W} \leftarrow \begin{bmatrix} \mathbf{w_1}, \dots, \mathbf{w_C} \end{bmatrix}

output \mathbf{S} \leftarrow \mathbf{W}^T\mathbf{X}
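
The deflation scheme above can be sketched in Python as follows (NumPy assumed; the function and variable names are illustrative, and X is a prewhitened N × M array):

import numpy as np

def fastica_deflation(X, C, max_iter=200, tol=1e-6):
    """Extract C components from prewhitened X, returning W (N x C) and S (C x M)."""
    N, M = X.shape
    W = np.zeros((N, C))

    for p in range(C):
        w = np.random.randn(N)
        w /= np.linalg.norm(w)

        for _ in range(max_iter):
            wx = w @ X
            g = np.tanh(wx)
            g_prime = 1.0 - np.tanh(wx) ** 2

            # Fixed-point update; the 1/M sums play the role of the expectations
            w_new = (X @ g) / M - (g_prime.sum() / M) * w

            # Gram-Schmidt decorrelation against the previously found vectors
            w_new -= W[:, :p] @ (W[:, :p].T @ w_new)
            w_new /= np.linalg.norm(w_new)

            converged = np.abs(np.abs(w_new @ w) - 1.0) < tol
            w = w_new
            if converged:
                break

        W[:, p] = w

    S = W.T @ X
    return W, S

For instance, after prewhitening two mixed recordings one might call W, S = fastica_deflation(X, C=2); the rows of S then estimate the original sources up to scaling and permutation of the components.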


References

{{Reflist}}