Random projection

{{Short description|Technique to reduce dimensionality of points in Euclidean space}}


In mathematics and statistics, random projection is a technique used to reduce the dimensionality of a set of points that lie in Euclidean space. According to theoretical results, random projection preserves distances well, but empirical results are scarce.{{cite conference

| first1 = Ella | last1 = Bingham

| first2 = Heikki | last2 = Mannila

| title = Random projection in dimensionality reduction: Applications to image and text data

| book-title = KDD-2001: Proceedings of the Seventh ACM SIGKDD International Conference on Knowledge Discovery and Data Mining

| pages = 245–250

| publisher = Association for Computing Machinery | date = 2001 | location = New York

| doi = 10.1145/502512.502546 | citeseerx = 10.1.1.24.5135

}} Random projections have been applied to many natural language tasks under the name random indexing.

Dimensionality reduction

Dimensionality reduction, as the name suggests, is the process of reducing the number of random variables under consideration using various mathematical methods from statistics and machine learning. It is often used to ease the problem of managing and manipulating large data sets. Dimensionality reduction techniques generally use linear transformations to determine the intrinsic dimensionality of the manifold and to extract its principal directions. Related techniques for this purpose include principal component analysis, linear discriminant analysis, canonical correlation analysis, the discrete cosine transform, and random projection.

Random projection is a simple and computationally efficient way to reduce the dimensionality of data by trading a controlled amount of error for faster processing times and smaller model sizes. The dimensions and distribution of random projection matrices are controlled so as to approximately preserve the pairwise distances between any two samples of the dataset.

Method

The core idea behind random projection is given in the Johnson-Lindenstrauss lemma,{{cite book

| last1 = Johnson

| first1 = William B.

| author1-link = William B. Johnson (mathematician)

| last2 = Lindenstrauss

| first2 = Joram

| author2-link = Joram Lindenstrauss

| contribution = Extensions of Lipschitz mappings into a Hilbert space

| doi = 10.1090/conm/026/737400

| location = Providence, RI

| mr = 737400

| pages = [https://archive.org/details/conferenceinmode0000conf/page/189 189–206]

| publisher = American Mathematical Society

| series = Contemporary Mathematics

| title = Conference in Modern Analysis and Probability (New Haven, Conn., 1982)

| volume = 26

| year = 1984

| isbn = 9780821850305

| s2cid = 117819162

| url-access = registration

| url = https://archive.org/details/conferenceinmode0000conf/page/189

}} which states that if points in a vector space are of sufficiently high dimension, then they may be projected into a suitable lower-dimensional space in a way that approximately preserves pairwise distances between the points with high probability.
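As an illustration of how the target dimension scales, one explicit form of the lemma (from Dasgupta and Gupta's elementary proof) guarantees that, for $0 < \epsilon < 1$, any $n$ points can be mapped into $k \ge 4 \ln n / (\epsilon^2/2 - \epsilon^3/3)$ dimensions with every pairwise squared distance preserved within a factor of $1 \pm \epsilon$. The Python sketch below only evaluates that bound; the function name `jl_min_dim` is hypothetical, and other statements of the lemma use different constants.

```python
import math


def jl_min_dim(n_points: int, eps: float) -> int:
    """Smallest integer k with k >= 4 ln(n) / (eps^2/2 - eps^3/3).

    This is the explicit bound from Dasgupta and Gupta's proof of the
    Johnson-Lindenstrauss lemma; other versions use different constants.
    """
    if n_points < 1:
        raise ValueError("n_points must be at least 1")
    if not 0.0 < eps < 1.0:
        raise ValueError("eps must lie strictly between 0 and 1")
    denominator = eps ** 2 / 2 - eps ** 3 / 3
    return math.ceil(4 * math.log(n_points) / denominator)


# The bound depends on the number of points, not on the original dimension d.
for n in (1_000, 1_000_000):
    for eps in (0.1, 0.3, 0.5):
        print(f"n={n:>9,}  eps={eps}  k >= {jl_min_dim(n, eps)}")
```

The bound grows only logarithmically in the number of points, but for small $\epsilon$ it can exceed the original dimension $d$, in which case this form of the lemma gives no reduction.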

In random projection, the original $d$-dimensional data is projected to a $k$-dimensional subspace by multiplying it on the left by a random matrix $R \in \R^{k \times d}$. In matrix notation, if $X_{d \times N}$ is the original set of $N$ observations in $d$ dimensions, then $X_{k \times N}^{RP}=R_{k \times d}X_{d \times N}$ is the projection of the data onto a lower, $k$-dimensional subspace. Random projection is computationally simple: forming the random matrix $R$ and projecting the $d \times N$ data matrix $X$ onto $k$ dimensions is of order $O(dkN)$. If the data matrix $X$ is sparse with about $c$ nonzero entries per column, then the complexity of this operation is of order $O(ckN)$.{{Cite web|url =http://www.ime.unicamp.br/~wanderson/Artigos/randon_projection_kdd.pdf |title =Random projection in dimensionality reduction: Applications to image and text data |date = May 6, 2014|website = | last1 = Bingham | first1 = Ella | last2 = Mannila | first2 = Heikki }}
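The following Python sketch, using only the standard library, forms $X^{RP} = RX$ for a small synthetic data set and reports how much pairwise distances change. It assumes Gaussian entries with variance $1/k$, one common scaling under which squared lengths are preserved in expectation; the data, sizes, and helper names are illustrative. The loops in `matmul` perform exactly $k \cdot d \cdot N$ multiply-adds, matching the $O(dkN)$ cost of this step.

```python
import math
import random


def gaussian_matrix(k, d, rng):
    """k x d matrix with i.i.d. N(0, 1/k) entries (an assumed, common scaling)."""
    std = 1.0 / math.sqrt(k)
    return [[rng.gauss(0.0, std) for _ in range(d)] for _ in range(k)]


def matmul(A, B):
    """Plain product of an (m x p) and a (p x q) matrix: m * p * q multiply-adds."""
    p, q = len(B), len(B[0])
    return [[sum(A[i][t] * B[t][j] for t in range(p)) for j in range(q)]
            for i in range(len(A))]


def column(M, j):
    """Column j of a matrix stored as a list of rows."""
    return [row[j] for row in M]


def dist(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


rng = random.Random(0)
d, k, N = 500, 200, 10
# X is d x N: each column is one d-dimensional observation (synthetic data).
X = [[rng.random() for _ in range(N)] for _ in range(d)]
R = gaussian_matrix(k, d, rng)
X_rp = matmul(R, X)  # k x N

ratios = [dist(column(X_rp, a), column(X_rp, b)) / dist(column(X, a), column(X, b))
          for a in range(N) for b in range(a + 1, N)]
print(f"projected/original distance ratios: min {min(ratios):.3f}, max {max(ratios):.3f}")
```

The printed ratios typically stay close to 1, and increasing $k$ tightens them.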

= Orthogonal random projection =

A unit vector can be orthogonally projected onto a uniformly random $k$-dimensional subspace. Let $u$ be the original unit vector, and let $v$ be its projection. By rotational symmetry, the squared norm $\|v\|_2^2$ has the same distribution as the squared norm of a random point, sampled uniformly from the unit sphere, after projection onto its first $k$ coordinates. Such a point can be generated by sampling $x \sim \mathcal N(0, I_{d \times d})$ from the multivariate Gaussian distribution and then normalizing it.

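Therefore, $\|v\|_2^2$ has the same distribution as $\frac{\sum_{i=1}^k x_i^2}{\sum_{i=1}^k x_i^2 + \sum_{i=k+1}^{d} x_i^2}$. The numerator and the second sum in the denominator are independent chi-squared variables with $k$ and $d-k$ degrees of freedom, so by the chi-squared construction of the Beta distribution this ratio has distribution $\operatorname{Beta}(k/2, (d-k)/2)$, with mean $k/d$.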
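The distributional claim can be checked numerically. The sketch below draws $x \sim \mathcal N(0, I_{d \times d})$, forms the ratio above, and compares its sample mean and variance with those of $\operatorname{Beta}(a, b)$ for $a = k/2$, $b = (d-k)/2$, whose variance is $ab/((a+b)^2(a+b+1))$. The sizes, seed, and function name are arbitrary choices for illustration.

```python
import random
import statistics


def projected_squared_norm(d, k, rng):
    """||v||^2 for a uniform random unit vector in R^d projected onto its
    first k coordinates, using a normalized standard Gaussian vector."""
    x = [rng.gauss(0.0, 1.0) for _ in range(d)]
    head = sum(t * t for t in x[:k])   # chi-squared with k degrees of freedom
    tail = sum(t * t for t in x[k:])   # chi-squared with d - k degrees of freedom
    return head / (head + tail)


d, k, trials = 50, 10, 10_000
rng = random.Random(1)
samples = [projected_squared_norm(d, k, rng) for _ in range(trials)]

a, b = k / 2, (d - k) / 2
beta_mean = a / (a + b)                             # equals k / d
beta_var = a * b / ((a + b) ** 2 * (a + b + 1))
emp_mean = statistics.fmean(samples)
emp_var = statistics.pvariance(samples)
print(f"mean:     empirical {emp_mean:.4f}, Beta {beta_mean:.4f}")
print(f"variance: empirical {emp_var:.5f}, Beta {beta_var:.5f}")
```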

The norm of the projection concentrates around $\sqrt{k/d}$: for any $\epsilon \in (0, 1)$, $\Pr\left[\left|\|v\|_2-\sqrt{\frac{k}{d}}\right| \geq \epsilon \sqrt{\frac{k}{d}}\right] \leq 3 \exp \left(-k \epsilon^2 / 64\right)$.{{Citation |last=Mahoney |first=Michael W. |title=Lecture Notes on Randomized Linear Algebra |date=2016-08-16 |arxiv=1608.04481 }}{{Pg|page=50}}

=Gaussian random projection=

The random matrix R can be generated using a Gaussian distribution. The first row is a random unit vector chosen uniformly from $S^{d-1}$. The second row is a random unit vector from the space orthogonal to the first row, the third row is a random unit vector from the space orthogonal to the first two rows, and so on. With R chosen in this way, the following properties are satisfied:

Spherical symmetry: For any orthogonal matrix $A \in O(d)$ , RA and R have the same distribution.
Orthogonality: The rows of R are orthogonal to each other.
Normality: The rows of R are unit-length vectors.
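A minimal Python sketch of the row-by-row construction, assuming each uniform unit vector is obtained by taking a standard Gaussian vector, removing its components along the earlier rows (Gram-Schmidt orthogonalization), and normalizing. The function name is hypothetical. The final check confirms the orthogonality and normality properties, that is, $RR^\mathsf{T} = I_k$ up to rounding error.

```python
import math
import random


def random_orthonormal_rows(k, d, rng):
    """k x d matrix whose rows are orthonormal, built one row at a time.

    Each candidate row starts as a standard Gaussian vector; its components
    along the rows chosen so far are removed (Gram-Schmidt) and the result is
    normalized, giving a uniform unit vector orthogonal to the earlier rows.
    """
    if not 1 <= k <= d:
        raise ValueError("need 1 <= k <= d")
    rows = []
    while len(rows) < k:
        g = [rng.gauss(0.0, 1.0) for _ in range(d)]
        for r in rows:
            c = sum(gi * ri for gi, ri in zip(g, r))
            g = [gi - c * ri for gi, ri in zip(g, r)]
        norm = math.sqrt(sum(gi * gi for gi in g))
        if norm > 1e-12:  # retry on a (probability-zero) degenerate draw
            rows.append([gi / norm for gi in g])
    return rows


rng = random.Random(2)
k, d = 5, 40
R = random_orthonormal_rows(k, d, rng)

# Orthogonality and normality: R R^T should be the k x k identity.
gram = [[sum(a * b for a, b in zip(R[i], R[j])) for j in range(k)] for i in range(k)]
max_err = max(abs(gram[i][j] - (1.0 if i == j else 0.0))
              for i in range(k) for j in range(k))
print(f"largest deviation of R R^T from the identity: {max_err:.1e}")
```

Because the rows are unit vectors, $\|Ru\|_2^2$ has mean $k/d$ for a unit vector $u$, as in the orthogonal projection analysis; one option when lengths should be preserved on average is to rescale the output by $\sqrt{d/k}$.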

=More computationally efficient random projections=

Achlioptas{{cite book|doi=10.1145/375551.375608|chapter=Database-friendly random projections|title=Proceedings of the twentieth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems - PODS '01|pages=274–281|year=2001|last1=Achlioptas|first1=Dimitris|isbn=978-1581133615|citeseerx=10.1.1.28.6652|s2cid=2640788}} has shown that the random matrix can be sampled more efficiently. Either the full matrix can be sampled IID according to

: $R_{i,j} = \sqrt{3/k} \times \begin{cases}
+1 & \text{with probability }\frac{1}{6}\\
0 & \text{with probability }\frac{2}{3}\\
-1 & \text{with probability }\frac{1}{6} \end{cases}$

or the full matrix can be sampled IID according to $R_{i,j} = \sqrt{1/k} \times \begin{cases}
+1 & \text{with probability }\frac{1}{2}\\
-1 & \text{with probability }\frac{1}{2} \end{cases}$. Both are efficient for database applications because the computations can be performed using integer arithmetic. Very sparse variants, in which the probability of a nonzero entry is much smaller than 1/3, were studied later.{{cite book

| title = Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining | chapter = Very sparse random projections | doi = 10.1145/1150402.1150436

| issue = 1

| pages = 287–296

| year = 2006| isbn = 1595933395 | s2cid = 7995734 }}
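The sketch below samples either Achlioptas distribution as an integer matrix with entries in $\{-1, 0, +1\}$ plus a single scale factor ($\sqrt{3/k}$ or $\sqrt{1/k}$), so that the projection itself needs only additions and subtractions, with one multiplication per output coordinate at the end. The helper names, sizes, and integer test vector are illustrative assumptions, not part of the cited work.

```python
import math
import random


def achlioptas_matrix(k, d, rng, sparse=True):
    """Integer k x d matrix plus a scale factor, following Achlioptas (2001).

    sparse=True:  entries +1, 0, -1 with probabilities 1/6, 2/3, 1/6; scale sqrt(3/k)
    sparse=False: entries +1, -1 with probability 1/2 each; scale sqrt(1/k)
    In both cases scale * entry has mean 0 and variance 1/k.
    """
    rows = []
    for _ in range(k):
        row = []
        for _ in range(d):
            u = rng.random()
            if sparse:
                row.append(1 if u < 1 / 6 else (-1 if u < 1 / 3 else 0))
            else:
                row.append(1 if u < 0.5 else -1)
        rows.append(row)
    scale = math.sqrt(3 / k) if sparse else math.sqrt(1 / k)
    return rows, scale


def project(int_rows, scale, x):
    """Apply scale * M to x using only additions and subtractions inside
    each row; the single multiplication by scale happens at the end."""
    out = []
    for row in int_rows:
        acc = 0
        for entry, xi in zip(row, x):
            if entry == 1:
                acc += xi
            elif entry == -1:
                acc -= xi
            # entry == 0: no arithmetic at all
        out.append(scale * acc)
    return out


rng = random.Random(3)
d, k = 1000, 300
M, scale = achlioptas_matrix(k, d, rng, sparse=True)
x = [rng.randint(-5, 5) for _ in range(d)]   # integer-valued input vector
y = project(M, scale, x)

norm_ratio = math.sqrt(sum(v * v for v in y) / sum(v * v for v in x))
zero_fraction = sum(row.count(0) for row in M) / (k * d)
print(f"norm ratio {norm_ratio:.3f}, fraction of zero entries {zero_fraction:.3f}")
```

With the sparse distribution about two thirds of the entries are zero and contribute no arithmetic, which is the source of its additional speed.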

Later work on the sparse Johnson-Lindenstrauss transform showed how to retain integer arithmetic while making the distribution even sparser, with very few nonzero entries per column.{{cite journal

| first1 = Daniel M. | last1 = Kane | last2 = Nelson | first2 = Jelani

| doi = 10.1145/2559902

| issue = 1

| pages = 1–23 | journal = Journal of the ACM

| title = Sparser Johnson-Lindenstrauss Transforms

| volume = 61

| year = 2014

| mr = 3167920| arxiv = 1012.1577| s2cid = 7821848 }}

This is advantageous because a sparse embedding matrix allows the data to be projected to lower dimension even faster.
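As a simplified illustration of a column-sparse embedding of this kind, the Python sketch below gives every column exactly $s$ nonzero entries, placed in distinct random rows and equal to $\pm 1/\sqrt{s}$ with random signs, and stores the matrix column by column. Projecting a vector then costs $O(s)$ work per nonzero input coordinate. The parameter values and names are arbitrary, and the sketch does not reproduce Kane and Nelson's analysis, their choice of $s$, or their alternative constructions.

```python
import math
import random


def sparse_embedding(k, d, s, rng):
    """Column-sparse k x d matrix: each column has exactly s nonzero entries,
    in distinct random rows, with random signs. Stored column by column as
    lists of (row, sign) pairs; the common factor 1/sqrt(s) is applied later."""
    if not 1 <= s <= k:
        raise ValueError("need 1 <= s <= k")
    return [[(row, rng.choice((-1, 1))) for row in rng.sample(range(k), s)]
            for _ in range(d)]


def apply_embedding(columns, k, s, x):
    """y = S x, doing s signed additions per nonzero coordinate of x."""
    y = [0.0] * k
    for j, xj in enumerate(x):
        if xj == 0:
            continue  # zero coordinates are never touched
        for row, sign in columns[j]:
            y[row] += sign * xj
    scale = 1 / math.sqrt(s)
    return [scale * v for v in y]


rng = random.Random(4)
d, k, s = 5000, 256, 8
S = sparse_embedding(k, d, s, rng)
# A sparse input: roughly 5% of the coordinates are nonzero.
x = [rng.gauss(0.0, 1.0) if rng.random() < 0.05 else 0.0 for _ in range(d)]
y = apply_embedding(S, k, s, x)
norm_ratio = math.sqrt(sum(v * v for v in y) / sum(v * v for v in x))
print(f"norm ratio {norm_ratio:.3f}")
```

Storing each column as a short list of (row, sign) pairs is what makes sparse inputs cheap: the work depends on the number of nonzero coordinates of $x$, not on $d$ or $k$.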

=Random projection with quantization=

The output of a random projection can be further compressed by quantization (discretization), using either 1 bit per coordinate (sign random projection) or multiple bits. Quantized random projection is the building block of SimHash,{{cite book

| first1 = Moses | last1 = Charikar

| title = Proceedings of the thirty-fourth annual ACM symposium on Theory of computing

| chapter = Similarity estimation techniques from rounding algorithms

| doi = 10.1145/509907.509965

| issue = 1

| pages = 380–388

| volume = 1

| year = 2002| isbn = 1581134959

| s2cid = 4229473

}}

RP tree,{{cite journal

| issue = 1

| pages = 473–480 | journal = 20th International Conference on Neural Information Processing Systems

| title = Learning the structure of manifolds using random projections

| volume = 1

| year = 2007}}

and other memory-efficient estimation and learning methods.{{cite book

| first1 = Petros | last1 = Boufounos | first2 = Richard | last2 = Baraniuk

| title = 2008 42nd Annual Conference on Information Sciences and Systems | chapter = 1-Bit compressive sensing | issue = 1

| pages = 16–21

| volume = 1

| year = 2008| doi = 10.1109/CISS.2008.4558487 | isbn = 978-1-4244-2246-3 | s2cid = 206563812 }}

{{cite journal

| first1 = Xiaoyun | last1 = Li | first2 = Ping | last2 = Li

| pages = 15150–15160 | journal = 33rd International Conference on Neural Information Processing Systems

| title = Generalization error analysis of quantized compressive learning

| volume = 1

| year = 2019}}
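For 1-bit quantization with Gaussian projection vectors, two vectors $u$ and $v$ receive different sign bits from a given projection with probability $\theta(u, v)/\pi$, where $\theta$ is the angle between them; this is the property SimHash uses to estimate cosine similarity. The Python sketch below estimates the angle from the fraction of differing bits. The vector sizes, the number of bits, and the helper names are illustrative.

```python
import math
import random


def sign_bits(x, planes):
    """1-bit quantized random projection: one sign bit per Gaussian vector."""
    return [1 if sum(p_i * x_i for p_i, x_i in zip(p, x)) >= 0 else 0 for p in planes]


def angle_from_bits(bits_u, bits_v):
    """Estimate the angle using P[bits differ] = angle(u, v) / pi."""
    hamming = sum(a != b for a, b in zip(bits_u, bits_v))
    return math.pi * hamming / len(bits_u)


def true_angle(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    cos = dot / math.sqrt(sum(a * a for a in u) * sum(b * b for b in v))
    return math.acos(max(-1.0, min(1.0, cos)))  # clamp against rounding


rng = random.Random(5)
d, m = 64, 4096   # m = number of bits in each signature
planes = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(m)]
u = [rng.gauss(0.0, 1.0) for _ in range(d)]
v = [ui + 0.7 * rng.gauss(0.0, 1.0) for ui in u]   # a vector correlated with u

estimate = angle_from_bits(sign_bits(u, planes), sign_bits(v, planes))
print(f"true angle {true_angle(u, v):.3f} rad, estimate from {m} bits {estimate:.3f} rad")
```

More bits give a more accurate estimate; since the number of differing bits is binomial, the standard error shrinks roughly as $1/\sqrt{m}$.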

Large quasiorthogonal [[Basis (linear algebra)|bases]]

The Johnson-Lindenstrauss lemma states that large sets of vectors in a high-dimensional space can be linearly mapped into a space of much lower (but still high) dimension n with approximate preservation of distances. One explanation of this effect is the exponentially high quasiorthogonal dimension of n-dimensional Euclidean space.{{citation

| last1 = Kainen | first1 = Paul C. | author1-link = Paul Chester Kainen

| last2 = Kůrková | first2 = Věra | author2-link = Věra Kůrková

| doi = 10.1016/0893-9659(93)90023-G

| issue = 3

| journal = Applied Mathematics Letters

| mr = 1347278

| pages = 7–10

| title = Quasiorthogonal dimension of Euclidean spaces

| volume = 6

| year = 1993| doi-access = free

}} There are exponentially large (in dimension n) sets of almost orthogonal vectors, that is, vectors with small pairwise inner products, in n-dimensional Euclidean space. This observation is useful for indexing high-dimensional data.{{cite book |last1=Hecht-Nielsen |first1=R. |chapter=Context vectors: General-purpose approximate meaning representations self-organized from raw data |pages=43–56 |editor1-last=Zurada |editor1-first=Jacek M. |editor2-last=Marks |editor2-first=Robert Jackson |editor3-last=Robinson |editor3-first=Charles J. |title=Computational Intelligence: Imitating Life |date=1994 |publisher=IEEE |isbn=978-0-7803-1104-6 }}

Quasiorthogonality of large random sets is important for methods of random approximation in machine learning. In high dimensions, exponentially large numbers of vectors chosen randomly and independently from the uniform distribution on a sphere (and from many other distributions) are almost orthogonal with probability close to one.{{cite journal

| first1 = Alexander N. | last1 = Gorban | author1-link = Aleksandr Gorban

| doi = 10.1016/j.ins.2015.09.021

| pages = 129–145

| journal = Information Sciences

| title = Approximation with Random Bases: Pro et Contra

| volume = 364-365

| year = 2016

| arxiv = 1506.04631| s2cid = 2239376 }} This implies that representing an element of such a high-dimensional space by linear combinations of randomly and independently chosen vectors with bounded coefficients may often require exponentially many such vectors. On the other hand, if coefficients with arbitrarily large values are allowed, the number of randomly generated elements sufficient for approximation is even smaller than the dimension of the data space.
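The near-orthogonality of random vectors is easy to observe numerically. The sketch below samples unit vectors uniformly from the sphere (as normalized Gaussian vectors) and reports the largest absolute inner product among all pairs for several dimensions; each inner product has mean 0 and standard deviation $1/\sqrt{d}$, so the maximum shrinks as $d$ grows. This small experiment does not demonstrate the exponential size of quasiorthogonal sets, and the sample sizes and names are arbitrary.

```python
import math
import random


def random_unit_vector(d, rng):
    """Uniform point on the unit sphere in R^d (a normalized Gaussian vector)."""
    x = [rng.gauss(0.0, 1.0) for _ in range(d)]
    norm = math.sqrt(sum(t * t for t in x))
    return [t / norm for t in x]


def max_abs_inner_product(n, d, rng):
    """Largest |<x_i, x_j>| over all pairs among n random unit vectors in R^d."""
    vecs = [random_unit_vector(d, rng) for _ in range(n)]
    return max(abs(sum(a * b for a, b in zip(vecs[i], vecs[j])))
               for i in range(n) for j in range(i + 1, n))


rng = random.Random(6)
n = 100
results = {d: max_abs_inner_product(n, d, rng) for d in (10, 100, 1000)}
for d, value in results.items():
    print(f"d = {d:>4}: max |inner product| among {n} random unit vectors = {value:.3f}")
```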

Implementations

[https://cran.r-project.org/web/packages/RandPro/index.html RandPro] - An R package for random projection {{cite journal |last1=Ravindran |first1=Siddharth |title=A Data-Independent Reusable Projection (DIRP) Technique for Dimension Reduction in Big Data Classification Using k-Nearest Neighbor (k-NN) |journal=National Academy Science Letters |volume=43 |pages=13–21 |doi=10.1007/s40009-018-0771-6 |year=2020 |s2cid=91946077 }}{{cite journal |last1=Siddharth |first1=R. |last2=Aghila |first2=G. |title=RandPro- A practical implementation of random projection-based feature extraction for high dimensional multivariate data analysis in R |journal=SoftwareX |date=July 2020 |volume=12 |pages=100629 |doi=10.1016/j.softx.2020.100629 |bibcode=2020SoftX..1200629S |doi-access=free }}
[http://scikit-learn.org/stable/modules/random_projection.html sklearn.random_projection] - A module for random projection from the scikit-learn Python library
[http://weka.sourceforge.net/doc.stable/weka/filters/unsupervised/attribute/RandomProjection.html Weka implementation] - A random projection filter in the Weka machine learning software
