Semidefinite embedding

Maximum Variance Unfolding (MVU), also known as Semidefinite Embedding (SDE), is an algorithm in computer science that uses semidefinite programming to perform non-linear dimensionality reduction of high-dimensional vectorial input data.{{Harvnb|Weinberger, Sha and Saul|2004a}}{{Harvnb|Weinberger and Saul|2004b}}{{Harvnb|Weinberger and Saul|2006}}

It is motivated by the observation that kernel Principal Component Analysis (kPCA) does not reduce the data dimensionality,{{Harvnb|Lawrence|2012|loc=page 1612}} as it leverages the Kernel trick to non-linearly map the original data into an inner-product space.

Algorithm

MVU creates a mapping from the high-dimensional input vectors to some low-dimensional Euclidean vector space in the following steps:{{Harvnb|Weinberger, Sha and Saul|2004a|loc=page 7.}}

  1. A neighbourhood graph is created. Each input is connected with its k-nearest input vectors (according to the Euclidean distance metric) and all k-nearest neighbours are connected with each other. If the data is sampled well enough, the resulting graph is a discrete approximation of the underlying manifold. (A sketch of this step is given after the list.)
  2. The neighbourhood graph is "unfolded" with the help of semidefinite programming. Instead of learning the output vectors directly, the semidefinite program aims to find an inner-product matrix that maximizes the pairwise distances between any two inputs that are not connected in the neighbourhood graph while preserving the nearest-neighbour distances.
  3. The low-dimensional embedding is finally obtained by application of multidimensional scaling on the learned inner-product matrix.
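For illustration, step 1 might be implemented as in the following sketch, which assumes NumPy and scikit-learn are available; the function name neighbourhood_graph and its parameters are illustrative and not part of the original formulation.

<syntaxhighlight lang="python">
import numpy as np
from sklearn.neighbors import NearestNeighbors

def neighbourhood_graph(X, k):
    """Boolean adjacency matrix eta of the MVU neighbourhood graph.

    eta[i, j] is True if inputs i and j are connected, i.e. the distance
    |X_i - X_j| must be preserved by the embedding.
    """
    n = X.shape[0]
    nbrs = NearestNeighbors(n_neighbors=k + 1).fit(X)  # k + 1: each point is its own nearest neighbour
    _, idx = nbrs.kneighbors(X)                        # idx[i] = i followed by its k nearest neighbours
    eta = np.zeros((n, n), dtype=bool)
    for i in range(n):
        neighbours = idx[i, 1:]                        # drop the point itself
        eta[i, neighbours] = True                      # connect i with its k nearest neighbours
        for a in neighbours:                           # connect the k nearest neighbours of i
            for b in neighbours:                       # with each other
                if a != b:
                    eta[a, b] = True
    return eta | eta.T                                 # symmetrise the graph
</syntaxhighlight>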

The steps of applying semidefinite programming followed by a linear dimensionality reduction step to recover a low-dimensional embedding into a Euclidean space were first proposed by Linial, London, and Rabinovich.{{harvnb|Linial, London and Rabinovich|1995}}

Optimization formulation

Let X \,\! be the original input and Y\,\! be the embedding. If i,j\,\! are two neighbors, then the local isometry constraint that needs to be satisfied is:{{Harvnb|Weinberger, Sha and Saul|2004a|loc=page 3, equation 8}}{{Harvnb|Weinberger and Saul|2004b|loc=page 3, equation 2}}{{Harvnb|Weinberger and Saul|2006|loc=page 4, equation 2}}

:|X_{i}-X_{j}|^{2}=|Y_{i}-Y_{j}|^{2}\,\!

Let G, K\,\! be the Gram matrices of X \,\! and Y \,\! (i.e.: G_{ij}=X_i \cdot X_j,K_{ij}=Y_i \cdot Y_j \,\!). We can express the above constraint for every pair of neighbouring points i,j\,\! in terms of G, K\,\!:{{Harvnb|Weinberger, Sha and Saul|2004a|loc=page 3, equation 9}}{{Harvnb|Weinberger and Saul|2004b|loc=page 3, equation 3}}

:G_{ii}+G_{jj}-G_{ij}-G_{ji}=K_{ii}+K_{jj}-K_{ij}-K_{ji}\,\!
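This identity follows from expanding the squared distance; for the input side, for example,

:|X_{i}-X_{j}|^{2}=X_{i} \cdot X_{i}+X_{j} \cdot X_{j}-X_{i} \cdot X_{j}-X_{j} \cdot X_{i}=G_{ii}+G_{jj}-G_{ij}-G_{ji}\,\!

and analogously for the output side in terms of K \,\!.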

In addition, we also want to constrain the embedding Y \,\! to be centered at the origin:{{Harvnb|Weinberger, Sha and Saul|2004a|loc=page 3, equation 6}}{{Harvnb|Weinberger and Saul|2004b|loc=page 3, equation 5}}{{Harvnb|Weinberger and Saul|2006|loc=page 5, equation 8}}

:0 = |\sum_{i}Y_{i}|^2 = (\sum_{i}Y_{i}) \cdot (\sum_{i}Y_{i}) = \sum_{i,j}Y_{i} \cdot Y_{j} = \sum_{i,j}K_{ij}\,\!

As described above, while the distances between neighbouring points are preserved, the algorithm aims to maximize the pairwise distance between every pair of points. The objective function to be maximized is:{{Harvnb|Weinberger, Sha and Saul|2004a|loc=page 4, equation 10}}{{Harvnb|Weinberger and Saul|2004b|loc=page 4, equation 6}}{{Harvnb|Weinberger and Saul|2006|loc=page 5, equation 4}}

:T(Y)=\dfrac{1}{2N}\sum_{i,j}|Y_{i}-Y_{j}|^{2}\,\!

Intuitively, maximizing the function above is equivalent to pulling the points as far away from each other as possible, thereby "unfolding" the manifold. The local isometry constraint prevents the objective function from diverging (going to infinity).{{Harvnb|Weinberger and Saul|2004b|loc=page 4, equation 7}} To see this, let \tau = \max \{\eta_{ij}|Y_{i}-Y_{j}|^2\} \,\! where

:\eta_{ij} := \begin{cases}
1 & \mbox{if}\ i \mbox{ is a neighbour of } j \\
0 & \mbox{otherwise}.
\end{cases}\,\!

Since the neighbourhood graph is connected and has N points, any two points are joined by a path of at most N edges, each of length at most \sqrt{\tau} \,\!, so by the triangle inequality |Y_{i}-Y_{j}|^2 \leq (N\sqrt{\tau})^2 = N^{2}\tau \,\!. We can then bound the objective function as follows:{{Harvnb|Weinberger and Saul|2004b|loc=page 4, equation 8}}{{Harvnb|Weinberger and Saul|2006|loc=page 5, equation 6}}

:T(Y)=\dfrac{1}{2N}\sum_{i,j}|Y_{i}-Y_{j}|^{2} \leq \dfrac{1}{2N}\sum_{i,j}N^{2}\tau = \dfrac{N^{3}\tau}{2} \,\!

The objective function can be rewritten purely in terms of the Gram matrix:{{Harvnb|Weinberger, Sha and Saul|2004a|loc=page 4, equation 11}}{{Harvnb|Weinberger and Saul|2004b|loc=page 4, equation 9}}{{Harvnb|Weinberger and Saul|2006|loc=page 6, equations 10 to 13}}

:
\begin{align}
T(Y) &{}= \dfrac{1}{2N}\sum_{i,j}|Y_{i}-Y_{j}|^{2} \\
&{}= \dfrac{1}{2N}\sum_{i,j}(Y_{i}^2+Y_{j}^2-Y_{i} \cdot Y_{j} - Y_{j} \cdot Y_{i})\\
&{}= \dfrac{1}{2N}\left(\sum_{i,j}Y_{i}^2+\sum_{i,j}Y_{j}^2-\sum_{i,j}Y_{i} \cdot Y_{j} -\sum_{i,j}Y_{j} \cdot Y_{i}\right)\\
&{}= \dfrac{1}{2N}\left(N\sum_{i}Y_{i}^2+N\sum_{j}Y_{j}^2-0 -0\right)\\
&{}= \sum_{i}Y_{i}^2=Tr(K)\\
\end{align}
\,\!

Finally, the optimization can be formulated as:{{Harvnb|Weinberger, Sha and Saul|2004a|loc=page 4, section 3.3}}{{Harvnb|Weinberger and Saul|2004b|loc=page 4, equation 9}}{{Harvnb|Weinberger and Saul|2006|loc=page 6, equations 10 to 13}}

\begin{align}

& \text{Maximize} && Tr(\mathbf{K})\\

& \text{subject to} && \mathbf{K} \succeq 0, \sum_{ij}\mathbf{K}_{ij} = 0 \\

& \text{and} && G_{ii}+G_{jj}-G_{ij}-G_{ji}=K_{ii}+K_{jj}-K_{ij}-K_{ji} \quad \forall i, j \mbox{ where } \eta_{ij} = 1.

\end{align}
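As an illustration, the semidefinite program above can be handed to an off-the-shelf solver. The following is a minimal sketch using the CVXPY modelling library with the SCS solver (an assumed tool choice, not one used in the original papers); G is the Gram matrix of the inputs and eta the neighbourhood indicator \eta_{ij} \,\! from above.

<syntaxhighlight lang="python">
import cvxpy as cp

def mvu_gram_matrix(G, eta):
    """Solve the MVU semidefinite program.

    G   -- Gram matrix of the inputs (G[i, j] = X_i . X_j)
    eta -- boolean adjacency matrix of the neighbourhood graph
    """
    n = G.shape[0]
    K = cp.Variable((n, n), PSD=True)       # K must be symmetric positive semidefinite
    constraints = [cp.sum(K) == 0]          # centering constraint: sum_ij K_ij = 0
    for i in range(n):
        for j in range(i + 1, n):
            if eta[i, j]:                   # local isometry on every edge of the graph;
                constraints.append(         # symmetry lets K_ij + K_ji be written as 2 K_ij
                    K[i, i] + K[j, j] - 2 * K[i, j]
                    == G[i, i] + G[j, j] - 2 * G[i, j])
    problem = cp.Problem(cp.Maximize(cp.trace(K)), constraints)
    problem.solve(solver=cp.SCS)            # any SDP-capable solver can be substituted here
    return K.value                          # the learned Gram matrix of the embedding
</syntaxhighlight>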

After the Gram matrix K \,\! is learned by semidefinite programming, the output Y \,\! can be obtained via an eigenvalue decomposition of K \,\!, which is the multidimensional scaling step.

In particular, the Gram matrix can be written as K_{ij}=\sum_{\alpha = 1}^{N}(\lambda_{\alpha } V_{\alpha i} V_{\alpha j}) \,\! where V_{\alpha i} \,\! is the i-th element of the eigenvector V_{\alpha} \,\! corresponding to the eigenvalue \lambda_{\alpha } \,\!.{{Harvnb|Weinberger and Saul|2004b|loc=page 4, equation 10}}{{Harvnb|Weinberger and Saul|2006|loc=page 7, equation 14}}

It follows that the \alpha \,\!-th element of the output Y_i \,\! is \sqrt{\lambda_{\alpha }} V_{\alpha i} \,\!.{{Harvnb|Weinberger and Saul|2004b|loc=page 4, equation 11}}{{Harvnb|Weinberger and Saul|2006|loc=page 7, equation 15}}
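For illustration, this last step might be carried out with a standard symmetric eigenvalue routine; the sketch below assumes NumPy, with the target dimensionality d chosen by the user and all names merely illustrative.

<syntaxhighlight lang="python">
import numpy as np

def embedding_from_gram(K, d):
    """Recover a d-dimensional embedding Y from the learned Gram matrix K."""
    eigenvalues, eigenvectors = np.linalg.eigh(K)       # eigenvalues in ascending order
    order = np.argsort(eigenvalues)[::-1][:d]           # indices of the d largest eigenvalues
    top_values = np.clip(eigenvalues[order], 0, None)   # guard against tiny negative eigenvalues
    top_vectors = eigenvectors[:, order]
    return top_vectors * np.sqrt(top_values)            # row i is the output Y_i
</syntaxhighlight>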


Notes

{{Reflist|2}}

References

  • {{cite journal|author-last=Linial, London and Rabinovich | author-first=Nathan, Eran and Yuri|title=The geometry of graphs and some of its algorithmic applications|journal=Combinatorica|year=1995|volume=15| issue=2|pages=215–245| doi=10.1007/BF01200757| s2cid=5071936|url=https://www.researchgate.net/publication/228057833}}
  • {{cite conference|author-last= Weinberger, Sha and Saul|author-first=Kilian Q., Fei and Lawrence K.|title=Learning a kernel matrix for nonlinear dimensionality reduction|conference=Proceedings of the Twenty First International Conference on Machine Learning (ICML 2004). Banff, Alberta, Canada|date=4 July 2004a|url=https://repository.upenn.edu/cis_papers/2/}}
  • {{cite conference|author-last=Weinberger and Saul|author-first=Kilian Q. and Lawrence K. | title=Unsupervised learning of image manifolds by semidefinite programming|volume=2|conference=2004 IEEE Computer Society Conference on Computer Vision and Pattern Recognition|date=27 June 2004b|url=https://repository.upenn.edu/cis_papers/1/}}
  • {{cite journal|author-last=Weinberger and Saul| author-first=Kilian Q. and Lawrence K.| title=Unsupervised learning of image manifolds by semidefinite programming|volume=70|journal=International Journal of Computer Vision|date=1 May 2006| pages=77–90|doi=10.1007/s11263-005-4939-z| s2cid=291166|url=http://bicmr.pku.edu.cn/~wenzw/bigdata/weinberger-saul-mvu-image-manifolds.pdf}}
  • {{cite journal|last=Lawrence|first=Neil D|title=A unifying probabilistic perspective for spectral dimensionality reduction: insights and new models|year=2012|pages=1612|volume=13|issue=May|journal=Journal of Machine Learning Research|url=http://www.jmlr.org/papers/v13/lawrence12a.html|bibcode=2010arXiv1010.4830L|arxiv=1010.4830}}

Additional material

  • [https://web.archive.org/web/20120313162120/http://www.cse.wustl.edu/~kilian/code/code.html Kilian Q. Weinberger's MVU Matlab code]

Category:Computational statistics

Category:Dimension reduction

Category:Optimization algorithms and methods