Learning vector quantization

In computer science, learning vector quantization (LVQ) is a prototype-based supervised classification algorithm. LVQ is the supervised counterpart of vector quantization systems. LVQ can be understood as a special case of an artificial neural network; more precisely, it applies a winner-take-all Hebbian learning-based approach. It is a precursor to self-organizing maps (SOM) and related to neural gas and the k-nearest neighbor algorithm (k-NN). LVQ was invented by Teuvo Kohonen.{{cite book|author=T. Kohonen|title=Self-Organizing Maps|publisher=Springer|location=Berlin|year=1997}}

Definition

An LVQ system is represented by prototypes W=(w(1),\dots,w(n)) which are defined in the feature space of observed data. In winner-take-all training algorithms one determines, for each data point, the prototype which is closest to the input according to a given distance measure. The position of this so-called winner prototype is then adapted, i.e. the winner is moved closer if it correctly classifies the data point or moved away if it classifies the data point incorrectly.
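As an illustration of the winner-take-all rule, the following minimal Python sketch assigns a new input the label of its nearest prototype under the squared Euclidean distance; the array names and the choice of distance are illustrative assumptions, not part of a fixed specification:

<syntaxhighlight lang="python">
import numpy as np

def classify(x, prototypes, prototype_labels):
    # Winner-take-all: the prototype closest to x determines the predicted class.
    distances = np.sum((prototypes - x) ** 2, axis=1)
    winner = np.argmin(distances)
    return prototype_labels[winner]
</syntaxhighlight>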

An advantage of LVQ is that it creates prototypes that are easy to interpret for experts in the respective application domain.{{citation|author=T. Kohonen|contribution=Learning vector quantization|editor=M.A. Arbib|title=The Handbook of Brain Theory and Neural Networks|pages=537–540|publisher=MIT Press|location=Cambridge, MA|year=1995}}

LVQ systems can be applied to multi-class classification problems in a natural way.

A key issue in LVQ is the choice of an appropriate measure of distance or similarity for training and classification. Recently, techniques have been developed which adapt a parameterized distance measure in the course of training the system, see e.g. (Schneider, Biehl, and Hammer, 2009){{cite journal|author1=P. Schneider |author2=B. Hammer |author3=M. Biehl |title=Adaptive Relevance Matrices in Learning Vector Quantization|journal= Neural Computation|volume=21|issue=10|pages=3532–3561|year=2009|doi=10.1162/neco.2009.10-08-892|pmid=19635012|citeseerx=10.1.1.216.1183|s2cid=17306078}} and references therein.
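One such parameterized measure is a quadratic form, shown here as an illustrative sketch rather than a verbatim reproduction of the cited formulation:

d_\Lambda(x, w) = (x - w)^\top \Lambda (x - w),

where \Lambda is a positive semi-definite matrix whose entries are adapted during training alongside the prototypes, so that directions in feature space that are relevant for classification receive larger weight.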

LVQ has also been applied to the classification of text documents.{{Citation needed|date=December 2019|reason=removed citation to predatory publisher content}}

Algorithm

The algorithms below are presented as in Kohonen (2001).{{Citation |last=Kohonen |first=Teuvo |title=Learning Vector Quantization |date=2001 |work=Self-Organizing Maps |volume=30 |pages=245–261 |url=http://link.springer.com/10.1007/978-3-642-56927-2_6 |place=Berlin, Heidelberg |publisher=Springer Berlin Heidelberg |doi=10.1007/978-3-642-56927-2_6 |isbn=978-3-540-67921-9}}

Set up:

  • Let the data be denoted by x_i \in \R^D, and their corresponding labels by y_i \in \{1, 2, \dots, C\}.
  • The complete dataset is \{(x_i, y_i)\}_{i=1}^N.
  • The set of code vectors is w_j \in \R^D.
  • The learning rate at iteration step t is denoted by \alpha_t.
  • The window hyperparameter w is used by LVQ2 and LVQ3, and the hyperparameter \epsilon is used by LVQ3. The original paper suggests \epsilon \in [0.1, 0.5] and w \in [0.2, 0.3].

= LVQ1 =

Initialize several code vectors per label. Iterate until a convergence criterion is reached.

  1. Sample a datum x_i and find the code vector w_j such that x_i falls within the Voronoi cell of w_j, i.e. w_j is the closest code vector to x_i.
  2. If the label y_i of x_i is the same as that of w_j, then w_j \leftarrow w_j + \alpha_t(x_i - w_j); otherwise, w_j \leftarrow w_j - \alpha_t(x_i - w_j).
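A minimal Python sketch of LVQ1 training under these update rules follows; the initialization strategy, the fixed learning rate, and the function name are illustrative assumptions (in practice \alpha_t is usually decreased over time):

<syntaxhighlight lang="python">
import numpy as np

def lvq1_train(X, y, prototypes, prototype_labels, alpha=0.1, epochs=10, seed=0):
    """LVQ1: pull the winning prototype toward a correctly classified sample,
    push it away from an incorrectly classified one."""
    rng = np.random.default_rng(seed)
    prototypes = prototypes.astype(float).copy()
    for _ in range(epochs):
        for i in rng.permutation(len(X)):
            x_i, y_i = X[i], y[i]
            # Winner: the code vector whose Voronoi cell contains x_i.
            j = np.argmin(np.sum((prototypes - x_i) ** 2, axis=1))
            if prototype_labels[j] == y_i:
                prototypes[j] += alpha * (x_i - prototypes[j])
            else:
                prototypes[j] -= alpha * (x_i - prototypes[j])
    return prototypes
</syntaxhighlight>

Prototypes are commonly initialized from a few training samples of each class; new inputs are then classified by the nearest-prototype rule described above.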

= LVQ2 =

LVQ2 is the same as LVQ3, except that the update for the case in which w_j, w_k, and x_i all have the same class is omitted: in that case, nothing happens.

= LVQ3 =

[[File:Apollonian_circles.svg|thumb|Apollonian circles: the window condition in step 3 selects exactly those points lying between two such circles (spheres).]]

Initialize several code vectors per label. Iterate until a convergence criterion is reached.

  1. Sample a datum x_i and find the two code vectors w_j, w_k closest to it.
  2. Let d_j := \|x_i - w_j\|, d_k := \|x_i - w_k\|.
  3. If \min \left(\frac{d_j}{d_k}, \frac{d_k}{d_j}\right)>s , where s=\frac{1-w}{1+w}, then
  4. * If w_j and x_i have the same class, and w_k and x_i have different classes, then w_j \leftarrow w_j + \alpha_t(x_i - w_j) and w_k \leftarrow w_k - \alpha_t(x_i - w_k).
  5. * If w_k and x_i have the same class, and w_j and x_i have different classes, then w_j \leftarrow w_j - \alpha_t(x_i - w_j) and w_k \leftarrow w_k + \alpha_t(x_i - w_k).
  6. * If w_j and w_k and x_i all have the same class, then w_j \leftarrow w_j + \epsilon\alpha_t(x_i - w_j) and w_k \leftarrow w_k + \epsilon\alpha_t(x_i - w_k).
  7. * If both w_j and w_k have classes different from that of x_i, the original paper does not specify an update; presumably nothing happens.
  8. Otherwise, skip.

Note that the condition \min \left(\frac{d_j}{d_k}, \frac{d_k}{d_j}\right)>s , where s=\frac{1-w}{1+w}, means precisely that the point x_i falls between two Apollonian spheres determined by w_j, w_k, and the window width w.
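A hedged Python sketch of a single LVQ3 update implementing the rules above follows; the function name, the default hyperparameter values, and the do-nothing handling of the unspecified both-incorrect case are assumptions:

<syntaxhighlight lang="python">
import numpy as np

def lvq3_step(x_i, y_i, prototypes, prototype_labels, alpha, w=0.25, eps=0.3):
    """Apply one LVQ3 update for the sample (x_i, y_i), modifying prototypes in place."""
    d = np.sqrt(np.sum((prototypes - x_i) ** 2, axis=1))
    j, k = np.argsort(d)[:2]                  # the two closest code vectors
    s = (1.0 - w) / (1.0 + w)
    if min(d[j] / d[k], d[k] / d[j]) > s:     # x_i lies inside the window
        same_j = prototype_labels[j] == y_i
        same_k = prototype_labels[k] == y_i
        if same_j and not same_k:
            prototypes[j] += alpha * (x_i - prototypes[j])
            prototypes[k] -= alpha * (x_i - prototypes[k])
        elif same_k and not same_j:
            prototypes[j] -= alpha * (x_i - prototypes[j])
            prototypes[k] += alpha * (x_i - prototypes[k])
        elif same_j and same_k:
            # Both winners share the class of x_i: damped update with factor epsilon.
            prototypes[j] += eps * alpha * (x_i - prototypes[j])
            prototypes[k] += eps * alpha * (x_i - prototypes[k])
        # If neither winner matches the class of x_i, no update is applied.
</syntaxhighlight>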

References

Further reading

  • {{cite journal |last1=Somervuo |first1=Panu |last2=Kohonen |first2=Teuvo |date=1999 |title=Self-organizing maps and learning vector quantization for feature sequences |journal=Neural Processing Letters |volume=10 |issue=2 |pages=151–159 |doi=10.1023/A:1018741720065}}
  • {{Cite journal |last=Nova |first=David |last2=Estévez |first2=Pablo A. |date=2014-09-01 |title=A review of learning vector quantization classifiers |url=https://link.springer.com/article/10.1007/s00521-013-1535-3 |journal=Neural Computing and Applications |language=en |volume=25 |issue=3 |pages=511–524 |doi=10.1007/s00521-013-1535-3 |issn=1433-3058}}