Restricted Boltzmann machine
{{Short description|Class of artificial neural network}}
{{Machine learning|Artificial neural network}}
[[File:Restricted Boltzmann machine.svg|thumb|Diagram of a restricted Boltzmann machine with three visible units and four hidden units (no bias units)]]
A restricted Boltzmann machine (RBM) (also called a restricted Sherrington–Kirkpatrick model with external field or restricted stochastic Ising–Lenz–Little model) is a generative stochastic artificial neural network that can learn a probability distribution over its set of inputs.{{citation|title=Solvable Model of a Spin-Glass|number=35|year=1975|author1= Sherrington, David|author2=Kirkpatrick, Scott|journal=Physical Review Letters|volume=35|pages=1792–1796|doi=10.1103/PhysRevLett.35.1792|bibcode=1975PhRvL..35.1792S}}
RBMs were initially proposed under the name Harmonium by Paul Smolensky in 1986,{{cite book
|last=Smolensky
|first=Paul
|editor1-first=David E.
|editor1-last=Rumelhart
|editor2-first=James L.
|editor2-last=McClelland
|title=Parallel Distributed Processing: Explorations in the Microstructure of Cognition, Volume 1: Foundations
|publisher=MIT Press
|year=1986
|pages=[https://archive.org/details/paralleldistribu00rume/page/194 194–281]
|chapter=Chapter 6: Information Processing in Dynamical Systems: Foundations of Harmony Theory
|chapter-url=https://stanford.edu/~jlmcc/papers/PDP/Volume%201/Chap6_PDP86.pdf
|isbn=0-262-68053-X
|title-link=Connectionism
}} and rose to prominence after Geoffrey Hinton and collaborators used fast learning algorithms for them in the mid-2000s. RBMs have found applications in dimensionality reduction,{{Cite journal | last1 = Hinton | first1 = G. E. | last2 = Salakhutdinov | first2 = R. R. | title = Reducing the Dimensionality of Data with Neural Networks | doi = 10.1126/science.1127647 | journal = Science | volume = 313 | issue = 5786 | pages = 504–507 | year = 2006 | pmid = 16873662 | url = http://www.cs.toronto.edu/~hinton/science.pdf | bibcode = 2006Sci...313..504H | s2cid = 1658773 | access-date = 2015-12-02 | archive-date = 2015-12-23 | archive-url = https://web.archive.org/web/20151223152006/http://www.cs.toronto.edu/~hinton/science.pdf | url-status = dead }} classification,{{Cite conference | last1 = Larochelle | first1 = H. | last2 = Bengio | first2 = Y. | doi = 10.1145/1390156.1390224 | title = Classification using discriminative restricted Boltzmann machines | conference = Proceedings of the 25th international conference on Machine learning - ICML '08 | pages = 536 | year = 2008 | isbn = 978-1-60558-205-4 | url = http://machinelearning.org/archive/icml2008/papers/601.pdf}} collaborative filtering,{{Cite conference | doi = 10.1145/1273496.1273596| title = Restricted Boltzmann machines for collaborative filtering| conference = Proceedings of the 24th international conference on Machine learning - ICML '07| pages = 791| year = 2007| last1 = Salakhutdinov | first1 = R. | last2 = Mnih | first2 = A. | last3 = Hinton | first3 = G. | isbn = 978-1-59593-793-3}} feature learning,{{cite conference
|last1 = Coates
|first1 = Adam
|last2 = Lee
|first2 = Honglak
|last3 = Ng
|first3 = Andrew Y.
|title = An analysis of single-layer networks in unsupervised feature learning
|conference = International Conference on Artificial Intelligence and Statistics (AISTATS)
|year = 2011
|url = http://cs.stanford.edu/~acoates/papers/coatesleeng_aistats_2011.pdf
|access-date = 2014-12-19
|archive-date = 2014-12-20
|archive-url = https://web.archive.org/web/20141220030058/http://cs.stanford.edu/~acoates/papers/coatesleeng_aistats_2011.pdf
|url-status = dead
}} topic modelling,Ruslan Salakhutdinov and Geoffrey Hinton (2010). [http://books.nips.cc/papers/files/nips22/NIPS2009_0817.pdf Replicated softmax: an undirected topic model] {{Webarchive|url=https://web.archive.org/web/20120525063031/http://books.nips.cc/papers/files/nips22/NIPS2009_0817.pdf |date=2012-05-25 }}. Neural Information Processing Systems 23. immunology,{{Cite journal |last1=Bravi |first1=Barbara |last2=Di Gioacchino |first2=Andrea |last3=Fernandez-de-Cossio-Diaz |first3=Jorge |last4=Walczak |first4=Aleksandra M |last5=Mora |first5=Thierry |last6=Cocco |first6=Simona |last7=Monasson |first7=Rémi |date=2023-09-08 |editor-last=Bitbol |editor-first=Anne-Florence |editor2-last=Eisen |editor2-first=Michael B |title=A transfer-learning approach to predict antigen immunogenicity and T-cell receptor specificity |journal=eLife |volume=12 |pages=e85126 |doi=10.7554/eLife.85126 |doi-access=free |issn=2050-084X |pmc=10522340 |pmid=37681658}} and even many{{nbh}}body quantum mechanics.{{Cite journal|last1=Carleo|first1=Giuseppe|last2=Troyer|first2=Matthias|date=2017-02-10|title=Solving the quantum many-body problem with artificial neural networks|journal=Science|language=en|volume=355|issue=6325|pages=602–606|doi=10.1126/science.aag2302|pmid=28183973|issn=0036-8075|arxiv=1606.02318|bibcode=2017Sci...355..602C|s2cid=206651104}}
They can be trained in either supervised or unsupervised ways, depending on the task.{{fact|date=February 2024}}
As their name implies, RBMs are a variant of Boltzmann machines, with the restriction that their neurons must form a bipartite graph:
- a pair of nodes from each of the two groups of units (commonly referred to as the "visible" and "hidden" units respectively) may have a symmetric connection between them; and
- there are no connections between nodes within a group.
By contrast, "unrestricted" Boltzmann machines may have connections between hidden units. This restriction allows for more efficient training algorithms than are available for the general class of Boltzmann machines, in particular the gradient-based contrastive divergence algorithm.Miguel Á. Carreira-Perpiñán and Geoffrey Hinton (2005). [http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.221.8829&rep=rep1&type=pdf#page=42 On contrastive divergence learning]. Artificial Intelligence and Statistics.
Restricted Boltzmann machines can also be used in deep learning networks. In particular, deep belief networks can be formed by "stacking" RBMs and optionally fine-tuning the resulting deep network with gradient descent and backpropagation.{{Cite journal
| last1 = Hinton | first1 = G.
| title = Deep belief networks
| doi = 10.4249/scholarpedia.5947
|doi-access= free| journal = Scholarpedia
| volume = 4
| issue = 5
| pages = 5947
| year = 2009
| bibcode = 2009SchpJ...4.5947H}}
== Structure ==
The standard type of RBM has binary-valued (Boolean) hidden and visible units, and consists of a matrix of weights <math>W</math> of size <math>m\times n</math>. Each weight element <math>w_{i,j}</math> of the matrix is associated with the connection between the visible (input) unit <math>v_i</math> and the hidden unit <math>h_j</math>. In addition, there are bias weights (offsets) <math>a_i</math> for <math>v_i</math> and <math>b_j</math> for <math>h_j</math>. Given the weights and biases, the energy of a configuration (pair of Boolean vectors) {{math|(v,h)}} is defined as
:<math>E(v,h) = -\sum_i a_i v_i - \sum_j b_j h_j - \sum_i \sum_j v_i w_{i,j} h_j</math>
or, in matrix notation,
:<math>E(v,h) = -a^{\mathsf{T}} v - b^{\mathsf{T}} h - v^{\mathsf{T}} W h</math>
This energy function is analogous to that of a Hopfield network. As with general Boltzmann machines, the joint probability distribution for the visible and hidden vectors is defined in terms of the energy function as follows,Geoffrey Hinton (2010). [http://www.cs.toronto.edu/~hinton/absps/guideTR.pdf A Practical Guide to Training Restricted Boltzmann Machines]. UTML TR 2010–003, University of Toronto.
:<math>P(v,h) = \frac{1}{Z} e^{-E(v,h)}</math>
where <math>Z</math> is a partition function defined as the sum of <math>e^{-E(v,h)}</math> over all possible configurations, which can be interpreted as a normalizing constant to ensure that the probabilities sum to 1. The marginal probability of a visible vector is the sum of <math>P(v,h)</math> over all possible hidden layer configurations,
:<math>P(v) = \frac{1}{Z} \sum_{\{h\}} e^{-E(v,h)}</math>,
and vice versa. Since the underlying graph structure of the RBM is bipartite (meaning there are no intra-layer connections), the hidden unit activations are mutually independent given the visible unit activations. Conversely, the visible unit activations are mutually independent given the hidden unit activations. That is, for m visible units and n hidden units, the conditional probability of a configuration of the visible units {{mvar|v}}, given a configuration of the hidden units {{mvar|h}}, is
:<math>P(v|h) = \prod_{i=1}^m P(v_i|h)</math>.
Conversely, the conditional probability of {{mvar|h}} given {{mvar|v}} is
:<math>P(h|v) = \prod_{j=1}^n P(h_j|v)</math>.
The individual activation probabilities are given by
:<math>P(h_j=1|v) = \sigma \left(b_j + \sum_{i=1}^m w_{i,j} v_i \right)</math> and <math>P(v_i=1|h) = \sigma \left(a_i + \sum_{j=1}^n w_{i,j} h_j \right)</math>
where <math>\sigma</math> denotes the logistic sigmoid.
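For concreteness, the energy and the conditional activation probabilities above can be evaluated with a few lines of NumPy. This is a minimal sketch; the sizes, random initial weights, and variable names are hypothetical, chosen only to make the example self-contained:

<syntaxhighlight lang="python">
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Hypothetical sizes: m visible units, n hidden units.
m, n = 6, 4
rng = np.random.default_rng(0)
W = rng.normal(scale=0.1, size=(m, n))        # w[i, j] connects v_i and h_j
a = np.zeros(m)                               # visible biases a_i
b = np.zeros(n)                               # hidden biases b_j

v = rng.integers(0, 2, size=m).astype(float)  # a Boolean visible configuration
h = rng.integers(0, 2, size=n).astype(float)  # a Boolean hidden configuration

# Energy in matrix notation: E(v, h) = -a^T v - b^T h - v^T W h
E = -a @ v - b @ h - v @ W @ h

# Conditional activation probabilities, computed for all units at once
p_h = sigmoid(b + v @ W)   # P(h_j = 1 | v) for each j
p_v = sigmoid(a + W @ h)   # P(v_i = 1 | h) for each i
</syntaxhighlight>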
The visible units of a restricted Boltzmann machine can be multinomial, although the hidden units are Bernoulli (i.e., each hidden unit is a binary random variable). In this case, the logistic function for visible units is replaced by the softmax function
:<math>P(v_i^k = 1|h) = \frac{\exp(a_i^k + \Sigma_j W_{ij}^k h_j)}{\Sigma_{k'=1}^K \exp(a_i^{k'} + \Sigma_j W_{ij}^{k'} h_j)}</math>
where {{mvar|K}} is the number of discrete values that the visible units can take. Multinomial visible units are applied in topic modeling and recommender systems.
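A sketch of the softmax activation for a single multinomial visible unit follows, assuming (as an illustrative convention, not a fixed one) that its {{mvar|K}} values are encoded one-hot and its weights to the hidden layer are stored as a K×n array:

<syntaxhighlight lang="python">
import numpy as np

def softmax(x):
    z = np.exp(x - x.max())   # subtract the max for numerical stability
    return z / z.sum()

K, n = 5, 4                               # hypothetical: K discrete values, n hidden units
rng = np.random.default_rng(1)
Wk = rng.normal(scale=0.1, size=(K, n))   # W_{ij}^k for one visible unit i
ak = np.zeros(K)                          # biases a_i^k, one per value k
h = rng.integers(0, 2, size=n).astype(float)

# P(v_i^k = 1 | h) = softmax over k of (a_i^k + sum_j W_{ij}^k h_j)
p_vk = softmax(ak + Wk @ h)               # length-K vector summing to 1
</syntaxhighlight>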
=== Relation to other models ===
Restricted Boltzmann machines are a special case of Boltzmann machines and Markov random fields.{{cite journal | first1 = Ilya | last1 = Sutskever | first2 = Tijmen | last2 = Tieleman | year = 2010 | title = On the convergence properties of contrastive divergence | journal = Proc. 13th Int'l Conf. On AI and Statistics (AISTATS) | url = http://machinelearning.wustl.edu/mlpapers/paper_files/AISTATS2010_SutskeverT10.pdf | url-status = dead | archive-url = https://web.archive.org/web/20150610230811/http://machinelearning.wustl.edu/mlpapers/paper_files/AISTATS2010_SutskeverT10.pdf | archive-date = 2015-06-10 }}Asja Fischer and Christian Igel. [http://image.diku.dk/igel/paper/TRBMAI.pdf Training Restricted Boltzmann Machines: An Introduction] {{Webarchive|url=https://web.archive.org/web/20150610230447/http://image.diku.dk/igel/paper/TRBMAI.pdf# |date=2015-06-10 }}. Pattern Recognition 47, pp. 25-39, 2014
The graphical model of RBMs corresponds to that of factor analysis.{{cite journal |author1=María Angélica Cueto |author2=Jason Morton |author3=Bernd Sturmfels |year=2010 |title=Geometry of the restricted Boltzmann machine |journal=Algebraic Methods in Statistics and Probability |volume=516 |publisher=American Mathematical Society |arxiv=0908.4425 |bibcode=2009arXiv0908.4425A }}
== Training algorithm ==
{{Anchor|Contrastive divergence}}
Restricted Boltzmann machines are trained to maximize the product of probabilities assigned to some training set <math>V</math> (a matrix, each row of which is treated as a visible vector <math>v</math>),
:<math>\arg\max_W \prod_{v \in V} P(v)</math>
or equivalently, to maximize the expected log probability of a training sample selected randomly from <math>V</math>:
:<math>\arg\max_W \mathbb{E} \left[\log P(v)\right]</math>
The algorithm most often used to train RBMs, that is, to optimize the weight matrix <math>W</math>, is the contrastive divergence (CD) algorithm due to Hinton, originally developed to train PoE (product of experts) models.Geoffrey Hinton (1999). [http://www.gatsby.ucl.ac.uk/publications/papers/06-1999.pdf Products of Experts]. ICANN 1999.{{Cite journal | last1 = Hinton | first1 = G. E. | title = Training Products of Experts by Minimizing Contrastive Divergence | doi = 10.1162/089976602760128018 | journal = Neural Computation | volume = 14 | issue = 8 | pages = 1771–1800 | year = 2002 | pmid = 12180402| s2cid = 207596505 | url = http://www.cs.toronto.edu/~hinton/absps/tr00-004.pdf}}
The algorithm performs Gibbs sampling and is used inside a gradient descent procedure (similar to the way backpropagation is used inside such a procedure when training feedforward neural nets) to compute the weight updates.
The basic, single-step contrastive divergence (CD-1) procedure for a single sample can be summarized as follows (a NumPy sketch of these steps appears after the list):
- Take a training sample {{mvar|v}}, compute the probabilities of the hidden units and sample a hidden activation vector {{mvar|h}} from this probability distribution.
- Compute the outer product of {{mvar|v}} and {{mvar|h}} and call this the positive gradient.
- From {{mvar|h}}, sample a reconstruction {{mvar|v'}} of the visible units, then resample the hidden activations {{mvar|h'}} from this. (Gibbs sampling step)
- Compute the outer product of {{mvar|v'}} and {{mvar|h'}} and call this the negative gradient.
- Let the update to the weight matrix be the positive gradient minus the negative gradient, times some learning rate: <math>\Delta W = \epsilon (v h^\mathsf{T} - v' h'^\mathsf{T})</math>.
- Update the biases {{mvar|a}} and {{mvar|b}} analogously: <math>\Delta a = \epsilon (v - v')</math>, <math>\Delta b = \epsilon (h - h')</math>.
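The five steps above can be transcribed almost line for line into NumPy. This is a minimal sketch of CD-1 for one binary sample; the function name <code>cd1_update</code> and its defaults are illustrative choices, not a published API:

<syntaxhighlight lang="python">
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd1_update(W, a, b, v, lr=0.1, rng=None):
    """One CD-1 step for a single binary training sample v (illustrative sketch)."""
    rng = rng if rng is not None else np.random.default_rng()
    # 1. Hidden probabilities P(h_j = 1 | v), then sample a hidden activation vector h.
    p_h = sigmoid(b + v @ W)
    h = (rng.random(p_h.shape) < p_h).astype(float)
    # 2. Positive gradient: outer product v h^T.
    pos_grad = np.outer(v, h)
    # 3. Gibbs step: reconstruct v' from h, then resample hidden activations h'.
    p_v = sigmoid(a + W @ h)
    v_prime = (rng.random(p_v.shape) < p_v).astype(float)
    p_h_prime = sigmoid(b + v_prime @ W)
    h_prime = (rng.random(p_h_prime.shape) < p_h_prime).astype(float)
    # 4. Negative gradient: outer product v' h'^T.
    neg_grad = np.outer(v_prime, h_prime)
    # 5. Updates: delta W = lr (v h^T - v' h'^T), delta a = lr (v - v'), delta b = lr (h - h').
    W += lr * (pos_grad - neg_grad)
    a += lr * (v - v_prime)
    b += lr * (h - h_prime)
    return W, a, b
</syntaxhighlight>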
''A Practical Guide to Training RBMs'', written by Hinton, can be found on his homepage.
== Stacked Restricted Boltzmann Machine ==
{{Technical|section|date=August 2023}}
{{More citations needed section|date=August 2023}}
{{See also|Deep belief network}}
- A stacked restricted Boltzmann machine differs from a single RBM in its architecture: in an RBM, lateral connections within a layer are prohibited to make analysis tractable, whereas the stacked Boltzmann combines an unsupervised three-layer network with symmetric weights and a supervised, fine-tuned top layer for recognizing three classes.
- Stacked Boltzmann machines are used for natural-language understanding, document retrieval, image generation, and classification. These functions are trained with unsupervised pre-training and/or supervised fine-tuning. Unlike the RBM's undirected, symmetric top layer, the stacked Boltzmann's top layer is a two-way, asymmetric connection layer; its connections span three layers with asymmetric weights, and two networks are combined into one.
- The stacked Boltzmann does share similarities with the RBM: its neuron is a stochastic binary Hopfield neuron, the same as in the restricted Boltzmann machine. The energy of both the stacked Boltzmann and the RBM is given by Gibbs' probability measure: <math>E = -\frac12\sum_{i,j}w_{ij}s_i s_j + \sum_i \theta_i s_i</math>. The training process of the stacked Boltzmann is similar to that of the RBM, but it trains one layer at a time and approximates the equilibrium state with a 3-segment pass rather than performing back-propagation. It uses both supervised and unsupervised training on different RBMs for pre-training for classification and recognition, and the training uses contrastive divergence with Gibbs sampling: <math>\Delta w_{ij} = \epsilon (p_{ij} - p'_{ij})</math>. A layer-wise pre-training sketch follows this list.
- The stacked Boltzmann's strength is that it performs a non-linear transformation, so it is easy to expand and can give a hierarchical layer of features. Its weakness is the complicated calculations required for integer- and real-valued neurons. It does not follow the gradient of any function, so the approximation of contrastive divergence to maximum likelihood is improvised.
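As a rough illustration of the layer-at-a-time training mentioned above, the following sketch greedily pre-trains a stack of RBMs, feeding each layer's hidden activation probabilities to the next layer as its input. It reuses the hypothetical <code>cd1_update</code> and <code>sigmoid</code> from the earlier sketch; <code>pretrain_stack</code>, its parameters, and the data layout are illustrative assumptions, not a published algorithm's API:

<syntaxhighlight lang="python">
import numpy as np

# Reuses sigmoid() and cd1_update() from the CD-1 sketch above.

def pretrain_stack(data, hidden_sizes, epochs=10, lr=0.1, rng=None):
    """Greedy layer-wise pre-training of stacked RBMs (illustrative sketch).

    data: array of shape (num_samples, num_visible) with binary rows.
    hidden_sizes: list of hidden-layer widths, one per RBM in the stack.
    """
    rng = rng if rng is not None else np.random.default_rng()
    layers, X = [], data.astype(float)
    for n_hidden in hidden_sizes:
        m = X.shape[1]
        W = rng.normal(scale=0.01, size=(m, n_hidden))
        a, b = np.zeros(m), np.zeros(n_hidden)
        for _ in range(epochs):
            for v in X:                    # one CD-1 step per training sample
                W, a, b = cd1_update(W, a, b, v, lr=lr, rng=rng)
        layers.append((W, a, b))
        X = sigmoid(b + X @ W)             # hidden probabilities feed the next layer
    return layers
</syntaxhighlight>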
== Literature ==
- {{Citation|last1=Fischer|first1=Asja|title=An Introduction to Restricted Boltzmann Machines|date=2012|work=Progress in Pattern Recognition, Image Analysis, Computer Vision, and Applications|pages=14–36|place=Berlin, Heidelberg|publisher=Springer Berlin Heidelberg|last2=Igel|first2=Christian|series=Lecture Notes in Computer Science |volume=7441 |doi=10.1007/978-3-642-33275-3_2 |isbn=978-3-642-33274-6 |doi-access=free}}
== See also ==
== References ==
{{Reflist|1}}
== Bibliography ==
- {{cite web |url=http://blog.echen.me/2011/07/18/introduction-to-restricted-boltzmann-machines/ |title=Introduction to Restricted Boltzmann Machines |first=Edwin |last=Chen |work=Edwin Chen's blog |date=2011-07-18 }}
- {{Cite web |url=https://deeplearning4j.org/restrictedboltzmannmachine.html |title=A Beginner's Tutorial for Restricted Boltzmann Machines |access-date=2018-11-15 |archive-url=https://web.archive.org/web/20170211042953/https://deeplearning4j.org/restrictedboltzmannmachine.html |archive-date=2017-02-11 |url-status=bot: unknown |work=Deeplearning4j Documentation |first1=Chris |last1=Nicholson |first2=Adam |last2=Gibson }}
- {{Cite web |url=http://deeplearning4j.org/understandingRBMs.html |title=Understanding RBMs |access-date=2014-12-29 |archive-url=https://web.archive.org/web/20160920122139/http://deeplearning4j.org/understandingRBMs.html |archive-date=2016-09-20 |url-status=dead |work=Deeplearning4j Documentation |first1=Chris |last1=Nicholson |first2=Adam |last2=Gibson }}
== External links ==
- Python [https://github.com/AmazaspShumik/sklearn-bayes/blob/master/skbayes/decomposition_models/rbm.py implementation] of Bernoulli RBM and [https://github.com/AmazaspShumik/sklearn-bayes/blob/master/ipython_notebooks_tutorials/decomposition_models/rbm_demo.ipynb tutorial]
- [https://github.com/swirepe/SimpleRBM SimpleRBM] is a very small RBM implementation (24 kB) useful for learning how RBMs work.
- Julia implementation of Restricted Boltzmann machines: https://github.com/cossio/RestrictedBoltzmannMachines.jl