Recursive neural network
{{short description|Type of neural network which utilizes recursion}}
{{Distinguish|recurrent neural network}}
{{Machine learning|Artificial neural network}}
A '''recursive neural network''' is a kind of deep neural network created by applying the same set of weights recursively over a structured input, traversing a given structure in topological order to produce a structured prediction over variable-size input structures, or a scalar prediction on it. Recursive neural networks were first introduced to learn distributed representations of structure, such as logical terms,{{cite book|doi=10.1109/ICNN.1996.548916 |last1=Goller|first1=C.|title=Proceedings of International Conference on Neural Networks (ICNN'96)|volume=1|pages=347–352|last2=Küchler|first2=A. |year=1996|isbn=978-0-7803-3210-2|citeseerx=10.1.1.52.4759|chapter=Learning task-dependent distributed representations by backpropagation through structure|s2cid=6536466}} but have been successful in multiple applications, for instance in learning sequence and tree structures in natural language processing (mainly continuous representations of phrases and sentences based on word embeddings).
==Architectures==
===Basic===
[[File:Simple recursive neural network.svg|thumb|A simple recursive neural network architecture]]
In the simplest architecture, nodes are combined into parents using a weight matrix (which is shared across the whole network) and a non-linearity such as the hyperbolic tangent. If <math>c_1</math> and <math>c_2</math> are <math>n</math>-dimensional vector representations of nodes, their parent will also be an <math>n</math>-dimensional vector, computed as
: <math>p_{1,2} = \tanh\left(W[c_1;c_2]\right)</math>
where <math>W</math> is a learned <math>n \times 2n</math> weight matrix.
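A minimal NumPy sketch of this composition follows; the dimensionality <math>n = 4</math> and the random values standing in for learned weights and leaf vectors are illustrative assumptions:

<syntaxhighlight lang="python">
import numpy as np

n = 4                                        # illustrative node dimensionality
rng = np.random.default_rng(0)
W = 0.1 * rng.standard_normal((n, 2 * n))    # shared weight matrix of shape n x 2n

def compose(c1, c2):
    """Combine two child vectors into a parent: p = tanh(W [c1; c2])."""
    return np.tanh(W @ np.concatenate([c1, c2]))

# The same weights are applied recursively over a small binary tree ((a, b), c):
a, b, c = (rng.standard_normal(n) for _ in range(3))
p_ab = compose(a, b)        # parent of leaves a and b
root = compose(p_ab, c)     # root combines that parent with leaf c
</syntaxhighlight>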
This architecture, with a few improvements, has been used successfully for parsing natural scenes and the syntactic parsing of natural language sentences,{{cite book|last1=Socher|first1=Richard|last2=Lin|first2=Cliff|last3=Ng|first3=Andrew Y.|last4=Manning|first4=Christopher D.|chapter=Parsing Natural Scenes and Natural Language with Recursive Neural Networks|title=The 28th International Conference on Machine Learning (ICML 2011)|chapter-url=http://nlp.stanford.edu/pubs/SocherLinNgManning_ICML2011.pdf}} and for recursive autoencoding and generative modeling of 3D shape structures in the form of cuboid abstractions.{{cite journal|last1=Li|first1=Jun|last2=Xu|first2=Kai|last3=Chaudhuri|first3=Siddhartha|last4=Yumer|first4=Ersin|last5=Zhang|first5=Hao|last6=Guibas|first6=Leonadis|title=GRASS: Generative Recursive Autoencoders for Shape Structures|journal=ACM Transactions on Graphics|year=2017|url=https://www2.cs.sfu.ca/~haoz/pubs/li_sig17_grass.pdf|volume=36 |issue=4 |page=52 |doi=10.1145/3072959.3073613|arxiv=1705.02090|s2cid=20432407}}
===Recursive cascade correlation (RecCC)===
RecCC is a constructive neural network approach for dealing with tree domains,{{Cite journal|last1=Sperduti|first1=A.|last2=Starita|first2=A.|date=1997-05-01|title=Supervised neural networks for the classification of structures|journal=IEEE Transactions on Neural Networks|volume=8|issue=3|pages=714–735|doi=10.1109/72.572108|pmid=18255672|issn=1045-9227}} with pioneering applications to chemistry{{Cite journal|last1=Bianucci|first1=Anna Maria|last2=Micheli|first2=Alessio|last3=Sperduti|first3=Alessandro|last4=Starita|first4=Antonina|year=2000|title=Application of Cascade Correlation Networks for Structures to Chemistry|journal=Applied Intelligence|language=en|volume=12|issue=1–2|pages=117–147|doi=10.1023/A:1008368105614|s2cid=10031212|issn=0924-669X}} and an extension to directed acyclic graphs.{{Cite journal|last1=Micheli|first1=A.|last2=Sona|first2=D.|last3=Sperduti|first3=A.|date=2004-11-01|title=Contextual processing of structured data by recursive cascade correlation|journal=IEEE Transactions on Neural Networks|volume=15|issue=6|pages=1396–1410|doi=10.1109/TNN.2004.837783|pmid=15565768|issn=1045-9227|citeseerx=10.1.1.135.8772|s2cid=12370239}}
===Unsupervised RNN===
A framework for unsupervised recursive neural networks was introduced in 2004.{{Cite journal|last1=Hammer|first1=Barbara|last2=Micheli|first2=Alessio|last3=Sperduti|first3=Alessandro|last4=Strickert|first4=Marc|year=2004|title=Recursive self-organizing network models|journal=Neural Networks|volume=17|issue=8–9|pages=1061–1085|doi=10.1016/j.neunet.2004.06.009|pmid=15555852|citeseerx=10.1.1.129.6155}}{{Cite journal|last1=Hammer|first1=Barbara|last2=Micheli|first2=Alessio|last3=Sperduti|first3=Alessandro|last4=Strickert|first4=Marc|date=2004-03-01|title=A general framework for unsupervised processing of structured data|journal=Neurocomputing|volume=57|pages=3–35|doi=10.1016/j.neucom.2004.01.008|citeseerx=10.1.1.3.984}}
===Tensor===
Recursive neural tensor networks use a single tensor-based composition function for all nodes in the tree.{{cite book|last1=Socher|first1=Richard|last2=Perelygin|first2=Alex|last3=Y. Wu|first3=Jean|last4=Chuang|first4=Jason|last5=D. Manning|first5=Christopher|last6=Y. Ng|first6=Andrew|last7=Potts|first7=Christopher|chapter=Recursive Deep Models for Semantic Compositionality Over a Sentiment Treebank|title=EMNLP 2013|chapter-url=http://nlp.stanford.edu/~socherr/EMNLP2013_RNTN.pdf}}
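In the cited work the parent vector is computed as <math>p = \tanh\left([c_1;c_2]^\mathsf{T} V [c_1;c_2] + W[c_1;c_2]\right)</math>, where <math>V</math> is a third-order tensor. A minimal NumPy sketch of this composition follows; the dimensions and random placeholder values are illustrative assumptions:

<syntaxhighlight lang="python">
import numpy as np

n = 4
rng = np.random.default_rng(0)
W = 0.1 * rng.standard_normal((n, 2 * n))
V = 0.1 * rng.standard_normal((n, 2 * n, 2 * n))  # one 2n x 2n slice per output unit

def compose_tensor(c1, c2):
    """Tensor composition: output unit k is x^T V[k] x plus the standard W x term."""
    x = np.concatenate([c1, c2])
    bilinear = np.einsum('i,kij,j->k', x, V, x)   # x^T V[k] x for every slice k
    return np.tanh(bilinear + W @ x)
</syntaxhighlight>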
==Training==
===Stochastic gradient descent===
Typically, stochastic gradient descent (SGD) is used to train the network. The gradient is computed using backpropagation through structure (BPTS), a variant of backpropagation through time used for recurrent neural networks.
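As an illustration, the sketch below trains a toy recursive network with PyTorch; its automatic differentiation performs BPTS implicitly by differentiating through the recursive forward pass. The tree encoding, loss, and hyperparameters are arbitrary choices for the example:

<syntaxhighlight lang="python">
import torch

n = 4
torch.manual_seed(0)
W = torch.nn.Parameter(0.1 * torch.randn(n, 2 * n))  # shared composition weights
U = torch.nn.Parameter(0.1 * torch.randn(1, n))      # linear readout at the root

def forward(tree, leaves):
    """Recursively evaluate a tree given as nested tuples of leaf indices."""
    if isinstance(tree, int):
        return leaves[tree]                           # leaf: look up its vector
    left, right = tree
    children = torch.cat([forward(left, leaves), forward(right, leaves)])
    return torch.tanh(W @ children)                   # same weights at every node

leaves = 0.1 * torch.randn(3, n)
tree, target = ((0, 1), 2), torch.tensor([1.0])      # toy structure and label

opt = torch.optim.SGD([W, U], lr=0.01)
for _ in range(100):
    opt.zero_grad()
    loss = torch.nn.functional.mse_loss(U @ forward(tree, leaves), target)
    loss.backward()    # gradients flow back through the recursion (BPTS)
    opt.step()
</syntaxhighlight>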
==Properties==
The universal approximation capability of RNNs over trees has been proven in the literature.{{Cite book|url=https://books.google.com/books?id=H3_1BwAAQBAJ&pg=PA1|title=Learning with Recurrent Neural Networks|last=Hammer|first=Barbara|date=2007-10-03|publisher=Springer|isbn=9781846285677|language=en}}{{Cite journal|last1=Hammer|first1=Barbara|last2=Micheli|first2=Alessio|last3=Sperduti|first3=Alessandro|date=2005-05-01|title=Universal Approximation Capability of Cascade Correlation for Structures|journal=Neural Computation|language=en|volume=17|issue=5|pages=1109–1159|doi=10.1162/0899766053491878|citeseerx=10.1.1.138.2224|s2cid=10845957}}
==Related models==
===Recurrent neural networks===
{{Main|Recurrent neural network}}
Recurrent neural networks are recursive artificial neural networks with a certain structure: that of a linear chain. Whereas recursive neural networks operate on any hierarchical structure, combining child representations into parent representations, recurrent neural networks operate on the linear progression of time, combining the previous time step and a hidden representation into the representation for the current time step.
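This correspondence can be made concrete: applying the shared composition of the basic architecture along a chain yields the update rule of a simple recurrent network, <math>h_t = \tanh\left(W[h_{t-1}; x_t]\right)</math>. A schematic sketch, with random placeholder values in place of trained weights:

<syntaxhighlight lang="python">
import numpy as np

n = 4
rng = np.random.default_rng(0)
W = 0.1 * rng.standard_normal((n, 2 * n))    # the same shared composition matrix

# Recursing over a linear chain is exactly a simple recurrent network:
xs = rng.standard_normal((5, n))             # a length-5 input sequence
h = np.zeros(n)                              # initial hidden state
for x in xs:
    h = np.tanh(W @ np.concatenate([h, x]))  # h_t = tanh(W [h_{t-1}; x_t])
</syntaxhighlight>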
===Tree Echo State Networks===
An efficient approach to implementing recursive neural networks is given by the Tree Echo State Network{{Cite journal|last1=Gallicchio|first1=Claudio|last2=Micheli|first2=Alessio|date=2013-02-04|title=Tree Echo State Networks|journal=Neurocomputing|volume=101|pages=319–337|doi=10.1016/j.neucom.2012.08.017|hdl=11568/158480|hdl-access=free}} within the reservoir computing paradigm.
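The idea can be sketched as follows: the recursive weights are fixed and random (an untrained reservoir), and only a linear readout on the root states is fitted, for example by least squares. The spectral-radius scaling below is a crude illustrative stand-in, assumed here, for the stability conditions analysed in the cited paper:

<syntaxhighlight lang="python">
import numpy as np

n = 8
rng = np.random.default_rng(0)
W = rng.standard_normal((n, 2 * n))
W *= 0.9 / np.max(np.abs(np.linalg.eigvals(W[:, :n])))  # crude contractivity scaling (assumption)

def state(tree, leaves):
    """Untrained reservoir: the fixed random W maps a tree to a root state."""
    if isinstance(tree, int):
        return leaves[tree]
    children = [state(t, leaves) for t in tree]
    return np.tanh(W @ np.concatenate(children))

# Only the linear readout is trained, here by least squares on root states:
trees = [((0, 1), 2), (0, (1, 2))]
leaves = rng.standard_normal((3, n))
X = np.stack([state(t, leaves) for t in trees])
y = np.array([1.0, -1.0])                    # toy regression targets
w_out, *_ = np.linalg.lstsq(X, y, rcond=None)
</syntaxhighlight>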
===Extension to graphs===
Extensions to graphs include the graph neural network (GNN),{{Cite journal|last1=Scarselli|first1=F.|last2=Gori|first2=M.|last3=Tsoi|first3=A. C.|last4=Hagenbuchner|first4=M.|last5=Monfardini|first5=G.|date=2009-01-01|title=The Graph Neural Network Model|journal=IEEE Transactions on Neural Networks|volume=20|issue=1|pages=61–80|doi=10.1109/TNN.2008.2005605|pmid=19068426|s2cid=206756462|issn=1045-9227|url=https://repository.hkbu.edu.hk/vprd_ja/1}} Neural Network for Graphs (NN4G),{{Cite journal|last=Micheli|first=A.|date=2009-03-01|title=Neural Network for Graphs: A Contextual Constructive Approach|journal=IEEE Transactions on Neural Networks|volume=20|issue=3|pages=498–511|doi=10.1109/TNN.2008.2010350|pmid=19193509|s2cid=17486263|issn=1045-9227}} and, more recently, convolutional neural networks for graphs.
==References==
{{Reflist}}
[[Category:Neural network architectures]]
[[Category:Artificial neural networks]]
{{compu-AI-stub}}