Neural differential equation

{{Short description|Equation in machine learning}}

Neural differential equations are a class of models in machine learning that combine neural networks with the mathematical framework of differential equations.{{cite conference |last1=Chen |first1=Ricky T. Q. |last2=Rubanova |first2=Yulia |last3=Bettencourt |first3=Jesse |last4=Duvenaud |first4=David K. |year=2018 |editor1-last=Bengio |editor1-first=S. |editor2-last=Wallach |editor2-first=H. |editor3-last=Larochelle |editor3-first=H. |editor4-last=Grauman |editor4-first=K. |editor5-last=Cesa-Bianchi |editor5-first=N. |editor6-last=Garnett |editor6-first=R. |title=Neural Ordinary Differential Equations |url=https://proceedings.neurips.cc/paper_files/paper/2018/file/69386f6bb1dfed68692a24c8686939b9-Paper.pdf |conference= |publisher=Curran Associates, Inc. |volume=31 |arxiv=1806.07366 |book-title=Advances in Neural Information Processing Systems}} These models provide an alternative approach to neural network design, particularly for systems that evolve over time or through continuous transformations.

The most common type, a neural ordinary differential equation (neural ODE), defines the evolution of a system's state using an ordinary differential equation whose dynamics are governed by a neural network:

\frac{\mathrm{d} \mathbf{h}(t)}{\mathrm{d} t}=f_\theta(\mathbf{h}(t), t).

In this formulation, the neural network parameters θ determine how the state changes at each point in time. This approach contrasts with conventional neural networks, where information flows through discrete layers indexed by natural numbers. Neural ODEs instead use continuous layers indexed by non-negative real numbers, where the function \mathbf{h}: \mathbb{R}_{\ge 0} \to \mathbb{R}^{n} represents the network's state at any given layer depth t.
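The forward pass of such a model amounts to numerically integrating the equation above. The following minimal sketch illustrates this, using an off-the-shelf ODE solver; a single tanh layer with illustrative placeholder parameters W and b stands in for the neural network f_θ:

import numpy as np
from scipy.integrate import solve_ivp

# Hypothetical toy parameters theta: a single tanh layer stands in
# for the neural network f_theta.
rng = np.random.default_rng(0)
dim = 4
W = rng.normal(scale=0.5, size=(dim, dim))
b = rng.normal(scale=0.1, size=dim)

def f_theta(t, h):
    # Right-hand side of the neural ODE: dh/dt = f_theta(h(t), t).
    return np.tanh(W @ h + b)

h_in = rng.normal(size=dim)
sol = solve_ivp(f_theta, (0.0, 1.0), h_in)  # integrate from t = 0 to t = T = 1
h_out = sol.y[:, -1]                        # the state h(T)

In practice the parameters θ are trained by differentiating through the solver, for example with the adjoint sensitivity method described by Chen et al.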

Neural ODEs can be understood as continuous-time control systems, where their ability to interpolate data can be interpreted in terms of controllability.{{Cite journal |last1=Ruiz-Balet |first1=Domènec |last2=Zuazua |first2=Enrique |date=2023 |title=Neural ODE Control for Classification, Approximation, and Transport |url=https://epubs.siam.org/doi/10.1137/21M1411433 |journal=SIAM Review |language=en |volume=65 |issue=3 |pages=735–773 |arxiv=2104.05278 |doi=10.1137/21M1411433 |issn=0036-1445}} They have found applications in time series analysis, generative modeling, and the study of complex dynamical systems.

Connection with residual neural networks

Neural ODEs can be interpreted as a residual neural network with a continuum of layers rather than a discrete number of layers.{{cite conference |last1=Chen |first1=Ricky T. Q. |last2=Rubanova |first2=Yulia |last3=Bettencourt |first3=Jesse |last4=Duvenaud |first4=David K. |year=2018 |editor1-last=Bengio |editor1-first=S. |editor2-last=Wallach |editor2-first=H. |editor3-last=Larochelle |editor3-first=H. |editor4-last=Grauman |editor4-first=K. |editor5-last=Cesa-Bianchi |editor5-first=N. |editor6-last=Garnett |editor6-first=R. |title=Neural Ordinary Differential Equations |url=https://proceedings.neurips.cc/paper_files/paper/2018/file/69386f6bb1dfed68692a24c8686939b9-Paper.pdf |conference= |publisher=Curran Associates, Inc. |volume=31 |arxiv=1806.07366 |book-title=Advances in Neural Information Processing Systems}} Applying the Euler method with a unit time step to a neural ODE yields the forward propagation equation of a residual neural network:

\mathbf{h}_{\ell+1} = f_{\theta}(\mathbf{h}_{\ell}, \ell) + \mathbf{h}_{\ell},

where \mathbf{h}_{\ell} denotes the state at the ℓ-th layer of this residual neural network. While the forward propagation of a residual neural network is done by applying a sequence of transformations starting at the input layer, the forward propagation computation of a neural ODE is done by solving a differential equation. More precisely, the output \mathbf{h}_{\text{out}} associated with the input \mathbf{h}_{\text{in}} of the neural ODE is obtained by solving the initial value problem

\frac{\mathrm{d} \mathbf{h}(t)}{\mathrm{d} t}=f_\theta(\mathbf{h}(t), t), \quad \mathbf{h}(0)=\mathbf{h}_{\text{in}},

on the interval [0, T] and assigning the value \mathbf{h}(T) to \mathbf{h}_{\text{out}}, where T denotes the network's final depth.
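The correspondence can be made concrete in a short sketch. Here f is an arbitrary stand-in for the learned transformation f_θ; iterating the residual update is exactly the explicit Euler method with unit time step applied to the neural ODE:

import numpy as np

def f(h, l):
    # Stand-in for f_theta(h_l, l); in a real residual network this
    # would be a learned block (e.g. convolutional or dense layers).
    return 0.1 * np.tanh(h)

def resnet_forward(h_in, num_layers=10):
    # Forward propagation of a residual network, identical to the
    # explicit Euler method with unit step on dh/dt = f_theta(h, t):
    # h_{l+1} = h_l + f_theta(h_l, l).
    h = np.asarray(h_in, dtype=float)
    for l in range(num_layers):
        h = h + f(h, l)
    return h

h_out = resnet_forward(np.array([0.5, -0.2]))

Replacing the unit step with a smaller step size, or with an adaptive solver, recovers the continuous-depth limit.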

Universal differential equations

{{About|combinations of common and neural ordinary differential equations|differential algebraic equations which can approximate any continuous function|Universal differential equation|section=yes}}

In physics-informed contexts where additional information is known, neural ODEs can be combined with an existing first-principles model to build a physics-informed neural network model called a universal differential equation (UDE).{{cite arXiv |eprint=2001.04385 |class= cs.LG|author1=Christopher Rackauckas |author2=Yingbo Ma |title=Universal Differential Equations for Scientific Machine Learning |date=2024 |author3=Julius Martensen |author4=Collin Warner |author5=Kirill Zubov |author6=Rohit Supekar |author7=Dominic Skinner |author8=Ali Ramadhan |author9=Alan Edelman}}{{Cite journal |last1=Xiao |first1=Tianbai |last2=Frank |first2=Martin |date=2023 |title=RelaxNet: A structure-preserving neural network to approximate the Boltzmann collision operator |url=https://linkinghub.elsevier.com/retrieve/pii/S0021999123004126 |journal=Journal of Computational Physics |language=en |volume=490 |pages=112317 |doi=10.1016/j.jcp.2023.112317|arxiv=2211.08149 |bibcode=2023JCoPh.49012317X }}{{Citation |last1=Silvestri |first1=Mattia |title=An Analysis of Universal Differential Equations for Data-Driven Discovery of Ordinary Differential Equations |date=2023 |work=Computational Science – ICCS 2023 |volume=10476 |pages=353–366 |editor-last=Mikyška |editor-first=Jiří |url=https://link.springer.com/10.1007/978-3-031-36027-5_27 |access-date=2024-08-18 |place=Cham |publisher=Springer Nature Switzerland |language=en |doi=10.1007/978-3-031-36027-5_27 |isbn=978-3-031-36026-8 |last2=Baldo |first2=Federico |last3=Misino |first3=Eleonora |last4=Lombardi |first4=Michele |editor2-last=de Mulatier |editor2-first=Clélia |editor3-last=Paszynski |editor3-first=Maciej |editor4-last=Krzhizhanovskaya |editor4-first=Valeria V.|url-access=subscription }}{{cite arXiv | eprint = 2408.07143 | author1 = Christoph Plate | author2 = Carl Julius Martensen | author3 = Sebastian Sager | title = Optimal Experimental Design for Universal Differential Equations | year = 2024| class = math.OC }} For instance, a UDE version of the Lotka–Volterra model can be written as{{cite thesis |type=PhD |title=On Neural Differential Equations |author=Patrick Kidger |publisher=University of Oxford, Mathematical Institute |date=2021 |degree=Doctor of Philosophy |location=Oxford, United Kingdom |url=https://ora.ox.ac.uk/objects/uuid:af32d844-df84-4fdc-824d-44bebc3d7aa9}}

\begin{align}
\frac{dx}{dt} &= \alpha x - \beta x y + f_{\theta}(x(t),y(t)), \\
\frac{dy}{dt} &= - \gamma y + \delta x y + g_{\theta}(x(t),y(t)),
\end{align}

where f_{\theta} and g_{\theta} are correction terms parametrized by neural networks.
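A minimal sketch of this hybrid model follows, with illustrative coefficient values and tiny single-unit networks standing in for f_θ and g_θ (in a real UDE these would be trained against data):

import numpy as np
from scipy.integrate import solve_ivp

# Hypothetical Lotka–Volterra coefficients.
alpha, beta, gamma, delta = 1.5, 1.0, 3.0, 1.0

# Tiny single-unit "networks" standing in for f_theta and g_theta.
rng = np.random.default_rng(0)
wf = rng.normal(scale=0.1, size=2)
wg = rng.normal(scale=0.1, size=2)

def f_theta(x, y):
    return np.tanh(wf[0] * x + wf[1] * y)

def g_theta(x, y):
    return np.tanh(wg[0] * x + wg[1] * y)

def ude_rhs(t, state):
    x, y = state
    # Known first-principles terms plus learned correction terms.
    dx = alpha * x - beta * x * y + f_theta(x, y)
    dy = -gamma * y + delta * x * y + g_theta(x, y)
    return [dx, dy]

sol = solve_ivp(ude_rhs, (0.0, 10.0), [1.0, 1.0])

Because the mechanistic terms carry most of the dynamics, the neural corrections only need to capture the residual discrepancy between the first-principles model and observations.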

References

{{Reflist}}
