
{{Short description|Numerical method}}

{{primary sources|date=January 2010}}

The '''adjoint state method''' is a numerical method for efficiently computing the gradient of a function or operator in a numerical optimization problem.<ref>{{Cite journal|last1=Pollini|first1=Nicolò|last2=Lavan|first2=Oren|last3=Amir|first3=Oded|date=2018-06-01|title=Adjoint sensitivity analysis and optimization of hysteretic dynamic systems with nonlinear viscous dampers|journal=Structural and Multidisciplinary Optimization|language=en|volume=57|issue=6|pages=2273–2289|doi=10.1007/s00158-017-1858-2|s2cid=125712091|issn=1615-1488}}</ref> It has applications in geophysics, seismic imaging, photonics and more recently in neural networks.<ref>Ricky T. Q. Chen, Yulia Rubanova, Jesse Bettencourt, David Duvenaud. "Neural Ordinary Differential Equations". [https://arxiv.org/abs/1806.07366 Available online]</ref>

The adjoint state space is chosen to simplify the physical interpretation of equation constraints.<ref>Plessix, R-E. "A review of the adjoint-state method for computing the gradient of a functional with geophysical applications." Geophysical Journal International, 2006, 167(2): 495–503. [https://academic.oup.com/gji/article/167/2/495/559970 Free access on GJI website]</ref>

Adjoint state techniques allow the use of integration by parts, resulting in a form which explicitly contains the physically interesting quantity. An adjoint state equation is introduced, including a new unknown variable.

The adjoint method formulates the gradient of a function with respect to its parameters as a constrained optimization problem. By using the dual form of this constrained optimization problem, the gradient can be computed very efficiently. A useful property is that the number of computations is independent of the number of parameters with respect to which the gradient is sought.

The adjoint method is derived from the dual problem<ref>{{cite journal |last1=McNamara |first1=Antoine |last2=Treuille |first2=Adrien |last3=Popović |first3=Zoran |last4=Stam |first4=Jos |title=Fluid control using the adjoint method |url=https://www.dgp.toronto.edu/public_user/stam/reality/Research/pdf/sig04.pdf |access-date=28 October 2022 |archive-url=https://web.archive.org/web/20220129011505/https://www.dgp.toronto.edu/public_user/stam/reality/Research/pdf/sig04.pdf |archive-date=29 January 2022 |journal=ACM Transactions on Graphics |volume=23 |issue=3 |pages=449–456 |doi=10.1145/1015706.1015744 |date=August 2004 |url-status=live}}</ref> and is used, for example, in the Landweber iteration method.<ref>{{cite web |last1=Lundvall |first1=Johan |title=Data Assimilation in Fluid Dynamics using Adjoint Optimization |url=http://liu.diva-portal.org/smash/get/diva2:24091/FULLTEXT01.pdf |publisher=Linköping University of Technology |access-date=28 October 2022 |archive-url=https://web.archive.org/web/20221009083111/http://liu.diva-portal.org/smash/get/diva2:24091/FULLTEXT01.pdf |archive-date=9 October 2022 |location=Sweden |language=en |date=2007 |url-status=live}}</ref>

The name ''adjoint state method'' refers to the dual form of the problem, where the adjoint matrix <math>A^*=\overline{A}^T</math> is used.

When the initial problem consists of calculating the product <math>s^T x</math>, where <math>x</math> must satisfy <math>Ax=b</math>, the dual problem can be realized as calculating the product <math>r^T b</math> (<math>= s^T x</math>), where <math>r</math> must satisfy <math>A^* r = s</math>. The vector <math>r</math> is called the adjoint state vector.
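A minimal NumPy sketch of this identity (the matrix and vectors are arbitrary illustrative data, not from the cited sources; for a real matrix the adjoint <math>A^*</math> is simply the transpose <math>A^\top</math>):

<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng(0)
n = 5
A = rng.standard_normal((n, n))   # state operator (assumed invertible)
b = rng.standard_normal(n)        # right-hand side of the state equation
s = rng.standard_normal(n)        # defines the quantity of interest s^T x

# Direct approach: solve A x = b, then evaluate s^T x.
x = np.linalg.solve(A, b)
direct = s @ x

# Adjoint approach: solve A^T r = s once, then evaluate r^T b.
# The single adjoint solve can be reused for many right-hand sides b.
r = np.linalg.solve(A.T, s)
adjoint = r @ b

assert np.isclose(direct, adjoint)  # s^T x == r^T b
</syntaxhighlight>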

==General case==

The original adjoint calculation method goes back to Jean Cea,<ref>{{Cite journal|last1=Cea|first1=Jean|date=1986|title=Conception optimale ou identification de formes, calcul rapide de la dérivée directionnelle de la fonction coût|journal=ESAIM: Mathematical Modelling and Numerical Analysis - Modélisation Mathématique et Analyse Numérique|language=fr|volume=20|issue=3|pages=371–402|doi=10.1051/m2an/1986200303711 |url=http://www.numdam.org/item/M2AN_1986__20_3_371_0/|doi-access=free}}</ref> with the use of the Lagrangian of the optimization problem to compute the derivative of a functional with respect to a shape parameter.

For a state variable <math>u \in \mathcal{U}</math>, an optimization variable <math>v\in\mathcal{V}</math>, an objective functional <math>J:\mathcal{U}\times\mathcal{V}\to\mathbb{R}</math> is defined. The state variable <math>u</math> is often implicitly dependent on <math>v</math> through the (direct) state equation <math>D_v(u)=0</math> (usually the weak form of a partial differential equation), thus the considered objective is <math>j(v) = J(u_v,v)</math>, where <math>u_v</math> is the solution of the state equation given the optimization variables <math>v</math>. Usually, one would be interested in calculating <math>\nabla j(v)</math> using the chain rule:

: <math>\nabla j(v) = \nabla_v J(u_v,v) + \nabla_u J(u_v,v)\,\nabla_v u_v.</math>

Unfortunately, the term <math>\nabla_v u_v</math> is often very hard to compute analytically, since the dependence is defined through an implicit equation. The Lagrangian functional can be used as a workaround for this issue. Since the state equation can be considered as a constraint in the minimization of <math>j</math>, the problem

: <math>\text{minimize}\quad j(v) = J(u_v,v)</math>

: <math>\text{subject to}\quad D_v(u_v) = 0</math>

has an associated Lagrangian functional <math>\mathcal{L}:\mathcal{U}\times\mathcal{V}\times\mathcal{U} \to \mathbb{R}</math> defined by

: <math>\mathcal{L}(u,v,\lambda) = J(u,v) + \langle D_v(u),\lambda\rangle,</math>

where <math>\lambda\in\mathcal{U}</math> is a Lagrange multiplier or adjoint state variable and <math>\langle\cdot,\cdot\rangle</math> is an inner product on <math>\mathcal{U}</math>. The method of Lagrange multipliers states that a solution to the problem has to be a stationary point of the Lagrangian, namely

: <math>
\begin{cases}
d_u\mathcal{L}(u,v,\lambda;\delta_u) = d_u J(u,v;\delta_u) + \langle \delta_u, D_v^*(\lambda)\rangle = 0 & \forall \delta_u \in \mathcal{U},\\
d_v\mathcal{L}(u,v,\lambda;\delta_v) = d_v J(u,v;\delta_v) + \langle d_v D_v(u;\delta_v),\lambda\rangle = 0 & \forall \delta_v\in\mathcal{V},\\
d_\lambda\mathcal{L}(u,v,\lambda;\delta_\lambda) = \langle D_v(u), \delta_\lambda\rangle = 0 & \forall \delta_\lambda \in \mathcal{U},
\end{cases}
</math>

where <math>d_x F(x;\delta_x)</math> is the Gateaux derivative of <math>F</math> with respect to <math>x</math> in the direction <math>\delta_x</math>. The last equation is equivalent to <math>D_v(u) = 0</math>, the state equation, whose solution is <math>u_v</math>. The first equation is the so-called adjoint state equation,

: <math>\langle \delta_u, D_v^*(\lambda)\rangle = -d_u J(u_v,v;\delta_u) \quad \forall \delta_u \in \mathcal{U},</math>

because the operator involved is the adjoint operator of <math>D_v</math>, <math>D_v^*</math>. Solving this equation yields the adjoint state <math>\lambda_v</math>.

The gradient of the quantity of interest <math>j</math> with respect to <math>v</math> is <math>\langle \nabla j(v),\delta_v\rangle = d_v j(v;\delta_v) = d_v \mathcal{L}(u_v,v,\lambda_v;\delta_v)</math> (the second equation with <math>u=u_v</math> and <math>\lambda=\lambda_v</math>), thus it can be easily identified by successively solving the direct and adjoint state equations. The process is even simpler when the operator <math>D_v</math> is self-adjoint or symmetric, since the direct and adjoint state equations then differ only by their right-hand side.
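As a simple illustration (a hypothetical example, not taken from the cited sources), consider the least-squares objective <math>J(u,v) = \tfrac{1}{2}\|u - d\|^2</math> with the linear state equation <math>D_v(u) = Au - v = 0</math>, for a fixed invertible matrix <math>A</math> and data <math>d</math>. Here <math>D_v^* = A^\top</math>, the adjoint equation reads <math>A^\top \lambda_v = -(u_v - d)</math>, and since <math>d_v D_v(u;\delta_v) = -\delta_v</math>, the gradient is

: <math>\nabla j(v) = -\lambda_v = A^{-\top}(u_v - d),</math>

which matches differentiating <math>j(v) = \tfrac{1}{2}\|A^{-1}v - d\|^2</math> directly.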

==Example: Linear case==

In a real finite-dimensional linear programming context, the objective function could be <math>J(u,v) = \langle Au, v \rangle</math>, for <math>v\in\mathbb{R}^n</math>, <math>u\in\mathbb{R}^m</math> and <math>A \in \mathbb{R}^{n\times m}</math>, and let the state equation be <math>B_v u = b</math>, with <math>B_v \in \mathbb{R}^{m\times m}</math> and <math>b\in \mathbb{R}^m</math>.

The Lagrangian functional of the problem is <math>\mathcal{L}(u,v,\lambda) = \langle Au,v\rangle + \langle B_v u - b,\lambda\rangle</math>, where <math>\lambda\in\mathbb{R}^m</math>.

The derivative of <math>\mathcal{L}</math> with respect to <math>\lambda</math> yields the state equation as shown before, and the state variable is <math>u_v = B_v^{-1}b</math>. The derivative of <math>\mathcal{L}</math> with respect to <math>u</math> is equivalent to the adjoint equation, which is, for every <math>\delta_u \in \mathbb{R}^m</math>,

: <math>d_u[\langle B_v\,\cdot - b, \lambda\rangle](u;\delta_u) = -\langle A^\top v, \delta_u\rangle \iff \langle B_v \delta_u,\lambda\rangle = -\langle A^\top v,\delta_u\rangle \iff \langle B_v^\top \lambda + A^\top v,\delta_u\rangle = 0 \iff B_v^\top \lambda = -A^\top v.</math>

Thus, we can write symbolically <math>\lambda_v = -B_v^{-\top}A^\top v</math>. The gradient would be

: <math>\langle \nabla j(v),\delta_v\rangle = \langle Au_v,\delta_v\rangle + \langle \nabla_v B_v : \lambda_v\otimes u_v,\delta_v\rangle,</math>

where <math>(\nabla_v B_v)_{ijk} = \frac{\partial (B_v)_{ij}}{\partial v_k}</math> is a third-order tensor, <math>\lambda_v \otimes u_v = \lambda_v u_v^\top</math> is the dyadic product between the direct and adjoint states and <math>:</math> denotes a double tensor contraction. It is assumed that <math>B_v</math> has a known analytic expression that can be differentiated easily.
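A short NumPy sketch of this gradient computation (illustrative only: here <math>B_v</math> is taken to be affine in <math>v</math>, <math>B_v = B_0 + \textstyle\sum_k v_k C_k</math>, so that <math>\partial B_v/\partial v_k = C_k</math> is known analytically; all data are random test values, not from the cited sources):

<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng(1)
n, m = 3, 4
A  = rng.standard_normal((n, m))            # objective J(u, v) = <A u, v>
b  = rng.standard_normal(m)                 # right-hand side of the state equation
B0 = 10.0 * np.eye(m)                       # dominant part, keeps B_v invertible
C  = 0.3 * rng.standard_normal((n, m, m))   # C[k] = dB_v / dv_k

def gradient(v):
    B = B0 + np.einsum('k,kij->ij', v, C)   # B_v = B0 + sum_k v_k C_k
    u = np.linalg.solve(B, b)               # direct state:  B_v u = b
    lam = np.linalg.solve(B.T, -A.T @ v)    # adjoint state: B_v^T lam = -A^T v
    # grad_k = (A u)_k + lam^T C_k u  (double contraction with the dyad lam u^T)
    return A @ u + np.einsum('kij,i,j->k', C, lam, u)

def j(v):                                   # j(v) = <A u_v, v>
    B = B0 + np.einsum('k,kij->ij', v, C)
    return A @ np.linalg.solve(B, b) @ v

# Finite-difference check of the adjoint gradient
v, eps = rng.standard_normal(n), 1e-6
fd = np.array([(j(v + eps * np.eye(n)[k]) - j(v - eps * np.eye(n)[k])) / (2 * eps)
               for k in range(n)])
assert np.allclose(gradient(v), fd, atol=1e-5)
</syntaxhighlight>

Note that each gradient evaluation costs two linear solves, regardless of the number of parameters <math>n</math>.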

===Numerical consideration for the self-adjoint case===

If the operator <math>B_v</math> were self-adjoint, <math>B_v = B_v^\top</math>, the direct state equation and the adjoint state equation would have the same left-hand side. To avoid explicitly inverting the matrix, which is numerically expensive, an LU decomposition can be used instead to solve the state equation, in <math>O(m^3)</math> operations for the decomposition and <math>O(m^2)</math> operations for each subsequent solve. The same decomposition can then be reused to solve the adjoint state equation in only <math>O(m^2)</math> operations, since the matrices are the same.
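A brief SciPy sketch of this factorization reuse (illustrative random data; <code>lu_factor</code> and <code>lu_solve</code> compute the LU decomposition once and reuse it for each solve):

<syntaxhighlight lang="python">
import numpy as np
from scipy.linalg import lu_factor, lu_solve

rng = np.random.default_rng(2)
m = 6
M = rng.standard_normal((m, m))
B = M + M.T + 10.0 * np.eye(m)    # symmetric (self-adjoint), well-conditioned
b = rng.standard_normal(m)        # right-hand side of the state equation
s = rng.standard_normal(m)        # right-hand side of the adjoint equation

lu, piv = lu_factor(B)            # O(m^3), done once

u   = lu_solve((lu, piv), b)      # direct state,  O(m^2)
lam = lu_solve((lu, piv), s)      # adjoint state, O(m^2): same matrix since B = B^T

assert np.allclose(B @ u, b) and np.allclose(B @ lam, s)
</syntaxhighlight>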

==See also==

==References==
{{Reflist}}