
{{Short description|Numerical method}}

{{primary sources|date=January 2010}}

The '''adjoint state method''' is a numerical method for efficiently computing the gradient of a function or operator in a numerical optimization problem.<ref>{{Cite journal|last1=Pollini|first1=Nicolò|last2=Lavan|first2=Oren|last3=Amir|first3=Oded|date=2018-06-01|title=Adjoint sensitivity analysis and optimization of hysteretic dynamic systems with nonlinear viscous dampers|journal=Structural and Multidisciplinary Optimization|language=en|volume=57|issue=6|pages=2273–2289|doi=10.1007/s00158-017-1858-2|s2cid=125712091|issn=1615-1488}}</ref> It has applications in geophysics, seismic imaging, photonics and more recently in neural networks.<ref>Ricky T. Q. Chen, Yulia Rubanova, Jesse Bettencourt, David Duvenaud. "Neural Ordinary Differential Equations". [https://arxiv.org/abs/1806.07366 Available online]</ref>

The adjoint state space is chosen to simplify the physical interpretation of equation constraints.<ref>Plessix, R-E. "A review of the adjoint-state method for computing the gradient of a functional with geophysical applications." Geophysical Journal International, 2006, 167(2): 495–503. [https://academic.oup.com/gji/article/167/2/495/559970 Free access on GJI website]</ref>

Adjoint state techniques allow the use of integration by parts, resulting in a form which explicitly contains the physically interesting quantity. An adjoint state equation is introduced, including a new unknown variable.

The adjoint method formulates the gradient of a function with respect to its parameters as a constrained optimization problem. By using the dual form of this constrained optimization problem, the gradient can be computed very efficiently. A useful property is that the number of computations is independent of the number of parameters with respect to which the gradient is sought.

The adjoint method is derived from the dual problem<ref>{{cite journal |last1=McNamara |first1=Antoine |last2=Treuille |first2=Adrien |last3=Popović |first3=Zoran |last4=Stam |first4=Jos |title=Fluid control using the adjoint method |url=https://www.dgp.toronto.edu/public_user/stam/reality/Research/pdf/sig04.pdf |access-date=28 October 2022 |archive-url=https://web.archive.org/web/20220129011505/https://www.dgp.toronto.edu/public_user/stam/reality/Research/pdf/sig04.pdf |archive-date=29 January 2022 |journal=ACM Transactions on Graphics |volume=23 |issue=3 |pages=449–456 |doi=10.1145/1015706.1015744 |date=August 2004 |url-status=live}}</ref> and is used, for example, in the Landweber iteration method.<ref>{{cite web |last1=Lundvall |first1=Johan |title=Data Assimilation in Fluid Dynamics using Adjoint Optimization |url=http://liu.diva-portal.org/smash/get/diva2:24091/FULLTEXT01.pdf |publisher=Linköping University of Technology |access-date=28 October 2022 |archive-url=https://web.archive.org/web/20221009083111/http://liu.diva-portal.org/smash/get/diva2:24091/FULLTEXT01.pdf |archive-date=9 October 2022 |location=Sweden |language=en |date=2007 |url-status=live}}</ref>

The name ''adjoint state method'' refers to the dual form of the problem, where the adjoint matrix <math>A^*=\overline{A}^T</math> is used.

When the initial problem consists of calculating the product <math>s^T x</math>, where <math>x</math> must satisfy <math>Ax=b</math>, the dual problem can be realized as calculating the product <math>r^T b</math> (<math>= s^T x</math>), where <math>r</math> must satisfy <math>A^* r = s</math>. The vector <math>r</math> is called the adjoint state vector.
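A minimal NumPy sketch of this identity (the matrix and vectors are arbitrary illustrative data, not from the cited sources; for a real matrix the adjoint <math>A^*</math> is simply the transpose <math>A^\top</math>):

<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng(0)
n = 5
A = rng.standard_normal((n, n))   # state operator (assumed invertible)
b = rng.standard_normal(n)        # right-hand side of the state equation
s = rng.standard_normal(n)        # defines the quantity of interest s^T x

# Direct approach: solve A x = b, then evaluate s^T x.
x = np.linalg.solve(A, b)
direct = s @ x

# Adjoint approach: solve A^T r = s once, then evaluate r^T b.
# The single adjoint solve can be reused for many right-hand sides b.
r = np.linalg.solve(A.T, s)
adjoint = r @ b

assert np.isclose(direct, adjoint)  # s^T x == r^T b
</syntaxhighlight>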

==General case==

The original adjoint calculation method goes back to Jean Cea,<ref>{{Cite journal|last1=Cea|first1=Jean|date=1986|title=Conception optimale ou identification de formes, calcul rapide de la dérivée directionnelle de la fonction coût|journal=ESAIM: Mathematical Modelling and Numerical Analysis - Modélisation Mathématique et Analyse Numérique|language=fr|volume=20|issue=3|pages=371–402|doi=10.1051/m2an/1986200303711 |url=http://www.numdam.org/item/M2AN_1986__20_3_371_0/|doi-access=free}}</ref> with the use of the Lagrangian of the optimization problem to compute the derivative of a functional with respect to a shape parameter.

For a state variable <math>u \in \mathcal{U}</math>, an optimization variable <math>v\in\mathcal{V}</math>, an objective functional <math>J:\mathcal{U}\times\mathcal{V}\to\mathbb{R}</math> is defined. The state variable <math>u</math> is often implicitly dependent on <math>v</math> through the (direct) state equation <math>D_v(u)=0</math> (usually the weak form of a partial differential equation), thus the considered objective is <math>j(v) = J(u_v,v)</math>, where <math>u_v</math> is the solution of the state equation given the optimization variables <math>v</math>. Usually, one would be interested in calculating <math>\nabla j(v)</math> using the chain rule:

: <math>\nabla j(v) = \nabla_v J(u_v,v) + \nabla_u J(u_v,v)\,\nabla_v u_v.</math>

Unfortunately, the term <math>\nabla_v u_v</math> is often very hard to compute analytically, since the dependence is defined through an implicit equation. The Lagrangian functional can be used as a workaround for this issue. Since the state equation can be considered as a constraint in the minimization of <math>j</math>, the problem

: <math>\text{minimize}\quad j(v) = J(u_v,v)</math>

: <math>\text{subject to}\quad D_v(u_v) = 0</math>

has an associated Lagrangian functional <math>\mathcal{L}:\mathcal{U}\times\mathcal{V}\times\mathcal{U} \to \mathbb{R}</math> defined by

: <math>\mathcal{L}(u,v,\lambda) = J(u,v) + \langle D_v(u),\lambda\rangle,</math>

where <math>\lambda\in\mathcal{U}</math> is a Lagrange multiplier or adjoint state variable and <math>\langle\cdot,\cdot\rangle</math> is an inner product on <math>\mathcal{U}</math>. The method of Lagrange multipliers states that a solution to the problem has to be a stationary point of the Lagrangian, namely

: <math>
\begin{cases}
d_u\mathcal{L}(u,v,\lambda;\delta_u) = d_u J(u,v;\delta_u) + \langle \delta_u, D_v^*(\lambda)\rangle = 0 & \forall \delta_u \in \mathcal{U},\\
d_v\mathcal{L}(u,v,\lambda;\delta_v) = d_v J(u,v;\delta_v) + \langle d_v D_v(u;\delta_v),\lambda\rangle = 0 & \forall \delta_v\in\mathcal{V},\\
d_\lambda\mathcal{L}(u,v,\lambda;\delta_\lambda) = \langle D_v(u), \delta_\lambda\rangle = 0 & \forall \delta_\lambda \in \mathcal{U},
\end{cases}
</math>

where <math>d_x F(x;\delta_x)</math> is the Gateaux derivative of <math>F</math> with respect to <math>x</math> in the direction <math>\delta_x</math>. The last equation is equivalent to <math>D_v(u) = 0</math>, the state equation, whose solution is <math>u_v</math>. The first equation is the so-called adjoint state equation,

: <math>\langle \delta_u, D_v^*(\lambda)\rangle = -d_u J(u_v,v;\delta_u) \quad \forall \delta_u \in \mathcal{U},</math>

because the operator involved is the adjoint operator of <math>D_v</math>, <math>D_v^*</math>. Solving this equation yields the adjoint state <math>\lambda_v</math>.

The gradient of the quantity of interest <math>j</math> with respect to <math>v</math> is <math>\langle \nabla j(v),\delta_v\rangle = d_v j(v;\delta_v) = d_v \mathcal{L}(u_v,v,\lambda_v;\delta_v)</math> (the second equation with <math>u=u_v</math> and <math>\lambda=\lambda_v</math>), thus it can be easily identified by successively solving the direct and adjoint state equations. The process is even simpler when the operator <math>D_v</math> is self-adjoint or symmetric, since the direct and adjoint state equations then differ only by their right-hand side.
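As a simple illustration (a hypothetical example, not taken from the cited sources), consider the least-squares objective <math>J(u,v) = \tfrac{1}{2}\|u - d\|^2</math> with the linear state equation <math>D_v(u) = Au - v = 0</math>, for a fixed invertible matrix <math>A</math> and data <math>d</math>. Here <math>D_v^* = A^\top</math>, the adjoint equation reads <math>A^\top \lambda_v = -(u_v - d)</math>, and since <math>d_v D_v(u;\delta_v) = -\delta_v</math>, the gradient is

: <math>\nabla j(v) = -\lambda_v = A^{-\top}(u_v - d),</math>

which matches differentiating <math>j(v) = \tfrac{1}{2}\|A^{-1}v - d\|^2</math> directly.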

==Example: Linear case==

In a real finite-dimensional linear programming context, the objective function could be <math>J(u,v) = \langle Au, v \rangle</math>, for <math>v\in\mathbb{R}^n</math>, <math>u\in\mathbb{R}^m</math> and <math>A \in \mathbb{R}^{n\times m}</math>, and let the state equation be <math>B_v u = b</math>, with <math>B_v \in \mathbb{R}^{m\times m}</math> and <math>b\in \mathbb{R}^m</math>.

The Lagrangian functional of the problem is <math>\mathcal{L}(u,v,\lambda) = \langle Au,v\rangle + \langle B_v u - b,\lambda\rangle</math>, where <math>\lambda\in\mathbb{R}^m</math>.

The derivative of <math>\mathcal{L}</math> with respect to <math>\lambda</math> yields the state equation as shown before, and the state variable is <math>u_v = B_v^{-1}b</math>. The derivative of <math>\mathcal{L}</math> with respect to <math>u</math> is equivalent to the adjoint equation, which is, for every <math>\delta_u \in \mathbb{R}^m</math>,

: <math>d_u[\langle B_v\,\cdot - b, \lambda\rangle](u;\delta_u) = -\langle A^\top v, \delta_u\rangle \iff \langle B_v \delta_u,\lambda\rangle = -\langle A^\top v,\delta_u\rangle \iff \langle B_v^\top \lambda + A^\top v,\delta_u\rangle = 0 \iff B_v^\top \lambda = -A^\top v.</math>

Thus, we can write symbolically <math>\lambda_v = -B_v^{-\top}A^\top v</math>. The gradient would be

: <math>\langle \nabla j(v),\delta_v\rangle = \langle Au_v,\delta_v\rangle + \langle \nabla_v B_v : \lambda_v\otimes u_v,\delta_v\rangle,</math>

where <math>(\nabla_v B_v)_{ijk} = \frac{\partial (B_v)_{ij}}{\partial v_k}</math> is a third-order tensor, <math>\lambda_v \otimes u_v = \lambda_v u_v^\top</math> is the dyadic product between the direct and adjoint states and <math>:</math> denotes a double tensor contraction. It is assumed that <math>B_v</math> has a known analytic expression that can be differentiated easily.
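A short NumPy sketch of this gradient computation (illustrative only: here <math>B_v</math> is taken to be affine in <math>v</math>, <math>B_v = B_0 + \textstyle\sum_k v_k C_k</math>, so that <math>\partial B_v/\partial v_k = C_k</math> is known analytically; all data are random test values, not from the cited sources):

<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng(1)
n, m = 3, 4
A  = rng.standard_normal((n, m))            # objective J(u, v) = <A u, v>
b  = rng.standard_normal(m)                 # right-hand side of the state equation
B0 = 10.0 * np.eye(m)                       # dominant part, keeps B_v invertible
C  = 0.3 * rng.standard_normal((n, m, m))   # C[k] = dB_v / dv_k

def gradient(v):
    B = B0 + np.einsum('k,kij->ij', v, C)   # B_v = B0 + sum_k v_k C_k
    u = np.linalg.solve(B, b)               # direct state:  B_v u = b
    lam = np.linalg.solve(B.T, -A.T @ v)    # adjoint state: B_v^T lam = -A^T v
    # grad_k = (A u)_k + lam^T C_k u  (double contraction with the dyad lam u^T)
    return A @ u + np.einsum('kij,i,j->k', C, lam, u)

def j(v):                                   # j(v) = <A u_v, v>
    B = B0 + np.einsum('k,kij->ij', v, C)
    return A @ np.linalg.solve(B, b) @ v

# Finite-difference check of the adjoint gradient
v, eps = rng.standard_normal(n), 1e-6
fd = np.array([(j(v + eps * np.eye(n)[k]) - j(v - eps * np.eye(n)[k])) / (2 * eps)
               for k in range(n)])
assert np.allclose(gradient(v), fd, atol=1e-5)
</syntaxhighlight>

Note that each gradient evaluation costs two linear solves, regardless of the number of parameters <math>n</math>.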

===Numerical consideration for the self-adjoint case===

If the operator <math>B_v</math> were self-adjoint, <math>B_v = B_v^\top</math>, the direct state equation and the adjoint state equation would have the same left-hand side. To avoid explicitly inverting the matrix, which is numerically expensive, an LU decomposition can be used instead to solve the state equation, in <math>O(m^3)</math> operations for the decomposition and <math>O(m^2)</math> operations for each subsequent solve. The same decomposition can then be reused to solve the adjoint state equation in only <math>O(m^2)</math> operations, since the matrices are the same.
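A brief SciPy sketch of this factorization reuse (illustrative random data; <code>lu_factor</code> and <code>lu_solve</code> compute the LU decomposition once and reuse it for each solve):

<syntaxhighlight lang="python">
import numpy as np
from scipy.linalg import lu_factor, lu_solve

rng = np.random.default_rng(2)
m = 6
M = rng.standard_normal((m, m))
B = M + M.T + 10.0 * np.eye(m)    # symmetric (self-adjoint), well-conditioned
b = rng.standard_normal(m)        # right-hand side of the state equation
s = rng.standard_normal(m)        # right-hand side of the adjoint equation

lu, piv = lu_factor(B)            # O(m^3), done once

u   = lu_solve((lu, piv), b)      # direct state,  O(m^2)
lam = lu_solve((lu, piv), s)      # adjoint state, O(m^2): same matrix since B = B^T

assert np.allclose(B @ u, b) and np.allclose(B @ lam, s)
</syntaxhighlight>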

==See also==

==References==
{{Reflist}}