Value function
The value function of an optimization problem gives the value attained by the objective function at a solution, while only depending on the parameters of the problem.{{cite book |first1=Wendell H. |last1=Fleming |author-link=Wendell Fleming |first2=Raymond W. |last2=Rishel |title=Deterministic and Stochastic Optimal Control |location=New York |publisher=Springer |year=1975 |pages=81–83 |url=https://books.google.com/books?id=qJDbBwAAQBAJ&pg=PA81 |isbn=0-387-90155-8 }}{{cite book |first=Michael R. |last=Caputo |title=Foundations of Dynamic Economic Analysis : Optimal Control Theory and Applications |location=New York |publisher=Cambridge University Press |year=2005 |isbn=0-521-60368-4 |page=185 |url=https://books.google.com/books?id=XZ2yYSVKWJkC&pg=PA185 }} In a controlled dynamical system, the value function represents the optimal payoff of the system over the interval <math>[t, t_{1}]</math> when started at the time-<math>t</math> state variable <math>x(t) = x</math>.{{cite book |first=Thomas A. |last=Weber |title=Optimal Control Theory : with Applications in Economics |location=Cambridge |publisher=The MIT Press |year=2011 |isbn=978-0-262-01573-8 |page=82 }} If the objective function represents some cost that is to be minimized, the value function can be interpreted as the cost to finish the optimal program, and is thus referred to as the "cost-to-go function".{{cite book |first1=Dimitri P. |last1=Bertsekas |first2=John N. |last2=Tsitsiklis |title=Neuro-Dynamic Programming |location=Belmont |publisher=Athena Scientific |year=1996 |isbn=1-886529-10-8 |page=2 }}{{cite web |title=EE365: Dynamic Programming |url=https://stanford.edu/class/ee365/lectures/dp.pdf#page=3 }} In an economic context, where the objective function usually represents utility, the value function is conceptually equivalent to the indirect utility function.{{cite book |first1=Andreu |last1=Mas-Colell |author-link=Andreu Mas-Colell |first2=Michael D. |last2=Whinston |author-link2=Michael Whinston |first3=Jerry R. |last3=Green |title=Microeconomic Theory |location=New York |publisher=Oxford University Press |year=1995 |isbn=0-19-507340-1 |page=964 }}{{cite book |first1=Dean |last1=Corbae |first2=Maxwell B. |last2=Stinchcombe |first3=Juraj |last3=Zeman |title=An Introduction to Mathematical Analysis for Economic Theory and Econometrics |publisher=Princeton University Press |year=2009 |page=145 |isbn=978-0-691-11867-3 |url=https://books.google.com/books?id=j5P83LtzVO8C&pg=PA145 }}
In a problem of optimal control, the value function is defined as the supremum of the objective function taken over the set of admissible controls. Given <math>(t_{0}, x_{0})</math>, a typical optimal control problem is to
:<math>\text{maximize} \quad J(t_{0}, x_{0}; u) = \int_{t_{0}}^{t_{1}} I(t, x(t), u(t)) \, \mathrm{d}t + \phi(x(t_{1}))</math>
subject to
:<math>\frac{\mathrm{d}x(t)}{\mathrm{d}t} = f(t, x(t), u(t))</math>
with initial state variable <math>x(t_{0}) = x_{0}</math>.{{cite book |first1=Morton I. |last1=Kamien |author-link=Morton Kamien |first2=Nancy L. |last2=Schwartz |title=Dynamic Optimization : The Calculus of Variations and Optimal Control in Economics and Management |location=Amsterdam |publisher=North-Holland |edition=2nd |year=1991 |isbn=0-444-01609-0 |page=259 }} The objective function <math>J(t_{0}, x_{0}; u)</math> is to be maximized over all admissible controls <math>u \in U[t_{0}, t_{1}]</math>, where <math>u</math> is a Lebesgue measurable function from <math>[t_{0}, t_{1}]</math> to some prescribed arbitrary set in <math>\mathbb{R}^{m}</math>. The value function is then defined as
{{Equation box 1
|indent =:
|equation = <math>V(t, x(t)) = \max_{u \in U[t, t_{1}]} \left\{ \int_{t}^{t_{1}} I(\tau, x(\tau), u(\tau)) \, \mathrm{d}\tau + \phi(x(t_{1})) \right\}</math>
|cellpadding
|border
|border colour = #50C878
|background colour = #ECFCF4}}
with <math>V(t_{1}, x(t_{1})) = \phi(x(t_{1}))</math>, where <math>\phi(x(t_{1}))</math> is the "scrap value". If the optimal pair of control and state trajectories is <math>(x^{\ast}, u^{\ast})</math>, then <math>V(t_{0}, x_{0}) = J(t_{0}, x_{0}; u^{\ast})</math>. The function <math>h</math> that gives the optimal control <math>u^{\ast}</math> based on the current state <math>x</math> is called a feedback control policy, or simply a policy function.{{cite book |first1=Lars |last1=Ljungqvist |author-link=Lars Ljungqvist |first2=Thomas J. |last2=Sargent |author-link2=Thomas J. Sargent |title=Recursive Macroeconomic Theory |location=Cambridge |publisher=MIT Press |edition=Fourth |year=2018 |isbn=978-0-262-03866-9 |page=106 |url=https://books.google.com/books?id=Jm1qDwAAQBAJ&pg=PA106 }}
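In discrete time the same construction can be carried out by backward induction on the Bellman equation: starting from the scrap value at the final stage, the value-to-go at each earlier stage is obtained by optimizing over the control at that stage, which simultaneously yields the policy function. The following minimal sketch illustrates this on a discretized problem; the grid, the dynamics <math>f</math>, the running payoff <math>I</math>, and the scrap value <math>\phi</math> used here are arbitrary illustrative choices rather than part of the problem above.
<syntaxhighlight lang="python">
import numpy as np

# Minimal sketch: backward dynamic programming on a discretized analogue of the
# problem above.  The grid, dynamics f, running payoff I and scrap value phi
# are illustrative assumptions only.
T, dt = 20, 0.05                     # number of stages and step length
xs = np.linspace(-2.0, 2.0, 81)      # discretized state space
us = np.linspace(-1.0, 1.0, 41)      # discretized admissible control set

def f(x, u):                         # assumed dynamics  dx/dt = f(x, u)
    return u - 0.5 * x

def I(x, u):                         # assumed running payoff (to be maximized)
    return -(x**2 + u**2)

def phi(x):                          # assumed scrap value at the final time
    return -x**2

V = phi(xs)                          # terminal condition V(t_1, x) = phi(x)
policy = np.zeros((T, xs.size))      # feedback control policy on the grid
for t in reversed(range(T)):         # backward induction over the stages
    V_new = np.empty_like(V)
    for i, x in enumerate(xs):
        # one-step payoff for every control plus interpolated value-to-go
        x_next = x + dt * f(x, us)
        candidates = dt * I(x, us) + np.interp(x_next, xs, V)
        j = np.argmax(candidates)    # maximize over the admissible controls
        V_new[i], policy[t, i] = candidates[j], us[j]
    V = V_new                        # V now approximates V(t, x) on the grid

print(V[xs.size // 2])               # approximate value when started at x = 0
</syntaxhighlight>
If the running payoff is instead a cost to be minimized, replacing the maximization with a minimization turns <code>V</code> into the cost-to-go function described above.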
Bellman's principle of optimality roughly states that any optimal policy at time <math>t</math>, <math>t_{0} \leq t \leq t_{1}</math>, taking the current state <math>x(t)</math> as a "new" initial condition, must be optimal for the remaining problem. If the value function happens to be continuously differentiable (Benveniste and Scheinkman established sufficient conditions for the differentiability of the value function, which in turn allows an application of the envelope theorem; see {{cite journal |first1=L. M. |last1=Benveniste |first2=J. A. |last2=Scheinkman |title=On the Differentiability of the Value Function in Dynamic Models of Economics |journal=Econometrica |volume=47 |issue=3 |year=1979 |pages=727–732 |jstor=1910417 |doi=10.2307/1910417 }} and {{cite journal |first=Atle |last=Seierstad |title=Differentiability Properties of the Optimal Value Function in Control Theory |journal=Journal of Economic Dynamics and Control |volume=4 |year=1982 |pages=303–310 |doi=10.1016/0165-1889(82)90019-7 }}), this gives rise to an important partial differential equation known as the Hamilton–Jacobi–Bellman equation,
:<math>-\frac{\partial V(t,x)}{\partial t} = \max_{u} \left\{ I(t,x,u) + \frac{\partial V(t,x)}{\partial x} f(t, x, u) \right\}</math>
where the maximand on the right-hand side can also be re-written as the Hamiltonian, <math>H\left(t, x, u, \lambda\right) = I(t,x,u) + \lambda(t) f(t, x, u)</math>, as
:<math>-\frac{\partial V(t,x)}{\partial t} = \max_{u} H(t, x, u, \lambda)</math>
with <math>\frac{\partial V(t,x)}{\partial x} = \lambda(t)</math> playing the role of the costate variables.{{cite book |first=Donald E. |last=Kirk |title=Optimal Control Theory |location=Englewood Cliffs, NJ |publisher=Prentice-Hall |year=1970 |isbn=0-13-638098-0 |page=88 }} Given this definition, we further have <math>\frac{\mathrm{d}\lambda(t)}{\mathrm{d}t} = \frac{\partial^{2} V(t,x)}{\partial x \partial t} + \frac{\partial^{2} V(t,x)}{\partial x^{2}} \dot{x}(t)</math>, and after differentiating both sides of the HJB equation with respect to <math>x</math>,
:<math>-\frac{\partial^{2} V(t,x)}{\partial t \partial x} = \frac{\partial I}{\partial x} + \frac{\partial^{2} V(t,x)}{\partial x^{2}} f(t, x, u) + \frac{\partial V(t,x)}{\partial x} \frac{\partial f(t, x, u)}{\partial x}</math>
which after replacing the appropriate terms recovers the costate equation
:<math>-\dot{\lambda}(t) = \frac{\partial I}{\partial x} + \lambda(t) \frac{\partial f}{\partial x} = \frac{\partial H}{\partial x}</math>
where <math>\dot{\lambda}(t)</math> is Newton notation for the derivative with respect to time.{{cite journal |first=X. Y. |last=Zhou |title=Maximum Principle, Dynamic Programming, and their Connection in Deterministic Control |journal=Journal of Optimization Theory and Applications |year=1990 |volume=65 |issue=2 |pages=363–373 |doi=10.1007/BF01102352 |s2cid=122333807 }}
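These relations can be checked on a simple scalar linear–quadratic problem (a standard textbook specification chosen here purely for illustration, not taken from the sources cited above). Maximize
:<math>J = -\int_{t_{0}}^{t_{1}} \left( x(t)^{2} + u(t)^{2} \right) \mathrm{d}t</math>
subject to <math>\dot{x}(t) = u(t)</math>, with zero scrap value. Guessing a quadratic value function <math>V(t,x) = -k(t) x^{2}</math>, the Hamilton–Jacobi–Bellman equation becomes
:<math>\dot{k}(t) x^{2} = \max_{u} \left\{ -x^{2} - u^{2} - 2 k(t) x u \right\} = \left( k(t)^{2} - 1 \right) x^{2},</math>
attained at the feedback control <math>u^{\ast} = -k(t) x</math>, so that <math>k</math> solves the Riccati equation <math>\dot{k}(t) = k(t)^{2} - 1</math> with terminal condition <math>k(t_{1}) = 0</math>. The costate is <math>\lambda(t) = \partial V / \partial x = -2 k(t) x(t)</math>, and differentiating it along the optimal trajectory gives <math>\dot{\lambda}(t) = 2 x(t)</math>, in agreement with the costate equation <math>-\dot{\lambda}(t) = \partial I / \partial x + \lambda(t) \, \partial f / \partial x = -2 x(t)</math>.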
The value function is the unique viscosity solution to the Hamilton–Jacobi–Bellman equation (Theorem 10.1 in {{cite web |title=Viscosity Solutions of Hamilton-Jacobi Equations and Optimal Control Problems |first=Alberto |last=Bressan |date=2019 |work=Lecture Notes |url=http://personal.psu.edu/axb62/PSPDF/HJlnotes19.pdf#page=54 }}). In online closed-loop approximate optimal control, the value function is also a Lyapunov function that establishes global asymptotic stability of the closed-loop system.{{cite book |first1=Rushikesh |last1=Kamalapurkar |first2=Patrick |last2=Walters |first3=Joel |last3=Rosenfeld |first4=Warren |last4=Dixon |title=Reinforcement Learning for Optimal Feedback Control: A Lyapunov-Based Approach |location=Berlin |publisher=Springer |year=2018 |isbn=978-3-319-78383-3 |chapter=Optimal Control and Lyapunov Stability |pages=26–27 |chapter-url=https://books.google.com/books?id=R3haDwAAQBAJ&pg=PA27 }}
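As a simple illustration of this Lyapunov interpretation (an assumed infinite-horizon example, not taken from the cited sources), consider minimizing the cost <math>\int_{0}^{\infty} \left( x^{2} + u^{2} \right) \mathrm{d}t</math> subject to <math>\dot{x} = u</math>: the value function is <math>V(x) = x^{2}</math> with optimal feedback <math>u^{\ast} = -x</math>, and the sketch below checks numerically that <math>V</math> decreases along the closed-loop trajectory.
<syntaxhighlight lang="python">
# Minimal sketch (assumed infinite-horizon example, not from the cited sources):
# minimize the cost  integral of (x^2 + u^2) dt  subject to  dx/dt = u.
# The value function V(x) = x^2 with optimal feedback u = -x is also a
# Lyapunov function: it strictly decreases along the closed-loop trajectory.
V = lambda x: x**2                   # value function / candidate Lyapunov function
u = lambda x: -x                     # optimal feedback control policy

dt, x = 1e-3, 1.5                    # step size and an arbitrary initial state
values = [V(x)]
for _ in range(5000):
    x += dt * u(x)                   # Euler step of the closed loop dx/dt = -x
    values.append(V(x))

# V is monotonically decreasing along the trajectory and tends to zero
assert all(a > b for a, b in zip(values, values[1:]))
print(values[0], values[-1])
</syntaxhighlight>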
References
{{Reflist}}
Further reading
- {{cite book |first=Michael R. |last=Caputo |chapter=Necessary and Sufficient Conditions for Isoperimetric Problems |title=Foundations of Dynamic Economic Analysis : Optimal Control Theory and Applications |location=New York |publisher=Cambridge University Press |year=2005 |isbn=0-521-60368-4 |pages=174–210 |chapter-url=https://books.google.com/books?id=XZ2yYSVKWJkC&pg=PA174 }}
- {{cite journal |first1=Frank H. |last1=Clarke |first2=Philip D. |last2=Loewen |year=1986 |title=The Value Function in Optimal Control: Sensitivity, Controllability, and Time-Optimality |journal=SIAM Journal on Control and Optimization |volume=24 |issue=2 |pages=243–263 |doi=10.1137/0324014 }}
- {{cite journal |first1=Jeffrey T. |last1=LaFrance |first2=L. Dwayne |last2=Barney |title=The Envelope Theorem in Dynamic Optimization |journal=Journal of Economic Dynamics and Control |volume=15 |issue=2 |year=1991 |pages=355–385 |doi=10.1016/0165-1889(91)90018-V |url=http://ageconsearch.umn.edu/record/259398/files/agecon-montanastate-003.pdf }}
- {{cite book |first=Robert F. |last=Stengel |chapter=Conditions for Optimality |title=Optimal Control and Estimation |location=New York |publisher=Dover |year=1994 |isbn=0-486-68200-5 |pages=201–222 |chapter-url=https://books.google.com/books?id=jDjPxqm7Lw0C&pg=PA201 }}