Implicit function theorem

{{short description|On converting relations to functions of several real variables}}

{{Calculus |expanded=multivariable}}

In multivariable calculus, the implicit function theorem{{efn|Also called Dini's theorem by the Pisan school in Italy. In the English-language literature, Dini's theorem is a different theorem in mathematical analysis.}} is a tool that allows relations to be converted to functions of several real variables. It does so by representing the relation as the graph of a function. There may not be a single function whose graph can represent the entire relation, but there may be such a function on a restriction of the domain of the relation. The implicit function theorem gives a sufficient condition to ensure that there is such a function.

More precisely, given a system of {{mvar|m}} equations {{math|1=f<sub>i</sub>{{space|hair}}(x<sub>1</sub>, ..., x<sub>n</sub>, y<sub>1</sub>, ..., y<sub>m</sub>) = 0, i = 1, ..., m}} (often abbreviated into {{math|1=F(x, y) = 0}}), the theorem states that, under a mild condition on the partial derivatives (with respect to each {{math|y<sub>i</sub>}}) at a point, the {{mvar|m}} variables {{math|y<sub>i</sub>}} are differentiable functions of the {{math|x<sub>j</sub>}} in some neighborhood of the point. As these functions generally cannot be expressed in closed form, they are implicitly defined by the equations, and this motivated the name of the theorem.{{Cite book |last=Chiang |first=Alpha C. |author-link=Alpha Chiang |title=Fundamental Methods of Mathematical Economics |publisher=McGraw-Hill |edition=3rd |year=1984 |pages=[https://archive.org/details/fundamentalmetho0000chia_b4p1/page/204 204–206] |isbn=0-07-010813-7 |url=https://archive.org/details/fundamentalmetho0000chia_b4p1/page/204 }}

In other words, under a mild condition on the partial derivatives, the set of zeros of a system of equations is locally the graph of a function.

== History ==

Augustin-Louis Cauchy (1789–1857) is credited with the first rigorous form of the implicit function theorem. Ulisse Dini (1845–1918) generalized the real-variable version of the implicit function theorem to the context of functions of any number of real variables.{{cite book |first1=Steven |last1=Krantz |first2=Harold |last2=Parks |title=The Implicit Function Theorem |series=Modern Birkhauser Classics |publisher=Birkhauser |year=2003 |isbn=0-8176-4285-4 |url=https://archive.org/details/implicitfunction0000kran |url-access=registration }}

== Two-variable case ==

Let f:\R^2 \to \R be a continuously differentiable function defining the implicit equation of a curve f(x,y) = 0 . Let (x_0, y_0) be a point on the curve, that is, a point such that f(x_0, y_0)=0. In this simple case, the implicit function theorem can be stated as follows:

{{math theorem|math_statement=If {{tmath|f(x,y)}} is a function that is continuously differentiable in a neighbourhood of the point {{tmath|(x_0,y_0)}}, and

\frac{\partial f}{ \partial y} (x_0, y_0) \neq 0, then there exists a unique differentiable function {{tmath|\varphi}} such that {{tmath|1=y_0=\varphi(x_0)}} and {{tmath|1=f(x, \varphi(x))=0}} in a neighbourhood of {{tmath|x_0}}.}}

Proof. By differentiating the equation {{tmath|1=f(x, \varphi(x))=0}}, one gets

\frac{\partial f}{ \partial x}(x, \varphi(x))+\varphi'(x)\, \frac{\partial f}{ \partial y}(x, \varphi(x))=0,

and thus

\varphi'(x)=-\frac{\frac{\partial f}{ \partial x}(x, \varphi(x))}{\frac{\partial f}{ \partial y}(x, \varphi(x))}.

This gives an ordinary differential equation for {{tmath|\varphi}}, with the initial condition {{tmath|1=\varphi(x_0) = y_0}}.

Since \frac{\partial f}{ \partial y} (x_0, y_0) \neq 0, the right-hand side of the differential equation is continuous. Hence the Peano existence theorem applies, so there is a (possibly non-unique) solution. To see why \varphi is unique, note that the function g_x(y)=f(x,y) is strictly monotone in y in a neighborhood of (x_0,y_0) (as \frac{\partial f}{ \partial y} (x_0, y_0) \neq 0), and thus injective there. If \varphi and \phi are both solutions of the differential equation, then g_x(\varphi(x))=g_x(\phi(x))=0, and by injectivity \varphi(x)=\phi(x).
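The formula \varphi'(x)=-\frac{\partial f/\partial x}{\partial f/\partial y} can be checked numerically. The following sketch (an illustration, not drawn from the cited sources) uses the hypothetical example f(x, y) = y + e^y - x, chosen because \partial f/\partial y = 1 + e^y is never zero, so the theorem applies at every point of the curve even though \varphi has no elementary closed form:

```python
import math

def phi(x, tol=1e-14):
    """Solve f(x, y) = y + exp(y) - x = 0 for y by Newton iteration."""
    y = 0.0
    for _ in range(60):
        step = (y + math.exp(y) - x) / (1.0 + math.exp(y))
        y -= step
        if abs(step) < tol:
            break
    return y

x0 = 2.0
y0 = phi(x0)

# The proof's formula: phi'(x) = -f_x / f_y = 1 / (1 + exp(phi(x))).
slope = 1.0 / (1.0 + math.exp(y0))

# Central finite difference of phi as an independent check.
h = 1e-6
numeric = (phi(x0 + h) - phi(x0 - h)) / (2.0 * h)

print(abs(slope - numeric) < 1e-6)  # the two agree
```

The point of the sketch is that \varphi itself is only available through a root-finding step, yet its derivative comes for free from the partial derivatives of f.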

== First example ==

[[Image:Implicit circle.svg|thumb|The unit circle: around point {{math|A}}, part of the circle is the graph of some function of {{mvar|x}}, while around {{math|B}}, there is no function of {{mvar|x}} with the circle as its graph. This is exactly what the implicit function theorem asserts in this case.]]

If we define the function {{math|1=f(x, y) = x<sup>2</sup> + y<sup>2</sup>}}, then the equation {{math|1=f(x, y) = 1}} cuts out the unit circle as the level set {{math|1={(x, y) {{!}} f(x, y) = 1}}}. There is no way to represent the unit circle as the graph of a function of one variable {{math|1=y = g(x)}} because for each choice of {{math|x ∈ (−1, 1)}}, there are two choices of {{mvar|y}}, namely \pm\sqrt{1-x^2}.

However, it is possible to represent part of the circle as the graph of a function of one variable. If we let g_1(x) = \sqrt{1-x^2} for {{math|−1 ≤ x ≤ 1}}, then the graph of {{math|1=y = g<sub>1</sub>(x)}} provides the upper half of the circle. Similarly, if g_2(x) = -\sqrt{1-x^2}, then the graph of {{math|1=y = g<sub>2</sub>(x)}} gives the lower half of the circle.

The purpose of the implicit function theorem is to tell us that functions like {{math|g<sub>1</sub>(x)}} and {{math|g<sub>2</sub>(x)}} exist under mild hypotheses, even in situations where we cannot write down explicit formulas for them. It guarantees that these functions are differentiable, and it applies even when {{math|f(x, y)}} is not given by an explicit formula.
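Although {{math|g<sub>1</sub>}} and {{math|g<sub>2</sub>}} have explicit formulas here, the way such implicit functions are evaluated in practice can be sketched numerically: given {{mvar|x}}, solve {{math|1=f(x, y) = 0}} for {{mvar|y}} iteratively, starting from a seed on the desired branch. A minimal illustration (hypothetical code, using Newton's method as the solver):

```python
import math

def f(x, y):
    return x * x + y * y - 1.0

def df_dy(x, y):
    return 2.0 * y

def solve_for_y(x, y_seed, tol=1e-12):
    """Newton iteration in y alone; works where df/dy != 0 near the root."""
    y = y_seed
    for _ in range(50):
        step = f(x, y) / df_dy(x, y)
        y -= step
        if abs(step) < tol:
            break
    return y

# A seed in the upper half-plane recovers g1(x) = sqrt(1 - x^2);
# a seed in the lower half-plane recovers g2(x) = -sqrt(1 - x^2).
x = 0.3
print(abs(solve_for_y(x, 1.0) - math.sqrt(1 - x * x)) < 1e-10)   # True
print(abs(solve_for_y(x, -1.0) + math.sqrt(1 - x * x)) < 1e-10)  # True
```

The choice of seed selects which branch of the relation the solver converges to, mirroring the theorem's statement that the function is only defined locally, near a chosen point on the curve.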

== Definitions ==

Let f: \R^{n+m} \to \R^m be a continuously differentiable function. We think of \R^{n+m} as the Cartesian product \R^n\times\R^m, and we write a point of this product as (\mathbf{x}, \mathbf{y}) = (x_1,\ldots, x_n, y_1, \ldots, y_m). Starting from the given function f, our goal is to construct a function g: \R^n \to \R^m whose graph \{(\textbf{x}, g(\textbf{x}))\} is precisely the set of all (\textbf{x}, \textbf{y}) such that f(\textbf{x}, \textbf{y}) = \textbf{0}.

As noted above, this may not always be possible. We will therefore fix a point (\textbf{a}, \textbf{b}) = (a_1, \dots, a_n, b_1, \dots, b_m) which satisfies f(\textbf{a}, \textbf{b}) = \textbf{0}, and we will ask for a g that works near the point (\textbf{a}, \textbf{b}). In other words, we want an open set U \subset \R^n containing \textbf{a}, an open set V \subset \R^m containing \textbf{b}, and a function g : U \to V such that the graph of g satisfies the relation f = \textbf{0} on U\times V, and that no other points within U \times V do so. In symbols,

\{ (\mathbf{x}, g(\mathbf{x})) \mid \mathbf x \in U \} = \{ (\mathbf{x}, \mathbf{y})\in U \times V \mid f(\mathbf{x}, \mathbf{y}) = \mathbf{0} \}.

To state the implicit function theorem, we need the Jacobian matrix of f, which is the matrix of the partial derivatives of f. Abbreviating (a_1, \dots, a_n, b_1, \dots, b_m) to (\textbf{a}, \textbf{b}), the Jacobian matrix is

(Df)(\mathbf{a},\mathbf{b})

= \left[\begin{array}{ccc|ccc}

\frac{\partial f_1}{\partial x_1}(\mathbf{a},\mathbf{b}) & \cdots & \frac{\partial f_1}{\partial x_n}(\mathbf{a},\mathbf{b}) &

\frac{\partial f_1}{\partial y_1}(\mathbf{a},\mathbf{b}) & \cdots & \frac{\partial f_1}{\partial y_m}(\mathbf{a},\mathbf{b}) \\

\vdots & \ddots & \vdots & \vdots & \ddots & \vdots \\

\frac{\partial f_m}{\partial x_1}(\mathbf{a},\mathbf{b}) & \cdots & \frac{\partial f_m}{\partial x_n}(\mathbf{a},\mathbf{b}) &

\frac{\partial f_m}{\partial y_1}(\mathbf{a},\mathbf{b}) & \cdots & \frac{\partial f_m}{\partial y_m}(\mathbf{a},\mathbf{b})

\end{array}\right]

= \left[\begin{array}{c|c} X & Y \end{array}\right]

where X is the matrix of partial derivatives in the variables x_i and Y is the matrix of partial derivatives in the variables y_j. The implicit function theorem says that if Y is an invertible matrix, then there are U, V, and g as desired. Writing all the hypotheses together gives the following statement.

== Statement of the theorem ==

Let f: \R^{n+m} \to \R^m be a continuously differentiable function, and let \R^{n+m} have coordinates (\textbf{x}, \textbf{y}). Fix a point (\textbf{a}, \textbf{b}) = (a_1,\dots,a_n, b_1,\dots, b_m) with f(\textbf{a}, \textbf{b}) = \mathbf{0}, where \mathbf{0} \in \R^m is the zero vector. If the Jacobian matrix (this is the right-hand panel of the Jacobian matrix shown in the previous section):

J_{f, \mathbf{y}} (\mathbf{a}, \mathbf{b}) = \left [ \frac{\partial f_i}{\partial y_j} (\mathbf{a}, \mathbf{b}) \right ]

is invertible, then there exists an open set U \subset \R^n containing \textbf{a} and a unique function g: U \to \R^m such that {{nowrap|1=g(\mathbf{a}) = \mathbf{b}}} and {{nowrap|1=f(\mathbf{x}, g(\mathbf{x})) = \mathbf{0} ~ \text{for all} ~ \mathbf{x}\in U}}. Moreover, g is continuously differentiable and, denoting the left-hand panel of the Jacobian matrix shown in the previous section as:

J_{f, \mathbf{x}} (\mathbf{a}, \mathbf{b}) = \left [ \frac{\partial f_i}{\partial x_j} (\mathbf{a}, \mathbf{b}) \right ],

the Jacobian matrix of partial derivatives of g in U is given by the matrix product:{{Cite journal |first=Oswaldo |last=de Oliveira |title=The Implicit and Inverse Function Theorems: Easy Proofs |journal=Real Anal. Exchange |volume=39 |issue=1 |year=2013 |doi=10.14321/realanalexch.39.1.0207 |pages=214–216 |s2cid=118792515 |arxiv=1212.2066 }}

\left[\frac{\partial g_i}{\partial x_j} (\mathbf{x})\right]_{m\times n} =- \left [ J_{f, \mathbf{y}}(\mathbf{x}, g(\mathbf{x})) \right ]_{m \times m} ^{-1} \, \left [ J_{f, \mathbf{x}}(\mathbf{x}, g(\mathbf{x})) \right ]_{m \times n}

For a proof, see [[Inverse function theorem#Implicit function theorem]]. The two-variable case is detailed above.
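The matrix formula for the derivative of g can be verified on a small example. The following sketch (an illustrative system chosen for this purpose, not drawn from the references) takes {{math|1=n = 1}}, {{math|1=m = 2}}, where g happens to be known explicitly, and checks that the product −J<sub>f,y</sub><sup>−1</sup> J<sub>f,x</sub> reproduces the derivatives of g:

```python
# Illustrative system with n = 1, m = 2:
#   f1(x, y1, y2) = y1 + y2 - x
#   f2(x, y1, y2) = y1 - y2 - x**2
# Solving explicitly gives g(x) = ((x + x**2)/2, (x - x**2)/2), so the
# theorem's formula Dg = -J_y^{-1} J_x can be checked by hand.

def jacobian_blocks(x, y1, y2):
    # J_x: partials of (f1, f2) in x;  J_y: partials in (y1, y2).
    J_x = [[-1.0], [-2.0 * x]]
    J_y = [[1.0, 1.0], [1.0, -1.0]]
    return J_x, J_y

def inv2(m):
    """Inverse of a 2x2 matrix via the adjugate formula."""
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return [[ m[1][1] / det, -m[0][1] / det],
            [-m[1][0] / det,  m[0][0] / det]]

x = 0.7
y1, y2 = (x + x * x) / 2.0, (x - x * x) / 2.0
J_x, J_y = jacobian_blocks(x, y1, y2)
J_y_inv = inv2(J_y)

# Dg = -J_y^{-1} J_x, a 2x1 matrix.
Dg = [[-(J_y_inv[i][0] * J_x[0][0] + J_y_inv[i][1] * J_x[1][0])]
      for i in range(2)]

# Explicit derivatives g'(x) = (1/2 + x, 1/2 - x) for comparison.
print(abs(Dg[0][0] - (0.5 + x)) < 1e-12, abs(Dg[1][0] - (0.5 - x)) < 1e-12)
```

In applications where g is not known explicitly, the same product still gives its Jacobian, which is the practical content of the formula.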

=== Higher derivatives ===

If, moreover, f is analytic or k times continuously differentiable in a neighborhood of (\textbf{a}, \textbf{b}), then one may choose U so that the same holds true for g inside U.{{Cite book |first1=K. | last1=Fritzsche |first2=H. |last2=Grauert |year=2002 |url=https://books.google.com/books?id=jSeRz36zXIMC&pg=PA34 |title=From Holomorphic Functions to Complex Manifolds |publisher=Springer |page=34 |isbn=9780387953953 }} In the analytic case, this is called the analytic implicit function theorem.

== The circle example ==

Let us go back to the example of the unit circle. In this case n = m = 1 and f(x,y) = x^2 + y^2 - 1. The matrix of partial derivatives is just a 1 × 2 matrix, given by

(Df)(a,b) = \begin{bmatrix} \dfrac{\partial f}{\partial x}(a,b) & \dfrac{\partial f}{\partial y}(a,b) \end{bmatrix} = \begin{bmatrix} 2a & 2b \end{bmatrix}

Thus, here, the {{math|Y}} in the statement of the theorem is just the number {{math|2b}}; the linear map defined by it is invertible if and only if {{math|b ≠ 0}}. By the implicit function theorem we see that we can locally write the circle in the form {{math|1=y = g(x)}} for all points where {{math|y ≠ 0}}. For {{math|(±1, 0)}} we run into trouble, as noted before. The implicit function theorem may still be applied to these two points, by writing {{mvar|x}} as a function of {{mvar|y}}, that is, x = h(y); now the graph of the function will be \left(h(y), y\right), since where {{math|1=b = 0}} we have {{math|1=a = ±1}}, and the conditions to locally express the function in this form are satisfied.

The implicit derivative of y with respect to x, and that of x with respect to y, can be found by taking the total differential of the implicit equation x^2+y^2-1=0:

2x\, dx+2y\, dy = 0,

giving

\frac{dy}{dx}=-\frac{x}{y}

and

\frac{dx}{dy} = -\frac{y}{x}.
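On the upper half of the circle these implicit derivatives can be checked against the explicit branch g_1(x) = \sqrt{1-x^2}; a brief numerical sketch (illustrative only, with an arbitrarily chosen sample point):

```python
import math

# On the upper half of the circle, y = g1(x) = sqrt(1 - x**2), so the
# implicit derivative dy/dx = -x/y can be compared with a finite
# difference of g1.
def g1(x):
    return math.sqrt(1.0 - x * x)

x = 0.5
y = g1(x)
implicit = -x / y  # dy/dx from the total differential

h = 1e-6
numeric = (g1(x + h) - g1(x - h)) / (2.0 * h)
print(abs(implicit - numeric) < 1e-8)  # True
```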

== Application: change of coordinates ==

Suppose we have an {{mvar|m}}-dimensional space, parametrised by a set of coordinates (x_1,\ldots,x_m). We can introduce a new coordinate system (x'_1,\ldots,x'_m) by supplying m continuously differentiable functions h_1,\ldots,h_m. These functions allow us to calculate the new coordinates (x'_1,\ldots,x'_m) of a point, given the point's old coordinates (x_1,\ldots,x_m), via x'_1=h_1(x_1,\ldots,x_m), \ldots, x'_m=h_m(x_1,\ldots,x_m). One might ask whether the reverse is possible: given the coordinates (x'_1,\ldots,x'_m), can we 'go back' and calculate the same point's original coordinates (x_1,\ldots,x_m)? The implicit function theorem provides an answer to this question. The (new and old) coordinates (x'_1,\ldots,x'_m, x_1,\ldots,x_m) are related by f = 0, with

f(x'_1,\ldots,x'_m,x_1,\ldots, x_m)=(h_1(x_1,\ldots, x_m)-x'_1,\ldots , h_m(x_1,\ldots, x_m)-x'_m).

Now the Jacobian matrix of f at a certain point (a, b) [ where a=(x'_1,\ldots,x'_m), b=(x_1,\ldots,x_m) ] is given by

(Df)(a,b) = \left [\begin{matrix}

-1 & \cdots & 0 \\

\vdots & \ddots & \vdots \\

0 & \cdots & -1

\end{matrix}\left|

\begin{matrix}

\frac{\partial h_1}{\partial x_1}(b) & \cdots & \frac{\partial h_1}{\partial x_m}(b)\\

\vdots & \ddots & \vdots\\

\frac{\partial h_m}{\partial x_1}(b) & \cdots & \frac{\partial h_m}{\partial x_m}(b)\\

\end{matrix} \right.\right] = [-I_m |J ].

where {{math|I<sub>m</sub>}} denotes the {{math|m × m}} identity matrix, and {{mvar|J}} is the {{math|m × m}} matrix of partial derivatives, evaluated at (a, b). (In the above, these blocks were denoted by X and Y. As it happens, in this particular application of the theorem, neither matrix depends on a.) The implicit function theorem now states that we can locally express (x_1,\ldots,x_m) as a function of (x'_1,\ldots,x'_m) if J is invertible. Demanding that J be invertible is equivalent to det J ≠ 0; thus we see that we can go back from the primed to the unprimed coordinates if the determinant of the Jacobian J is non-zero. This statement is also known as the inverse function theorem.

=== Example: polar coordinates ===

As a simple application of the above, consider the plane, parametrised by polar coordinates {{math|(R, θ)}}. We can go to a new coordinate system (Cartesian coordinates) by defining the functions {{math|1=x(R, θ) = R cos(θ)}} and {{math|1=y(R, θ) = R sin(θ)}}. This makes it possible, given any point {{math|(R, θ)}}, to find the corresponding Cartesian coordinates {{math|(x, y)}}. When can we go back and convert Cartesian into polar coordinates? By the previous example, it is sufficient to have {{math|1=det J ≠ 0}}, with

J =\begin{bmatrix}

\frac{\partial x(R,\theta)}{\partial R} & \frac{\partial x(R,\theta)}{\partial \theta} \\

\frac{\partial y(R,\theta)}{\partial R} & \frac{\partial y(R,\theta)}{\partial \theta} \\

\end{bmatrix}=

\begin{bmatrix}

\cos \theta & -R \sin \theta \\

\sin \theta & R \cos \theta

\end{bmatrix}.

Since {{math|1=det J = R}}, conversion back to polar coordinates is possible if {{math|1=R ≠ 0}}. So it remains to check the case {{math|1=R = 0}}. It is easy to see that in case {{math|1=R = 0}}, our coordinate transformation is not invertible: at the origin, the value of θ is not well-defined.
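The computation {{math|1=det J = R}} can be confirmed numerically at any sample point; a small sketch (the values below are chosen arbitrarily for illustration):

```python
import math

# det J = R for the polar-to-Cartesian map, checked at a sample point.
R, theta = 2.0, 0.75

J = [[math.cos(theta), -R * math.sin(theta)],
     [math.sin(theta),  R * math.cos(theta)]]

det = J[0][0] * J[1][1] - J[0][1] * J[1][0]
print(abs(det - R) < 1e-12)  # det J = R, so J is invertible whenever R != 0
```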

== Generalizations ==

=== Banach space version ===

Based on the inverse function theorem in Banach spaces, it is possible to extend the implicit function theorem to Banach space valued mappings.{{Cite book |last=Lang |first=Serge |author-link=Serge Lang |title=Fundamentals of Differential Geometry |url=https://archive.org/details/fundamentalsdiff00lang_678 |url-access=limited |year=1999 |publisher=Springer | location=New York |series=Graduate Texts in Mathematics |isbn=0-387-98593-X |pages=[https://archive.org/details/fundamentalsdiff00lang_678/page/n15 15]–21 }}{{Cite book |last=Edwards |first=Charles Henry |title=Advanced Calculus of Several Variables |publisher=Dover Publications |location=Mineola, New York |year=1994 |orig-year=1973 |isbn=0-486-68336-2 |pages=417–418 }}

Let X, Y, Z be Banach spaces. Let the mapping {{math|f : X × Y → Z}} be continuously Fréchet differentiable. If (x_0,y_0)\in X\times Y, f(x_0,y_0)=0, and y\mapsto Df(x_0,y_0)(0,y) is a Banach space isomorphism from Y onto Z, then there exist neighbourhoods U of x_0 and V of y_0 and a Fréchet differentiable function g : U → V such that f(x, g(x)) = 0 and f(x, y) = 0 if and only if y = g(x), for all (x,y)\in U\times V.

=== Implicit functions from non-differentiable functions ===

Various forms of the implicit function theorem exist for the case when the function f is not differentiable. It is standard that local strict monotonicity suffices in one dimension.{{springer |title=Implicit function |id=i/i050310 |last=Kudryavtsev |first=Lev Dmitrievich }} The following more general form was proven by Kumagai based on an observation by Jittorntrum.{{Cite journal |first=K. |last=Jittorntrum |title=An Implicit Function Theorem |journal=Journal of Optimization Theory and Applications |volume=25 |issue=4 |year=1978 |doi=10.1007/BF00933522 |pages=575–577 |s2cid=121647783 }}{{Cite journal |first=S. |last=Kumagai |title=An implicit function theorem: Comment |journal=Journal of Optimization Theory and Applications |volume=31 |issue=2 |year=1980 |doi=10.1007/BF00934117 |pages=285–288 |s2cid=119867925 }}

Consider a continuous function f : \R^n \times \R^m \to \R^n such that f(x_0, y_0) = 0. If there exist open neighbourhoods A \subset \R^n and B \subset \R^m of x0 and y0, respectively, such that, for all y in B, f(\cdot, y) : A \to \R^n is locally one-to-one, then there exist open neighbourhoods A_0 \subset \R^n and B_0 \subset \R^m of x0 and y0, such that, for all y \in B_0, the equation

f(x, y) = 0

has a unique solution

x = g(y) \in A_0,

where g is a continuous function from B_0 into A_0.

=== Collapsing manifolds ===

Perelman’s collapsing theorem for 3-manifolds, the capstone of his proof of Thurston's geometrization conjecture, can be understood as an extension of the implicit function theorem.{{cite journal |last1=Cao |first1=Jianguo |last2=Ge |first2=Jian |title=A simple proof of Perelman's collapsing theorem for 3-manifolds |journal=J. Geom. Anal. |date=2011 |volume=21 |issue=4 |pages=807–869|doi=10.1007/s12220-010-9169-5 |arxiv=1003.2215 |s2cid=514106 }}


== Notes ==

{{Notelist}}

== References ==

{{reflist|30em}}

== Further reading ==

* {{cite book |first=Carl B. |last=Allendoerfer |author-link=Carl B. Allendoerfer |title=Calculus of Several Variables and Differentiable Manifolds |location=New York |publisher=Macmillan |year=1974 |chapter=Theorems about Differentiable Functions |pages=54–88 |isbn=0-02-301840-2 }}
* {{cite book |first=K. G. |last=Binmore |author-link=Kenneth Binmore |chapter=Implicit Functions |title=Calculus |location=New York |publisher=Cambridge University Press |year=1983 |isbn=0-521-28952-1 |pages=198–211 |chapter-url=https://books.google.com/books?id=K8RfQgAACAAJ&pg=PA198 }}
* {{cite book |first1=Lynn H. |last1=Loomis |author-link=Lynn Harold Loomis |first2=Shlomo |last2=Sternberg |author-link2=Shlomo Sternberg |title=Advanced Calculus |url=https://archive.org/details/advancedcalculus0000loom |url-access=registration |location=Boston |publisher=Jones and Bartlett |edition=Revised |year=1990 |pages=[https://archive.org/details/advancedcalculus0000loom/page/164 164–171] |isbn=0-86720-122-3 }}
* {{cite book |first1=Murray H. |last1=Protter |author-link=Murray H. Protter |first2=Charles B. Jr. |last2=Morrey |author-link2=Charles B. Morrey Jr. |chapter=Implicit Function Theorems. Jacobians |title=Intermediate Calculus |location=New York |publisher=Springer |edition=2nd |year=1985 |isbn=0-387-96058-9 |pages=390–420 |chapter-url=https://books.google.com/books?id=3lTmBwAAQBAJ&pg=PA390 }}

{{DEFAULTSORT:Implicit Function Theorem}}

Category:Articles containing proofs

Category:Mathematical identities

Category:Theorems in calculus

Category:Theorems in real analysis