Matrix sign function

{{short description|Generalization of signum function to matrices}}

In mathematics, the matrix sign function is a matrix function on square matrices analogous to the complex sign function.{{Cite book|last=Higham|first=Nicholas J.|url=https://www.worldcat.org/oclc/693957820|title=Functions of Matrices: Theory and Computation|date=2008|publisher=Society for Industrial and Applied Mathematics|isbn=978-0-89871-777-8|location=Philadelphia, Pa.|oclc=693957820|author-link1=Nicholas Higham}}

It was introduced by J. D. Roberts in 1971, in a technical report of Cambridge University, as a tool for model reduction and for solving Lyapunov and algebraic Riccati equations; the report was later published in a journal in 1980.{{Cite journal|last=Roberts|first=J. D.|title=Linear model reduction and solution of the algebraic Riccati equation by use of the sign function|url=https://www.tandfonline.com/doi/full/10.1080/00207178008922881|journal=International Journal of Control|date=October 1980 |language=en|volume=32|issue=4|pages=677–687|doi=10.1080/00207178008922881|issn=0020-7179|url-access=subscription}}{{Cite journal|last1=Denman|first1=Eugene D.|last2=Beavers|first2=Alex N.|date=1976|title=The matrix sign function and computations in systems|url=https://www.sciencedirect.com/science/article/abs/pii/0096300376900205|journal=Applied Mathematics and Computation|language=en|volume=2|issue=1|pages=63–94|doi=10.1016/0096-3003(76)90020-5|issn=0096-3003|url-access=subscription}}

Definition

The matrix sign function is a generalization of the complex signum function

\operatorname{csgn}(z) = \begin{cases} 1 & \text{if } \mathrm{Re}(z) > 0, \\ -1 & \text{if } \mathrm{Re}(z) < 0, \end{cases}

to the matrix-valued analogue \operatorname{csgn}(A). Although the sign function is not analytic on the imaginary axis, the matrix function is well defined for all matrices that have no eigenvalues on the imaginary axis; see, for example, the Jordan-form-based definition below (where all derivatives of the sign function are zero).

Properties

Theorem: Let A\in\C^{n\times n}; then \operatorname{csgn}(A)^2 = I.

Theorem: Let A\in\C^{n\times n}; then \operatorname{csgn}(A) is diagonalizable and has eigenvalues that are \pm 1.

Theorem: Let A\in\C^{n\times n}; then (I+\operatorname{csgn}(A))/2 is a projector onto the invariant subspace associated with the eigenvalues in the right half-plane, and analogously for (I-\operatorname{csgn}(A))/2 and the left half-plane.

Theorem: Let A\in\C^{n\times n}, and let A = P\begin{bmatrix}J_+ & 0 \\ 0 & J_-\end{bmatrix}P^{-1} be a Jordan decomposition such that J_+ corresponds to eigenvalues with positive real part and J_- to eigenvalues with negative real part. Then \operatorname{csgn}(A) = P\begin{bmatrix}I_+ & 0 \\ 0 & -I_-\end{bmatrix}P^{-1}, where I_+ and I_- are identity matrices of sizes corresponding to J_+ and J_-, respectively.
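
For diagonalizable matrices, the Jordan-based definition reduces to applying the scalar sign to each eigenvalue. The following sketch illustrates this; it assumes NumPy, a diagonalizable A with no eigenvalues on the imaginary axis, and the illustrative helper name csgn_eig, and it is not a numerically robust implementation.

<syntaxhighlight lang="python">
import numpy as np

def csgn_eig(A):
    """Matrix sign function via an eigendecomposition.

    Assumes A is diagonalizable with no eigenvalues on the imaginary axis;
    a robust implementation would use a Schur-based method instead.
    """
    w, V = np.linalg.eig(A)            # eigenvalues w and eigenvectors V
    s = np.sign(w.real)                # csgn of each eigenvalue: +1 or -1
    return (V * s) @ np.linalg.inv(V)  # V diag(s) V^{-1}

# Example: eigenvalues 2 and -1, so csgn(A) has eigenvalues +1 and -1
A = np.array([[2.0, 1.0],
              [0.0, -1.0]])
S = csgn_eig(A)
print(np.allclose(S @ S, np.eye(2)))   # csgn(A)^2 = I
</syntaxhighlight>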

Computational methods

The function can be computed with generic methods for matrix functions, but there are also specialized methods.

= Newton iteration =

The Newton iteration can be derived by observing that \operatorname{csgn}(x) = \sqrt{x^2}/x, which in terms of matrices can be written as \operatorname{csgn}(A) = A^{-1}\sqrt{A^2}, where we use the matrix square root. If we apply the Babylonian method to compute the square root of the matrix A^2, that is, the iteration X_{k+1} = \frac{1}{2} \left(X_k + A^{2} X_k^{-1}\right), and define the new iterate Z_k = A^{-1}X_k, we arrive at the iteration

Z_{k+1} = \frac{1}{2}\left(Z_k + Z_k^{-1}\right),

where typically Z_0 = A. Convergence is global, and locally it is quadratic.
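
A minimal NumPy sketch of this iteration follows; the helper name sign_newton and the relative-change stopping criterion are illustrative choices, not part of the method as stated above.

<syntaxhighlight lang="python">
import numpy as np

def sign_newton(A, tol=1e-12, max_iter=100):
    """Newton iteration Z_{k+1} = (Z_k + Z_k^{-1})/2 with Z_0 = A.

    Assumes A has no eigenvalues on the imaginary axis; stops once the
    relative change between iterates drops below tol.
    """
    Z = np.array(A, dtype=complex)
    for _ in range(max_iter):
        Z_next = 0.5 * (Z + np.linalg.inv(Z))
        if np.linalg.norm(Z_next - Z, 1) <= tol * np.linalg.norm(Z_next, 1):
            return Z_next
        Z = Z_next
    return Z
</syntaxhighlight>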

The Newton iteration uses the explicit inverse of the iterates Z_k.

= Newton–Schulz iteration =

To avoid the need for an explicit inverse in the Newton iteration, the inverse can be approximated with one step of the Newton iteration for the inverse, Z_k^{-1}\approx Z_k\left(2I-Z_k^2\right), derived by Günther Schulz in 1933.{{Cite journal|last=Schulz|first=Günther|date=1933|title=Iterative Berechnung der reziproken Matrix|url=https://onlinelibrary.wiley.com/doi/abs/10.1002/zamm.19330130111|journal=ZAMM - Journal of Applied Mathematics and Mechanics / Zeitschrift für Angewandte Mathematik und Mechanik|language=en|volume=13|issue=1|pages=57–59|doi=10.1002/zamm.19330130111|bibcode=1933ZaMM...13...57S |issn=1521-4001|url-access=subscription}} Substituting this approximation into the previous method, the new iteration becomes

Z_{k+1} = \frac{1}{2}Z_k\left(3I - Z_k^2\right).

Convergence is (still) quadratic, but only local (guaranteed for \|I-A^2\|<1).
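
An inverse-free sketch under the same assumptions is given below; any rescaling of A needed to reach the region of convergence is not shown.

<syntaxhighlight lang="python">
import numpy as np

def sign_newton_schulz(A, tol=1e-12, max_iter=100):
    """Newton-Schulz iteration Z_{k+1} = Z_k (3I - Z_k^2)/2 with Z_0 = A.

    Converges only locally (for example when ||I - A^2|| < 1), so A may
    have to be rescaled before the iteration is started.
    """
    Z = np.array(A, dtype=complex)
    I = np.eye(Z.shape[0])
    for _ in range(max_iter):
        Z_next = 0.5 * Z @ (3 * I - Z @ Z)
        if np.linalg.norm(Z_next - Z, 1) <= tol * np.linalg.norm(Z_next, 1):
            return Z_next
        Z = Z_next
    return Z
</syntaxhighlight>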

Applications

= Solutions of Sylvester equations =

Theorem: Let A,B,C\in\R^{n\times n} and assume that A and B are stable, that is, all their eigenvalues have negative real parts; then the unique solution to the Sylvester equation, AX + XB = C, is given by the matrix X such that

\begin{bmatrix} -I & 2X \\ 0 & I \end{bmatrix} = \operatorname{csgn}\left(\begin{bmatrix} A & -C \\ 0 & -B \end{bmatrix}\right).

Proof sketch: The result follows from the similarity transform

\begin{bmatrix} A & -C \\ 0 & -B \end{bmatrix} = \begin{bmatrix} I & X \\ 0 & I \end{bmatrix} \begin{bmatrix} A & 0 \\ 0 & -B \end{bmatrix} \begin{bmatrix} I & X \\ 0 & I \end{bmatrix}^{-1},

since

\operatorname{csgn}\left(\begin{bmatrix} A & -C \\ 0 & -B \end{bmatrix}\right) = \begin{bmatrix} I & X \\ 0 & I \end{bmatrix} \begin{bmatrix} -I & 0 \\ 0 & I \end{bmatrix} \begin{bmatrix} I & -X \\ 0 & I \end{bmatrix} = \begin{bmatrix} -I & 2X \\ 0 & I \end{bmatrix},

due to the stability of A and B.

The theorem is, naturally, also applicable to the Lyapunov equation, which corresponds to the special case B = A^T. However, due to this structure, the Newton iteration simplifies to involving only inverses of A and A^T.
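
As a numerical illustration of the theorem, the sketch below forms the block matrix, applies the sign function, and reads off X from the upper-right block. It assumes NumPy and reuses the sign_newton helper sketched above; the name solve_sylvester_sign is illustrative.

<syntaxhighlight lang="python">
import numpy as np

def solve_sylvester_sign(A, B, C, sign=sign_newton):
    """Solve AX + XB = C for stable A and B via the matrix sign function.

    Uses csgn([[A, -C], [0, -B]]) = [[-I, 2X], [0, I]] and returns X.
    `sign` may be any routine computing the matrix sign function.
    """
    n = A.shape[0]
    M = np.block([[A, -C], [np.zeros((n, n)), -B]])
    S = sign(M)
    return 0.5 * S[:n, n:]   # the upper-right block equals 2X

# Example with (clearly) stable triangular A and B
A = np.array([[-2.0, 1.0], [0.0, -1.0]])
B = np.array([[-3.0, 0.0], [1.0, -4.0]])
C = np.array([[1.0, 2.0], [3.0, 4.0]])
X = solve_sylvester_sign(A, B, C)
print(np.allclose(A @ X + X @ B, C))
</syntaxhighlight>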

= Solutions of algebraic Riccati equations =

There is a similar result applicable to the algebraic Riccati equation, A^H P + P A - P F P + Q = 0 . Define V,W\in\Complex^{2n\times n} as

\begin{bmatrix} V & W \end{bmatrix} = \operatorname{csgn}\left(\begin{bmatrix} A^H & Q \\ F & -A \end{bmatrix}\right) - \begin{bmatrix} I & 0 \\ 0 & I \end{bmatrix}.

Under the assumption that F,Q\in\Complex^{n\times n} are Hermitian and that there exists a unique stabilizing solution, in the sense that A-FP is stable, that solution is given by the over-determined, but consistent, linear system

VP=-W.

Proof sketch: The similarity transform

\begin{bmatrix} A^H & Q \\ F & -A \end{bmatrix} = \begin{bmatrix} P & -I \\ I & 0 \end{bmatrix} \begin{bmatrix} -(A-FP) & -F \\ 0 & (A-FP)^H \end{bmatrix} \begin{bmatrix} P & -I \\ I & 0 \end{bmatrix}^{-1},

together with the stability of A-FP, implies that

\left(\operatorname{csgn}\left(\begin{bmatrix} A^H & Q \\ F & -A \end{bmatrix}\right) - \begin{bmatrix} I & 0 \\ 0 & I \end{bmatrix}\right) \begin{bmatrix} P & -I \\ I & 0 \end{bmatrix} = \begin{bmatrix} P & -I \\ I & 0 \end{bmatrix} \begin{bmatrix} 0 & Y \\ 0 & -2I \end{bmatrix},

for some matrix Y\in\Complex^{n\times n}. In particular, the first n columns of this identity give \begin{bmatrix} V & W \end{bmatrix}\begin{bmatrix} P \\ I \end{bmatrix} = VP + W = 0.
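
A numerical sketch of this construction is given below, again assuming NumPy and the sign_newton helper sketched above; the name solve_care_sign is illustrative, and the example data are chosen so that a stabilizing solution exists.

<syntaxhighlight lang="python">
import numpy as np

def solve_care_sign(A, F, Q, sign=sign_newton):
    """Stabilizing solution of A^H P + P A - P F P + Q = 0 (F, Q Hermitian).

    Forms [V W] = csgn([[A^H, Q], [F, -A]]) - I and solves the
    over-determined but consistent system V P = -W by least squares.
    """
    n = A.shape[0]
    H = np.block([[A.conj().T, Q], [F, -A]])
    S = sign(H) - np.eye(2 * n)
    V, W = S[:, :n], S[:, n:]
    P, *_ = np.linalg.lstsq(V, -W, rcond=None)
    return P

# Example: with F = Q = I a unique stabilizing solution exists
A = np.array([[1.0, 2.0], [0.0, -1.0]])
F = np.eye(2)
Q = np.eye(2)
P = solve_care_sign(A, F, Q)
print(np.linalg.norm(A.conj().T @ P + P @ A - P @ F @ P + Q))  # close to 0
</syntaxhighlight>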

= Computation of the matrix square root =

The Denman–Beavers iteration for the square root of a matrix can be derived from the Newton iteration for the matrix sign function by noticing that A - PIP = 0 is a degenerate algebraic Riccati equation whose solution P is, by definition, a square root of A.
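
A sketch of the resulting coupled iteration is shown below (Y_0 = A, Z_0 = I, with Y_k converging to the principal square root A^{1/2} and Z_k to A^{-1/2}); NumPy is assumed and the function name is illustrative.

<syntaxhighlight lang="python">
import numpy as np

def sqrtm_denman_beavers(A, tol=1e-12, max_iter=100):
    """Coupled Denman-Beavers iteration for the principal matrix square root.

    Y_0 = A, Z_0 = I; Y_{k+1} = (Y_k + Z_k^{-1})/2, Z_{k+1} = (Z_k + Y_k^{-1})/2.
    Y_k -> A^{1/2} and Z_k -> A^{-1/2} when A has no eigenvalues on the
    closed negative real axis.
    """
    Y = np.array(A, dtype=complex)
    Z = np.eye(A.shape[0], dtype=complex)
    for _ in range(max_iter):
        Y_next = 0.5 * (Y + np.linalg.inv(Z))
        Z_next = 0.5 * (Z + np.linalg.inv(Y))
        if np.linalg.norm(Y_next - Y, 1) <= tol * np.linalg.norm(Y_next, 1):
            return Y_next, Z_next
        Y, Z = Y_next, Z_next
    return Y, Z

# Example: square root of a symmetric positive definite matrix
A = np.array([[4.0, 1.0], [1.0, 3.0]])
Y, Z = sqrtm_denman_beavers(A)
print(np.allclose(Y @ Y, A), np.allclose(Y @ Z, np.eye(2)))
</syntaxhighlight>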

References