Min-max theorem

{{short description|Variational characterization of eigenvalues of compact Hermitian operators on Hilbert spaces}}

{{distinguish|Minimax theorem}}

{{redirect-distinguish|Variational theorem|variational principle}}

{{More citations needed|date=November 2011}}

In linear algebra and functional analysis, the min-max theorem, or variational theorem, or Courant–Fischer–Weyl min-max principle, is a result that gives a variational characterization of eigenvalues of compact Hermitian operators on Hilbert spaces. It can be viewed as the starting point of many results of similar nature.

This article first discusses the finite-dimensional case and its applications before considering compact operators on infinite-dimensional Hilbert spaces.

We will see that for compact operators, the proof of the main theorem uses essentially the same idea as in the finite-dimensional argument.

In the case that the operator is non-Hermitian, the theorem provides an equivalent characterization of the associated singular values.

The min-max theorem can be extended to self-adjoint operators that are bounded below.

Matrices

Let {{mvar|A}} be an {{math|n × n}} Hermitian matrix. As with many other variational results on eigenvalues, one considers the Rayleigh–Ritz quotient {{math|RA : Cn \ {0} → R}} defined by

:R_A(x) = \frac{(Ax, x)}{(x,x)}

where {{math|(⋅, ⋅)}} denotes the Euclidean inner product on {{math|Cn}}.

Equivalently, the Rayleigh–Ritz quotient can be replaced by

:f(x) = (Ax, x), \; \|x\| = 1.

The Rayleigh quotient of an eigenvector v is its associated eigenvalue \lambda, because R_A(v) = (Av, v)/(v, v) = \lambda (v, v)/(v, v) = \lambda.

For a Hermitian matrix A, the range of the continuous function RA(x), equivalently f(x), is a compact interval [a, b] of the real line. The maximum b and the minimum a are the largest and smallest eigenvalues of A, respectively. The min-max theorem is a refinement of this fact.
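As a quick illustration (a minimal NumPy sketch, not part of the standard exposition; the helper name rayleigh is ad hoc), one can sample the Rayleigh quotient of a random Hermitian matrix and check that every value lies between the smallest and largest eigenvalue:

<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng(0)
# Random Hermitian test matrix: A = (B + B*)/2.
B = rng.standard_normal((5, 5)) + 1j * rng.standard_normal((5, 5))
A = (B + B.conj().T) / 2

def rayleigh(A, x):
    """Rayleigh–Ritz quotient (Ax, x) / (x, x)."""
    return (x.conj() @ A @ x).real / (x.conj() @ x).real

eigs = np.linalg.eigvalsh(A)  # eigenvalues in ascending order
samples = [rayleigh(A, rng.standard_normal(5) + 1j * rng.standard_normal(5))
           for _ in range(10_000)]
# Every sampled quotient lies in [smallest eigenvalue, largest eigenvalue].
assert eigs[0] - 1e-12 <= min(samples) and max(samples) <= eigs[-1] + 1e-12
</syntaxhighlight>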

= Min-max theorem =

Let A be Hermitian on an inner product space V of dimension n, with eigenvalues listed in descending order \lambda_1 \geq \cdots \geq \lambda_n.

Let v_1, \ldots, v_n be corresponding orthonormal eigenvectors.

It is convenient to also list the spectrum in ascending order, writing \xi_1 = \lambda_n, \ldots, \xi_n = \lambda_1.

{{Math theorem

| name = Poincaré’s inequality

| note =

| math_statement = Let M be a k-dimensional subspace of V. Then there exist unit vectors x, y\in M such that

\langle x, Ax\rangle\leq \lambda_k and \langle y, Ay\rangle \geq \xi_k.

}}

{{Math proof|title=Proof|proof=

Part 2 follows from part 1 applied to -A.

The span N := span(v_k, \ldots, v_n) has dimension n-k+1, while M has dimension k, so \dim M + \dim N = n+1 > n and the two subspaces must intersect in at least a line.

Take a unit vector x \in M\cap N; this is the desired vector.

: Since x\in N, we may write x = \sum_{i=k}^n a_i v_i.

: Since \sum_{i=k}^n |a_i|^2 = 1, we find \langle x,Ax \rangle = \sum_{i=k}^n |a_i|^2\lambda_i \leq \lambda_k.

}}
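Poincaré’s inequality is easy to test numerically: over a k-dimensional subspace M with orthonormal basis matrix Q, the extreme values of ⟨x, Ax⟩ are the extreme eigenvalues of the compression Q*AQ. The following sketch (illustrative only; the random subspace and tolerances are ad hoc) checks both halves of the statement:

<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng(1)
n, k = 7, 3
B = rng.standard_normal((n, n))
A = (B + B.T) / 2                    # real symmetric test matrix

lam = np.linalg.eigvalsh(A)[::-1]    # descending: lam[0] >= ... >= lam[n-1]
xi = lam[::-1]                       # ascending:  xi[0] <= ... <= xi[n-1]

# Random k-dimensional subspace M, represented by an orthonormal basis Q.
Q, _ = np.linalg.qr(rng.standard_normal((n, k)))
# Rayleigh quotients over M range over the eigenvalues of the compression.
mu = np.linalg.eigvalsh(Q.T @ A @ Q)
assert mu[0] <= lam[k - 1] + 1e-12   # some unit x in M has <x, Ax> <= lambda_k
assert mu[-1] >= xi[k - 1] - 1e-12   # some unit y in M has <y, Ay> >= xi_k
</syntaxhighlight>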

{{Math theorem

| name = min-max theorem

| note =

| math_statement = \begin{aligned}

\lambda_k &=\max _{\begin{array}{c} \mathcal{M} \subset V \\ \operatorname{dim}(\mathcal{M})=k \end{array}} \min _{\begin{array}{c} x \in \mathcal{M} \\ \|x\|=1 \end{array}}\langle x, A x\rangle\\

&=\min _{\begin{array}{c} \mathcal{M} \subset V \\ \operatorname{dim}(\mathcal{M})=n-k+1 \end{array}} \max _{\begin{array}{c} x \in \mathcal{M} \\ \|x\|=1 \end{array}}\langle x, A x\rangle \text{. }

\end{aligned}

}}

{{Math proof|title=Proof|proof=

Part 2 is a corollary of part 1, obtained by applying it to -A.

By Poincaré’s inequality, \lambda_k is an upper bound for the max–min expression on the right side.

Setting \mathcal M = span(v_1, \ldots, v_k) shows that the upper bound is achieved: for x = \sum_{i=1}^k a_i v_i with \|x\| = 1, we have \langle x, Ax\rangle = \sum_{i=1}^k |a_i|^2 \lambda_i \geq \lambda_k.

}}
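The theorem itself can be spot-checked in the same spirit: the span of the top k eigenvectors attains \lambda_k, and no other k-dimensional subspace does better. A minimal sketch (random trials only, not a proof):

<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng(2)
n, k = 6, 3
B = rng.standard_normal((n, n))
A = (B + B.T) / 2

w, V = np.linalg.eigh(A)             # ascending eigenvalues, orthonormal eigenvectors
lam, V = w[::-1], V[:, ::-1]         # reorder to descending

def min_rayleigh(A, Q):
    """Minimum of <x, Ax> over unit x in the column span of orthonormal Q."""
    return np.linalg.eigvalsh(Q.T @ A @ Q)[0]

# The maximizing subspace is the span of the top k eigenvectors:
assert np.isclose(min_rayleigh(A, V[:, :k]), lam[k - 1])
# No other k-dimensional subspace does better (random spot checks):
for _ in range(1000):
    Q, _ = np.linalg.qr(rng.standard_normal((n, k)))
    assert min_rayleigh(A, Q) <= lam[k - 1] + 1e-12
</syntaxhighlight>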

Define the partial trace tr_V(A) to be the trace of the compression of A to the subspace V. Given an orthonormal basis v_1, \ldots, v_m of V, it equals \sum_i v_i^*Av_i.
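In coordinates, if the orthonormal basis of V is stacked as the columns of a matrix Q, then tr_V(A) = tr(Q*AQ). A small helper (illustrative; the name partial_trace is ad hoc):

<syntaxhighlight lang="python">
import numpy as np

def partial_trace(A, Q):
    """tr_V(A) for the subspace V spanned by the orthonormal columns of Q,
    computed as sum_i q_i^* A q_i = tr(Q^* A Q)."""
    return np.trace(Q.conj().T @ A @ Q).real

# Sanity check: over the whole space, the partial trace is the ordinary trace.
A = np.diag([3.0, 1.0, 2.0])
assert np.isclose(partial_trace(A, np.eye(3)), np.trace(A))
</syntaxhighlight>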

{{Math theorem|name=Wielandt minimax formula|note={{Cite book |last=Tao |first=Terence |title=Topics in random matrix theory |date=2012 |publisher=American Mathematical Society |isbn=978-0-8218-7430-1 |series=Graduate studies in mathematics |location=Providence, R.I}}{{Pg|page=44}}|math_statement=

Let 1 \leq i_1<\cdots<i_k \leq n be integers. Define a partial flag to be a nested collection V_1 \subset \cdots \subset V_k of subspaces of \mathbb{C}^n such that \operatorname{dim}\left(V_j\right)=i_j for all 1 \leq j \leq k.

Define the associated Schubert variety X\left(V_1, \ldots, V_k\right) to be the collection of all k-dimensional subspaces W such that \operatorname{dim}\left(W \cap V_j\right) \geq j for all 1 \leq j \leq k. Then

\lambda_{i_1}(A)+\cdots+\lambda_{i_k}(A)=\sup _{V_1, \ldots, V_k} \inf_{W \in X\left(V_1, \ldots, V_k\right)} tr_W(A)

}}

{{hidden begin|style=width:100%|ta1=center|border=1px #aaa solid|title=Proof}}

{{Math proof|title=Proof|proof=

The \leq case.

Let e_1, \ldots, e_n be an orthonormal eigenbasis with Ae_i = \lambda_i(A)e_i. Let V_{j} = span(e_1, \dots, e_{i_j}), and take any W \in X\left(V_1, \ldots, V_k\right); it remains to show that

\lambda_{i_1}(A)+\cdots+\lambda_{i_k}(A) \leq tr_W(A)

To show this, we construct an orthonormal set of vectors v_1, \dots, v_k such that v_j \in V_j \cap W. These form an orthonormal basis of W, so tr_W(A) = \sum_j \langle v_j, Av_j\rangle \geq \sum_j \lambda_{i_j}(A), since each v_j lies in the span of the first i_j eigenvectors.

Since dim(V_1 \cap W) \geq 1, we pick any unit v_1 \in V_1 \cap W. Next, since dim(V_2 \cap W) \geq 2, we pick any unit v_2 \in V_2 \cap W that is perpendicular to v_1, and so on.

The \geq case.

For any such sequence of subspaces V_i, we must find some W \in X\left(V_1, \ldots, V_k\right) such that \lambda_{i_1}(A)+\cdots+\lambda_{i_k}(A) \geq tr_W(A)

Now we prove this by induction on the dimension n.

The n=1 case is immediate (it is the Courant–Fischer theorem in dimension one). Assume now n \geq 2.

If i_1 \geq 2, then we can apply induction. Let E = span(e_{i_1}, \dots, e_n), of dimension n - i_1 + 1. We construct a partial flag within E from the intersections of E with V_1, \dots, V_k.

We begin by picking an (i_k-(i_1-1))-dimensional subspace W_k' \subset E \cap V_{k}, which exists by counting dimensions; it has codimension (i_1-1) within V_{k}.

Then we go down by one step and pick an (i_{k-1} - (i_1 - 1))-dimensional subspace W_{k-1}' \subset W_k' \cap V_{k-1}, which still exists by counting dimensions, and so on. Since dim(E) \leq n-1, the induction hypothesis applies: there exists some W \in X(W_1', \dots, W_k') such that \lambda_{i_1 - (i_1-1)}(A|_E)+\cdots+\lambda_{i_k- (i_1-1)}(A|_E) \geq tr_W(A)

Now \lambda_{i_j - (i_1-1)}(A|_E) is the (i_j-(i_1-1))-th eigenvalue of the compression of A to E. By the Cauchy interlacing theorem, \lambda_{i_j - (i_1-1)}(A|_E) \leq \lambda_{i_j}(A). Since X(W_1', \dots, W_k')\subset X(V_1, \dots, V_k), we’re done.

If i_1 = 1, then we perform a similar construction. Let E = span(e_{2}, \dots, e_n). If V_k \subset E, then we can induct as before. Otherwise, we construct a partial flag W_2' \subset \cdots \subset W_k' within E, with W_j' \subset E \cap V_j, as above. By induction, there exists some W' \in X(W_2', \dots, W_k')\subset X(V_2, \dots, V_k) such that \lambda_{i_2-1}(A|_E)+\cdots+\lambda_{i_k-1}(A|_E) \geq tr_{W'}(A), and thus, by the Cauchy interlacing theorem,

\lambda_{i_2}(A)+\cdots+\lambda_{i_k}(A) \geq tr_{W'}(A) It remains to find some unit vector v \notin W' such that W = W' \oplus span(v) \in X(V_1, \dots, V_k); then any unit u \in W orthogonal to W' satisfies \langle u, Au\rangle \leq \lambda_1(A) = \lambda_{i_1}(A), so tr_W(A) \leq tr_{W'}(A) + \lambda_{i_1}(A), which gives the desired bound.

If V_1 \not\subset W', then any v \in V_1 \setminus W' would work. Otherwise, if V_2 \not\subset W', then any v \in V_2 \setminus W' would work, and so on. If none of these work, then V_k \subset W' \subset E, a contradiction.

}}{{hidden end}}

This has some corollaries:{{Pg|page=44}}

{{Math theorem|name=Extremal partial trace|note=|math_statement=

\lambda_1(A)+\dots+\lambda_k(A)=\sup_{\operatorname{dim}(V)=k }tr_V(A)

\xi_1(A)+\dots+\xi_k(A)=\inf_{\operatorname{dim}(V)=k }tr_V(A)

}}

{{Math theorem|name=Corollary|note=|math_statement=

The sum \lambda_1(A)+\dots+\lambda_k(A) is a convex function of the Hermitian matrix A, and \xi_1(A)+\dots+\xi_k(A) is concave.

(Schur–Horn inequality)

\xi_1(A)+\dots+\xi_k(A) \leq a_{i_1,i_1} + \dots + a_{i_k,i_k} \leq \lambda_1(A)+\dots+\lambda_k(A)

for any subset of indices i_1 < \dots < i_k, where a_{i,i} denotes a diagonal entry of A.

Equivalently, this states that the diagonal vector of A is majorized by its eigenspectrum.

}}
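The majorization form of the Schur–Horn inequality can be verified numerically: sorting both the diagonal and the spectrum in descending order, every top-k partial sum of the diagonal is bounded by the corresponding partial sum of eigenvalues, and the full sums agree. An illustrative sketch on a random symmetric matrix:

<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng(3)
n = 6
B = rng.standard_normal((n, n))
A = (B + B.T) / 2

lam = np.sort(np.linalg.eigvalsh(A))[::-1]   # descending eigenvalues
diag = np.sort(np.diag(A))[::-1]             # descending diagonal entries

# Majorization: each top-k partial sum of the diagonal is dominated by
# the corresponding partial sum of the eigenvalues...
for k in range(1, n + 1):
    assert diag[:k].sum() <= lam[:k].sum() + 1e-12
# ...and the total sums coincide, both being tr(A).
assert np.isclose(diag.sum(), lam.sum())
</syntaxhighlight>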

{{Math theorem|name=Schatten-norm Hölder inequality|note=|math_statement=

Given Hermitian matrices A, B and a Hölder conjugate pair 1/p + 1/q = 1, |\operatorname{tr}(A B)| \leq\|A\|_{S^p}\|B\|_{S^q}

}}

{{hidden begin|style=width:100%|ta1=center|border=1px #aaa solid|title=Proof}}

{{Math proof|title=Proof|proof=

WLOG, B is diagonal: conjugating both A and B by the unitary that diagonalizes B changes neither side. Then we need to show

|\sum_i B_{ii} A_{ii} | \leq \|A \|_{S^p} \|(B_{ii})\|_{l^q}

Since B is diagonal, \|(B_{ii})\|_{l^q} = \|B\|_{S^q}, so by the standard Hölder inequality it suffices to show \|(A_{ii})\|_{l^p}\leq \|A \|_{S^p}.

By the Schur–Horn inequality, the diagonal of A is majorized by the eigenspectrum of A, and since the map f(x_1, \dots, x_n) = \|x\|_p is symmetric and convex, it is Schur-convex; hence \|(A_{ii})\|_{l^p} \leq \|\lambda(A)\|_{l^p} = \|A\|_{S^p}.

}}{{hidden end}}
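A numerical spot check of the Schatten-norm Hölder inequality (illustrative only; the helper schatten is ad hoc and uses the singular values returned by numpy.linalg.svd):

<syntaxhighlight lang="python">
import numpy as np

def schatten(A, p):
    """Schatten p-norm: the l^p norm of the singular values of A."""
    return np.linalg.norm(np.linalg.svd(A, compute_uv=False), p)

rng = np.random.default_rng(4)
n, p = 5, 3.0
q = p / (p - 1)                              # Hölder conjugate: 1/p + 1/q = 1
X = rng.standard_normal((n, n)); A = (X + X.T) / 2
Y = rng.standard_normal((n, n)); B = (Y + Y.T) / 2
assert abs(np.trace(A @ B)) <= schatten(A, p) * schatten(B, q) + 1e-12
</syntaxhighlight>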

= Counterexample in the non-Hermitian case =

Let N be the nilpotent matrix

:\begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix}.

Define the Rayleigh quotient R_N(x) exactly as above in the Hermitian case. Then it is easy to see that the only eigenvalue of N is zero, while the maximum value of the Rayleigh quotient is {{math|{{sfrac|1|2}}}}. That is, the maximum value of the Rayleigh quotient is larger than the maximum eigenvalue.
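Numerically, this can be checked by noting that for real x, (Nx, x) = xᵀHx for the Hermitian part H = (N + Nᵀ)/2, whose largest eigenvalue is 1/2 (this reformulation is an added observation, not part of the text above):

<syntaxhighlight lang="python">
import numpy as np

N = np.array([[0.0, 1.0],
              [0.0, 0.0]])
# Both eigenvalues of the nilpotent matrix N are zero...
assert np.allclose(np.linalg.eigvals(N), 0)
# ...but for real x, (Nx, x) = x^T H x with H = (N + N^T)/2, and the
# largest eigenvalue of H (the maximum of the Rayleigh quotient) is 1/2.
H = (N + N.T) / 2
assert np.isclose(np.linalg.eigvalsh(H)[-1], 0.5)
</syntaxhighlight>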

Applications

= Min-max principle for singular values =

The singular values {σk} of a square matrix M are the square roots of the eigenvalues of M*M (equivalently MM*). An immediate consequence{{Citation needed|reason=claim is unreferenced and maybe suspicious|date=April 2014}} of the first equality in the min-max theorem is:

:\sigma_k^{\downarrow} = \max_{S:\dim(S)=k} \min_{x \in S, \|x\| = 1} (M^* Mx, x)^{\frac{1}{2}}=\max_{S:\dim(S)=k} \min_{x \in S, \|x\| = 1} \| Mx \|.

Similarly,

:\sigma_k^{\downarrow} = \min_{S:\dim(S)=n-k+1} \max_{x \in S, \|x\| = 1} \| Mx \|.

Here \sigma_k^{\downarrow} denotes the kth entry in the decreasing sequence of the singular values, so that \sigma_1^{\downarrow} \geq \sigma_2^{\downarrow} \geq \cdots .
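Since \sigma_k is the square root of the k-th largest eigenvalue of M*M, this characterization can be cross-checked against a direct SVD (a minimal sketch; the random matrix and index are arbitrary):

<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng(5)
n, k = 6, 2
M = rng.standard_normal((n, n))
sigma = np.linalg.svd(M, compute_uv=False)   # descending singular values

# sigma_k^2 is the k-th largest eigenvalue of M^T M, to which the
# min-max theorem (for the Hermitian matrix M^T M) applies.
w = np.linalg.eigvalsh(M.T @ M)[::-1]        # descending eigenvalues
assert np.isclose(sigma[k - 1], np.sqrt(w[k - 1]))
</syntaxhighlight>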

= Cauchy interlacing theorem =

{{Main|Poincaré separation theorem}}

Let {{mvar|A}} be a symmetric n × n matrix. The m × m matrix B, where {{math|m ≤ n}}, is called a compression of {{mvar|A}} if there exists an orthogonal projection P onto a subspace of dimension m such that PAP* = B. The Cauchy interlacing theorem states:

:Theorem. If the eigenvalues of {{mvar|A}} are {{math|α1 ≤ ... ≤ αn}}, and those of B are {{math|β1 ≤ ... ≤ βj ≤ ... ≤ βm}}, then for all {{math|j ≤ m}},

::\alpha_j \leq \beta_j \leq \alpha_{n-m+j}.

This can be proven using the min-max principle. Let βi have corresponding eigenvector bi, and let Sj be the j-dimensional subspace {{math|Sj {{=}} span{b1, ..., bj}.}} Then

:\beta_j = \max_{x \in S_j, \|x\| = 1} (Bx, x) = \max_{x \in S_j, \|x\| = 1} (PAP^*x, x) = \max_{x \in S_j, \|x\| = 1} (A(P^*x), P^*x) \geq \min_{\dim S = j}\, \max_{x \in S, \|x\| = 1} (Ax, x) = \alpha_j,

since P^* maps S_j isometrically onto a j-dimensional subspace.

According to the first part of the min-max principle, {{math|αj ≤ βj}}. On the other hand, if we define {{math|Sm−j+1 {{=}} span{bj, ..., bm},}} then

:\beta_j = \min_{x \in S_{m-j+1}, \|x\| = 1} (Bx, x) = \min_{x \in S_{m-j+1}, \|x\| = 1} (PAP^*x, x)= \min_{x \in S_{m-j+1}, \|x\| = 1} (A(P^*x), P^*x) \leq \alpha_{n-m+j},

where the last inequality is given by the second part of min-max.

When {{math|n − m {{=}} 1}}, we have {{math|αj ≤ βj ≤ αj+1}}, hence the name interlacing theorem.
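A leading principal submatrix is a compression (take P to be the projection onto the first m coordinates), so the interlacing inequalities can be spot-checked directly (illustrative sketch on a random symmetric matrix):

<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng(6)
n, m = 7, 4
X = rng.standard_normal((n, n))
A = (X + X.T) / 2
B = A[:m, :m]   # compression of A onto the first m coordinates

alpha = np.linalg.eigvalsh(A)   # ascending: alpha_1 <= ... <= alpha_n
beta = np.linalg.eigvalsh(B)    # ascending: beta_1 <= ... <= beta_m
# Interlacing: alpha_j <= beta_j <= alpha_{n-m+j} (0-indexed below).
for j in range(m):
    assert alpha[j] - 1e-12 <= beta[j] <= alpha[n - m + j] + 1e-12
</syntaxhighlight>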

= Lidskii's inequality =

{{Main|Trace class#Lidskii's theorem}}

{{Math theorem

| name = Lidskii inequality

| note =

| math_statement = If 1 \leq i_1<\cdots<i_k \leq n, then \begin{aligned}

& \lambda_{i_1}(A+B)+\cdots+\lambda_{i_k}(A+B) \\

& \quad \leq \lambda_{i_1}(A)+\cdots+\lambda_{i_k}(A)+\lambda_1(B)+\cdots+\lambda_k(B)

\end{aligned}

\begin{aligned}

& \lambda_{i_1}(A+B)+\cdots+\lambda_{i_k}(A+B) \\

& \quad \geq \lambda_{i_1}(A)+\cdots+\lambda_{i_k}(A)+\xi_1(B)+\cdots+\xi_k(B)

\end{aligned}

}}

{{hidden begin|style=width:100%|ta1=center|border=1px #aaa solid|title=Proof}}

{{Math proof|title=Proof|proof=

The second inequality follows by applying the first to A+B and -B, since \lambda_j(-B) = -\xi_j(B). The first follows from the Wielandt minimax formula together with the extremal partial trace bound tr_W(B) \leq \lambda_1(B)+\cdots+\lambda_k(B), valid for every k-dimensional W:

\begin{aligned}

& \lambda_{i_1}(A+B)+\cdots+\lambda_{i_k}(A+B) \\

=& \sup_{V_1, \dots, V_k} \inf_{W\in X(V_1, \dots, V_k)}(tr_W(A) + tr_W(B)) \\

\leq& \sup_{V_1, \dots, V_k} \inf_{W\in X(V_1, \dots, V_k)}( tr_W(A) + \lambda_1(B)+\cdots+\lambda_k(B)) \\

=& \lambda_{i_1}(A)+\cdots+\lambda_{i_k}(A)+\lambda_1(B)+\cdots+\lambda_k(B)

\end{aligned}

}}{{hidden end}}

Note that \sum_i \lambda_i(A+B) = tr(A+B) = \sum_i (\lambda_i(A) + \lambda_i(B)). Combined with Lidskii's inequality, this shows that \lambda(A+B) - \lambda(A) \preceq \lambda(B), where \preceq means majorization. By the Schur convexity theorem (the \ell^p norm is symmetric and convex, hence Schur-convex), we then have

{{Math theorem

| name = p-Wielandt-Hoffman inequality

| note =

| math_statement = \|\lambda(A+B) - \lambda(A)\|_{\ell^p} \leq \|B\|_{S^p} where \|\cdot\|_{S^p} stands for the p-Schatten norm.

}}
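The p-Wielandt-Hoffman inequality is likewise easy to test numerically, since for Hermitian B the Schatten norm \|B\|_{S^p} is the \ell^p norm of its eigenvalues (an illustrative sketch with an arbitrary p):

<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng(7)
n, p = 6, 2.5
X = rng.standard_normal((n, n)); A = (X + X.T) / 2
Y = rng.standard_normal((n, n)); B = (Y + Y.T) / 2

eig = np.linalg.eigvalsh                       # ascending eigenvalues
lhs = np.linalg.norm(eig(A + B) - eig(A), p)   # l^p distance of the spectra
rhs = np.linalg.norm(eig(B), p)                # equals ||B||_{S^p} for Hermitian B
assert lhs <= rhs + 1e-12
</syntaxhighlight>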

Compact operators

Let {{mvar|A}} be a compact, Hermitian operator on a Hilbert space H. Recall that the spectrum of such an operator (the set of eigenvalues) is a set of real numbers whose only possible cluster point is zero.

It is thus convenient to list the positive eigenvalues of {{mvar|A}} as

:\cdots \le \lambda_k \le \cdots \le \lambda_1,

where entries are repeated with multiplicity, as in the matrix case. (To emphasize that the sequence is decreasing, we may write \lambda_k = \lambda_k^\downarrow.)

When H is infinite-dimensional, this sequence of eigenvalues may be finite or infinite; the statements below are understood for those k for which \lambda_k is defined.

We now apply the same reasoning as in the matrix case. Letting S_k \subset H be a k-dimensional subspace, we can obtain the following theorem.

:Theorem (Min-Max). Let {{mvar|A}} be a compact, self-adjoint operator on a Hilbert space {{mvar|H}}, whose positive eigenvalues are listed in decreasing order {{math|... ≤ λk ≤ ... ≤ λ1}}. Then:

::\begin{align}

\max_{S_k} \min_{x \in S_k, \|x\| = 1} (Ax,x) &= \lambda_k ^{\downarrow}, \\

\min_{S_{k-1}} \max_{x \in S_{k-1}^{\perp}, \|x\|=1} (Ax, x) &= \lambda_k^{\downarrow}.

\end{align}

A similar pair of equalities holds for negative eigenvalues.

{{Math proof|drop=hidden|proof=

Let S' be the closure of the linear span S' =\operatorname{span}\{u_k,u_{k+1},\ldots\}, where u_i denotes an eigenvector corresponding to \lambda_i.

The subspace S' has codimension k − 1. By the same dimension-counting argument as in the matrix case, S' ∩ Sk has positive dimension. So there exists x ∈ S' ∩ Sk with \|x\|=1. Since it is an element of S' , such an x necessarily satisfies

:(Ax, x) \le \lambda_k.

Therefore, for all Sk

:\inf_{x \in S_k, \|x\| = 1}(Ax,x) \le \lambda_k

But {{mvar|A}} is compact, therefore the function f(x) = (Ax, x) is weakly continuous. Furthermore, any bounded set in H is weakly compact. This lets us replace the infimum by a minimum:

:\min_{x \in S_k, \|x\| = 1}(Ax,x) \le \lambda_k.

So

:\sup_{S_k} \min_{x \in S_k, \|x\| = 1}(Ax,x) \le \lambda_k.

Because equality is achieved when S_k=\operatorname{span}\{u_1,\ldots,u_k\},

:\max_{S_k} \min_{x \in S_k, \|x\| = 1}(Ax,x) = \lambda_k.

This is the first part of min-max theorem for compact self-adjoint operators.

Analogously, consider now a {{math|(k − 1)}}-dimensional subspace Sk−1, whose orthogonal complement is denoted by Sk−1⊥. If S' = span{u1, ..., uk}, then

:S' \cap S_{k-1}^{\perp} \ne \{0\}.

So

:\exists x \in S_{k-1}^{\perp} \, \|x\| = 1, (Ax, x) \ge \lambda_k.

This implies

:\max_{x \in S_{k-1}^{\perp}, \|x\| = 1} (Ax, x) \ge \lambda_k

where the compactness of A was applied to replace the supremum by a maximum. Taking the infimum over all {{math|(k − 1)}}-dimensional subspaces gives

:\inf_{S_{k-1}} \max_{x \in S_{k-1}^{\perp}, \|x\|=1} (Ax, x) \ge \lambda_k.

Pick Sk−1 = span{u1, ..., uk−1} and we deduce

:\min_{S_{k-1}} \max_{x \in S_{k-1}^{\perp}, \|x\|=1} (Ax, x) = \lambda_k.

}}

Self-adjoint operators

The min-max theorem also applies to (possibly unbounded) self-adjoint operators.{{cite book |first=Gerald |last=Teschl |title=Mathematical Methods in Quantum Mechanics |series=GSM |volume=99 |publisher=American Mathematical Society |url=https://www.mat.univie.ac.at/~gerald/ftp/book-schroe/schroe.pdf}}{{cite book |last1=Lieb |last2=Loss |title=Analysis |edition=2nd |series=GSM |volume=14 |location=Providence |publisher=American Mathematical Society |year=2001 |isbn=0-8218-2783-9 }} Recall that the essential spectrum is the spectrum without isolated eigenvalues of finite multiplicity.

Sometimes we have some eigenvalues below the essential spectrum, and we would like to approximate the eigenvalues and eigenfunctions.

:Theorem (Min-Max). Let A be self-adjoint, and let E_1\le E_2\le E_3\le\cdots be the eigenvalues of A below the essential spectrum. Then

E_n=\min_{\psi_1,\ldots,\psi_{n}}\max\{\langle\psi,A\psi\rangle:\psi\in\operatorname{span}(\psi_1,\ldots,\psi_{n}), \, \| \psi \| = 1\}.

If there are only N eigenvalues below the essential spectrum, then we let E_n:=\inf\sigma_{ess}(A) (the bottom of the essential spectrum) for n>N, and the above statement holds after replacing min-max with inf-sup.

:Theorem (Max-Min). Let A be self-adjoint, and let E_1\le E_2\le E_3\le\cdots be the eigenvalues of A below the essential spectrum. Then

E_n=\max_{\psi_1,\ldots,\psi_{n-1}}\min\{\langle\psi,A\psi\rangle:\psi\perp\psi_1,\ldots,\psi_{n-1}, \, \| \psi \| = 1\}.

If there are only N eigenvalues below the essential spectrum, then we let E_n:=\inf\sigma_{ess}(A) (the bottom of the essential spectrum) for n > N, and the above statement holds after replacing max-min with sup-inf.

The proofs use the following results about self-adjoint operators:

:Theorem. Let A be self-adjoint. Then (A-E)\ge0 for E\in\mathbb{R} if and only if \sigma(A)\subseteq[E,\infty).{{rp|77}}

:Theorem. If A is self-adjoint, then

\inf\sigma(A)=\inf_{\psi\in\mathfrak{D}(A),\|\psi\|=1}\langle\psi,A\psi\rangle

and

\sup\sigma(A)=\sup_{\psi\in\mathfrak{D}(A),\|\psi\|=1}\langle\psi,A\psi\rangle.{{rp|77}}


References

{{Reflist}}