Sinkhorn's theorem
{{short description|Every square matrix with positive entries can be written in a certain standard form}}
Sinkhorn's theorem states that every square matrix with positive entries can be written in a certain standard form.
== Theorem ==
If ''A'' is an ''n'' × ''n'' matrix with strictly positive elements, then there exist diagonal matrices ''D''<sub>1</sub> and ''D''<sub>2</sub> with strictly positive diagonal elements such that ''D''<sub>1</sub>''AD''<sub>2</sub> is doubly stochastic. The matrices ''D''<sub>1</sub> and ''D''<sub>2</sub> are unique modulo multiplying the first matrix by a positive number and dividing the second one by the same number.<ref>Sinkhorn, Richard. (1964). "A relationship between arbitrary positive matrices and doubly stochastic matrices." ''Ann. Math. Statist.'' '''35''', 876–879. {{doi|10.1214/aoms/1177703591}}</ref>
== Sinkhorn–Knopp algorithm ==
A simple iterative method to approach the doubly stochastic matrix is to alternately rescale all rows and all columns of ''A'' to sum to 1. Sinkhorn and Knopp presented this algorithm and analyzed its convergence.<ref>Sinkhorn, Richard, & Knopp, Paul. (1967). "Concerning nonnegative matrices and doubly stochastic matrices". ''Pacific J. Math.'' '''21''', 343–348.</ref>
This is essentially the same as the Iterative proportional fitting algorithm, well known in survey statistics.
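The alternating row/column rescaling can be sketched as follows; this is a minimal NumPy illustration of the iteration described above, with the function name, tolerance, and iteration cap chosen for the example rather than taken from the cited papers:

```python
import numpy as np

def sinkhorn_knopp(A, tol=1e-9, max_iter=10_000):
    """Alternately rescale rows and columns of a positive matrix A.

    Returns scaling vectors r, c such that diag(r) @ A @ diag(c)
    is (numerically) doubly stochastic.
    """
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    r = np.ones(n)
    c = np.ones(n)
    for _ in range(max_iter):
        r = 1.0 / (A @ c)       # make every row of diag(r) A diag(c) sum to 1
        c = 1.0 / (A.T @ r)     # then make every column sum to 1
        S = r[:, None] * A * c[None, :]
        if (np.abs(S.sum(axis=0) - 1).max() < tol
                and np.abs(S.sum(axis=1) - 1).max() < tol):
            break
    return r, c

# Toy example: any matrix with strictly positive entries works.
A = np.array([[1.0, 2.0],
              [3.0, 4.0]])
r, c = sinkhorn_knopp(A)
S = r[:, None] * A * c[None, :]   # doubly stochastic up to tol
```

Here `diag(r)` and `diag(c)` play the roles of ''D''<sub>1</sub> and ''D''<sub>2</sub> in the theorem; for strictly positive ''A'' the iteration converges, matching the Sinkhorn–Knopp analysis.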
== Analogues and extensions ==
The following analogue for unitary matrices is also true: for every unitary matrix ''U'' there exist two diagonal unitary matrices ''L'' and ''R'' such that ''LUR'' has each of its columns and rows summing to 1.<ref>{{cite journal|last1=Idel|first1=Martin|last2=Wolf|first2=Michael M.|title=Sinkhorn normal form for unitary matrices|journal=Linear Algebra and Its Applications|date=2015|volume=471|pages=76–84|doi=10.1016/j.laa.2014.12.031|arxiv=1408.5728|s2cid=119175915 }}</ref>
The following extension to maps between matrices is also true (see Theorem 5<ref>{{cite journal|last1=Georgiou|first1=Tryphon|last2=Pavon|first2=Michele|title=Positive contraction mappings for classical and quantum Schrödinger systems|journal=Journal of Mathematical Physics|date=2015|volume=56|issue=3 |pages=033301–1–24|doi=10.1063/1.4915289|arxiv=1405.6650|bibcode=2015JMP....56c3301G|s2cid=119707158 }}</ref> and also Theorem 4.7<ref>{{cite journal|last1=Gurvits|first1=Leonid|title=Classical complexity and quantum entanglement|journal=Journal of Computational Science|date=2004|volume=69|issue=3 |pages=448–484|doi=10.1016/j.jcss.2004.06.003|doi-access=free}}</ref>): given a Kraus operator that represents the quantum operation Φ mapping a density matrix into another,
:<math>S \mapsto \Phi(S) = \sum_i B_i S B_i^*,</math>
that is trace preserving,
:<math>\sum_i B_i^* B_i = I,</math>
and, in addition, whose range is in the interior of the positive definite cone (strict positivity), there exist scalings ''x''<sub>''j''</sub>, for ''j'' in {0,1}, that are positive definite so that the rescaled Kraus operator
:<math>S \mapsto x_1 \Phi(x_0 S x_0) x_1</math>
is doubly stochastic. In other words, it is such that both,
:<math>x_1 \Phi(x_0 I x_0) x_1 = I,</math>
as well as for the adjoint,
:<math>x_0 \Phi^*(x_1 I x_1) x_0 = I,</math>
where ''I'' denotes the identity operator.
== Applications ==
In the 2010s Sinkhorn's theorem came to be used to find solutions of entropy-regularised optimal transport problems.<ref>{{cite conference |url= |title=Sinkhorn distances: Lightspeed computation of optimal transport |last1=Cuturi |first1=Marco |date=2013 |book-title=Advances in neural information processing systems |pages=2292–2300}}</ref> This has been of interest in machine learning because such "Sinkhorn distances" can be used to evaluate the difference between data distributions and permutations.<ref>{{cite conference |title=Geometric losses for distributional learning |author1=Mensch, Arthur |author2=Blondel, Mathieu |author3=Peyré, Gabriel |date=2019 |arxiv=1905.06005 |book-title=Proc ICML 2019}}</ref><ref>{{cite conference |title=Sinkhorn networks: Using optimal transport techniques to learn permutations |author1=Mena, Gonzalo |author2=Belanger, David |author3=Munoz, Gonzalo |author4=Snoek, Jasper |date=2017 |book-title=NIPS Workshop in Optimal Transport and Machine Learning}}</ref><ref>{{cite conference |url=https://aclanthology.org/2020.conll-1.3 |title=Neural Proof Nets |author1=Kogkalidis, Konstantinos |author2=Moortgat, Michael |author3=Moot, Richard |date=2020 |book-title=Proceedings of the 24th Conference on Computational Natural Language Learning}}</ref> This can improve the training of machine learning models in situations where maximum likelihood training is not well suited.
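The entropy-regularised optimal transport computation reduces to Sinkhorn iterations on a Gibbs kernel of the cost matrix. The sketch below follows this general scheme; the function name, toy histograms, and regularisation value are illustrative assumptions, not taken from the cited papers:

```python
import numpy as np

def sinkhorn_transport(a, b, M, reg=0.1, n_iter=1000):
    """Entropy-regularised optimal transport between histograms a and b.

    M is the ground cost matrix; the plan is found by Sinkhorn
    iterations on the Gibbs kernel K = exp(-M / reg).
    """
    K = np.exp(-M / reg)
    u = np.ones_like(a)
    for _ in range(n_iter):
        v = b / (K.T @ u)          # match column marginals to b
        u = a / (K @ v)            # match row marginals to a
    P = u[:, None] * K * v[None, :]   # transport plan with marginals a, b
    return float((P * M).sum()), P    # regularised transport cost, plan

# Toy example: two 3-bin histograms on a line with |x_i - x_j| cost.
a = np.array([0.5, 0.5, 0.0])
b = np.array([0.0, 0.5, 0.5])
x = np.arange(3.0)
M = np.abs(x[:, None] - x[None, :])
cost, P = sinkhorn_transport(a, b, M)
```

The diagonal rescalings `u` and `v` are exactly the scalings guaranteed by Sinkhorn's theorem, applied to the kernel `K` with prescribed marginals `a` and `b` instead of uniform ones.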
== References ==
{{reflist}}
{{DEFAULTSORT:Sinkhorn's Theorem}}