Cobham's theorem

Cobham's theorem is a theorem in combinatorics on words that has important connections with number theory, notably transcendental numbers, and automata theory. Informally, the theorem gives the condition for the members of a set S of natural numbers written in bases b₁ and base b₂ to be recognised by finite automata. Specifically, consider bases b₁ and b₂ such that they are not powers of the same integer. Cobham's theorem states that S written in bases b₁ and b₂ is recognised by finite automata if and only if S differs by a finite set from a finite union of arithmetic progressions. The theorem was proved by Alan Cobham in 1969{{Cite journal|last=Cobham|first=Alan|year=1969|title=On the base-dependence of sets of numbers recognizable by finite automata|journal=Mathematical Systems Theory|volume=3|issue=2|pages=186–192 |doi=10.1007/BF01746527|doi-access=free|mr=0250789}} and has since given rise to many extensions and generalisations.{{Cite book|first1=Fabien|first2=Michel|last1=Durand|last2=Rigo|title=Automata: from Mathematics to Applications|publisher=European Mathematical Society|year=2010|orig-date=Chapter originally written 2010|chapter=On Cobham’s Theorem|editor-last=Pin|editor-first=J.-É.|chapter-url=https://www-fourier.ujf-grenoble.fr/sites/default/files/files/fichiers/cobham010711.pdf}}{{Cite book|first1=Boris|first2=Jason|last1=Adamczewski|last2=Bell|title=Automata: from Mathematics to Applications|publisher=European Mathematical Society|year=2010|orig-date=Chapter originally written 2010|chapter=Automata in number theory|editor-last=Pin|editor-first=J.-É.|chapter-url=https://adamczewski.perso.math.cnrs.fr/Chapter25.pdf}}

Definitions

Let $n>0$ be an integer. The representation of a natural number $n$ in base $b$ is the sequence of digits $n_0n_1\cdots n_h$ such that

: $n=n_0+n_1b+\cdots+n_hb^h$

where $0\le n_0,n_1,\ldots,n_h < b$ and $n_h>0$ . The word $n_0n_1\cdots n_h$ is often denoted $\langle n\rangle_b$ , or more simply, $n_b$ .

A set of natural numbers S is recognisable in base $b$ or more simply $b$ -recognisable or $b$ -automatic if the set $\{n_b\mid n\in S\}$ of the representations of its elements in base $b$ is a language recognisable by a finite automaton on the alphabet $\{0,1,\ldots,b-1\}$ .

Two positive integers $k$ and $\ell$ are multiplicatively independent if there are no non-negative integers $p$ and $q$ such that $k^p=\ell^q$ . For example, 2 and 3 are multiplicatively independent, but 8 and 16 are not since $8^4=16^3$ . Two integers are multiplicatively dependent if and only if they are powers of a same third integer.

Problem statements

= Original problem statement =

More equivalent statements of the theorem have been given. The original version by Cobham is the following: {{Math theorem|Let $S$ be a set of non-negative integers and let $m$ and $n$ be multiplicatively independent positive integers. Then $S$ is recognizable by finite automata in both $m$ -ary and $n$ -ary notation if and only if it is ultimately periodic.

| name = Theorem (Cobham 1969)

}}Another way to state the theorem is by using automatic sequences. Cobham himself calls them "uniform tag sequences.".{{Cite journal|last=Cobham|first=Alan|year=1972|title=Uniform tag sequences|journal=Mathematical Systems Theory|volume=6|issue=1–2|pages=164–192 |doi=10.1007/BF01706087|doi-access=free|mr=0457011}} The following form is found in Allouche and Shallit's book:{{Cite book|last1=Allouche|first1=Jean-Paul|author-link1=:fr:Jean-Paul Allouche|title=Automatic Sequences: theory, applications, generalizations|last2=Shallit|first2=Jeffrey|author-link2=Jeffrey Shallit|publisher=Cambridge University Press|year=2003|isbn=0-521-82332-3|location=Cambridge|page=350}}{{Math theorem|Theorem|Let $k$ and $\ell$ be two multiplicatively independent integers. A sequence is both $k$ -automatic and $\ell$ -automatic only if it is $1$ -automaticA "1-automatic" sequence is a sequence that is ultimately periodic

}}We can show that the characteristic sequence of a set of natural numbers S recognisable by finite automata in base k is a k-automatic sequence and that conversely, for all k-automatic sequences $u$ and all integers $0\le i, the set S_i of natural numbers s such that u_s=i is recognisable in base k .$

= Formulation in logic =

Cobham's theorem can be formulated in first-order logic using a theorem proven by Büchi in 1960.{{cite book | first=J. R. | last=Büchi | chapter=Weak Second-Order Arithmetic and Finite Automata | date=1990 | title=The Collected Works of J. Richard Büchi | series=Z. Math. Logik Grundlagen Math. | volume=6 | page=87 | doi=10.1007/978-1-4613-8928-6_22 | isbn=978-1-4613-8930-9 }} This formulation in logic allows for extensions and generalisations. The logical expression uses the theory{{Cite web|last1=Bruyère|first1=Véronique|author1-link=Véronique Bruyère|year=2010|title=Around Cobham's theorem and some of its extensions|url=http://www.mat.uc.pt/~amgc/daast/slides/daast-bruyere|access-date=19 January 2017|website=Dynamical Aspects of Automata and Semigroup Theories|publisher=Satellite Workshop of Highlights of AutoMathA}}

: $\langle N, +, V_r\rangle$

of natural integers equipped with addition and the function $V_r$ defined by $V_r(0)=1$ and for any positive integer $n$ , $V_r(n)=r^m$ if $r^m$ is the largest power of $r$ that divides $n$ . For example, $V_2(20)=4$ , and $V_3(20)=1$ .

A set of integers $S$ is definable in first-order logic in $\langle N, +, V_r\rangle$ if it can be described by a first-order formula with equality, addition, and $V_r$ .

Examples:

The set of odd numbers is definable (without $V_r$ ) by the formula $(\exists y)(x=y+y+1)$
The set $\{2^n\mid n\ge0\}$ of the powers of 2 is definable by the simple formula $V_2(x)=x$ .

{{Math theorem|Let S be a set of natural numbers, and let $k$ and $\ell$ be two multiplicatively independent positive integers. Then S is first-order definable in

$\langle N, +, V_k\rangle$ and in $\langle N, +, V_\ell\rangle$ if and only if S is ultimately periodic.

| name = Cobham's theorem reformulated

}}We can push the analogy with logic further by noting that S is first-order definable in Presburger arithmetic if and only if it is ultimately periodic. So, a set S is definable in the logics $\langle N, +, V_k\rangle$ and $\langle N, +, V_\ell\rangle$ if and only if it is definable in Presburger arithmetic.

Generalisations

= Approach by morphisms =

An automatic sequence is a particular morphic word, whose morphism is uniform, meaning that the length of the images generated by the morphism for each letter of its input alphabet is the same. A set of integers is hence k-recognisable if and only if its characteristic sequence is generated by a uniform morphism followed by a coding, where a coding is a morphism that maps each letter of the input alphabet to a letter of the output alphabet. For example, the characteristic sequence of the powers of 2 is produced by the 2-uniform morphism (meaning each letter is mapped to a word of length 2) over the alphabet $B=\{a,0,1\}$ defined by

: $a \mapsto a1\ ,\quad 1\mapsto 10\ ,\quad 0\mapsto 00$

which generates the infinite word

: $a11010001\cdots$ ,

followed by the coding (that is, letter to letter) that maps $a$ to $0$ and leaves $0$ and $1$ unchanged, giving

: $011010001\cdots$ .

The notion has been extended as follows:{{Cite journal|last=Durand|first=Fabien|year=2011|title=Cobham's theorem for substitutions|url=http://www.ems-ph.org/journals/show_pdf.php?issn=1435-9855&vol=13&iss=6&rank=8|journal=Journal of the European Mathematical Society|volume=13|issue=6|pages=1797–1812|arxiv=1010.4009|doi=10.4171/JEMS/294|doi-access=free}} a morphic word $s$ is $\alpha$ -substitutive for a certain number $\alpha$ if when written in the form

: $s=\pi(f^\omega(b))$

where the morphism $f:B^*\to B^*$ , prolongable in $b$ , has the following properties:

all letters of $B$ occur in $f^\omega(b)$ , and
$\alpha>1$ is the dominant eigenvalue of the matrix of morphism $f$ , namely, the matrix $M(f)=(m_{x,y})_{x\in B,y\in A}$ , where $m_{x,y}$ is the number of occurrences of the letter $x$ in the word $f(y)$ .

A set S of natural numbers is $\alpha$ -recognisable if its characteristic sequence $s$ is $\alpha$ -substitutive.

A last definition: a Perron number is an algebraic number $z > 1$ such that all its conjugates belong to the disc $\{z' \in \Complex,|z'|< z\}$ . These are exactly the dominant eigenvalues of the primitive matrices of positive integers.

We then have the following statement:{{Math theorem|Let α et β be two multiplicatively independent Perron numbers. Then a sequence x with elements belonging to a finite set is both α-substitutive and β-substitutive if and only if x is ultimately periodic.

| name = Cobham's theorem for substitutions

}}

= Logic approach =

The logic equivalent permits to consider more general situations: the automatic sequences over the natural numbers $\N$ or recognisable sets have been extended to the integers $\Z$ , to the Cartesian products $\N^m$ , to the real numbers $\R$ and to the Cartesian products $\R^m$ .

; Extension to $\Z$

We code the base $k$ integers by prepending to the representation of a positive integer the digit $0$ , and by representing negative integers by $k-1$ followed by the number's $k$ -complement. For example, in base 2, the integer $-6=-8+2$ is represented as $1010$ . The powers of 2 are written as $010^*$ , and their negatives $110^*$ (since $11000$ is the representation of $-16+8=-8$ ).

; Extension to $\N^m$

A subset $X$ of $N^m$ is recognisable in base $k$ if the elements of $X$ , written as vectors with $m$ components, are recognisable over the resulting alphabet.

For example, in base 2, we have $3=11_2$ and $9=1001_2$ ; the vector $\begin{pmatrix}3\\9\end{pmatrix}$ is written as $\begin{pmatrix}0011\\1001\end{pmatrix}=\begin{pmatrix}0\\1\end{pmatrix}\begin{pmatrix}0\\0\end{pmatrix}\begin{pmatrix}1\\0\end{pmatrix}\begin{pmatrix}1\\1\end{pmatrix}$ .{{Math theorem|Let $r$ and $s$ be two multiplicatively independent positive integers. A subset $S$ of $N^m$ is $r$ -recognisable and $s$ -recognisable if and only if $S$ is describable in Presburger arithmetic.

| name = Semenov's theorem (1977){{Cite journal|last=Semenov|first=Alexei Lvovich|year=1977|title=Predicates regular in two number systems are Presburger|journal=Sib. Mat. Zh.|language=ru|volume=18|pages=403–418|doi=10.1007/BF00967164 |mr=0450050|zbl=0369.02023|s2cid=119658350 }}

}}An elegant proof of this theorem is given by Muchnik in 1991 by induction on $m$ .{{Cite journal|last=Muchnik|journal=Theoretical Computer Science|title=The definable criterion for definability in Presburger arithmetic and its applications|year=2003|url=https://core.ac.uk/download/pdf/82333239.pdf|doi=10.1016/S0304-3975(02)00047-6|volume=290|issue=3|pages=1433–1444 |doi-access=free}}

Other extensions have been given to the real numbers and vectors of real numbers.

Proofs

Samuel Eilenberg announced the theorem without proof in his book;{{Cite book|last=Eilenberg|first=Samuel|title=Automata, Languages and Machines, Vol. A|publisher=Academic Press|year=1974|isbn=978-0-12-234001-7|series=Pure and Applied Mathematics|location=New York|pages=xvi+451|issue=58}}.

he says "The proof is correct, long, and hard. It is a challenge to find a more reasonable proof of this fine theorem." Georges Hansel proposed a more simple proof, published in the not-easily accessible proceedings of a conference.{{Cite book|title=Actes de la Fête des mots|place=Rouen|publisher=Greco de programmation, CNRS|year=1982|pages=55–59|last1=Hansel|first1=Georges|language=fr|chapter=À propos d'un théorème de Cobham|editor-last=Perrin|editor-first=D.}} The proof of Dominique Perrin{{Cite book|first1=Dominique|last1=Perrin|editor-last=van Leeuwen|editor-first=Jan|title=Handbook of Theoretical Computer Science|volume=B: Formal Models and Semantics|publisher=Elsevier|year=1990|isbn=978-0444880741|chapter=Finite Automata|pages=1–57}} and that of Allouche and Shallit's book{{Cite book|last1=Allouche|first1=Jean-Paul|author-link1=:fr:Jean-Paul Allouche|title=Automatic Sequences: theory, applications, generalizations|last2=Shallit|first2=Jeffrey|author-link2=Jeffrey Shallit|publisher=Cambridge University Press|year=2003|isbn=0-521-82332-3|location=Cambridge}} contains the same error in one of the lemmas, mentioned in the list of errata of the book.{{cite web |url=https://cs.uwaterloo.ca/~shallit/as-errata.pdf |title=Errata for Automatic Sequences: Theory, Applications, Generalizations |last1=Shallit |first1=Jeffrey |last2=Allouche |first2=Jean-Paul |date=31 March 2020 |access-date=25 June 2021}} This error was uncovered in a note by Tomi Kärki,{{Cite web|last=Tomi Kärki|year=2005|title=A Note on the Proof of Cobham's Theorem|url=http://www.tucs.fi/publications/attachment.php?fname=TR713.pdf|access-date=23 January 2017|website=Rapport Technique n° 713|publisher=University of Turku}} and corrected by Michel Rigo and Laurent Waxweiler.{{Cite journal|last1=Michel Rigo|last2=Laurent Waxweiler|year=2006|title=A Note on Syndeticity, Recognizable Sets and Cobham's Theorem.|url=http://eatcs.org/images/bulletin/beatcs88.pdf|journal=Bulletin of the EATCS|volume=88|pages=169–173 |arxiv=0907.0624|mr=2222340|zbl=1169.68490|access-date=23 January 2017}} This part of the proof has been recently written.{{Cite web|last=Paul Fermé, Willy Quach and Yassine Hamoudi|year=2015|title=Le théorème de Cobham|language=French |trans-title=Cobham's Theorem|url=http://perso.ens-lyon.fr/yassine.hamoudi/wp-content/uploads/2016/07/Cobham.pdf|access-date=24 January 2017|archive-date=2017-02-02|archive-url=https://web.archive.org/web/20170202033005/http://perso.ens-lyon.fr/yassine.hamoudi/wp-content/uploads/2016/07/Cobham.pdf}}

In January 2018, Thijmen J. P. Krebs announced, on Arxiv, a simplified proof of the original theorem, based on Dirichlet's approximation criterion instead of that of Kronecker; the article appeared in 2021.{{Cite journal|last=Krebs|first=Thijmen J. P.|year=2021|title=A More Reasonable Proof of Cobham's Theorem|journal=International Journal of Foundations of Computer Science|volume=32|issue=2|page=203207|arxiv=1801.06704|doi=10.1142/S0129054121500118|s2cid=39850911 |issn=0129-0541}} The employed method has been refined and used by Mol, Rampersad, Shallit and Stipulanti.{{Cite journal|last1=Mol|first1=Lucas|last2=Rampersad|first2=Narad|last3=Shallit|first3=Jeffrey|last4=Stipulanti|first4=Manon|year=2019|title=Cobham's Theorem and Automaticity|journal=International Journal of Foundations of Computer Science|volume=30|issue=8|pages=1363–1379|arxiv=1809.00679|doi=10.1142/S0129054119500308|s2cid=52156852 |issn=0129-0541}}

Notes and references

Bibliography

{{Cite book|last1=Allouche|first1=Jean-Paul|author-link1=:fr:Jean-Paul Allouche|title=Automatic Sequences: theory, applications, generalizations|last2=Shallit|first2=Jeffrey|author-link2=Jeffrey Shallit|publisher=Cambridge University Press|year=2003|isbn=0-521-82332-3|location=Cambridge}}

Category:Theorems in combinatorics

Category:Number theory

Category:Theoretical computer science

Category:Finite-state machines

Category:Information theory

Category:Combinatorics on words