square-free word

In combinatorics, a square-free word is a word (a sequence of symbols) that does not contain any squares. A square is a word of the form {{mvar|XX}}, where {{mvar|X}} is not empty. Thus, a square-free word can also be defined as a word that avoids the pattern {{mvar|XX}}.

Finite square-free words

= Binary alphabet =

Over a binary alphabet $\{0,1\}$ , the only square-free words are the empty word $\epsilon,0,1,01,10,010$ , and $101$ .

= Ternary alphabet =

Over a ternary alphabet $\{0,1,2\}$ , there are infinitely many square-free words. It is possible to count the number $c(n)$ of ternary square-free words of length {{mvar|n}}.

class="wikitable" \|+The number of ternary square-free words of length n{{Cite web\|url=https://oeis.org/A006156\|title=A006156 - OEIS\|website=oeis.org\|access-date=2019-03-28}} !{{mvar\|n}} \|0 \|1 \|2 \|3 \|4 \|5 \|6 \|7 \|8 \|9 \|10 \|11 \|12
{{tmath\|c(n)}} \|1 \|3 \|6 \|12 \|18 \|30 \|42 \|60 \|78 \|108 \|144 \|204 \|264

class="wikitable"

|+The number of ternary square-free words of length n{{Cite web|url=https://oeis.org/A006156|title=A006156 - OEIS|website=oeis.org|access-date=2019-03-28}}

!{{mvar|n}}

|10

|11

|12

|18

|30

|42

|60

|78

|108

|144

|204

|264

This number is bounded by $c(n) = \Theta(\alpha^n)$ , where $1.3017597 < \alpha < 1.3017619$ .{{Cite journal|last=Shur|first=Arseny|date=2011|title=Growth properties of power-free languages|journal=Computer Science Review|volume=6|issue=5–6|pages=28–43|doi=10.1016/j.cosrev.2012.09.001}} The upper bound on $\alpha$ can be found via Fekete's Lemma and approximation by automata. The lower bound can be found by finding a substitution that preserves square-freeness.

= Alphabet with more than three letters =

Since there are infinitely many square-free words over three-letter alphabets, this implies there are also infinitely many square-free words over an alphabet with more than three letters.

The following table shows the exact growth rate of the {{mvar|k}}-ary square-free words, rounded off to 7 digits after the decimal point, for {{mvar|k}} in the range from 4 to 15:

class="wikitable" \|+Growth rate of the {{mvar\|k}}-ary square-free words !alphabet size ({{mvar\|k}}) \|4 \|5 \|6 \|7 \|8 \|9
growth rate \|2.6215080 \|3.7325386 \|4.7914069 \|5.8284661 \|6.8541173 \|7.8729902
alphabet size ({{mvar\|k}}) \|10 \|11 \|12 \|13 \|14 \|15
growth rate \|8.8874856 \|9.8989813 \|10.9083279 \|11.9160804 \|12.9226167 \|13.9282035

class="wikitable"

|+Growth rate of the {{mvar|k}}-ary square-free words

!alphabet size ({{mvar|k}})

growth rate

|2.6215080

|3.7325386

|4.7914069

|5.8284661

|6.8541173

|7.8729902

alphabet size ({{mvar|k}})

|10

|11

|12

|13

|14

|15

growth rate

|8.8874856

|9.8989813

|10.9083279

|11.9160804

|12.9226167

|13.9282035

== 2-dimensional words ==

Consider a map $\textbf{w}$ from $\mathbb{N}^2$ to {{mvar|A}}, where {{mvar|A}} is an alphabet and $\textbf{w}$ is called a 2-dimensional word. Let $w_{m,n}$ be the entry $\textbf{w}(m,n)$ . A word $\textbf{x}$ is a line of $\textbf{w}$ if there exists $i_1,i_2,j_1, j_2$ such that $\text{gcd}(j_1, j_2) = 1$ , and for $t \ge 0, x_t = w_{{i_1}+{j_1t},{i_2}+{j_2t}}$ .{{Citation|chapter=Preface|pages=xi–xviii|editor-last=Berthe|editor-first=Valerie|publisher=Cambridge University Press|isbn=9781139924733|editor2-last=Rigo|editor2-first=Michel|doi=10.1017/cbo9781139924733.001|title=Combinatorics, Words and Symbolic Dynamics|year=2016}}

Carpi{{Cite journal|last=Carpi|first=Arturo|date=1988|title=Multidimensional unrepetitive configurations|journal=Theoretical Computer Science|volume=56|issue=2|pages=233–241|doi=10.1016/0304-3975(88)90080-1|issn=0304-3975|doi-access=free}} proves that there exists a 2-dimensional word $\textbf{w}$ over a 16-letter alphabet such that every line of $\textbf{w}$ is square-free. A computer search shows that there are no 2-dimensional words $\textbf{w}$ over a 7-letter alphabet, such that every line of $\textbf{w}$ is square-free.

= Generating finite square-free words =

Shur{{Cite journal|last=Shur|first=Arseny|date=2015|title=Generating square-free words efficiently|journal=Theoretical Computer Science|volume=601|pages=67–72|doi=10.1016/j.tcs.2015.07.027|doi-access=free|hdl=10995/92700|hdl-access=free}} proposes an algorithm called R2F (random-t(w)o-free) that can generate a square-free word of length {{mvar|n}} over any alphabet with three or more letters. This algorithm is based on a modification of entropy compression: it randomly selects letters from a k-letter alphabet to generate a {{tmath|(k+1)}}-ary square-free word.

algorithm R2F is

input: alphabet size $k \ge 2$ ,

word length $n > 1$

output: a {{tmath|(k+1)}}-ary square-free word {{mvar|w}} of length {{mvar|n}}.

{{gray|(Note that $\color{gray}\Sigma_{k+1}$ is the alphabet with letters $\color{gray}\{1,...,k+1\}$ .)

(For a word $\color{gray}w \in \Sigma_k$ , $\color{gray}\chi_w$ is the permutation of $\color{gray}\Sigma_k$ such that {{mvar|a}} precedes {{mvar|b}} in $\color{gray}\chi_w$ if the

right most position of {{mvar|a}} in {{mvar|w}} is to the right of the rightmost position of {{mvar|b}} in {{mvar|w}}.

For example, $\color{gray}w=136263163\in \Sigma_6$ has $\color{gray}\chi_w=361245$ .)}}

choose $w[1]$ in $\Sigma_{k+1}$ uniformly at random

set $\chi_w$ to $w[1]$ followed by all other letters of $\Sigma_{k+1}$ in increasing order

set the number {{mvar|N}} of iterations to 0

while $|w| < n$ do

choose {{mvar|j}} in $\Sigma_{k}$ uniformly at random

append $a = \chi_w[j+1]$ to the end of {{mvar|w}}

update $\chi_w$ shifting the first {{mvar|j}} elements to the right and setting $\chi_w[1] = a$

increment {{mvar|N}} by {{mvar|1}}

if {{mvar|w}} ends with a square of rank {{mvar|r}} then

delete the last {{mvar|r}} letters of {{mvar|w}}

return {{mvar|w}}

Every (k+1)-ary square-free word can be the output of Algorithm R2F, because on each iteration it can append any letter except for the last letter of {{mvar|w}}.

The expected number of random k-ary letters used by Algorithm R2F to construct a {{tmath|(k+1)}}-ary square-free word of length {{mvar|n}} is $N=n\left(1 + \frac 2 {k^2} + \frac 1 {k^3} + \frac 4 {k^4} + O\left(\frac 1 {k^5}\right)\right)+O(1).$ Note that there exists an algorithm that can verify the square-freeness of a word of length {{mvar|n}} in $O(n \log n)$ time. Apostolico and Preparata{{Cite journal|last1=Apostolico|first1=A.|last2=Preparata|first2=F.P.|date=Feb 1983|title=Optimal off-line detection of repetitions in a string|journal=Theoretical Computer Science|volume=22|issue=3|pages=297–315|doi=10.1016/0304-3975(83)90109-3|issn=0304-3975|doi-access=free}} give an algorithm using suffix trees. Crochemore{{Cite journal|last=Crochemore|first=Max|date=Oct 1981|title=An optimal algorithm for computing the repetitions in a word|journal=Information Processing Letters|volume=12|issue=5|pages=244–250|doi=10.1016/0020-0190(81)90024-7|issn=0020-0190}} uses partitioning in his algorithm. Main and Lorentz{{Cite journal|last1=Main|first1=Michael G|last2=Lorentz|first2=Richard J|date=Sep 1984|title=An O(n log n) algorithm for finding all repetitions in a string|journal=Journal of Algorithms|volume=5|issue=3|pages=422–432|doi=10.1016/0196-6774(84)90021-x|issn=0196-6774}} provide an algorithm based on the divide-and-conquer method. A naive implementation may require $O(n^2)$ time to verify the square-freeness of a word of length {{mvar|n}}.

Infinite square-free words

There exist infinitely long square-free words in any alphabet with three or more letters, as proved by Axel Thue.{{Cite book|title=Axel Thue's papers on repetitions in words a translation|last=Berstel|first=Jean|publisher=Départements de mathématiques et d'informatique, Université du Québec à Montréal|year=1994|isbn=978-2892761405|oclc=494791187}}

=Examples=

== First difference of the [[Thue–Morse sequence]] ==

One example of an infinite square-free word over an alphabet of size 3 is the word over the alphabet $\{-1,0,+1\}$ obtained by taking the first difference of the Thue–Morse sequence. That is, from the Thue–Morse sequence

: $0, 1 ,1, 0 ,1 ,0 ,0 ,1, 1 ,0 ,0 ,1, 0 ,1 ,1, 0 ...$

one forms a new sequence in which each term is the difference of two consecutive terms of the Thue–Morse sequence. The resulting square-free word is

: $1,0,-1,1,-1,0,1,0,-1,0,1,-1,1,0,-1,...$ {{OEIS|id=A029883}}.

== [[John Leech (mathematician)|Leech]]'s [[morphism]] ==

Another example found by John Leech{{cite journal|last=Leech|first=J.|author-link=John Leech (mathematician)|year=1957|title=A problem on strings of beads|journal=Math. Gaz.|volume=41|pages=277–278|zbl=0079.01101|doi=10.1017/S0025557200236115|s2cid=126406225 }} is defined recursively over the alphabet $\{0,1,2\}$ . Let {{tmath|w_1}} be any square-free word starting with the letter {{val|0}}. Define the words $\{w_i \mid i \in \mathbb{N} \}$ recursively as follows: the word $w_{i+1}$ is obtained from {{tmath|w_i}} by replacing each {{val|0}} in {{tmath|w_i}} with {{val|fmt=none|0121021201210}}, each {{val|1}} with {{val|fmt=none|1202102012021}}, and each {{val|2}} with {{val|fmt=none|2010210120102}}. It is possible to prove that the sequence converges to the infinite square-free word

:{{math|0121021201210120210201202120102101201021202102012021...}}

= Generating infinite square-free words =

Infinite square-free words can be generated by square-free morphism. A morphism is called square-free if the image of every square-free word is square-free. A morphism is called k–square-free if the image of every square-free word of length k is square-free.

Crochemore{{Cite book|last=Berstel|first=Jean|date=April 1984|chapter=Some Recent Results on Squarefree Words|title=Annual Symposium on Theoretical Aspects of Computer Science|volume=166|pages=14–25|doi=10.1007/3-540-12920-0_2|series=Lecture Notes in Computer Science|isbn=978-3-540-12920-2|chapter-url=http://www.numdam.org/item/PDML_1985___2B_21_0/}} proves that a uniform morphism {{mvar|h}} is square-free if and only if it is 3-square-free. In other words, {{mvar|h}} is square-free if and only if $h(w)$ is square-free for all square-free {{mvar|w}} of length 3. It is possible to find a square-free morphism by brute-force search.

algorithm square-free_morphism is

output: a square-free morphism with the lowest possible rank {{mvar|k}}.

set $k = 3$

while True do

set k_sf_words to the list of all square-free words of length {{mvar|k}} over a ternary alphabet

for each $h(0)$ in k_sf_words do

for each $h(1)$ in k_sf_words do

for each $h(2)$ in k_sf_words do

if $h(1) = h(2)$ then

break from the current loop (advance to next $h(1)$ )

if $h(0) \ne h(1)$ and $h(2) \ne h(0)$ then

if $h(w)$ is square-free for all square-free {{mvar|w}} of length {{val|3}} then

return $h(0), h(1), h(2)$

increment {{mvar|k}} by {{val|1}}

Over a ternary alphabet, there are exactly 144 uniform square-free morphisms of rank 11 and no uniform square-free morphisms with a lower rank than 11.

To obtain an infinite square-free words, start with any square-free word such as {{val|0}}, and successively apply a square-free morphism {{mvar|h}} to it. The resulting words preserve the property of square-freeness. For example, let {{mvar|h}} be a square-free morphism, then as $w \to \infty$ , $h^{w}(0)$ is an infinite square-free word.

Note that, if a morphism over a ternary alphabet is not uniform, then this morphism is square-free if and only if it is 5-square-free.

Letter combinations in square-free words

File:Squarefreeness-3--Bound.png

= Avoid two-letter combinations =

Over a ternary alphabet, a square-free word of length more than 13 contains all the square-free two-letter combinations.{{Cite arXiv |eprint=1505.00019|last1=Zolotov|first1=Boris|title=Another Solution to the Thue Problem of Non-Repeating Words|class=math.CO|year=2015}}

This can be proved by constructing a square-free word without the two-letter combination {{mvar|ab}}. As a result, {{mvar|bcba}}{{mvar|cbca}}{{mvar|cbaca}} is the longest square-free word without the combination {{mvar|ab}} and its length is equal to 13.

Note that over a more than three-letter alphabet there are square-free words of any length without an arbitrary two-letter combination.

= Avoid three-letter combinations =

Over a ternary alphabet, a square-free word of length more than 36 contains all the square-free three-letter combinations.

Note that over a more than three-letter alphabet there are square-free words of any length without an arbitrary three-letter combination.

= Density of a letter =

The density of a letter {{mvar|a}} in a finite word {{mvar|w}} is defined as $\frac$

w|_a}{|w

where

|w|_a

is the number of occurrences of {{mvar|a}} in

w

and

|w|

is the length of the word. The density of a letter {{mvar|a}} in an infinite word is

\liminf_{l\to \infty}\frac

w_l|_a}{|w_l

where

w_l

is the prefix of the word {{mvar|w}} of length {{mvar|l}}.{{Cite journal|last=Khalyavin|first=Andrey|date=2007|title=The minimal density of a letter in an infinite ternary square-free word is 883/3215|url=http://emis.de/journals/JIS/VOL10/Khalyavin/khalyavin13.pdf|journal=Journal of Integer Sequences|volume=10|issue=2|pages=3|bibcode=2007JIntS..10...65K}}

The minimal density of a letter {{mvar|a}} in an infinite ternary square-free word is equal to $\frac{883}{3215} \approx 0.2747$ .

The maximum density of a letter {{mvar|a}} in an infinite ternary square-free word is equal to $\frac{255}{653} \approx 0.3905$ .{{Cite journal|last=Ochem|first=Pascal|date=2007|title=Letter frequency in infinite repetition-free words|journal=Theoretical Computer Science|volume=380|issue=3|pages=388–392|doi=10.1016/j.tcs.2007.03.027|issn=0304-3975|doi-access=}}

Notes

References

{{cite book | last1=Berstel | first1=Jean | last2=Lauve | first2=Aaron | last3=Reutenauer | first3=Christophe | last4=Saliola | first4=Franco V. | author1-link = Jean Berstel | title=Combinatorics on words. Christoffel words and repetitions in words | series=CRM Monograph Series | volume=27 | location=Providence, RI | publisher=American Mathematical Society | year=2009 | isbn=978-0-8218-4480-9 | url=http://www.ams.org/bookpages/crmm-27 | zbl=1161.68043 }}
{{cite book | last=Lothaire | first=M. | author-link=M. Lothaire | title=Combinatorics on words | publisher=Cambridge University Press | location=Cambridge | year= 1997 | isbn= 978-0-521-59924-5 }}.
{{cite book | last=Lothaire | first=M. | author-link=M. Lothaire | title=Algebraic combinatorics on words | others=With preface by Jean Berstel and Dominique Perrin | edition=Reprint of the 2002 hardback | series=Encyclopedia of Mathematics and Its Applications | volume=90| publisher=Cambridge University Press | year=2011 | isbn=978-0-521-18071-9 | zbl=1221.68183 }}
{{cite book | last=Pytheas Fogg | first=N. | editor1=Berthé, Valérie|editor1-link=Valérie Berthé|editor2=Ferenczi, Sébastien|editor3=Mauduit, Christian|editor4=Siegel, Anne | title=Substitutions in dynamics, arithmetics and combinatorics | series=Lecture Notes in Mathematics | volume=1794 | location=Berlin | publisher=Springer-Verlag | year=2002 | isbn=978-3-540-44141-0 | zbl=1014.11015 }}

Category:Formal languages

Category:Combinatorics on words