square-free word

{{CS1 config|mode=cs1}}

In combinatorics, a square-free word is a word (a sequence of symbols) that does not contain any squares. A square is a word of the form {{mvar|XX}}, where {{mvar|X}} is not empty. Thus, a square-free word can also be defined as a word that avoids the pattern {{mvar|XX}}.

Finite square-free words

= Binary alphabet =

Over a binary alphabet \{0,1\}, the only square-free words are the empty word \epsilon,0,1,01,10,010, and 101.

= Ternary alphabet =

Over a ternary alphabet \{0,1,2\}, there are infinitely many square-free words. It is possible to count the number c(n) of ternary square-free words of length {{mvar|n}}.

class="wikitable"

|+The number of ternary square-free words of length n{{Cite web|url=https://oeis.org/A006156|title=A006156 - OEIS|website=oeis.org|access-date=2019-03-28}}

!{{mvar|n}}

|0

|1

|2

|3

|4

|5

|6

|7

|8

|9

|10

|11

|12

{{tmath|c(n)}}

|1

|3

|6

|12

|18

|30

|42

|60

|78

|108

|144

|204

|264

This number is bounded by c(n) = \Theta(\alpha^n) , where 1.3017597 < \alpha < 1.3017619 .{{Cite journal|last=Shur|first=Arseny|date=2011|title=Growth properties of power-free languages|journal=Computer Science Review|volume=6|issue=5–6|pages=28–43|doi=10.1016/j.cosrev.2012.09.001}} The upper bound on \alpha can be found via Fekete's Lemma and approximation by automata. The lower bound can be found by finding a substitution that preserves square-freeness.

= Alphabet with more than three letters =

Since there are infinitely many square-free words over three-letter alphabets, this implies there are also infinitely many square-free words over an alphabet with more than three letters.

The following table shows the exact growth rate of the {{mvar|k}}-ary square-free words, rounded off to 7 digits after the decimal point, for {{mvar|k}} in the range from 4 to 15:

class="wikitable"

|+Growth rate of the {{mvar|k}}-ary square-free words

!alphabet size ({{mvar|k}})

|4

|5

|6

|7

|8

|9

growth rate

|2.6215080

|3.7325386

|4.7914069

|5.8284661

|6.8541173

|7.8729902

alphabet size ({{mvar|k}})

|10

|11

|12

|13

|14

|15

growth rate

|8.8874856

|9.8989813

|10.9083279

|11.9160804

|12.9226167

|13.9282035

== 2-dimensional words ==

Consider a map \textbf{w} from \mathbb{N}^2 to {{mvar|A}}, where {{mvar|A}} is an alphabet and \textbf{w} is called a 2-dimensional word. Let w_{m,n} be the entry \textbf{w}(m,n). A word \textbf{x} is a line of \textbf{w} if there exists i_1,i_2,j_1, j_2such that \text{gcd}(j_1, j_2) = 1, and for t \ge 0, x_t = w_{{i_1}+{j_1t},{i_2}+{j_2t}}.{{Citation|chapter=Preface|pages=xi–xviii|editor-last=Berthe|editor-first=Valerie|publisher=Cambridge University Press|isbn=9781139924733|editor2-last=Rigo|editor2-first=Michel|doi=10.1017/cbo9781139924733.001|title=Combinatorics, Words and Symbolic Dynamics|year=2016}}

Carpi{{Cite journal|last=Carpi|first=Arturo|date=1988|title=Multidimensional unrepetitive configurations|journal=Theoretical Computer Science|volume=56|issue=2|pages=233–241|doi=10.1016/0304-3975(88)90080-1|issn=0304-3975|doi-access=free}} proves that there exists a 2-dimensional word \textbf{w} over a 16-letter alphabet such that every line of \textbf{w} is square-free. A computer search shows that there are no 2-dimensional words \textbf{w}over a 7-letter alphabet, such that every line of \textbf{w} is square-free.

= Generating finite square-free words =

Shur{{Cite journal|last=Shur|first=Arseny|date=2015|title=Generating square-free words efficiently|journal=Theoretical Computer Science|volume=601|pages=67–72|doi=10.1016/j.tcs.2015.07.027|doi-access=free|hdl=10995/92700|hdl-access=free}} proposes an algorithm called R2F (random-t(w)o-free) that can generate a square-free word of length {{mvar|n}} over any alphabet with three or more letters. This algorithm is based on a modification of entropy compression: it randomly selects letters from a k-letter alphabet to generate a {{tmath|(k+1)}}-ary square-free word.

algorithm R2F is

input: alphabet size k \ge 2,

word length n > 1

output: a {{tmath|(k+1)}}-ary square-free word {{mvar|w}} of length {{mvar|n}}.

{{gray|(Note that \color{gray}\Sigma_{k+1} is the alphabet with letters \color{gray}\{1,...,k+1\}.)

(For a word \color{gray}w \in \Sigma_k, \color{gray}\chi_w is the permutation of \color{gray}\Sigma_k such that {{mvar|a}} precedes {{mvar|b}} in \color{gray}\chi_w if the

right most position of {{mvar|a}} in {{mvar|w}} is to the right of the rightmost position of {{mvar|b}} in {{mvar|w}}.

For example, \color{gray}w=136263163\in \Sigma_6 has \color{gray}\chi_w=361245.)}}

choose w[1] in \Sigma_{k+1} uniformly at random

set \chi_w to w[1] followed by all other letters of \Sigma_{k+1} in increasing order

set the number {{mvar|N}} of iterations to 0

while |w| < n do

choose {{mvar|j}} in \Sigma_{k} uniformly at random

append a = \chi_w[j+1] to the end of {{mvar|w}}

update \chi_w shifting the first {{mvar|j}} elements to the right and setting \chi_w[1] = a

increment {{mvar|N}} by {{mvar|1}}

if {{mvar|w}} ends with a square of rank {{mvar|r}} then

delete the last {{mvar|r}} letters of {{mvar|w}}

return {{mvar|w}}

Every (k+1)-ary square-free word can be the output of Algorithm R2F, because on each iteration it can append any letter except for the last letter of {{mvar|w}}.

The expected number of random k-ary letters used by Algorithm R2F to construct a {{tmath|(k+1)}}-ary square-free word of length {{mvar|n}} isN=n\left(1 + \frac 2 {k^2} + \frac 1 {k^3} + \frac 4 {k^4} + O\left(\frac 1 {k^5}\right)\right)+O(1).Note that there exists an algorithm that can verify the square-freeness of a word of length {{mvar|n}} in O(n \log n) time. Apostolico and Preparata{{Cite journal|last1=Apostolico|first1=A.|last2=Preparata|first2=F.P.|date=Feb 1983|title=Optimal off-line detection of repetitions in a string|journal=Theoretical Computer Science|volume=22|issue=3|pages=297–315|doi=10.1016/0304-3975(83)90109-3|issn=0304-3975|doi-access=free}} give an algorithm using suffix trees. Crochemore{{Cite journal|last=Crochemore|first=Max|date=Oct 1981|title=An optimal algorithm for computing the repetitions in a word|journal=Information Processing Letters|volume=12|issue=5|pages=244–250|doi=10.1016/0020-0190(81)90024-7|issn=0020-0190}} uses partitioning in his algorithm. Main and Lorentz{{Cite journal|last1=Main|first1=Michael G|last2=Lorentz|first2=Richard J|date=Sep 1984|title=An O(n log n) algorithm for finding all repetitions in a string|journal=Journal of Algorithms|volume=5|issue=3|pages=422–432|doi=10.1016/0196-6774(84)90021-x|issn=0196-6774}} provide an algorithm based on the divide-and-conquer method. A naive implementation may require O(n^2) time to verify the square-freeness of a word of length {{mvar|n}}.

Infinite square-free words

There exist infinitely long square-free words in any alphabet with three or more letters, as proved by Axel Thue.{{Cite book|title=Axel Thue's papers on repetitions in words a translation|last=Berstel|first=Jean|publisher=Départements de mathématiques et d'informatique, Université du Québec à Montréal|year=1994|isbn=978-2892761405|oclc=494791187}}

=Examples=

== First difference of the [[Thue–Morse sequence]] ==

One example of an infinite square-free word over an alphabet of size 3 is the word over the alphabet \{-1,0,+1\} obtained by taking the first difference of the Thue–Morse sequence. That is, from the Thue–Morse sequence

: 0, 1 ,1, 0 ,1 ,0 ,0 ,1, 1 ,0 ,0 ,1, 0 ,1 ,1, 0 ...

one forms a new sequence in which each term is the difference of two consecutive terms of the Thue–Morse sequence. The resulting square-free word is

: 1,0,-1,1,-1,0,1,0,-1,0,1,-1,1,0,-1,...{{OEIS|id=A029883}}.

== [[John Leech (mathematician)|Leech]]'s [[morphism]] ==

Another example found by John Leech{{cite journal|last=Leech|first=J.|author-link=John Leech (mathematician)|year=1957|title=A problem on strings of beads|journal=Math. Gaz.|volume=41|pages=277–278|zbl=0079.01101|doi=10.1017/S0025557200236115|s2cid=126406225 }} is defined recursively over the alphabet \{0,1,2\}. Let {{tmath|w_1}} be any square-free word starting with the letter {{val|0}}. Define the words \{w_i \mid i \in \mathbb{N} \} recursively as follows: the word w_{i+1} is obtained from {{tmath|w_i}} by replacing each {{val|0}} in {{tmath|w_i}} with {{val|fmt=none|0121021201210}}, each {{val|1}} with {{val|fmt=none|1202102012021}}, and each {{val|2}} with {{val|fmt=none|2010210120102}}. It is possible to prove that the sequence converges to the infinite square-free word

:{{math|0121021201210120210201202120102101201021202102012021...}}

= Generating infinite square-free words =

Infinite square-free words can be generated by square-free morphism. A morphism is called square-free if the image of every square-free word is square-free. A morphism is called k–square-free if the image of every square-free word of length k is square-free.

Crochemore{{Cite book|last=Berstel|first=Jean|date=April 1984|chapter=Some Recent Results on Squarefree Words|title=Annual Symposium on Theoretical Aspects of Computer Science|volume=166|pages=14–25|doi=10.1007/3-540-12920-0_2|series=Lecture Notes in Computer Science|isbn=978-3-540-12920-2|chapter-url=http://www.numdam.org/item/PDML_1985___2B_21_0/}} proves that a uniform morphism {{mvar|h}} is square-free if and only if it is 3-square-free. In other words, {{mvar|h}} is square-free if and only if h(w) is square-free for all square-free {{mvar|w}} of length 3. It is possible to find a square-free morphism by brute-force search.

algorithm square-free_morphism is

output: a square-free morphism with the lowest possible rank {{mvar|k}}.

set k = 3

while True do

set k_sf_words to the list of all square-free words of length {{mvar|k}} over a ternary alphabet

for each h(0) in k_sf_words do

for each h(1) in k_sf_words do

for each h(2) in k_sf_words do

if h(1) = h(2) then

break from the current loop (advance to next h(1))

if h(0) \ne h(1) and h(2) \ne h(0) then

if h(w) is square-free for all square-free {{mvar|w}} of length {{val|3}} then

return h(0), h(1), h(2)

increment {{mvar|k}} by {{val|1}}

Over a ternary alphabet, there are exactly 144 uniform square-free morphisms of rank 11 and no uniform square-free morphisms with a lower rank than 11.

To obtain an infinite square-free words, start with any square-free word such as {{val|0}}, and successively apply a square-free morphism {{mvar|h}} to it. The resulting words preserve the property of square-freeness. For example, let {{mvar|h}} be a square-free morphism, then as w \to \infty, h^{w}(0) is an infinite square-free word.

Note that, if a morphism over a ternary alphabet is not uniform, then this morphism is square-free if and only if it is 5-square-free.

Letter combinations in square-free words

= Avoid two-letter combinations =

Over a ternary alphabet, a square-free word of length more than 13 contains all the square-free two-letter combinations.{{Cite arXiv |eprint=1505.00019|last1=Zolotov|first1=Boris|title=Another Solution to the Thue Problem of Non-Repeating Words|class=math.CO|year=2015}}

This can be proved by constructing a square-free word without the two-letter combination {{mvar|ab}}. As a result, {{mvar|bcba}}{{mvar|cbca}}{{mvar|cbaca}} is the longest square-free word without the combination {{mvar|ab}} and its length is equal to 13.

Note that over a more than three-letter alphabet there are square-free words of any length without an arbitrary two-letter combination.

= Avoid three-letter combinations =

Over a ternary alphabet, a square-free word of length more than 36 contains all the square-free three-letter combinations.

Note that over a more than three-letter alphabet there are square-free words of any length without an arbitrary three-letter combination.

= Density of a letter =

The density of a letter {{mvar|a}} in a finite word {{mvar|w}} is defined as \frac

w|_a}{|w
where |w|_a

is the number of occurrences of {{mvar|a}} in w

and |w|

is the length of the word. The density of a letter {{mvar|a}} in an infinite word is \liminf_{l\to \infty}\frac

w_l|_a}{|w_l
where w_l

is the prefix of the word {{mvar|w}} of length {{mvar|l}}.{{Cite journal|last=Khalyavin|first=Andrey|date=2007|title=The minimal density of a letter in an infinite ternary square-free word is 883/3215|url=http://emis.de/journals/JIS/VOL10/Khalyavin/khalyavin13.pdf|journal=Journal of Integer Sequences|volume=10|issue=2|pages=3|bibcode=2007JIntS..10...65K}}

The minimal density of a letter {{mvar|a}} in an infinite ternary square-free word is equal to \frac{883}{3215} \approx 0.2747.

The maximum density of a letter {{mvar|a}} in an infinite ternary square-free word is equal to \frac{255}{653} \approx 0.3905.{{Cite journal|last=Ochem|first=Pascal|date=2007|title=Letter frequency in infinite repetition-free words|journal=Theoretical Computer Science|volume=380|issue=3|pages=388–392|doi=10.1016/j.tcs.2007.03.027|issn=0304-3975|doi-access=}}

Notes

References

{{Reflist}}

  • {{cite book | last1=Berstel | first1=Jean | last2=Lauve | first2=Aaron | last3=Reutenauer | first3=Christophe | last4=Saliola | first4=Franco V. | author1-link = Jean Berstel | title=Combinatorics on words. Christoffel words and repetitions in words | series=CRM Monograph Series | volume=27 | location=Providence, RI | publisher=American Mathematical Society | year=2009 | isbn=978-0-8218-4480-9 | url=http://www.ams.org/bookpages/crmm-27 | zbl=1161.68043 }}
  • {{cite book | last=Lothaire | first=M. | author-link=M. Lothaire | title=Combinatorics on words | publisher=Cambridge University Press | location=Cambridge | year= 1997 | isbn= 978-0-521-59924-5 }}.
  • {{cite book | last=Lothaire | first=M. | author-link=M. Lothaire | title=Algebraic combinatorics on words | others=With preface by Jean Berstel and Dominique Perrin | edition=Reprint of the 2002 hardback | series=Encyclopedia of Mathematics and Its Applications | volume=90| publisher=Cambridge University Press | year=2011 | isbn=978-0-521-18071-9 | zbl=1221.68183 }}
  • {{cite book | last=Pytheas Fogg | first=N. | editor1=Berthé, Valérie|editor1-link=Valérie Berthé|editor2=Ferenczi, Sébastien|editor3=Mauduit, Christian|editor4=Siegel, Anne | title=Substitutions in dynamics, arithmetics and combinatorics | series=Lecture Notes in Mathematics | volume=1794 | location=Berlin | publisher=Springer-Verlag | year=2002 | isbn=978-3-540-44141-0 | zbl=1014.11015 }}

Category:Formal languages

Category:Combinatorics on words