unavoidable pattern

In mathematics and theoretical computer science, a pattern is an unavoidable pattern if it is unavoidable on any finite alphabet.

Definitions

= Pattern =

Like a word, a pattern (also called term) is a sequence of symbols over some alphabet.

The minimum multiplicity of the pattern p is m(p)=\min(\mathrm{count_p}(x):x \in p) where \mathrm{count_p}(x) is the number of occurrence of symbol x in pattern p. In other words, it is the number of occurrences in p of the least frequently occurring symbol in p.

= Instance =

Given finite alphabets \Sigma and \Delta, a word x\in\Sigma^* is an instance of the pattern p\in\Delta^* if there exists a non-erasing semigroup morphism f:\Delta^*\rightarrow\Sigma^* such that f(p)=x, where \Sigma^* denotes the Kleene star of \Sigma. Non-erasing means that f(a)\neq\varepsilon for all a\in\Delta, where \varepsilon denotes the empty string.

= Avoidance / Matching =

A word w is said to match, or encounter, a pattern p if a factor (also called subword or substring) of w is an instance of p. Otherwise, w is said to avoid p, or to be p-free. This definition can be generalized to the case of an infinite w, based on a generalized definition of "substring".

= Avoidability / Unavoidability on a specific alphabet =

A pattern p is unavoidable on a finite alphabet \Sigma if each sufficiently long word x\in\Sigma^* must match p; formally: if \exists n\in \Nu. \ \forall x\in\Sigma^*. \ (|x|\geq n \implies x \text{ matches } p). Otherwise, p is avoidable on \Sigma, which implies there exist infinitely many words over the alphabet \Sigma that avoid p.

By Kőnig's lemma, pattern p is avoidable on \Sigma if and only if there exists an infinite word w\in \Sigma^\omega that avoids p.

= Maximal ''p''-free word =

Given a pattern p and an alphabet \Sigma. A p-free word w\in\Sigma^* is a maximal p-free word over \Sigma if aw and wa match p \forall a\in\Sigma.

= Avoidable / Unavoidable pattern =

A pattern p is an unavoidable pattern (also called blocking term) if p is unavoidable on any finite alphabet.

If a pattern is unavoidable and not limited to a specific alphabet, then it is unavoidable for any finite alphabet by default. Conversely, if a pattern is said to be avoidable and not limited to a specific alphabet, then it is avoidable on some finite alphabet by default.

= ''k''-avoidable / ''k''-unavoidable =

A pattern p is k-avoidable if p is avoidable on an alphabet \Sigma of size k. Otherwise, p is k-unavoidable, which means p is unavoidable on every alphabet of size k.{{Cite book|url=https://books.google.com/books?id=3w9eR3u8GN4C&q=Combinatorics+on+words.+Christoffel+words+and+repetitions+in+words+2009|title=Combinatorics on Words: Christoffel Words and Repetitions in Words|publisher=American Mathematical Soc.|isbn=978-0-8218-7325-0|pages=127|language=en}}

If pattern p is k-avoidable, then p is g-avoidable for all g\geq k.

Given a finite set of avoidable patterns S=\{p_1,p_2,...,p_i \}, there exists an infinite word w\in\Sigma^\omega such that w avoids all patterns of S. Let \mu(S) denote the size of the minimal alphabet \Sigma'such that \exist w' \in {\Sigma'}^\omega avoiding all patterns of S.

= Avoidability index =

The avoidability index of a pattern p is the smallest k such that p is k-avoidable, and \infin if p is unavoidable.

Properties

  • A pattern q is avoidable if q is an instance of an avoidable pattern p.{{Cite journal|last=Schmidt|first=Ursula|date=1987-08-01|title=Long unavoidable patterns|journal=Acta Informatica|language=en|volume=24|issue=4|pages=433–445|doi=10.1007/BF00292112|s2cid=7928450|issn=1432-0525}}
  • Let avoidable pattern p be a factor of pattern q, then q is also avoidable.
  • A pattern q is unavoidable if and only if q is a factor of some unavoidable pattern p.
  • Given an unavoidable pattern p and a symbol a not in p, then pap is unavoidable.
  • Given an unavoidable pattern p, then the reversal p^R is unavoidable.
  • Given an unavoidable pattern p, there exists a symbol a such that a occurs exactly once in p.
  • Let n\in\Nu represent the number of distinct symbols of pattern p. If |p|\geq2^n, then p is avoidable.

Zimin words

{{main|Sesquipower}}

Given alphabet \Delta=\{x_1,x_2,...\}, Zimin words (patterns) are defined recursively Z_{n+1}=Z_nx_{n+1}Z_n for n\in\Zeta^+ and Z_1=x_1.

= Unavoidability =

All Zimin words are unavoidable.

A word w is unavoidable if and only if it is a factor of a Zimin word.{{Cite journal|last=Zimin|first=A. I.|title=Blocking Sets of Terms|date=1984|journal=Mathematics of the USSR-Sbornik|language=en|volume=47|issue=2|pages=353–364|doi=10.1070/SM1984v047n02ABEH002647|bibcode=1984SbMat..47..353Z|issn=0025-5734}}

Given a finite alphabet \Sigma, let f(n,|\Sigma|) represent the smallest m\in\Zeta^+ such that w matches Z_n for all w\in \Sigma^m. We have following properties:{{cite arXiv | eprint=1409.3080 | last1=Cooper | first1=Joshua | last2=Rorabaugh | first2=Danny | title=Bounds on Zimin Word Avoidance | date=2014 | class=math.CO }}

  • f(1,q)=1
  • f(2,q)=2q+1
  • f(3,2)=29
  • f(n,q)\leq{^{n-1}q}=\underbrace{q^{q^{\cdot^{\cdot^{q}}}}}_{n-1}

Z_n is the longest unavoidable pattern constructed by alphabet \Delta=\{x_1,x_2,...,x_n\} since |Z_n|=2^n -1.

Pattern reduction

= Free letter =

Given a pattern p over some alphabet \Delta, we say x\in\Delta is free for p if there exist subsets A,B of \Delta such that the following hold:

  1. uv is a factor of p and u\in Auv is a factor of p and v\in B
  2. x\in A\backslash B\cup B\backslash A

For example, let p=abcbab, then b is free for p since there exist A=ac,B=b satisfying the conditions above.

= Reduce =

A pattern p\in \Delta^* reduces to pattern q if there exists a symbol x\in \Delta such that x is free for p, and q can be obtained by removing all occurrence of x from p. Denote this relation by p\stackrel{x}{\rightarrow}q.

For example, let p=abcbab, then p can reduce to q=aca since b is free for p.

= Locked =

A word w is said to be locked if w has no free letter; hence w can not be reduced.{{Cite journal|last1=Baker|first1=Kirby A.|last2=McNulty|first2=George F.|last3=Taylor|first3=Walter|date=1989-12-18|title=Growth problems for avoidable words|journal=Theoretical Computer Science|language=en|volume=69|issue=3|pages=319–345|doi=10.1016/0304-3975(89)90071-6|issn=0304-3975|doi-access=free}}

= Transitivity =

Given patterns p,q,r, if p reduces to q and q reduces to r, then p reduces to r. Denote this relation by p\stackrel{*}{\rightarrow}r.

= Unavoidability =

A pattern p is unavoidable if and only if p reduces to a word of length one; hence \exist w such that |w|=1 and p\stackrel{*}{\rightarrow}w.{{Cite journal|last1=Bean|first1=Dwight R.|last2=Ehrenfeucht|first2=Andrzej|last3=McNulty|first3=George F.|date=1979|title=Avoidable patterns in strings of symbols.|url=https://projecteuclid.org/euclid.pjm/1102783913|journal=Pacific Journal of Mathematics|language=en|volume=85|issue=2|pages=261–294|doi=10.2140/pjm.1979.85.261|issn=0030-8730|doi-access=free}}

Graph pattern avoidance<ref name=":4">{{Cite journal|last=Grytczuk|first=Jarosław|date=2007-05-28|title=Pattern avoidance on graphs|journal=Discrete Mathematics|series=The Fourth Caracow Conference on Graph Theory|language=en|volume=307|issue=11|pages=1341–1346|doi=10.1016/j.disc.2005.11.071|issn=0012-365X|doi-access=free}}</ref>

= Avoidance / Matching on a specific graph =

Given a simple graph G=(V,E), a edge coloring c:E\rightarrow \Delta matches pattern p if there exists a simple path P=[e_1,e_2,...,e_r] in G such that the sequence c(P)=[c(e_1),c(e_2),...,c(e_r)] matches p. Otherwise, c is said to avoid p or be p-free.

Similarly, a vertex coloring c:V\rightarrow \Delta matches pattern p if there exists a simple path P=[c_1,c_2,...,c_r] in G such that the sequence c(P) matches p.

= Pattern chromatic number =

The pattern chromatic number \pi_p(G) is the minimal number of distinct colors needed for a p-free vertex coloring c over the graph G.

Let \pi_p(n)=\max\{\pi_p(G):G\in G_n\} where G_n is the set of all simple graphs with a maximum degree no more than n.

Similarly, \pi_p'(G) and \pi_p'(n) are defined for edge colorings.

= Avoidability / Unavoidability on graphs =

A pattern p is avoidable on graphs if \pi_p(n) is bounded by c_p, where c_p only depends on p.

  • Avoidance on words can be expressed as a specific case of avoidance on graphs; hence a pattern p is avoidable on any finite alphabet if and only if \pi_p(P_n) \leq c_p for all n \in\Zeta ^+ , where P_n is a graph of n vertices concatenated.

= Probabilistic bound on ''{{pi}}{{sub|p}}(n)'' =

There exists an absolute constant c, such that \pi_p(n)\leq cn^{\frac{m(p)}{m(p)-1}}\leq cn^2 for all patterns p with m(p)\geq2.

Given a pattern p, let n represent the number of distinct symbols of p. If |p|\geq2^n, then p is avoidable on graphs.

= Explicit colorings =

Given a pattern p such that count_p(x) is even for all x\in p, then \pi_p'(K_2^k)\leq2^k-1 for all k\geq 1, where K_n is the complete graph of n vertices.

Given a pattern p such that m(p)\geq2, and an arbitrary tree T, let S be the set of all avoidable subpatterns and their reflections of p. Then \pi_p(T)\leq 3\mu(S).

Given a pattern p such that m(p)\geq2, and a tree T with degree n\geq2. Let S be the set of all avoidable subpatterns and their reflections of p, then \pi_p'(T)\leq 2(n-1)\mu(S).

Examples

  • The Thue–Morse sequence is cube-free and overlap-free; hence it avoids the patterns xxx and xyxyx.
  • A square-free word is one avoiding the pattern xx. The word over the alphabet \{0,\pm1\} obtained by taking the first difference of the Thue–Morse sequence is an example of an infinite square-free word.{{Cite book|url=https://books.google.com/books?id=3w9eR3u8GN4C&q=Combinatorics+on+words.+Christoffel+words+and+repetitions+in+words+2009|title=Combinatorics on Words: Christoffel Words and Repetitions in Words|publisher=American Mathematical Soc.|isbn=978-0-8218-7325-0|pages=97|language=en}}{{Cite book|last=Fogg|first=N. Pytheas|url=https://books.google.com/books?id=qBIsuwEACAAJ&q=Substitutions+in+dynamics,+arithmetics+and+combinatorics.|title=Substitutions in Dynamics, Arithmetics and Combinatorics|date=2002-09-23|publisher=Springer Science & Business Media|isbn=978-3-540-44141-0|pages=104|language=en}}
  • The patterns x and xyx are unavoidable on any alphabet, since they are factors of the Zimin words.{{Cite book|last1=Allouche|first1=Jean-Paul|url=https://books.google.com/books?id=2ZsSUStt96sC&dq=Automatic+Sequences%3A+Theory%2C+Applications%2C+Generalizations&pg=PR13|title=Automatic Sequences: Theory, Applications, Generalizations|last2=Shallit|first2=Jeffrey|last3=Shallit|first3=Professor Jeffrey|date=2003-07-21|publisher=Cambridge University Press|isbn=978-0-521-82332-6|pages=24|language=en}}
  • The power patterns x^n for n\geq 3 are 2-avoidable.
  • All binary patterns can be divided into three categories:{{cite book|last1=Lothaire|first1=M.|url=https://archive.org/details/algebraiccombina0000loth|url-access=registration|title=Algebraic Combinatorics on Words|publisher=Cambridge University Press|year=2002|isbn=9780521812207}}
  • \varepsilon,x,xyx are unavoidable.
  • xx,xxy,xyy,xxyx,xxyy,xyxx,xyxy,xyyx,xxyxx,xxyxy,xyxyy have avoidability index of 3.
  • others have avoidability index of 2.
  • abwbaxbcyaczca has avoidability index of 4, as well as other locked words.
  • abvacwbaxbcycdazdcd has avoidability index of 5.{{Cite journal|last=Clark|first=Ronald J.|date=2006-04-01|title=The existence of a pattern which is 5-avoidable but 4-unavoidable|journal=International Journal of Algebra and Computation|volume=16|issue=2|pages=351–367|doi=10.1142/S0218196706002950|issn=0218-1967}}
  • The repetitive threshold RT(n) is the infimum of exponents k such that x^k is avoidable on an alphabet of size n. Also see Dejean's theorem.

Open problems

  • Is there an avoidable pattern p such that the avoidability index of p is 6?
  • Given an arbitrarily pattern p, is there an algorithm to determine the avoidability index of p?

References

{{reflist}}

  • {{cite book | last1 = Allouche | first1 = Jean-Paul | last2 = Shallit | first2 = Jeffrey | author2-link = Jeffrey Shallit | isbn = 978-0-521-82332-6 | publisher = Cambridge University Press | title = Automatic Sequences: Theory, Applications, Generalizations | year = 2003 | zbl=1086.11015 }}
  • {{cite book | last1=Berstel | first1=Jean | last2=Lauve | first2=Aaron | last3=Reutenauer | first3=Christophe | last4=Saliola | first4=Franco V. | title=Combinatorics on words. Christoffel words and repetitions in words | series=CRM Monograph Series | volume=27 | location=Providence, RI | publisher=American Mathematical Society | year=2009 | isbn=978-0-8218-4480-9 | zbl=1161.68043 }}
  • {{cite book | last=Lothaire | first=M. | author-link=M. Lothaire | title=Algebraic combinatorics on words | others=With preface by Jean Berstel and Dominique Perrin | edition=Reprint of the 2002 hardback | series=Encyclopedia of Mathematics and Its Applications | volume=90| publisher=Cambridge University Press | year=2011 | isbn=978-0-521-18071-9 | zbl=1221.68183 }}
  • {{cite book | last=Pytheas Fogg | first=N. | editor1-first=Valérie | editor1-last = Berthé |editor1-link = Valérie Berthé| editor2-last = Ferenczi | editor2-first= Sébastien | editor3-last= Mauduit | editor3-first= Christian | editor4-last= Siegel | editor4-first= A. | title=Substitutions in dynamics, arithmetics and combinatorics | series=Lecture Notes in Mathematics | volume=1794 | location=Berlin | publisher=Springer-Verlag | year=2002 | isbn=3-540-44141-7 | zbl=1014.11015 }}

Category:Semigroup theory

Category:Formal languages

Category:Combinatorics on words