Pumping lemma for context-free languages#Usage of the lemma

{{Short description|Type of pumping lemma}}

In computer science, in particular in formal language theory, the pumping lemma for context-free languages, also known as the Bar-Hillel lemma,{{Cite book |last=Kreowski |first=Hans-Jörg |date=1979 |editor-last=Claus |editor-first=Volker |editor2-last=Ehrig |editor2-first=Hartmut |editor2-link=Hartmut Ehrig |editor3-last=Rozenberg |editor3-first=Grzegorz |contribution=A pumping lemma for context-free graph languages |url=https://link.springer.com/chapter/10.1007%2FBFb0025726 |title=Graph-Grammars and Their Application to Computer Science and Biology|series=Lecture Notes in Computer Science |volume=73 |language=en |location=Berlin, Heidelberg |publisher=Springer |pages=270–283 |doi=10.1007/BFb0025726 |isbn=978-3-540-35091-0}} is a lemma that gives a property shared by all context-free languages and generalizes the pumping lemma for regular languages.

The pumping lemma can be used to construct a refutation by contradiction that a specific language is not context-free. Conversely, the pumping lemma does not suffice to guarantee that a language is context-free; there are other necessary conditions, such as Ogden's lemma, or the Interchange lemma.

Formal statement

File:Pumping lemma for context-free languages.svg w.r.t. a Chomsky normal form grammar must contain some nonterminal N twice on some tree path (upper picture). Repeating n times the derivation part N ⇒...⇒ vNx obtains a derivation for uv^n wx^n y (lower left and right picture for n=0 and 2, respectively).]]

If a language L is context-free, then there exists some integer p \geq 1 (called a "pumping length"){{cite book | last1=Berstel | first1=Jean | last2=Lauve | first2=Aaron | last3=Reutenauer | first3=Christophe | last4=Saliola | first4=Franco V. | title=Combinatorics on words. Christoffel words and repetitions in words | series=CRM Monograph Series | volume=27 | location=Providence, RI | publisher=American Mathematical Society | year=2009 | isbn=978-0-8218-4480-9 | zbl=1161.68043 | page=90 | url=http://webpages.math.luc.edu/~lauve/papers/wordsbook.pdf }} (Also see [www-igm.univ-mlv.fr/~berstel/ Aaron Berstel's website) such that every string s in L that has a length of p or more symbols (i.e. with |s| \geq p) can be written as

: s = uvwxy

with substrings u, v, w, x and y, such that

: 1. |vx| \geq 1,

: 2. |vwx| \leq p, and

: 3. uv^n wx^n y \in L for all n \geq 0.

Below is a formal expression of the Pumping Lemma.

\begin{array}{l}

(\forall L\subseteq \Sigma^*) \\

\quad (\mbox{context free}(L) \Rightarrow \\

\quad ((\exists p\geq 1) ((\forall s\in L) ((|s|\geq p) \Rightarrow \\

\quad ((\exists u,v,w,x,y \in \Sigma^*) (s=uvwxy \land |vx|\geq 1 \land |vwx|\leq p \land (\forall n \geq 0)

(uv^nwx^ny\in L)))))))

\end{array}

Informal statement and explanation

The pumping lemma for context-free languages (called just "the pumping lemma" for the rest of this article) describes a property that all context-free languages are guaranteed to have.

The property is a property of all strings in the language that are of length at least p, where p is a constant—called the pumping length—that varies between context-free languages.

Say s is a string of length at least p that is in the language.

The pumping lemma states that s can be split into five substrings, s = uvwxy, where vx is non-empty and the length of vwx is at most p, such that repeating v and x the same number of times (n) in s produces a string that is still in the language. It is often useful to repeat zero times, which removes v and x from the string. This process of "pumping up" s with additional copies of v and x is what gives the pumping lemma its name.

Finite languages (which are regular and hence context-free) obey the pumping lemma trivially by having p equal to the maximum string length in L plus one. As there are no strings of this length the pumping lemma is not violated.

Usage of the lemma

The pumping lemma is often used to prove that a given language {{mvar|L}} is non-context-free, by showing that arbitrarily long strings {{mvar|s}} are in {{mvar|L}} that cannot be "pumped" without producing strings outside {{mvar|L}}.

For example, if S\subset \N is infinite but does not contain an (infinite) arithmetic progression, then L = \{a^{n} : n\in S\} is not context-free. In particular, neither the prime numbers nor the square numbers are context-free.

For example, the language L = \{ a^n b^n c^n | n > 0 \} can be shown to be non-context-free by using the pumping lemma in a proof by contradiction. First, assume that {{mvar|L}} is context free. By the pumping lemma, there exists an integer {{mvar|p}} which is the pumping length of language {{mvar|L}}. Consider the string s = a^p b^p c^p in {{mvar|L}}. The pumping lemma tells us that {{mvar|s}} can be written in the form s = uvwxy, where {{mvar|u, v, w, x}}, and {{mvar|y}} are substrings, such that |vx| \geq 1, |vwx| \leq p, and uv^i wx^i y \in L for every integer i \geq 0. By the choice of {{mvar|s}} and the fact that |vwx| \leq p, it is easily seen that the substring {{mvar|vwx}} can contain no more than two distinct symbols. That is, we have one of five possibilities for {{mvar|vwx}}:

  1. vwx = a^j for some j \leq p.
  2. vwx = a^j b^k for some {{mvar|j}} and {{mvar|k}} with j+k \leq p
  3. vwx = b^j for some j \leq p.
  4. vwx = b^j c^k for some {{mvar|j}} and {{mvar|k}} with j+k \leq p.
  5. vwx = c^j for some j \leq p.

For each case, it is easily verified that uv^i wx^i y does not contain equal numbers of each letter for any i \neq 1. Thus, uv^2 wx^2 y does not have the form a^i b^i c^i. This contradicts the definition of {{mvar|L}}. Therefore, our initial assumption that {{mvar|L}} is context free must be false.

In 1960, Scheinberg proved that L = \{ a^n b^n a^n | n > 0 \} is not context-free using a precursor of the pumping lemma.{{cite journal |author=Stephen Scheinberg |year=1960 |title=Note on the Boolean Properties of Context Free Languages |url=https://core.ac.uk/download/pdf/82210847.pdf |journal=Information and Control |volume=3 |issue=4 |pages=372–375 |doi=10.1016/s0019-9958(60)90965-7 |doi-access=free}} Here: Lemma 3, and its use on p.374-375.

While the pumping lemma is often a useful tool to prove that a given language is not context-free, it does not give a complete characterization of the context-free languages. If a language does not satisfy the condition given by the pumping lemma, we have established that it is not context-free. On the other hand, there are languages that are not context-free, but still satisfy the condition given by the pumping lemma, for example

: L = \{ b^j c^k d^l | j, k, l \in \mathbb{N} \} \cup \{ a^i b^j c^j d^j | i, j \in \mathbb{N}, i \ge 1\}

for {{math|1=s=bjckdl}} with e.g. j≥1 choose {{mvar|vwx}} to consist only of b{{'}}s, for {{math|1=s=aibjcjdj}} choose {{mvar|vwx}} to consist only of a{{'}}s; in both cases all pumped strings are still in L.{{cite book| author=John E. Hopcroft, Jeffrey D. Ullman| title=Introduction to Automata Theory, Languages, and Computation| year=1979| publisher=Addison-Wesley| isbn=0-201-02988-X| url=https://archive.org/details/introductiontoau00hopc}} Here: sect.6.1, p.129

References

{{Reflist}}

  • {{cite journal| last=Bar-Hillel|first=Y.|author-link=Yehoshua Bar-Hillel |author2=Micha Perles |author2-link=Micha Perles |author3=Eli Shamir |author3-link=Eli Shamir |title=On formal properties of simple phrase-structure grammars|journal=Zeitschrift für Phonetik, Sprachwissenschaft, und Kommunikationsforschung|volume=14|issue=2|pages=143–172|year=1961}} — Reprinted in: {{cite book|author=Y. Bar-Hillel |year=1964 |title=Language and Information: Selected Essays on their Theory and Application |publisher=Addison-Wesley |series=Addison-Wesley series in logic |oclc=783543642 |isbn=0201003732 |pages=116–150}}
  • {{cite book | author = Michael Sipser | author-link = Michael Sipser | year = 1997 | title = Introduction to the Theory of Computation | publisher = PWS Publishing | isbn = 0-534-94728-X | url = https://archive.org/details/introductiontoth00sips }} Section 1.4: Nonregular Languages, pp. 77–83. Section 2.3: Non-context-free Languages, pp. 115–119.

{{Formal languages and grammars |state=collapsed}}

{{DEFAULTSORT:Pumping Lemma For Context-Free Languages}}

Category:Formal languages

Category:Lemmas