Ogden's lemma#Undecidability

In the theory of formal languages, Ogden's lemma (named after William F. Ogden){{Cite journal |last=Ogden |first=William |date=September 1968 |title=A helpful result for proving inherent ambiguity |url=http://dx.doi.org/10.1007/bf01694004 |journal=Mathematical Systems Theory |volume=2 |issue=3 |pages=191–194 |doi=10.1007/bf01694004 |s2cid=13197551 |issn=0025-5661|url-access=subscription }} is a generalization of the pumping lemma for context-free languages.

Despite Ogden's lemma being a strengthening of the pumping lemma, it is insufficient to fully characterize the class of context-free languages.{{cite conference |first=Marcus |last=Kracht |title=Too Many Languages Satisfy Ogden's Lemma |conference=Proceedings of the 27th Pennsylvania Linguistics Colloquium |pages=115–121 |date=2004 |location=Philadelphia |url=http://wwwhomes.uni-bielefeld.de/mkracht/html/ogden.pdf |access-date=16 May 2024}} This is in contrast to the Myhill–Nerode theorem, which unlike the pumping lemma for regular languages is a necessary and sufficient condition for regularity.

Statement

{{Math theorem

| name = Ogden's lemma

| note =

| math_statement = If a language L is generated by a context-free grammar, then there exists some p\geq 1 such that \forall w \in L with length \geq p, and any way of marking \geq p positions of w as "marked", there exists a nonterminal A and a way to split w into 5 segments uxzyv, such that

S \Rightarrow^* uAv \Rightarrow^* uxAyv \Rightarrow^* uxzyv

z contains at least one marked position.

xzy contains at most p marked positions.

u, x both contain marked positions, or y, v both contain marked positions.

}}We will use underlines to indicate "marked" positions.

= Special cases =

Ogden's lemma is often stated in the following form, which can be obtained by "forgetting about" the grammar, and concentrating on the language itself:

If a language {{mvar|L}} is context-free, then there exists some number p\geq 1 (where {{mvar|p}} may or may not be a pumping length) such that for any string {{mvar|s}} of length at least {{mvar|p}} in {{mvar|L}} and every way of "marking" {{mvar|p}} or more of the positions in {{mvar|s}}, {{mvar|s}} can be written as

:s = uvwxy

with strings {{mvar|u, v, w, x,}} and {{mvar|y}}, such that

  1. {{mvar|vx}} has at least one marked position,
  2. {{mvar|vwx}} has at most {{mvar|p}} marked positions, and
  3. uv^n wx^n y \in L for all n \geq 0.

In the special case where every position is marked, Ogden's lemma is equivalent to the pumping lemma for context-free languages. Ogden's lemma can be used to show that certain languages are not context-free in cases where the pumping lemma is not sufficient. An example is the language \{a^i b^j c^k d^l : i = 0 \text{ or } j = k = l\}.

Example applications

= Non-context-freeness =

The special case of Ogden's lemma is often sufficient to prove some languages are not context-free. For example, \{ a^m b^n c^m d^n | m, n \geq 1\} is a standard example of non-context-free language,{{Cite book |last=Hopcroft |first=John E. |url=https://www.worldcat.org/oclc/4549363 |title=Introduction to automata theory, languages, and computation |date=1979 |publisher=Addison-Wesley |others=Jeffrey D. Ullman |isbn=0-201-02988-X |location=Reading, Mass. |oclc=4549363|page=128}}

{{Math proof|title=Proof|proof=

Suppose the language is generated by a context-free grammar, then let p be the length required in Ogden's lemma, then consider the word a^p\underline{b^pc^p}d^p in the language. Then the three conditions implied by Ogden's lemma cannot all be satisfied.

}}

Similarly, one can prove the "copy twice" language L=\{w^2 | w \in \{a, b\}^*\} is not context-free, by using Ogden's lemma on a^{2p}\underline{b^{2p}}a^{2p}b^{2p}.

And the given example last section \{a^i b^j c^k d^l : i = 0 \text{ or } j = k = l\} is not context-free by using Ogden's lemma on ab^{2p} \underline{c^{2p}}d^{2p}.

= Inherent ambiguity =

Ogden's lemma can be used to prove the inherent ambiguity of some languages, which is implied by the title of Ogden's paper.

Example: Let L_0 = \{a^nb^mc^m | m, n \geq 1\}, L_1 = \{a^mb^mc^n | m, n \geq 1\}. The language L = L_0 \cup L_1 is inherently ambiguous. (Example from page 3 of Ogden's paper.)

{{Math proof|title=Proof|proof=

Let p be the pumping length needed for Ogden's lemma, and apply it to the sentence a^{p!+p}\underline{b^pc^p}.

By routine checking of the conditions of Ogden's lemma, we find that the derivation is

S \Rightarrow^* uAv \Rightarrow^* uxAyv \Rightarrow^* uxzyv

where u = a^{p! + p}b^{p - s - k}, x = b^{k}, z = b^{s}c^{s'}, y = c^k, v = c^{p - s' - k}, satisfying s+s' \geq 1 and k \geq 1 and p \geq s+s' + 2k.

Thus, we obtain a derivation of a^{p!+p}b^{p!+p}c^{p!+p} by interpolating the derivation with p!/k copies of A \Rightarrow^* xAy. According to this derivation, an entire sub-sentence x^{p!/k+1}zy^{p!/k+1} = b^{p!+k+s}c^{p!+k+s'} is the descendent of one node A in the derivation tree.

Symmetrically, we can obtain another derivation of a^{p!+p}b^{p!+p}c^{p!+p}, according to which there is an entire sub-sentence a^{p!+k+s}b^{p!+k+s'} being the descendent of one node in the derivation tree.

Since (p!+k+s) + (p!+k+s') > 2p! +1 > p! + p, the two sub-sentences have nonempty intersection, and since neither contains the other, the two derivation trees are different.

}}

Similarly, L^* is inherently ambiguous, and for any CFG of the language, letting p be the constant for Ogden's lemma, we find that (a^{p!+p}b^{p!+p}c^{p!+p})^n has at least 2^n different parses. Thus L^* has an unbounded degree of inherent ambiguity.

= Undecidability =

The proof can be extended to show that deciding whether a CFG is inherently ambiguous is undecidable, by reduction to the Post correspondence problem. It can also show that deciding whether a CFG has an unbounded degree of inherent ambiguity is undecidable. (page 4 of Ogden's paper)

{{Math proof|title=Construction|proof=

Given any Post correspondence problem over binary strings, we reduce it to a decision problem over a CFG.

Given any two lists of binary strings \xi_1, ..., \xi_n and \eta_1, ..., \eta_n, rewrite the binary alphabet to \{d, e\}.

Let L\left(\xi_1, \cdots, \xi_{n}\right) be the language over alphabet \{d,e,f\}, generated by the CFG with rules S \to \xi_iS d^i e| f for every i = 1, ..., n. Similarly define L\left(\eta_1, \cdots, \eta_{n}\right).

Now, by the same argument as above, the language (L_0 \cdot L\left(\xi_1, \cdots, \xi_{n}\right))\cup (L_1 \cdot L\left(\eta_1, \cdots, \eta_{n}\right)) is inherently ambiguous iff the Post correspondence problem has a solution.

And the language ((L_0 \cdot L\left(\xi_1, \cdots, \xi_{n}\right))\cup (L_1 \cdot L\left(\eta_1, \cdots, \eta_{n}\right)))^* has an unbounded degree of inherent ambiguity iff the Post correspondence problem has a solution.

}}

Generalized condition

Bader and Moura have generalized the lemma{{cite journal |last1=Bader |first1=Christopher |last2=Moura |first2=Arnaldo |title=A Generalization of Ogden's Lemma |journal= Journal of the ACM|date=April 1982|volume=29 |issue=2 |pages=404–407 |doi=10.1145/322307.322315 |s2cid=33988796 |doi-access=free }} to allow marking some positions that are not to be included in {{mvar|vx}}. Their dependence of the parameters was later improved by Dömösi and Kudlek.{{Citation |last1=Dömösi |first1=Pál |title=Strong iteration lemmata for regular, linear, context-free, and linear indexed languages |date=1999 |url=http://dx.doi.org/10.1007/3-540-48321-7_18 |work=Fundamentals of Computation Theory |pages=226–233 |place=Berlin, Heidelberg |publisher=Springer Berlin Heidelberg |isbn=978-3-540-66412-3 |access-date=2023-02-26 |last2=Kudlek |first2=Manfred|doi=10.1007/3-540-48321-7_18 |url-access=subscription }} If we denote the number of such excluded positions by {{mvar|e}}, then the number {{mvar|d}} of marked positions of which we want to include some in {{mvar|vx}} must satisfy d\geq p(e+1), where {{mvar|p}} is some constant that depends only on the language. The statement becomes that every {{mvar|s}} can be written as

:s = uvwxy

with strings {{mvar|u, v, w, x,}} and {{mvar|y}}, such that

  1. {{mvar|vx}} has at least one marked position and no excluded position,
  2. {{mvar|vwx}} has at most p^{(e+1)} marked positions, and
  3. uv^n wx^n y \in L for all n \geq 0.

Moreover, either each of {{mvar|u,v,w}} has a marked position, or each of w,x,y has a marked position.

References