Ogden's lemma#Undecidability

In the theory of formal languages, Ogden's lemma (named after William F. Ogden){{Cite journal |last=Ogden |first=William |date=September 1968 |title=A helpful result for proving inherent ambiguity |url=http://dx.doi.org/10.1007/bf01694004 |journal=Mathematical Systems Theory |volume=2 |issue=3 |pages=191–194 |doi=10.1007/bf01694004 |s2cid=13197551 |issn=0025-5661|url-access=subscription }} is a generalization of the pumping lemma for context-free languages.

Despite Ogden's lemma being a strengthening of the pumping lemma, it is insufficient to fully characterize the class of context-free languages.{{cite conference |first=Marcus |last=Kracht |title=Too Many Languages Satisfy Ogden's Lemma |conference=Proceedings of the 27th Pennsylvania Linguistics Colloquium |pages=115–121 |date=2004 |location=Philadelphia |url=http://wwwhomes.uni-bielefeld.de/mkracht/html/ogden.pdf |access-date=16 May 2024}} This is in contrast to the Myhill–Nerode theorem, which unlike the pumping lemma for regular languages is a necessary and sufficient condition for regularity.

Statement

{{Math theorem

| name = Ogden's lemma

| note =

| math_statement = If a language $L$ is generated by a context-free grammar, then there exists some $p\geq 1$ such that $\forall w \in L$ with length $\geq p$ , and any way of marking $\geq p$ positions of $w$ as "marked", there exists a nonterminal $A$ and a way to split $w$ into 5 segments $uxzyv$ , such that

$S \Rightarrow^* uAv \Rightarrow^* uxAyv \Rightarrow^* uxzyv$

$z$ contains at least one marked position.

$xzy$ contains at most $p$ marked positions.

$u, x$ both contain marked positions, or $y, v$ both contain marked positions.

}}We will use underlines to indicate "marked" positions.

= Special cases =

Ogden's lemma is often stated in the following form, which can be obtained by "forgetting about" the grammar, and concentrating on the language itself:

If a language {{mvar|L}} is context-free, then there exists some number $p\geq 1$ (where {{mvar|p}} may or may not be a pumping length) such that for any string {{mvar|s}} of length at least {{mvar|p}} in {{mvar|L}} and every way of "marking" {{mvar|p}} or more of the positions in {{mvar|s}}, {{mvar|s}} can be written as

: $s = uvwxy$

with strings {{mvar|u, v, w, x,}} and {{mvar|y}}, such that

{{mvar|vx}} has at least one marked position,
{{mvar|vwx}} has at most {{mvar|p}} marked positions, and
$uv^n wx^n y \in L$ for all $n \geq 0$ .

In the special case where every position is marked, Ogden's lemma is equivalent to the pumping lemma for context-free languages. Ogden's lemma can be used to show that certain languages are not context-free in cases where the pumping lemma is not sufficient. An example is the language $\{a^i b^j c^k d^l : i = 0 \text{ or } j = k = l\}$ .

Example applications

= Non-context-freeness =

The special case of Ogden's lemma is often sufficient to prove some languages are not context-free. For example, $\{ a^m b^n c^m d^n | m, n \geq 1\}$ is a standard example of non-context-free language,{{Cite book |last=Hopcroft |first=John E. |url=https://www.worldcat.org/oclc/4549363 |title=Introduction to automata theory, languages, and computation |date=1979 |publisher=Addison-Wesley |others=Jeffrey D. Ullman |isbn=0-201-02988-X |location=Reading, Mass. |oclc=4549363|page=128}}

{{Math proof|title=Proof|proof=

Suppose the language is generated by a context-free grammar, then let $p$ be the length required in Ogden's lemma, then consider the word $a^p\underline{b^pc^p}d^p$ in the language. Then the three conditions implied by Ogden's lemma cannot all be satisfied.

}}

Similarly, one can prove the "copy twice" language $L=\{w^2 | w \in \{a, b\}^*\}$ is not context-free, by using Ogden's lemma on $a^{2p}\underline{b^{2p}}a^{2p}b^{2p}$ .

And the given example last section $\{a^i b^j c^k d^l : i = 0 \text{ or } j = k = l\}$ is not context-free by using Ogden's lemma on $ab^{2p} \underline{c^{2p}}d^{2p}$ .

= Inherent ambiguity =

Ogden's lemma can be used to prove the inherent ambiguity of some languages, which is implied by the title of Ogden's paper.

Example: Let $L_0 = \{a^nb^mc^m | m, n \geq 1\}, L_1 = \{a^mb^mc^n | m, n \geq 1\}$ . The language $L = L_0 \cup L_1$ is inherently ambiguous. (Example from page 3 of Ogden's paper.)

{{Math proof|title=Proof|proof=

Let $p$ be the pumping length needed for Ogden's lemma, and apply it to the sentence $a^{p!+p}\underline{b^pc^p}$ .

By routine checking of the conditions of Ogden's lemma, we find that the derivation is

$S \Rightarrow^* uAv \Rightarrow^* uxAyv \Rightarrow^* uxzyv$

where $u = a^{p! + p}b^{p - s - k}, x = b^{k}, z = b^{s}c^{s'}, y = c^k, v = c^{p - s' - k}$ , satisfying $s+s' \geq 1$ and $k \geq 1$ and $p \geq s+s' + 2k$ .

Thus, we obtain a derivation of $a^{p!+p}b^{p!+p}c^{p!+p}$ by interpolating the derivation with $p!/k$ copies of $A \Rightarrow^* xAy$ . According to this derivation, an entire sub-sentence $x^{p!/k+1}zy^{p!/k+1} = b^{p!+k+s}c^{p!+k+s'}$ is the descendent of one node $A$ in the derivation tree.

Symmetrically, we can obtain another derivation of $a^{p!+p}b^{p!+p}c^{p!+p}$ , according to which there is an entire sub-sentence $a^{p!+k$ +s}b^{p!+k+s'} being the descendent of one node in the derivation tree.

Since $(p!+k+s) + (p!+k$ +s') > 2p! +1 > p! + p, the two sub-sentences have nonempty intersection, and since neither contains the other, the two derivation trees are different.

}}

Similarly, $L^*$ is inherently ambiguous, and for any CFG of the language, letting $p$ be the constant for Ogden's lemma, we find that $(a^{p!+p}b^{p!+p}c^{p!+p})^n$ has at least $2^n$ different parses. Thus $L^*$ has an unbounded degree of inherent ambiguity.

= Undecidability =

The proof can be extended to show that deciding whether a CFG is inherently ambiguous is undecidable, by reduction to the Post correspondence problem. It can also show that deciding whether a CFG has an unbounded degree of inherent ambiguity is undecidable. (page 4 of Ogden's paper)

{{Math proof|title=Construction|proof=

Given any Post correspondence problem over binary strings, we reduce it to a decision problem over a CFG.

Given any two lists of binary strings $\xi_1, ..., \xi_n$ and $\eta_1, ..., \eta_n$ , rewrite the binary alphabet to $\{d, e\}$ .

Let $L\left(\xi_1, \cdots, \xi_{n}\right)$ be the language over alphabet $\{d,e,f\}$ , generated by the CFG with rules $S \to \xi_iS d^i e| f$ for every $i = 1, ..., n$ . Similarly define $L\left(\eta_1, \cdots, \eta_{n}\right)$ .

Now, by the same argument as above, the language $(L_0 \cdot L\left(\xi_1, \cdots, \xi_{n}\right))\cup (L_1 \cdot L\left(\eta_1, \cdots, \eta_{n}\right))$ is inherently ambiguous iff the Post correspondence problem has a solution.

And the language $((L_0 \cdot L\left(\xi_1, \cdots, \xi_{n}\right))\cup (L_1 \cdot L\left(\eta_1, \cdots, \eta_{n}\right)))^*$ has an unbounded degree of inherent ambiguity iff the Post correspondence problem has a solution.

}}

Generalized condition

Bader and Moura have generalized the lemma{{cite journal |last1=Bader |first1=Christopher |last2=Moura |first2=Arnaldo |title=A Generalization of Ogden's Lemma |journal= Journal of the ACM|date=April 1982|volume=29 |issue=2 |pages=404–407 |doi=10.1145/322307.322315 |s2cid=33988796 |doi-access=free }} to allow marking some positions that are not to be included in {{mvar|vx}}. Their dependence of the parameters was later improved by Dömösi and Kudlek.{{Citation |last1=Dömösi |first1=Pál |title=Strong iteration lemmata for regular, linear, context-free, and linear indexed languages |date=1999 |url=http://dx.doi.org/10.1007/3-540-48321-7_18 |work=Fundamentals of Computation Theory |pages=226–233 |place=Berlin, Heidelberg |publisher=Springer Berlin Heidelberg |isbn=978-3-540-66412-3 |access-date=2023-02-26 |last2=Kudlek |first2=Manfred|doi=10.1007/3-540-48321-7_18 |url-access=subscription }} If we denote the number of such excluded positions by {{mvar|e}}, then the number {{mvar|d}} of marked positions of which we want to include some in {{mvar|vx}} must satisfy $d\geq p(e+1)$ , where {{mvar|p}} is some constant that depends only on the language. The statement becomes that every {{mvar|s}} can be written as

: $s = uvwxy$

with strings {{mvar|u, v, w, x,}} and {{mvar|y}}, such that

{{mvar|vx}} has at least one marked position and no excluded position,
{{mvar|vwx}} has at most $p^{(e+1)}$ marked positions, and
$uv^n wx^n y \in L$ for all $n \geq 0$ .

Moreover, either each of {{mvar|u,v,w}} has a marked position, or each of $w,x,y$ has a marked position.

References

Category:Formal languages

Category:Lemmas