Generalized context-free grammar#Linear Context-free Rewriting Systems (LCFRSs)

{{Short description|Abstract language theory concept}}{{Use American English|date=January 2019}}

Generalized context-free grammar (GCFG) is a grammar formalism that expands on context-free grammars by adding potentially non-context-free composition functions to rewrite rules.{{cite thesis| type=Ph.D.| author=Weir, David Jeremy| title=Characterizing mildly context-sensitive grammar formalisms| date=Sep 1988| volume=AAI8908403| publisher=University of Pennsylvania Ann Arbor| series=Paper| url=http://www.sussex.ac.uk/Users/davidw/resources/papers/dissertation.pdf}} Head grammar (and its weak equivalents) is an instance of such a GCFG which is known to be especially adept at handling a wide variety of non-CF properties of natural language.

Description

A GCFG consists of two components: a set of composition functions that combine string tuples, and a set of rewrite rules. The composition functions all have the form f(\langle x_1, ..., x_m \rangle, \langle y_1, ..., y_n \rangle, ...) = \gamma, where \gamma is either a single string tuple, or some use of a (potentially different) composition function which reduces to a string tuple. Rewrite rules look like X \to f(Y, Z, ...), where Y, Z, ... are string tuples or non-terminal symbols.

The rewrite semantics of GCFGs is fairly straightforward. An occurrence of a non-terminal symbol is rewritten using rewrite rules as in a context-free grammar, eventually yielding just compositions (composition functions applied to string tuples or other compositions). The composition functions are then applied, successively reducing the tuples to a single tuple.

Example

A simple translation of a context-free grammar into a GCFG can be performed in the following fashion. Given the grammar in ({{EquationNote|1}}), which generates the palindrome language \{ ww^R : w \in \{a, b\}^{*} \}, where w^R is the string reverse of w, we can define the composition function conc as in ({{EquationNote|2a}}) and the rewrite rules as in ({{EquationNote|2b}}).

{{NumBlk|:|S \to \epsilon ~|~ aSa ~|~ bSb|{{EquationRef|1}}}}

{{NumBlk|:|conc(\langle x \rangle, \langle y \rangle, \langle z \rangle) = \langle xyz \rangle|{{EquationRef|2a}}}}

{{NumBlk|:| S \to conc(\langle \epsilon \rangle, \langle \epsilon \rangle, \langle \epsilon \rangle) ~|~ conc(\langle a \rangle, S, \langle a \rangle) ~|~ conc(\langle b \rangle, S, \langle b \rangle)|{{EquationRef|2b}}}}

The CF production of {{not a typo|abbbba}} is

: {{not a typo|S}}

: {{not a typo|aSa}}

: {{not a typo|abSba}}

: {{not a typo|abbSbba}}

: {{not a typo|abbbba}}

and the corresponding GCFG production is

: S \to conc(\langle a \rangle, S, \langle a \rangle)

: conc(\langle a \rangle, conc(\langle b \rangle, S, \langle b \rangle), \langle a \rangle)

: conc(\langle a \rangle, conc(\langle b \rangle, conc(\langle b \rangle, S, \langle b \rangle), \langle b \rangle), \langle a \rangle)

: conc(\langle a \rangle, conc(\langle b \rangle, conc(\langle b \rangle, conc(\langle \epsilon \rangle, \langle \epsilon \rangle, \langle \epsilon \rangle), \langle b \rangle), \langle b \rangle), \langle a \rangle)

: conc(\langle a \rangle, conc(\langle b \rangle, conc(\langle b \rangle, \langle \epsilon \rangle, \langle b \rangle), \langle b \rangle), \langle a \rangle)

: conc(\langle a \rangle, conc(\langle b \rangle, \langle bb \rangle, \langle b \rangle), \langle a \rangle)

: conc(\langle a \rangle, \langle bbbb \rangle, \langle a \rangle)

: \langle abbbba \rangle

Linear Context-free Rewriting Systems (LCFRSs)

Weir (1988) describes two properties of composition functions, linearity and regularity. A function defined as f(x_1, ..., x_n) = ... is linear if and only if each variable appears at most once on either side of the =, making f(x) = g(x, y) linear but not f(x) = g(x, x). A function defined as f(x_1, ..., x_n) = ... is regular if the left hand side and right hand side have exactly the same variables, making f(x, y) = g(y, x) regular but not f(x) = g(x, y) or f(x, y) = g(x).

A grammar in which all composition functions are both linear and regular is called a Linear Context-free Rewriting System (LCFRS). LCFRS is a proper subclass of the GCFGs, i.e. it has strictly less computational power than the GCFGs as a whole.

On the other hand, LCFRSs are strictly more expressive than linear-indexed grammars and their weakly equivalent variant tree adjoining grammars (TAGs).{{cite book|author=Laura Kallmeyer|title=Parsing Beyond Context-Free Grammars|url=https://books.google.com/books?id=F5wC0dko1L4C&pg=PA33|year=2010|publisher=Springer Science & Business Media|isbn=978-3-642-14846-0|page=33}} Head grammar is another example of an LCFRS that is strictly less powerful than the class of LCFRSs as a whole.

LCFRS are weakly equivalent to (set-local) multicomponent TAGs (MCTAGs) and also with multiple context-free grammar (MCFGs [http://www.labri.fr/perso/salvati/downloads/cours/esslli/course1.pdf]).{{cite book|author=Laura Kallmeyer|title=Parsing Beyond Context-Free Grammars|url=https://books.google.com/books?id=F5wC0dko1L4C&pg=PA35|year=2010|publisher=Springer Science & Business Media|isbn=978-3-642-14846-0|page=35-36}} and minimalist grammars (MGs). The languages generated by LCFRS (and their weakly equivalents) can be parsed in polynomial time.{{cite book|author1=Johan F.A.K. van Benthem|author2=Alice ter Meulen|author2-link=Alice ter Meulen|title=Handbook of Logic and Language|url=https://books.google.com/books?id=K7yJLmZCbFUC&pg=PA404|year=2010|publisher=Elsevier|isbn=978-0-444-53727-0|page=404|edition=2nd}}

See also

References

{{Formal languages and grammars}}

{{DEFAULTSORT:Generalized Context-Free Grammar}}

Category:Formal languages

Category:Grammar frameworks