Synchronous context-free grammar

Synchronous context-free grammars (SynCFG or SCFG; not to be confused with stochastic CFGs) are a type of formal grammar designed for use in transfer-based machine translation. Rules in these grammars apply to two languages at the same time, capturing grammatical structures that are each other's translations.

The theory of SynCFGs borrows from syntax-directed transduction and syntax-based machine translation, using correspondences between phrase-structure rules in the source and target languages to model the reordering of clauses that occurs when a sentence is translated. The performance of SCFG-based MT systems has been found comparable with, or even better than, that of state-of-the-art phrase-based machine translation systems.{{cite journal |last1=Chiang |first1=David |year=2007 |title=Hierarchical phrase-based translation |journal=Computational Linguistics |volume=33 |number=2 |pages=201–228 |doi=10.1162/coli.2007.33.2.201 |s2cid=3505719 |doi-access=free}}

Several algorithms exist to perform translation using SynCFGs.{{cite conference |last1=Venugopal |first1=Ashish |last2=Zollmann |first2=Andreas |last3=Vogel |first3=Stephan |year=2007 |title=An efficient two-pass approach to Synchronous-CFG driven statistical MT |book-title=Proc. NAACL HLT |pages=500–507 |url=https://www.aclweb.org/anthology/N07-1063}}

Formalism

Rules in a SynCFG are superficially similar to CFG rules, except that they specify the structure of two phrases at the same time: one in the source language (the language being translated) and one in the target language. Numeric indices indicate correspondences between non-terminals in the two constituent trees. Chiang gives the Chinese/English example:

: {{math|X →}} (yu {{math|X<sub>1</sub>}} you {{math|X<sub>2</sub>}}, have {{math|X<sub>2</sub>}} with {{math|X<sub>1</sub>}})

This rule indicates that an {{mvar|X}} phrase can be formed in Chinese with the structure "yu {{math|X<sub>1</sub>}} you {{math|X<sub>2</sub>}}", where {{math|X<sub>1</sub>}} and {{math|X<sub>2</sub>}} are variables standing in for subphrases, and that the corresponding structure in English is "have {{math|X<sub>2</sub>}} with {{math|X<sub>1</sub>}}", where the subphrases {{math|X<sub>1</sub>}} and {{math|X<sub>2</sub>}} are each translated to English independently.
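
The way the linked indices drive reordering can be illustrated with a minimal Python sketch. It is not taken from any particular decoder: the names <code>SyncRule</code> and <code>realize</code> are hypothetical, integers stand for the indexed non-terminals, and the subphrase pairs are illustrative assumptions rather than part of the rule itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SyncRule:
    """A synchronous rule: one left-hand side, two linked right-hand sides.

    In each right-hand side, a str is a terminal word and an int is the
    index of a linked non-terminal (hypothetical encoding for this sketch).
    """
    lhs: str
    source: tuple
    target: tuple


def realize(side, fillers):
    """Build one side of the rule, replacing each index with its subphrase."""
    words = []
    for symbol in side:
        if isinstance(symbol, int):
            words.append(fillers[symbol])  # linked non-terminal
        else:
            words.append(symbol)           # terminal word
    return " ".join(words)


# The rule from the example above: X -> (yu X1 you X2, have X2 with X1)
RULE = SyncRule(
    lhs="X",
    source=("yu", 1, "you", 2),
    target=("have", 2, "with", 1),
)

# Illustrative subphrase pairs, assumed to have been derived by other rules.
source_fillers = {1: "Beihan", 2: "bangjiao"}
target_fillers = {1: "North Korea", 2: "diplomatic relations"}

source_phrase = realize(RULE.source, source_fillers)
target_phrase = realize(RULE.target, target_fillers)
print(source_phrase)
print(target_phrase)
```

Because both sides share the same indices, a single rule application fixes the word order in both languages at once: the subphrase with index 1 comes first on the source side and last on the target side.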

Software

  • [http://cdec-decoder.org cdec], an MT decoding package that supports SynCFGs
  • [http://joshua-decoder.org Joshua], a machine translation decoding system written in Java

References