Inside–outside algorithm

{{Short description|Parameter estimation method for probabilistic context-free grammars}}

In computer science, the inside–outside algorithm is a way of re-estimating production probabilities in a probabilistic context-free grammar. It was introduced by James K. Baker in 1979 as a generalization of the forward–backward algorithm for parameter estimation on hidden Markov models to stochastic context-free grammars. It is used to compute expectations, for example as part of the expectation–maximization algorithm (an unsupervised learning algorithm).

Inside and outside probabilities

The inside probability \beta_j(p,q) is the total probability of generating words w_p \cdots w_q, given the root nonterminal N^j and a grammar G:{{cite book |first=Christopher D. | last=Manning |author2=Hinrich Schütze |title=Foundations of Statistical Natural Language Processing |url=https://archive.org/details/foundationsstati00mann_118 |url-access=limited |publisher=MIT Press | location=Cambridge, MA, USA | year=1999 |isbn=0-262-13360-1 | pages=[https://archive.org/details/foundationsstati00mann_118/page/n427 388]–402}}

:\beta_j(p,q) = P(w_{pq}|N^j_{pq}, G)

The outside probability \alpha_j(p,q) is the total probability of beginning with the start symbol N^1 and generating the nonterminal N^j_{pq} and all the words outside w_p \cdots w_q, given a grammar G and a sentence of length n:

:\alpha_j(p,q) = P(w_{1(p-1)}, N^j_{pq}, w_{(q+1)n}|G)
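
For example, in a five-word sentence w_1 \cdots w_5, the inside probability \beta_j(2,4) is the probability that N^j derives w_2 w_3 w_4, while the outside probability \alpha_j(2,4) is the probability of deriving w_1 N^j_{24} w_5 from the start symbol. The product \alpha_j(2,4)\beta_j(2,4) is therefore the total probability of all parses of the sentence in which some constituent N^j spans w_2 \cdots w_4.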

Computing inside probabilities

Base case:

:\beta_j(p,p) = P(w_{p}|N^j, G)

General case:

Suppose there is a rule N^j \rightarrow N^r N^s in the grammar. Then the probability of generating w_p \cdots w_q starting with a subtree rooted at N^j is:

:\sum_{k=p}^{q-1} P(N^j \rightarrow N^r N^s)\beta_r(p,k) \beta_s(k+1,q)

The inside probability \beta_j(p,q), for p < q, is then the sum over all such rules:

:\beta_j(p,q) = \sum_{N^r,N^s} \sum_{k=p}^{q-1} P(N^j \rightarrow N^r N^s)\beta_r(p,k) \beta_s(k+1,q)
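
This recurrence can be evaluated bottom-up over span lengths, as in the CYK algorithm. The following is a minimal Python sketch, assuming a grammar in Chomsky normal form encoded as two hypothetical dictionaries: lexical, mapping (nonterminal, word) pairs to the probabilities of rules N^j \rightarrow w_p, and binary, mapping (parent, left, right) triples to the probabilities of rules N^j \rightarrow N^r N^s. The function name and encoding are illustrative, not standard.

<syntaxhighlight lang="python">
from collections import defaultdict

def inside_probabilities(words, lexical, binary):
    """Fill the chart beta[(j, p, q)] for a CNF grammar, with 1-based spans."""
    n = len(words)
    beta = defaultdict(float)

    # Base case: beta_j(p, p) = P(N^j -> w_p)
    for p in range(1, n + 1):
        for (j, w), prob in lexical.items():
            if w == words[p - 1]:
                beta[(j, p, p)] = prob

    # General case: widen spans, summing over binary rules N^j -> N^r N^s
    # and over split points k, exactly as in the recurrence above.
    for length in range(2, n + 1):
        for p in range(1, n - length + 2):
            q = p + length - 1
            for (j, r, s), prob in binary.items():
                for k in range(p, q):
                    beta[(j, p, q)] += prob * beta[(r, p, k)] * beta[(s, k + 1, q)]
    return beta
</syntaxhighlight>

For instance, with binary = {("S", "A", "B"): 1.0} and lexical = {("A", "a"): 1.0, ("B", "b"): 1.0}, calling inside_probabilities(["a", "b"], lexical, binary) yields a chart whose entry ("S", 1, 2) equals 1.0, the probability of the sentence's single parse.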

Computing outside probabilities

Base case:

:\alpha_j(1,n) = \begin{cases} 1 & \mbox{if } j=1 \\ 0 & \mbox{otherwise} \end{cases}

Here the start symbol is N^1.

General case:

Suppose there is a rule N^r \rightarrow N^j N^s in the grammar, with N^j as the left child. Then the left contribution of that rule to the outside probability \alpha_j(p,q) is:

:\sum_{k=q+1}^{n} P(N^r \rightarrow N^j N^s)\alpha_r(p,k) \beta_s(q+1,k)

Now suppose there is a rule N^r \rightarrow N^s N^j in the grammar, with N^j as the right child. Then the right contribution of that rule to the outside probability \alpha_j(p,q) is:

:\sum_{k=1}^{p-1} P(N^r \rightarrow N^s N^j)\alpha_r(k,q) \beta_s(k,p-1)

The outside probability \alpha_j(p,q) is the sum of the left and right contributions over all such rules:

:\alpha_j(p,q) = \sum_{N^r,N^s} \sum_{k=q+1}^{n} P(N^r \rightarrow N^j N^s)\alpha_r(p,k) \beta_s(q+1,k) + \sum_{N^r,N^s} \sum_{k=1}^{p-1} P(N^r \rightarrow N^s N^j)\alpha_r(k,q) \beta_s(k,p-1)
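
Each \alpha_j(p,q) depends only on outside probabilities of strictly wider spans, together with inside probabilities that are already available, so the chart can be filled from the widest span inward. Continuing the sketch above under the same assumed dictionary encoding, with start naming the start symbol N^1:

<syntaxhighlight lang="python">
from collections import defaultdict

def outside_probabilities(words, binary, beta, start="S"):
    """Fill the chart alpha[(j, p, q)] from a completed inside chart beta."""
    n = len(words)
    alpha = defaultdict(float)
    alpha[(start, 1, n)] = 1.0  # base case: 1 for the start symbol, 0 otherwise

    # Work from the widest spans inward; every alpha on the right-hand
    # side covers a strictly wider span and has already been computed.
    for length in range(n - 1, 0, -1):
        for p in range(1, n - length + 2):
            q = p + length - 1
            for (r, left, right), prob in binary.items():
                # Left contribution: N^j is the left child in N^r -> N^j N^s
                for k in range(q + 1, n + 1):
                    alpha[(left, p, q)] += prob * alpha[(r, p, k)] * beta[(right, q + 1, k)]
                # Right contribution: N^j is the right child in N^r -> N^s N^j
                for k in range(1, p):
                    alpha[(right, p, q)] += prob * alpha[(r, k, q)] * beta[(left, k, p - 1)]
    return alpha
</syntaxhighlight>

A useful sanity check is that \sum_j \alpha_j(p,p)\beta_j(p,p), evaluated at any single position p, equals the total probability of the sentence under the grammar.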

References

{{Reflist}}

  • James K. Baker (1979): [https://asa.scitation.org/doi/abs/10.1121/1.2017061 Trainable grammars for speech recognition]. In J. J. Wolf and D. H. Klatt, editors, Speech communication papers presented at the 97th meeting of the Acoustical Society of America, pages 547–550, MIT, Cambridge, MA, June 1979.
  • Karim Lari, Steve J. Young (1990): [http://courses.cs.washington.edu/courses/cse599d1/16sp/lari-young-90.pdf The estimation of stochastic context-free grammars using the inside–outside algorithm]. Computer Speech and Language, 4:35–56.
  • Karim Lari, Steve J. Young (1991): [https://www.sciencedirect.com/science/article/pii/088523089190009F Applications of stochastic context-free grammars using the Inside–Outside algorithm]. Computer Speech and Language, 5:237–257.
  • Fernando Pereira, Yves Schabes (1992): [http://www.aclweb.org/anthology/P92-1017 Inside–outside reestimation from partially bracketed corpora]. Proceedings of the 30th annual meeting on Association for Computational Linguistics, Association for Computational Linguistics, 128–135.