Reduction strategy#Optimal reduction

{{Short description|Relation specifying a rewrite for each object, compatible with a reduction relation}}

In rewriting, a reduction strategy or rewriting strategy is a relation specifying a rewrite for each object or term, compatible with a given reduction relation. Some authors use the term to refer to an evaluation strategy.{{cite journal |last1=Selinger |first1=Peter |last2=Valiron |first2=Benoît |title=Quantum Lambda Calculus |journal=Semantic Techniques in Quantum Computation |date=2009 |page=23 |doi=10.1017/CBO9781139193313.005 |isbn=9780521513746 |url=https://www.mscs.dal.ca/~selinger/papers/qlambdabook.pdf |access-date=21 August 2021}}

Definitions

Formally, for an abstract rewriting system (A, \to), a reduction strategy \to_S is a binary relation on A with \to_S \subseteq \overset{+}{\to} , where \overset{+}{\to} is the transitive closure of \to (but not the reflexive closure).{{cite book |first1=Hélène |last1=Kirchner |editor-last1=Martí-Oliet |editor-first1=Narciso |editor-last2=Ölveczky |editor-first2=Peter Csaba |editor-last3=Talcott |editor-first3=Carolyn|chapter=Rewriting Strategies and Strategic Rewrite Programs |title=Logic, Rewriting, and Concurrency: Essays Dedicated to José Meseguer on the Occasion of His 65th Birthday |date=26 August 2015 |publisher=Springer |isbn=978-3-319-23165-5 |chapter-url=https://hal.inria.fr/hal-01143486v2/document | url=https://books.google.com/books?id=6lx1CgAAQBAJ&dq=H%C3%A9l%C3%A8ne%20Kirchner.%20Rewriting%20Strategies%20and%20Strategic%20Rewrite%20Programs&pg=PA380 |access-date=14 August 2021 |language=en}} In addition the normal forms of the strategy must be the same as the normal forms of the original rewriting system, i.e. for all a, there exists a b with a\to b iff \exists b'. a\to_S b'.{{cite book |last1=Klop |first1=Jan Willem |last2=van Oostrom |first2=Vincent |last3=van Raamsdonk |first3=Femke |title=Rewriting, Computation and Proof |chapter=Reduction Strategies and Acyclicity |series=Lecture Notes in Computer Science |date=2007 |volume=4600 |pages=89–112 |doi=10.1007/978-3-540-73147-4_5|isbn=978-3-540-73146-7 |citeseerx=10.1.1.104.9139|chapter-url=https://www.cs.vu.nl/~femke/ps/jpj60.pdf#page=6}}

A one step reduction strategy is one where \to_S \subseteq \to. Otherwise it is a many step strategy.{{cite web |last1=Klop |first1=J. W. |title=Term Rewriting Systems |url=http://www.cs.tau.ac.il/~nachum/papers/klop.pdf |website=Papers by Nachum Dershowitz and students |publisher=Tel Aviv University |access-date=14 August 2021 |page=77}}

A deterministic strategy is one where \to_S is a partial function, i.e. for each a\in A there is at most one b such that a \to_S b. Otherwise it is a nondeterministic strategy.

Term rewriting

In a term rewriting system a rewriting strategy specifies, out of all the reducible subterms (redexes), which one should be reduced (contracted) within a term.

One-step strategies for term rewriting include:

  • leftmost-innermost: in each step the leftmost of the innermost redexes is contracted, where an innermost redex is a redex not containing any redexes{{cite web |last1=Horwitz |first1=Susan B. |title=Lambda Calculus |url=http://pages.cs.wisc.edu/~horwitz/CS704-NOTES/1.LAMBDA-CALCULUS.html |website=CS704 Notes |publisher=University of Wisconsin Madison |access-date=19 August 2021}}
  • leftmost-outermost: in each step the leftmost of the outermost redexes is contracted, where an outermost redex is a redex not contained in any redexes
  • rightmost-innermost, rightmost-outermost: similarly

Many-step strategies include:

  • parallel-innermost: reduces all innermost redexes simultaneously. This is well-defined because the redexes are pairwise disjoint.
  • parallel-outermost: similarly
  • Gross-Knuth reduction,{{cite conference |last1=Barendregt |first1=H. P. |last2=Eekelen |first2=M. C. J. D. |last3=Glauert |first3=J. R. W. |last4=Kennaway |first4=J. R. |last5=Plasmeijer |first5=M. J. |last6=Sleep |first6=M. R. |title=Term graph rewriting |conference=Parallel Architectures and Languages Europe |date=1987 |volume=259 |pages=141–158 |doi=10.1007/3-540-17945-3_8}} also called full substitution or Kleene reduction: all redexes in the term are simultaneously reduced

Parallel outermost and Gross-Knuth reduction are hypernormalizing for all almost-orthogonal term rewriting systems, meaning that these strategies will eventually reach a normal form if it exists, even when performing (finitely many) arbitrary reductions between successive applications of the strategy.{{cite journal |last1=Antoy |first1=Sergio |last2=Middeldorp |first2=Aart |title=A sequential reduction strategy |journal=Theoretical Computer Science |date=September 1996 |volume=165 |issue=1 |pages=75–95 |doi=10.1016/0304-3975(96)00041-2 |url=https://web.cecs.pdx.edu/~antoy/homepage/publications/tcs96/paper.pdf |access-date=8 September 2021}}

Stratego is a domain-specific language designed specifically for programming term rewriting strategies.{{cite journal |last1=Kieburtz |first1=Richard B. |title=A Logic for Rewriting Strategies |journal=Electronic Notes in Theoretical Computer Science |date=November 2001 |volume=58 |issue=2 |pages=138–154 |doi=10.1016/S1571-0661(04)00283-X |doi-access=free }}

Lambda calculus

{{See also|Lambda calculus#Reduction strategies}}

In the context of the lambda calculus, normal-order reduction refers to leftmost-outermost reduction in the sense given above. Normal-order reduction is normalizing, in the sense that if a term has a normal form, then normal‐order reduction will eventually reach it, hence the name normal. This is known as the standardization theorem.{{cite book|last1=Curry|first1=Haskell B.|author-link1=Haskell Curry|last2=Feys|first2=Robert|author-link2=Robert Feys|title=Combinatory Logic|volume=I|year=1958|publisher=North Holland|location=Amsterdam|isbn=0-7204-2208-6|pages=139–142}}{{cite web |last1=Kashima |first1=Ryo |title=A Proof of the Standardization Theorem in λ-Calculus |url=http://www.is.titech.ac.jp/~kashima/pub/C-145.pdf |publisher=Tokyo Institute of Technology |access-date=19 August 2021}}

Leftmost reduction is sometimes used to refer to normal order reduction, as with a pre-order traversal the notions coincide, and similarly the leftmost-outermost redex is the redex with leftmost starting character when the lambda term is considered as a string of characters.{{cite thesis |type=PhD |last=Vial |first=Pierre |date=7 December 2017 |title=Non-Idempotent Typing Operators, beyond the λ-Calculus |publisher=Sorbonne Paris Cité |url=https://pierrevial.github.io/these-vial.pdf | page=62}}{{cite thesis |last1=Partain |first1=William D. |title=Graph Reduction Without Pointers |url=http://www.cs.unc.edu/techreports/89-045.pdf#page=29|type=PhD|date=December 1989 |publisher=University of North Carolina at Chapel Hill |access-date=10 January 2022}} When "leftmost" is defined using an in-order traversal the notions are distinct. For example, in the term (\lambda x. x \Omega) (\lambda y. I) with \Omega, I defined here, the leftmost redex of the in-order traversal is \Omega while the leftmost-outermost redex is the entire expression.{{cite conference |last1=Van Oostrom |first1=Vincent |last2=Toyama |first2=Yoshihito |title=Normalisation by Random Descent |conference=1st International Conference on Formal Structures for Computation and Deduction |date=2016 |page=32:3 |doi=10.4230/LIPIcs.FSCD.2016.32 |doi-access=free |url=https://drops.dagstuhl.de/opus/volltexte/2016/5986/pdf/LIPIcs-FSCD-2016-32.pdf}}

Applicative order reduction refers to leftmost-innermost reduction.{{cite book |last1=Mazzola |first1=Guerino |last2=Milmeister |first2=Gérard |last3=Weissmann |first3=Jody |title=Comprehensive Mathematics for Computer Scientists 2 |date=21 October 2004 |publisher=Springer Science & Business Media |isbn=978-3-540-20861-7 |page=323 |url=https://books.google.com/books?id=SMWx5fQZcEUC&dq=applicative%20order%20normal%20order&pg=PA323 |language=en}}

In contrast to normal order, applicative order reduction may not terminate, even when the term has a normal form. For example, using applicative order reduction, the following sequence of reductions is possible:

\begin{align}

&(\mathbf{\lambda} x . z) ((\lambda w. w w w) (\lambda w. w w w)) \\

\rightarrow &(\lambda x . z) ((\lambda w. w w w) (\lambda w. w w w) (\lambda w. w w w))\\

\rightarrow &(\lambda x . z) ((\lambda w. w w w) (\lambda w. w w w) (\lambda w. w w w) (\lambda w. w w w))\\

\rightarrow &(\lambda x . z) ((\lambda w. w w w) (\lambda w. w w w) (\lambda w. w w w) (\lambda w. w w w) (\lambda w. w w w))\\

&\ldots

\end{align}

But using normal-order reduction, the same starting point reduces quickly to normal form:

(\mathbf{\lambda} x . z) ((\lambda w. w w w) (\lambda w. w w w))

\rightarrow z

Full β-reduction refers to the nondeterministic one-step strategy that allows reducing any redex at each step.{{cite book |last=Pierce |first=Benjamin C. |author-link=Benjamin C. Pierce |title=Types and Programming Languages |year=2002 |publisher=MIT Press |isbn=0-262-16209-1 | page=56|url=https://books.google.com/books?id=ti6zoAC9Ph8C&pg=PA56}} Takahashi's parallel β-reduction is the strategy that reduces all redexes in the term simultaneously.{{cite journal |last1=Takahashi |first1=M. |title=Parallel Reductions in λ-Calculus |journal=Information and Computation |date=April 1995 |volume=118 |issue=1 |pages=120–127 |doi=10.1006/inco.1995.1057 |doi-access=free }}

= Weak reduction =

Normal and applicative order reduction are strong in that they allow reduction under lambda abstractions. In contrast, weak reduction does not reduce under a lambda abstraction.{{cite book |last1=Blanc |first1=Tomasz |last2=Lévy |first2=Jean-Jacques |last3=Maranget |first3=Luc |title=Processes, Terms and Cycles: Steps on the Road to Infinity: Essays Dedicated to Jan Willem Klop on the Occasion of His 60th Birthday |date=2005 |publisher=Springer |isbn=978-3-540-32425-6 |pages=70–87 |language=en |chapter=Sharing in the Weak Lambda-Calculus|doi=10.1007/11601548_7|citeseerx=10.1.1.129.147}} Call-by-name reduction is the weak reduction strategy that reduces the leftmost outermost redex not inside a lambda abstraction, while call-by-value reduction is the weak reduction strategy that reduces the leftmost innermost redex not inside a lambda abstraction. These strategies were devised to reflect the call-by-name and call-by-value evaluation strategies.{{Cite book | last = Sestoft | first = Peter | chapter = Demonstrating Lambda Calculus Reduction | editor1-last = Mogensen | editor1-first = T | editor2-last = Schmidt | editor2-first = D | editor3-last = Sudborough | editor3-first = I. H. | publisher = Springer-Verlag | title = The Essence of Computation: Complexity, Analysis, Transformation. Essays Dedicated to Neil D. Jones | series = Lecture Notes in Computer Science | volume = 2566 | year = 2002 | pages = 420–435 | chapter-url = http://www.itu.dk/people/sestoft/papers/sestoft-lamreduce.pdf | isbn = 3-540-00326-6}} In fact, applicative order reduction was also originally introduced to model the call-by-value parameter passing technique found in Algol 60 and modern programming languages. When combined with the idea of weak reduction, the resulting call-by-value reduction is indeed a faithful approximation.{{cite book |last1=Felleisen |first1=Matthias |title=Semantics engineering with PLT Redex |date=2009 |publisher=MIT Press |location=Cambridge, Mass. |isbn=978-0262062756 | page=42}}

Unfortunately, weak reduction is not confluent, and the traditional reduction equations of the lambda calculus are useless, because they suggest relationships that violate the weak evaluation regime. However, it is possible to extend the system to be confluent by allowing a restricted form of reduction under an abstraction, in particular when the redex does not involve the variable bound by the abstraction. For example, {{Mono|λx.(λy.x)z}} is in normal form for a weak reduction strategy because the redex {{Mono|(λy.x)z}} is contained in a lambda abstraction. But the term {{Mono|λx.(λy.y)z}} can still be reduced under the extended weak reduction strategy, because the redex {{Mono|(λy.y)z}} does not refer to {{Mono|x}}.{{cite conference |last1=Sestini |first1=Filippo |title=Normalization by Evaluation for Typed Weak lambda-Reduction |conference=24th International Conference on Types for Proofs and Programs (TYPES 2018) |date=2019 |doi=10.4230/LIPIcs.TYPES.2018.6 |doi-access=free |url=https://drops.dagstuhl.de/opus/volltexte/2019/11410/pdf/LIPIcs-TYPES-2018-6.pdf}}

= Optimal reduction =

Optimal reduction is motivated by the existence of lambda terms where there does not exist a sequence of reductions which reduces them without duplicating work. For example, consider

((λg.(g(g(λx.x))))

(λh.((λf.(f(f(λz.z))))

(λw.(h(w(λy.y)))))))

It is composed of three nested terms, {{Mono|1=x=((λg. ... ) (λh.y))}}, {{Mono|1=y=((λf. ...) (λw.z) )}}, and {{Mono|1=z=λw.(h(w(λy.y)))}}. There are only two possible β-reductions to be done here, on x and on y. Reducing the outer x term first results in the inner y term being duplicated, and each copy will have to be reduced, but reducing the inner y term first will duplicate its argument z, which will cause work to be duplicated when the values of h and w are made known.{{efn|Incidentally, the above term reduces to the identity function {{Mono|(λy.y)}}, and is constructed by making wrappers which make the identity function available to the binders {{Mono|1=g=λh...}}, {{Mono|1=f=λw...}}, {{Mono|1=h=λx.x}} (at first), and {{Mono|1=w=λz.z}} (at first), all of which are applied to the innermost term {{Mono|λy.y}}.}}

Optimal reduction is not a reduction strategy for the lambda calculus in a narrow sense because performing β-reduction loses the information about the substituted redexes being shared. Instead it is defined for the labelled lambda calculus, an annotated lambda calculus which captures a precise notion of the work that should be shared.{{cite book |last1=Asperti |first1=Andrea |last2=Guerrini |first2=Stefano |title=The optimal implementation of functional programming languages |date=1998 |publisher=Cambridge University Press |location=Cambridge, UK |isbn=0521621127}}{{rp|113–114}}

Labels consist of a countably infinite set of atomic labels, and concatenations ab, overlinings \overline{a} and underlinings \underline{a} of labels. A labelled term is a lambda calculus term where each subterm has a label. The standard initial labeling of a lambda term gives each subterm a unique atomic label.{{rp|132}} Labelled β-reduction is given by:{{cite journal |last1=Fernández |first1=Maribel |last2=Siafakas |first2=Nikolaos |title=Labelled Lambda-calculi with Explicit Copy and Erase |journal=Electronic Proceedings in Theoretical Computer Science |date=30 March 2010 |volume=22 |pages=49–64 |doi=10.4204/EPTCS.22.5 |arxiv=1003.5515v1 |s2cid=15500633 }}

((\lambda x. M )^\alpha N)^\beta \to \beta \overline{\alpha} \cdot M[x \mapsto \underline{\alpha}\cdot N]

where \cdot concatenates labels, \beta\cdot T^\alpha = T^{\beta\alpha}, and substitution M[x \mapsto N] is defined as follows (using the Barendregt convention):

\begin{align}

x^{\alpha}[x\mapsto N] & = \alpha\cdot N & \quad (\lambda y.M)^{\alpha}[x\mapsto N] & = (\lambda y.M[x\mapsto N])^{\alpha} \\

y^{\alpha}[x\mapsto N] & = y^{\alpha} & \quad (MN)^{\alpha}[x\mapsto P] & = (M[x\mapsto P]N[x\mapsto P])^{\alpha}

\end{align}

The system can be proven to be confluent. Optimal reduction is then defined to be normal order or leftmost-outermost reduction using reduction by families, i.e. the parallel reduction of all redexes with the same function part label.{{cite conference |last1=Lévy |first1=Jean-Jacques |title=Sharing in the Evaluation of lambda Expressions|conference=Second Franco-Japanese Symposium on Programming of Future Generation Computers|location=Cannes, France|date=9–11 November 1987|url=http://pauillac.inria.fr/~levy/pubs/88icot.pdf|page=187|isbn=0444705260}} The strategy is optimal in the sense that it performs the optimal (minimal) number of family reduction steps.{{cite book |author1=Terese |title=Term rewriting systems |date=2003 |publisher=Cambridge University Press |location=Cambridge, UK |isbn=978-0-521-39115-3 |page=518}}

A practical algorithm for optimal reduction was first described in 1989,{{cite conference |last1=Lamping |first1=John |title=An algorithm for optimal lambda calculus reduction |conference=17th ACM SIGPLAN-SIGACT symposium on Principles of programming languages - POPL '90 |date=1990 |pages=16–30 |doi=10.1145/96709.96711|url=https://thelackthereof.org/docs/library/unsorted/programming/lambda_calc/p16-lamping.pdf}} more than a decade after optimal reduction was first defined in 1974.{{cite thesis |last=Lévy |first=Jean-Jacques |date=June 1974 |title=Réductions sures dans le lambda-calcul |type=PhD |publisher=Université Paris VII |oclc=476040273 |url=http://pauillac.inria.fr/~levy/pubs/74phd-cycle3.pdf |access-date=17 August 2021 |language=fr|pages=81–109 }} The Bologna Optimal Higher-order Machine (BOHM) is a prototype implementation of an extension of the technique to interaction nets.{{rp|362}}{{cite web |last1=Asperti |first1=Andrea |title=Bologna Optimal Higher-Order Machine, Version 1.1 |url=https://github.com/asperti/BOHM1.1/ |website=GitHub |language=en}} Lambdascope is a more recent implementation of optimal reduction, also using interaction nets.{{cite conference|first1=Vincent|last1=van Oostrom|first2=Kees-Jan|last2=van de Looij|first3=Marijn |last3=Zwitserlood|title=] (Lambdascope): Another optimal implementation of the lambda-calculus|conference=Workshop on Algebra and Logic on Programming Systems (ALPS)|date=2004|url=http://www.phil.uu.nl/~oostrom/publication/pdf/lambdascope.pdf}}{{efn|A summary of recent research on optimal reduction can be found in the short article [https://arxiv.org/abs/1701.04240v1 About the efficient reduction of lambda terms].}}

= Call by need reduction =

Call by need reduction can be defined similarly to optimal reduction as weak leftmost-outermost reduction using parallel reduction of redexes with the same label, for a slightly different labelled lambda calculus. An alternate definition changes the beta rule to an operation that finds the next "needed" computation, evaluates it, and substitutes the result into all locations. This requires extending the beta rule to allow reducing terms that are not syntactically adjacent.{{cite book |last1=Chang |first1=Stephen |last2=Felleisen |first2=Matthias |title=Programming Languages and Systems |chapter=The Call-by-Need Lambda Calculus, Revisited |series=Lecture Notes in Computer Science |date=2012 |volume=7211 |pages=128–147 |doi=10.1007/978-3-642-28869-2_7 |isbn=978-3-642-28868-5 |s2cid=6350826 |chapter-url=https://www2.ccs.neu.edu/racket/pubs/esop12-cf.pdf}} As with call-by-name and call-by-value, call-by-need reduction was devised to mimic the behavior of the evaluation strategy known as "call-by-need" or lazy evaluation.

See also

Notes

{{notelist}}

References

{{Reflist}}