Program synthesis

{{Short description|Task to construct a program meeting a formal specification}}

In computer science, program synthesis is the task to construct a program that provably{{sic|hide=y}} satisfies a given high-level formal specification. In contrast to program verification, the program is to be constructed rather than given; however, both fields make use of formal proof techniques, and both comprise approaches of different degrees of automation. In contrast to automatic programming techniques, specifications in program synthesis are usually non-algorithmic statements in an appropriate logical calculus.{{cite book |last1= Basin |first1= D. |last2= Deville |first2= Y. |last3= Flener |first3 = P. |last4= Hamfelt |first4= A. |last5= Fischer Nilsson |first5= J. |chapter= Synthesis of programs in computational logic |editor=M. Bruynooghe and K.-K. Lau | publisher=Springer | series=LNCS | volume=3049 | pages=30–65 |citeseerx=10.1.1.62.4976 |title= Program Development in Computational Logic | year = 2004 }}

The primary application of program synthesis is to relieve the programmer of the burden of writing correct, efficient code that satisfies a specification. However, program synthesis also has applications to superoptimization and inference of loop invariants.{{harv|Alur|Singh|Fisman}}

Origin

During the Summer Institute of Symbolic Logic at Cornell University in 1957, Alonzo Church defined the problem to synthesize a circuit from mathematical requirements.{{cite journal| author=Alonzo Church| title=Applications of recursive arithmetic to the problem of circuit synthesis| journal=Summaries of the Summer Institute of Symbolic Logic|date=1957| volume=1| pages=3–50}} Even though the work only refers to circuits and not programs, the work is considered to be one of the earliest descriptions of program synthesis and some researchers refer to program synthesis as "Church's Problem". In the 1960s, a similar idea for an "automatic programmer" was explored by researchers in artificial intelligence.{{Citation needed|date=June 2010}}

Since then, various research communities considered the problem of program synthesis. Notable works include the 1969 automata-theoretic approach by Büchi and Landweber,{{cite journal| author=Richard Büchi, Lawrence Landweber| title=Solving Sequential Conditions by Finite-State Strategies| journal=Transactions of the American Mathematical Society|date=Apr 1969| volume=138| pages=295–311| doi=10.2307/1994916| url=http://docs.lib.purdue.edu/cgi/viewcontent.cgi?article=1087&context=cstech| jstor=1994916| url-access=subscription}} and the works by Manna and Waldinger (c. 1980). The development of modern high-level programming languages can also be understood as a form of program synthesis.

21st century developments


The early 21st century has seen a surge of practical interest in the idea of program synthesis in the formal verification community and related fields. Armando Solar-Lezama showed that it is possible to encode program synthesis problems in Boolean logic and use algorithms for the Boolean satisfiability problem to automatically find programs.{{harv|Solar-Lezama}}

= Syntax-guided synthesis =

In 2013, a unified framework for program synthesis problems called Syntax-guided Synthesis (stylized SyGuS) was proposed by researchers at UPenn, UC Berkeley, and MIT.{{cite conference |title =Syntax-guided Synthesis |last1 = Alur |first1 = Rajeev| last2 = al.| first2 = et| date = 2013| publisher = IEEE| book-title = Proceedings of Formal Methods in Computer-Aided Design| pages = 8}} The input to a SyGuS algorithm consists of a logical specification along with a context-free grammar of expressions that constrains the syntax of valid solutions.{{harv|David|Kroening}} For example, to synthesize a function {{var|f}} that returns the maximum of two integers, the logical specification might look like this:

{{math|(f(x,y) {{=}} x ∨ f(x,y) {{=}} y) ∧ f(x,y) ≥ x ∧ f(x,y) ≥ y}}

and the grammar might be:

<Exp> ::= x | y | 0 | 1 | <Exp> + <Exp> | ite(<Cond>, <Exp>, <Exp>)

<Cond> ::= <Exp> <= <Exp>

where "ite" stands for "if-then-else". The expression

ite(x <= y, y, x)

would be a valid solution, because it conforms to the grammar and the specification.
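The search that a SyGuS solver performs can be illustrated with a minimal enumerative sketch in Python. It is not how any particular solver works: it applies each grammar production once to the terminals and tests candidates on a small, hypothetical grid of sample inputs (`SAMPLES`) instead of proving the specification for all integers. The names `spec`, `grow`, and `synthesize` are chosen here for illustration.

```python
from itertools import product

# Assumption: a finite grid of test inputs stands in for "all integers".
SAMPLES = [(x, y) for x in range(-3, 4) for y in range(-3, 4)]

# Terminals of the grammar, kept as (text, function) pairs so that a
# candidate can be both printed and evaluated.
BASE = [
    ("x", lambda x, y: x),
    ("y", lambda x, y: y),
    ("0", lambda x, y: 0),
    ("1", lambda x, y: 1),
]


def spec(f, x, y):
    """The logical specification of the maximum function from the text."""
    v = f(x, y)
    return (v == x or v == y) and v >= x and v >= y


def plus(fa, fb):
    return lambda x, y: fa(x, y) + fb(x, y)


def leq(fa, fb):
    return lambda x, y: fa(x, y) <= fb(x, y)


def ite(fc, fa, fb):
    return lambda x, y: fa(x, y) if fc(x, y) else fb(x, y)


def grow(exps):
    """Apply every non-terminal production once to the known expressions."""
    new = [(f"({ta} + {tb})", plus(fa, fb))
           for (ta, fa), (tb, fb) in product(exps, repeat=2)]
    conds = [(f"{ta} <= {tb}", leq(fa, fb))
             for (ta, fa), (tb, fb) in product(exps, repeat=2)]
    new += [(f"ite({tc}, {ta}, {tb})", ite(fc, fa, fb))
            for (tc, fc), (ta, fa), (tb, fb) in product(conds, exps, exps)]
    return new


def synthesize(max_rounds=1):
    """Return the text of the first candidate that passes all samples."""
    exps = list(BASE)
    frontier = exps
    for round_no in range(max_rounds + 1):
        for text, f in frontier:
            if all(spec(f, x, y) for x, y in SAMPLES):
                return text
        if round_no < max_rounds:
            frontier = grow(exps)
            exps = exps + frontier
    return None


print(synthesize())
```

Because the candidate is only tested on `SAMPLES`, a real solver would still have to verify it against the full specification, for example with an SMT solver.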

From 2014 through 2019, the yearly Syntax-Guided Synthesis Competition (or SyGuS-Comp) compared the different algorithms for program synthesis in a competitive event.[https://sygus-org.github.io/comp/ SyGuS-Comp (Syntax-Guided Synthesis Competition)] The competition used a standardized input format, SyGuS-IF, based on SMT-Lib 2. For example, the following SyGuS-IF encodes the problem of synthesizing the maximum of two integers (as presented above):

    (set-logic LIA)
    (synth-fun f ((x Int) (y Int)) Int
        ((i Int) (c Int) (b Bool))
        ((i Int (c x y (+ i i) (ite b i i)))
         (c Int (0 1))
         (b Bool ((<= i i)))))
    (declare-var x Int)
    (declare-var y Int)
    (constraint (>= (f x y) x))
    (constraint (>= (f x y) y))
    (constraint (or (= (f x y) x) (= (f x y) y)))
    (check-synth)

A compliant solver might return the following output:

((define-fun f ((x Int) (y Int)) Int (ite (<= x y) y x)))

= Counter-example guided inductive synthesis =

Counter-example guided inductive synthesis (CEGIS) is an effective approach to building sound program synthesizers.{{harv|Solar-Lezama}}{{harv|David|Kroening}} CEGIS involves the interplay of two components: a generator which generates candidate programs, and a verifier which checks whether the candidates satisfy the specification.

Given a set of inputs {{var|I}}, a set of possible programs {{var|P}}, and a specification {{var|S}}, the goal of program synthesis is to find a program {{var|p}} in {{var|P}} such that for all inputs {{var|i}} in {{var|I}}, {{var|S}}({{var|p}}, {{var|i}}) holds. CEGIS is parameterized over a generator and a verifier:

The generator takes a set of inputs {{var|T}}, and outputs a candidate program {{var|c}} that is correct on all the inputs in {{var|T}}, that is, a candidate such that for all inputs {{var|t}} in {{var|T}}, {{var|S}}({{var|c}}, {{var|t}}) holds.
The verifier takes a candidate program {{var|c}} and returns true if the program satisfies {{var|S}} on all inputs, and otherwise returns a counterexample, that is, an input {{var|e}} in {{var|I}} such that {{var|S}}({{var|c}}, {{var|e}}) fails.

CEGIS runs the generator and verifier in a loop, accumulating counter-examples:

    algorithm cegis is
        input: Program generator generate,
               verifier verify,
               specification spec
        output: Program that satisfies spec, or failure

        inputs := empty set
        loop
            candidate := generate(spec, inputs)
            if verify(spec, candidate) then
                return candidate
            else verify yields a counterexample e
                add e to inputs
            end if
        end loop

Implementations of CEGIS typically use SMT solvers as verifiers.

CEGIS was inspired by counterexample-guided abstraction refinement (CEGAR).{{harv|Solar-Lezama}}
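The CEGIS loop can be made concrete with a small Python sketch. Everything in it is an illustrative assumption rather than part of any published implementation: the program space is a hand-written list `CANDIDATES`, the input space is a finite grid `DOMAIN`, and the verifier is a brute-force check over that grid standing in for an SMT solver.

```python
def spec(program, inp):
    """S(p, i): the program returns the maximum of the two inputs."""
    x, y = inp
    v = program(x, y)
    return (v == x or v == y) and v >= x and v >= y


# Hypothetical program space P, as (name, function) pairs.
CANDIDATES = [
    ("x", lambda x, y: x),
    ("y", lambda x, y: y),
    ("x + y", lambda x, y: x + y),
    ("ite(x <= y, x, y)", lambda x, y: x if x <= y else y),
    ("ite(x <= y, y, x)", lambda x, y: y if x <= y else x),
]

# Hypothetical finite input space I.
DOMAIN = [(x, y) for x in range(-2, 3) for y in range(-2, 3)]


def generate(spec, inputs):
    """Return the first candidate that is correct on all collected inputs."""
    for name, prog in CANDIDATES:
        if all(spec(prog, t) for t in inputs):
            return name, prog
    return None


def verify(spec, candidate):
    """Return None if the candidate is correct on DOMAIN, else a counterexample."""
    _, prog = candidate
    for i in DOMAIN:
        if not spec(prog, i):
            return i
    return None


def cegis(generate, verify, spec):
    """Run the CEGIS loop; return (program name or None, counterexamples)."""
    inputs = []
    while True:
        candidate = generate(spec, inputs)
        if candidate is None:
            return None, inputs          # failure: program space exhausted
        e = verify(spec, candidate)
        if e is None:
            return candidate[0], inputs  # verified candidate
        inputs.append(e)                 # learn from the counterexample


name, counterexamples = cegis(generate, verify, spec)
print(name, counterexamples)
```

In this run two counterexamples suffice to rule out every incorrect candidate, which illustrates why the generator only needs to consider a small set of inputs rather than the whole input space.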

The framework of Manna and Waldinger

{| class="wikitable" style="float:right;"
|+ Non-clausal resolution rules (unifying substitutions not shown)
! Nr !! Assertions !! Goals !! Program !! Origin
|-
| 51 || $E[p]$ || || ||
|-
| 52 || $F[p]$ || || ||
|-
| 53 || || $G[p]$ || s ||
|-
| 54 || || $H[p]$ || t ||
|-
| 55 || $E[\text{true}] \lor F[\text{false}]$ || || || Resolve(51,52)
|-
| 56 || || $\lnot F[\text{true}] \land G[\text{false}]$ || s || Resolve(52,53)
|-
| 57 || || $\lnot F[\text{false}] \land G[\text{true}]$ || s || Resolve(53,52)
|-
| 58 || || $G[\text{true}] \land H[\text{false}]$ || p ? s : t || Resolve(53,54)
|}

The framework of Manna and Waldinger, published in 1980,{{cite journal| author=Zohar Manna, Richard Waldinger| title=A Deductive Approach to Program Synthesis| journal=ACM Transactions on Programming Languages and Systems|date=Jan 1980| volume=2| pages=90–121| doi=10.1145/357084.357090| s2cid=14770735}}{{cite report | url=https://apps.dtic.mil/dtic/tr/fulltext/u2/a065558.pdf| archive-url=https://web.archive.org/web/20210127052044/https://apps.dtic.mil/dtic/tr/fulltext/u2/a065558.pdf| url-status=live| archive-date=January 27, 2021 | author=Zohar Manna and Richard Waldinger | title=A Deductive Approach to Program Synthesis | institution=SRI International | type=Technical Note | number=177 | date=Dec 1978 }} starts from a user-given first-order specification formula. For that formula, a proof is constructed, thereby also synthesizing a functional program from unifying substitutions.

The framework is presented in a table layout, the columns containing:

A line number ("Nr") for reference purposes
Formulas that already have been established, including axioms and preconditions ("Assertions")
Formulas still to be proven, including postconditions ("Goals"). (The distinction "Assertions" / "Goals" is for convenience only; following the paradigm of proof by contradiction, a Goal $F$ is equivalent to an Assertion $\lnot F$.)
Terms denoting a valid output value ("Program"). (When $F$ and $s$ are the Goal formula and the Program term in a line, respectively, then in all cases where $F$ holds, $s$ is a valid output of the program to be synthesized. This invariant is maintained by all proof rules. An Assertion formula usually is not associated with a Program term.)
A justification for the current line ("Origin")

Initially, background knowledge, pre-conditions, and post-conditions are entered into the table. After that, appropriate proof rules are applied manually. The framework has been designed to enhance human readability of intermediate formulas: contrary to classical resolution, it does not require clausal normal form, but allows one to reason with formulas of arbitrary structure and containing any junctors ("non-clausal resolution"). The proof is complete when $\it true$ has been derived in the Goals column, or, equivalently, $\it false$ in the Assertions column. Programs obtained by this approach are guaranteed to satisfy the specification formula started from; in this sense they are correct by construction (see Manna, Waldinger (1980), p. 100 for correctness of the resolution rules). Only a minimalist, yet Turing-complete,{{cite tech report |last1=Boyer |first1=Robert S. |last2=Moore |first2=J. Strother |title=A Mechanical Proof of the Turing Completeness of Pure Lisp |number=37 |institution=Institute for Computing Science, University of Texas at Austin |date=May 1983 |url=http://www.cs.utexas.edu/users/boyer/ftp/ics-reports/cmp37.pdf |format=PDF |url-status=live |archive-date=22 September 2017 |archive-url=https://web.archive.org/web/20170922044624/http://www.cs.utexas.edu/users/boyer/ftp/ics-reports/cmp37.pdf |df=dmy-all }} purely functional programming language, consisting of conditional, recursion, and arithmetic and other operators, is supported. (Only the conditional operator (?:) is supported from the beginning. However, arbitrary new operators and relations can be added by providing their properties as axioms. In the toy example below, only the properties of $=$ and $\leq$ that are actually needed in the proof have been axiomatized, in lines 1 to 3.)

Case studies performed within this framework synthesized algorithms to compute e.g. division, remainder (Manna, Waldinger (1980), pp. 108-111), square root,{{cite journal | author=Zohar Manna and Richard Waldinger | title=The Origin of a Binary-Search Paradigm | journal=Science of Computer Programming | volume=9 | number=1 | pages=37–83 | date=Aug 1987 | doi=10.1016/0167-6423(87)90025-6 | doi-access= }} term unification,{{cite journal | author=Daniele Nardi | title=Formal Synthesis of a Unification Algorithm by the Deductive-Tableau Method | journal=Journal of Logic Programming | volume=7 | pages=1–43 | year=1989 | doi=10.1016/0743-1066(89)90008-3 | doi-access= }} answers to relational database queries{{cite book | author=Daniele Nardi and Riccardo Rosati | contribution=Deductive Synthesis of Programs for Query Answering | editor=Kung-Kiu Lau and Tim Clement | title=International Workshop on Logic Program Synthesis and Transformation (LOPSTR) | publisher=Springer | series=Workshops in Computing | pages=15–29 | year=1992 | doi=10.1007/978-1-4471-3560-9_2 | isbn=978-3-540-19806-2 }} and several sorting algorithms.{{cite book | author=Jonathan Traugott | contribution=Deductive Synthesis of Sorting Programs | title=Proceedings of the International Conference on Automated Deduction | publisher=Springer | series=LNCS | volume=230 | pages=641–660 | year=1986 }}{{cite journal | author=Jonathan Traugott | title=Deductive Synthesis of Sorting Programs | journal=Journal of Symbolic Computation | volume=7 | number=6 | pages=533–572 | date=Jun 1989 | doi=10.1016/S0747-7171(89)80040-9 | doi-access= }}

=Proof rules=

Proof rules include:

Non-clausal resolution (see table).

:For example, line 55 is obtained by resolving Assertion formulas $E$ from 51 and $F$ from 52, which both share some common subformula $p$. The resolvent is formed as the disjunction of $E$, with $p$ replaced by $\it true$, and $F$, with $p$ replaced by $\it false$. This resolvent logically follows from the conjunction of $E$ and $F$. More generally, $E$ and $F$ need only contain two unifiable subformulas $p_1$ and $p_2$, respectively; their resolvent is then formed from $E \theta$ and $F \theta$ as before, where $\theta$ is the most general unifier of $p_1$ and $p_2$. This rule generalizes resolution of clauses (Manna, Waldinger (1980), p. 99).

:Program terms of parent formulas are combined as shown in line 58 to form the output of the resolvent. In the general case, $\theta$ is applied to the latter also. Since the subformula $p$ appears in the output, care must be taken to resolve only on subformulas corresponding to computable properties.

Logical transformations.

:For example, $E \land (F \lor G)$ can be transformed to $(E \land F) \lor (E \land G)$ in Assertions as well as in Goals, since both are equivalent.

Splitting of conjunctive assertions and of disjunctive goals.

:An example is shown in lines 11 to 13 of the toy example below.

Structural induction.

:This rule allows for synthesis of recursive functions. For a given pre- and postcondition "Given $x$ such that $\textit{pre}(x)$, find $f(x) = y$ such that $\textit{post}(x,y)$", and an appropriate user-given well-ordering $\prec$ of the domain of $x$, it is always sound to add an Assertion "$x' \prec x \land \textit{pre}(x') \implies \textit{post}(x',f(x'))$" (Manna, Waldinger (1980), p. 104). Resolving with this assertion can introduce a recursive call to $f$ in the Program term.

:An example is given in Manna, Waldinger (1980), p.108-111, where an algorithm to compute quotient and remainder of two given integers is synthesized, using the well-order $(n',d') \prec (n,d)$ defined by $0 \leq n' < n$ (p.110).
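The kind of recursive program that the structural induction rule permits can be sketched in Python. The function below is an illustration written for this purpose, not a transcription of the program derived by Manna and Waldinger; the name `divide` and the precondition $n \geq 0$, $d > 0$ are assumptions. The recursive call is made on $(n-d, d)$, which is smaller than $(n, d)$ in the well-order defined by $0 \leq n' < n$, so the induction hypothesis applies to it.

```python
def divide(n, d):
    """Return (q, r) with n == q * d + r and 0 <= r < d.

    Assumed precondition: n >= 0 and d > 0.
    """
    if n < d:
        # Base case: the quotient is 0 and n itself is the remainder.
        return 0, n
    # Recursive case: (n - d, d) precedes (n, d) in the well-order,
    # because 0 <= n - d < n holds whenever n >= d > 0.
    q, r = divide(n - d, d)
    return q + 1, r


print(divide(17, 5))
```

Termination follows from the well-ordering: each recursive call strictly decreases the first argument while keeping it non-negative.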

Murray has shown these rules to be complete for first-order logic (Manna, Waldinger (1980), p. 103, referring to: {{cite tech report| author=Neil V. Murray| title=A Proof Procedure for Quantifier-Free Non-Clausal First Order Logic|date=Feb 1979| number=2-79| institution=Syracuse Univ.| url=http://surface.syr.edu/cgi/viewcontent.cgi?article=1005&context=eecs_techreports}}).

In 1986, Manna and Waldinger added generalized E-resolution and paramodulation rules to handle also equality;{{cite journal| author=Zohar Manna, Richard Waldinger| title=Special Relations in Automated Deduction| journal=Journal of the ACM| volume=33|date=Jan 1986| pages=1–59|doi=10.1145/4904.4905 | s2cid=15140138| doi-access=free}} later, these rules turned out to be incomplete (but nevertheless sound).{{cite book| author=Zohar Manna, Richard Waldinger| chapter=The Special-Relations Rules are Incomplete| title=Proc. CADE 11| year=1992| volume=607| pages=492–506| publisher=Springer| series=LNCS}}
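The basic non-clausal resolution rule for two Assertions (line 55 of the rule table) can be illustrated with a propositional Python sketch. It is a simplification under stated assumptions: formulas are nested tuples, there are no variables and hence no unification, and soundness is checked by brute-force truth tables. All function names are hypothetical.

```python
from itertools import product

# Formulas: True/False, atom names (str), or tuples
# ("not", f), ("and", f, g), ("or", f, g).


def subst(formula, target, value):
    """Replace every occurrence of subformula `target` by the constant `value`."""
    if formula == target:
        return value
    if isinstance(formula, tuple):
        return (formula[0],) + tuple(subst(a, target, value) for a in formula[1:])
    return formula


def evaluate(formula, env):
    """Evaluate a formula under a truth assignment `env` (atom name -> bool)."""
    if isinstance(formula, bool):
        return formula
    if isinstance(formula, str):
        return env[formula]
    op, *args = formula
    vals = [evaluate(a, env) for a in args]
    if op == "not":
        return not vals[0]
    if op == "and":
        return all(vals)
    if op == "or":
        return any(vals)
    raise ValueError(f"unknown operator {op!r}")


def atoms(formula):
    """Collect the atom names occurring in a formula."""
    if isinstance(formula, bool):
        return set()
    if isinstance(formula, str):
        return {formula}
    return set().union(*(atoms(a) for a in formula[1:]))


def resolve_assertions(e, f, p):
    """Resolvent E[true] or F[false] of two Assertions sharing subformula p."""
    return ("or", subst(e, p, True), subst(f, p, False))


def entails(premises, conclusion):
    """Truth-table check that the premises logically imply the conclusion."""
    names = sorted(set().union(*(atoms(x) for x in premises + [conclusion])))
    for values in product([False, True], repeat=len(names)):
        env = dict(zip(names, values))
        if all(evaluate(x, env) for x in premises) and not evaluate(conclusion, env):
            return False
    return True


E = ("or", ("not", "p"), "q")        # not p, or q
F = ("and", ("or", "p", "r"), "s")   # (p or r) and s: not in clausal form
R = resolve_assertions(E, F, "p")
print(R, entails([E, F], R))
```

Here the resolvent is equivalent to $q \lor (r \land s)$; no conversion of $F$ to clausal normal form was needed to obtain it.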

=Example=

{| class="wikitable" style="float:right;"
|+ Example synthesis of maximum function
! Nr !! Assertions !! Goals !! Program !! Origin
|-
| 1 || $A=A$ || || || Axiom
|-
| 2 || $A \leq A$ || || || Axiom
|-
| 3 || $A \leq B \lor B \leq A$ || || || Axiom
|-
| 10 || || $x \leq M \land y \leq M \land (x = M \lor y = M)$ || M || Specification
|-
| 11 || || $(x \leq M \land y \leq M \land x = M) \lor (x \leq M \land y \leq M \land y = M)$ || M || Distr(10)
|-
| 12 || || $x \leq M \land y \leq M \land x = M$ || M || Split(11)
|-
| 13 || || $x \leq M \land y \leq M \land y = M$ || M || Split(11)
|-
| 14 || || $x \leq x \land y \leq x$ || x || Resolve(12,1)
|-
| 15 || || $y \leq x$ || x || Resolve(14,2)
|-
| 16 || || $\lnot (x \leq y)$ || x || Resolve(15,3)
|-
| 17 || || $x \leq y \land y \leq y$ || y || Resolve(13,1)
|-
| 18 || || $x \leq y$ || y || Resolve(17,2)
|-
| 19 || || $\textit{true}$ || x ≤ y ? y : x || Resolve(18,16)
|}

As a toy example, a functional program to compute the maximum $M$ of two numbers $x$ and $y$ can be derived as follows.{{citation needed|date=May 2016}}

Starting from the requirement description "The maximum is larger than or equal to any given number, and is one of the given numbers", the first-order formula $\forall X \forall Y \exists M: X \leq M \land Y \leq M \land (X=M \lor Y=M)$ is obtained as its formal translation. This formula is to be proved. By reverse Skolemization, the specification in line 10 is obtained, an upper- and lower-case letter denoting a variable and a Skolem constant, respectively. (While ordinary Skolemization preserves satisfiability, reverse Skolemization, i.e. replacing universally quantified variables by functions, preserves validity.)

After applying a transformation rule for the distributive law in line 11, the proof goal is a disjunction, and hence can be split into two cases, viz. lines 12 and 13.

Turning to the first case, resolving line 12 with the axiom in line 1 leads to instantiation of the program variable $M$ in line 14. Intuitively, the last conjunct of line 12 prescribes the value that $M$ must take in this case. Formally, the non-clausal resolution rule shown in line 57 above is applied to lines 12 and 1, with

{{color|#008000|{{math|p}}}} being the common instance {{color|#008000|{{math|1=x=x}}}} of {{color|#408000|{{math|1=A=A}}}} and {{color|#008040|{{math|1=x=M}}}}, obtained by syntactically unifying the latter formulas,
{{color|#800000|{{math|F[}}}}{{color|#008000|{{math|p}}}}{{color|#800000|{{math|]}}}} being {{color|#800000|{{math|true ∧ }}}}{{color|#008000|{{math|1=x=x}}}}, obtained from instantiated line 1 (appropriately padded to make the context {{color|#800000|{{math|F[⋅]}}}} around {{color|#008000|{{math|p}}}} visible), and
{{color|#000080|{{math|G[}}}}{{color|#008000|{{math|p}}}}{{color|#000080|{{math|]}}}} being {{color|#000080|{{math|x ≤ x ∧ y ≤ x ∧ }}}}{{color|#008000|{{math|1=x = x}}}}, obtained from instantiated line 12,

yielding

$\lnot ($ {{color|#800000|{{math|true ∧ }}}}{{color|#008000|{{math|false}}}}{{math|) ∧ (}}{{color|#000080|{{math|x ≤ x ∧ y ≤ x ∧ }}}}{{color|#008000|{{math|true}}}} $)$ ,

which simplifies to $x \leq x \land y \leq x$ .

In a similar way, line 14 yields line 15 and then line 16 by resolution. Also, the second case, $x \leq M \land y \leq M \land y = M$ in line 13, is handled similarly, yielding eventually line 18.

In a last step, both cases (i.e. lines 16 and 18) are joined, using the resolution rule from line 58; to make that rule applicable, the preparatory step 15→16 was needed. Intuitively, line 18 could be read as "in case $x \leq y$, the output $y$ is valid (with respect to the original specification)", while line 15 says "in case $y \leq x$, the output $x$ is valid"; the step 15→16 established that both cases 16 and 18 are complementary. (Axiom 3 was needed for that; in fact, if $\leq$ were not a total order, no maximum could be computed for incomparable inputs $x,y$.) Since both lines 16 and 18 come with a program term, a conditional expression results in the program column. Since the goal formula $\textit{true}$ has been derived, the proof is done, and the program column of the "$\textit{true}$" line contains the program.
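The invariant that links Goal formulas to Program terms can be checked on this example with a short Python sketch. It is only a bounded test over a hypothetical range of integers, not a proof, and the function names are chosen here for illustration: whenever the Goal of line 16 or line 18 holds, the corresponding Program term must satisfy the specification of line 10, and the conditional joining both cases must satisfy it everywhere.

```python
def post(x, y, m):
    """Specification from line 10 with M instantiated to the output m."""
    return x <= m and y <= m and (x == m or y == m)


def synthesized_max(x, y):
    """Conditional program obtained by joining the two cases."""
    return y if x <= y else x


def check(lo=-4, hi=5):
    """Bounded check of the Goal/Program invariant on range(lo, hi)."""
    for x in range(lo, hi):
        for y in range(lo, hi):
            # Line 18: if x <= y holds, the output y is valid.
            if x <= y and not post(x, y, y):
                return False
            # Line 16: if not (x <= y) holds, the output x is valid.
            if not (x <= y) and not post(x, y, x):
                return False
            # Final line: the goal is true, so the program is always valid.
            if not post(x, y, synthesized_max(x, y)):
                return False
    return True


print(check())
```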

References

{{Cite journal |last1=David |first1=Cristina |last2=Kroening |first2=Daniel |date=2017-10-13 |title=Program synthesis: challenges and opportunities |journal=Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences |language=en |volume=375 |issue=2104 |pages=20150403 |doi=10.1098/rsta.2015.0403 |issn=1364-503X |pmc=5597726 |pmid=28871052|bibcode=2017RSPTA.37550403D }}
{{Cite journal |last1=Alur |first1=Rajeev |last2=Singh |first2=Rishabh |last3=Fisman |first3=Dana |last4=Solar-Lezama |first4=Armando |date=2018-11-20 |title=Search-based program synthesis |url=https://doi.org/10.1145/3208071 |journal=Communications of the ACM |volume=61 |issue=12 |pages=84–93 |doi=10.1145/3208071 |issn=0001-0782|url-access=subscription }}
{{cite journal| author=Zohar Manna, Richard Waldinger| title=Knowledge and Reasoning in Program Synthesis| journal=Artificial Intelligence| year=1975| volume=6| issue=2| pages=175–208| doi=10.1016/0004-3702(75)90008-9}}
{{cite thesis |last=Solar-Lezama |first=Armando |date=2008 |title=Program synthesis by sketching |type=Ph.D. |publisher=University of California, Berkeley |url=https://people.csail.mit.edu/asolar/papers/thesis.pdf}}
