compiler correctness

In computing, compiler correctness is a subfield of computer science focused on proving that a compiler's output program preserves the semantics defined by the source language's specification.{{cite journal |last=Leroy |first=Xavier |title=Formal verification of a realistic compiler |journal=Communications of the ACM |volume=52 |issue=7 |year=2009 |pages=107–115 |doi=10.1145/1538788.1538814}} Techniques include developing the compiler using formal methods and using rigorous testing (often called compiler validation) on an existing compiler.

Formal verification

Formal verification of compiler correctness involves providing a machine-checked proof that a compiler preserves the semantics of the source program through all compilation phases. Unlike testing, which only samples behavior on specific inputs, formal verification ensures semantic preservation for all valid programs.

A prominent example is CompCert, a formally verified optimizing compiler for a large subset of the C programming language. CompCert uses the Coq proof assistant to ensure that for any source program S and compiled code C, if CompCert compiles S successfully, then every observable behavior of C corresponds to an allowed behavior of S.{{cite journal |last=Leroy |first=Xavier |title=Formal verification of a realistic compiler |journal=Communications of the ACM |volume=52 |issue=7 |year=2009 |pages=107–115 |doi=10.1145/1538788.1538814}}

The compiler is structured into multiple verified passes, including transformations from Clight (a simplified C subset) to intermediate languages and then to assembly code. Each transformation stage is proven to preserve semantics using simulation or bisimulation arguments.{{cite web |title=CompCert C Verified Compiler |url=https://compcert.org/ |website=CompCert |access-date=25 June 2025}}

CompCert’s correctness relies on:

A formal operational semantics for Clight, which models undefined behaviors and the execution model of C programs.{{cite web |title=The CompCert C Verified Compiler – Reference Manual |url=https://compcert.org/man/manual001.html |website=CompCert |access-date=25 June 2025}}
Formal semantics for each internal language.
Machine-checked proofs in Coq verifying that the semantics is preserved across each pass.{{cite web |last=Leroy |first=Xavier |title=A formally verified compiler back-end |url=https://xavierleroy.org/publi/compcert-CACM.pdf |website=Inria |access-date=25 June 2025}}

Empirical evaluation has shown that CompCert is significantly more reliable than mainstream C compilers: in testing with the Csmith program generator, CompCert was the only compiler for which no wrong-code bugs were found.{{cite web |title=CompCert Performance and Safety |url=https://www.absint.com/compcert/ |website=AbsInt |access-date=25 June 2025}}

Other formal compiler projects include:

The CakeML project, which verifies a compiler for a functional programming language all the way to machine code.{{cite journal |last=Kumar |first=Ramana |title=Compiling Higher-Order Programs to Machine Code |journal=ACM SIGPLAN Notices |volume=49 |issue=9 |year=2014 |pages=1–12 |doi=10.1145/2660193.2628144}}
Œuf, a verified compiler that translates a subset of Gallina (the language of Coq) to x86 assembly.{{cite web |title=Oeuf: A small proof-producing compiler |url=https://github.com/oeuf-lang/oeuf |website=GitHub |access-date=25 June 2025}}

= Advantages =

Eliminates miscompilation: verified compilers provably do not introduce bugs absent in the source code.
Verified compilation allows properties proved about source programs (e.g., safety or termination) to remain valid after compilation.
The formalization process often leads to improved understanding of programming language semantics and compiler design.

= Challenges =

Formal verification of compilers is labor-intensive and requires deep expertise in formal methods.
Verified compilers often support only subsets of full programming languages, limiting adoption for production-level software.
Changes in source or target language semantics may require re-verification of all compiler stages.

= Relation to other formal methods =

Compiler verification is part of the broader field of formal methods, which also includes model checking, abstract interpretation, and deductive verification using tools like Coq, Isabelle, and ACL2. Verified compilers help secure the toolchains used in critical systems, where correctness is essential.

Testing

Testing represents a significant portion of the effort in shipping a compiler, but receives comparatively little coverage in the standard literature. The 1986 edition of Aho, Sethi, & Ullman has a single-page section on compiler testing, with no named examples.Compilers: Principles, Techniques and Tools, infra 1986, p. 731. The 2006 edition omits the section on testing, but does emphasize its importance: “Optimizing compilers are so difficult to get right that we dare say that no optimizing compiler is completely error-free! Thus, the most important objective in writing a compiler is that it is correct.”ibid, 2006, p. 16.

Fraser & Hanson 1995 has a brief section on regression testing; source code is available.{{cite book | author= Christopher Fraser | author2= David Hanson | title= A Retargetable C compiler: Design and Implementation | publisher= Benjamin/Cummings Publishing | year= 1995 | isbn= 978-0-8053-1670-4 | url-access= registration | url= https://archive.org/details/retargetableccom00fras }}, pp. 531–3.

Bailey & Davidson 2003 cover testing of procedure calls{{cite journal |author=Mark W. Bailey |author2=Jack W. Davidson |title=Automatic Detection and Diagnosis of Faults in Generated Code for Procedure Calls |journal=IEEE Transactions on Software Engineering |volume=29 |issue=11 |year=2003 |url=http://big-oh.cs.hamilton.edu/~bailey/pubs/techreps/TR-2001-1.pdf |access-date=2009-03-24 |doi=10.1109/tse.2003.1245304 |pages=1031–1042 |citeseerx=10.1.1.15.4829 |archive-url=https://web.archive.org/web/20030428121512/http://big-oh.cs.hamilton.edu/~bailey/pubs/techreps/TR-2001-1.pdf |archive-date=2003-04-28 |url-status=dead }}, p. 1040.

A number of articles confirm that many released compilers have significant code-correctness bugs.E.g., {{cite conference |author= Christian Lindig |title= Random testing of C calling conventions |book-title= Proceedings of the Sixth International Workshop on Automated Debugging |publisher= ACM |isbn= 1-59593-050-7 |year= 2005 |url= http://quest-tester.googlecode.com/svn/trunk/doc/lindig-aadebug-2005.pdf |access-date= 2009-03-24 |archive-url= https://web.archive.org/web/20110711112027/http://quest-tester.googlecode.com/svn/trunk/doc/lindig-aadebug-2005.pdf |archive-date= 2011-07-11 |url-status= dead }}, and

{{cite conference |author= Eric Eide |author2= John Regehr |title= Volatiles are miscompiled, and what to do about it |book-title= Proceedings of the 7th ACM international conference on Embedded software |publisher= ACM |year= 2008 |isbn= 978-1-60558-468-3 |url= http://www.cs.utah.edu/~regehr/papers/emsoft08-preprint.pdf |access-date= 2009-03-24 }}

Sheridan 2007 is probably the most recent journal article on general compiler testing.{{cite journal | author= Flash Sheridan |year= 2007 |title= Practical Testing of a C99 Compiler Using Output Comparison |journal=Software: Practice and Experience |volume=37 |issue=14 |pages=1475–1488 |doi=10.1002/spe.812 |arxiv= 2202.07390 |s2cid= 9752084 |url=http://pobox.com/~flash/Practical_Testing_of_C99.pdf |access-date = 2009-03-24}} Bibliography at {{Cite web | url=http://pobox.com/~flash/compiler_testing_bibliography.html | title = Compiler Testing Bibliography|access-date= 2009-03-13}}. For most purposes, the largest body of information available on compiler testing are the Fortran{{Cite web | url=http://www.itl.nist.gov/div897/ctg/fortran_form.htm | title=Source of Fortran validation suite | access-date= 2011-09-01}} and Cobol{{Cite web | url=http://www.itl.nist.gov/div897/ctg/cobol_form.htm | title=Source of Cobol validation suite | access-date= 2011-09-01}} validation suites.

Further common techniques when testing compilers are fuzzing{{Cite book|last1=Chen|first1=Yang|last2=Groce|first2=Alex|last3=Zhang|first3=Chaoqiang|last4=Wong|first4=Weng-Keen|last5=Fern|first5=Xiaoli|last6=Eide|first6=Eric|last7=Regehr|first7=John|title=Proceedings of the 34th ACM SIGPLAN Conference on Programming Language Design and Implementation |chapter=Taming compiler fuzzers |date=2013|series=PLDI '13|location=New York, NY, USA|publisher=ACM|pages=197–208|doi=10.1145/2491956.2462173|isbn=9781450320146|citeseerx=10.1.1.308.5541|s2cid=207205614 }} (which generates random programs to try to find bugs in a compiler) and test case reduction (which tries to minimize a found test case to make it easier to understand).{{Cite book|last1=Regehr|first1=John|last2=Chen|first2=Yang|last3=Cuoq|first3=Pascal|last4=Eide|first4=Eric|last5=Ellison|first5=Chucky|last6=Yang|first6=Xuejun|title=Proceedings of the 33rd ACM SIGPLAN Conference on Programming Language Design and Implementation |chapter=Test-case reduction for C compiler bugs |date=2012|series=PLDI '12|location=New York, NY, USA|publisher=ACM|pages=335–346|doi=10.1145/2254064.2254104|isbn=9781450312059|s2cid=1025409 }}

References

Category:Compiler construction

Category:Compiler theory