e-graph
{{Short description|Graph data structure}}
In computer science, an e-graph is a data structure that stores an equivalence relation over terms of some language.
Definition and operations
Let be a set of uninterpreted functions, where is the subset of consisting of functions of arity . Let be a countable set of opaque identifiers that may be compared for equality, called e-class IDs. The application of to e-class IDs is denoted and called an e-node.
The e-graph then represents equivalence classes of e-nodes, using the following data structures:{{harv|Willsey|Nandi|Wang|Flatt|2021}}
- A union-find structure representing equivalence classes of e-class IDs, with the usual operations , and . An e-class ID is canonical if ; an e-node is canonical if each is canonical ( in ).
- An association of e-class IDs with sets of e-nodes, called e-classes. This consists of
- a hashcons (i.e. a mapping) from canonical e-nodes to e-class IDs, and
- an e-class map that maps e-class IDs to e-classes, such that maps equivalent IDs to the same set of e-nodes:
=Invariants=
In addition to the above structure, a valid e-graph conforms to several data structure invariants.{{harv|Willsey|Nandi|Wang|Flatt|2021}} Two e-nodes are equivalent if they are in the same e-class. The congruence invariant states that an e-graph must ensure that equivalence is closed under congruence, where two e-nodes are congruent when . The hashcons invariant states that the hashcons maps canonical e-nodes to their e-class ID.
=Operations=
{{expand section|date=June 2021}}
E-graphs expose wrappers around the , , and operations from the union-find that preserve the e-graph invariants. The last operation, e-matching, is described below.
=Equivalent formulations=
An e-graph can also be formulated as a bipartite graph where
- is the set of e-class IDs (as above),
- is the set of e-nodes, and
- is a set of directed edges.
There is a directed edge from each e-class to each of its members, and from each e-node to each of its children.{{harv|Goharshady|Lam|Parreaux|2024}}
E-matching
Let be a set of variables and let be the smallest set that includes the 0-arity function symbols (also called constants), includes the variables, and is closed under application of the function symbols. In other words, is the smallest set such that , , and when and , then . A term containing variables is called a pattern, a term without variables is called ground.
An e-graph represents a ground term if one of its e-classes represents . An e-class represents if some e-node does. An e-node represents a term if and each e-class represents the term ( in ).
e-matching is an operation that takes a pattern and an e-graph , and yields all pairs where is a substitution mapping the variables in to e-class IDs and is an e-class ID such that the term is represented by . There are several known algorithms for e-matching,{{harv|de Moura|Bjørner|2007}}{{Cite journal |last1=Moskal |first1=Michał |last2=Łopuszański |first2=Jakub |last3=Kiniry |first3=Joseph R. |date=2008-05-06 |title=E-matching for Fun and Profit |journal=Electronic Notes in Theoretical Computer Science |series=Proceedings of the 5th International Workshop on Satisfiability Modulo Theories (SMT 2007) |language=en |volume=198 |issue=2 |pages=19–35 |doi=10.1016/j.entcs.2008.04.078 |issn=1571-0661|doi-access=free }} the relational e-matching algorithm is based on worst-case optimal joins and is worst-case optimal.{{Cite journal |last1=Zhang |first1=Yihong |last2=Wang |first2=Yisu Remy |last3=Willsey |first3=Max |last4=Tatlock |first4=Zachary |date=2022-01-12 |title=Relational e-matching |journal=Proceedings of the ACM on Programming Languages |volume=6 |issue=POPL |pages=35:1–35:22 |doi=10.1145/3498696|s2cid=236924583 |doi-access=free }}
Extraction
Given an e-class and a cost function that maps each function symbol in to a natural number, the extraction problem is to find a ground term with minimal total cost that is represented by the given e-class. This problem is NP-hard.{{Cite thesis |last=Stepp |first=Michael Benjamin |title=Equality saturation: engineering challenges and applications |date=2011 |degree=PhD |publisher=University of California at San Diego |url=https://dl.acm.org/doi/book/10.5555/2395310 |place=USA |isbn=978-1-267-03827-2}} There is also no constant-factor approximation algorithm for this problem, which can be shown by reduction from the set cover problem. However, for graphs with bounded treewidth, there is a linear-time, fixed-parameter tractable algorithm.{{harv|Goharshady|Lam|Parreaux|2024}}
Complexity
- An e-graph with n equalities can be constructed in O(n log n) time.{{harv|Flatt|Coward|Willsey|Tatlock|2022|p=2}}
Equality saturation
{{expand section|date=June 2021}}
Equality saturation is a technique for building optimizing compilers using e-graphs.{{harv|Tate|Stepp|Tatlock|Lerner|2009}} It operates by applying a set of rewrites using e-matching until the e-graph is saturated, a timeout is reached, an e-graph size limit is reached, a fixed number of iterations is exceeded, or some other halting condition is reached. After rewriting, an optimal term is extracted from the e-graph according to some cost function, usually related to AST size or performance considerations.
Applications
E-graphs are used in automated theorem proving. They are a crucial part of modern SMT solvers such as Z3{{Cite book|last1=de Moura|first1=Leonardo|last2=Bjørner|first2=Nikolaj|title=Tools and Algorithms for the Construction and Analysis of Systems |chapter=Z3: An Efficient SMT Solver |date=2008|editor-last=Ramakrishnan|editor-first=C. R.|editor2-last=Rehof|editor2-first=Jakob|series=Lecture Notes in Computer Science|volume=4963|language=en|location=Berlin, Heidelberg|publisher=Springer|pages=337–340|doi=10.1007/978-3-540-78800-3_24|isbn=978-3-540-78800-3|doi-access=free}} and CVC4, where they are used to decide the empty theory by computing the congruence closure of a set of equalities, and e-matching is used to instantiate quantifiers.{{Cite conference |last=Rümmer |first=Philipp |date=2012 |editor-last=Bjørner |editor-first=Nikolaj |editor2-last=Voronkov |editor2-first=Andrei |chapter=E-Matching with Free Variables |conference= 18th International Conference, LPAR-18, Merida, Venezuela, March 11–15, 2012 |title=Logic for Programming, Artificial Intelligence, and Reasoning. Proceedings | series=Lecture Notes in Computer Science |volume=7180 |language=en |location=Berlin, Heidelberg |publisher=Springer |pages=359–374 |doi=10.1007/978-3-642-28717-6_28 |isbn=978-3-642-28717-6}} In DPLL(T)-based solvers that use conflict-driven clause learning (also known as non-chronological backtracking), e-graphs are extended to produce proof certificates.{{harv|Flatt|Coward|Willsey|Tatlock|2022|p=2}} E-graphs are also used in the Simplify theorem prover of ESC/Java.{{Cite journal |last1=Detlefs |first1=David |last2=Nelson |first2=Greg |last3=Saxe |first3=James B. |date=May 2005 |title=Simplify: a theorem prover for program checking |journal=Journal of the ACM |volume=52 |issue=3 |pages=365–473 |doi=10.1145/1066100.1066102 |s2cid=9613854 |issn=0004-5411}}
Equality saturation is used in specialized optimizing compilers,{{Cite journal|last1=Joshi|first1=Rajeev|last2=Nelson|first2=Greg|last3=Randall|first3=Keith|date=2002-05-17|title=Denali: a goal-directed superoptimizer|journal=ACM SIGPLAN Notices|volume=37|issue=5|pages=304–314|doi=10.1145/543552.512566|issn=0362-1340}} e.g. for deep learning{{cite arXiv|last1=Yang|first1=Yichen|last2=Phothilimtha|first2=Phitchaya Mangpo|last3=Wang|first3=Yisu Remy|last4=Willsey|first4=Max|last5=Roy|first5=Sudip|last6=Pienaar|first6=Jacques|date=2021-03-17|title=Equality Saturation for Tensor Graph Superoptimization|class=cs.AI|eprint=2101.01332}} and linear algebra.{{cite arXiv|last1=Wang|first1=Yisu Remy|last2=Hutchison|first2=Shana|last3=Leang|first3=Jonathan|last4=Howe|first4=Bill|last5=Suciu|first5=Dan|date=2020-12-22|title=SPORES: Sum-Product Optimization via Relational Equality Saturation for Large Scale Linear Algebra|class=cs.DB|eprint=2002.07951}} Equality saturation has also been used for translation validation applied to the LLVM toolchain.{{Cite book|last1=Stepp|first1=Michael|last2=Tate|first2=Ross|last3=Lerner|first3=Sorin|title=Computer Aided Verification |chapter=Equality-Based Translation Validator for LLVM |date=2011|editor-last=Gopalakrishnan|editor-first=Ganesh|editor2-last=Qadeer|editor2-first=Shaz|series=Lecture Notes in Computer Science|volume=6806|language=en|location=Berlin, Heidelberg|publisher=Springer|pages=737–742|doi=10.1007/978-3-642-22110-1_59|isbn=978-3-642-22110-1|doi-access=free}}
E-graphs have been applied to several problems in program analysis, including fuzzing,{{Cite web |title=Wasm-mutate: Fuzzing WebAssembly Compilers with E-Graphs (EGRAPHS 2022) - PLDI 2022 |url=https://pldi22.sigplan.org/details/egraphs-2022-papers/3/Wasm-mutate-Fuzzing-WebAssembly-Compilers-with-E-Graphs |access-date=2023-02-03 |website=pldi22.sigplan.org}} abstract interpretation,{{Cite arXiv |last1=Coward |first1=Samuel |last2=Constantinides |first2=George A. |last3=Drane |first3=Theo |date=2022-03-17 |title=Abstract Interpretation on E-Graphs |class=cs.LO |eprint=2203.09191 }}
{{Cite arXiv |last1=Coward |first1=Samuel |last2=Constantinides |first2=George A. |last3=Drane |first3=Theo |date=2022-05-30 |title=Combining E-Graphs with Abstract Interpretation |class=cs.DS |eprint=2205.14989 }} and library learning.{{Cite journal |last1=Cao |first1=David |last2=Kunkel |first2=Rose |last3=Nandi |first3=Chandrakana |last4=Willsey |first4=Max |last5=Tatlock |first5=Zachary |last6=Polikarpova |first6=Nadia |date=2023-01-09 |title=babble: Learning Better Abstractions with E-Graphs and Anti-Unification |journal=Proceedings of the ACM on Programming Languages |volume=7 |issue=POPL |pages=396–424 |doi=10.1145/3571207 |arxiv=2212.04596 |s2cid=254536022 |issn=2475-1421}}
References
{{reflist}}
- {{Cite book|last1=de Moura|first1=Leonardo|last2=Bjørner|first2=Nikolaj|title=Automated Deduction – CADE-21 |chapter=Efficient E-Matching for SMT Solvers |date=2007|editor-last=Pfenning|editor-first=Frank|chapter-url=https://link.springer.com/chapter/10.1007/978-3-540-73595-3_13|series=Lecture Notes in Computer Science|volume=4603|language=en|location=Berlin, Heidelberg|publisher=Springer|pages=183–198|doi=10.1007/978-3-540-73595-3_13|isbn=978-3-540-73595-3}}
- {{Cite journal|last1=Willsey|first1=Max|last2=Nandi|first2=Chandrakana|last3=Wang|first3=Yisu Remy|last4=Flatt|first4=Oliver|last5=Tatlock|first5=Zachary|last6=Panchekha|first6=Pavel|date=2021-01-04|title=egg: Fast and extensible equality saturation|url=https://doi.org/10.1145/3434304|journal=Proceedings of the ACM on Programming Languages|volume=5|issue=POPL|pages=23:1–23:29|doi=10.1145/3434304|arxiv=2004.03082|s2cid=226282597}}
- {{Cite book|last1=Tate|first1=Ross|last2=Stepp|first2=Michael|last3=Tatlock|first3=Zachary|last4=Lerner|first4=Sorin|title=Proceedings of the 36th annual ACM SIGPLAN-SIGACT symposium on Principles of programming languages |chapter=Equality saturation |date=2009-01-21|chapter-url=https://doi.org/10.1145/1480881.1480915|series=POPL '09|location=Savannah, GA, USA|publisher=Association for Computing Machinery|pages=264–276|doi=10.1145/1480881.1480915|isbn=978-1-60558-379-2|s2cid=2138086}}
- {{Cite book |last1=Flatt |first1=Oliver |last2=Coward |first2=Samuel |last3=Willsey |first3=Max |last4=Tatlock |first4=Zachary |last5=Panchekha |first5=Pavel |date=October 2022 |chapter=Small Proofs from Congruence Closure |editor1=A. Griggio |editor2=N. Rungta |title=Proceedings of the 22nd Conference on Formal Methods in Computer-Aided Design – FMCAD 2022 |chapter-url=https://repositum.tuwien.at/handle/20.500.12708/81325 |language=en |publisher=TU Wien Academic Press |pages=75–83 |doi=10.34727/2022/isbn.978-3-85448-053-2_13 |isbn=978-3-85448-053-2|s2cid=252118847 }}
- {{Cite journal |last1=Goharshady |first1=Amir Kafshdar |last2=Lam |first2=Chun Kit |last3=Parreaux |first3=Lionel |date=2024-10-08 |title=Fast and Optimal Extraction for Sparse Equality Graphs |url=https://dl.acm.org/doi/abs/10.1145/3689801 |journal=Proceedings of the ACM on Programming Languages |volume=8 |issue=OOPSLA2 |pages=361:2551–361:2577 |doi=10.1145/3689801}}
External links
- [https://egraphs-good.github.io/ The Egg Project]
- [https://colab.research.google.com/drive/1tNOQijJqe5tw-Pk9iqd6HHb2abC5aRid?usp=sharing A Colab notebook explaining e-graphs]
{{Program analysis}}