Datalog

{{Infobox programming language

| paradigm = Logic, Declarative

| family = Prolog

| year = {{Start date and age|1977}}

| designers =

| latest release date =

| typing = Weak

| implementations =

| dialects = Datomic, .QL, Soufflé, XTDB, etc.

| influenced by = Prolog

| influenced = SQL

| website =

| wikibooks =

}}

{{Infobox file format

| extension = .dl

| mime = [https://www.iana.org/assignments/media-types/application/vnd.datalog text/vnd.datalog]

| url = {{URL|https://datalog-specs.info}}

}}

Datalog is a declarative logic programming language. While it is syntactically a subset of Prolog, Datalog generally uses a bottom-up rather than top-down evaluation model. This difference yields significantly different behavior and properties from Prolog. It is often used as a query language for deductive databases. Datalog has been applied to problems in data integration, networking, program analysis, and more.

Example

A Datalog program consists of facts, which are statements that are held to be true, and rules, which say how to deduce new facts from known facts. For example, here are two facts that mean xerces is a parent of brooke and brooke is a parent of damocles:

parent(xerces, brooke).

parent(brooke, damocles).

The names are written in lowercase because strings beginning with an uppercase letter stand for variables. Here are two rules:

ancestor(X, Y) :- parent(X, Y).

ancestor(X, Y) :- parent(X, Z), ancestor(Z, Y).

The :- symbol is read as "if", and the comma is read "and", so these rules mean:

X is an ancestor of Y if X is a parent of Y.
X is an ancestor of Y if X is a parent of some Z, and Z is an ancestor of Y.

The meaning of a program is defined to be the set of all of the facts that can be deduced using the initial facts and the rules. This program's meaning is given by the following facts:

parent(xerces, brooke).

parent(brooke, damocles).

ancestor(xerces, brooke).

ancestor(brooke, damocles).

ancestor(xerces, damocles).

Some Datalog implementations don't deduce all possible facts, but instead answer queries:

?- ancestor(xerces, X).

This query asks: Who are all the X that xerces is an ancestor of? For this example, it would return brooke and damocles.
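The deduction described above can be sketched in Python. This is a minimal illustration that assumes facts are stored as tuples in Python sets. The function name `compute_ancestors` is hypothetical and does not belong to any Datalog engine. The sketch applies both rules until no new facts appear, then answers the query by filtering the result.

```python
# Facts: parent(xerces, brooke). parent(brooke, damocles).
parent = {("xerces", "brooke"), ("brooke", "damocles")}


def compute_ancestors(parent):
    """Apply both ancestor rules repeatedly until no new facts are derived."""
    ancestor = set()
    while True:
        # ancestor(X, Y) :- parent(X, Y).
        derived = set(parent)
        # ancestor(X, Y) :- parent(X, Z), ancestor(Z, Y).
        derived |= {(x, y) for (x, z) in parent for (z2, y) in ancestor if z == z2}
        if derived <= ancestor:  # fixed point reached: nothing new was deduced
            return ancestor
        ancestor |= derived


ancestor = compute_ancestors(parent)

# ?- ancestor(xerces, X).
answers = sorted(y for (x, y) in ancestor if x == "xerces")
print(answers)  # ['brooke', 'damocles']
```

The loop runs three times here. The second pass derives ancestor(xerces, damocles), and the third pass confirms that nothing new appears.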

Comparison to relational databases

The non-recursive subset of Datalog is closely related to query languages for relational databases, such as SQL. The following table maps between Datalog, relational algebra, and SQL concepts:

Datalog	Relational algebra	SQL
Relation	Relation	Table
Fact	Tuple	Row
Rule	n/a	Materialized view
Query	Select	Query

More formally, non-recursive Datalog corresponds precisely to unions of conjunctive queries, or equivalently, negation-free relational algebra.

{{Collapse top|title=Schematic translation from non-recursive Datalog into SQL}}

s(x, y).

t(y).

r(A, B) :- s(A, B), t(B).

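-- One table per relation.
CREATE TABLE s (
  z0 TEXT NOT NULL,
  z1 TEXT NOT NULL,
  PRIMARY KEY (z0, z1)
);

CREATE TABLE t (
  z0 TEXT NOT NULL PRIMARY KEY
);

-- Each fact becomes a row.
INSERT INTO s VALUES ('x', 'y');
INSERT INTO t VALUES ('y');

-- The rule r(A, B) :- s(A, B), t(B). becomes a view that joins s and t on B.
CREATE VIEW r AS
SELECT s.z0, s.z1
FROM s, t
WHERE s.z1 = t.z0;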
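You can check the schematic translation by running it against an in-memory SQLite database through Python's standard `sqlite3` module. SQLite is an assumed target here, and other databases may differ in details such as type names. The script writes the column constraint in its standard SQL form, `NOT NULL`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Each relation becomes a table; each fact becomes a row.
    CREATE TABLE s (z0 TEXT NOT NULL, z1 TEXT NOT NULL, PRIMARY KEY (z0, z1));
    CREATE TABLE t (z0 TEXT NOT NULL PRIMARY KEY);
    INSERT INTO s VALUES ('x', 'y');
    INSERT INTO t VALUES ('y');
    -- The rule r(A, B) :- s(A, B), t(B). becomes a view joining s and t.
    CREATE VIEW r AS SELECT s.z0, s.z1 FROM s, t WHERE s.z1 = t.z0;
    """
)
rows = conn.execute("SELECT * FROM r").fetchall()
print(rows)  # [('x', 'y')]
```

Querying the view `r` returns exactly the tuples that the Datalog rule derives, here the single fact r(x, y).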

Syntax

A Datalog program consists of a list of rules (Horn clauses).{{sfn|Ceri|Gottlob|Tanca|1989|p=146}} If constant and variable are two countable sets of constants and variables respectively and relation is a countable set of predicate symbols, then the following BNF grammar expresses the structure of a Datalog program:

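<program> ::= <rule> <program> | ""
<rule> ::= <atom> ":-" <atom-list> "."
<atom> ::= <relation> "(" <term-list> ")"
<atom-list> ::= <atom> | <atom> "," <atom-list> | ""
<term> ::= <constant> | <variable>
<term-list> ::= <term> | <term> "," <term-list> | ""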

Atoms are also referred to as {{dfni|literals}}. The atom to the left of the :- symbol is called the {{dfni|head}} of the rule; the atoms to the right are the {{dfni|body}}. Every Datalog program must satisfy the condition that every variable that appears in the head of a rule also appears in the body (this condition is sometimes called the {{dfni|range restriction}}).{{sfn|Ceri|Gottlob|Tanca|1989|p=146}}{{Cite book |last1=Eisner |first1=Jason |last2=Filardo |first2=Nathaniel W. |title=Datalog Reloaded |chapter=Dyna: Extending Datalog for Modern AI |date=2011 |editor-last=de Moor |editor-first=Oege |editor2-last=Gottlob |editor2-first=Georg |editor3-last=Furche |editor3-first=Tim |editor4-last=Sellers |editor4-first=Andrew |chapter-url=https://link.springer.com/chapter/10.1007/978-3-642-24206-9_11 |series=Lecture Notes in Computer Science |volume=6702 |language=en |location=Berlin, Heidelberg |publisher=Springer |pages=181–220 |doi=10.1007/978-3-642-24206-9_11 |isbn=978-3-642-24206-9}}

There are two common conventions for variable names: capitalizing variables, or prefixing them with a question mark (?).{{Citation |last1=Maier |first1=David |title=Datalog: concepts, history, and outlook |date=2018-09-01 |url=https://doi.org/10.1145/3191315.3191317 |work=Declarative Logic Programming: Theory, Systems, and Applications |volume=20 |pages=3–100 |publisher=Association for Computing Machinery and Morgan & Claypool |doi=10.1145/3191315.3191317 |isbn=978-1-970001-99-0 |access-date=2023-03-02 |last2=Tekle |first2=K. Tuncay |last3=Kifer |first3=Michael |last4=Warren |first4=David S.|s2cid=69379310 }}

Note that under this definition, Datalog does {{em|not}} include negation or aggregates; see {{section link||Extensions}} for more information about those constructs.

Rules with empty bodies are called {{dfni|facts}}. For example, the following rule is a fact:

r(x) :- .

The set of facts is called the {{dfni|extensional database}} or {{dfni|EDB}} of the Datalog program. The set of tuples computed by evaluating the Datalog program is called the {{dfni|intensional database}} or {{dfni|IDB}}.

= Syntactic sugar =

Many implementations of logic programming extend the above grammar to allow writing facts without the :-, like so:

r(x).

Some also allow writing 0-ary relations without parentheses, like so:

p :- q.

These are merely abbreviations (syntactic sugar); they have no impact on the semantics of the program.

Semantics

{{Sidebar

| title = Herbrand universe, base, and model of a Datalog program

| style = width:20em;

| contentstyle = text-align: left;

| content1 =

Program:

edge(x, y).

edge(y, z).

path(A, B) :-

edge(A, B).

path(A, C) :-

path(A, B),

edge(B, C).

| content2 = Herbrand universe: x, y, z

| content3 = Herbrand base: edge(x, x), edge(x, y), ..., edge(z, z), path(x, x), ..., path(z, z)

| content4 = Herbrand model: edge(x, y), edge(y, z), path(x, y), path(y, z), path(x, z)

| belowstyle = text-align: left

}}

There are three widely used approaches to the semantics of Datalog programs: model-theoretic, fixed-point, and proof-theoretic. These three approaches can be proven equivalent.{{Cite journal |last1=Van Emden |first1=M. H. |last2=Kowalski |first2=R. A. |date=1976-10-01 |title=The Semantics of Predicate Logic as a Programming Language |journal=Journal of the ACM |volume=23 |issue=4 |pages=733–742 |doi=10.1145/321978.321991 |s2cid=11048276 |issn=0004-5411|doi-access=free }}

An atom is called {{dfni|ground}} if none of its subterms are variables. Intuitively, each of these semantics defines the meaning of a program to be the set of all ground atoms that can be deduced from the rules of the program, starting from the facts.

= Model theoretic =

A rule is called ground if all of its atoms (head and body) are ground. A ground rule R₁ is a ground instance of another rule R₂ if R₁ is the result of a substitution of constants for all the variables in R₂. The Herbrand base of a Datalog program is the set of all ground atoms that can be made with the constants appearing in the program. A {{dfni|Herbrand model}} of a Datalog program is a subset of the Herbrand base such that, for each ground instance of each rule in the program, if the atoms in the body of the rule are in the set, then so is the head; the smallest such subset is the {{dfni|minimal Herbrand model}}.{{sfn|Ceri|Gottlob|Tanca|1989|p=149}} The model-theoretic semantics define the minimal Herbrand model to be the meaning of the program.
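For the edge/path program in the sidebar, you can check these definitions by brute force. This sketch hard-codes that one program as an assumption. It enumerates the Herbrand base and every ground instance of the two rules. It then tests whether a set of ground atoms is a model, meaning it contains the facts and is closed under every ground rule instance.

```python
from itertools import product

CONSTANTS = ["x", "y", "z"]  # the Herbrand universe of the sidebar program
FACTS = {("edge", "x", "y"), ("edge", "y", "z")}

# Herbrand base: every ground atom built from edge/2, path/2 and the constants.
herbrand_base = {(p, a, b) for p in ("edge", "path") for a, b in product(CONSTANTS, repeat=2)}


def ground_instances():
    """Yield (head, body) for every ground instance of the two path rules."""
    for a, b in product(CONSTANTS, repeat=2):
        yield ("path", a, b), [("edge", a, b)]
    for a, b, c in product(CONSTANTS, repeat=3):
        yield ("path", a, c), [("path", a, b), ("edge", b, c)]


def is_model(m):
    """A model contains the facts and the head of every ground rule whose body it contains."""
    return FACTS <= m and all(
        head in m for head, body in ground_instances() if all(atom in m for atom in body)
    )


minimal = FACTS | {("path", "x", "y"), ("path", "y", "z"), ("path", "x", "z")}
print(len(herbrand_base))  # 18
print(is_model(minimal))  # True
print(is_model(minimal - {("path", "x", "z")}))  # False
print(is_model(herbrand_base))  # True, but not minimal
```

The whole Herbrand base is also a model. The minimal Herbrand model is the smallest one.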

= Fixed-point =

Let {{mvar|I}} be the power set of the Herbrand base of a program P. The immediate consequence operator for P is a map {{mvar|T}} from {{mvar|I}} to {{mvar|I}} that adds all of the new ground atoms that can be derived from the rules of the program in a single step. The least-fixed-point semantics define the least fixed point of {{mvar|T}} to be the meaning of the program; this coincides with the minimal Herbrand model.{{sfn|Ceri|Gottlob|Tanca|1989|p=150}}

The fixpoint semantics suggest an algorithm for computing the minimal model: Start with the set of ground facts in the program, then repeatedly add consequences of the rules until a fixpoint is reached. This algorithm is called naïve evaluation.
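The fixpoint construction can be sketched for the same edge/path program, which is hard-coded here as an assumption. The sketch uses the common formulation of the operator, where T applied to a set I returns the heads of all ground rule instances whose bodies lie in I. Facts always qualify. Starting from the empty set, the iterates only grow, and the loop stops at the least fixed point. The function names are hypothetical.

```python
FACTS = {("edge", "x", "y"), ("edge", "y", "z")}


def t_p(interp):
    """Heads of all ground rule instances whose bodies are contained in interp."""
    out = set(FACTS)  # facts have empty bodies, so they are always consequences
    edges = {(a, b) for (p, a, b) in interp if p == "edge"}
    paths = {(a, b) for (p, a, b) in interp if p == "path"}
    # path(A, B) :- edge(A, B).
    out |= {("path", a, b) for (a, b) in edges}
    # path(A, C) :- path(A, B), edge(B, C).
    out |= {("path", a, c) for (a, b) in paths for (b2, c) in edges if b == b2}
    return out


def least_fixpoint():
    """Iterate T from the empty set until it stops changing (naive evaluation)."""
    interp, steps = set(), 0
    while True:
        nxt = t_p(interp)
        if nxt == interp:
            return interp, steps
        interp, steps = nxt, steps + 1


model, steps = least_fixpoint()
print(steps)  # 3
print(sorted(model))
```

Three applications of T reach the Herbrand model listed in the sidebar. A fourth application returns the same set.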

= Proof-theoretic =

[[Image:Proof tree for Datalog transitive closure computation.svg|thumb|Proof tree showing the derivation of the ground atom path(x, z) from the program

edge(x, y).

edge(y, z).

path(A, B) :-

edge(A, B).

path(A, C) :-

path(A, B),

edge(B, C).

]]

The proof-theoretic semantics defines the meaning of a Datalog program to be the set of facts with corresponding proof trees. Intuitively, a proof tree shows how to derive a fact from the facts and rules of a program.

One might be interested in knowing whether or not a particular ground atom appears in the minimal Herbrand model of a Datalog program, perhaps without caring much about the rest of the model. A top-down reading of the proof trees described above suggests an algorithm for computing the results of such queries. This reading informs the SLD resolution algorithm, which forms the basis for the evaluation of Prolog.
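A goal-directed search for ground queries can be sketched for the edge/path program. This is not full SLD resolution. It is a hand-written search for this one program that returns a proof tree as nested tuples. It adds a loop check because plain depth-first SLD resolution, as used by Prolog, may fail to terminate on the left-recursive second rule (for example, on a query that has no proof). The function name and tree encoding are hypothetical.

```python
EDGES = {("x", "y"), ("y", "z")}


def prove_path(a, c, edges, visiting=frozenset()):
    """Return a proof tree for path(a, c) as nested tuples, or None if none is found."""
    goal = ("path", a, c)
    if goal in visiting:  # loop check: this goal is already being proved higher up
        return None
    visiting = visiting | {goal}
    # path(A, B) :- edge(A, B).
    if (a, c) in edges:
        return (goal, [("edge", a, c)])
    # path(A, C) :- path(A, B), edge(B, C).  Try every B with edge(B, C).
    for b, c2 in sorted(edges):
        if c2 == c:
            subtree = prove_path(a, b, edges, visiting)
            if subtree is not None:
                return (goal, [subtree, ("edge", b, c)])
    return None


print(prove_path("x", "z", EDGES))
print(prove_path("z", "x", EDGES))  # None: no proof exists
```

For path(x, z), the returned tree has the same shape as the proof tree in the figure: path(x, y) derived from edge(x, y), combined with edge(y, z).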

Evaluation

There are many different ways to evaluate a Datalog program, with different performance characteristics.

= Bottom-up evaluation strategies =

Bottom-up evaluation strategies start with the facts in the program and repeatedly apply the rules until either some goal or query is established, or until the complete minimal model of the program is produced.

== Naïve evaluation ==

Naïve evaluation mirrors the fixpoint semantics for Datalog programs. Naïve evaluation uses a set of "known facts", which is initialized to the facts in the program. It proceeds by repeatedly enumerating all ground instances of each rule in the program. If each atom in the body of the ground instance is in the set of known facts, then the head atom is added to the set of known facts. This process is repeated until a fixed point is reached, and no more facts may be deduced. Naïve evaluation produces the entire minimal model of the program.{{sfn|Ceri|Gottlob|Tanca|1989|p=154}}

== Semi-naïve evaluation ==

Semi-naïve evaluation is a bottom-up evaluation strategy that can be asymptotically faster than naïve evaluation.{{Cite book |last1=Alvarez-Picallo |first1=Mario |last2=Eyers-Taylor |first2=Alex |last3=Peyton Jones |first3=Michael |last4=Ong |first4=C.-H. Luke |title=Programming Languages and Systems |chapter=Fixing Incremental Computation: Derivatives of Fixpoints, and the Recursive Semantics of Datalog |date=2019 |editor-last=Caires |editor-first=Luís |chapter-url=https://link.springer.com/chapter/10.1007/978-3-030-17184-1_19 |series=Lecture Notes in Computer Science |volume=11423 |language=en |location=Cham |publisher=Springer International Publishing |pages=525–552 |doi=10.1007/978-3-030-17184-1_19 |isbn=978-3-030-17184-1|s2cid=53430789 }} Instead of re-applying every rule to all known facts in each iteration, it only considers rule instances in which at least one body atom was newly derived in the previous iteration. This avoids repeating derivations that were already made in earlier iterations.
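The idea can be sketched for the edge/path transitive-closure program used earlier in this article. In each round, only the path facts that were new in the previous round (`delta`) are joined with `edge`. The recursive rule has a single path atom in its body, so one delta variant of that rule is enough. Rules with several recursive body atoms need one variant per such atom. The dictionary index on `edge` is an implementation choice for this sketch, not part of the technique.

```python
from collections import defaultdict


def semi_naive_path(edges):
    """Transitive closure of edge/2 via semi-naive evaluation."""
    succ = defaultdict(set)  # index edge(B, C) by B for the join
    for b, c in edges:
        succ[b].add(c)
    # path(A, B) :- edge(A, B).  Everything is new in the first round.
    path = set(edges)
    delta = set(edges)
    while delta:
        # path(A, C) :- delta_path(A, B), edge(B, C).
        # Only facts derived in the previous round are joined with edge.
        candidates = {(a, c) for (a, b) in delta for c in succ[b]}
        delta = candidates - path
        path |= delta
    return path


print(sorted(semi_naive_path({("x", "y"), ("y", "z")})))  # [('x', 'y'), ('x', 'z'), ('y', 'z')]
```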

== Performance considerations ==

[[Image:Theta supercomputer - 389 071 002 (36954713450).jpg|thumb|Theta supercomputer{{Cite arXiv |last1=Gilray |first1=Thomas |last2=Sahebolamri |first2=Arash |last3=Kumar |first3=Sidharth |last4=Micinski |first4=Kristopher |date=2022-11-21 |title=Higher-Order, Data-Parallel Structured Deduction |class=cs.PL |eprint=2211.11573 }}]]

Naïve and semi-naïve evaluation both evaluate recursive Datalog rules by repeatedly applying them to a set of known facts until a fixed point is reached. In each iteration, rules are only run for "one step", i.e., non-recursively. As mentioned above, each non-recursive Datalog rule corresponds precisely to a conjunctive query. Therefore, many of the techniques from database theory used to speed up conjunctive queries are applicable to bottom-up evaluation of Datalog, such as

Index selection{{Cite journal |last1=Subotić |first1=Pavle |last2=Jordan |first2=Herbert |last3=Chang |first3=Lijun |last4=Fekete |first4=Alan |last5=Scholz |first5=Bernhard |date=2018-10-01 |title=Automatic index selection for large-scale datalog computation |url=https://doi.org/10.14778/3282495.3282500 |journal=Proceedings of the VLDB Endowment |volume=12 |issue=2 |pages=141–153 |doi=10.14778/3282495.3282500 |s2cid=53569679 |issn=2150-8097}}
Query optimization, especially join order{{Cite book |last1=Antoniadis |first1=Tony |last2=Triantafyllou |first2=Konstantinos |last3=Smaragdakis |first3=Yannis |title=Proceedings of the 6th ACM SIGPLAN International Workshop on State of the Art in Program Analysis |chapter=Porting doop to Soufflé |date=2017-06-18 |chapter-url=https://doi.org/10.1145/3088515.3088522 |series=SOAP 2017 |location=New York, NY, USA |publisher=Association for Computing Machinery |pages=25–30 |doi=10.1145/3088515.3088522 |isbn=978-1-4503-5072-3|s2cid=3074689 }} "The LogicBlox engine performs full query optimization."{{Cite book |last1=Arch |first1=Samuel |last2=Hu |first2=Xiaowen |last3=Zhao |first3=David |last4=Subotić |first4=Pavle |last5=Scholz |first5=Bernhard |title=Logic-Based Program Synthesis and Transformation |chapter=Building a Join Optimizer for Soufflé |date=2022 |editor-last=Villanueva |editor-first=Alicia |chapter-url=https://link.springer.com/chapter/10.1007/978-3-031-16767-6_5 |series=Lecture Notes in Computer Science |volume=13474 |language=en |location=Cham |publisher=Springer International Publishing |pages=83–102 |doi=10.1007/978-3-031-16767-6_5 |isbn=978-3-031-16767-6}}
Join algorithms
Selection of data structures used to store relations: common choices include hash tables and B-trees, while other possibilities include disjoint set data structures (for storing equivalence relations),{{Cite book |chapter-url=https://ieeexplore.ieee.org/document/8891656 |access-date=2023-11-28 |doi=10.1109/PACT.2019.00015 |s2cid=204827819 |chapter=Fast Parallel Equivalence Relations in a Datalog Compiler |title=2019 28th International Conference on Parallel Architectures and Compilation Techniques (PACT) |date=2019 |last1=Nappa |first1=Patrick |last2=Zhao |first2=David |last3=Subotic |first3=Pavle |last4=Scholz |first4=Bernhard |pages=82–96 |isbn=978-1-7281-3613-4 }} bries (a variant of tries),{{Cite book |last1=Jordan |first1=Herbert |last2=Subotić |first2=Pavle |last3=Zhao |first3=David |last4=Scholz |first4=Bernhard |chapter=Brie: A Specialized Trie for Concurrent Datalog |date=2019-02-17 |title=Proceedings of the 10th International Workshop on Programming Models and Applications for Multicores and Manycores |chapter-url=https://doi.org/10.1145/3303084.3309490 |location=New York, NY, USA |publisher=Association for Computing Machinery |pages=31–40 |doi=10.1145/3303084.3309490 |isbn=978-1-4503-6290-0|s2cid=239258588 }} binary decision diagrams,{{Cite book |last1=Whaley |first1=John |last2=Avots |first2=Dzintars |last3=Carbin |first3=Michael |last4=Lam |first4=Monica S. |chapter=Using Datalog with Binary Decision Diagrams for Program Analysis |date=2005 |editor-last=Yi |editor-first=Kwangkeun |title=Programming Languages and Systems |chapter-url=https://link.springer.com/chapter/10.1007/11575467_8 |series=Lecture Notes in Computer Science |volume=3780 |language=en |location=Berlin, Heidelberg |publisher=Springer |pages=97–118 |doi=10.1007/11575467_8 |isbn=978-3-540-32247-4|s2cid=5223577 }} and even SMT formulas{{Cite book |last1=Hoder |first1=Kryštof |last2=Bjørner |first2=Nikolaj |last3=de Moura |first3=Leonardo |chapter=μZ– an Efficient Engine for Fixed Points with Constraints |date=2011 |editor-last=Gopalakrishnan |editor-first=Ganesh |editor2-last=Qadeer |editor2-first=Shaz |title=Computer Aided Verification |chapter-url=https://link.springer.com/chapter/10.1007/978-3-642-22110-1_36 |series=Lecture Notes in Computer Science |volume=6806 |language=en |location=Berlin, Heidelberg |publisher=Springer |pages=457–462 |doi=10.1007/978-3-642-22110-1_36 |isbn=978-3-642-22110-1}}

Many such techniques are implemented in modern bottom-up Datalog engines such as Soufflé. Some Datalog engines integrate SQL databases directly.{{Cite arXiv |last1=Fan |first1=Zhiwei |last2=Zhu |first2=Jianqiao |last3=Zhang |first3=Zuyu |last4=Albarghouthi |first4=Aws |last5=Koutris |first5=Paraschos |last6=Patel |first6=Jignesh |date=2018-12-10 |title=Scaling-Up In-Memory Datalog Processing: Observations and Techniques |class=cs.DB |eprint=1812.03975 }}

Bottom-up evaluation of Datalog is also amenable to parallelization. Parallel Datalog engines are generally divided into two paradigms:

In the shared-memory, multi-core setting, Datalog engines execute on a single node. Coordination between threads may be achieved using locking or lock-free data structures. The shared-memory setting may be further divided into single instruction, multiple data (SIMD) and multiple instruction, multiple data (MIMD) paradigms:
Datalog engines that execute on graphics processing units fall into the SIMD paradigm.{{Cite book |last1=Shovon |first1=Ahmedur Rahman |last2=Dyken |first2=Landon Richard |last3=Green |first3=Oded |last4=Gilray |first4=Thomas |last5=Kumar |first5=Sidharth |chapter=Accelerating Datalog applications with cuDF |date=November 2022 |title=2022 IEEE/ACM Workshop on Irregular Applications: Architectures and Algorithms (IA3) |chapter-url=https://ieeexplore.ieee.org/document/10027548 |publisher=IEEE |pages=41–45 |doi=10.1109/IA356718.2022.00012 |isbn=978-1-6654-7506-8|s2cid=256565728 }}
Datalog engines using OpenMP{{Cite book |last1=Jordan |first1=Herbert |last2=Subotić |first2=Pavle |last3=Zhao |first3=David |last4=Scholz |first4=Bernhard |title=Proceedings of the 24th Symposium on Principles and Practice of Parallel Programming |chapter=A specialized B-tree for concurrent datalog evaluation |date=2019-02-16 |chapter-url=https://doi.org/10.1145/3293883.3295719 |series=PPoPP '19 |location=New York, NY, USA |publisher=Association for Computing Machinery |pages=327–339 |doi=10.1145/3293883.3295719 |isbn=978-1-4503-6225-2|s2cid=59617209 }} are instances of the MIMD paradigm.
In the shared-nothing setting, Datalog engines execute on a cluster of nodes. Such engines generally operate by splitting relations into disjoint subsets based on a hash function, performing computations (joins) on each node, and then exchanging newly generated tuples over the network.{{Cite book |last1=Wu |first1=Jiacheng |last2=Wang |first2=Jin |last3=Zaniolo |first3=Carlo |title=Proceedings of the 2022 International Conference on Management of Data |chapter=Optimizing Parallel Recursive Datalog Evaluation on Multicore Machines |date=2022-06-11 |chapter-url=https://doi.org/10.1145/3514221.3517853 |series=SIGMOD '22 |location=New York, NY, USA |publisher=Association for Computing Machinery |pages=1433–1446 |doi=10.1145/3514221.3517853 |isbn=978-1-4503-9249-5|s2cid=249578825 }} "These approaches implement the idea of parallel bottom-up evaluation by splitting the tables into disjoint partitions via discriminating functions, such as hashing, where each partition is then mapped to one of the parallel workers. After each iteration, workers coordinate with each other to exchange newly generated tuples where necessary."
Examples include Datalog engines based on MPI, Hadoop,{{Cite book |last1=Shaw |first1=Marianne |last2=Koutris |first2=Paraschos |last3=Howe |first3=Bill |last4=Suciu |first4=Dan |title=Datalog in Academia and Industry |chapter=Optimizing Large-Scale Semi-Naïve Datalog Evaluation in Hadoop |date=2012 |editor-last=Barceló |editor-first=Pablo |editor2-last=Pichler |editor2-first=Reinhard |chapter-url=https://link.springer.com/chapter/10.1007/978-3-642-32925-8_17 |series=Lecture Notes in Computer Science |volume=7494 |language=en |location=Berlin, Heidelberg |publisher=Springer |pages=165–176 |doi=10.1007/978-3-642-32925-8_17 |isbn=978-3-642-32925-8}} and Spark.{{Cite book |last1=Shkapsky |first1=Alexander |last2=Yang |first2=Mohan |last3=Interlandi |first3=Matteo |last4=Chiu |first4=Hsuan |last5=Condie |first5=Tyson |last6=Zaniolo |first6=Carlo |title=Proceedings of the 2016 International Conference on Management of Data |chapter=Big Data Analytics with Datalog Queries on Spark |date=2016-06-14 |chapter-url=https://doi.org/10.1145/2882903.2915229 |series=SIGMOD '16 |volume=2016 |location=New York, NY, USA |publisher=Association for Computing Machinery |pages=1135–1149 |doi=10.1145/2882903.2915229 |isbn=978-1-4503-3531-7 |pmc=5470845 |pmid=28626296}}
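The shared-nothing scheme can be simulated on one machine. This sketch is an illustration built on stated assumptions, not any engine's actual design. Both `path` and `edge` are partitioned by hashing the join column, and each simulated worker joins only its own partitions. The union of the workers' outputs stands in for the tuples that a cluster would exchange over the network. Evaluation is semi-naïve, so only new tuples are redistributed each round. The function names are hypothetical.

```python
from collections import defaultdict


def partition(tuples, column, workers):
    """Split tuples into disjoint parts by hashing one column."""
    parts = [set() for _ in range(workers)]
    for t in tuples:
        parts[hash(t[column]) % workers].add(t)
    return parts


def parallel_closure(edges, workers=4):
    """Transitive closure of edge/2, with each round's join split across simulated workers."""
    # Partition edge(B, C) once, on its join column B, and index each part by B.
    edge_index = [defaultdict(set) for _ in range(workers)]
    for w, part in enumerate(partition(edges, 0, workers)):
        for b, c in part:
            edge_index[w][b].add(c)
    path = set(edges)
    delta = set(edges)
    while delta:
        # Route each new path(A, B) tuple to the worker that owns B.
        delta_parts = partition(delta, 1, workers)
        # Each worker joins only its local data (sequential here, parallel on a cluster).
        produced = set()
        for w in range(workers):
            produced |= {(a, c) for (a, b) in delta_parts[w] for c in edge_index[w][b]}
        # "Exchange": merge the workers' outputs and keep only genuinely new tuples.
        delta = produced - path
        path |= delta
    return path
```

Both relations are hashed on the same value B with the same function. As a result, every pair of tuples that can join lands on the same worker, and the result does not depend on the number of workers.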

= Top-down evaluation strategies =

SLD resolution is sound and complete for Datalog programs.

= Magic sets =

Top-down evaluation strategies begin with a query or goal. Bottom-up evaluation strategies can answer queries by computing the entire minimal model and matching the query against it, but this can be inefficient if the answer only depends on a small subset of the entire model. The magic sets algorithm takes a Datalog program and a query, and produces a more efficient program that computes the same answer to the query while still using bottom-up evaluation.{{Cite journal |last1=Balbin |first1=I. |last2=Port |first2=G. S. |last3=Ramamohanarao |first3=K. |last4=Meenakshi |first4=K. |date=1991-10-01 |title=Efficient bottom-up computation of queries on stratified databases |journal=The Journal of Logic Programming |language=en |volume=11 |issue=3 |pages=295–344 |doi=10.1016/0743-1066(91)90030-S |issn=0743-1066|doi-access=free }} A variant of the magic sets algorithm has been shown to produce programs that, when evaluated using semi-naïve evaluation, are as efficient as top-down evaluation.{{Cite book |last=Ullman |first=J. D. |title=Proceedings of the eighth ACM SIGACT-SIGMOD-SIGART symposium on Principles of database systems - PODS '89 |chapter=Bottom-up beats top-down for datalog |date=1989-03-29 |chapter-url=https://doi.org/10.1145/73721.73736 |location=New York, NY, USA |publisher=Association for Computing Machinery |pages=140–149 |doi=10.1145/73721.73736 |isbn=978-0-89791-308-9|s2cid=13269547 }}
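The effect of magic sets can be illustrated on the ancestor example with the query ?- ancestor(xerces, Y). The rewritten program in the comments was derived by hand for this one query, with the first argument bound and the second free. It is only an illustration: the general algorithm works from adornments of every predicate and is more involved. The Python code evaluates the rewritten program bottom-up. As a result, ancestor facts are derived only for xerces and the people reachable from xerces through parent. The extra fact parent(zeno, yara) is hypothetical test data.

```python
def magic_ancestor(parent, start):
    """Bottom-up evaluation of the magic-set rewriting of ancestor for ?- ancestor(start, Y)."""
    # magic(start).   magic(Z) :- magic(X), parent(X, Z).
    magic = {start}
    while True:
        new = {z for (x, z) in parent if x in magic} - magic
        if not new:
            break
        magic |= new
    # ancestor(X, Y) :- magic(X), parent(X, Y).
    # ancestor(X, Y) :- magic(X), parent(X, Z), ancestor(Z, Y).
    ancestor = set()
    while True:
        derived = {(x, y) for (x, y) in parent if x in magic}
        derived |= {(x, y) for (x, z) in parent if x in magic for (z2, y) in ancestor if z2 == z}
        if derived <= ancestor:
            return ancestor
        ancestor |= derived


parent = {("xerces", "brooke"), ("brooke", "damocles"), ("zeno", "yara")}
ancestor = magic_ancestor(parent, "xerces")
answers = sorted(y for (x, y) in ancestor if x == "xerces")
print(answers)  # ['brooke', 'damocles']
print(("zeno", "yara") in ancestor)  # False: never derived
```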

Complexity

The decision problem formulation of Datalog evaluation is as follows: Given a Datalog program {{mvar|P}} split into a set of facts (EDB) {{mvar|E}} and a set of rules {{mvar|R}}, and a ground atom {{mvar|A}}, is {{mvar|A}} in the minimal model of {{mvar|P}}? In this formulation, there are three variations of the computational complexity of evaluating Datalog programs:{{Cite journal |last1=Dantsin |first1=Evgeny |last2=Eiter |first2=Thomas |last3=Gottlob |first3=Georg |last4=Voronkov |first4=Andrei |date=2001-09-01 |title=Complexity and expressive power of logic programming |url=https://doi.org/10.1145/502807.502810 |journal=ACM Computing Surveys |volume=33 |issue=3 |pages=374–425 |doi=10.1145/502807.502810 |issn=0360-0300}}

The {{dfni|data complexity}} is the complexity of the decision problem when {{mvar|A}} and {{mvar|E}} are inputs and {{mvar|R}} is fixed.
The {{dfni|program complexity}} is the complexity of the decision problem when {{mvar|A}} and {{mvar|R}} are inputs and {{mvar|E}} is fixed.
The {{dfni|combined complexity}} is the complexity of the decision problem when {{mvar|A}}, {{mvar|E}}, and {{mvar|R}} are inputs.

With respect to data complexity, the decision problem for Datalog is P-complete (See Theorem 4.4 in ). P-completeness for data complexity means that evaluation is in P for every fixed Datalog program, and that there exists a fixed Datalog query for which evaluation is P-complete. The proof is based on a Datalog metainterpreter for propositional logic programs.

With respect to program complexity, the decision problem is EXPTIME-complete. In particular, evaluating Datalog programs always terminates; Datalog is not Turing-complete.
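A counting argument makes the termination claim concrete and gives the upper bounds. Here C is the set of constants in the program, Pred(P) is its set of relation symbols, and k is the largest arity. These symbols are introduced only for this derivation. Every atom that evaluation can derive belongs to the finite Herbrand base. Each step of naïve evaluation with the operator T either adds at least one new atom or stops:

```latex
\[
|\mathrm{HB}(P)| = \sum_{p \in \mathrm{Pred}(P)} |C|^{\operatorname{ar}(p)} \le |\mathrm{Pred}(P)| \cdot |C|^{k}
\]
\[
\emptyset = I_0 \subsetneq I_1 \subsetneq \cdots \subsetneq I_n = I_{n+1},
\qquad I_{i+1} = T(I_i),
\qquad n \le |\mathrm{HB}(P)|
\]
```

When the rules are fixed, |Pred(P)| and k are constants. The number of rounds is then polynomial in the number of constants, and each round takes polynomial time, which is consistent with membership in P for data complexity. When the rules are part of the input, k can grow with the input and the bound becomes exponential, which matches the EXPTIME upper bound. The matching hardness results require separate proofs that this argument does not supply.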

Some extensions to Datalog do not preserve these complexity bounds. Extensions implemented in some Datalog engines, such as algebraic data types, can even make the resulting language Turing-complete.

Extensions

Several extensions have been made to Datalog, e.g., to support negation, aggregate functions, inequalities, to allow object-oriented programming, or to allow disjunctions as heads of clauses. These extensions have significant impacts on the language's semantics and on the implementation of a corresponding interpreter.

Datalog is a syntactic subset of Prolog, disjunctive Datalog, answer set programming, DatalogZ, and constraint logic programming. When evaluated as an answer set program, a Datalog program yields a single answer set, which is exactly its minimal model.{{Cite journal |last1=Bembenek |first1=Aaron |last2=Greenberg |first2=Michael |last3=Chong |first3=Stephen |date=2023-01-11 |title=From SMT to ASP: Solver-Based Approaches to Solving Datalog Synthesis-as-Rule-Selection Problems |journal=Proceedings of the ACM on Programming Languages |volume=7 |issue=POPL |pages=7:185–7:217 |doi=10.1145/3571200|s2cid=253525805 |doi-access=free }}

Many implementations of Datalog extend Datalog with additional features; see {{section link||Datalog engines}} for more information.

= Aggregation =

Datalog can be extended to support aggregate functions.{{Cite journal |last1=Zaniolo |first1=Carlo |last2=Yang |first2=Mohan |last3=Das |first3=Ariyam |last4=Shkapsky |first4=Alexander |last5=Condie |first5=Tyson |last6=Interlandi |first6=Matteo |date=September 2017 |title=Fixpoint semantics and optimization of recursive Datalog programs with aggregates* |url=https://www.cambridge.org/core/journals/theory-and-practice-of-logic-programming/article/abs/fixpoint-semantics-and-optimization-of-recursive-datalog-programs-with-aggregates/605FE14CADEA2567C9EDBB78BBD1E0A2 |journal=Theory and Practice of Logic Programming |language=en |volume=17 |issue=5–6 |pages=1048–1065 |doi=10.1017/S1471068417000436 |arxiv=1707.05681 |s2cid=6272867 |issn=1471-0684}}

Notable Datalog engines that implement aggregation include:

LogicBlox{{Cite web |title=Chapter 7. Rules - LogicBlox 3.10 Reference Manual |url=https://developer.logicblox.com/content/docs/core-reference/webhelp/rules.html#rules-aggregation |access-date=2023-03-04 |website=developer.logicblox.com}}
Soufflé
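The simplest form of aggregation applies an aggregate only after the relation it ranges over has been fully computed, so the aggregate sits in a higher stratum. The sketch below counts, for each person in the ancestor example, how many descendants they have. The rule in the comment uses an illustrative notation, not LogicBlox or Soufflé syntax. Aggregation through recursion, as studied in the work cited above, requires additional conditions and is not shown.

```python
from collections import Counter

# Lower stratum: the ancestor relation after evaluation has reached its fixed point.
ancestor = {("xerces", "brooke"), ("brooke", "damocles"), ("xerces", "damocles")}

# Upper stratum, in an illustrative notation (not any particular engine's syntax):
#   descendant_count(X, N) :- N = count of Y such that ancestor(X, Y).
descendant_count = Counter(x for (x, _y) in ancestor)
print(sorted(descendant_count.items()))  # [('brooke', 1), ('xerces', 2)]
```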

= Negation =

{{further|Syntax and semantics of logic programming#Extending Datalog with negation}}

Adding negation to Datalog complicates its semantics, leading to whole new languages and strategies for evaluation. For example, the language that results from adding negation with the stable model semantics is exactly answer set programming.

Stratified negation can be added to Datalog while retaining its model-theoretic and fixed-point semantics. Notable Datalog engines that implement stratified negation include:

LogicBlox{{Cite web |title=6.4. Negation - LogicBlox 3.10 Reference Manual |url=https://developer.logicblox.com/content/docs/core-reference/webhelp/formula-negation.html |access-date=2023-03-04 |website=developer.logicblox.com}} "Additionally, negation is only allowed when the platform can determine a way to stratify all rules and constraints that use negation."
Soufflé
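Stratified evaluation can be sketched as follows. The stratum without negation is computed to a fixed point first. Only then is the negated relation consulted, so its complement is well defined. The rules in the comments use an illustrative notation with `not`, and the relation and function names are hypothetical. The variable X in the negative literal also appears in the positive atom node(X), as safety conditions for negation commonly require.

```python
def unreachable(nodes, edges, start):
    # Stratum 1 (no negation), evaluated to a fixed point first:
    #   reach(start).
    #   reach(Z) :- reach(Y), edge(Y, Z).
    reach = {start}
    while True:
        new = {z for (y, z) in edges if y in reach} - reach
        if not new:
            break
        reach |= new
    # Stratum 2, illustrative notation; reach is now complete, so negating it is safe:
    #   unreachable(X) :- node(X), not reach(X).
    return {x for x in nodes if x not in reach}


nodes = {"a", "b", "c", "d"}
edges = {("a", "b"), ("b", "c")}
print(unreachable(nodes, edges, "a"))  # {'d'}
```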

Comparison to Prolog

Unlike in Prolog, the clauses of a Datalog program can be written in any order. Datalog does not have Prolog's cut operator. This makes Datalog a fully declarative language.

In contrast to Prolog, Datalog

disallows complex terms as arguments of predicates, e.g., p(x, y) is admissible but not p(f(x), y),
disallows negation,
requires that every variable that appears in the head of a clause also appear in a literal in the body of the clause.

This article deals primarily with Datalog without negation (see also {{section link|Syntax and semantics of logic programming|Extending Datalog with negation}}). However, stratified negation is a common addition to Datalog; the following list contrasts Prolog with Datalog with stratified negation.

Datalog with stratified negation

also disallows complex terms as arguments of predicates,
requires that every variable that appears in the head of a clause also appear in a positive (i.e., not negated) atom in the body of the clause,
requires that every variable appearing in a negative literal in the body of a clause also appear in some positive literal in the body of the clause.{{cite web |author1=Michael Lam |author2=Dr. Sin Min Lee |title=Datalog |url=http://www.cs.sjsu.edu/faculty/lee/cs157/24SpDatalog.ppt |website=Course CS 157A |publisher=SAN JOSÉ STATE UNIVERSITY, department of Computer Science |archive-url=https://web.archive.org/web/20170325035511/http://www.cs.sjsu.edu/faculty/lee/cs157/24SpDatalog.ppt |archive-date=2017-03-25}}{{Unreliable source?|date=March 2023}}

Expressiveness

Datalog generalizes many other query languages. For instance, conjunctive queries and unions of conjunctive queries can be expressed in Datalog. Datalog can also express regular path queries.

When we consider ordered databases, i.e., databases with an order relation on their active domain, then the Immerman–Vardi theorem implies that the expressive power of Datalog is precisely that of the class PTIME: a property can be expressed in Datalog if and only if it is computable in polynomial time.{{Cite book |last1=Kolaitis |first1=Phokion G. |last2=Vardi |first2=Moshe Y. |chapter=On the expressive power of datalog: Tools and a case study |date=1990-04-02 |title=Proceedings of the ninth ACM SIGACT-SIGMOD-SIGART symposium on Principles of database systems |chapter-url=https://dl.acm.org/doi/10.1145/298514.298542 |language=en |publisher=ACM |pages=61–71 |doi=10.1145/298514.298542 |isbn=978-0-89791-352-2|journal=Journal of Computer and System Sciences}}

The {{dfni|boundedness problem}} for Datalog asks, given a Datalog program, whether it is {{dfni|bounded}}, i.e., the maximal recursion depth reached when evaluating the program on an input database can be bounded by some constant. In other words, this question asks whether the Datalog program could be rewritten as a nonrecursive Datalog program, or, equivalently, as a union of conjunctive queries. Solving the boundedness problem on arbitrary Datalog programs is undecidable,{{Cite journal|last1=Hillebrand|first1=Gerd G|last2=Kanellakis|first2=Paris C|last3=Mairson|first3=Harry G|last4=Vardi|first4=Moshe Y|date=1995-11-01|title=Undecidable boundedness problems for datalog programs|journal=The Journal of Logic Programming|language=en|volume=25|issue=2|pages=163–190|doi=10.1016/0743-1066(95)00051-K|issn=0743-1066|doi-access=free}} but it can be made decidable by restricting to some fragments of Datalog.

Datalog engines

Systems that implement languages inspired by Datalog, whether compilers, interpreters, libraries, or embedded DSLs, are referred to as {{dfni|Datalog engines}}. Datalog engines often implement extensions of Datalog, extending it with additional data types, foreign function interfaces, or support for user-defined lattices. Such extensions may allow for writing non-terminating or otherwise ill-defined programs.{{citation needed|date=October 2023}}

Here is a short list of systems that are either based on Datalog or provide a Datalog interpreter:

= Free software/open source =

{| class="wikitable sortable"
|+ List of Datalog engines that are free software and/or open source
! Name !! Year of latest release !! Written in !! Licence !! Data sources !! Description !! Links
|-
! scope="row" | AbcDatalog
| 2023 || Java || {{BSD-lic}} || || Datalog engine that implements common evaluation algorithms; designed for extensibility, research use, and education || [https://abcdatalog.seas.harvard.edu/ Homepage]
|-
! scope="row" | Ascent
| 2023 || Rust || {{free|MIT License}} || || Logic programming language similar to Datalog, embedded in Rust via macros; supports lattices and customized data structures || [https://github.com/s-arash/ascent/ Repository]
|-
! scope="row" | bddbddb
| 2007 || Java || {{LGPL-lic}} || || Datalog implementation designed to query Java bytecode, including points-to analysis on large Java programs; uses binary decision diagrams (BDDs) internally || [http://bddbddb.sourceforge.net/ Homepage]
|-
! scope="row" | Bloom (Bud)
| 2017 || Ruby || {{BSD-lic}} 3-Clause || || Ruby DSL for programming with data-centric constructs, based on the [https://www2.eecs.berkeley.edu/Pubs/TechRpts/2009/EECS-2009-173.html Dedalus] extension of Datalog, which adds a temporal dimension to the logic || [http://bloom-lang.net/ Homepage] [https://github.com/bloom-lang/bud Repository]
|-
! scope="row" | Cascalog
| 2014 || Clojure || {{free|Apache 2.0}} || Can query other DBMSs || Data processing and querying library for Clojure and Java, designed to be used on Hadoop || [https://github.com/nathanmarz/cascalog Repository] [http://web.archive.org/web/20161220171421/http://cascalog.org/ Homepage (archived)]
|-
! scope="row" | Clingo
| 2024 || C++ || {{free|MIT License}} || || Answer Set Programming system that supports Datalog as a special case; its standalone grounder gringo suffices for plain Datalog || [https://potassco.org/clingo/ Homepage] [https://github.com/potassco/clingo Repository] [https://potassco.org/clingo/run/ Online demo]
|-
! scope="row" | ConceptBase
| 2024 || various || {{BSD-lic}} 2-Clause || || Deductive and object-oriented database system for conceptual modeling and metamodeling, which includes a Datalog query evaluator || [https://conceptbase.sourceforge.net/ Homepage]
|-
! scope="row" | Coral
| 1997 || C++ || {{proprietary|proprietary, free for some uses, open source}} || || Deductive database system written in C++ with semi-naïve Datalog evaluation; developed 1988–1997 || [http://research.cs.wisc.edu/coral/ Homepage]
|-
! scope="row" | Crepe
| 2023 || Rust || {{free|Apache 2.0 or MIT}} || || Rust library for expressing Datalog-like inferences, based on procedural macros || [https://github.com/ekzhang/crepe Homepage]
|-
! scope="row" | Datafrog
| 2019 || Rust || {{free|Apache 2.0 or MIT}} || || Lightweight Datalog engine intended to be embedded in other Rust programs || [https://github.com/rust-lang/datafrog Homepage]
|-
! scope="row" | Datafun
| 2016 || Racket || {{proprietary|open source, no license in repository}} || || Functional programming language that generalizes Datalog to semilattices || [https://www.rntz.net/datafun/ Homepage] [https://github.com/rntz/datafun Repository]
|-
! scope="row" | Datahike
| 2024 || Clojure || {{free|Eclipse Public License 1.0}} || Built-in database (in-memory or file) || Fork of DataScript with a durable backend based on a [https://github.com/datacrypt-project/hitchhiker-tree hitchhiker tree], using Datalog as query language || [https://github.com/replikativ/datahike Homepage]
|-
! scope="row" | Datalevin
| 2024 || Clojure || {{free|Eclipse Public License 1.0}} || LMDB bindings || Fork of DataScript optimized for LMDB durable storage, using Datalog as query language || [https://github.com/juji-io/datalevin Homepage]
|-
! scope="row" | Datalog (Erlang)
| 2019 || Erlang || {{free|Apache 2.0}} || || Library to support Datalog queries in Erlang, with data represented as streams of tuples || [https://github.com/fogfish/datalog Homepage]
|-
! scope="row" | Datalog (MITRE)
| 2016 || Lua || {{LGPL-lic}} || || Lightweight deductive database system, designed to be small and usable on memory-constrained devices || [https://datalog.sourceforge.net/ Homepage] [https://ysangkok.github.io/mitre-datalog.js/wrapper.html Online demo]
|-
! scope="row" | Datalog (OCaml)
| 2019 || OCaml || {{BSD-lic}} 2-clause || || In-memory Datalog implementation for OCaml featuring bottom-up and top-down algorithms || [https://github.com/c-cube/datalog/ Homepage]
|-
! scope="row" | Datalog (Racket)
| 2022 || Racket || {{free|Apache 2.0 or MIT}} || || Racket package for using Datalog || [https://docs.racket-lang.org/datalog/ Homepage] [https://github.com/racket/datalog Repository]
|-
! scope="row" | Datalog Educational System
| 2021 || Prolog || {{LGPL-lic}} || DBMS connectors || Open-source implementation intended for teaching Datalog{{Citation | last = Saenz-Perez | title = DES: A Deductive Database System | journal = Electronic Notes in Theoretical Computer Science | volume = 271 | pages = 63–78 | place = ES | doi = 10.1016/j.entcs.2011.02.011 | year = 2011 | doi-access = free }} || [https://des.sourceforge.net Homepage]
|-
! scope="row" | DataScript
| 2024 || Clojure || {{free|Eclipse Public License 1.0}} || In-memory database || Immutable database that runs in a browser, using Datalog as query language || [https://github.com/tonsky/datascript Homepage]
|-
! scope="row" | Datomic
| 2024 || Clojure || {{proprietary|closed source; binaries released under Apache 2.0}} || Bindings for DynamoDB, Cassandra, PostgreSQL and others || Distributed database running on cloud architectures; uses Datalog as query language || [https://www.datomic.com/ Homepage]
|-
! scope="row" | DDlog
| 2021 || Rust || {{free|MIT License}} || || Incremental, in-memory, typed Datalog engine; programs are compiled to Rust; based on the differential dataflow{{Citation | url = https://github.com/timelydataflow/differential-dataflow/ | title = Differential Dataflow| date = July 2022}} library || [https://github.com/vmware/differential-datalog Homepage]
|-
! scope="row" | DLV
| 2023 || C++ || {{proprietary|proprietary, free for some uses}} || || Answer Set Programming system that supports Datalog as a special case || [https://dlv.demacs.unical.it/home Homepage] [https://www.dlvsystem.it/ Company]
|-
! scope="row" | Dyna1
| 2013 || Haskell || {{free|GNU AGPL v3}} || || Declarative programming language using Datalog for statistical AI programming; later Dyna versions do not use Datalog || [https://github.com/nwf/dyna Repository] [https://web.archive.org/web/20160117155947/http://dyna.org/ Homepage (archived)]
|-
! scope="row" | Flix
| 2024 || Java || {{free|Apache 2.0}} || || Functional and logic programming language inspired by Datalog, extended with user-defined lattices and monotone filter/transfer functions || [https://flix.dev/ Homepage] [https://play.flix.dev/ Online demo]
|-
! scope="row" | Graal
| 2018 || Java || {{free|CeCILL v2.1}} || RDF import, CSV import, DBMS connectors || Java toolkit dedicated to querying knowledge bases within the framework of existential rules (a.k.a. tuple-generating dependencies or Datalog+/-) || [https://graphik-team.github.io/graal/ Homepage]
|-
! scope="row" | Inter4QL
| 2020 || C++ || {{BSD-lic}} || || Interpreter for a database query language based on four-valued logic; supports Datalog as a special case || [http://4ql.org/a-downloads-inter4ql.html Homepage]
|-
! scope="row" | IRIS
| 2016 || Java || {{LGPL-lic}} v2.1 || || Logic programming system supporting Datalog and negation under the well-founded semantics; support for RDFS || [https://github.com/NICTA/iris-reasoner/tree/master Repository]
|-
! scope="row" | Jena
| 2024 || Java || {{free|Apache 2.0}} || RDF import || Semantic web framework that includes a Datalog implementation as part of its general-purpose rule engine; compatibility with RDF || [https://jena.apache.org/documentation/inference/ Rule engine documentation]
|-
! scope="row" | Mangle
| 2024 || Go || {{free|Apache 2.0}} || || Programming language for deductive database programming, supporting an extension of Datalog || [https://github.com/google/mangle Homepage]
|-
! scope="row" | Naga
| 2021 || Clojure || {{free|Eclipse Public License 1.0}} || [https://github.com/quoll/asami Asami graph database] || Query engine that executes Datalog queries over the graph database; runs in browsers (memory), on the JVM (memory/files), or natively (memory/files) || [https://github.com/quoll/naga/ Homepage]
|-
! scope="row" | Nemo
| 2024 || Rust || {{free|Apache 2.0 or MIT}} || RDF import, CSV import || In-memory rule engine for knowledge graph analysis and database transformations; compatible with RDF and SPARQL; supports tuple-generating dependencies (tgds) || [https://github.com/knowsys/nemo Homepage] [https://tools.iccl.inf.tu-dresden.de/nemo/ Online demo]
|-
! scope="row" | pyDatalog
| 2015 || Python || {{LGPL-lic}} || DBMS connectors from Python || Python library for interpreting Datalog queries || [https://sites.google.com/site/pydatalog/ Homepage] [https://github.com/pcarbonn/pyDatalog Repository]
|-
! scope="row" | RDFox
| 2024 || C++ || {{proprietary|proprietary, free for some uses}} || In-memory database, RDF import, CSV import, DBMS connectors || Main-memory-based RDF triple store with Datalog reasoning; supports incremental evaluation and high-availability setups || [https://www.oxfordsemantic.tech/ Homepage]
|-
! scope="row" | SociaLite
| 2016 || Java || {{free|Apache 2.0}} || HDFS bindings || Datalog variant and engine for large-scale graph analysis || [http://web.archive.org/web/20141110003235/http://socialite-lang.github.io/ Homepage (archived)] [https://github.com/socialite-lang/socialite Repository]
|-
! scope="row" | Soufflé
| 2023 || C++ || {{free|UPL v1.0}} || CSV import, sqlite3 bindings || Datalog engine originally designed for static program analysis; rule sets are either compiled to C++ programs or interpreted || [https://souffle-lang.github.io/ Homepage]
|-
! scope="row" | tclbdd
| 2015 || Tcl || {{BSD-lic}} || || Datalog implementation based on binary decision diagrams; designed to support development of an optimizing compiler for Tcl{{Cite conference|url=http://tcl.tk/pub/incoming/p14/Tcl-21st-2014-Portland/KevinKenny/kenny-bdd.pdf |title=Binary decision diagrams, relational algebra, and Datalog: deductive reasoning for Tcl |last=Kenny |first=Kevin B |conference=Twenty-first Annual Tcl/Tk Conference |location=Portland, Oregon |date=12–14 November 2014 |access-date=29 December 2015 }} || [https://chiselapp.com/user/kbk/repository/tclbdd/ Homepage]
|-
! scope="row" | TerminusDB
| 2024 || Prolog/Rust || {{free|Apache 2.0}} || || Graph database and document store that also features a Datalog-based query language || [https://terminusdb.com/ Homepage]
|-
! scope="row" | XSB
| 2022 || C || {{LGPL-lic}} || || Logic programming and deductive database system based on Prolog with tabling, giving Datalog-like termination and efficiency, including incremental evaluation{{Citation | url = http://xsb.sourceforge.net/manual1/manual1.pdf | title = The XSB System, Version 3.7.x, Volume 1: Programmer's Manual }} || [https://xsb.sourceforge.net/ Homepage]
|-
! scope="row" | XTDB (formerly Crux)
| 2024 || Clojure || {{free|MPL 2.0}} || Bindings for Apache Kafka and others || Immutable database with time-travel; Datalog used as query language in XTDB 1.x (may change in XTDB 2.x) || [https://xtdb.com/ Homepage] [https://github.com/xtdb/xtdb Repository]
|}


=Non-free software=

FoundationDB provides a free-of-charge database binding for pyDatalog, with a tutorial on its use.{{Citation|url=http://foundationdb.com/documentation/latest/datalog.html |archive-url=https://archive.today/20130809001616/http://foundationdb.com/documentation/latest/datalog.html |url-status=dead |archive-date=2013-08-09 |title=FoundationDB Datalog Tutorial }}.
Leapsight Semantic Dataspace (LSD) is a distributed deductive database that offers high availability, fault tolerance, operational simplicity, and scalability. LSD uses Leaplog (a Datalog implementation) for querying and reasoning and was created by Leapsight.{{Cite web|url=http://leapsight.com/pages/products.html|title=Leapsight|archive-url=https://web.archive.org/web/20181111010739/http://leapsight.com/pages/products.html|archive-date=2018-11-11|url-status=dead}}
LogicBlox, a commercial implementation of Datalog used for web-based retail planning and insurance applications.
Profium Sense is a native RDF compliant graph database written in Java. It provides Datalog evaluation support of user defined rules.
.QL, a commercial object-oriented variant of Datalog created by Semmle for analyzing source code to detect security vulnerabilities.{{Citation | url = https://semmle.com/ql | title = Semmle QL| date = 18 September 2019}}.
SecPAL, a security policy language developed by Microsoft Research.{{cite web |url=http://research.microsoft.com/projects/secpal |work=Microsoft Research |title=SecPAL |url-status=dead |archive-url=https://web.archive.org/web/20070223213744/http://research.microsoft.com/projects/SecPAL/ |archive-date=2007-02-23 }}
Stardog is a graph database implemented in Java. It supports RDF and all OWL 2 profiles and provides extensive reasoning capabilities, including Datalog evaluation.
StrixDB: a commercial, SPARQL-compliant RDF graph store with a Lua API and Datalog inference capabilities. It could be used as an httpd (Apache HTTP Server) module or standalone (although beta versions are under the Perl Artistic License 2.0).

Uses and influence

Datalog is quite limited in its expressivity. It is not Turing-complete and does not include basic data types such as integers or strings. This parsimony is appealing from a theoretical standpoint, but it means Datalog per se is rarely used as a programming language or knowledge representation language.Lifschitz, Vladimir. "Foundations of logic programming." Principles of knowledge representation 3 (1996): 69–127. "The expressive possibilities of [Datalog] are much too limited for meaningful applications to knowledge representation." Most Datalog engines implement substantial extensions of Datalog. However, Datalog has a strong influence on such implementations, and many authors do not distinguish them from Datalog as presented in this article. Accordingly, the applications discussed in this section include applications of realistic implementations of Datalog-based languages.

Datalog has been applied to problems in data integration, information extraction, networking, security, cloud computing and machine learning.{{Citation | url = http://www.cs.ucdavis.edu/~green/papers/sigmod906t-huang.pdf | title = SIGMOD 2011 | contribution = Datalog and Emerging applications | last = Huang, Green, and Loo | publisher = UC Davis}}.{{Cite journal|title=Neural Datalog Through Time: Informed Temporal Modeling via Logical Specification|journal=Proceedings of ICML 2020|last1=Mei |first1=Hongyuan |last2=Qin |first2=Guanghui |last3=Xu |first3=Minjie |last4=Eisner |first4=Jason |year=2020 |arxiv=2006.16723 }} Google has developed an extension to Datalog for big data processing.{{Cite conference |last1=Chin |first1=Brian |last2=Dincklage |first2=Daniel von |last3=Ercegovac |first3=Vuk |last4=Hawkins |first4=Peter |last5=Miller |first5=Mark S. |last6=Och |first6=Franz |last7=Olston |first7=Christopher |last8=Pereira |first8=Fernando |title=Yedalog: Exploring Knowledge at Scale | conference= 1st Summit on Advances in Programming Languages (SNAPL 2015) |date=2015 |editor-last=Ball |editor-first=Thomas |editor2-last=Bodik |editor2-first=Rastislav |editor3-last=Krishnamurthi |editor3-first=Shriram |editor4-last=Lerner |editor4-first=Benjamin S. |editor5-last=Morrisett |editor5-first=Greg |url=http://drops.dagstuhl.de/opus/volltexte/2015/5017 |series=Leibniz International Proceedings in Informatics (LIPIcs) |location=Dagstuhl, Germany |publisher=Schloss Dagstuhl–Leibniz-Zentrum fuer Informatik |volume=32 |pages=63–78 |doi=10.4230/LIPIcs.SNAPL.2015.63 |doi-access=free |isbn=978-3-939897-80-4}}

Datalog has seen application in static program analysis.{{Cite book |last1=Whaley |first1=John |last2=Avots |first2=Dzintars |last3=Carbin |first3=Michael |last4=Lam |first4=Monica S. |title=Programming Languages and Systems |chapter=Using Datalog with Binary Decision Diagrams for Program Analysis |date=2005 |editor-last=Yi |editor-first=Kwangkeun |chapter-url=https://link.springer.com/chapter/10.1007/11575467_8 |series=Lecture Notes in Computer Science |volume=3780 |language=en |location=Berlin, Heidelberg |publisher=Springer |pages=97–118 |doi=10.1007/11575467_8 |isbn=978-3-540-32247-4|s2cid=5223577 }} The Soufflé dialect has been used to write pointer analyses for Java and a control-flow analysis for Scheme.{{Cite book |last1=Scholz |first1=Bernhard |last2=Jordan |first2=Herbert |last3=Subotić |first3=Pavle |last4=Westmann |first4=Till |title=Proceedings of the 25th International Conference on Compiler Construction |chapter=On fast large-scale program analysis in Datalog |date=2016-03-17 |chapter-url=https://doi.org/10.1145/2892208.2892226 |series=CC 2016 |location=New York, NY, USA |publisher=Association for Computing Machinery |pages=196–206 |doi=10.1145/2892208.2892226 |isbn=978-1-4503-4241-4|s2cid=7531543 |url=https://zenodo.org/record/345932 }}{{Cite book |last1=Antoniadis |first1=Tony |last2=Triantafyllou |first2=Konstantinos |last3=Smaragdakis |first3=Yannis |title=Proceedings of the 6th ACM SIGPLAN International Workshop on State of the Art in Program Analysis |chapter=Porting doop to Soufflé |date=2017-06-18 |chapter-url=https://doi.org/10.1145/3088515.3088522 |series=SOAP 2017 |location=New York, NY, USA |publisher=Association for Computing Machinery |pages=25–30 |doi=10.1145/3088515.3088522 |isbn=978-1-4503-5072-3|s2cid=3074689 }} Datalog has been integrated with SMT solvers to make it easier to write certain static analyses.{{Cite journal |last1=Bembenek |first1=Aaron |last2=Greenberg |first2=Michael |last3=Chong |first3=Stephen |date=2020-11-13 
|title=Formulog: Datalog for SMT-based static analysis |journal=Proceedings of the ACM on Programming Languages |volume=4 |issue=OOPSLA |pages=141:1–141:31 |doi=10.1145/3428209|s2cid=226961727 |doi-access=free }} The Flix dialect is also suited to writing static program analyses.{{Cite journal |last1=Madsen |first1=Magnus |last2=Yee |first2=Ming-Ho |last3=Lhoták |first3=Ondřej |date=2016-06-02 |title=From Datalog to flix: a declarative language for fixed points on lattices |url=https://doi.org/10.1145/2980983.2908096 |journal=ACM SIGPLAN Notices |volume=51 |issue=6 |pages=194–208 |doi=10.1145/2980983.2908096 |issn=0362-1340}}
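To illustrate the kind of analysis these systems express, the following Python sketch encodes a simplified, field-insensitive, flow-insensitive Andersen-style points-to analysis as Datalog rules, shown in comments, and evaluates them with a hand-written naive fixpoint loop. The relation names `new`, `assign`, `store`, `load`, `pointsTo`, and `heapPointsTo` are hypothetical and are not taken from Soufflé, Doop, or any other cited system. Real analyses also model method calls, fields, and contexts.

```python
# A simplified Andersen-style (inclusion-based) points-to analysis written as
# Datalog rules and evaluated by a naive fixpoint loop. Relation names are
# illustrative; fields are merged, and calls and control flow are ignored.


def points_to(new, assign, store, load):
    """new: v = new H | assign: v = w | store: v.f = w | load: v = w.f

    Each argument is a set of pairs. Returns (pointsTo, heapPointsTo)."""
    # pointsTo(V, H) :- new(V, H).
    pts, heap = set(new), set()
    while True:
        next_pts, next_heap = set(pts), set(heap)
        # pointsTo(V, H) :- assign(V, W), pointsTo(W, H).
        next_pts |= {(v, h) for (v, w) in assign
                     for (w2, h) in pts if w2 == w}
        # heapPointsTo(H1, H2) :- store(V, W), pointsTo(V, H1), pointsTo(W, H2).
        next_heap |= {(h1, h2) for (v, w) in store
                      for (v2, h1) in pts if v2 == v
                      for (w2, h2) in pts if w2 == w}
        # pointsTo(V, H2) :- load(V, W), pointsTo(W, H1), heapPointsTo(H1, H2).
        next_pts |= {(v, h2) for (v, w) in load
                     for (w2, h1) in pts if w2 == w
                     for (g, h2) in heap if g == h1}
        if next_pts == pts and next_heap == heap:
            return pts, heap
        pts, heap = next_pts, next_heap


# a = new A(); b = new B(); c = a; c.f = b; d = a.f
pts, heap = points_to(
    new={("a", "A"), ("b", "B")},
    assign={("c", "a")},
    store={("c", "b")},
    load={("d", "a")},
)
print(sorted(pts))   # [('a', 'A'), ('b', 'B'), ('c', 'A'), ('d', 'B')]
print(sorted(heap))  # [('A', 'B')]
```

Because `c` and `a` alias the same allocation site `A`, the store through `c` is visible to the load through `a`, so `d` may point to `B`. Engines such as those cited above evaluate rules like these with semi-naïve evaluation and indexed joins rather than this naive loop.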

Some widely used database systems include ideas and algorithms developed for Datalog. For example, the SQL:1999 standard includes recursive queries, and the Magic Sets algorithm (initially developed for the faster evaluation of Datalog queries) is implemented in IBM's DB2.{{cite book|last1=Gryz|last2=Guo|last3=Liu|last4=Zuzarte|chapter=Query sampling in DB2 Universal Database|chapter-url=http://dl.acm.org/ft_gateway.cfm?id=1007664&ftid=268727&dwn=1&CFID=411300798&CFTOKEN=23708243|chapter-format=PDF|doi=10.1145/1007568.1007664|title=Proceedings of the 2004 ACM SIGMOD international conference on Management of data - SIGMOD '04|pages=839|year=2004|isbn=978-1581138597|s2cid=7775190}}
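As an illustration of SQL recursion, the ancestor program from the Example section can be written as a recursive common table expression. The sketch below uses SQLite through Python's standard `sqlite3` module. SQLite's `WITH RECURSIVE` follows the SQL:1999 form closely, but the details of recursive queries differ between database systems, so the query is a translation under that assumption rather than portable SQL for every DBMS.

```python
import sqlite3

# The parent facts from the Example section, stored as a table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE parent (x TEXT, y TEXT)")
conn.executemany(
    "INSERT INTO parent VALUES (?, ?)",
    [("xerces", "brooke"), ("brooke", "damocles")],
)

# First SELECT:  ancestor(X, Y) :- parent(X, Y).
# Second SELECT: ancestor(X, Y) :- parent(X, Z), ancestor(Z, Y).
ANCESTOR_QUERY = """
WITH RECURSIVE ancestor(x, y) AS (
    SELECT x, y FROM parent
    UNION  -- set semantics: duplicate rows are discarded, as in Datalog
    SELECT parent.x, ancestor.y
    FROM parent JOIN ancestor ON parent.y = ancestor.x
)
SELECT x, y FROM ancestor ORDER BY x, y
"""

rows = conn.execute(ANCESTOR_QUERY).fetchall()
print(rows)
# [('brooke', 'damocles'), ('xerces', 'brooke'), ('xerces', 'damocles')]
conn.close()
```

Using `UNION` rather than `UNION ALL` discards duplicate rows, which keeps evaluation terminating even if the `parent` table contains a cycle.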

History

The origins of Datalog date back to the beginning of logic programming, but it became prominent as a separate area around 1977 when Hervé Gallaire and Jack Minker organized a workshop on logic and databases.{{Citation | editor1-first = Hervé | editor1-last = Gallaire | editor2-first = John 'Jack' | editor2-last = Minker | contribution = Logic and Data Bases, Symposium on Logic and Data Bases, Centre d'études et de recherches de Toulouse, 1977 | title = Advances in Data Base Theory | publisher = Plenum Press | place = New York | year = 1978 | isbn = 978-0-306-40060-5 | url-access = registration | url = https://archive.org/details/logicdatabases0000symp }}. David Maier is credited with coining the term Datalog.{{Citation | author1-link = Serge Abiteboul | first1 = Serge | last1 = Abiteboul | first2 = Richard | last2 = Hull | author3-link = Victor Vianu | first3 = Victor | last3 = Vianu | title = Foundations of databases | page = 305 | url = https://books.google.com/books?id=HN9QAAAAMAAJ&q=David+Maier| isbn = 9780201537710 | year = 1995 | publisher = Addison-Wesley }}.

Notes

References

{{Cite journal|last1=Ceri|first1=S.|last2=Gottlob|first2=G.|last3=Tanca|first3=L.|date=March 1989|title=What you always wanted to know about Datalog (and never dared to ask)|url=https://www2.cs.sfu.ca/CourseCentral/721/jim/DatalogPaper.pdf|journal=IEEE Transactions on Knowledge and Data Engineering|volume=1|issue=1|pages=146–166|doi=10.1109/69.43410|issn=1041-4347 | citeseerx = 10.1.1.210.1118}}
{{Cite book |last=Abiteboul |first=S. |url=https://www.worldcat.org/oclc/30546436 |title=Foundations of databases |date=1995 |publisher=Addison-Wesley |others=Richard Hull, Victor Vianu |isbn=0-201-53771-0 |location=Reading, Mass. |oclc=30546436}}

Category:Query languages

Category:Logic programming languages

Category:Declarative programming languages
