DuckDB

{{Short description|Open source column-oriented RDBMS}}

{{More citations needed|date=March 2024}}

{{Infobox software

| name = DuckDB

| logo = DuckDB logo.svg

| developer = DuckDB Labs

| latest_release_version = v1.3.0

| latest_release_date = {{start date |2025|05|21}}

| repo = {{URL|https://github.com/duckdb/duckdb}}

| programming language = C++

| operating_system = Cross-platform

| genre = Column-oriented DBMS, RDBMS

| license = MIT License

| website = {{URL|https://www.duckdb.org/}}

}}

{{Portal|Free and open-source software}}

DuckDB is an open-source column-oriented relational database management system (RDBMS).{{Cite web |title=DuckDB Documentation SQL Introduction |url=https://duckdb.org/docs/sql/introduction.html |access-date=2024-11-20 }} It is designed to provide high performance on complex queries against large databases in an embedded configuration, such as queries combining tables with hundreds of columns and billions of rows. Unlike other embedded databases such as SQLite, DuckDB does not focus on transactional (OLTP) applications and is instead specialized for online analytical processing (OLAP) workloads.{{cite conference | last1=Raasveldt | first1=Mark | last2=Mühleisen | first2=Hannes | title=DuckDB: an Embeddable Analytical Database | publisher=ACM | date=2019-06-25 | isbn=978-1-4503-5643-5 | doi=10.1145/3299869.3320212 | pages=1981–1984}} The project has over 6 million downloads per month.{{Cite web |title=PyPi Download Stats |url=https://pypistats.org/packages/duckdb |access-date=2024-08-13 |website=www.pypistats.org |language=en |archive-date=2024-08-13 |archive-url=https://web.archive.org/web/20240813165631/https://pypistats.org/packages/duckdb |url-status=live }}{{Cite web |title=DuckDB Python Downloads Dashboard |url=https://duckdbstats.com/ |access-date=2024-08-13 |website=duckdbstats.com |language=en |archive-date=2024-08-13 |archive-url=https://web.archive.org/web/20240813165159/https://duckdbstats.com/ |url-status=live }}{{Cite web |last=Clark |first=Lindsay |title=DuckDB Labs puts limit on free support, rules out VC funding |url=https://www.theregister.com/2023/10/05/duckdb_labs_puts_limit_on_vc_funds/ |access-date=2024-03-23 |website=www.theregister.com |language=en |archive-date=2024-03-23 |archive-url=https://web.archive.org/web/20240323064605/https://www.theregister.com/2023/10/05/duckdb_labs_puts_limit_on_vc_funds/ |url-status=live }}
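As a rough illustration of why a column-oriented layout suits analytical workloads, the following Python sketch stores the same small table row by row and column by column, then computes an aggregate over one column. The table, its column names and the `total_amount_*` helpers are hypothetical and do not reflect DuckDB's actual storage format; the sketch only shows that a columnar layout lets an aggregate scan a single array instead of visiting every field of every record.

```python
from array import array

# Hypothetical three-column table (id, country, amount), for illustration only.
rows = [
    (1, "NL", 10.0),
    (2, "DE", 20.5),
    (3, "NL", 5.25),
    (4, "US", 4.25),
]

# Row-oriented layout: each record is stored together (typical for OLTP).
row_store = list(rows)

# Column-oriented layout: each column is stored as its own array.
column_store = {
    "id": array("q", (r[0] for r in rows)),
    "country": [r[1] for r in rows],
    "amount": array("d", (r[2] for r in rows)),
}


def total_amount_row_store(store):
    """Sum one field, visiting every full record to reach it."""
    total = 0.0
    for _id, _country, amount in store:
        total += amount
    return total


def total_amount_column_store(store):
    """Sum one column by scanning only that column's array."""
    return sum(store["amount"])


if __name__ == "__main__":
    print(total_amount_row_store(row_store))
    print(total_amount_column_store(column_store))
```

Both functions return the same total; the difference is in how much unrelated data each one has to step over, which matters more as the number of columns grows.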

History

DuckDB was originally developed by Mark Raasveldt and Hannes Mühleisen at the Centrum Wiskunde & Informatica (CWI) in the Netherlands.{{cite book | last=Kamphuis | first=Chris | title=Advances in Information Retrieval | chapter=Graph Databases for Information Retrieval | series=Lecture Notes in Computer Science | publisher=Springer International Publishing | publication-place=Cham | volume=12036 | date=2020 | isbn=978-3-030-45441-8 | pmc=7148032 | doi=10.1007/978-3-030-45442-5_79 | pages=608–612}} The project co-founders designed DuckDB to address the need for an in-process OLAP database solution.{{cite magazine |last=van der Ent |first=Leendert |date=April 2023 |title=DuckDB: Introducing a New Class of Data Management Systems |magazine=I/O Magazine |url=https://ict-research.nl/wordpress/wp-content/uploads/2023/04/IO-magazine-NR1-2023.pdf |publisher=ICT Research Platform Nederland |access-date=12 November 2024}} DuckDB was first released in 2019.{{Cite web |last=Clark |first=Lindsay |title=DuckDB reaches version 0.5.0 |url=https://www.theregister.com/2022/09/09/duckdb_0_5_0/ |access-date=2024-03-23 |website=www.theregister.com |language=en |archive-date=2024-03-07 |archive-url=https://web.archive.org/web/20240307163220/https://www.theregister.com/2022/09/09/duckdb_0_5_0/ |url-status=live }} DuckDB version 1.0.0 was released on June 3, 2024, under the codename SnowDuck.{{cite web | last1=Raasveldt | first1=Mark | last2=Mühleisen | first2=Hannes |title=Announcing DuckDB 1.0.0 | date=3 June 2024 |url=https://duckdb.org/2024/06/03/announcing-duckdb-100.html |access-date=12 November 2024}}

Features

DuckDB uses a vectorized query processing engine. Unusually among database management systems, it has no external dependencies and can be built with only a C++11 compiler.{{Cite web |title=DuckDB Building Instructions |url=https://duckdb.org/docs/dev/building/build_instructions |access-date=2024-08-16 }} DuckDB also deviates from the traditional client–server model by running inside a host process; it has bindings, for example, for the Python interpreter that can place data directly into NumPy arrays. DuckDB's SQL parser is derived from the pg_query library developed by Lukas Fittl, which is itself derived from PostgreSQL's SQL parser, stripped down as much as possible.{{Cite web |last=Slot |first=Marco |title=How We Fused DuckDB into Postgres with Crunchy Bridge for Analytics |date=24 May 2024 |url=https://www.crunchydata.com/blog/how-we-fused-duckdb-into-postgres-with-crunchy-bridge-for-analytics |access-date=12 November 2024}} DuckDB uses a single-file storage format to store data on disk, designed to support efficient scans and bulk updates, appends and deletes.{{cite conference | last1=Raasveldt | first1=Mark | last2=Mühleisen | first2=Hannes |title=Data Management for Data Science Towards Embedded Analytics| publisher=Conference on Innovative Data Systems Research | url=https://hannes.muehleisen.org/publications/CIDR2020-raasveldt-muehleisen-duckdb.pdf| date=2020 }} DuckDB is also compiled to WebAssembly using Emscripten, which enables it to run SQL in browser-based analytics tools.{{Cite web |title=Introducing Universal SQL |url=https://evidence.dev/blog/why-we-built-usql |access-date=2025-01-17 }}{{Cite web |title=How we evolved our query architecture with DuckDB |url=https://count.co/blog/how-we-evolved-our-query-architecture-with-duckdb |access-date=2025-01-17 }}
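The general idea behind vectorized query processing can be sketched in a few lines of Python. The example below is an illustrative assumption, not DuckDB's implementation: the names `VECTOR_SIZE`, `scan`, `filter_greater_than` and `sum_vectors` are hypothetical, and the batch size is chosen arbitrarily. It shows operators exchanging fixed-size batches ("vectors") of column values rather than one row at a time, so that per-call overhead is spread over many values.

```python
from array import array
from typing import Iterable, Iterator

VECTOR_SIZE = 4  # hypothetical batch size, kept small for readability


def scan(column: array, vector_size: int = VECTOR_SIZE) -> Iterator[array]:
    """Yield the column in consecutive fixed-size batches (vectors)."""
    for start in range(0, len(column), vector_size):
        yield column[start:start + vector_size]


def filter_greater_than(vectors: Iterable[array], threshold: float) -> Iterator[array]:
    """Apply a predicate to a whole vector per call, not to a single row."""
    for vector in vectors:
        selected = array(vector.typecode, (v for v in vector if v > threshold))
        if selected:  # skip batches in which no value qualified
            yield selected


def sum_vectors(vectors: Iterable[array]) -> float:
    """Aggregate by adding up one partial sum per vector."""
    return sum(sum(vector) for vector in vectors)


# Roughly equivalent to: SELECT sum(amount) FROM t WHERE amount > 2
amounts = array("d", [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0])
result = sum_vectors(filter_greater_than(scan(amounts), 2.0))
print(result)
```

Each operator here is a generator that pulls batches from the one before it, so the whole column is never materialized between steps. A production engine would run compiled loops over each batch instead of interpreted Python.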

Comparison

In its OLAP niche, DuckDB does not compete with traditional DBMSs such as Microsoft SQL Server, PostgreSQL and Oracle Database. While it uses SQL for queries, DuckDB targets serverless applications and provides fast responses using either Apache Parquet files or its own format for storage. These attributes make it a popular choice for interactive analysis of large datasets.{{cite book | last=Bannert | first=M. | title=Research Software Engineering: A Guide to the Open Source Ecosystem | publisher=CRC Press | series=Chapman & Hall/CRC Data Science Series | year=2024 | isbn=978-1-04-000513-2 | url=https://books.google.com/books?id=yWL7EAAAQBAJ&pg=PT25 | access-date=2024-03-23 | page=25 | archive-date=2024-03-23 | archive-url=https://web.archive.org/web/20240323010627/https://books.google.com/books?id=yWL7EAAAQBAJ&pg=PT25 | url-status=live }}

Commercial use

DuckDB is used at Facebook, Google, and Airbnb.{{Cite web |last=Clark |first=Lindsay |title=Scale-up database wrangler MotherDuck scores $47.5 million |url=https://www.theregister.com/2022/11/17/475_million_says_scaleup_databases/ |access-date=2024-03-23 |website=www.theregister.com |language=en |archive-date=2024-03-23 |archive-url=https://web.archive.org/web/20240323064604/https://www.theregister.com/2022/11/17/475_million_says_scaleup_databases/ |url-status=live }}

DuckDB co-author Mühleisen also runs a support and consultancy firm for the software, DuckDB Labs. The company has chosen not to take venture capital funding, stating "We feel investment would force the project direction towards monetization, and we would much prefer keeping DuckDB open and available for as many people as possible". Another company, MotherDuck, has received $100m funding for its data platform based on DuckDB, with investors including Andreessen Horowitz.{{Cite web |last=Clark |first=Lindsay |title=MotherDuck serverless analytics platform wins $52.5M funding |url=https://www.theregister.com/2023/09/21/motherduck_funding/ |access-date=2024-03-23 |website=www.theregister.com |language=en |archive-date=2024-03-23 |archive-url=https://web.archive.org/web/20240323064604/https://www.theregister.com/2023/09/21/motherduck_funding/ |url-status=live }}

DuckDB Foundation

The independent non-profit DuckDB Foundation safeguards the long-term maintenance and development of DuckDB. The foundation holds much of the intellectual property of the project and is funded by charitable donations.{{Cite web |title=DuckDB Foundation |url=https://duckdb.org/foundation/ |access-date=2024-11-09 }} The DuckDB Foundation's statutes ensure DuckDB remains open-source under the MIT license in perpetuity.{{Cite web |title=DuckDB Project FAQs |url=https://duckdb.org/faq.html |access-date=2024-11-09 }}

Language support

In addition to the native C and C++ APIs, DuckDB supports a range of programming languages.

{| class="wikitable"
|+ Client APIs
|-
! Language !! Notes !! Reference
|-
| Java
| The [https://duckdb.org/docs/api/java Java API] is implemented using JNI.{{cite web |title=Java JNI Source Code |url=https://github.com/duckdb/duckdb-java/blob/main/src/jni/duckdb_java.cpp |access-date=2024-09-07 |website=www.github.com |language=en}} Integration with the Apache Arrow{{cite web |title=DuckDB Java Arrow Source Code |url=https://github.com/duckdb/duckdb-java/blob/v1.0.0/src/main/java/org/duckdb/DuckDBResultSet.java#L132 |website=www.github.com |access-date=2024-09-07}} format is provided.
| {{cite web |title=DuckDB Java Source Code |url=https://github.com/duckdb/duckdb-java |access-date=2024-09-07 |website=www.github.com |language=en }}
|-
| Python
| The [https://duckdb.org/docs/api/python/overview Python API] implements support for the Pandas,{{cite web |title=DuckDB Pandas Source |url=https://github.com/duckdb/duckdb/tree/v1.0.0/tools/pythonpkg/src/include/duckdb_python/pandas| access-date=2024-09-07 |website=www.github.com |language=en }} Apache Arrow{{cite web |title=DuckDB PyArrow Source |url=https://github.com/duckdb/duckdb/tree/v1.0.0/tools/pythonpkg/src/include/duckdb_python/arrow |access-date=2024-09-07 |website=www.github.com |language=en }} and Polars data analysis packages.
| {{cite web |title=DuckDB Python Source Code |url=https://github.com/duckdb/duckdb/tree/v1.0.0/tools/pythonpkg/src/include/duckdb_python |access-date=2024-09-07 |website=www.github.com |language=en }}
|-
| Rust
| The [https://duckdb.org/docs/api/rust Rust API] is distributed as a [https://docs.rs/duckdb/latest/duckdb/ Rust crate] that exposes a wrapper over the native C API.
| {{cite web |title=DuckDB Rust Source Code |url=https://github.com/duckdb/duckdb-rs |access-date=2024-09-07 |website=www.github.com |language=en }}
|-
| Node.js
| [https://duckdb.org/docs/api/nodejs/overview Node API]
| {{cite web |title=DuckDB Node Source Code|url=https://github.com/duckdb/duckdb-node |access-date=2024-09-07 |website=www.github.com |language=en }}
|-
| R
| [https://duckdb.org/docs/api/r R API]
| {{cite web |title=DuckDB R Source Code |url=https://github.com/duckdb/duckdb-r |access-date=2024-09-07 |website=www.github.com |language=en }}
|-
| Julia
| [https://duckdb.org/docs/api/julia Julia API]
| {{cite web |title=DuckDB Julia Source Code |url=https://github.com/duckdb/duckdb/tree/v1.0.0/tools/juliapkg |access-date=2024-09-07 |website=www.github.com |language=en }}
|-
| Swift
| [https://duckdb.org/docs/api/swift Swift API]
| {{cite web |title=DuckDB Swift Source Code |url=https://github.com/duckdb/duckdb-swift |website=www.github.com |access-date=2024-09-07}}
|-
| WebAssembly
| [https://duckdb.org/docs/api/wasm/overview WASM API]
| {{cite web |title=DuckDB WebAssembly Source Code |url=https://github.com/duckdb/duckdb-wasm |website=www.github.com |access-date=2025-01-17}}
|}

Extensions

DuckDB's architecture supports extensions, allowing additional functionality to be added dynamically.{{cite web |url=https://duckdb.org/docs/extensions/overview.html |title=DuckDB Extensions Overview|website=www.duckdb.org |access-date=2025-01-17}} Many popular extensions are maintained by the core DuckDB team, and there are over 30 community extensions maintained by third parties.{{cite web |url=https://duckdb.org/docs/extensions/core_extensions |title=Core DuckDB Extensions |website=www.duckdb.org |access-date=2025-01-17}}{{cite web |url=https://duckdb.org/community_extensions/list_of_extensions |title=List of Community Extensions |website=www.duckdb.org |access-date=2025-01-17}}{{cite web |url=https://github.com/mehd-io/duckdb-extension-radar |title=DuckDB Extension Radar |website=www.github.com |access-date=2025-01-17}}

References

{{reflist}}

Further reading

  • {{cite web |last1=Woodie |first1=Alex |title=DuckDB Walks to the Beat of Its Own Analytics Drum |url=https://www.datanami.com/2024/03/05/duckdb-walks-to-the-beat-of-its-own-analytics-drum/ |website=Datanami |date=5 March 2024}}