Apache Drill
{{Short description|Open-source software framework}}
{{Primary sources|date=September 2012}}
{{Infobox software
| name = Apache Drill
| logo = Apache Drill logo.svg
| developer = Apache Software Foundation
| released = {{Start date and age|2015|05|19|df=no}}
| latest release version = 1.20.3
| latest release date = {{Start date and age|2023|01|07|df=no}}
| repo = {{URL|https://gitbox.apache.org/repos/asf?p%3Ddrill.git|Drill Repository}}
| programming language = Java
| operating system = Cross-platform
| license = Apache License 2.0
| website = {{URL|https://drill.apache.org}}
}}
Apache Drill is an open-source software framework that supports data-intensive distributed applications for interactive analysis of large-scale datasets. Built chiefly through contributions from MapR developers,{{cite web |title=Apache Drill: Tracking its history as an open source community |url=http://radar.oreilly.com/2015/09/apache-drill-tracking-its-history-as-an-open-source-community.html |date=21 Sep 2015 |archive-url=https://archive.today/20160318173233/http://radar.oreilly.com/2015/09/apache-drill-tracking-its-history-as-an-open-source-community.html |archive-date=18 March 2016 |last=Friedman |first=Ellen}}{{Cite web |title=Brief About The Differences between Apache Drill Vs Presto |url=https://www.hitechnectar.com/blogs/apache-drill-vs-presto/ |access-date=2023-04-13 |website=HitechNectar |language=en-US}} Drill is inspired by Google's Dremel system.{{Cite web |title=Spark SQL vs. Apache Drill-War of the SQL-on-Hadoop Tools |url=https://www.projectpro.io/article/spark-sql-vs-apache-drill-war-of-the-sql-on-hadoop-tools/234 |access-date=2022-11-15 |website=ProjectPro |language=en}} Drill is an Apache top-level project.{{cite web|title=The Apache Software Foundation Announces Apache Drill as a Top-Level Project|date=2 December 2014 |url=https://blogs.apache.org/foundation/entry/the_apache_software_foundation_announces66|access-date=2014-12-02}}
Drill supports a variety of NoSQL databases and file systems, including Alluxio, HBase, MongoDB, MapR-DB, HDFS, MapR-FS, Amazon S3, Azure Blob Storage, Google Cloud Storage, Swift, NAS and local files. A single query can join data from multiple datastores.
Drill's datastore-aware optimizer automatically restructures a query plan to leverage the datastore's internal processing capabilities. In addition, Drill supports data locality when Drill and the datastore run on the same nodes.{{Cite web|title = Apache Drill - Schema-free SQL for Hadoop, NoSQL and Cloud Storage|url = https://drill.apache.org/|website = drill.apache.org|access-date = 2015-12-29}}
Tomer Shiran is the founder of the Apache Drill Project.{{Cite web |last=Vizard |first=Michael |date=2021-09-01 |title=Apache Software Foundation updates Drill for broader SQL queries |url=https://venturebeat.com/business/apache-software-foundation-updates-drill-for-broader-sql-queries/ |access-date=2022-10-20 |website=VentureBeat |language=en-US}} It was designated an Apache Software Foundation top-level project in December 2014.{{Cite web |date=2016-04-11 |title=Apache Drill Eliminates ETL, Data Transformation for MapR Database |url=https://thenewstack.io/apache-drill-eliminates-etl-data-transformation-mapr-database/ |access-date=2022-11-15 |website=The New Stack |language=en-US}}
Features
One explicitly stated design goal is for Drill to scale to 10,000 servers or more and to process petabytes of data and trillions of records in seconds.{{cite web|url=http://wiki.apache.org/incubator/DrillProposal|title=DrillProposal - INCUBATOR - Apache Software Foundation }}
- Schema-free JSON document model similar to MongoDB and Elasticsearch, without requiring a formal schema to be declared
- Industry-standard APIs: ANSI SQL, ODBC/JDBC, RESTful APIs
- Designed to be easy for both users and developers to work with
- Pluggable architecture enables connectivity to multiple datastores
- Version 1.9 added dynamic user-defined functions
- Version 1.11 added cryptographic-related functions and PCAP file format support
Back-end Support
Drill is primarily focused on non-relational datastores, including Apache Hadoop text files, NoSQL, and cloud storage. Another notable feature is in situ querying of local JSON and Apache Parquet files. Additional datastores that it supports include:
- All Hadoop distributions (HDFS API 2.3+), including Apache Hadoop, MapR, CDH and Amazon EMR
- NoSQL: MongoDB, Apache HBase, Apache Cassandra
- Online Analytical Processing: Apache Kudu, Apache Druid, OpenTSDB
- Cloud storage: Amazon S3, Google Cloud Storage, Azure Blob Storage, Swift, IBM Cloud Object Storage
- Diverse data formats, including Apache Avro, Apache Parquet and JSON
- RDBMS storage plugins (using JDBC to connect to MySQL, PostgreSQL, and others)
A new datastore can be added by developing a storage plugin. Drill's "schema-free" JSON data model enables it to query non-relational datastores in situ.{{Cite web|title = Frequently Asked Questions - Apache Drill|url = https://drill.apache.org/faq/|website = drill.apache.org|access-date = 2015-12-29}}
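As a rough illustration of in situ querying combined with a cross-datastore join, the following Python sketch only assembles the SQL text that would be sent to Drill. It assumes storage plugins registered under the names <code>dfs</code> (file system) and <code>mongo</code> (MongoDB); the helper name, file path, collection and join key are hypothetical and chosen only for the example.

```python
def in_situ_join_sql(json_path: str, mongo_collection: str, key: str) -> str:
    """Build Drill SQL that joins a raw JSON file with a MongoDB collection.

    Assumes storage plugins named `dfs` and `mongo` are enabled; the path,
    collection and key passed in are illustrative placeholders.
    """
    return (
        "SELECT * "
        f"FROM dfs.`{json_path}` AS f "          # file is read in place, no schema declared
        f"JOIN mongo.{mongo_collection} AS m "   # second datastore in the same statement
        f"ON f.`{key}` = m.`{key}`"
    )


sql = in_situ_join_sql("/data/orders.json", "shop.customers", "customer_id")
print(sql)
```

The statement names each datastore through its plugin prefix, so no data has to be loaded or converted before the query runs.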
Front-end Support
Drill itself can be queried via JDBC, ODBC, or REST through a variety of methods and languages including Python and Java. The default install includes a web interface allowing end-users to execute ANSI SQL directly and export data tables as CSV files without any programming.
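A minimal sketch of the REST route from Python, using only the standard library. It assumes a Drill instance whose web server listens on <code>localhost:8047</code> without authentication and that accepts SQL posted to <code>/query.json</code>; the function names are hypothetical, and the sample query targets the <code>employee.json</code> file that Drill exposes through its classpath (<code>cp</code>) plugin.

```python
import json
import urllib.request


def build_drill_request(sql: str, base_url: str = "http://localhost:8047") -> urllib.request.Request:
    """Prepare (but do not send) a POST request carrying one SQL statement."""
    payload = json.dumps({"queryType": "SQL", "query": sql}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/query.json",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_query(sql: str, base_url: str = "http://localhost:8047") -> list:
    """Send the query to a running Drill instance and return the result rows.

    Assumes the JSON response carries a "rows" list; requires a reachable server.
    """
    with urllib.request.urlopen(build_drill_request(sql, base_url)) as resp:
        return json.load(resp).get("rows", [])


# Building the request needs no server; call run_query(...) against a live instance.
request = build_drill_request("SELECT * FROM cp.`employee.json` LIMIT 3")
print(request.get_method(), request.full_url)
```

Separating request construction from sending keeps the sketch checkable without a server; a secured deployment would additionally need authentication, which is not shown here.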
The dashboard library Apache Superset{{Cite web |author=James R. Borck, Martin Heller, Steven Nuñez, Andrew C. Oliver, Ian Pointer and Peter Wayner |date=2020-10-05 |title=The best open source software of 2020 |url=https://www.infoworld.com/article/3575858/the-best-open-source-software-of-2020.html |access-date=2022-11-26 |website=InfoWorld |language=en}} is well suited to visualizing data queried with Drill.
See also
{{Portal|Free and open-source software}}
References
{{Reflist}}
Papers
Several papers influenced the creation and design of Drill. A partial list:
- 2005 [http://www.eecs.berkeley.edu/~franklin/Papers/dataspaceSR.pdf From Databases to Dataspaces: A New Abstraction for Information Management]: the authors highlight the need for storage systems to accept all data formats and to provide APIs for data access that evolve based on the storage system's understanding of the data.
- 2010 [http://research.google.com/pubs/pub36632.html Dremel: Interactive Analysis of Web-Scale Datasets]
External links
- {{Official website|//drill.apache.org/}}
- [http://radar.oreilly.com/2015/09/apache-drill-tracking-its-history-as-an-open-source-community.html Apache Drill: Tracking its history as an open source community]
- [https://www.zdnet.com/article/sql-and-hadoop-its-complicated/ SQL and Hadoop: It's complicated]
{{Apache Software Foundation}}