Trino (SQL query engine)

{{Short description|Open-source distributed SQL query engine}}

{{Infobox software

| name = Trino

| logo = Trino-logo-w-bk.svg

| logo caption =

| screenshot = Trino-dashboard.png

| caption = Trino UI Version 358

| author = Martin Traverso, Dain Sundstrom, David Phillips, Eric Hwang

| programming language = Java

| operating system = Cross-platform

| repo = {{URL|https://github.com/trinodb/trino|Trino Repository}}

| standard = ANSI SQL, JDBC

| genre = Data Warehouse

| license = Apache License 2.0

| website = {{URL|https://trino.io}}

}}

Trino is an open-source distributed SQL query engine designed to query large data sets distributed over one or more heterogeneous data sources.{{cite web |title=Overview — Trino 468 Documentation |url=https://trino.io/docs/468/overview.html |website=trino.io |access-date=27 December 2024}} Trino can query data lakes that contain a variety of file formats, from simple row-oriented CSV and JSON data files to more performant open column-oriented formats such as ORC and Parquet. These files can reside on storage systems such as HDFS, AWS S3, Google Cloud Storage, or Azure Blob Storage, and are queried using the Hive{{cite web |title=Hive connector — Trino 393 Documentation |url=https://trino.io/docs/393/connector/hive.html |website=trino.io}} and Iceberg{{cite web |title=Iceberg connector — Trino 393 Documentation |url=https://trino.io/docs/393/connector/iceberg.html |website=trino.io |access-date=25 August 2022}} table formats. Trino can also run federated queries that access tables in different data sources, such as MySQL, PostgreSQL, Cassandra, Kafka, MongoDB and Elasticsearch.{{cite web |title=Connectors — Trino 393 Documentation |url=https://trino.io/docs/393/connector.html |website=trino.io |access-date=25 August 2022}} Trino is released under the Apache License 2.0.{{cite web |title=trinodb/trino LICENSE |url=https://github.com/trinodb/trino/blob/master/LICENSE |publisher=Trino |access-date=25 August 2022 |date=25 August 2022}}

History

In January 2019, the original creators of Presto, Martin Traverso, Dain Sundstrom, and David Phillips, created a fork of the Presto project. They initially kept the name Presto and used the PrestoSQL web handle to distinguish it from the original PrestoDB project. Simultaneously, they announced the Presto Software Foundation. The foundation is a not-for-profit organization dedicated to the advancement of the Presto open source distributed SQL query engine.{{Cite web|url=https://www.prweb.com/releases/presto-software-foundation-launches-to-advance-presto-open-source-community-815915772.html|title=Presto Software Foundation Launches to Advance Presto Open Source Community|website=PRWeb|access-date=2019-02-01}}{{Cite web|url=https://thenewstack.io/prestos-new-foundation-signals-growth-for-the-big-data-sql-engine/|title=Presto's New Foundation Signals Growth for the Big Data SQL Engine|date=2019-01-31|website=The New Stack|language=en-US|access-date=2019-02-01}}

In December 2020, PrestoSQL was rebranded as Trino. The Presto Software Foundation became the Trino Software Foundation, and the code base and all other PrestoSQL assets were renamed as part of the rebrand.{{cite web |last1=Traverso |first1=Martin |last2=Sundstrom |first2=Dain |last3=Phillips |first3=David |title=We're rebranding PrestoSQL as Trino |url=https://trino.io/blog/2020/12/27/announcing-trino.html |website=trino.io |access-date=7 September 2021 |language=en |date=27 December 2020}}

Presto and Trino were originally designed and developed by Traverso, Sundstrom, Phillips, and Eric Hwang at Facebook to allow data analysts to run interactive queries on its large data warehouse in Apache Hadoop. Trino shares the first six years of development with the Presto project.{{cite web |title=Contributors to trinodb/trino |url=https://github.com/trinodb/trino/graphs/contributors?from=2012-08-05&to=2018-08-05&type=c |website=GitHub |access-date=20 September 2021 |language=en}}{{cite web |title=Contributors to prestodb/presto |url=https://github.com/prestodb/presto/graphs/contributors?from=2012-08-05&to=2018-08-05&type=c |website=GitHub |access-date=20 September 2021 |language=en}} The earlier history of Trino is covered in the history of Presto.

Trino is used in many data platforms and products from cloud providers and other vendors. These products range from unmodified Trino deployments to heavily customized systems that run a general data platform or are integrated into specialized platforms for specific kinds of data. Examples include [https://trino.io/users Amazon Athena, Starburst Galaxy, Dune, and many others].

Architecture

File:Figure 4-1 Trino architecture.png

Trino is written in Java.{{cite book |last1=Fuller |first1=Matt |last2=Moser |first2=Manfred |last3=Traverso |first3=Martin |title=Trino: The Definitive Guide |chapter=Chapter 2. Installing and Configuring Trino |date=2021 |publisher=O'Reilly Media, Inc, USA |isbn=9781098107710 |pages=19–24}} It runs on a cluster of servers made up of two types of nodes: a coordinator and workers.

  • The coordinator is responsible for parsing, analyzing, optimizing, planning, and scheduling a query submitted by a client. The coordinator interacts with the service provider interface (SPI) to obtain the available tables, table statistics, and other information needed to carry out its tasks.
  • The workers are responsible for executing the tasks and operators fed to them by the scheduler. These tasks process rows from the data sources and produce results that are returned to the coordinator and ultimately to the client.
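
The division of labor between the two node types can be illustrated with a minimal Python sketch. It is a toy model under stated assumptions, not Trino's implementation: the function names (plan, execute_task, run_query) are hypothetical, a thread pool stands in for the worker nodes, and a list of in-memory lists stands in for the splits of a table.

```python
from concurrent.futures import ThreadPoolExecutor

def plan(table_splits, predicate):
    """Coordinator (toy planning): create one task per split of the table."""
    return [(split, predicate) for split in table_splits]

def execute_task(task):
    """Worker: scan the assigned split and keep the rows matching the predicate."""
    split, predicate = task
    return [row for row in split if predicate(row)]

def run_query(table_splits, predicate, num_workers=2):
    """Coordinator: schedule tasks on workers, then gather results for the client."""
    tasks = plan(table_splits, predicate)
    with ThreadPoolExecutor(max_workers=num_workers) as workers:
        # map() preserves task order, so results come back split by split.
        per_task = list(workers.map(execute_task, tasks))
    return [row for rows in per_task for row in rows]

# Hypothetical data: three splits of a single-column table.
splits = [[1, 8, 3], [12, 5], [7, 20]]
result = run_query(splits, lambda value: value > 6)
print(result)
```

In this sketch the coordinator never reads the data itself; it only creates tasks, hands them to workers, and collects what they return.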

Trino adheres to the ANSI SQL{{cite book |last1=Fuller |first1=Matt |last2=Moser |first2=Manfred |last3=Traverso |first3=Martin |title=Trino: The Definitive Guide |chapter=Chapter 1. Introducing Trino |date=2021 |publisher=O'Reilly Media, Inc, USA |isbn=9781098107710 |pages=3–17}} standard and includes various parts of the following ANSI specifications: SQL-92, SQL:1999, SQL:2003, SQL:2008, SQL:2011, SQL:2016, SQL:2023.

Trino supports the separation of compute and storage and may be deployed both on-premises and in the cloud.{{cite book |last1=Fuller |first1=Matt |last2=Moser |first2=Manfred |last3=Traverso |first3=Martin |title=Trino: The Definitive Guide |chapter=Chapter 13. Real-World Examples |date=2021 |publisher=O'Reilly Media, Inc, USA |isbn=9781098107710 |pages=267–272}}

Trino has a distributed massively parallel processing (MPP) architecture. It first distributes work over multiple workers, either by running ad-hoc partitioning operations or by relying on existing partitions in the underlying data store. Once the data has reached a worker, it is processed by pipelined operators running on multiple threads.
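
The combination of partitioning and pipelined operators can be sketched in Python as follows. This is an illustrative model only: the operator names, the toy hash function, and the use of generators and a thread pool are assumptions made for the example and do not reflect Trino's actual operators or scheduler.

```python
from concurrent.futures import ThreadPoolExecutor

def partition(rows, num_workers):
    """Ad-hoc partitioning: route each (key, value) row to a worker by its key."""
    buckets = [[] for _ in range(num_workers)]
    for key, value in rows:
        # Deterministic toy hash so every run produces the same layout.
        buckets[sum(key.encode()) % num_workers].append((key, value))
    return buckets

def scan(rows):
    """Source operator: emit rows one at a time."""
    for row in rows:
        yield row

def filter_op(rows, min_value):
    """Filter operator: pass through rows as soon as they arrive."""
    for key, value in rows:
        if value >= min_value:
            yield key, value

def aggregate_op(rows):
    """Aggregation operator: sum values per key."""
    totals = {}
    for key, value in rows:
        totals[key] = totals.get(key, 0) + value
    return totals

def run_pipeline(bucket, min_value):
    """One worker's pipeline: scan -> filter -> aggregate, streamed row by row."""
    return aggregate_op(filter_op(scan(bucket), min_value))

def run_query(rows, num_workers=3, min_value=0):
    buckets = partition(rows, num_workers)
    merged = {}
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        for partial in pool.map(lambda b: run_pipeline(b, min_value), buckets):
            # Rows with the same key share a partition, so keys never collide here.
            merged.update(partial)
    return merged

# Hypothetical input rows of (country, amount).
rows = [("us", 10), ("de", 5), ("us", 7), ("fr", 1), ("de", 20)]
totals = run_query(rows, num_workers=3, min_value=2)
print(totals)
```

Because rows are partitioned by key, each worker can compute final totals for its own keys independently, and the generators let a row flow through the filter into the aggregation without materializing intermediate results.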

References

{{Reflist}}