Draft:Velox (execution engine)

{{Short description|Open-source composable execution engine for data management systems}}

{{Draft topics|internet-culture|software|computing|technology}}

{{AfC topic|stem}}

{{Multiple issues|

{{Third-party|date=October 2024}}

{{Cleanup bare URLs|date=October 2024}}

}}

{{Infobox software

| name = Velox

| developer = Velox OSS Community

| released = {{start date and age|2022}}

| programming language = C++

| operating system = Cross-platform

| genre = Database

| license = Apache License 2.0

| repo = {{URL|https://github.com/facebookincubator/velox}}

| website = {{URL|https://velox-lib.io/}}

}}

Velox is an open-source, composable execution engine written in C++ and distributed as a library

{{Cite conference

| url=https://vldb.org/pvldb/vol15/p3372-pedreira.pdf

| title=Velox: Meta's Unified Execution Engine

| date=2022

| publisher=VLDB Endowment

| book-title=Proceedings of the VLDB Endowment

| conference=48th International Conference on Very Large Databases

| pages=3372–3384

| location=Sydney, Australia

| doi=10.14778/3554821.3554829

| last1= Pedreira

| first1=Pedro

| last2=Erling

| first2=Orri

| last3=Basmanova

| first3=Masha

| last4=Wilfong

| first4=Kevin

| last5=Sakka

| first5=Laith

| last6=Pai

| first6=Krishna

| last7=He

| first7=Wei

| last8=Chattopadhyay

| first8=Biswapesh

}}

.

Velox provides reusable, high-performance, and extensible data processing components for building data management systems. It implements the execution engine layer of the composable data stack

{{Cite conference

| title = The Composable Data Management System Manifesto

| url = https://www.vldb.org/pvldb/vol16/p2679-pedreira.pdf

| date=2023

| publisher=VLDB Endowment

| book-title=Proceedings of the VLDB Endowment

| conference=49th International Conference on Very Large Databases

| pages=2679–2685

| location=Vancouver, Canada

| doi=10.14778/3603581.3603604

| last1=Pedreira

| first1=Pedro

| last2=Erling

| first2=Orri

| last3=Karanasos

| first3=Konstantinos

| last4=Schneider

| first4=Scott

| last5=McKinney

| first5=Wes

| last6=Valluri

| first6=Satya

| last7=Zait

| first7=Mohamed

| last8=Nadeau

| first8=Jacques

}}

, and as such relies on its clients (the engines using the library) to provide a language frontend, an optimizer, and an execution runtime environment. An engine integrates with Velox by producing an optimized query plan and handing it to Velox for execution.

Velox was created at Meta in 2020 and open-sourced in 2022

{{Cite web

| title = Introducing Velox: An open source unified execution engine

| url = https://engineering.fb.com/2023/03/09/open-source/velox-open-source-execution-engine/

| publisher=Engineering Blog at Meta

| date=2023

| access-date=2024-11-11

}}

{{Cite web

| title = Meta's Velox Means Database Performance Is Not Subject To Interpretation

| url = https://www.nextplatform.com/2022/08/31/metas-velox-means-database-performance-is-not-subject-to-interpretation/

| publisher=The Next Platform

| author=Timothy Morgan

| date=2022

| access-date=2024-11-11

}}.

It is used to accelerate Presto (through the Prestissimo project), Apache Spark (through the Apache Gluten project

{{Cite conference

| title = The Gluten Open-Source Software Project: Modernizing Java-based Query Engines for the Lakehouse Era

| url = https://ceur-ws.org/Vol-3462/CDMS8.pdf

| date=2023

| conference=VLDB International Workshop on Composable Data Management Systems (CDMS'23)


| location=Vancouver, Canada

| last1=Shankaran

| first1=Akash

| last2=Gu

| first2=George

| last3=Chen

| first3=Weiting

| last4=Yang

| first4=Binwei

| last5=Kulkarni

| first5=Chidamber

| last6=Rambacher

| first6=Mark

| last7=Tatbul

| first7=Nesime

| last8=Cohen

| first8=David

}}

), Voltron Data's Theseus engine, and other systems within Meta and across the industry.

History

Velox was created in 2020 at Meta by Orri Erling and Masha Basmanova, who were soon joined by Pedro Pedreira. Its initial goal was to accelerate Presto queries by rewriting the engine in C++, as an extension of the Aria project. Because many teams at Meta were interested in high-performance building blocks for data management systems, Velox was designed as an extensible and reusable library. It was adopted early on by Meta's stream processing platform (XStream), then by Presto (the Prestissimo project) and by other systems for data warehouse ingestion, real-time processing, and data for AI/ML.

Velox was open-sourced in 2022. Companies such as Ahana

{{Cite web| title = Ahana Joins Leading Open Source Innovators in its Commitment to the Velox Open Source Project Created by Meta| url = https://www.infoworld.com/article/2337375/ahana-joins-leading-open-source-innovators-in-its-commitment-to-the-velox-open-source-project-creat.html|publisher=InfoWorld| author=Beth Winkowski| date=2022| access-date=2024-11-11}}

(acquired by IBM in 2023{{Cite web| title = IBM joins the Presto Foundation through acquisition of Ahana| url = https://prestodb.io/blog/2023/04/12/ibm-joins-presto-foundation/| publisher= PrestoDB Foundation| author=Vikram Murali and Steven Mih |date=2023-04-12| access-date=2024-11-11}}), Intel, ByteDance, and Voltron Data joined the project early on. Other companies, including Microsoft, Uber, Nvidia, Alibaba, Pinterest, and Meituan, are active contributors.

Features

Velox provides the following features:

  • Operators: implementations of relational operators such as TableScan, TableWriter, Filter, Project, Aggregation, Joins, Shuffle/Exchange, and more.
  • Vectors: an Arrow-compatible columnar memory layout module, providing encodings such as Flat, Dictionary, Constant, and Sequence/RLE, in addition to a lazy materialization pattern and support for out-of-order writes.
  • Expression evaluation: a vectorized and extensible expression evaluation engine, providing features such as encoding peeling and fast paths, memoization, constant folding, conjunct reordering, and more.
  • Storage and IO: support for file formats such as Parquet, ORC/DWRF, and Nimble; table formats such as Iceberg; network serialization protocols such as Presto Page and Spark UnsafeRow; and storage systems such as S3, HDFS, GCS, ABFS, and more.

Performance

Velox's execution model is columnar and based on vectorization. In this model, physical operators are decomposed into small, tight loops of computation ("little loops") that modern CPUs can process more efficiently. Vectorization provides better data and instruction locality, and enables CPUs to make more effective use of techniques such as out-of-order execution and SIMD instructions.
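The "little loops" idea can be illustrated with a minimal standalone sketch. It does not use Velox's API: the `Column` alias and the functions `multiply`, `add`, `selectGreaterThan`, and `evalBatch` are hypothetical names chosen for illustration. Each primitive is a single tight loop over contiguous data, which is the kind of loop a compiler can typically auto-vectorize.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical flat column: 64-bit integers stored contiguously.
using Column = std::vector<int64_t>;

// One tight loop per primitive: element-wise multiplication of two columns.
Column multiply(const Column& a, const Column& b) {
  assert(a.size() == b.size());
  Column out(a.size());
  for (std::size_t i = 0; i < a.size(); ++i) {
    out[i] = a[i] * b[i];
  }
  return out;
}

// Element-wise addition of two columns, again as a single tight loop.
Column add(const Column& a, const Column& b) {
  assert(a.size() == b.size());
  Column out(a.size());
  for (std::size_t i = 0; i < a.size(); ++i) {
    out[i] = a[i] + b[i];
  }
  return out;
}

// Filter primitive: returns the indices of rows whose value exceeds threshold.
std::vector<std::size_t> selectGreaterThan(const Column& values,
                                           int64_t threshold) {
  std::vector<std::size_t> selected;
  selected.reserve(values.size());
  for (std::size_t i = 0; i < values.size(); ++i) {
    if (values[i] > threshold) {
      selected.push_back(i);
    }
  }
  return selected;
}

// Evaluates a * b + c for a whole batch by chaining the primitives,
// instead of interpreting the expression once per row.
Column evalBatch(const Column& a, const Column& b, const Column& c) {
  return add(multiply(a, b), c);
}
```

In this sketch the expression is interpreted once per batch rather than once per row, so the per-row work is reduced to the bodies of the loops. A production engine would also reuse buffers between primitives instead of allocating a new column at each step.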

Velox also implements compressed execution: it operates directly on cascading encodings such as dictionary, constant, and run-length (RLE) encodings during execution in order to implement database operations more efficiently. Physical operators usually provide multiple execution paths, which exploit data encodings where doing so is beneficial, and can also produce output that reuses the encoding of their input.
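A minimal standalone sketch of compressed execution over a dictionary encoding follows. It is not Velox's implementation: `DictionaryColumn`, `upperDictionaryPath`, and `upperFlatPath` are hypothetical names. The dictionary path applies the function once per distinct value and reuses the input's indices for its output, while the flat path decodes every row.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical dictionary-encoded string column: each row stores an index
// into a (usually much smaller) list of distinct values.
struct DictionaryColumn {
  std::vector<std::string> dictionary;  // distinct values
  std::vector<int32_t> indices;         // one entry per row

  std::size_t size() const { return indices.size(); }

  const std::string& valueAt(std::size_t row) const {
    assert(row < indices.size());
    return dictionary[static_cast<std::size_t>(indices[row])];
  }
};

// Scalar function applied by both execution paths.
std::string toUpper(const std::string& s) {
  std::string out = s;
  for (char& ch : out) {
    ch = static_cast<char>(std::toupper(static_cast<unsigned char>(ch)));
  }
  return out;
}

// Dictionary path: evaluates the function once per distinct value and
// shares the input's indices, so the output keeps the input's encoding.
DictionaryColumn upperDictionaryPath(const DictionaryColumn& input) {
  DictionaryColumn out;
  out.dictionary.reserve(input.dictionary.size());
  for (const std::string& value : input.dictionary) {
    out.dictionary.push_back(toUpper(value));
  }
  out.indices = input.indices;
  return out;
}

// Flat path: decodes and evaluates every row individually.
std::vector<std::string> upperFlatPath(const DictionaryColumn& input) {
  std::vector<std::string> out;
  out.reserve(input.size());
  for (std::size_t row = 0; row < input.size(); ++row) {
    out.push_back(toUpper(input.valueAt(row)));
  }
  return out;
}
```

Both paths produce the same logical values. The dictionary path does work proportional to the number of distinct values rather than the number of rows, which is where the benefit comes from when the dictionary is small relative to the column.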

Velox also uses lazy materialization to delay the materialization of data until the point during execution at which it is actually needed. Together with prefetching, preloading, and IO coalescing, this improves IO efficiency and reduces the amount of data read and decoded.
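Lazy materialization can be sketched, under simplifying assumptions, as a column that holds a loader callback instead of data. The names `LazyColumn`, `Rows`, and `sumWhereKeyEquals` are hypothetical and are not taken from Velox. A filter runs first on an already loaded key column, and the second column is decoded only for the rows that pass.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <utility>
#include <vector>

// Row numbers selected by an earlier filter.
using Rows = std::vector<std::size_t>;

// Hypothetical lazy column: stores a loader callback instead of values.
// Nothing is read or decoded until load() is called.
class LazyColumn {
 public:
  using Loader = std::function<std::vector<int64_t>(const Rows&)>;

  explicit LazyColumn(Loader loader) : loader_(std::move(loader)) {}

  // Materializes only the requested rows.
  std::vector<int64_t> load(const Rows& rows) {
    loaded_ = true;
    return loader_(rows);
  }

  bool isLoaded() const { return loaded_; }

 private:
  Loader loader_;
  bool loaded_ = false;
};

// Sums `amounts` over the rows where keys[row] == wanted. The amounts
// column is decoded only for matching rows, and not at all if none match.
int64_t sumWhereKeyEquals(const std::vector<int64_t>& keys, int64_t wanted,
                          LazyColumn& amounts) {
  Rows selected;
  for (std::size_t row = 0; row < keys.size(); ++row) {
    if (keys[row] == wanted) {
      selected.push_back(row);
    }
  }
  if (selected.empty()) {
    return 0;  // the lazy column is never materialized
  }
  const std::vector<int64_t> values = amounts.load(selected);
  assert(values.size() == selected.size());
  int64_t sum = 0;
  for (int64_t value : values) {
    sum += value;
  }
  return sum;
}
```

In a real engine the loader would read and decode data from a file reader, so skipping rows saves IO and decoding work. In this sketch the loader is any callable that maps row numbers to values.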

Owing to these and other optimizations, Velox is reported to be 3–4 times more efficient than systems such as unmodified Presto or Spark

{{Cite web

| title = New C++ Acceleration Library Velox Juices Code Execution Up To 8x

| url = https://www.bigdatawire.com/2022/08/31/new-c-acceleration-library-velox-juices-code-execution-up-to-8x/

| author=Alex Woodie

| publisher=Big Data Wire

| date=2022

| access-date=2024-11-11

}}.

Integrations

  • Presto, through the Prestissimo (or Presto Native) effort.
  • Apache Spark, through Apache Gluten{{Cite web

| title = The Apache Gluten Project

| url = https://gluten.apache.org/

| publisher=The Apache Software Foundation

| date=2023

| access-date=2024-11-11

}}

  • Voltron Data Theseus.

References

{{reflist}}