Velox (execution engine)


Velox
Developer(s): Velox OSS Community
Initial release: 2022
Repository: github.com/facebookincubator/velox
Written in: C++
Operating system: Cross-platform
Type: Database
License: Apache License 2.0
Website: velox-lib.io

Velox is an open source, composable execution engine written in C++ and distributed as a library [1]. It provides reusable, high-performance, and extensible data processing components for building data management systems. Velox implements the execution engine layer of the composable data stack [2], and therefore relies on its clients (the engines using the library) to provide a language frontend, an optimizer, and an execution runtime environment. Engines integrate with Velox by passing it an optimized query plan and relying on Velox to execute it.

Velox was created by Meta in 2020 and open sourced in 2022 [3] [4]. Today it is used to accelerate Presto (through the Prestissimo project), Spark (through the Apache Gluten project [5]), Voltron Data's Theseus engine, and a number of other systems within Meta and across the industry.

History

Velox was created at Meta in 2020 by Orri Erling and Masha Basmanova, who were soon joined by Pedro Pedreira. Its initial goal was to accelerate Presto queries by rewriting the engine in C++, as an extension of the Aria project. Because many teams at Meta were interested in high-performance building blocks for data management systems, Velox was designed as an extensible, reusable library. It was adopted early on by Meta's stream processing platform (XStream), then by Presto (the Prestissimo project), and by a number of other systems related to data warehouse ingestion, real-time processing, and data for AI/ML.

Velox was open sourced in 2022. Companies such as Ahana [6] (acquired by IBM in 2023 [7]), Intel, ByteDance, and Voltron Data joined the project early on. Microsoft, Uber, NVIDIA, Alibaba, Pinterest, Meituan, and others are also active contributors.

Features

Velox provides the following features:

  • Operators: Implementations of relational operators such as TableScan, TableWriter, Filter, Project, Aggregation, Joins, Shuffle/Exchange, and more.
  • Vectors: An Arrow-compatible columnar memory layout module, providing encodings such as Flat, Dictionary, Constant, and Sequence/RLE, in addition to a lazy materialization pattern and support for out-of-order writes.
  • Expression Eval: A vectorized and extensible expression evaluation engine, providing features such as encoding peeling and fast-paths, memoization, constant folding, conjunct re-ordering and more.
  • Storage and IO: Support for file formats such as Parquet, ORC/DWRF, and Nimble; table formats such as Iceberg; network serialization protocols such as Presto Page and Spark UnsafeRow; and storage systems such as S3, HDFS, GCS, ABFS, and more.
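The vector encodings listed above can be illustrated with a simplified sketch. The C++ below is hypothetical and does not reproduce Velox's actual vector classes; the type names `SimpleVector`, `Flat`, `Constant`, and `Dictionary` are assumptions chosen for illustration. It shows how a dictionary vector stores per-row indices into a base vector that may itself be encoded, which is what allows encodings to cascade.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical, simplified model of encoded columns; these are NOT Velox's
// real vector classes, only an illustration of the encodings' layouts.
template <typename T>
struct SimpleVector {
  virtual ~SimpleVector() = default;
  virtual std::size_t size() const = 0;
  virtual T valueAt(std::size_t row) const = 0;
};

// Flat: one physical value per logical row.
template <typename T>
struct Flat : SimpleVector<T> {
  std::vector<T> values;
  explicit Flat(std::vector<T> v) : values(std::move(v)) {}
  std::size_t size() const override { return values.size(); }
  T valueAt(std::size_t row) const override { return values[row]; }
};

// Constant: a single value logically repeated `count` times.
template <typename T>
struct Constant : SimpleVector<T> {
  T value;
  std::size_t count;
  Constant(T v, std::size_t n) : value(v), count(n) {}
  std::size_t size() const override { return count; }
  T valueAt(std::size_t) const override { return value; }
};

// Dictionary: per-row indices into a shared base vector. The base is any
// SimpleVector, so it may itself be a dictionary or constant (cascading).
template <typename T>
struct Dictionary : SimpleVector<T> {
  std::vector<std::int32_t> indices;
  std::shared_ptr<SimpleVector<T>> base;
  Dictionary(std::vector<std::int32_t> idx, std::shared_ptr<SimpleVector<T>> b)
      : indices(std::move(idx)), base(std::move(b)) {}
  std::size_t size() const override { return indices.size(); }
  T valueAt(std::size_t row) const override {
    return base->valueAt(static_cast<std::size_t>(indices[row]));
  }
};
```

In this sketch a dictionary over a few distinct values can represent any number of rows, and because `base` is an abstract `SimpleVector`, a dictionary can wrap another dictionary or a constant.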

Performance

Velox's execution model is columnar and vectorized. In this model, physical operators are decomposed into small, concise loops of computation ("little loops") that modern CPUs can process more efficiently. Vectorization improves data and instruction locality, and lets CPUs make better use of techniques such as out-of-order execution and SIMD instructions.
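The following is a minimal, hypothetical sketch of the column-at-a-time style described above; it is not code taken from Velox. Each function is one small loop over a batch of values: a filter that produces a list of selected row numbers, and an addition that runs either over the selected rows or densely over all rows. The function names `filterGreaterThan`, `addSelected`, and `addAll` are illustrative assumptions.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical "little loop" filter: returns the row numbers where
// column[i] > threshold. The append is branchless: every row number is
// written, but the output cursor only advances when the predicate holds.
std::vector<std::int32_t> filterGreaterThan(const std::vector<std::int64_t>& column,
                                            std::int64_t threshold) {
  std::vector<std::int32_t> selected(column.size());
  std::size_t count = 0;
  for (std::size_t i = 0; i < column.size(); ++i) {
    selected[count] = static_cast<std::int32_t>(i);
    count += column[i] > threshold ? 1 : 0;
  }
  selected.resize(count);
  return selected;
}

// Hypothetical projection over selected rows only (one indirection per row).
std::vector<std::int64_t> addSelected(const std::vector<std::int64_t>& a,
                                      const std::vector<std::int64_t>& b,
                                      const std::vector<std::int32_t>& rows) {
  std::vector<std::int64_t> out(rows.size());
  for (std::size_t i = 0; i < rows.size(); ++i) {
    out[i] = a[rows[i]] + b[rows[i]];
  }
  return out;
}

// Dense variant for when all rows are selected: a simple loop with no
// indirection, which optimizing compilers can often turn into SIMD code.
std::vector<std::int64_t> addAll(const std::vector<std::int64_t>& a,
                                 const std::vector<std::int64_t>& b) {
  std::vector<std::int64_t> out(a.size());
  for (std::size_t i = 0; i < a.size(); ++i) {
    out[i] = a[i] + b[i];
  }
  return out;
}
```

Providing both a dense path and a selected-rows path is one way an operator can choose between maximizing SIMD friendliness and skipping rows that an earlier filter removed.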

Velox also implements compressed execution, using cascading encodings such as dictionary, constant, and RLE encodings during execution to implement database operations more efficiently. Physical operators often provide multiple execution paths (for cases where exploiting data encodings is beneficial), and can also produce output that is encoded in terms of their input rather than copied from it.
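A minimal sketch of compressed execution over a dictionary-encoded column follows, assuming hypothetical `DictColumn` and `DictIntColumn` types rather than Velox's real API. The function is applied once per distinct base value instead of once per row, and the output reuses the input's indices, so it remains dictionary-encoded. This is closely related to the encoding peeling mentioned under expression evaluation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical dictionary-encoded string column: `base` holds the distinct
// values and `indices` maps each row to an entry in `base`.
struct DictColumn {
  std::vector<std::string> base;
  std::vector<std::int32_t> indices;
};

// Hypothetical dictionary-encoded integer column used for results.
struct DictIntColumn {
  std::vector<std::int64_t> base;
  std::vector<std::int32_t> indices;
};

// Applies `fn` once per distinct base value rather than once per row, then
// reuses the input's indices so the output shares the input's encoding.
template <typename Fn>
DictIntColumn mapOverDictionary(const DictColumn& in, Fn fn) {
  DictIntColumn out;
  out.base.reserve(in.base.size());
  for (const auto& value : in.base) {
    out.base.push_back(fn(value));
  }
  out.indices = in.indices;
  return out;
}

// Decodes a single row; only needed when a consumer requires flat data.
std::int64_t valueAt(const DictIntColumn& col, std::size_t row) {
  return col.base[static_cast<std::size_t>(col.indices[row])];
}
```

With five rows over two distinct values, the function runs twice instead of five times; the savings grow with the ratio of rows to distinct values.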

Velox also uses lazy materialization to delay materializing data until the point during execution when it is actually needed. Together with prefetching, preloading, and IO coalescing, these techniques improve IO efficiency and reduce the amount of data read and decoded.
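The sketch below illustrates lazy materialization with a hypothetical `LazyColumn` class; it is not Velox's own implementation. The loader callback stands in for reading and decoding file data. It runs only on first access and only for the rows that survived earlier processing, such as a filter on another column. For simplicity, the sketch caches the first result and assumes later calls request the same rows.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <optional>
#include <utility>
#include <vector>

// Hypothetical lazy column: holds a loader instead of data until the
// values are first requested.
class LazyColumn {
 public:
  // The loader receives the row numbers to materialize and returns their values.
  using Loader =
      std::function<std::vector<std::int64_t>(const std::vector<std::int32_t>& rows)>;

  explicit LazyColumn(Loader loader) : loader_(std::move(loader)) {}

  // Runs the loader on first use only. Simplification: later calls return
  // the cached values and are assumed to ask for the same rows.
  const std::vector<std::int64_t>& load(const std::vector<std::int32_t>& rows) {
    if (!values_) {
      values_ = loader_(rows);
    }
    return *values_;
  }

  bool isLoaded() const { return values_.has_value(); }

 private:
  Loader loader_;
  std::optional<std::vector<std::int64_t>> values_;
};
```

If a filter on another column rejects most rows, only the surviving rows of this column are ever read and decoded; if every row is rejected, the loader never runs at all.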

Due to these and other performance features, Velox has been reported to deliver 3-4x better efficiency than systems such as vanilla Presto or Spark [8].

Integrations

  • Presto, through the Prestissimo (or Presto Native) effort.
  • Apache Spark, through Apache Gluten [9].
  • Voltron Data Theseus.

References

  1. Pedreira, Pedro; Erling, Orri; Basmanova, Masha; Wilfong, Kevin; Sakka, Laith; Pai, Krishna; He, Wei; Chattopadhyay, Biswapesh (2022). "Velox: Meta's Unified Execution Engine" (PDF). Proceedings of the VLDB Endowment. 48th International Conference on Very Large Data Bases. Sydney, Australia: VLDB Endowment. pp. 3372–3384. doi:10.14778/3554821.3554829.
  2. Pedreira, Pedro; Erling, Orri; Karanasos, Konstantinos; Schneider, Scott; McKinney, Wes; Valluri, Satya; Zait, Mohamed; Nadeau, Jacques (2023). "The Composable Data Management System Manifesto" (PDF). Proceedings of the VLDB Endowment. 49th International Conference on Very Large Data Bases. Vancouver, Canada: VLDB Endowment. ISSN 2150-8097. doi:10.14778/3603581.3603604.
  3. "Introducing Velox: An open source unified execution engine". Engineering Blog at Meta. 2022. Retrieved 2024-11-11.
  4. Morgan, Timothy (2022). "Meta's Velox Means Database Performance Is Not Subject To Interpretation". The Next Platform. Retrieved 2024-11-11.
  5. Shankaran, Akash; Gu, George; Chen, Weiting; Yang, Binwei; Kulkarni, Chidamber; Rambacher, Mark; Tatbul, Nesime; Cohen, David (2023). "The Gluten Open-Source Software Project: Modernizing Java-based Query Engines for the Lakehouse Era" (PDF). VLDB International Workshop on Composable Data Management Systems (CDMS'23). Vancouver, Canada. ISSN 2150-8097.
  6. Winkowski, Beth (2022). "Ahana Joins Leading Open Source Innovators in its Commitment to the Velox Open Source Project Created by Meta". InfoWorld. Retrieved 2024-11-11.
  7. Murali, Vikram; Mih, Steven (2023-04-12). "IBM joins the Presto Foundation through acquisition of Ahana". PrestoDB Foundation. Retrieved 2024-11-11.
  8. Woodie, Alex (2022). "New C++ Acceleration Library Velox Juices Code Execution Up To 8x". Big Data Wire. Retrieved 2024-11-11.
  9. "The Apache Gluten Project". The Apache Software Foundation. 2023. Retrieved 2024-11-11.