Alluxio
{{Infobox software
| name = Alluxio
| author = Haoyuan Li
| developer = UC Berkeley AMPLab
| released = [https://www.alluxio.io/download {{Start date and age|2013|04|08}}]
| latest release version = v2.9.4
| latest release date = {{Start date and age|2024|06|11}}{{cite web
|url = https://github.com/Alluxio/alluxio/releases
|title = Releases · Alluxio/alluxio
|website = github.com
|access-date = 2025-02-09
}}
| operating system = macOS, Linux
| programming language = Java
| license = Apache License 2.0
| website = {{URL|https://www.alluxio.io}}
| repo = https://github.com/Alluxio/alluxio
}}
Alluxio is an open-source virtual distributed file system (VDFS). It was created at the University of California, Berkeley's AMPLab, initially as the research project "Tachyon", as the subject of Haoyuan Li's Ph.D. thesis,
{{cite tech report
| last1 = Li | first1 = Haoyuan
| title = Alluxio: A Virtual Distributed File System
| institution = EECS Department, University of California, Berkeley
| date = 7 May 2018
| number = UCB/EECS-2018-29
| url = https://www2.eecs.berkeley.edu/Pubs/TechRpts/2018/EECS-2018-29.html
}} advised by Professors Scott Shenker and Ion Stoica. Alluxio sits between computation and storage in the big data analytics stack. It provides a data abstraction layer for computation frameworks, enabling applications to connect to numerous storage systems through a common interface. The software is published under the Apache License 2.0.
Data-driven applications, such as data analytics, machine learning, and AI workloads, use APIs provided by Alluxio (such as the Hadoop HDFS API, the S3 API, and the FUSE API) to access data in various storage systems at high speed. Popular frameworks running on top of Alluxio include Apache Spark,{{cite web |title=Running Spark on Alluxio - Alluxio v2.9.5 (stable) |url=https://docs.alluxio.io/os/user/stable/en/compute/Spark.html |website=Alluxio |access-date=14 February 2025}} Presto, TensorFlow, Trino,{{cite web |title=Alluxio file system support — Trino 470 Documentation |url=https://trino.io/docs/current/object-storage/file-system-alluxio.html |website=trino.io |access-date=14 February 2025}} Apache Hive, and PyTorch.{{cn|date=February 2025}}
Alluxio can be deployed on-premises, in the cloud (e.g. Microsoft Azure, AWS, Google Compute Engine), or in a hybrid cloud environment. It can run on bare metal or in containerized environments such as Kubernetes, Docker, and Apache Mesos.
History
Alluxio was started by Haoyuan Li at UC Berkeley's AMPLab in 2013 and open-sourced in 2014. Alluxio had more than 1,000 contributors in 2018,[https://www.openhub.net/p/tachyon Open HUB Alluxio development activity] making it one of the most active projects in the data ecosystem.
{| class="wikitable"
|-
! Version !! Original release date !! Latest version !! Release date
|-
| {{Version|o|0.2}}
| 2013-04-08 || 0.2.1 || 2013-04-25
|-
| {{Version|o|0.3}}
| 2013-10-21 || 0.3.0 || 2013-10-21
|-
| {{Version|o|0.4}}
| 2014-02-02 || 0.4.1 || 2014-02-25
|-
| {{Version|o|0.5}}
| 2014-07-20 || 0.5.0 || 2014-07-20
|-
| {{Version|o|0.6}}
| 2015-03-01 || 0.6.4 || 2015-04-23
|-
| {{Version|o|0.7}}
| 2015-07-17 || 0.7.1 || 2015-08-10
|-
| {{Version|o|0.8}}
| 2015-10-21 || 0.8.2 || 2015-11-10
|-
| {{Version|o|1.0}}
| 2016-02-23 || 1.0.1 || 2016-03-27
|-
| {{Version|o|1.1}}
| 2016-06-06 || 1.1.1 || 2016-07-04
|-
| {{Version|o|1.2}}
| 2016-07-17 || 1.2.0 || 2016-07-17
|-
| {{Version|o|1.3}}
| 2016-10-05 || 1.3.0 || 2016-10-05
|-
| {{Version|o|1.4}}
| 2017-01-12 || 1.4.0 || 2017-01-12
|-
| {{Version|o|1.5}}
| 2017-06-11 || 1.5.0 || 2017-06-11
|-
| {{Version|o|1.6}}
| 2017-09-24 || 1.6.1 || 2017-11-02
|-
| {{Version|o|1.7}}
| 2018-01-14 || 1.7.1 || 2018-03-26
|-
| {{Version|co|1.8}}
| 2018-07-07 || 1.8.2 || 2019-08-05
|-
| {{Version|co|2.0}}
| 2019-06-27 || 2.0.1 || 2019-09-03
|-
| {{Version|co|2.1}}
| 2019-11-06 || 2.1.2 || 2020-02-04
|-
| {{Version|co|2.2}}
| 2020-03-11 || 2.2.2 || 2020-06-24
|-
| {{Version|co|2.3}}
| 2020-06-30 || 2.3.0 || 2020-06-30
|-
| {{Version|co|2.4}}
| 2020-10-19 || 2.4.1 || 2020-11-20
|-
| {{Version|co|2.5}}
| 2021-03-10 || 2.5.0 || 2021-03-10
|-
| {{Version|co|2.6}}
| 2021-06-23 || 2.6.2 || 2021-09-17
|-
| {{Version|co|2.7}}
| 2021-11-16 || 2.7.4 || 2022-04-19
|-
| {{Version|co|2.8}}
| 2022-05-04 || 2.8.1 || 2022-08-17
|-
| {{Version|c|2.9}}
| 2022-11-16 || 2.9.3 || 2023-03-27
|-
| colspan="4" | {{Version |l |show=111110}}
|}
Enterprises that use Alluxio
The following is a list of notable enterprises that have used or are using Alluxio:
{{columns-list|colwidth=15em|
- Baidu{{cite web|url=https://readwrite.com/2016/02/22/new-fast-sql-project/|title=This New Open Source Project Is 100X Faster than Spark SQL In Petabyte-Scale Production}}
- Barclays{{cite web|url=https://dzone.com/articles/Accelerate-In-Memory-Processing-with-Spark-from-Hours-to-Seconds-With-Tachyon|title=Making the Impossible Possible with Tachyon: Accelerate Spark Jobs from Hours to Seconds}}
- China Unicom{{cite web|url=https://www.techrepublic.com/article/china-unicoms-big-bet-on-open-source/|title=China Unicom's big bet on open source}}
- Comcast{{cite web |url=https://databricks.com/session/operationalizing-machine-learning-managing-provenance-from-raw-data-to-predictions|title=Operationalizing Machine Learning—Managing Provenance from Raw Data to Predictions}}
- Cray{{cite web|url=https://www.cray.com/blog/cray-analytics-alluxio-wrangling-enterprise-storage/|title=Cray Analytics and Alluxio – Wrangling Enterprise Storage|access-date=2019-02-19|archive-date=2019-07-14|archive-url=https://web.archive.org/web/20190714162915/https://www.cray.com/blog/cray-analytics-alluxio-wrangling-enterprise-storage/|url-status=dead}}
- DiDi Chuxing{{cite web |url=https://www.slideshare.net/Alluxio/alluxios-use-and-practice-in-didi|title=Alluxio's Use and Practice in Didi}}
- DBS Bank{{cite web |url=https://app.qwoted.com/opportunities/event-data-transformation-in-financial-services-featuring-dbs-bank-story|title=Data Transformation in Financial Services}}
- Esri{{cite web |url=https://www.alluxio.com/blog/arcgis-and-alluxio-using-alluxio-to-enhance-arcgis-data-capability-and-get-faster-insights-from-all-your-data|title=ArcGIS and Alluxio - Using Alluxio to enhance ArcGIS data capability and get faster insights from all your data}}
- Huawei{{cite web |url=https://www.theregister.co.uk/2016/09/01/huawei_hugs_alluxio_for_memory_analytics_data_caching/?mt=1472739398540|title=Huawei hugs open-sourcey Alluxio: Thanks for the memories|website=The Register }}
- IBM{{cite web|url=https://developer.ibm.com/code/2017/04/05/how-alluxio-is-accelerating-apache-spark-workloads/|title=How Alluxio is Accelerating Apache Spark Workloads|access-date=2019-02-19|archive-url=https://web.archive.org/web/20190714162905/https://developer.ibm.com/code/2017/04/05/how-alluxio-is-accelerating-apache-spark-workloads/|archive-date=2019-07-14|url-status=dead}}
- Intel{{cite web |url=https://software.intel.com/en-us/blogs/2016/02/04/getting-started-with-tachyon-by-use-cases|title=Getting Started with Tachyon by Use Cases}}
- JD.com{{cite web |url=https://conferences.oreilly.com/strata/strata-ny-2018/public/schedule/detail/69052|title=Using Alluxio as a fault-tolerant pluggable optimization component of JD.com's compute frameworks}}
- Lenovo{{cite web |url=https://globenewswire.com/news-release/2017/11/21/1197958/0/en/World-s-Largest-Computer-Maker-Lenovo-Selects-Alluxio-for-Data-Management-of-Worldwide-Smartphone-Data.html|title=World's Largest Computer Maker Lenovo Selects Alluxio for Data Management of Worldwide Smartphone Data}}
- Samsung{{cite web |url=https://www.samsung.com/semiconductor/insights/tech-leadership/enhancing-the-value-of-alluxio-with-samsung-nvme-ssds/|title=Enhancing the Value of Alluxio with Samsung NVMe SSDs}}
- Tencent{{cite web |url=https://www.alluxio.com/blog/tencent-case-study-delivering-customized-news-to-over-100-million-users-per-month-with-alluxio|title=Tencent Delivering Customized News to Over 100 Million Users per Month with Alluxio}}
- Vipshop{{cite web |url=https://www.slideshare.net/Alluxio/the-practice-of-alluxio-in-near-realtime-data-platform-at-vipshop-chinese|title=The Practice of Alluxio in Near Real-Time Data Platform at VIPShop}}
- Wells Fargo{{cite web |url=https://2018gputechconf.smarteventscloud.com/connect/sessionDetail.ww?SESSION_ID=152068|title=Bringing Data to Life - Data Management and Visualization Techniques}}
}}
References
{{Reflist|30em}}