Apache Airflow
{{Short description|Open-source workflow management platform}}
{{Infobox software
| name = Apache Airflow
| logo = AirflowLogo.png
| caption =
| logo alt = Apache Airflow logo
| author = Maxime Beauchemin / Airbnb
| developer = Apache Software Foundation
| released = [https://github.com/apache/airflow/releases/tag/1.0.0 {{Start date and age|2015|06|03}}]
| latest release version =
| latest release date =
| latest preview version =
| latest preview date =
| repo =
| programming language = Python
| operating system = macOS, Linux
| size =
| language =
| genre = Workflow management platform
| license = Apache License 2.0
| website = {{URL|https://airflow.apache.org}}
}}
Apache Airflow is an open-source workflow management platform for data engineering pipelines. It started at Airbnb in October 2014{{Cite web|url=https://airflow.apache.org/project.html|title=Apache Airflow|website=Apache Airflow|url-status=live|archive-url=https://web.archive.org/web/20190812084339/https://airflow.apache.org/project.html|archive-date=August 12, 2019|access-date=September 30, 2019}} as a way to manage the company's increasingly complex workflows. Airflow allowed Airbnb to programmatically author and schedule its workflows and to monitor them through the built-in Airflow user interface.{{Cite web|url=https://medium.com/airbnb-engineering/airflow-a-workflow-management-platform-46318b977fd8|title=Airflow: a workflow management platform|last=Beauchemin|first=Maxime|date=June 2, 2015|website=Medium|url-status=live|archive-url=https://web.archive.org/web/20190813011749/https://medium.com/airbnb-engineering/airflow-a-workflow-management-platform-46318b977fd8|archive-date=August 13, 2019|access-date=September 30, 2019}}{{Cite web|url=https://airbnb.io/projects/airflow/|title=Airflow|url-status=live|archive-url=https://web.archive.org/web/20190706014909/https://airbnb.io/projects/airflow/|archive-date=July 6, 2019|access-date=September 30, 2019}} The project was open source from the beginning. It became an Apache Incubator project in March 2016 and a top-level Apache Software Foundation project in January 2019.
Airflow is written in Python, and workflows are defined as Python scripts. Airflow is designed around the principle of "configuration as code". Other "configuration as code" workflow platforms use markup languages such as XML, whereas Python lets developers import libraries and classes to help them build their workflows.
Overview
Airflow uses directed acyclic graphs (DAGs) to orchestrate workflows. Tasks and their dependencies are defined in Python, and Airflow then handles scheduling and execution. A DAG can run either on a defined schedule (e.g. hourly or daily) or in response to an external event (e.g. a file appearing in Hive{{Cite web|url=http://bytepawn.com/airflow.html|title=Airflow review|last=Trencseni|first=Marton|date=January 16, 2016|website=BytePawn|url-status=live|archive-url=https://web.archive.org/web/20190228235838/http://bytepawn.com/airflow.html|archive-date=February 28, 2019|access-date=October 1, 2019}}). Earlier DAG-based schedulers such as Oozie and Azkaban tended to rely on multiple configuration files and file system trees to define a DAG, whereas in Airflow a DAG can often be written in a single Python file.{{Cite web|url=https://cwiki.apache.org/confluence/display/incubator/AirflowProposal|title=AirflowProposal|date=March 28, 2019|website=Apache Software Foundation|access-date=October 1, 2019}}
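The idea of tasks with declared dependencies, executed in an order that respects those dependencies, can be illustrated in plain Python. The sketch below is not Airflow's API or scheduler: it assumes the standard-library graphlib module (Python 3.9 or later), and the table names and task IDs are hypothetical. It also shows why "configuration as code" is convenient, since the dependency graph is generated with a loop rather than written out by hand.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical table names; each one gets its own extract -> load pair of tasks.
TABLES = ["users", "orders", "payments"]


def build_dependencies(tables):
    """Map each task ID to the set of task IDs that must finish before it runs."""
    deps = {}
    for table in tables:
        deps[f"extract_{table}"] = set()
        deps[f"load_{table}"] = {f"extract_{table}"}
    # A final (hypothetical) task that depends on every load task.
    deps["build_report"] = {f"load_{table}" for table in tables}
    return deps


def schedule_in_waves(dependencies):
    """Group tasks into waves whose upstream tasks have all completed.

    Tasks within one wave do not depend on each other, so an executor could
    run them concurrently. Raises graphlib.CycleError if the graph has a cycle.
    """
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()  # validates the graph; a cycle raises CycleError here
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        # A real scheduler would execute these tasks here; this sketch only marks them done.
        sorter.done(*ready)
    return waves


if __name__ == "__main__":
    for number, wave in enumerate(schedule_in_waves(build_dependencies(TABLES)), 1):
        print(f"wave {number}: {wave}")

    # A dependency cycle is rejected, which is why the graph must be acyclic.
    try:
        schedule_in_waves({"a": {"b"}, "b": {"a"}})
    except CycleError:
        print("cycle detected")
```

Run as a script, this prints three waves (the three extract tasks, then the three load tasks, then build_report) followed by "cycle detected". Grouping tasks into waves, rather than producing a single linear order, mirrors how a scheduler can run independent tasks in parallel.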
Managed providers
Three notable providers offer ancillary services around the core open-source project.
- [https://astronomer.io/ Astronomer] has built a SaaS tool and a Kubernetes-deployable Airflow stack that assists with monitoring, alerting, DevOps, and cluster management.{{Cite web|url=https://www.americaninno.com/cincy/cincy-startups/astronomer-is-now-the-apache-airflow-company/|title=Astronomer is Now the Apache Airflow Company|last=Lipp|first=Cassie|date=July 13, 2018|website=americaninno|access-date=September 18, 2019}}
- Cloud Composer is a managed version of Airflow that runs on Google Cloud Platform (GCP) and integrates with other GCP services.{{Cite web|url=https://techcrunch.com/2018/05/01/google-launches-cloud-composer-a-new-workflow-automation-tool-for-developers/|title=Google launches Cloud Composer, a new workflow automation tool for developers|website=TechCrunch|date=May 2018 |language=en-US|access-date=2019-09-18}}
- Amazon Web Services has offered Managed Workflows for Apache Airflow (MWAA) since November 2020.{{Cite web|date=2020-11-24|title=Introducing Amazon Managed Workflows for Apache Airflow (MWAA)|url=https://aws.amazon.com/blogs/aws/introducing-amazon-managed-workflows-for-apache-airflow-mwaa/|access-date=2020-12-17|website=Amazon Web Services|language=en-US}}
References
{{Reflist}}
External links
- {{official}}
{{Apache Software Foundation}}
{{DEFAULTSORT:Airflow}}