Airflow is not a standard Python project. Most Python projects fall into one of two types: application or library. As described in this StackOverflow question, the decision whether to pin (freeze) dependency versions for a Python project depends on the type. For applications, dependencies should be pinned, but for libraries, they should be open.
For applications, pinning the dependencies makes installation more stable in the future, because new releases of dependencies (even transitive ones) might otherwise cause installation to fail. For libraries, the dependencies should be open to allow several different libraries with the same requirements to be installed at the same time.
The problem is that Apache Airflow is a bit of both: an application to install and a library to be used when you are developing your own operators and DAGs.
This - seemingly unsolvable - puzzle is solved by having pinned constraints files.
Note
Only pip installation is officially supported.
While it is possible to install Airflow with tools like poetry or pip-tools, they do not share the same workflow as pip - especially when it comes to constraint vs. requirements management. Installing via Poetry or pip-tools is not currently supported.
There are known issues with bazel that might lead to circular dependencies when using it to install Airflow. Please switch to pip if you encounter such problems. The Bazel community added support for cycles in this PR so it might be that newer versions of bazel will handle it.
If you wish to install Airflow using these tools, you should use the constraint files and convert them to the format and workflow that your tool requires.
By default, when you install the apache-airflow package, the dependencies are as open as possible while still allowing the apache-airflow package to install. This means that the apache-airflow package might fail to install when a direct or transitive dependency is released that breaks the installation. In that case, when installing apache-airflow, you might need to provide additional constraints (for example pip install "apache-airflow==1.10.2" "Werkzeug<1.0.0").
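The extra requirement in the example above contains a < character, which a shell would treat as a redirection unless it is quoted. The following is a minimal sketch, using only the Python standard library, of building such a command line with the quoting applied automatically. The helper name pip_install_command is hypothetical and not part of Airflow or pip.

```python
import shlex


def pip_install_command(*requirements: str) -> str:
    """Return a shell-safe `pip install` command line.

    Hypothetical helper for illustration; it only builds the string
    and does not run pip.
    """
    # shlex.join quotes every argument that contains shell-special
    # characters such as "<", and leaves safe arguments untouched.
    return shlex.join(["pip", "install", *requirements])


# Requirement strings taken from the example in the text above.
print(pip_install_command("apache-airflow==1.10.2", "Werkzeug<1.0.0"))
```

Pasting the printed command into a shell passes Werkzeug<1.0.0 to pip as a single argument instead of redirecting input from a file named 1.0.0.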
There are several sets of constraints we keep:
- "constraints" - these are constraints generated by matching the current Airflow version from sources and providers that are installed from PyPI. Those are the constraints used by users who want to install Airflow with pip. They are named constraints-<PYTHON_MAJOR_MINOR_VERSION>.txt.
- "constraints-source-providers" - these are constraints generated by using providers installed from current sources. While adding new providers their dependencies might change, so this set is the current set of constraints for Airflow and providers from the current main sources. Those constraints are used by the CI system to keep a "stable" set of constraints. They are named constraints-source-providers-<PYTHON_MAJOR_MINOR_VERSION>.txt.
- "constraints-no-providers" - these are constraints generated from only Apache Airflow, without any providers. If you want to manage Airflow separately and then add providers individually, you can use them. Those constraints are named constraints-no-providers-<PYTHON_MAJOR_MINOR_VERSION>.txt.
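To inspect what a constraints file pins, it can be read into a simple mapping. The sketch below assumes the file consists of one name==version pin per line plus optional # comments, which is the usual shape of a pip constraints file but is not guaranteed for every line. The function name parse_constraints and the sample content are hypothetical and for illustration only.

```python
def parse_constraints(text: str) -> dict[str, str]:
    """Parse `name==version` pins into a {name: version} mapping.

    Assumption: every non-comment line is an exact pin. Lines of any
    other shape raise ValueError instead of being silently skipped.
    """
    pins: dict[str, str] = {}
    for raw_line in text.splitlines():
        # Drop comments and surrounding whitespace.
        line = raw_line.split("#", 1)[0].strip()
        if not line:
            continue
        name, separator, version = line.partition("==")
        if not separator:
            raise ValueError(f"not a pinned requirement: {raw_line!r}")
        pins[name.strip().lower()] = version.strip()
    return pins


# Illustrative content only, not copied from a real constraints file.
SAMPLE = """\
# example constraints
Example-Package==1.2.3
another-package==0.9.0  # trailing comment
"""

pins = parse_constraints(SAMPLE)
print(pins)
```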
The first two can be used as constraints files when installing Apache Airflow in a repeatable way. This can be done from the sources or, as shown here, from the PyPI package:
pip install "apache-airflow[google,amazon,async]==2.2.5" \
--constraint "https://raw.githubusercontent.com/apache/airflow/constraints-2.2.5/constraints-3.8.txt"
The last one can be used to install Airflow in "minimal" mode, i.e. when bare Airflow is installed without extras.
When you install airflow from sources (in editable mode) you should use "constraints-source-providers" instead (this accounts for the case when some providers have not yet been released and have conflicting requirements).
pip install -e ".[devel]" \
--constraint "https://raw.githubusercontent.com/apache/airflow/constraints-main/constraints-source-providers-3.8.txt"
This also works with extras - for example:
pip install ".[ssh]" \
--constraint "https://raw.githubusercontent.com/apache/airflow/constraints-main/constraints-source-providers-3.8.txt"
There are different sets of fixed constraint files for different Python major/minor versions, and you should use the right file for the right Python version.
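Picking the file that matches the interpreter can be automated. The following is a minimal sketch that rebuilds the URL pattern used in the commands above, defaulting to the major/minor version of the running Python. The function name constraints_url and its parameters are hypothetical; only the URL layout and the three file-name prefixes are taken from this document.

```python
import sys

# URL prefix as it appears in the pip commands in this document.
BASE_URL = "https://raw.githubusercontent.com/apache/airflow"

# The three sets of constraints described in this document.
KINDS = (
    "constraints",
    "constraints-source-providers",
    "constraints-no-providers",
)


def constraints_url(airflow_ref, kind="constraints", python_version=None):
    """Build a constraints file URL (hypothetical helper).

    airflow_ref is a released version such as "2.2.5", or "main".
    python_version defaults to the running interpreter's major.minor.
    """
    if kind not in KINDS:
        raise ValueError(f"unknown constraints kind: {kind!r}")
    if python_version is None:
        python_version = f"{sys.version_info.major}.{sys.version_info.minor}"
    return f"{BASE_URL}/constraints-{airflow_ref}/{kind}-{python_version}.txt"


print(constraints_url("2.2.5", python_version="3.8"))
print(constraints_url("main", "constraints-source-providers", "3.8"))
```

The helper only formats a string; whether a file exists for a given combination of Airflow version and Python version still has to be checked against the repository.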
If you want to update just the Airflow dependencies, without paying attention to providers, you can do it using the constraints-no-providers constraint files as well.
pip install . --upgrade \
--constraint "https://raw.githubusercontent.com/apache/airflow/constraints-main/constraints-no-providers-3.8.txt"
The constraints-<PYTHON_MAJOR_MINOR_VERSION>.txt and constraints-no-providers-<PYTHON_MAJOR_MINOR_VERSION>.txt files are automatically regenerated by a CI job every time pyproject.toml is updated and pushed, provided the tests are successful.
There are a number of extras that can be specified when installing Airflow. Those extras can be specified after the usual pip install - for example pip install -e ".[ssh]" for an editable installation. Note that there are two kinds of extras: regular extras, used when you install Airflow as a user, and extras available only in editable mode, namely devel extras that are necessary if you want to run Airflow locally for testing and doc extras that install tools needed to build the documentation.
This is the full list of these extras:
The devel extras are not available in the released packages. They are only available when you install Airflow from sources in an editable installation, i.e. the one you usually use to contribute to Airflow. They provide tools such as pytest and mypy for general-purpose development and testing. Some providers also have their own development-related extras that install the provider-specific tools necessary to run their tests.
devel, devel-all, devel-all-dbs, devel-ci, devel-debuggers, devel-devscripts, devel-duckdb, devel-hadoop, devel-mypy, devel-sentry, devel-static-checks, devel-tests
The doc extras are not available in the released packages. They are only available when you install Airflow from sources in an editable installation, i.e. the one you usually use to contribute to Airflow. They provide tools needed when you want to build Airflow documentation (note that you also need devel extras installed for Airflow and providers in order to build documentation for Airflow and provider packages respectively). The doc extra is enough to build regular documentation, whereas doc-gen is needed to generate the ER diagram describing our database.
doc, doc-gen
Those extras are available as regular Airflow extras and are intended for Airflow users and contributors to select the features of Airflow they want to use. They might install additional providers or just install dependencies that are necessary to enable the feature.
aiobotocore, airbyte, alibaba, all, all-core, all-dbs, amazon, apache-atlas, apache-beam, apache-cassandra, apache-drill, apache-druid, apache-flink, apache-hdfs, apache-hive, apache-impala, apache-kafka, apache-kylin, apache-livy, apache-pig, apache-pinot, apache-spark, apache-webhdfs, apprise, arangodb, asana, async, atlas, atlassian-jira, aws, azure, cassandra, celery, cgroups, cloudant, cncf-kubernetes, cohere, common-io, common-sql, crypto, databricks, datadog, dbt-cloud, deprecated-api, dingding, discord, docker, druid, elasticsearch, exasol, fab, facebook, ftp, gcp, gcp_api, github, github-enterprise, google, google-auth, graphviz, grpc, hashicorp, hdfs, hive, http, imap, influxdb, jdbc, jenkins, kerberos, kubernetes, ldap, leveldb, microsoft-azure, microsoft-mssql, microsoft-psrp, microsoft-winrm, mongo, mssql, mysql, neo4j, odbc, openai, openfaas, openlineage, opensearch, opsgenie, oracle, otel, pagerduty, pandas, papermill, password, pgvector, pinecone, pinot, postgres, presto, pydantic, qdrant, rabbitmq, redis, s3, s3fs, salesforce, samba, saml, segment, sendgrid, sentry, sftp, singularity, slack, smtp, snowflake, spark, sqlite, ssh, statsd, tableau, tabular, telegram, teradata, trino, uv, vertica, virtualenv, weaviate, webhdfs, winrm, yandex, zendesk
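Extras are passed to pip as a comma-separated list in square brackets after the package name (or after "." for a source checkout). As a minimal sketch, the requirement string can be assembled as follows. The function name requirement_with_extras is hypothetical, and it does not check that the extras it is given actually exist.

```python
def requirement_with_extras(extras, version=None, package="apache-airflow"):
    """Build a pip requirement string with extras (hypothetical helper).

    extras:  iterable of extra names, e.g. ["google", "amazon"]
    version: optional exact version to pin with "=="
    package: package name, or "." for the current source directory
    """
    spec = package
    extras = list(extras)
    if extras:
        spec += "[" + ",".join(extras) + "]"
    if version:
        spec += f"=={version}"
    return spec


# Combinations that appear in the commands earlier in this document.
print(requirement_with_extras(["google", "amazon", "async"], "2.2.5"))
print(requirement_with_extras(["ssh"], package="."))
```

When the resulting string is used on a command line, it should be quoted, because square brackets are special characters in some shells.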
You can now check how to update Airflow's metadata database if you need to update the structure of the DB.