Dagster is a system for building modern data applications.
- **Elegant programming model:** Dagster provides a set of abstractions for building self-describing, testable, and reliable data applications. It embraces the principles of functional data programming; gradual, optional typing; and testability as a first-class value (see the sketch after this list).
- **Flexible & incremental:** Dagster integrates with your existing tools and can invoke any computation, whether it is Spark, Python, a Jupyter notebook, or SQL. It is also designed to work with the systems you already run, such as Kubernetes.
- **Beautiful tools:** Dagster's development environment, dagit, is designed to facilitate local development for data engineers, machine learning engineers, and data scientists. It can also run as a production service, supporting the operation, debugging, and maintenance of large-scale production data pipelines.
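To make the typing and testability claims concrete, here is a minimal sketch. It assumes the `execute_solid` testing helper exported by the `dagster` package installed below; the solid name `add_one` is illustrative.

```python
from dagster import execute_solid, solid


@solid
def add_one(_, num: int) -> int:
    # Type annotations are optional; when present, inputs and outputs
    # are checked at execution time.
    return num + 1


# Solids are plain, testable units: execute one in isolation and
# assert on its output, with no pipeline or scheduler required.
result = execute_solid(add_one, input_values={'num': 2})
assert result.success
assert result.output_value() == 3
```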
```
pip install dagster dagit
```
This installs two modules:
- Dagster: the core programming model and abstraction stack; stateless, single-node, single-process and multi-process execution engines; and a CLI tool for driving those engines.
- Dagit: the UI for developing and operating Dagster pipelines, including a DAG browser, a type-aware config editor, and a live execution interface.
`hello_dagster.py`:

```python
from dagster import execute_pipeline, pipeline, solid


@solid
def get_name(_):
    return 'dagster'


@solid
def hello(context, name: str):
    context.log.info('Hello, {name}!'.format(name=name))


@pipeline
def hello_pipeline():
    hello(get_name())
```
Save the code above in a file named `hello_dagster.py`. You can execute the pipeline using any one of the following methods:
(1) Dagster Python API

```python
if __name__ == "__main__":
    execute_pipeline(hello_pipeline)  # Hello, dagster!
```
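`execute_pipeline` also returns a result object you can inspect programmatically. A minimal sketch, assuming the result APIs in the version installed above:

```python
result = execute_pipeline(hello_pipeline)

# The result reports overall success and exposes per-solid results.
assert result.success
assert result.result_for_solid('get_name').output_value() == 'dagster'
```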
(2) Dagster CLI

```bash
$ dagster pipeline execute -f hello_dagster.py
```
(3) Dagit web UI

```bash
$ dagit -f hello_dagster.py
```
Next, jump right into our tutorial, or read our complete documentation. If you're actively using Dagster or have questions on getting started, we'd love to hear from you.
For details on contributing or running the project for development, check out our contributing guide.
Dagster works with the tools and systems that you're already using with your data, including:
| Integration | Dagster Library |
| ----------- | --------------- |
| Apache Airflow | **dagster-airflow** <br/>Allows Dagster pipelines to be scheduled and executed, either containerized or uncontainerized, as Apache Airflow DAGs. |
| Apache Spark | **dagster-spark** · **dagster-pyspark** <br/>Libraries for interacting with Apache Spark and PySpark. |
| Dask | **dagster-dask** <br/>Provides a Dagster integration with Dask / Dask.Distributed. |
| Datadog | **dagster-datadog** <br/>Provides a Dagster resource for publishing metrics to Datadog. |
| Jupyter / Papermill | **dagstermill** <br/>Built on the papermill library, dagstermill is meant for integrating productionized Jupyter notebooks into Dagster pipelines. |
| PagerDuty | **dagster-pagerduty** <br/>A library for creating PagerDuty alerts from Dagster workflows. |
| Snowflake | **dagster-snowflake** <br/>A library for interacting with the Snowflake Data Warehouse. |
| **Cloud Providers** | |
| AWS | **dagster-aws** <br/>A library for interacting with Amazon Web Services. Provides integrations with CloudWatch, S3, EMR, and Redshift. |
| Azure | **dagster-azure** <br/>A library for interacting with Microsoft Azure. |
| GCP | **dagster-gcp** <br/>A library for interacting with Google Cloud Platform. Provides integrations with GCS, BigQuery, and Cloud Dataproc. |
This list is growing as we are actively building more integrations, and we welcome contributions!
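Many of these libraries expose their integration as a Dagster resource that solids request by name. The following is a minimal sketch of that pattern using only core APIs; the `console_metrics` resource is a hypothetical stand-in for something like the resource `dagster-datadog` provides, not its actual API:

```python
from dagster import ModeDefinition, execute_pipeline, pipeline, resource, solid


@resource
def console_metrics(_):
    # Hypothetical stand-in for an integration-provided client (e.g. Datadog).
    class ConsoleMetrics:
        def gauge(self, name, value):
            print('{} = {}'.format(name, value))

    return ConsoleMetrics()


@solid(required_resource_keys={'metrics'})
def report(context):
    # Solids declare the resources they need and access them on the context.
    context.resources.metrics.gauge('rows_processed', 42)


@pipeline(mode_defs=[ModeDefinition(resource_defs={'metrics': console_metrics})])
def metrics_pipeline():
    report()


if __name__ == '__main__':
    execute_pipeline(metrics_pipeline)
```

Swapping the mode's `resource_defs` entry is how you would substitute a real integration resource, or a mock in tests.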