Replies: 4 comments 8 replies
-
Another twist to think about as a future direction would be when a partition represents an aggregation of other assets: "yearly partition A of asset X depends on monthly partitions 1-12 of asset Y".
-
@sryza, could you share a bit more on how the API for this might work?
-
Easily defining a partitioned asset and then depending on the combined partition output somehow would be amazing! The docs in this area need to be better; so far I've had to read the HN example over and over and do a lot of guessing to make something work :(
-
With the upcoming 0.14 release of Dagster, will partitioned (software-defined) assets be part of the API?
-
Edit: the examples in this discussion are now out-of-date. Please refer to the docs on software-defined assets instead: https://docs.dagster.io/concepts/assets/software-defined-assets
Dagster 0.12.12 introduced experimental "software-defined asset" APIs: `@asset` and `build_assets_job`. These APIs sit on top of the new graph/job/op APIs and enable a novel way of constructing Dagster jobs that puts assets at the forefront.

As a reminder, to Dagster, an "asset" is a data product: an object produced by a data pipeline, e.g. a table, ML model, or report.
Conceptually, software-defined assets invert the typical relationship between assets and computation. Instead of defining a graph of ops and recording which assets those ops end up materializing, you define a set of assets, each of which knows how to compute its contents from upstream assets.
Taking a software-defined asset approach has a few main benefits. For one, because each asset declares the upstream assets it depends on, there's no need to use `@graph`/`@pipeline` to wire up dependencies between your ops.

Defining an asset
A software-defined asset combines an asset key, an op that knows how to compute the asset's contents, and a set of upstream assets whose contents are provided as inputs to that op.
Here's an example of a pair of assets defined using the `@asset` decorator. Zooming in on the "events" asset: the decorated function computes the asset's contents, and the function name supplies the asset key.
The asset APIs work most elegantly when you're able to separate IO from compute using IOManagers. The IOManager handles reading and writing the inputs and outputs to persistent storage, while the body of the asset's function handles the logical data transformation.
Building a job from a set of assets
You can build a Dagster job that materializes a set of assets. The generated job can be used anywhere you'd use a regular Dagster job: you can invoke `execute_in_process`, include it inside a Dagster repository, etc.

Viewing assets in Dagit
To turn on the experimental asset UI, click the gear icon in the top right of Dagit, and switch on "Experimental Asset APIs":
Then, when you navigate to a job that was built from a set of assets, you'll see a page that looks like this:
This is different from the standard Job / Pipeline page in a few ways:
Assets and dbt
Software-defined assets support a dbt-native approach to orchestration. A dbt model is essentially a software-defined asset: it has an asset key (the name of the dbt model), an op (the SQL select statement that computes the model), and upstream assets (the `ref`s and `source`s inside the select statement).

You can load all the models in a dbt project into assets:
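To make that mapping concrete without requiring dbt, here is a toy, plain-Python sketch; everything in it is invented for illustration (the real dagster-dbt integration parses the dbt project itself rather than regex-scanning SQL). It derives a model's upstream asset names from the `ref`s and `source`s in its SELECT statement:

```python
import re


def upstream_assets_from_sql(sql):
    """Toy illustration: derive upstream asset names for a dbt model
    from the ref() and source() calls inside its SQL."""
    # Each ref('model') points at another dbt model, i.e. another asset.
    refs = re.findall(r"ref\(\s*'([^']+)'\s*\)", sql)
    # Each source('schema', 'table') points at a non-dbt upstream asset;
    # join the two parts into a single illustrative asset name.
    sources = [
        f"{schema}_{table}"
        for schema, table in re.findall(
            r"source\(\s*'([^']+)'\s*,\s*'([^']+)'\s*\)", sql
        )
    ]
    return refs + sources


model_sql = """
select user_id, count(*) as n_events
from {{ ref('stg_events') }}
join {{ source('raw', 'users') }} using (user_id)
group by user_id
"""
```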
You can then visualize your dbt model graph in Dagit, execute models individually, and track lineage between dbt models and non-dbt assets, or between dbt models in different dbt projects. One of the things this is useful for is determining the consequences of changing or removing a dbt model.
The Dagit screenshot above shows a trio of assets loaded from a dbt project. Dagster automatically loads in column documentation from dbt's schema.yml, as well as the SQL for the model.
Future Directions
What's laid out above is the initial foundation of software-defined assets in Dagster. Here's what we foresee building on top of it in the future:
A command, similar to `terraform plan`, that allows you to view a diff between your current deployed data and your assets defined in code.