Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Move replace_attributes, merge_attributes onto AssetSpec, publicise #25941

Merged
merged 5 commits into from
Nov 25, 2024

Conversation

benpankow
Copy link
Member

@benpankow benpankow commented Nov 14, 2024

Summary

Now that they've baked for a little bit, moves the replace_attributes and merge_attributes utilities directly onto AssetSpec, and marks them public:

spec: AssetSpec = ...

updated_spec = spec.replace_attributes(description="A new description.")
spec: AssetSpec = ...

# add new python kind, does not affect existing kinds
updated_spec = spec.merge_attributes(kinds={"python"})

How I Tested These Changes

Update unit test.

Changelog

Introduced AssetSpec.replace_attributes and AssetSpec.merge_attributes to easily alter properties of an asset spec.

@benpankow
Copy link
Member Author

benpankow commented Nov 14, 2024

This stack of pull requests is managed by Graphite. Learn more about stacking.

Copy link
Contributor

@OwenKephart OwenKephart left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

generally supportive, just want to be careful about old params

skippable: bool = ...,
group_name: Optional[str] = ...,
code_version: Optional[str] = ...,
freshness_policy: Optional[FreshnessPolicy] = ...,
Copy link
Contributor

@OwenKephart OwenKephart Nov 15, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we remove freshness_policy to avoid needing to go through a deprecation cycle ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You just mean freshness_policy, right? or are we removing automation_condition too 👀

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lol yeah

@schrockn
Copy link
Member

Having a method makes sense. My only question before we introduce this is how we handle specs and AssetsDefinitions uniformly. Ideally we would have a one liner for that.

@benpankow
Copy link
Member Author

My only question before we introduce this is how we handle specs and AssetsDefinitions uniformly. Ideally we would have a one liner for that.

I can think of a couple options:

  1. Rename these methods to replace_spec_attributes, merge_spec_attributes and also define them on AssetsDefinition. This leaves us with very nice behavior in comprehensions:
my_specs_and_defs = ...

my_updated_specs_and_defs = [
  asset.replace_spec_attributes(owner="[email protected]")
  for asset in my_specs_and_defs
]
  1. Retain & publicize standalone methods which can operate on both specs and definitions, e.g. dg.replace_spec_attributes, dg.merge_spec_attributes. Under the hood, they just call spec.replace_attributes(...) or def.map_specs(lambda spec: spec.replace_attributes(...))
my_specs_and_defs = ...

my_updated_specs_and_defs = [
  dg.replace_spec_attributes(asset, owner="[email protected]")
  for asset in my_specs_and_defs
]

# Can still call directly on specs
my_spec = ...
my_spec = my_spec.replace_attributes(...)
  1. No change, use ternary in comprehension:
my_specs_and_defs = ...

my_updated_specs_and_defs = [
  asset.replace_attributes(owner="[email protected]")
  for asset in my_specs_and_defs
  if isinstance(asset, AssetSpec)
  else
  asset.map_specs(lambda spec: spec.replace_attributes(owner="[email protected]")
]

@dpeng817
Copy link
Contributor

@benpankow wrt unifying behavior around assetsdefinition and assetspec
I think replace_spec_attributes looks nice in theory but will get complicated quickly. EG, if you're often predicating on metadata tags etc which don't have a direct analog on assetsdefinition, you're still leaking the complexity

@benpankow
Copy link
Member Author

benpankow commented Nov 20, 2024

I think replace_spec_attributes looks nice in theory but will get complicated quickly. EG, if you're often predicating on metadata tags etc which don't have a direct analog on assetsdefinition, you're still leaking the complexity

That's true - I do think the predicate case will be much less common in the mixed spec/definitions case, e.g. if you're conditionally changing properties, it's probably right after pulling in specs from an integration, since at that point you have a relatively homogenous set of specs where you can reason about what metadata you're predicating on.

That being said, maybe we want something like

def map_specs(func: Callable[[AssetSpec], AssetSpec], iter: Iterable[Union[AssetsDefinition, AssetSpec]]) -> Sequence[Union[AssetsDefinition, AssetSpec]]):
  return [obj.map_specs(func) for obj in iter if isinstance(x, AssetsDefinition) else func(obj)]

my_specs_and_defs = ...

my_predicate: Callable[[AssetSpec], bool] = ...

my_updated_specs_and_defs = dg.map_specs(
  lambda spec: spec.replace_attributes(owner="[email protected]") if my_predicate(spec) else spec,
  my_specs_and_defs
)

This way, semantics match map(lambda spec: ..., ...) closely, just with auto-marshalling of specs into and out of definitions. This world would basically be a combination of this PR, as-is (adding customization utilities directly to AssetSpec) plus stacking map_specs.

@dpeng817
Copy link
Contributor

dpeng817 commented Nov 20, 2024

I find myself more convinced by map_specs than AssetsDefinition.replace_attributes - to our discussion in product review, I think replace_attributes on AssetsDefinition seems overly specified. We're wrapping a primitive with map_specs, true, but I don't think it actually removes any of the flexibility of that primitive.

The problem; I think we'll unfortunately end up with one of these for every kind of spec. IE; map_check_specs. I think that's preferable to the alternative though of forcing users to encounter this awkwardness.

@schrockn
Copy link
Member

map_specs sounds great but I think it should be map_asset_specs.

Copy link
Member

@schrockn schrockn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's do a standalone function map_asset_specs 👍🏻

@benpankow
Copy link
Member Author

Introduced in #26109

Comment on lines 193 to 196
customer_metrics_dag_asset = customer_metrics_dag_asset.replace_attributes(
customer_metrics_dag_asset,
deps=[load_customers],
)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The replace_attributes() method is being called with customer_metrics_dag_asset as both the receiver and first argument. Since this method only accepts keyword arguments, the first argument should be removed:

customer_metrics_dag_asset = customer_metrics_dag_asset.replace_attributes(
    deps=[load_customers]
)

Spotted by Graphite Reviewer

Is this helpful? React 👍 or 👎 to let us know.

@benpankow benpankow merged commit 0de7987 into master Nov 25, 2024
1 of 2 checks passed
@benpankow benpankow deleted the benpankow/replace-attributes branch November 25, 2024 17:52
benpankow added a commit that referenced this pull request Nov 25, 2024
## Summary

Based on conversation in #25941, introduces a new `dg.map_asset_specs`
function which can be used to map a user-provided function over a set of
`AssetSpec`s and `AssetDefinitions`.

```python
@multi_asset(specs=[
  AssetSpec(key="foo"),
  AssetSpec(key="bar"),
])
def my_multi_asset():
  ...

my_specs_and_defs = [
  my_multi_asset,
  AssetSpec(key="external")
]

my_mapped_specs_and_defs = map_asset_specs(
  lambda spec: spec.replace_attributes(owners="[email protected]"),
  my_specs_and_defs
)
```


## Test Plan

New unit tests.

## Changelog
> Introduced dg.map_asset_specs to enable modifying `AssetSpec`s and
`AssetsDefinition`s in bulk.
maximearmstrong added a commit that referenced this pull request Nov 25, 2024
#26027)

## Summary & Motivation

Following an internal discussion
[here](dagster-io/internal#12670) and
product review
[here](https://www.notion.so/dagster/Product-Reviews-1805443bfd5d4d40b3f6be8c79897295?p=13f18b92e4628020bd6fd3ac45f28af0&pm=s),
get_asset_spec().key is the new recommended way to access the asset key
in the Translator.

Asset keys can now be customized in a custom translator using
`replace_attributes`, like:

```python
class MyCustomTableauTranslator(DagsterTableauTranslator):
    def get_asset_spec(self, data: TableauContentData) -> dg.AssetSpec:
        default_spec = super().get_asset_spec(data)
        return replace_attributes(
            default_spec,
            key=default_spec.key.with_prefix("prefix"),
        )
```

This PR stack will be merged after #25941 lands, so that
`replace_attributes()` can be called as
`AssetSpec().replace_attributes()`

These changes will also be done to the following integrations: 
- Power BI
- Sigma
- Looker
- dlt
- Sling
- sdf

We will need to rework dbt to use this pattern.

## How I Tested These Changes

Updated unit test with BK
maximearmstrong added a commit that referenced this pull request Nov 25, 2024
…ranslator (#26028)

## Summary & Motivation

Mark DagsterTableauTranslator.get_asset_key as deprecated. To be removed
in 1.10.

This PR stack will be merged after #25941 lands, so that
`replace_attributes()` can be called as
`AssetSpec().replace_attributes()`

## Changelog

[dagster-tableau] `DagsterTableauTranslator.get_asset_key` is deprecated
in favor of `DagsterTableauTranslator.get_asset_spec().key`
maximearmstrong added a commit that referenced this pull request Nov 25, 2024
…26029)

## Summary & Motivation

Update the custom translator example in the Tableau docs to reflect
`get_asset_spec().key`.

This PR stack will be merged after #25941 lands, so that
`replace_attributes()` can be called as
`AssetSpec().replace_attributes()`
benpankow added a commit that referenced this pull request Nov 26, 2024
)

## Summary

Follow-on to #25941 which updates some non-type-checked `_replace`
callsites to use these new methods.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants