Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[10/n][dagster-fivetran] Implement fivetran_assets and build_fivetran_assets_definitions #25944

Open
wants to merge 7 commits into
base: master
Choose a base branch
from

Conversation

maximearmstrong
Copy link
Contributor

@maximearmstrong maximearmstrong commented Nov 15, 2024

Summary & Motivation

This PR implements the fivetran_assets decorator and the build_fivetran_assets_definitions factory.

  • fivetran_assets can be used to create all assets for a given Fivetran connector, i.e. one asset per table in the connector.
  • build_fivetran_assets_definitions can be used to create all Fivetran assets defs, one per connector. It uses fivetran_assets.

Both the asset decorator and factory use load_fivetran_asset_specs. This is motivated by the current implementation of dagster-dbt, dagster-dlt and dagster-sling - each leverages an asset decorator that loads the asset specs by itself.

To avoid calling the Fivetran API each time load_fivetran_asset_specs is called, it is cached using functools.lru_cache. load_fivetran_asset_specs uses the state-backed defs, so reloading the code won't make additional calls to the Fivetran API, but calling load_fivetran_asset_specs N times in a script will make N calls to the Fivetran API.

The goals here are:

  • make the Fivetran integration as similar as possible to the other ELT integrations by using the same patterns, eg. asset decorator
  • make the user experience as simple as possible and avoid having users manage the asset specs and number of calls to the Fivetran API.

How I Tested These Changes

Additional unit tests with BK.

Changelog

[dagster-fivetran] The fivetran_assets decorator is added. It can be used with the FivetranWorkspace resource and DagsterFivetranTranslator translator to load Fivetran tables for a given connector as assets in Dagster. The build_fivetran_assets_definitions factory can be used to create assets for all the connectors in your Fivetran workspace.

Copy link
Contributor Author

maximearmstrong commented Nov 15, 2024

@maximearmstrong maximearmstrong changed the title [10/n][dagster-fivetran] Implement fivetran_assets decorator and build_fivetran_assets_definitions factory [10/n][dagster-fivetran] Implement fivetran_assets and build_fivetran_assets_definitions Nov 15, 2024
@maximearmstrong maximearmstrong force-pushed the maxime/rework-fivetran-10 branch 3 times, most recently from 2500b07 to 3f0e9aa Compare November 15, 2024 14:44
Comment on lines 893 to 912
def sync_and_poll(
self, context: Optional[Union[OpExecutionContext, AssetExecutionContext]] = None
):
raise NotImplementedError()

def __eq__(self, other):
return (
isinstance(other, FivetranWorkspace)
and self.account_id == other.account_id
and self.api_key == other.api_key
and self.api_secret == other.api_secret
)

def __hash__(self):
return hash((self.account_id + self.api_key + self.api_secret))


@lru_cache(maxsize=None)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Caching load_fivetran_asset_specs with functools.lru_cache requires FivetranWorkspace to be hashable.

An alternative to caching load_fivetran_asset_specs would be to call it in a FivetranWorkspace cached property.

class FivetranWorkspace(ConfigurableResource):
    ...
    @cache_property
    def asset_specs(
        self, 
        dagster_fivetran_translator: Type[DagsterFivetranTranslator] = DagsterFivetranTranslator
    ) -> Sequence[AssetSpec]:
        return load_fivetran_asset_specs(
            workspace=self, 
            dagster_fivetran_translator=dagster_fivetran_translator
        )

For the context, after this and this discussions, we decided to detach spec loading from the resources, which is why I cached load_fivetran_asset_specs instead of caching a property.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tagging @benpankow @dpeng817 @schrockn @alangenfeld @OwenKephart for feedback on this specific comment and the PR description about caching the asset specs calls.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i bias towards @cached_method on an object instance instead of @lru_cache(maxsize=None) on a top level function - I think its much easier to reason about when things are expected to be cached

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we went with cached_property for the FivetranWorkspace equivalent in Airlift - I think that's likely the right move here as well.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated to a cached method in FivetranWorkspace in 501caf7

Copy link
Contributor

@dpeng817 dpeng817 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I... don't think we should add these. Reasoning:

  • All other cases where we added new decorators were before we converged upon a new API design - and I think likely weren't considered holistically. The @dbt_assets decorator is an exception, but we didn't have AssetSpec yet. If we could go back and do it again, I have some doubts we would have added any of them (although there is some messy transformational stuff that it needs to do, so maybe I'm wrong)
  • We can always add it later. But if we add it now we're stuck supporting it and eventually adding arguments for all the asset spec things (like group_name here). Eventually you'll need to support some sort of splitting API, etc. I just think it's better to point users towards a more genericizable solution.

I'm also not sure I understand why we need both the decorator and build_fivetran_assets_definitions? Is the idea that it just provides progressively easier starting points?

If the actual function implementations were doing more complicated stuff, maybe I would see the additional value but these are pretty simple wrappers, thus I think we should hold off.

@maximearmstrong maximearmstrong force-pushed the maxime/implement-resync-and-poll-method-fivetran-client branch from c5911bf to db028f1 Compare November 25, 2024 23:22
@maximearmstrong maximearmstrong force-pushed the maxime/implement-resync-and-poll-method-fivetran-client branch from db028f1 to a773b5e Compare November 26, 2024 22:50
@maximearmstrong maximearmstrong force-pushed the maxime/implement-resync-and-poll-method-fivetran-client branch from a773b5e to 74db3e2 Compare November 26, 2024 23:21
@maximearmstrong maximearmstrong force-pushed the maxime/implement-resync-and-poll-method-fivetran-client branch from 74db3e2 to c18340e Compare November 27, 2024 13:32
@maximearmstrong maximearmstrong force-pushed the maxime/implement-resync-and-poll-method-fivetran-client branch from c18340e to 3c9a56a Compare November 27, 2024 13:44
@maximearmstrong maximearmstrong force-pushed the maxime/implement-resync-and-poll-method-fivetran-client branch from 3c9a56a to 48a3bb0 Compare November 27, 2024 14:05
@maximearmstrong maximearmstrong force-pushed the maxime/implement-resync-and-poll-method-fivetran-client branch from 48a3bb0 to cd041b0 Compare November 27, 2024 14:25
Base automatically changed from maxime/implement-resync-and-poll-method-fivetran-client to master November 27, 2024 14:43
@maximearmstrong maximearmstrong force-pushed the maxime/rework-fivetran-10 branch 3 times, most recently from 748298f to 538348f Compare November 27, 2024 20:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants