Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[2/n][dagster-fivetran] Update DagsterFivetranTranslator and related classes for rework #25751

Merged
merged 6 commits into from
Nov 8, 2024

Conversation

maximearmstrong
Copy link
Contributor

@maximearmstrong maximearmstrong commented Nov 5, 2024

Summary & Motivation

Builds out a very barebones translator class for the new version of the Fivetran integration.

The implementation for this translator will be inspired by the DagsterFivetranTranslator introduced in #25557, but a new implementation is required to leverage the workspace context and state-backed definitions, which is incompatible with the current translator and way of building assets.

Edit after Ben's comment here:

Move things around under translator.py. This PR leverages the DagsterFivetranTranslator introduced in introduced in #25557. FivetranWorkspaceData will implement the method to_fivetran_connector_table_props_data in a subsequent PR, that will map raw connector and destination data fetched using the Fivetran API into FivetranConnectorTableProps objects, that are compatible with the translator. This process will match what we currently do.

How I Tested These Changes

Tests will be added in subsequent PRs.

Copy link
Contributor Author

maximearmstrong commented Nov 5, 2024

@@ -1 +1,2 @@
from dagster_fivetran.v2.resources import FivetranWorkspace as FivetranWorkspace
from dagster_fivetran.v2.translator import DagsterFivetranTranslator as DagsterFivetranTranslator
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Until the previous functions and resource are deprecated and removed, users would import the reworked integration with the following:

from dagster_fivetran.v2 import DagsterFivetranTranslator, FivetranWorkspace

The new version of the integration won't actually be v2, so perhaps another import path would be better?

This is mainly to avoid name collisions between both previous and new translators. We could also name this translator differently if we really want users to import with from dagster_fivetran import FivetranWorkspace, but Dagster[IntegrationName]Translator is the naming convention for new integrations.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should just call it something different than *.v2. Maybe experimental for a release -> 1.0 the integration, then we switch the old stuff to be under .legacy and make the new stuff exist under the top level package. @PedramNavid and @C00ldudeNoonan would probably have thoughts.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That makes sense - I updated the folder name for experimental in the previous PR in this stack.

Copy link
Contributor

@dpeng817 dpeng817 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

to your queue for some design questions

Comment on lines 77 to 80
def get_connector_asset_key(self, data: FivetranContentData) -> AssetKey:
raise NotImplementedError()

def get_connector_spec(self, data: FivetranContentData) -> AssetSpec:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm assuming this might be inherited from existing patterns, but I do find this kinda awkward. Creates a weird footgun where you can implement get_connector_spec in a way that does not utilize get_connector_asset_key, right?

I wonder if we should just have get_connector_spec and then force people to use get_connector_spec(...).asset_key.

To make this more efficient, we could cache the get_connector_spec result per datum.

Copy link
Contributor Author

@maximearmstrong maximearmstrong Nov 6, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm assuming this might be inherited from existing patterns, but I do find this kinda awkward.

Yes exactly. A similar discussion came up here.

I'm in favor of this change - I agree that this pattern is error prone.

It was kinda easier for a user to override the get_asset_key function, to add a prefix for instance. We could recommend doing something like the following instead, which looks a bit awkward but is less error prone.

class CustomDagsterTranslator(DagsterTranslator):
    def get_asset_spec(self, props) -> AssetSpec:
        asset_spec = super().get_asset_spec(props)
        return asset_spec._replace(asset_key=asset_spec.asset_key.with_prefix("my_prefix"))

@benpankow Any thoughts? I can update the other translators if we all agree on this.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we have better fxns for updating params on asset specs now too: replace_attributes top level fxn. But yea I think this pattern is good. Having an example people can copy paste, even if it looks slightly awkward, is much preferable to introducing a footgun.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have we made the other integrations where this fxn exists public yet @maximearmstrong ? / would like to understand the status / experimentality there.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm on board with this change - I think our initial push for get_asset_key was to avoid doing extra work to construct an entire asset spec when we would discard most of it & just retrieve the key, but Chris made a good point that any asset whose key we're generating we'll also be generating a spec for, either before or after. As long as we can cache the spec generation, we don't end up doing extra work in that case.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's fair - we can remove the get_asset_key method in favor of the replace_attributes fnx for customization.

Have we made the other integrations where this fxn exists public yet

The get_asset_spec method is used in the translator of each BI integrations, and these translators are not marked as experimental. I think that method is pretty stable at this point.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The get_asset_key method is also not marked as experimental for the BI integrations, so that implies a deprecation cycle, but the risk seems pretty low here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yea if we're going to do it, we should probably do so soon. Want to post a public discussion about it to make sure we get buy in from @schrockn @benpankow ; do we have anyone customizing BI translators to our knowledge yet?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To my knowledge, no one is customizing the Tableau and Power BI translators yet. @benpankow Do you know if anyone is customizing the Sigma and Looker ones?

I can post the public discussion with code snippets to get consensus on this.

For this PR, I will remove the get_asset_key method - we can easily add it depending on the outcome of the discussion.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yep that sounds good to me!

return self._context

def get_asset_key(self, data: FivetranContentData) -> AssetKey:
if data.content_type == FivetranContentType.CONNECTOR:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shouldn't we also have get_destination_asset_key?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We will leverage the previous translator instead. See updated PR description.

check.assert_never(data.content_type)

def get_asset_spec(self, data: FivetranContentData) -> AssetSpec:
if data.content_type == FivetranContentType.CONNECTOR:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shouldn't we also have get_destination_spec?

else:
check.assert_never(data.content_type)

def get_asset_spec(self, data: FivetranContentData) -> AssetSpec:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure this matches the way that we model Fivetran assets, at least in the current form - each connector can sync one or many tables, each of which we represent as an asset. Destinations may not make sense to model as an asset at all, since they just represent a storage destination but not a specific e.g. table

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Re-reading the code after reading you comment, that makes a lot of sense. I will adjust the code.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I moved things around so that we leverage the translator as it was implemented, while saving the state with the raw API data. See updated PR description for the details.

Comment on lines 57 to 67
def get_asset_key(self, data: FivetranContentData) -> AssetKey:
if data.content_type == FivetranContentType.CONNECTOR:
return self.get_connector_asset_key(data)
else:
check.assert_never(data.content_type)

def get_asset_spec(self, data: FivetranContentData) -> AssetSpec:
if data.content_type == FivetranContentType.CONNECTOR:
return self.get_connector_spec(data)
else:
check.assert_never(data.content_type)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Both FivetranContentType.CONNECTOR and FivetranContentType.DESTINATION need to be handled in the get_asset_key and get_asset_spec methods. Consider adding elif branches for FivetranContentType.DESTINATION that call corresponding get_destination_asset_key and get_destination_spec methods. These methods should be added as NotImplemented stubs, similar to the existing connector methods. The check.assert_never should only be hit for invalid enum values that aren't part of FivetranContentType.

Spotted by Graphite Reviewer

Is this helpful? React 👍 or 👎 to let us know.

@maximearmstrong maximearmstrong changed the title [dagster-fivetran] Scaffold DagsterFivetranTranslator for rework [dagster-fivetran] Update DagsterFivetranTranslator and related classes for rework Nov 6, 2024
@maximearmstrong maximearmstrong changed the title [dagster-fivetran] Update DagsterFivetranTranslator and related classes for rework [2/n][dagster-fivetran] Update DagsterFivetranTranslator and related classes for rework Nov 7, 2024


@whitelist_for_serdes
@record
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we still want/need this class, FivetranContentType etc if it's not getting fed into the translator?

In particular I think having the content type isn't necessary, since we aren't exposing this data to the user directly

Copy link
Contributor Author

@maximearmstrong maximearmstrong Nov 8, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's worth keeping FivetranContentType for a few reasons:

Overall, I'm in favor of keeping this to keep our design pattern for XWorkspaceData and fetch_x_workspace_data as consistent as possible across integrations.

Subclass this class to implement custom logic for each type of Fivetran content.
"""

def get_asset_key(self, props: FivetranConnectorTableProps) -> AssetKey:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we want to delete this now right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, done in 6eb1099

@maximearmstrong maximearmstrong merged commit b3e4e7c into master Nov 8, 2024
1 check passed
@maximearmstrong maximearmstrong deleted the maxime/rework-fivetran-2 branch November 8, 2024 17:40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants