-
Notifications
You must be signed in to change notification settings - Fork 1.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[2/n][dagster-fivetran] Update DagsterFivetranTranslator and related classes for rework #25751
Conversation
@@ -1 +1,2 @@ | |||
from dagster_fivetran.v2.resources import FivetranWorkspace as FivetranWorkspace | |||
from dagster_fivetran.v2.translator import DagsterFivetranTranslator as DagsterFivetranTranslator |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Until the previous functions and resource are deprecated and removed, users would import the reworked integration with the following:
from dagster_fivetran.v2 import DagsterFivetranTranslator, FivetranWorkspace
The new version of the integration won't actually be v2, so perhaps another import path would be better?
This is mainly to avoid name collisions between both previous and new translators. We could also name this translator differently if we really want users to import with from dagster_fivetran import FivetranWorkspace
, but Dagster[IntegrationName]Translator
is the naming convention for new integrations.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should just call it something different than *.v2
. Maybe experimental
for a release -> 1.0 the integration, then we switch the old stuff to be under .legacy
and make the new stuff exist under the top level package. @PedramNavid and @C00ldudeNoonan would probably have thoughts.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That makes sense - I updated the folder name for experimental
in the previous PR in this stack.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
to your queue for some design questions
def get_connector_asset_key(self, data: FivetranContentData) -> AssetKey: | ||
raise NotImplementedError() | ||
|
||
def get_connector_spec(self, data: FivetranContentData) -> AssetSpec: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm assuming this might be inherited from existing patterns, but I do find this kinda awkward. Creates a weird footgun where you can implement get_connector_spec in a way that does not utilize get_connector_asset_key, right?
I wonder if we should just have get_connector_spec
and then force people to use get_connector_spec(...).asset_key
.
To make this more efficient, we could cache the get_connector_spec result per datum.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm assuming this might be inherited from existing patterns, but I do find this kinda awkward.
Yes exactly. A similar discussion came up here.
I'm in favor of this change - I agree that this pattern is error prone.
It was kinda easier for a user to override the get_asset_key
function, to add a prefix for instance. We could recommend doing something like the following instead, which looks a bit awkward but is less error prone.
class CustomDagsterTranslator(DagsterTranslator):
def get_asset_spec(self, props) -> AssetSpec:
asset_spec = super().get_asset_spec(props)
return asset_spec._replace(asset_key=asset_spec.asset_key.with_prefix("my_prefix"))
@benpankow Any thoughts? I can update the other translators if we all agree on this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we have better fxns for updating params on asset specs now too: replace_attributes
top level fxn. But yea I think this pattern is good. Having an example people can copy paste, even if it looks slightly awkward, is much preferable to introducing a footgun.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Have we made the other integrations where this fxn exists public yet @maximearmstrong ? / would like to understand the status / experimentality there.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm on board with this change - I think our initial push for get_asset_key
was to avoid doing extra work to construct an entire asset spec when we would discard most of it & just retrieve the key, but Chris made a good point that any asset whose key we're generating we'll also be generating a spec for, either before or after. As long as we can cache the spec generation, we don't end up doing extra work in that case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's fair - we can remove the get_asset_key
method in favor of the replace_attributes
fnx for customization.
Have we made the other integrations where this fxn exists public yet
The get_asset_spec
method is used in the translator of each BI integrations, and these translators are not marked as experimental. I think that method is pretty stable at this point.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The get_asset_key
method is also not marked as experimental for the BI integrations, so that implies a deprecation cycle, but the risk seems pretty low here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yea if we're going to do it, we should probably do so soon. Want to post a public discussion about it to make sure we get buy in from @schrockn @benpankow ; do we have anyone customizing BI translators to our knowledge yet?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To my knowledge, no one is customizing the Tableau and Power BI translators yet. @benpankow Do you know if anyone is customizing the Sigma and Looker ones?
I can post the public discussion with code snippets to get consensus on this.
For this PR, I will remove the get_asset_key
method - we can easily add it depending on the outcome of the discussion.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yep that sounds good to me!
return self._context | ||
|
||
def get_asset_key(self, data: FivetranContentData) -> AssetKey: | ||
if data.content_type == FivetranContentType.CONNECTOR: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
shouldn't we also have get_destination_asset_key?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We will leverage the previous translator instead. See updated PR description.
check.assert_never(data.content_type) | ||
|
||
def get_asset_spec(self, data: FivetranContentData) -> AssetSpec: | ||
if data.content_type == FivetranContentType.CONNECTOR: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
shouldn't we also have get_destination_spec?
else: | ||
check.assert_never(data.content_type) | ||
|
||
def get_asset_spec(self, data: FivetranContentData) -> AssetSpec: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure this matches the way that we model Fivetran assets, at least in the current form - each connector can sync one or many tables, each of which we represent as an asset. Destinations may not make sense to model as an asset at all, since they just represent a storage destination but not a specific e.g. table
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Re-reading the code after reading you comment, that makes a lot of sense. I will adjust the code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I moved things around so that we leverage the translator as it was implemented, while saving the state with the raw API data. See updated PR description for the details.
8c291f8
to
4dfe34d
Compare
7c954b2
to
bbaf99d
Compare
def get_asset_key(self, data: FivetranContentData) -> AssetKey: | ||
if data.content_type == FivetranContentType.CONNECTOR: | ||
return self.get_connector_asset_key(data) | ||
else: | ||
check.assert_never(data.content_type) | ||
|
||
def get_asset_spec(self, data: FivetranContentData) -> AssetSpec: | ||
if data.content_type == FivetranContentType.CONNECTOR: | ||
return self.get_connector_spec(data) | ||
else: | ||
check.assert_never(data.content_type) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Both FivetranContentType.CONNECTOR
and FivetranContentType.DESTINATION
need to be handled in the get_asset_key
and get_asset_spec
methods. Consider adding elif
branches for FivetranContentType.DESTINATION
that call corresponding get_destination_asset_key
and get_destination_spec
methods. These methods should be added as NotImplemented
stubs, similar to the existing connector methods. The check.assert_never
should only be hit for invalid enum values that aren't part of FivetranContentType
.
Spotted by Graphite Reviewer
Is this helpful? React 👍 or 👎 to let us know.
4dfe34d
to
24ed962
Compare
bbaf99d
to
ed343b0
Compare
24ed962
to
1c392d7
Compare
67c556e
to
7197ca1
Compare
|
||
|
||
@whitelist_for_serdes | ||
@record |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we still want/need this class, FivetranContentType
etc if it's not getting fed into the translator?
In particular I think having the content type isn't necessary, since we aren't exposing this data to the user directly
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's worth keeping FivetranContentType
for a few reasons:
- The content data for Fivetran is a JSON object returned by the API for both connectors and destinations. Using
FivetranContentType
, it's easier to keep track of the type of theFivetranContentData
.- This pattern is used for Power BI and Tableau, where the content data is also a JSON object returned by an API.
- The implementation of
fetch_fivetran_workspace_data
leverages theFivetranContentType
enum to create theFivetranWorkspaceData
.- We use
FivetranContentType
when creatingFivetranContentData
objects and theFivetranWorkspaceData
object, to differentiate connectors and destinations. See usage in subsequent PR, in [4/n][dagster-fivetran] Implementfetch_fivetran_workspace_data
#25788 and [5/n][dagster-fivetran] ImplementFivetranWorkspaceData
toFivetranConnectorTableProps
method #25797
- We use
Overall, I'm in favor of keeping this to keep our design pattern for XWorkspaceData
and fetch_x_workspace_data
as consistent as possible across integrations.
1c392d7
to
c03162d
Compare
7197ca1
to
e21e3f0
Compare
Subclass this class to implement custom logic for each type of Fivetran content. | ||
""" | ||
|
||
def get_asset_key(self, props: FivetranConnectorTableProps) -> AssetKey: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we want to delete this now right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, done in 6eb1099
e21e3f0
to
6eb1099
Compare
0d105eb
to
e702849
Compare
6eb1099
to
5d21367
Compare
5d21367
to
fed3154
Compare
Summary & Motivation
Builds out a very barebones translator class for the new version of the Fivetran integration.The implementation for this translator will be inspired by theDagsterFivetranTranslator
introduced in #25557, but a new implementation is required to leverage the workspace context and state-backed definitions, which is incompatible with the current translator and way of building assets.Edit after Ben's comment here:
Move things around under translator.py. This PR leverages the
DagsterFivetranTranslator
introduced in introduced in #25557.FivetranWorkspaceData
will implement the methodto_fivetran_connector_table_props_data
in a subsequent PR, that will map raw connector and destination data fetched using the Fivetran API intoFivetranConnectorTableProps
objects, that are compatible with the translator. This process will match what we currently do.How I Tested These Changes
Tests will be added in subsequent PRs.