Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Proposal for refreshes needed metadata #67

Merged
merged 14 commits into from
Apr 2, 2024
Original file line number Diff line number Diff line change
Expand Up @@ -463,6 +463,8 @@ definitions:
- stream
- sync_mode
- destination_sync_mode
- generation_id
- sync_id
properties:
stream:
"$ref": "#/definitions/AirbyteStream"
Expand All @@ -484,6 +486,15 @@ definitions:
type: array
items:
type: string
generation_id:
evantahler marked this conversation as resolved.
Show resolved Hide resolved
description: Monotically increasing numeric id representing the current generation of a stream. This id can be shared across syncs
type: integer
minimum_generation_id:
description: The minimum generation id which is needed in a stream. If it is present, the destination will try to delete the data that are part of a generation lower than this property. If it is absent, the destination won't try to delete data
type: integer
sync_id:
description: Monotically increasing numeric id representing the current sync id. This is aimed to be unique per sync.
type: integer
SyncMode:
type: string
enum:
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -464,6 +464,8 @@ definitions:
- stream
- sync_mode
- destination_sync_mode
- current_generation_id
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's generation_id below, not current_generation_id

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Update following the discussion in #67 (comment)

- sync_id
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

are there any backwards compatibility concerns that we need to worry about by making these required?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a good point, there will be if a migrated connector is running on a non migrated platform.
@cgardens Is it ok if I move it back to a dedicated object with the 3 fields being required and the new object will be optional in the ConfiguredAirbyteStream?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

my intuition is that we are actually safe, because i can't think of a time that a connector ever gives a configured catalog to the platform. it's only ever the platform giving it to a connector. so maybe it's safe? I'd have to dig through the code a bit more to sanity check it.

Copy link
Contributor Author

@benmoriceau benmoriceau Apr 1, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My concern is when the platform gives a catalog to a connector with those fields. If the connector are migrated, those fields will be required but not present.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When would that happen though? If the platform knows about though fields it should also populate them, right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that it can happen in OSS where we can have the connectors running on an old platform. The platform won't know about the fields but the connector will do.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah. good point.

Copy link
Contributor Author

@benmoriceau benmoriceau Apr 1, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, does the solution which extract those fields in a dedicated object and make it optional in the configuredAirbyteStream sound good?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i'm not excited about that. I think it's letting a migration details dictate the design. i'd rather make them all nullable but specify that they are nullable for backwards compatibility reasons and if they are null what the default behavior is.

@evantahler wdyt? relates to your comment below.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not ideal, but I think it's the best we can do. Added to https://docs.google.com/document/d/1wnnyYT7nzEyrvTiCOQGImfbgCyiTaBU9QIzUmojeozg/edit

properties:
stream:
"$ref": "#/definitions/AirbyteStream"
Expand All @@ -485,6 +487,15 @@ definitions:
type: array
items:
type: string
generation_id:
description: Monotically increasing numeric id representing the current generation of a stream. This id can be shared across syncs
type: integer
minimum_generation_id:
description: The minimum generation id which is needed in a stream. If it is present, the destination will try to delete the data that are part of a generation lower than this property. If it is absent, the destination won't try to delete data
type: integer
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why would this be null? I think on the first sync both generation_id and minimum_generation_id both be "0", and non-null

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is set to null for regular sync where clearing the data is not needed. Having the minimum generation id won't be useful for the destination in this case.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it a problem if we always include it? Allowing null implies a 3rd state that is a little confusing. Unless there is a reason we must allow it to be nullable, I think it makes most sense to always set it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, the behavior if we set it to 0 will be the same than setting it at null. It will mean that we will not perform a cleaning of the data. FYI @edgao

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

kk. if that's the case. i agree with evan. it shouldn't be nullable.

sync_id:
description: Monotically increasing numeric id representing the current sync id. This is aimed to be unique per sync.
type: integer
SyncMode:
type: string
enum:
Expand Down
Loading