Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Proposal for refreshes needed metadata #67

Merged
merged 14 commits into from
Apr 2, 2024

Conversation

benmoriceau
Copy link
Contributor

@benmoriceau benmoriceau commented Mar 6, 2024

@benmoriceau benmoriceau requested a review from evantahler March 6, 2024 16:10
@evantahler evantahler requested review from jbfbell and davinchia March 6, 2024 16:18
@benmoriceau
Copy link
Contributor Author

This PR was made as a support for the writing of the specs. The change in the protocol will go through the normal proposal process which is describe here: https://www.notion.so/Protocol-Change-Process-64877a3d60564884bda07dc5dea8b4af

@evantahler
Copy link
Contributor

Add feat: to your title to get the CI to pass. This repo uses conventional-commits!

@benmoriceau benmoriceau marked this pull request as ready for review March 22, 2024 21:25
@evantahler
Copy link
Contributor

@@ -485,6 +497,20 @@ definitions:
type: array
items:
type: string
StreamMetadata:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't this be added into ConfiguredAirbyteStream?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not to get too into bike shedding, but at the point that it is in ConfiguredAirbyteStream does it need its own struct at all? can't these all just be fields directly in ConfiguredAirbyteStream?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated

@benmoriceau benmoriceau changed the title Proposal for refreshes needed metadata feat: Proposal for refreshes needed metadata Mar 22, 2024
Copy link
Contributor

@evantahler evantahler left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment on lines 493 to 495
minimum_generation_id:
description: The minimum generation id which is needed in a stream. If it is present, the destination will try to delete the data that are part of a generation lower than this property. If it is absent, the destination won't try to delete data
type: integer
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why would this be null? I think on the first sync both generation_id and minimum_generation_id both be "0", and non-null

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is set to null for regular sync where clearing the data is not needed. Having the minimum generation id won't be useful for the destination in this case.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it a problem if we always include it? Allowing null implies a 3rd state that is a little confusing. Unless there is a reason we must allow it to be nullable, I think it makes most sense to always set it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, the behavior if we set it to 0 will be the same than setting it at null. It will mean that we will not perform a cleaning of the data. FYI @edgao

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

kk. if that's the case. i agree with evan. it shouldn't be nullable.

@benmoriceau
Copy link
Contributor Author

There's a second file to update as well: https://github.com/airbytehq/airbyte-protocol/blob/main/protocol-models/src/main/resources/airbyte_protocol/airbyte_protocol.yaml

Updated and I also bumped the version which was missing.

@benmoriceau benmoriceau requested a review from evantahler April 1, 2024 21:45
.env Outdated Show resolved Hide resolved
@benmoriceau benmoriceau requested a review from evantahler April 1, 2024 21:49
Comment on lines 467 to 468
- current_generation_id
- sync_id
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

are there any backwards compatibility concerns that we need to worry about by making these required?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a good point, there will be if a migrated connector is running on a non migrated platform.
@cgardens Is it ok if I move it back to a dedicated object with the 3 fields being required and the new object will be optional in the ConfiguredAirbyteStream?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

my intuition is that we are actually safe, because i can't think of a time that a connector ever gives a configured catalog to the platform. it's only ever the platform giving it to a connector. so maybe it's safe? I'd have to dig through the code a bit more to sanity check it.

Copy link
Contributor Author

@benmoriceau benmoriceau Apr 1, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My concern is when the platform gives a catalog to a connector with those fields. If the connector are migrated, those fields will be required but not present.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When would that happen though? If the platform knows about though fields it should also populate them, right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that it can happen in OSS where we can have the connectors running on an old platform. The platform won't know about the fields but the connector will do.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah. good point.

Copy link
Contributor Author

@benmoriceau benmoriceau Apr 1, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, does the solution which extract those fields in a dedicated object and make it optional in the configuredAirbyteStream sound good?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i'm not excited about that. I think it's letting a migration details dictate the design. i'd rather make them all nullable but specify that they are nullable for backwards compatibility reasons and if they are null what the default behavior is.

@evantahler wdyt? relates to your comment below.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not ideal, but I think it's the best we can do. Added to https://docs.google.com/document/d/1wnnyYT7nzEyrvTiCOQGImfbgCyiTaBU9QIzUmojeozg/edit

@@ -464,6 +464,8 @@ definitions:
- stream
- sync_mode
- destination_sync_mode
- current_generation_id
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's generation_id below, not current_generation_id

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Update following the discussion in #67 (comment)

@benmoriceau benmoriceau requested a review from evantahler April 1, 2024 23:43
@benmoriceau benmoriceau requested a review from cgardens April 2, 2024 00:39
@benmoriceau benmoriceau merged commit 7e8a487 into main Apr 2, 2024
6 checks passed
@benmoriceau benmoriceau deleted the bmoric/protocol-change-for-refreshes branch April 2, 2024 01:45
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants