Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add a Graphql implementation of the repo StargazersStream #123

Merged
merged 25 commits into from
May 19, 2022

Conversation

ericboucher
Copy link
Contributor

@ericboucher ericboucher commented May 18, 2022

This PR is adding a Graphql implementation of the repo StargazersStream and solving #120
This also integrated backward pagination and early exit by faking a "since" parameter.

Open questions:

  1. should we replace the existing stream completely? (I lean towards yes, but not sure if it is strictly backward compatible). If we go with yes, then we don't need Paginate backwards for stargazers #121 and can close it.
  2. the graphql endpoint is missing gravatar_id for the user object, is it even worth mentioning? (I lean towards no)

Bonus - I made a few tweaks that seem to improve requests-caching

@ericboucher ericboucher linked an issue May 18, 2022 that may be closed by this pull request
@ericboucher ericboucher linked an issue May 18, 2022 that may be closed by this pull request
@ericboucher
Copy link
Contributor Author

@edgarrmondragon any thoughts on questions 1 and 2 above?

@edgarrmondragon
Copy link
Member

edgarrmondragon commented May 18, 2022

@edgarrmondragon any thoughts on questions 1 and 2 above?

@ericboucher the schema and replication keys look the same, so the rest and graphql versions only differ in the pagination order?

the graphql endpoint is missing gravatar_id for the user object, is it even worth mentioning? (I lean towards no)

I agree 👍

@ericboucher
Copy link
Contributor Author

The replication keys are different owner, repo -> repo_id @edgarrmondragon

tap_github/repository_streams.py Show resolved Hide resolved
tap_github/tests/fixtures.py Outdated Show resolved Hide resolved
@edgarrmondragon
Copy link
Member

@ericboucher I see the primary key will change with the new graphql stream. What do you think of the following plan:

  • Change the name of the new stream to something like stargazers_graphql
  • Add a deprecation warning, in the readme and maybe in the code too for the REST version of the stream
  • Deselect the new version by default (TBD how we handle that)
  • Release new version with both streams

In a future release we remove the REST version and consider renaming the graphql one.

cc @aaronsteers

@edgarrmondragon edgarrmondragon self-requested a review May 19, 2022 01:56
@ericboucher
Copy link
Contributor Author

@edgarrmondragon not sure I'm a fan since the next release would be yet another breaking change by renaming thisstargazers_graphql to stargazers for consistency.

To be honest, since the previous stream was working so poorly I think it's reasonable to introduce this one and override the previous one directly. Maybe we can add a warning in the code on this streams initialization? Is there a way to force the target to drop existing tables / update indexes?

@ericboucher
Copy link
Contributor Author

@edgarrmondragon @aaronsteers I followed your suggestion and used the same partition keys so that we can "safely" override the REST stream completely. Which I did in the end.

The breaking id changes should happen in #122

Please have a final look and let me know what you think.

cc @laurentS for final thoughts as well

@laurentS
Copy link
Contributor

To be honest, since the previous stream was working so poorly I think it's reasonable to introduce this one and override the previous one directly. Maybe we can add a warning in the code on this streams initialization? Is there a way to force the target to drop existing tables / update indexes?

I don't think there's any defined behaviour on the target side, each target can do pretty much anything with the data, at least from the singer spec perspective, which just defines the interface between tap and target.
But I'd agree with the view that the current stream is behaving poorly. Also, taking into account the number of stars on this repo (5, of which 2 are commenting on this PR and 1 is with Meltano), I'd think it's fair to assume the number of users is very low (and those actually using this stream even lower).
Considering the sort of problems an unstable PKey can bring (see #110 and #112), I'd be in favour of simply replacing the old stream with the new one (and stable keys), with a big fat warning in the logs on init. As an option, rename the old stream stargazers_rest and keep it with a deprecation warning for a bit longer, so that on the off chance that someone is actually using the stream, they have an easy fix.

@ericboucher
Copy link
Contributor Author

Is it easy to use a stream with another name? Eg. use stargazers_rest but still write to a stargazers table? Is this something meltano-cli provides @edgarrmondragon ?

@edgarrmondragon
Copy link
Member

edgarrmondragon commented May 19, 2022

Is it easy to use a stream with another name? Eg. use stargazers_rest but still write to a stargazers table? Is this something meltano-cli provides @edgarrmondragon ?

@ericboucher the SDK comes with stream renaming out-of-the-box: https://sdk.meltano.com/en/latest/stream_maps.html#aliasing-a-stream-using-alias

@sonarcloud
Copy link

sonarcloud bot commented May 19, 2022

Kudos, SonarCloud Quality Gate passed!    Quality Gate passed

Bug A 0 Bugs
Vulnerability A 0 Vulnerabilities
Security Hotspot A 0 Security Hotspots
Code Smell A 2 Code Smells

No Coverage information No Coverage information
0.0% 0.0% Duplication

@ericboucher ericboucher merged commit 44564b2 into main May 19, 2022
@ericboucher ericboucher deleted the stargazers-graphql branch May 19, 2022 22:57
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

request-cache is not working properly Use graphql endpoint for stargazers
4 participants