Event Importer: Delete Org Events Prior to Event Import #424

Draft
wants to merge 10 commits into
base: develop

Collaborator

irby commented Oct 30, 2024

The current theory about the cause of #411 is that updates to Meetup events, recurring events in particular, cause a duplicate Meetup event to be created with a different service ID.

While I haven't been able to reproduce this issue locally, I have seen that only one of the duplicates appears in some of the GraphQL API responses. This supports the theory that updates are causing the duplicates.

In this solution, I check for and remove duplicates for meetup and meetup_graphql events only. I look for events within the org that have the same timestamp and delete the existing DB records before inserting the new records.

irby closed this Oct 30, 2024

Collaborator Author

irby commented Oct 30, 2024

@allella I've come up with this solution, which should work based on my understanding of the duplication issue. However, it will cause the Id column to increment continuously as the event importer deletes and recreates records.

This wouldn't be an issue if we were using a UUID as the Id attribute, but in the current state this would be problematic for the table.

Member

allella commented Oct 30, 2024

I do wonder, could we limit all of this recreation a bit if we try to detect a potential duplicate, and only run the purge if we think there may be a duplicate?

Like, if we do a search on our database for any events by the same org, the same day, same start time, and same service and if anything is returned, only then do we blow things away?

If we did this approach, could we further limit it to only deleting events on that day when there's a potential conflict, so we're not deleting a bunch of stuff when there are no other events for that org that could possibly be a conflict?
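
The check described above could be sketched roughly as follows. This is a minimal Python illustration over in-memory records, not the importer's actual code: the `Event` fields and the `find_conflicts` and `purge_conflicts` helpers are hypothetical names, and it assumes that a stored event with the same service ID is a normal update rather than a conflict.

```python
from dataclasses import dataclass
from datetime import datetime

# Hypothetical record shape; the real table's columns may differ.
@dataclass(frozen=True)
class Event:
    id: int
    org_id: int
    service: str
    service_id: str
    starts_at: datetime  # covers both "same day" and "same start time"

# Only Meetup-sourced events are checked, as discussed in this PR.
MEETUP_SERVICES = {"meetup", "meetup_graphql"}

def find_conflicts(existing, incoming):
    """Return stored events that look like duplicates of `incoming`."""
    if incoming.service not in MEETUP_SERVICES:
        return []
    return [
        e for e in existing
        if e.org_id == incoming.org_id
        and e.service == incoming.service
        and e.starts_at == incoming.starts_at
        # Assumption: an identical service_id means an ordinary update.
        and e.service_id != incoming.service_id
    ]

def purge_conflicts(existing, incoming):
    """Return (kept, purged); nothing is purged unless a conflict is found."""
    conflicts = find_conflicts(existing, incoming)
    if not conflicts:
        return list(existing), []
    conflict_ids = {e.id for e in conflicts}
    kept = [e for e in existing if e.id not in conflict_ids]
    return kept, conflicts
```

Because the purge only runs when `find_conflicts` returns something, events for orgs and dates with no potential duplicate are left alone, which would also limit how often rows are deleted and recreated.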

irby reopened this Oct 31, 2024

Collaborator Author

irby commented Oct 31, 2024

@allella Check this out now. I have set it up so that we only check for duplicates on meetup_graphql / meetup events and filter by the time of the event.

Member

allella commented Nov 5, 2024

@bogdankharchenko could you check this out?

The source of the duplication is still a bit of a mystery, but a look at the HG calendar shows a number of recurring Meetup events that generate duplicates. Our best guess is that it affects events that are set up on a recurring template on Meetup's end, and possibly the duplicates happen due to some change the organizers are making on Meetup.

Matt's suggestion was to search for existing events with the same org + same date + same start time and, if there are any, just purge the existing record and reimport.

Collaborator

bogdankharchenko commented Nov 20, 2024

@irby I spent a few minutes looking at the duplicated data, and it seems that the ID from GraphQL comes in either as a string or as an integer.

Perhaps an easier solution is to just say, ignore events which have string/integer IDs? What I suspect is happening is that at some point this org was under Meetup REST, which was importing the ID as a string, and GraphQL is importing it as an integer. And since we import events many months in advance, this is where the issue started.

Does that sound plausible?

Member

allella commented Nov 21, 2024

@bogdankharchenko we found there's a token value in the Event response that we weren't querying in GraphQL. That token seemed to match the alphabetical value that was causing the duplicate of the integer ID on events that are part of a recurring series.

The hunch is that we'll be able to use this token to avoid or work around the duplicates.

I just pulled Matt's recent PR to log the token values and we'll see if that will help us out.

Member

allella commented Nov 21, 2024

This was the conversation about the token value.

Collaborator Author

irby commented Nov 21, 2024

Yes, @bogdankharchenko, to add to what @allella has said: this token field on an event is optional, but when a duplicate is identified, it does match the id of the original event.

For example:

Event 1:

  • id: 123
  • token: 123
  • date: 11/10/2024

Event 2:

  • id: 124
  • token: undefined
  • date: 12/10/2024

Event 3:

  • id: abc
  • token: 123
  • date: 11/10/2024

My current line of thinking is this: when mapping the service_id field, we use token as the service_id whenever it is defined. If token is undefined, we map service_id to the event id, which will always be populated for an event.

I think this is perhaps the best and only way we can resolve the duplicate event issue.
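
As a rough illustration of that mapping, here is a minimal Python sketch using the three example events above. It is not the importer's code: `resolve_service_id` is a hypothetical helper name, an undefined token is represented as `None`, and ids are treated as strings for simplicity.

```python
from typing import Optional

def resolve_service_id(event_id: str, token: Optional[str] = None) -> str:
    """Prefer the optional token; fall back to the event id, which is always set."""
    return token if token is not None else event_id

# The three example events from this comment (None stands in for "undefined").
events = [
    {"id": "123", "token": "123", "date": "11/10/2024"},
    {"id": "124", "token": None, "date": "12/10/2024"},
    {"id": "abc", "token": "123", "date": "11/10/2024"},
]

# Keying stored events by the resolved service_id collapses Event 1 and
# Event 3 into a single record, because both resolve to "123".
by_service_id = {}
for event in events:
    by_service_id[resolve_service_id(event["id"], event["token"])] = event
```

With this mapping, Event 3 updates the record created for Event 1 instead of producing a second row, while Event 2 keeps its own id as the service_id.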
