-
Notifications
You must be signed in to change notification settings - Fork 1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[24.1] Add username and email deduplication scripts #18492
Conversation
Failures relevant: imports. I'll fix this. |
self.engine = engine | ||
|
||
def deduplicate_emails(self) -> None: | ||
with self.engine.begin() as connection: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can't do this, users won't control the new email address.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As far as deduplication goes, there is not much else we can do: there should be no duplicate emails in the database, and if there are, we can either delete the duplicate records or edit the duplicate values. Another option is to expect admins to do this manually (deleting rows, contacting users, whatever), and add this to the admin release notes. This script modifies the emails on the older duplicate accounts, leaving the newest intact, which of course is not ideal.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
there is not much else we can do:
At a minimum prefix/suffix the email address to something not guessable ? Yes, if there are duplicates it's better to check which account actually has data, it's not unlikely only one of them will have anything at all, since we probably pick one on login, or we've always been denying / failing at the login stage, at which point toggling the delete flag is sufficient.
Upon further consideration, I think this belongs in dev. I'll add this to #18487. |
Ref #18487
These are 2 scripts that can be run to remove duplicate emails and duplicate usernames from the galaxy_user table.
I am not sure if we want them in 24.1 (this PR) or in dev. It's a feature, not a bug fix; but this feature can be used to fix a data problem on main that's running 24.1, and, potentially, any other database that is sufficiently ancient.
I'll have a separate PR against dev with migrations that add unique constraints to both fields.
I've tested this manually. Automated tests are a pain here, since there is no way that I know of to populate a sqlite database with data that violates a unique constraint (other than removing the constraint, which would require creating a new galaxy_user table just for the test).
How to test the changes?
(Select all options that apply)
License