read_metadata: Allow graceful handling of duplicate strain names #810
Comments
(Embedded code references: augur/augur/curate/__init__.py, lines 79 to 83 in 18e07b4, and lines 39 to 49 in 18e07b4.)
Re-upping this here in the context of [...]. For metadata, I tested by doubling entries in [...]; this errors out. For sequences, I tested by doubling entries in [...]. Here, there are no complaints; the resulting files have n entries in [...]. I think we should be consistent in behavior between sequences and metadata and error out if there are duplicate sequences when running [...].
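The duplicate check asked for above could be applied to sequences as well as metadata. A minimal stdlib-only sketch of detecting duplicate FASTA identifiers (illustrative only, not Augur's actual implementation; the function name is hypothetical):

```python
import io
from collections import Counter

def duplicate_sequence_ids(fasta_handle):
    """Return FASTA record IDs that appear more than once, sorted."""
    counts = Counter(
        line[1:].split()[0]  # the ID is the first token after '>'
        for line in fasta_handle
        if line.startswith(">")
    )
    return sorted(name for name, n in counts.items() if n > 1)

# A file with "seqA" doubled, as in the test described above.
fasta = ">seqA extra info\nACGT\n>seqB\nGGCC\n>seqA extra info\nACGT\n"
dups = duplicate_sequence_ids(io.StringIO(fasta))
```

With a check like this, a command could raise an error listing `dups` whenever it is non-empty, matching the error-out behavior metadata already has.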
@trvrb thanks for resurfacing this. Zooming out a bit, there have been several issues where de-duplication has been discussed:
#586, #725, and #810 all seem to be discussing the same thing: allow existing Augur commands to handle de-duplication. #919 and #616 are separately similar: create a new Augur command to handle de-duplication. The current status quo and consensus used by most commands is well summarized by @huddlej in #616 (comment):
Based on that, the solution here is to close #586, #725, and #810 as "not planned". Separately, as pointed out in the previous comment, there is a bug in [...].
Thanks for the write-up @victorlin! I suspect that it doesn't make sense to even check for duplicates in many commands. For instance [...]. Though other commands like [...].
Closing this as not planned per #810 (comment)
Context
I'm using GISAID's data export for Augur. Someone uploaded two different sequences with different EPI ISLs but the same strain name. This causes augur's read_metadata to crash here. It would be good if there were a flag one could set to accept duplicates and just choose the first (or last, or none).
Because right now I need to manually remove this duplicate or run a separate script to deduplicate the metadata myself. It can be done, but it would be nice if the utility had a flag taking care of this.
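The flag described above could look something like the following pandas-based sketch. This is a hypothetical signature, not Augur's actual `read_metadata` API; the `duplicates` parameter and its policy names are assumptions for illustration:

```python
import io
import pandas as pd

def read_metadata(path_or_buf, id_column="strain", duplicates="error"):
    """Read tab-delimited metadata, handling duplicate IDs per policy.

    duplicates: "error" (crash, the current behavior), "first" (keep the
    first occurrence), "last" (keep the last), or "drop" (keep none).
    """
    metadata = pd.read_csv(path_or_buf, sep="\t")
    dup_mask = metadata[id_column].duplicated(keep=False)
    if dup_mask.any():
        if duplicates == "error":
            raise ValueError(
                f"Duplicate {id_column} values found: "
                f"{sorted(metadata.loc[dup_mask, id_column].unique())}"
            )
        elif duplicates in ("first", "last"):
            metadata = metadata.drop_duplicates(subset=id_column, keep=duplicates)
        elif duplicates == "drop":
            metadata = metadata[~dup_mask]
    return metadata.set_index(id_column)

# Two rows share a strain name but have different EPI ISLs,
# as in the GISAID export described above.
tsv = (
    "strain\taccession\n"
    "A/x/2021\tEPI_ISL_1\n"
    "A/x/2021\tEPI_ISL_2\n"
    "A/y/2021\tEPI_ISL_3\n"
)
deduped = read_metadata(io.StringIO(tsv), duplicates="first")
```

Keeping `duplicates="error"` as the default preserves today's behavior while letting users opt in to first/last/none handling.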