Context
Once we've completed #32, we can use strain names to deduplicate sequences.
This is necessary in case different groups sequence the same virus or if sequences are generated from different protocols.
(NOTE: This is separate from the versioning in GenBank; we already pull in the latest version of GenBank sequences.)
Description
The duplicate sequences should probably be filtered out in a new script (e.g. `ingest/bin/deduplicate-records`) or, potentially, with the augur deduplicate command (see nextstrain/augur#919).
We probably want to keep a file with all sequences in case people want the duplicate sequences for any reason.
The deduplicated files will be the main ones used for LAPIS and/or our monkeypox builds.
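
A minimal sketch of what strain-name deduplication could look like, assuming records are dicts parsed from the ingest output. The field names `strain` and `accession`, the function name `deduplicate_records`, and the "keep the first record seen" rule are all assumptions for illustration; the issue does not specify which duplicate should be kept.

```python
def deduplicate_records(records, key="strain"):
    """Split records into (kept, duplicates) by strain name.

    Hypothetical helper: keeps the first record seen for each strain
    name and collects later records with the same name separately, so
    the full set can still be written to its own file.
    Records with a missing or empty strain name are always kept,
    since they cannot be matched against anything.
    """
    seen = set()
    kept, duplicates = [], []
    for record in records:
        name = record.get(key)
        if name and name in seen:
            # Same strain name already kept: treat as a duplicate.
            duplicates.append(record)
            continue
        if name:
            seen.add(name)
        kept.append(record)
    return kept, duplicates


if __name__ == "__main__":
    # Example records; field names are assumed, not taken from the pipeline.
    records = [
        {"accession": "MT903340", "strain": "MPXV-M5312_HM12_Rivers"},
        {"accession": "NC_063383", "strain": "MPXV-M5312_HM12_Rivers"},
    ]
    kept, duplicates = deduplicate_records(records)
    print([r["accession"] for r in kept])
    print([r["accession"] for r in duplicates])
```

Returning the duplicates rather than discarding them leaves room to write both an "all sequences" file and a deduplicated file. A real script would likely need an explicit rule for which record wins (for example, preferring one accession type over another) rather than relying on input order.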
Update: We currently have a duplicate in the hMPX build: strain MPXV-M5312_HM12_Rivers appears under accessions MT903340 and NC_063383. It's not a huge problem, as it's not part of the current outbreak.