flu: include subclade proposals #236
After some hiccups with inadvertently updated Yam trees, this now included the
I looked at H3N2 https://clades.nextstrain.org/?dataset-server=gh:@flu/clade-proposals@ and the column seems to be there: A couple of very pedantic complaints from my side:
I realize also that those people who are interested in this column are probably already deep in the trenches and are familiar with the different nomenclatures and the battles there. They should not be scared by the subtle differences in naming conventions. But as an engineer, I have to bark when I see it :)
data_output/index.json
-  "clades": 25,
+  "clades": 24,
   "customClades": {
-    "subclade": 20,
-    "short-clade": 16
+    "short-clade": 16,
+    "subclade": 21,
+    "proposed_clade": 21
You guys have probably discussed this and decided on something already, but just mentioning that the clades and subclades are still disappearing. If that's fine, or if a solution is in the works, then ignore this. I am just not in the loop on the latest updates.
I think John mentioned that there's a way to keep a consistent set of clades when building the tree.
The code that gathers this clade and clade-like attr info is here, should you need to reuse it or improve on it:
nextclade_data/scripts/rebuild
Lines 43 to 48 in 745ffb9
    clades = []
    custom_clades = {}
    if tree_json_path is not None and isfile(tree_json_path):
        tree_json = json_read(tree_json_path)
        clades = tree_find_clades(tree_json)
        custom_clades = tree_find_clade_like_attrs(tree_json)
(initially I thought I'd add a list of all attrs there, but sc2 has 3000+ lineages across 5 datasets, so it's too much for an index file fetched on every visit; might dump this info separately quite easily though)
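For anyone who wants to reproduce those counts outside the `rebuild` script, here is a rough sketch of the idea. It is not the actual implementation of `tree_find_clades` or `tree_find_clade_like_attrs`. It assumes an Auspice-style tree JSON where each node stores attributes as `node_attrs[<name>]["value"]`; the helper names `iter_nodes` and `count_attr_values` and the toy tree are made up for illustration.

```python
def iter_nodes(node):
    """Yield every node of an Auspice-style tree, depth first."""
    stack = [node]
    while stack:
        current = stack.pop()
        yield current
        stack.extend(current.get("children", []))


def count_attr_values(tree_json, attr_names):
    """Count the distinct values of each requested node attribute.

    Assumption: attributes live at node["node_attrs"][name]["value"].
    """
    values = {name: set() for name in attr_names}
    for node in iter_nodes(tree_json["tree"]):
        node_attrs = node.get("node_attrs", {})
        for name in attr_names:
            entry = node_attrs.get(name)
            if isinstance(entry, dict) and entry.get("value"):
                values[name].add(entry["value"])
    return {name: len(found) for name, found in values.items()}


# Toy tree with placeholder clade names, only to show the expected shape.
example_tree = {
    "tree": {
        "name": "root",
        "node_attrs": {"clade_membership": {"value": "A"}},
        "children": [
            {
                "name": "tip1",
                "node_attrs": {
                    "clade_membership": {"value": "A.1"},
                    "subclade": {"value": "A.1.1"},
                },
            },
            {
                "name": "tip2",
                "node_attrs": {
                    "clade_membership": {"value": "A"},
                    "subclade": {"value": "A.2"},
                    "proposed_clade": {"value": "P1"},
                },
            },
        ],
    }
}

counts = count_attr_values(
    example_tree, ["clade_membership", "subclade", "proposed_clade"]
)
print(counts)
```

Running something like this on the old and new tree would show which attribute lost values between builds, without needing the full list in the index file.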
@ivan-aksamentov Thanks for catching this. The problem we discussed before was specific to H3N2 HA clades and this approach to force-including strains from each clade has fixed the issue in this PR. We just need to dig into the missing clades/subclades for H1 HA and repeat the process (which is what I think Richard was attempting with this update to H1 reference strains).
It looks like 6B.1A.4 is missing from this PR's version of the H1 HA broad dataset. Adding A/Nagano/2649/2018 to the force-included references should fix this.
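A check like this could be scripted so missing clades are caught before review. The sketch below is only an illustration: `find_missing_clades` and `suggest_force_includes` are hypothetical helpers, the expected clade list is a placeholder apart from 6B.1A.4, and the only clade-to-strain pairing used is the one mentioned above.

```python
def find_missing_clades(expected, found):
    """Return expected clades that are absent from the tree, sorted by name."""
    return sorted(set(expected) - set(found))


def suggest_force_includes(missing, representatives):
    """Map each missing clade to a known representative strain, if any."""
    return {
        clade: representatives[clade]
        for clade in missing
        if clade in representatives
    }


# Placeholder inputs: "clade-X" and "clade-Y" are illustrative names only.
expected_clades = ["clade-X", "6B.1A.4", "clade-Y"]
clades_in_tree = ["clade-X", "clade-Y"]

# The pairing below is the one proposed in this thread.
representatives = {"6B.1A.4": "A/Nagano/2649/2018"}

missing = find_missing_clades(expected_clades, clades_in_tree)
suggestions = suggest_force_includes(missing, representatives)
print(missing, suggestions)
```

The suggested strains would then go into whatever list of force-included references the dataset build already uses.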
@rneher I made a couple of minor changes in this PR based on @ivan-aksamentov's comments above. If those changes look ok, I can update this PR to use the corresponding dataset output.
Good points regarding name standardization. ATM, we are definitely very inconsistent. At some point, we'll probably be able to ditch or demote What one could do here and in the source repo is to rename all
But happy to consider other proposals.
@huddlej
this is my shot at including the subclade proposals.