Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

flu: include subclade proposals #236

Merged
merged 10 commits into from
Nov 4, 2024
Merged

flu: include subclade proposals #236

merged 10 commits into from
Nov 4, 2024

Conversation

rneher
Copy link
Member

@rneher rneher commented Oct 27, 2024

@huddlej

this is my shot at including the subclade proposals.

@rneher rneher had a problem deploying to refs/pull/236/merge October 27, 2024 17:30 — with GitHub Actions Failure
@rneher rneher had a problem deploying to refs/pull/236/merge October 27, 2024 18:52 — with GitHub Actions Failure
@rneher rneher temporarily deployed to refs/pull/236/merge October 27, 2024 18:54 — with GitHub Actions Inactive
@rneher rneher temporarily deployed to refs/pull/236/merge October 27, 2024 19:04 — with GitHub Actions Inactive
@rneher
Copy link
Member Author

rneher commented Oct 27, 2024

After some hick-ups with inadvertently updated Yam trees, this now included the proposed_clades for H3 and H1. I have hid them in the results table, but they are exported into the table and inspectable on the tree.

@ivan-aksamentov
Copy link
Member

I looked at H3N2

https://clades.nextstrain.org/?dataset-server=gh:@flu/clade-proposals@

and the column seems to be there:

01

A couple of very pedantic complaints from my side:

  • we now have all possible case conventions: seqName, short-clade, proposed_clade. Would be nice to stick to something:

    find data/ -type f -name 'tree.json' -exec sed -i 's/proposed_clade/subclade-proposal/g' {} +

    But I see that the source repo also making choices, so it is probably better to be consistent with the source.

  • The key proposed_clade has clade in it, but the value resembles subclade values the most, not clade.

  • Display name "Subclade proposal" deviates from the key proposed_clade in 2 ways: clade vs sublcade, and then the order of words. Though, if a column is hidden in web, then no immediate problem. Can be improved later if needed.

  • Description could provide some more info - who proposes and why and how. Right now you have to go and see the source repo to understand the big picture. Description is isabled in in web along with the column right now, can be improved any time.

I realize also that those people who are interested in this column are probably already deep in the trenches and are familiar with the different nomenclatures and the battles there. They should not be scared by the subtle differences in naming conventions. But as an engineer, I have to bark when I see it :)

Comment on lines 804 to 808
"clades": 25,
"clades": 24,
"customClades": {
"subclade": 20,
"short-clade": 16
"short-clade": 16,
"subclade": 21,
"proposed_clade": 21
Copy link
Member

@ivan-aksamentov ivan-aksamentov Oct 28, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You probably guys discussed and decided on something already, but just mentioning that the clades and subclades are still disappearing. If that's fine or if the solution is in the works, then ignore this - I am just not in the loop on the latest updates.

I think John mentioned that there's a way to keep a consistent set of clades when building the tree.

The code that gathers this clade and clade-like attr info is here, should you need to reuse it of improve on it:

clades = []
custom_clades = {}
if tree_json_path is not None and isfile(tree_json_path):
tree_json = json_read(tree_json_path)
clades = tree_find_clades(tree_json)
custom_clades = tree_find_clade_like_attrs(tree_json)

(initially I thought I'd add a list of all attrs there, but sc2 has 3000+ lineages across 5 datasets, so it's too much for an index file fetched on every visit; might dump this info separately quite easily though)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ivan-aksamentov Thanks for catching this. The problem we discussed before was specific to H3N2 HA clades and this approach to force-including strains from each clade has fixed the issue in this PR. We just need to dig into the missing clades/subclades for H1 HA and repeat the process (which is what I think Richard was attempting with this update to H1 reference strains).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like 6B.1A.4 is missing from this PR's version of the H1 HA broad dataset. Adding A/Nagano/2649/2018 to the force-included references should fix this.

@huddlej
Copy link
Contributor

huddlej commented Oct 28, 2024

@rneher I made a couple of minor changes in this PR based on @ivan-aksamentov's comments above. If those changes look ok, I can update this PR to use the corresponding dataset output.

@rneher
Copy link
Member Author

rneher commented Oct 29, 2024

Good points regarding name standardization. ATM, we are definitely very inconsistent.
Nextclade columns tends to be all camelCase. In the python code base, we mostly use snake_case. And in addition we have added short-clade here and clades-long (I think for file names, we have often used kebab-case).

At some point, we'll probably be able to ditch or demote short-clade.

What one could do here and in the source repo is to rename all

  • the NextClade output column to proposedSubclade
  • the color by in the tree to Proposed Subclade
  • the file names in the source repo to proposed-subclades

But happy to consider other proposals.

@rneher rneher temporarily deployed to refs/pull/236/merge November 3, 2024 21:47 — with GitHub Actions Inactive
@rneher rneher deployed to refs/pull/236/merge November 3, 2024 21:58 — with GitHub Actions Active
@rneher rneher merged commit 2d12779 into master Nov 4, 2024
@rneher rneher deleted the flu/clade-proposals branch November 4, 2024 08:17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants