[FEATURE] S gene coverage #39

Open
molly-hetheringtonrauth opened this issue Sep 18, 2024 · 2 comments
@molly-hetheringtonrauth
Contributor

molly-hetheringtonrauth commented Sep 18, 2024

Feature Request

CDC selects samples for its dashboard (and models) that have a complete S gene. We need to begin tracking S gene coverage so we know how many of our samples make it onto the CDC dashboard and can confirm that we are meeting our deliverables. We need to incorporate this into our WDL and make sure we are getting usable outputs.

Solution

Notes from data meeting

  • We probably want to use the consensus sequence to calculate percent coverage; samtools coverage doesn't allow a minimum depth, so its percent coverage would be inflated.
  • Nextclade (dev version): the nextclade.csv file provides percent coverage per gene (some parsing required).
  • To get coverage across amplicons, use the alignment consensus sequence and the coordinates of the amplicon regions.
  • Weird consensus calling with medaka.
  • We will wait to see what Sam comes up with from the analysis he is performing on past data.
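
A minimal sketch of the consensus-based approach from the notes above, in Python. It assumes the consensus has been aligned to the reference so that string positions match reference coordinates, and that low-depth positions are masked as N. The S gene coordinates (21563 to 25384 on MN908947.3) and the function names are assumptions for illustration, not part of our pipeline.

```python
# Assumed S gene coordinates on the MN908947.3 reference (1-based, inclusive).
S_GENE = (21563, 25384)


def percent_coverage(aligned_consensus, start, end):
    """Percent of positions in [start, end] (1-based, inclusive) called as A/C/G/T.

    Assumes the consensus is in reference coordinates; N, gaps, and
    ambiguity codes all count as not covered.
    """
    if start < 1 or end < start:
        raise ValueError("invalid region coordinates")
    region = aligned_consensus[start - 1:end].upper()
    if len(region) != end - start + 1:
        raise ValueError("consensus is shorter than the requested region")
    called = sum(1 for base in region if base in "ACGT")
    return 100.0 * called / len(region)


def coverage_by_region(aligned_consensus, regions):
    """Percent coverage for each named region, e.g. amplicons.

    `regions` maps a name to a (start, end) tuple, 1-based inclusive.
    """
    return {
        name: percent_coverage(aligned_consensus, start, end)
        for name, (start, end) in regions.items()
    }
```

For the S gene this would be called as `percent_coverage(consensus, *S_GENE)`; for amplicons, `coverage_by_region` takes whatever coordinates the primer scheme defines.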

Upstream effects

None(?)

Downstream effects

  • Updating BigQuery Data Transfers to account for the new column headers of the results summary file.

molly-hetheringtonrauth changed the title from "[REQUIREMENT] S gene coverage depth for data tracking and for QC notebook" to "[FEATURE] S gene coverage" on Sep 18, 2024
@arianna-smith
Contributor

Time estimates for subsections:
WDL changes and testing - 24
BigQuery schema updates - 18
BigQuery table and Tableau dashboard - 12
Any other possible breaks - 16

@molly-hetheringtonrauth
Contributor Author

molly-hetheringtonrauth commented Oct 23, 2024

The Nextclade development version has an output that reports S gene coverage; once Nextclade releases this update, we could tie this issue to #18.

Nextclade PR: nextstrain/nextclade#1514

As a side note, we might want to use percent coverage from Nextclade instead of our custom script, to be consistent and have one source of truth.
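
A rough sketch of the parsing this would need, in Python. The column name `cdsCoverage`, the `GENE:value` comma-separated cell format, and the `;` delimiter are assumptions about the Nextclade output and should be checked against the released version; the function names are hypothetical.

```python
import csv
import io


def parse_gene_coverage(cell):
    """Turn a cell like 'E:1,S:0.95' into {'E': 1.0, 'S': 0.95}.

    The 'GENE:value' format is an assumption about the Nextclade output.
    """
    coverage = {}
    for item in cell.split(","):
        if not item.strip():
            continue
        name, value = item.split(":", 1)
        coverage[name.strip()] = float(value)
    return coverage


def s_gene_coverage_by_sample(handle, column="cdsCoverage", gene="S", delimiter=";"):
    """Map each seqName to its coverage value for `gene` (None if absent).

    `column` and `delimiter` are assumed defaults, not confirmed ones.
    """
    reader = csv.DictReader(handle, delimiter=delimiter)
    return {
        row["seqName"]: parse_gene_coverage(row[column]).get(gene)
        for row in reader
    }


if __name__ == "__main__":
    # Toy input standing in for nextclade.csv; not real output.
    example = "seqName;cdsCoverage\nsample1;E:1,S:0.95\nsample2;E:1\n"
    print(s_gene_coverage_by_sample(io.StringIO(example)))
```

Values are returned exactly as they appear in the file, so if Nextclade reports fractions they would need to be multiplied by 100 before going into the results summary.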
