[Kraken2] Add module to recalculate abundances based on fragment length - Kraken2_ont wf and TheiaCoV_ONT wf #240
…rse_classified task to recalculate abundances based on basepairs rather than number of reads from kraken2 classified reads and report outputs
…ified the percent_human, percent_sc2, percent_target_org and kraken_target_org outputs
… kraken2 outputs by the new ones
I want to double-check that the total number of reads should come from the full table rather than from the classified reads table alone. All tests went well:
Kraken2_ONT testing here: https://app.terra.bio/#workspaces/theiagen-validations/Theiagen_Wright_PHBG_Sandbox/job_history/425f1603-3686-4634-8206-db7589e6aaaa
TheiaCoV_ONT here: https://app.terra.bio/#workspaces/cdc-terra-resources/Theiagen_Wright_SC2_Sandbox/job_history/f5b52a85-be35-4b91-b93c-654acf7ed5a2
Kraken2_PE testing here: https://app.terra.bio/#workspaces/cdc-terra-resources/Theiagen_Wright_SC2_Sandbox/job_history/64a6c758-b139-4871-9990-2d80ca0ed7a5
Kraken2_SE testing here: https://app.terra.bio/#workspaces/cdc-terra-resources/Theiagen_Wright_SC2_Sandbox/job_history/2c6834ef-f195-48fb-b34e-8c5f2542d2ab
Remove classified reads as output for ONT workflow. Update CI
Results look good!
Closes #167
🛠️ Changes Being Made
This PR implements a new task, kraken2_parse_classified, that takes as input the classified reads file from Kraken2, alongside the Kraken2 report. This task computes the abundances based on fragment length for each taxon_id in the classified reads file, and parses the report to populate the taxon name.
The new output report has the following structure:
Percent, Num_basepairs, Rank, Taxon_ID, Name
As with the Kraken2 report, the header is not included.
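The recalculation described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: it assumes the classified reads file is Kraken2's standard per-read output (tab-separated: C/U flag, read ID, taxon ID, read length, LCA mapping) and that the report is the standard six-column Kraken2 report. The function names `parse_report_names` and `basepair_abundances` are hypothetical, and using the basepairs of all reads (classified and unclassified) as the denominator is an assumption.

```python
from collections import defaultdict


def parse_report_names(report_text):
    """Map taxon ID -> (rank code, name) from a standard Kraken2 report.

    Assumed columns: percent, clade fragments, direct fragments,
    rank code, taxon ID, indented scientific name.
    """
    names = {}
    for line in report_text.splitlines():
        fields = line.split("\t")
        if len(fields) < 6:
            continue  # skip blank or malformed lines
        names[fields[4].strip()] = (fields[3].strip(), fields[5].strip())
    return names


def basepair_abundances(reads_text, report_text):
    """Return (Percent, Num_basepairs, Rank, Taxon_ID, Name) rows.

    Abundance is the share of basepairs assigned directly to each taxon.
    ASSUMPTION: the denominator is the total basepairs of every read in
    the per-read table, classified or not.
    """
    names = parse_report_names(report_text)
    bp_per_taxon = defaultdict(int)
    total_bp = 0
    for line in reads_text.splitlines():
        fields = line.split("\t")
        if len(fields) < 4:
            continue
        status, taxid, length = fields[0], fields[2].strip(), fields[3]
        # Paired-end reads are reported as "len1|len2"; sum both mates.
        n_bp = sum(int(part) for part in length.split("|"))
        total_bp += n_bp
        if status == "C":
            bp_per_taxon[taxid] += n_bp
    if total_bp == 0:
        return []
    rows = []
    for taxid, n_bp in bp_per_taxon.items():
        rank, name = names.get(taxid, ("-", "unknown"))
        rows.append((round(100.0 * n_bp / total_bp, 2), n_bp, rank, taxid, name))
    # Most abundant taxa first; no header row, as with the Kraken2 report.
    rows.sort(key=lambda row: row[1], reverse=True)
    return rows
```

Joining each returned row with tabs would give one line of a report in the column order listed above.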
Additionally, a previously unknown error has been patched in the kraken2_theiacov tasks, which would fail when a target organism was passed due to a syntax error.
Impacted Workflows/Tasks
A new workflow has been added: Kraken2_ONT_PHB
The following workflow has been adjusted to include the new abundance recalculation step: TheiaCoV_ONT_PHB
🧠 Context and Rationale
An assessment was performed to evaluate the performance of Kraken2 on long, error-prone Oxford Nanopore reads. In this assessment, recalculating abundances from the number of basepairs (instead of Kraken2's default behaviour of calculating abundances from the number of fragments) produced the expected results.
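To see why the two calculations can disagree when read lengths vary widely, consider a toy example. The read lengths below are invented purely for illustration and are not taken from the assessment; the helper names are hypothetical.

```python
def percent_by_reads(read_lengths):
    """Abundance as the share of reads (fragments) per taxon."""
    total_reads = sum(len(lengths) for lengths in read_lengths.values())
    return {
        taxon: 100.0 * len(lengths) / total_reads
        for taxon, lengths in read_lengths.items()
    }


def percent_by_basepairs(read_lengths):
    """Abundance as the share of basepairs per taxon."""
    total_bp = sum(sum(lengths) for lengths in read_lengths.values())
    return {
        taxon: 100.0 * sum(lengths) / total_bp
        for taxon, lengths in read_lengths.items()
    }


# Invented data: eight short human reads and two long viral reads.
toy_sample = {"human": [500] * 8, "virus": [8000] * 2}

print(percent_by_reads(toy_sample))      # {'human': 80.0, 'virus': 20.0}
print(percent_by_basepairs(toy_sample))  # {'human': 20.0, 'virus': 80.0}
```

With short reads of near-uniform length the two measures stay close; with long reads of variable length, as in this toy sample, they can diverge sharply.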
📋 Workflow/Task Steps
For Kraken2_ONT_PHB, the following steps are taken:
kraken2_parse_classified task where abundances are recalculated
For TheiaCoV_ONT_PHB, the following steps were added:
In the read_QC_trim_ont subworkflow, the classified reads file and report file are passed to the new kraken2_parse_classified task where abundances are recalculated for both raw and dehosted reads
Inputs
For Kraken2_ONT_PHB:
For TheiaCoV_ONT_PHB:
Outputs
For Kraken2_ONT_PHB:
For TheiaCoV_ONT_PHB:
Impacted Outputs
For TheiaCoV_ONT_PHB:
🧪 Testing
Locally
Terra
Underway
Kraken2_ONT_PHB
9 in silico samples of human sequences mixed with target viral sequences: https://app.terra.bio/#workspaces/theiagen-validations/Theiagen_Mendes_Sandbox/job_history/b795db74-97fa-49f4-a851-1cbdad2b21e8
Scenarios for Reviewer to Test
target_org
Kraken2_ONT_PHB workflow on known abundance samples
TheiaCoV_ONT_PHB on samples with known abundance and assert the new results
🔬 Quality checks
Pull Request (PR) checklist: