[New Workflow] Flye_denovo to replace DragonFlye #692
Open
fraser-combe wants to merge 59 commits into main from smw-flye-dev
…der structure tasks and rename to skip polish and skip trim, medaka single polish
…n in racon, update flye param names passed miniwdl check
…_plot output to workflows
…rease maxRetries for Medaka task and capture selected model, update docs theiaprok
This PR closes #611, #585, and #565.
🗑️ This dev branch should be deleted after merging to main.
🧠 Summary
This PR introduces a new flye_denovo workflow as a replacement for the Dragonflye workflow. The updated workflow streamlines the assembly and polishing pipeline, focusing on being flexible and modular with the addition of assembly visualization through Bandage plots.
Notable enhancements include:
- New tasks, including optional read trimming with Porechop, enhanced assembly visualization with Bandage, and multiple polishing options.
- Supports ONT data, hybrid assemblies with Illumina reads, and multiple assembly polishing tools (Medaka, Racon, and Polypolish).
- Medaka polishing is set to 1 round, as recommended by Rwick, and ONT
⚡ Impacted Workflows/Tasks
- flye_denovo workflow.
- Dragonflye workflow.
- task_porechop.wdl
- task_flye.wdl
- task_bandageplot.wdl
- task_bwa.wdl
- task_medaka.wdl
- task_racon.wdl
- task_dnaapler.wdl
- task_polypolish.wdl
- task_filtercontigs.wdl
- removes task_dragonfly.wdl
This PR may lead to different results in pre-existing outputs: Yes
This PR uses an element that could cause duplicate runs to have different results: Yes
🛠️ Changes
- flye_denovo.wdl to replace Dragonflye, as a sub workflow.
⚙️ Algorithm
- flye_denovo workflow replaces the Dragonflye workflow, with a modular and flexible structure that separates tasks like trimming, assembly, polishing, and final orientation for clarity and maintainability.
- Polypolish.
- Flye, Porechop, Medaka, Racon, Polypolish, Bandage, and Dnaapler.
- Flye, Medaka, Racon, dnaapler and other tasks to their latest stable versions
➡️ Inputs
No
⬅️ Outputs
Added bandage plot png output
version outputs for task level software
medaka models used
Assembly_fasta output from dnaapler for downstream analyses
🧪 Testing
Scenarios tested within TheiaProk. The TheiaProk workflow is expected to complete successfully for each task; for the flye_denovo workflow specifically, we expect an assembly FASTA to be created successfully after any filtering or polishing.
Default path: Flye > Medaka polish > Filtercontigs > dnaapler
https://app.terra.bio/#workspaces/theiagen-training-workspaces/Theiagen_FCombe_sandbox/job_history/586af547-03dd-4cb8-8877-8041d0064464
medaka output model and version
Porechop run, i.e. skip_trim_reads = false
https://app.terra.bio/#workspaces/theiagen-training-workspaces/Theiagen_FCombe_sandbox/job_history/e5a645db-0e17-4c58-84ce-4e1f44ef9042
Skip polishing, i.e. skip_polishing = true
https://app.terra.bio/#workspaces/theiagen-training-workspaces/Theiagen_FCombe_sandbox/job_history/110a4ffa-c208-4849-a3b2-11d88ffddc90
Racon polishing pathway (polishing_rounds = 2)
https://app.terra.bio/#workspaces/theiagen-training-workspaces/Theiagen_FCombe_sandbox/job_history/95694506-946f-4ef7-9b3d-657522cc7809
Hybrid assembly ONT data and Illumina (Polypolish and BWA)
https://app.terra.bio/#workspaces/theiagen-training-workspaces/Theiagen_FCombe_sandbox/job_history/4fa9c915-4572-43b9-bd9b-28bed40c75a4
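The tested pathways above can be summarized as a small decision function. This is only a sketch of the branching described in this PR, not the WDL itself: the `polisher` and `has_illumina_reads` parameters are hypothetical names introduced for illustration, the position of the Bandage step right after Flye is an assumption, and so is the choice to model the hybrid path as BWA followed by Polypolish in place of the long-read polisher.

```python
from typing import List


def planned_steps(
    skip_trim_reads: bool = True,
    skip_polishing: bool = False,
    polisher: str = "medaka",          # hypothetical selector name
    polishing_rounds: int = 1,
    has_illumina_reads: bool = False,  # hypothetical flag for hybrid runs
) -> List[str]:
    """Return the ordered task names a run would be expected to execute."""
    if polishing_rounds < 1:
        raise ValueError("polishing_rounds must be at least 1")

    steps: List[str] = []

    # Optional read trimming (Porechop) only when trimming is not skipped.
    if not skip_trim_reads:
        steps.append("porechop")

    # Assembly, then the Bandage plot (placement is an assumption).
    steps.append("flye")
    steps.append("bandage")

    if not skip_polishing:
        if has_illumina_reads:
            # Hybrid pathway: align short reads with BWA, then Polypolish.
            steps.extend(["bwa", "polypolish"])
        elif polisher == "racon":
            steps.extend(["racon"] * polishing_rounds)
        elif polisher == "medaka":
            steps.extend(["medaka"] * polishing_rounds)
        else:
            raise ValueError(f"unknown polisher: {polisher!r}")

    # Contig filtering and final reorientation close the default path.
    steps.extend(["filtercontigs", "dnaapler"])
    return steps


if __name__ == "__main__":
    print(" > ".join(planned_steps()))
    print(" > ".join(planned_steps(polisher="racon", polishing_rounds=2)))
```

Calling `planned_steps(skip_trim_reads=False)` adds the Porechop step at the front, and `planned_steps(skip_polishing=True)` drops every polishing step, mirroring the scenarios listed above.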
Comparisons between DragonFlye and new Flye_denovo subworkflow
Here we are looking for similarities in assemblies, statistics and downstream analyses. Eight bacterial samples were selected.
Both workflows produce assemblies of similar lengths for each sample, with minor variations (typically within ±1%).
Both workflows achieve high BUSCO completeness scores, generally above 90%.
Both workflows consistently predict the same taxa for each sample.
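For reviewers who want to repeat the length comparison, the ±1% check can be expressed as a short helper. This is a minimal sketch, assuming the Dragonflye assembly length is used as the reference; the function names and the example lengths are illustrative and are not taken from the PR data.

```python
def percent_difference(reference_length: int, candidate_length: int) -> float:
    """Signed difference of candidate vs. reference, as a percentage of the reference."""
    if reference_length <= 0:
        raise ValueError("reference_length must be positive")
    return (candidate_length - reference_length) / reference_length * 100.0


def within_tolerance(
    reference_length: int, candidate_length: int, tolerance_pct: float = 1.0
) -> bool:
    """True when the two assembly lengths differ by no more than tolerance_pct percent."""
    return abs(percent_difference(reference_length, candidate_length)) <= tolerance_pct


if __name__ == "__main__":
    # Illustrative lengths only (base pairs), not measured values.
    dragonflye_length = 5_000_000
    flye_denovo_length = 5_020_000
    print(within_tolerance(dragonflye_length, flye_denovo_length))
```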
Example data comparisons table
Downstream analyses
- Gambit Taxon: Identical predictions across workflows for each sample.
Add in comparison results
- Both workflows produce identical results for most downstream analyses, ensuring reliable serotype predictions, taxonomic classifications, and virulence gene identifications.
Finally, the 44 validation ONT raw data samples were run through Flye denovo, and the samples were checked manually against previously run Dragonflye submissions; we found comparable results.
https://app.terra.bio/#workspaces/theiagen-training-workspaces/Theiagen_FCombe_sandbox/job_history/47b75bb4-2a1b-4e41-b5f3-2f421e4e38ed
Suggested Scenarios for Reviewer to Test
Parameters to test:
skip_trim_reads: true
skip_polishing: false
polishing_rounds: 1
Expected outputs: Final polished assembly in FASTA format.
Metadata output for versions used (e.g., Flye, Medaka).
No trimming or filtering applied.
Successful Bandage plot and GFA graph generation.
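The suggested parameters could be written to an inputs JSON along these lines. This is a sketch under assumptions: the `flye_denovo.` key prefix follows the usual `workflow.input` naming convention for WDL inputs files and may differ when the workflow is called as a sub workflow of TheiaProk, the output file name is arbitrary, and the required read inputs are omitted because their names are not given in this PR description.

```python
import json
from typing import Dict, Union

InputValue = Union[bool, int]


def build_inputs(workflow_name: str = "flye_denovo") -> Dict[str, InputValue]:
    """Build the reviewer test parameters, namespaced by workflow name (assumed prefix)."""
    params: Dict[str, InputValue] = {
        "skip_trim_reads": True,
        "skip_polishing": False,
        "polishing_rounds": 1,
    }
    return {f"{workflow_name}.{name}": value for name, value in params.items()}


if __name__ == "__main__":
    inputs = build_inputs()
    # Arbitrary file name; required read inputs still need to be added by hand.
    with open("flye_denovo_test_inputs.json", "w", encoding="utf-8") as handle:
        json.dump(inputs, handle, indent=2)
    print(json.dumps(inputs, indent=2))
```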
🔬 Final Developer Checklist
workflows_overview tables.
🎯 Reviewer Checklist