TheiaMeta: Viral Metagenomics workflow #64
Merged
…form assembly with ivar, otherwise use shovill with megahit assembler
…d return the one with the highest base count
…and consensus assembly are generated concurrently, and the final contig is selected based on final assembly length (consensus len or total basepairs in aligned de novo contigs)
…apping. allow for the output of sam file format instead of the default PAF file.
…hese reads are now available under `read1_unmapped` and `read2_unmapped`
… directly, deprecating the bam2fastq task
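The selection rule mentioned in the commit notes above (keep whichever of the consensus or de novo assembly has more bases) could be sketched roughly as follows. This is an illustrative Python sketch only, not the workflow's WDL code: the function names `read_fasta`, `total_bases`, and `pick_final_assembly` are hypothetical, and counting ambiguous bases such as `N` and breaking ties in favour of the consensus are assumptions.

```python
from typing import Dict, Iterable, Tuple


def read_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Parse FASTA-formatted lines into a {header: sequence} mapping."""
    records: Dict[str, str] = {}
    header = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:]
            records[header] = ""
        elif header is not None:
            records[header] += line
    return records


def total_bases(records: Dict[str, str]) -> int:
    """Sum sequence lengths over all records (assumption: N bases are counted)."""
    return sum(len(seq) for seq in records.values())


def pick_final_assembly(
    consensus: Dict[str, str], de_novo: Dict[str, str]
) -> Tuple[str, Dict[str, str]]:
    """Return the label and records of the assembly with more bases.

    Assumption: ties are resolved in favour of the consensus assembly.
    """
    if total_bases(de_novo) > total_bases(consensus):
        return "de_novo", de_novo
    return "consensus", consensus


if __name__ == "__main__":
    # Toy sequences, made up purely for illustration.
    consensus = read_fasta([">ref_consensus", "ACGTNNNN", "ACGT"])
    de_novo = read_fasta([">contig_1", "ACGTACGT", ">contig_2", "ACGTAC"])
    label, _ = pick_final_assembly(consensus, de_novo)
    print(label)
```

In practice the comparison would run on the FASTA files produced by the consensus and de novo branches; the in-memory toy records here only show the shape of the decision.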
* added helpful comments and changed read concatenation block to use samplenames that have hyphens instead of underscores. ran successfully with miniwdl, terra testing next
* add frame work for GHA, working theiaprok_illumina_pe workflow
* add placeholder filters, update filter check
* avoid processing empty list
* debug filtering
* debug filtering
* debug filtering
* debug filtering
* going back to original filter
* add filter in single file
* working prototype of updated qc_check
* add tests for theiaprok_illumina_se, fix missing slash in shovill se task
* add all theiaprok inputs to qc check task
* remove ani species match from inputs, not used
* add qc check the theiaprok_se
* change workflows to use qc_check_phb task
* fix task name
* parse busco results
* add qc_check_phb to theiaeuk
* put busco results variable in quotes error
* add qc check to theiaprok fasta and ont
* update md5sums
* add qc check to theiacov wfs
* fix more md5sums
* fix output file name
* PHBG v1.3.0 changes - vibrio subworkflow
* update description
* expose min freq input for consensus and variant tasks
* fix variable names
* fix variable types
* fix kraken empty string error
* wdl doesn't have else
* avoid empty string outputs
* variable read as empty if equal to zero, enclose in quotes
* update workflows to remove optional kraken outputs
* update nor reproducible md5s
* fix ci and local not matching
* add quast to theiaprok fasta
* add gc percent to theiaeuk outputs
* add min_freq inputs to theiacov_illumina_se
* add gc percent to theiaprok and theiaeuk qc check
* add num reads to qc check theiaprok theiaeuk
* recursion for assembly length check creates bug so removed
* fix typo in pytest_filter.yml
* add qc_check_phb task check to gha
* update gha md5sums and qc check checks
* typo corrected and fixed spacing on optional input
* updated sra_fetch workflow to use fastq-dl v2.0.1. also exposed optional inputs for docker, disk_size, memory, cpus. tested fine with miniwdl
* fix error on theiaprok
* update checksums

---------

Co-authored-by: kapsakcj <[email protected]>
Co-authored-by: Robert A. Petit III <[email protected]>
Co-authored-by: Sage Wright <[email protected]>
Co-authored-by: Michelle Scribner <[email protected]>
Co-authored-by: kevinlibuit <[email protected]>
Co-authored-by: kevinlibuit <[email protected]>
…ed and unmapped read files, as well as some assembly statistics regarding those files
andrewjpage force-pushed the im-metagenomics-workflow branch from a831131 to 99e9496 (August 17, 2023 10:20)
…cs into im-metagenomics-workflow
jrotieno approved these changes on Sep 20, 2023
Runs great, well done @cimendes
Closes #110
Setting as draft as development is still underway.
🛠️ Changes Being Made
This PR features a new workflow, TheiaMeta_Illumina_PE, for the assembly of viral metagenomic data.
The diagram of the workflow is available below:
🧠 Context and Rationale
📋 Workflow/Task Steps
Please see the diagram above.
This workflow takes in Illumina PE data and performs:
The following quality metrics are computed:
Inputs
Mandatory inputs:
Optional inputs:
Outputs
🧪 Testing
Locally
Tests passed locally on an HAV blood sample at commit id 2d1a69b:
miniwdl run -v /home/ines_mendes/Git/public_health_bioinformatics/workflows/metagenomics/wf_theiameta_illumina_pe.wdl read1= ~/Test/HAV_Metagenomics/HAV0024_S8_L001_R1_001.fastq.gz read2= ~/Test/HAV_Metagenomics/HAV0024_S8_L001_R2_001.fastq.gz samplename=HAV0024 reference= ~/Test/HAV_Metagenomics/HAV.fasta
Terra
commit id 2d1a69b
🔬 Quality checks
Pull Request (PR) checklist: