updated VCF output file renaming in kSNP3 task #207
… output and change the output names to be more descriptive
Needs more work. This ran successfully in Terra, but the 2 VCF output files are not truly saved; what gets saved is the text file that contains the paths to the VCF files. We may just need to rename the files to something predictable (in other words, remove the samplename of the reference genome).
…tions to 2 lines for readability; added new string output "ksnp3_vcf_ref_samplename" to capture sample within cluster to use for snp calling
OK, everything is running successfully/as expected in Terra. This PR is ready for review.
vcf output: gs://fc-e093cd23-1f79-4914-ba40-fef4f27492cf/submissions/9af8f312-2559-45d5-ae9c-40972ac02757/ksnp3_workflow/84b1c766-e267-4c6d-b896-ebe0e03e9a57/call-ksnp3_task/ksnp3/ecoli_VCF.reference_genome.vcf
snps not in ref: gs://fc-e093cd23-1f79-4914-ba40-fef4f27492cf/submissions/9af8f312-2559-45d5-ae9c-40972ac02757/ksnp3_workflow/84b1c766-e267-4c6d-b896-ebe0e03e9a57/call-ksnp3_task/ksnp3/ecoli_VCF_.SNPsNotinRef.tsv
* updated VCF output file renaming in kSNP3 task (#207)
* updated VCF output file renaming in kSNP3 task; also added 1 new File output and change the output names to be more descriptive
* ksnp3 task: changed VCF file names to be predictable; split 2 ksnp3 options to 2 lines for readability; added new string output "ksnp3_vcf_ref_samplename" to capture sample within cluster to use for snp calling
* added new string output to ksnp3 workflow "ksnp3_vcf_ref_samplename"
* reduce unnecessary logging in MIDAS task (#210)
* made untar/decompression of midas database quiet since it produces 41k lines of output. also made the 2 mv commands verbose (but it's only 2 lines!)
* update CI
* expose tbprofiler parameters as inputs in merlin
* input spelling
---------
Co-authored-by: Curtis Kapsak <[email protected]>
Also added 1 new File output and changed the output names to be more descriptive.
See below for a better description.
Draft for now while we test in Terra.
🛠️ Changes Being Made
tasks/phylogenetic_inference/task_ksnp3.wdl
* changes mv commands to rename the output VCF & TSV (SNPsNotinRef file) appropriately
* File ksnp3_core_vcf -> File ksnp3_vcf_ref_genome
* File ksnp3_vcf_snps_not_in_ref
* String ksnp3_vcf_ref_samplename, which is the samplename of the genome used to call SNPs
workflows/phylogenetics/wf_ksnp3.wdl
* changes File ksnp3_core_vcf -> File ksnp3_vcf_ref_genome
* File ksnp3_vcf_snps_not_in_ref
* String ksnp3_vcf_ref_samplename
🧠 Context and Rationale
A user pointed out that the ksnp3 output File called ksnp3_core_vcf is misleading, as these are not the core genome SNPs but rather SNPs that were found between all samples relative to the reference (the reference is one of the samples included in the analysis). This PR changes this column name to ksnp3_vcf_ref_genome.
Additionally, we needed a way to capture the name of the sample used as the reference, so we have added 1 new output String column called ksnp3_vcf_ref_samplename, which informs the user which of their samples was used as the reference for calling SNPs.
Lastly, the SNPsNotinRef file (kind of a VCF file, but not really; it's more TSV-like) was captured in a glob task-level output but not exposed as a workflow output. It is now exposed as a workflow output for users to view.
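The renaming idea can be sketched in bash as follows. This is a minimal illustration, not the code in task_ksnp3.wdl: the input naming pattern (VCF.<reference>.vcf and VCF.SNPsNotinRef.<reference>), the sample name sampleA, and the temporary working directory are all assumptions made for the example. Only the two target file names are taken from the Terra outputs shown in the review comment.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical values for illustration only.
cluster_name="ecoli"
workdir="$(mktemp -d)"
mkdir -p "${workdir}/ksnp3"

# Stand-ins for kSNP3 outputs. The "VCF.<reference>.vcf" naming is an
# assumption; "sampleA" is a made-up reference sample name.
touch "${workdir}/ksnp3/VCF.sampleA.vcf"
touch "${workdir}/ksnp3/VCF.SNPsNotinRef.sampleA"

# Capture the reference sample name before the rename discards it.
ref_vcf=( "${workdir}"/ksnp3/VCF.*.vcf )
ref_samplename="$(basename "${ref_vcf[0]}" .vcf)"
ref_samplename="${ref_samplename#VCF.}"
echo "ksnp3_vcf_ref_samplename: ${ref_samplename}"

# Rename to predictable names that no longer depend on the reference sample,
# so the task outputs can point at fixed paths instead of a glob.
mv -v "${ref_vcf[0]}" "${workdir}/ksnp3/${cluster_name}_VCF.reference_genome.vcf"
mv -v "${workdir}"/ksnp3/VCF.SNPsNotinRef.* "${workdir}/ksnp3/${cluster_name}_VCF_.SNPsNotinRef.tsv"
```

Because the renamed paths depend only on the cluster name, a task output can reference each file directly, and the reference sample name survives as a separate string value.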
📋 Workflow/Task Steps
Inputs
N/A
Outputs
New outputs described above
🧪 Testing
Locally
Tested successfully locally with miniwdl (not shown)
Terra
2 independent successful tests in Terra:
🔬 Quality checks
Pull Request (PR) checklist: