updated VCF output file renaming in kSNP3 task #207

Merged: 3 commits, Oct 26, 2023

Conversation

Contributor

@kapsakcj kapsakcj commented Oct 4, 2023

Also added 1 new File output and changed the output names to be more descriptive. See below for a fuller description.

Draft for now while we test in Terra

🛠️ Changes Being Made

tasks/phylogenetic_inference/task_ksnp3.wdl changes

  • added line break to 2 ksnp3 options for readability
  • changed mv commands to rename the output VCF & TSV (SNPsNotinRef file) appropriately
  • re-named output File ksnp3_core_vcf -> File ksnp3_vcf_ref_genome
  • added 2 new outputs:
    • File ksnp3_vcf_snps_not_in_ref
    • String ksnp3_vcf_ref_samplename which is the samplename of the genome used to call SNPs

workflows/phylogenetics/wf_ksnp3.wdl changes

  • re-named output File ksnp3_core_vcf -> File ksnp3_vcf_ref_genome
  • added 2 new outputs:
    • File ksnp3_vcf_snps_not_in_ref
    • String ksnp3_vcf_ref_samplename
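As a rough illustration of how a String output like ksnp3_vcf_ref_samplename could be populated, the sketch below pulls the reference sample name out of the VCF file name. It assumes kSNP3 names its VCF `VCF.<reference>.vcf`; the directory, the sample name `sample_A`, and the `REF_SAMPLENAME` file are hypothetical, and this is not necessarily how the task in this PR does it.

```shell
set -eu

# Hypothetical stand-in for a kSNP3 output directory.
# Assumption: kSNP3 writes its VCF as VCF.<reference>.vcf
outdir="$(mktemp -d)"
touch "${outdir}/VCF.sample_A.vcf"

# Take the first file matching the assumed naming pattern
vcf_path=""
for f in "${outdir}"/VCF.*.vcf; do
  vcf_path="$f"
  break
done

# Strip the directory, then the "VCF." prefix and ".vcf" suffix
vcf_file="$(basename "${vcf_path}")"
ref_samplename="${vcf_file#VCF.}"
ref_samplename="${ref_samplename%.vcf}"

# Write the name to a text file (hypothetical name) so a WDL output
# could pick it up, for example with read_string()
echo "${ref_samplename}" > "${outdir}/REF_SAMPLENAME"
cat "${outdir}/REF_SAMPLENAME"
```

Writing the value to a small text file keeps the shell side simple and leaves the conversion to a String to the workflow language.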

🧠 Context and Rationale

A user pointed out that the ksnp3 output File called ksnp3_core_vcf is misleading, as these are not the core genome SNPs but rather SNPs found between all samples relative to the reference (the reference is one of the samples included in the analysis). This PR renames this column to ksnp3_vcf_ref_genome.

Additionally, we needed a way to capture the name of the sample used as the reference, so we have added 1 new output String column called ksnp3_vcf_ref_samplename, which tells the user which of their samples was used as the reference for calling SNPs.

Lastly, the SNPsNotinRef file (kind of a VCF file, but not really; it's more TSV-like) was captured in a glob task-level output but not exposed as a workflow output. It is now exposed as a workflow output for users to view.

📋 Workflow/Task Steps

Inputs

N/A

Outputs

New outputs described above

🧪 Testing

Locally

Tested successfully locally with miniwdl (not shown)

Terra

2 independent successful tests in Terra:

🔬 Quality checks

Pull Request (PR) checklist:

  • Include a description of what is in this pull request in this message.
  • The workflow/task has been tested locally and on Terra
  • The CI/CD has been adjusted and tests are passing
  • Everything follows the style guide

… output and change the output names to be more descriptive
@kapsakcj
Contributor Author

kapsakcj commented Oct 4, 2023

Needs more work. Ran successfully in Terra, but the 2 VCF output files are not truly saved; what gets saved instead is the text file that contains the path to the VCF files.

May just need to re-name the files to something predictable (in other words remove the samplename of the reference genome)
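A minimal sketch of the "predictable names" idea, for illustration only: the target names mirror the output paths reported later in this thread (`ecoli_VCF.reference_genome.vcf` and `ecoli_VCF_.SNPsNotinRef.tsv`), while the raw kSNP3 names (`VCF.<reference>.vcf`, `VCF.SNPsNotinRef.<reference>`), the `cluster_name` variable, and the sample name are assumptions rather than the task's actual code.

```shell
set -eu

# Hypothetical working directory with files named the way kSNP3 is
# assumed to name them (reference sample name embedded in the file name)
workdir="$(mktemp -d)"
mkdir "${workdir}/ksnp3"
touch "${workdir}/ksnp3/VCF.sample_A.vcf"
touch "${workdir}/ksnp3/VCF.SNPsNotinRef.sample_A"

# Hypothetical prefix; the paths in this thread suggest a value like "ecoli"
cluster_name="ecoli"

# Rename both files so the final names no longer depend on which sample
# was picked as the reference; -v prints each rename to the log
mv -v "${workdir}"/ksnp3/VCF.*.vcf \
  "${workdir}/ksnp3/${cluster_name}_VCF.reference_genome.vcf"
mv -v "${workdir}"/ksnp3/VCF.SNPsNotinRef.* \
  "${workdir}/ksnp3/${cluster_name}_VCF_.SNPsNotinRef.tsv"

ls "${workdir}/ksnp3"
```

With fixed names, an output declaration can point at a literal path instead of relying on a name that is only known at runtime.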

…tions to 2 lines for readability; added new string output "ksnp3_vcf_ref_samplename" to capture sample within cluster to use for snp calling
@kapsakcj
Contributor Author

kapsakcj commented Oct 6, 2023

OK, everything is running successfully/as expected in Terra. This PR is ready for review

@kapsakcj kapsakcj marked this pull request as ready for review October 6, 2023 18:26
Contributor

@frankambrosio3 frankambrosio3 left a comment


Tested here: https://app.terra.bio/#workspaces/theiagen-validations/ambrosio_validation_sandbox/job_history/9af8f312-2559-45d5-ae9c-40972ac02757

vcf output: gs://fc-e093cd23-1f79-4914-ba40-fef4f27492cf/submissions/9af8f312-2559-45d5-ae9c-40972ac02757/ksnp3_workflow/84b1c766-e267-4c6d-b896-ebe0e03e9a57/call-ksnp3_task/ksnp3/ecoli_VCF.reference_genome.vcf

snps not in ref: gs://fc-e093cd23-1f79-4914-ba40-fef4f27492cf/submissions/9af8f312-2559-45d5-ae9c-40972ac02757/ksnp3_workflow/84b1c766-e267-4c6d-b896-ebe0e03e9a57/call-ksnp3_task/ksnp3/ecoli_VCF_.SNPsNotinRef.tsv

@frankambrosio3 frankambrosio3 merged commit 1da4a58 into main Oct 26, 2023
5 checks passed
@kapsakcj kapsakcj deleted the cjk-ksnp3-vcf branch October 26, 2023 22:35
sage-wright pushed a commit that referenced this pull request Nov 13, 2023
* updated VCF output file renaming in kSNP3 task (#207)

* updated VCF output file renaming in kSNP3 task; also added 1 new File output and change the output names to be more descriptive

* ksnp3 task:changed VCF file names to be predictable; split 2 ksnp3 options to 2 lines for readability; added new string output "ksnp3_vcf_ref_samplename" to capture sample within cluster to use for snp calling

* added new string output to ksnp3 workflow "ksnp3_vcf_ref_samplename"

* reduce unnecessary logging in MIDAS task (#210)

* made untar/decompression of midas database quiet since it produces 41k lines of output. also made the 2 mv commands verbose (but it's only 2 lines!)

* update CI

* expose tbprofiler parameters as inputs in merlin

* input spelling

---------

Co-authored-by: Curtis Kapsak <[email protected]>
kevinlibuit added a commit that referenced this pull request Dec 29, 2023
…245)

* output tbprofiler vcf

* update default docker

* fix path

* add sample id to the beginning of the coverage report

* update default docker

* Enable TBProfiler parameter changes (#246)

* updated VCF output file renaming in kSNP3 task (#207)

* updated VCF output file renaming in kSNP3 task; also added 1 new File output and change the output names to be more descriptive

* ksnp3 task:changed VCF file names to be predictable; split 2 ksnp3 options to 2 lines for readability; added new string output "ksnp3_vcf_ref_samplename" to capture sample within cluster to use for snp calling

* added new string output to ksnp3 workflow "ksnp3_vcf_ref_samplename"

* reduce unnecessary logging in MIDAS task (#210)

* made untar/decompression of midas database quiet since it produces 41k lines of output. also made the 2 mv commands verbose (but it's only 2 lines!)

* update CI

* expose tbprofiler parameters as inputs in merlin

* input spelling

---------

Co-authored-by: Curtis Kapsak <[email protected]>

* update md5sums

* caller_options tbprofiler

* caller_options merlin magic

* --calling_params tbprofiler

* calling_params tbprofiler

* quotes around params tbprofiler

* added quotes around calling params tbprofiler

* "-C 1 -F 0.0" tbprof

* removed caller options

* hardcoded tbprofiler freebayes params

* re-optionalize

* update md5sums

* Add branch name to versioning task

* version reversion for merge

* update checksums

---------

Co-authored-by: frankambrosio3 <[email protected]>
Co-authored-by: Curtis Kapsak <[email protected]>
Co-authored-by: frankambrosio3 <[email protected]>
Co-authored-by: kevinlibuit <[email protected]>
cimendes pushed a commit that referenced this pull request Apr 15, 2024
* output tbprofiler vcf

* update default docker

* fix path

* add sample id to the beginning of the coverage report

* update default docker

* Enable TBProfiler parameter changes (#246)

* updated VCF output file renaming in kSNP3 task (#207)

* updated VCF output file renaming in kSNP3 task; also added 1 new File output and change the output names to be more descriptive

* ksnp3 task:changed VCF file names to be predictable; split 2 ksnp3 options to 2 lines for readability; added new string output "ksnp3_vcf_ref_samplename" to capture sample within cluster to use for snp calling

* added new string output to ksnp3 workflow "ksnp3_vcf_ref_samplename"

* reduce unnecessary logging in MIDAS task (#210)

* made untar/decompression of midas database quiet since it produces 41k lines of output. also made the 2 mv commands verbose (but it's only 2 lines!)

* update CI

* expose tbprofiler parameters as inputs in merlin

* input spelling

---------

Co-authored-by: Curtis Kapsak <[email protected]>

* update md5sums

* caller_options tbprofiler

* caller_options merlin magic

* --calling_params tbprofiler

* calling_params tbprofiler

* quotes around params tbprofiler

* added quotes around calling params tbprofiler

* "-C 1 -F 0.0" tbprof

* removed caller options

* hardcoded tbprofiler freebayes params

* re-optionalize

* update md5sums

* draft tbprofiler tngs

* add versioning

* add to dockstore

* commenting out clockwork to try and fix bugs?

* chop 30 bp from both sides

* update workflow to use trimmomatic chop

* remove whitespace cruft

* merge into regular trimmomatic task

* change naming

* prevent widespread failures

* update to latest version of tbp_parser and enable tngs bed file

* update md5sum

* tngs updates

* update docker

* update docker

* udpate paths

* update docker

* enable rrs & rrl frequency changeing

* update docker

* update md5sum

* write all tbprofiler outputs

* remove comment cruft

* remove database for tbprofiler; broken

* fix issue

* update version and allow for user-modifable expert rule regions bed file

* add rpob & etha freq modification params

* add new parameters for modification

* optionalize

* v1.3.9

* new version!!!!!!1

* update md5sums

---------

Co-authored-by: frankambrosio3 <[email protected]>
Co-authored-by: Curtis Kapsak <[email protected]>
Co-authored-by: frankambrosio3 <[email protected]>
2 participants