[TheiaCoV/TheiaProk/TheiaMeta/TheiaEuk/Freyja_FASTQ] fastq-scan updates & improvements. Adding JSON as wf output file #662
…educe disk to 50gb and cpu to 1; added set -euo pipefail; removed capture of date; added debug statements to cleanup STDOUT/logs; removed unnecessary cat commands with parsing output JSON; renamed 2 output files; enabled preemptible VM usage
…; reduced disk to 50gb and cpu to 1; added set -euo pipefail; added DEBUG statements and cleaned up STDOUT for clear log files; renamed outputs to mention json; removed collection and output of DATE; enabled preemptible VMs
…l/setup CI env. hopefully that doesn't break everything
@@ -2,7 +2,6 @@ name: pytest-env-CI
channels:
- conda-forge
- bioconda
- defaults
We don't have more than 200 people in our organization, but most folks are moving away from using the defaults channel due to Anaconda updating their ToS: https://www.anaconda.com/blog/is-conda-free
Thanks for picking up on this & making the quick fix!
…tq-scan JSON outputs to theiameta_illumina_pe wf; updated read_qc_trim_ont subworkflow description since it was inaccurate
…s inputs to theiaprok_illumina_pe and se workflows. need to test in terra
… theiameta, and theiaprok wfs
@kapsakcj I know this is still in draft form, but what do you think about outputting the fastq-scan JSONs as an array? That way they add only a single column to the Terra table (instead of up to 4), and they are still easy to access for support troubleshooting.
Moving back to draft state since CI is failing for some unknown reason and since we're considering changing the output to an array instead.
After some consideration with the team, we will leave these as separate outputs, not an Array. We will engage the "output consolidation" aspect when we implement TheiaCoV/TheiaProk/etc. "light" versions.
All tests successful
Freyja_FASTQ
theiacov_clearlabs
theiacov_illumina_pe
theiacov_illumina_se
theiameta_illumina_pe
theiaprok_illumina_pe
theiaeuk_illumina_pe
Clean, simple updates! Thanks @kapsakcj
Will update this message later
This PR closes #471 and closes #571
🗑️ This dev branch should be deleted after merging to main.
🧠 Summary
This PR updates 2 frequently used tasks (PE and SE versions of the task) in tasks/quality_control/basic_statistics/task_fastq_scan.wdl and all workflows that use this task to run fastq-scan.
I mainly did this PR so that the fastq-scan JSON output files are output at the workflow level, but I improved other aspects along the way.
⚡ Impacted Workflows/Tasks
tasks/quality_control/basic_statistics/task_fastq_scan.wdl
  added set -euo pipefail in case an error is thrown
  renamed output read1_fastq_scan_report to read1_fastq_scan_json
  fastq-scan, depending on paired end or single end data
workflows/freyja/wf_freyja_fastq.wdl
workflows/theiacov/wf_theiacov_clearlabs.wdl
workflows/theiacov/wf_theiacov_illumina_pe.wdl
workflows/theiacov/wf_theiacov_illumina_se.wdl
workflows/theiaeuk/wf_theiaeuk_illumina_pe.wdl
workflows/theiameta/wf_theiameta_illumina_pe.wdl
workflows/theiaprok/wf_theiaprok_illumina_pe.wdl
workflows/theiaprok/wf_theiaprok_illumina_se.wdl
workflows/utilities/wf_read_QC_trim_pe.wdl
workflows/utilities/wf_read_QC_trim_se.wdl
Removed the defaults channel from the CI environment due to Anaconda changing their ToS, so that you have to pay if you use that channel when your organization includes 200+ employees. AFAIK the CI environment does not use any packages from that channel, but better to be safe than sorry. CI runs totally fine without the channel.
workflows/utilities/wf_read_QC_trim_ont.wdl (no new output files, just corrected the meta: description section)
This PR may lead to different results in pre-existing outputs: No
This PR uses an element that could cause duplicate runs to have different results: No
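For reviewers less familiar with the set -euo pipefail change listed above, here is a minimal sketch of what it guards against. It is not the task's actual command block: the run_bash helper, the toy commands, and the variable name HYPOTHETICAL_UNSET_VAR_FOR_DEMO are made up for illustration, and the sketch assumes bash is available on PATH.

```python
import subprocess


def run_bash(script: str) -> int:
    """Run a script with bash (assumed to be on PATH) and return its exit status."""
    completed = subprocess.run(["bash", "-c", script], capture_output=True)
    return completed.returncode


# Without pipefail, a pipeline reports the status of its last command only,
# so the failure of `false` is hidden behind the successful `cat`.
lenient_status = run_bash("false | cat")

# With -e and pipefail, the failing first stage makes the whole script fail.
strict_status = run_bash("set -euo pipefail; false | cat")

# With -u, expanding an unset variable is an error instead of an empty string.
# HYPOTHETICAL_UNSET_VAR_FOR_DEMO is a made-up name assumed not to be set.
unset_status = run_bash('set -euo pipefail; echo "$HYPOTHETICAL_UNSET_VAR_FOR_DEMO"')

print("without pipefail:", lenient_status)
print("with set -euo pipefail:", strict_status)
print("unset variable under -u:", unset_status)
```

In a pipeline like the one this task runs, the practical effect is that a failure in an early stage stops the task instead of silently producing empty or partial output files.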
🛠️ Changes
⚙️ Algorithm
Upgraded to v1.0.1 of fastq-scan, which adds support for "large" FASTQs. I'm not entirely sure what that means other than a slight change in the C++ code, but it's more robust.
➡️ Inputs
Listed above. The inputs that changed are all runtime-related.
⬅️ Outputs
New JSON outputs for read1 and read2, for both raw FASTQs and "cleaned" FASTQs, in all subworkflows (read_QC subwfs) and workflows that use this task.
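As a rough illustration of how the new JSON outputs could be used for troubleshooting, the sketch below pulls the read count out of two reports. The qc_stats and read_total key names are an assumption about the fastq-scan report layout, the read_total helper is hypothetical, and the example JSON strings are made up and heavily trimmed.

```python
import json

# Made-up, heavily trimmed report text for illustration only;
# real fastq-scan reports contain many more fields.
EXAMPLE_READ1_JSON = '{"qc_stats": {"total_bp": 600, "read_total": 4}}'
EXAMPLE_READ2_JSON = '{"qc_stats": {"total_bp": 580, "read_total": 4}}'


def read_total(report_text: str) -> int:
    """Return the read count from one fastq-scan JSON report (assumed key layout)."""
    report = json.loads(report_text)
    return int(report["qc_stats"]["read_total"])


# Collect the per-file read counts so R1 and R2 can be compared side by side.
totals = {
    "read1": read_total(EXAMPLE_READ1_JSON),
    "read2": read_total(EXAMPLE_READ2_JSON),
}
print(totals)
```

With real outputs, the strings would be replaced by the contents of the read1 and read2 JSON files produced by the workflow; a mismatch between the two counts would be one quick sign of a problem with a paired-end sample.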
🧪 Testing
Suggested Scenarios for Reviewer to Test
Would be good to test any of the impacted workflows. No need to test export_taxon_table functionality as I've already done so for both TheiaProk wfs.
🔬 Final Developer Checklist
🎯 Reviewer Checklist