-
Notifications
You must be signed in to change notification settings - Fork 17
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
KmerFinder to TheiaProk #188
Conversation
Launched test of TheiaProk_FASTA_PHB: https://app.terra.bio/#workspaces/theiagen-validations/Theiagen_Scribner_Sandbox/job_history/3b03f4db-7036-4289-a4cc-020b22c89f82 |
Thanks for all of the changes @cimendes! I just launched a test of TheiaProk_Illumina_PE_PHB https://app.terra.bio/#workspaces/theiagen-validations/Theiagen_Scribner_Sandbox/job_history/f644e3cd-a563-473f-b9d3-9e9f4c0fbc8b |
Bug: If kmerfinder does not make any prediction (the result file is empty), the task will populate the string output with the header line. See example sample in this run: https://app.terra.bio/#workspaces/theiagen-validations/Theiagen_Scribner_Sandbox/job_history/f644e3cd-a563-473f-b9d3-9e9f4c0fbc8b. If there is no output from Kmerfinder, the ideal outcome is for these strings to be populated with "No match detected" or similar. |
Thanks @michellescribner for finding this nasty bug! I've added a patch that fixes this scenario and the headers no longer are populated in the datatable. Instead the "No hit detected in database" string appears! |
Thank you @cimendes for making this change so quickly! I am relaunching my tests for: |
Tested by @kelseykropp and found that 27/27 samples showed results that matched the kmerfinder online tool. https://app.terra.bio/#workspaces/theiagen-training-workspaces/Theiagen_Kropp_Sandbox/job_history/be7a6dca-b5b0-4376-92a9-a827bbbc1da8 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great job @cimendes!
…l TheiaCov wfs (#208) * update default nextclade dataset tag for SC2 to "2023-09-21T12:00:00Z". applied on wfs theiacov clearlabs, fasta, ilmn pe, ilmn se, ont * KmerFinder to TheiaProk (#188) * skeleton on kmerfinder task * remove kmerfinder_db_name * still trying to get kmerfinder to run * is this working now?!? * first working version of kmerfinder task * parse top hit * make results output file optional * final version of task (for now..) * add kmerfinder_bacteria to theiaprok! * fix typo * pass empty file for tests - kmerfinder_db * fix input json * add kmerfinder query coverage to theiaprok SE, PE, FASTA and ONT * remove undeeded skip on kmerfinder (set to false by default) * fix ops * update md5sum (part1) * update md5sum (part2) * fiz sister typo * missed a file * update CI * add kmerfinder_template_coverage to theiaprok output * expose database name on kmerfinder outputs * update md5sum * fix bug - headers being outputed to datatable * update default nextclade dataset tag for SC2 to "2023-09-21T12:00:00Z". applied on wfs theiacov clearlabs, fasta, ilmn pe, ilmn se, ont * update CI --------- Co-authored-by: Inês Mendes <[email protected]>
Closes #187
🛠️ Changes Being Made
This PR adds the option to run Kmerfinder for bacterial genomes if the user requests it.
Bonus: sistr's sistr_allele_fasta no longer reads sister_allele_fasta (@emmadoughty)
🧠 Context and Rationale
gs://theiagen-public-files-rp/terra/theiaprok-files/kmerfinder_bacteria.tar.gz
. It's about 30Gb in size.📋 Workflow/Task Steps
Inputs
call_kmerfinder
set tofalse
by defaultOutputs
🧪 Testing
Locally
Tested task locally with
miniwdl
Terra
Underway!!!TheiaProk_Illumina_PE (n=55): https://app.terra.bio/#workspaces/theiagen-validations/curtis-sandbox-theiagen-validations/job_history/349c9d0f-b2b6-4e94-8156-7680a372a7b6
TheiaProk_Illumina_SE (n=55): https://app.terra.bio/#workspaces/theiagen-validations/curtis-sandbox-theiagen-validations/job_history/47a60d7f-e33b-4cda-97d3-2bb5fd22917b
TheiaProk_FASTA (n=55): https://app.terra.bio/#workspaces/theiagen-validations/curtis-sandbox-theiagen-validations/job_history/ef0e66e5-9d3f-4199-bde3-807b1db7e2f2
Table of differences between Gambit and KmerFinder:
amrfinderplus_testing_sample.zip
TheiaProk_ONT (n=44): https://app.terra.bio/#workspaces/theiagen-validations/Theiagen_Mendes_Sandbox/job_history/6f46335b-1d52-490b-b289-ee274a4fbc86
🔬 Quality checks
Pull Request (PR) checklist: