Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

plasmidfinder task bugfix and updates #191

Merged
merged 2 commits into from
Sep 19, 2023
Merged

Conversation

kapsakcj
Copy link
Contributor

@kapsakcj kapsakcj commented Sep 15, 2023

TY @erikwolfsohn for reporting this bug!

🛠️ Changes Being Made

  • decrease cpus to 2,
  • decrease memory to 8
  • decrease disk_size to 50
  • Fixed bug with output string of plasmid replicons identified
  • Also made some mv commands verbose

🧠 Context and Rationale

Previously, if plasmidfinder identified 2 identical plasmid replicon genes, those genes would not appear in the plasmidfinder_plasmids output string. Example:

Database	Plasmid	Identity	Query / Template length	Contig	Position in contig	Note	Accession number
Rep1	rep21	100.0	763 / 996	contig00175 len=3332 cov=1724.3 corr=0 origname=Contig_2_1724.33_Circ_pilon sw=shovill-skesa/1.1.0 date=20230728	2570..3332	rep(pSA1308)	AB254848
Rep1	rep21	100.0	763 / 996	contig00175 len=3332 cov=1724.3 corr=0 origname=Contig_2_1724.33_Circ_pilon sw=shovill-skesa/1.1.0 date=20230728	2570..3332	rep(CN1	plasmid2)NC022227

Would results in an output string of No plasmids detected in database due to the sort | uniq -u piping on line 44.

After the change, this string now correctly reports all plasmid replicon genes rep21,rep21, regardless if they are unique or not.

We would prefer our users to quickly see all matches (even if they appear twice/more than once in the assembly)

Also, plasmidfinder essentially runs blast, so it does not need 8 cpus and 16 GBs of RAM.

This applies to all TheiaProk workflows that utilize this task (so all of them)

📋 Workflow/Task Steps

N/A

Inputs

N/A

Outputs

N/A

🧪 Testing

Locally

Only tested in Terra.

Terra

TheiaProk_Illumina_PE successful test: https://app.terra.bio/#workspaces/theiagen-validations/curtis-sandbox-theiagen-validations/job_history/1cf771be-2dd1-459b-8440-2b6b35e1080e

TheiaProk_ONT successful test: https://app.terra.bio/#workspaces/theiagen-validations/curtis-sandbox-theiagen-validations/job_history/2786992b-0462-49c6-b1f5-0a015b209ed6

🔬 Quality checks

Pull Request (PR) checklist:

  • Include a description of what is in this pull request in this message.
  • The workflow/task has been tested locally and on Terra
  • The CI/CD has been adjusted and tests are passing
  • Everything follows the style guide

…_size to 50. Also fixed bug with output string of plasmid replicons found. Also made some move commands verbose
@kapsakcj kapsakcj marked this pull request as ready for review September 15, 2023 21:51
@michellescribner
Copy link
Contributor

Tested TheiaProk_FASTA_PHB on diverse set of genomes. Workflow ran successfully except for E. coli/Shigella genomes, which failed because of the naming of the input fasta files (does not end in .fasta). https://app.terra.bio/#workspaces/theiagen-validations/Theiagen_Scribner_Sandbox/job_history/4da58a2c-3439-4316-90d1-24ea245ba085.

Copy link
Member

@sage-wright sage-wright left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fantastic work!!!!

@sage-wright sage-wright merged commit f229903 into main Sep 19, 2023
@kapsakcj kapsakcj deleted the cjk-plasmidfinder-bugfix branch October 11, 2023 13:02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants