-
Notifications
You must be signed in to change notification settings - Fork 17
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Retrieve_SRR_Metadata] New wf to retrieve SRR after Terra2NCBI wf #668
Conversation
In the documentation, the new page needs to be added to:
|
Can you standardize the name of this workflow? It's referred to as many things throughout the code and documentation. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some clean up required
I tested with c_auris_nv using the datatables ids and all runs succeeded" The expected behavior for this was to fail. |
ERROR is captured in stderr of execution directory. Might want to implement a pipefail command like set -eou pipefail so that thestatus is the status of the last command that failed, rather than just the last command in the pipeline. |
definitely agree with michal to add a set -euo pipefail to prevent silent errors |
Re-ran same data that succeeded when expected behavior was failure after update with setting the pipefail and I am getting correct failures now: |
wait a second,, i think i misunderstood, i think we do want it to succeed as the column in the table ends up saying no srr accession identified? not sure what i'd prefer, an actual failure or a success with a message saying it doesn't work. @kapsakcj or @theiadeb, what would be preferred by our users? |
To me the way I designed the workflow is it should fail if the input ID/string is invalid (not a valid BioSample or SRA ID), as this indicates an error that needs user attention. However, if the input is valid (a proper BioSample or SRA ID), the workflow should succeed and output 'no SRR accession found' if no SRR is identified. But open to suggestions if people feel it would work better a different way |
ran on 2 additional samples with no SRR produced expected result string "no SRR accession found", https://app.terra.bio/#workspaces/theiagen-training-workspaces/Theiagen_FCombe_sandbox/job_history/063ea71c-7001-4482-b715-db05ff26de2e |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The changes to the logic regarding valid and invalid srr accessions is not correct; please fix that, and then also address the other small non-functional changes I've requested.
770233c
to
e186b30
Compare
Valid Biosample IDs succeed, 10 samples with SRR, 2 samples - "no SRR accession" and 1 samples multiple SRRs handled |
I am able to confirm this behavior. All success found here with 'no SRR accession' found for 2 samples, and 1 with multiples: |
Code changes are working as we want. Once the docs are updated, I'll merge! ⭐ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
⭐ this will be so useful, well done!
🗑️ This dev branch should be deleted after merging to main.
🧠 Summary
This PR introduces a new workflow, Retrieve_SRR_Metadata, designed to retrieve the SRA accession (SRR) associated with a given sample accession (such as a BioSample ID or SRA Experiment ID). Documentation has been added to guide users on accepted input types and workflow usage.
⚡ Impacted Workflows/Tasks
A new Retrieve_SRR_Metadata workflow and task with a clarified input description for sample_accession, specifying accepted accession types like BioSample ID or SRA Experiment ID. The documentation has been enhanced to reflect these updates.
This PR may lead to different results in pre-existing outputs: No
This PR uses an element that could cause duplicate runs to have different results: No
🛠️ Changes
⚙️ Algorithm
No algorithm or processing changes; this is a new workflow. We use fastq-dl docker image
➡️ Inputs
sample_accession: New input added, accepting BioSample ID or SRA Experiment ID for metadata retrieval.
⬅️ Outputs
srr_accession: New output that stores the retrieved SRR accession.
🧪 Testing
Verified with Biosample Accession numbers
Successful with list of Biosample accesstions - retrieved correct SRR based on previous manual curation of SRRs (column SRR) after a Terra2NCBI run
https://app.terra.bio/#workspaces/theiagen-training-workspaces/Theiagen_FCombe_sandbox/job_history/933438ad-65ee-413a-ab87-101b08751297
see table output - c_auris_mv (10)
Testing incorrect ID fails
https://app.terra.bio/#workspaces/theiagen-training-workspaces/Theiagen_FCombe_sandbox/job_history/04b443bb-10ee-48bc-b3c2-4006cbbbcf5b
Suggested Scenarios for Reviewer to Test
Valid BioSample ID: Test the workflow with a BioSample ID to confirm SRR metadata retrieval.
Could also test with SRA Experiment ID: Test with an SRA Experiment ID and ensure correct SRR retrieval.
Invalid Accession Input: Provide an invalid ID format to verify that the workflow fails gracefully.
🔬 Final Developer Checklist
🎯 Reviewer Checklist