[Retrieve_SRR_Metadata] New wf to retrieve SRR after Terra2NCBI wf #668

Merged on Nov 26, 2024 (28 commits)

Commits
- 71a3ed6 inital commit part 1 retrieve srr from Biosample (fraser-combe, Oct 29, 2024)
- fe2914a update task and wf names and meta (fraser-combe, Nov 4, 2024)
- 47890c8 dockstore add (fraser-combe, Nov 4, 2024)
- 3f2ac27 Documentation and update column name (fraser-combe, Nov 4, 2024)
- 2bb0df4 update dockstore name (fraser-combe, Nov 4, 2024)
- de5b45d (fraser-combe, Nov 4, 2024)
- 4eeb546 Remove unnecessary blank lines in fetch_srr_metadata WDL task (fraser-combe, Nov 7, 2024)
- a0b9fec Update SRR metadata workflow and documentation for clarity and accuracy (fraser-combe, Nov 8, 2024)
- 71a17fd Remove redundant docker input from wf_update_srr_metadata workflow (fraser-combe, Nov 8, 2024)
- 2564f59 update (fraser-combe, Nov 8, 2024)
- e99ea72 update dockstore (fraser-combe, Nov 14, 2024)
- 3ebb9fe initial updates (fraser-combe, Nov 14, 2024)
- 8c80fec handle multiple SRR accessionss as string version outputs (fraser-combe, Nov 15, 2024)
- 690ab6a update task path (fraser-combe, Nov 15, 2024)
- 2799eaf forgot to import task versioning (fraser-combe, Nov 15, 2024)
- 705c766 update dockstore yml (fraser-combe, Nov 15, 2024)
- 3ec8105 comma sep output as string instead of array (fraser-combe, Nov 18, 2024)
- cbf6bcf update wf name (fraser-combe, Nov 18, 2024)
- 26f5c4f test local worked (fraser-combe, Nov 18, 2024)
- 7bcc842 set euo pipefail (fraser-combe, Nov 20, 2024)
- 988fc17 more explicit fail invalid biosample (fraser-combe, Nov 21, 2024)
- f00cdd0 update logic failure (fraser-combe, Nov 22, 2024)
- f9de101 logic handling valid biosample or SRA (fraser-combe, Nov 22, 2024)
- 26d8c49 enhance error handling and logging for biosample ID or SRA fetching (fraser-combe, Nov 22, 2024)
- e186b30 Update logic for no SRR accessions and invalid samples (fraser-combe, Nov 22, 2024)
- 4995aa2 update docs version in table (fraser-combe, Nov 22, 2024)
- e4a5bec add sample level to docs (fraser-combe, Nov 25, 2024)
- 6cba0fc update input and ouptut tables (fraser-combe, Nov 26, 2024)
5 changes: 5 additions & 0 deletions .dockstore.yml
@@ -195,6 +195,11 @@ workflows:
    primaryDescriptorPath: /workflows/utilities/data_import/wf_terra_2_bq.wdl
    testParameterFiles:
      - /tests/inputs/empty.json
  - name: Update_srr_metadata_PHB
    subclass: WDL
    primaryDescriptorPath: /workflows/utilities/data_import/wf_update_srr_metadata.wdl
    testParameterFiles:
      - /tests/inputs/empty.json
  - name: Concatenate_Column_Content_PHB
    subclass: WDL
    primaryDescriptorPath: /workflows/utilities/file_handling/wf_concatenate_column.wdl
50 changes: 50 additions & 0 deletions docs/workflows/public_data_sharing/retrieve_srr_metadata.md
@@ -0,0 +1,50 @@
# Retrieve SRR Metadata Workflow

## Quick Facts

| **Workflow Type** | **Applicable Kingdom** | **Last Known Changes** | **Command-line Compatibility** | **Workflow Level** |
|---|---|---|---|---|
| [Public Data Sharing](../../workflows_overview/workflows_type.md/#public-data-sharing) | [Any Taxa](../../workflows_overview/workflows_kingdom.md/#any-taxa) | PHB v2.2.1 | Yes | Sample-level |

## Retrieve SRR Metadata

This workflow retrieves the Sequence Read Archive (SRA) run accession(s) (SRR) associated with a given sample accession. It uses the `fastq-dl` tool to download only the run metadata from SRA and outputs the SRR accession(s) listed there.

### Inputs

<div class="searchable-table" markdown="1">

| **Terra Task Name** | **Variable** | **Type** | **Description**| **Default Value** | **Terra Status** |
| --- | --- | --- | --- | --- | --- |
| wf_retrieve_srr | **sample_accession** | String | SRA-compatible accession, such as a **BioSample ID** (e.g., "SAMN00000000") or **SRA Experiment ID** (e.g., "SRX000000"), used to retrieve SRR metadata. | N/A | Required |
| wf_retrieve_srr | **retrieve_srr_docker** | String | Docker image for metadata retrieval. | `us-docker.pkg.dev/general-theiagen/biocontainers/fastq-dl:2.0.4--pyhdfd78af_0` | Optional |
| fetch_srr_metadata | **disk_size** | Int | Disk space in GB allocated for the task. | 10 | Optional |
| fetch_srr_metadata | **cpu** | Int | Number of CPUs allocated for the task. | 2 | Optional |
| fetch_srr_metadata | **memory** | Int | Memory in GB allocated for the task. | 8 | Optional |

</div>

### Workflow Tasks

This workflow has a single task that performs metadata retrieval for the specified sample accession.

??? task "`fastq-dl`: Fetches SRR metadata for sample accession"
    Runs `fastq-dl` in metadata-only mode (`--only-download-metadata`) against SRA for the given sample accession, then extracts the SRR accession(s) from the first column of the resulting `fastq-run-info.tsv`. The task fails if `fastq-dl` does not produce this metadata file.

    !!! techdetails "fastq-dl Technical Details"
        | | Links |
        | --- | --- |
        | Task | [Task on GitHub](https://github.com/theiagen/public_health_bioinformatics/blob/main/tasks/utilities/data_handling/task_fetch_srr_metadata.wdl) |
        | Software Source Code | [fastq-dl Source](https://github.com/rpetit3/fastq-dl) |
        | Software Documentation | [fastq-dl Documentation](https://github.com/rpetit3/fastq-dl#readme) |
        | Original Publication | [fastq-dl Publication](https://doi.org/10.1186/s12859-021-04346-3) |
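
The parsing step described above can be sketched as a standalone shell function for anyone reproducing it outside Terra. This is a minimal sketch, not the task's exact code: it assumes, as the task does, that `fastq-run-info.tsv` has a header row and that the SRR accession is in the first column. `extract_srr_accessions` is a hypothetical helper name. It joins multiple accessions with commas so the result fits in a single data-table cell.

```shell
#!/usr/bin/env bash
# Minimal sketch: pull SRR accessions out of a fastq-dl metadata table.
# Assumptions (mirroring the task): the first line is a header and the
# SRR accession is in the first tab-separated column.
# extract_srr_accessions is a hypothetical helper, not part of fastq-dl.
extract_srr_accessions() {
  local tsv="$1"
  # Skip the header (NR > 1), ignore rows with an empty first column,
  # and print the remaining first-column values joined by commas.
  awk -F'\t' 'NR > 1 && $1 != "" { printf "%s%s", sep, $1; sep = "," } END { print "" }' "$tsv"
}
```

For a sample with two runs, `extract_srr_accessions metadata_output/fastq-run-info.tsv` prints both accessions on one line separated by a comma. The function uses only `awk` rather than `paste`, so it does not depend on which coreutils a minimal container image happens to ship.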

### Outputs

| **Variable** | **Type** | **Description**|
|---|---|---|
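| srr_accession | String | The SRR accession(s) associated with the sample accession. If the sample has multiple sequencing runs, all of their SRR accessions are returned together in this single string. |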
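
Because `srr_accession` is a single String, a downstream step that needs one accession at a time has to split it. The sketch below shows one way to do that in shell. It assumes accessions are separated by commas and/or newlines, and `split_srr_accessions` is a hypothetical helper name, not part of this workflow.

```shell
#!/usr/bin/env bash
# Minimal sketch: print each SRR accession from the srr_accession string
# on its own line.
# Assumption: accessions are separated by commas and/or newlines.
# split_srr_accessions is a hypothetical helper, not part of this workflow.
split_srr_accessions() {
  # Turn commas into newlines, strip spaces and tabs, then drop blank lines.
  printf '%s\n' "$1" | tr ',' '\n' | tr -d '[:blank:]' | sed '/^$/d'
}

# Example usage: loop over each run in the output string.
# while IFS= read -r srr; do
#   echo "Processing ${srr}"
# done < <(split_srr_accessions "${srr_accession_string}")
```

Accepting both separators keeps the helper usable whether a given workflow version joins accessions with commas or writes one per line.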

## References

> Valieris, R. et al., "fastq-dl: A fast and reliable tool for downloading SRA metadata." Bioinformatics, 2021.
1 change: 1 addition & 0 deletions mkdocs.yml
@@ -44,6 +44,7 @@ nav:
    - Usher_PHB: workflows/phylogenetic_placement/usher.md
  - Public Data Sharing:
    - Mercury_Prep_N_Batch: workflows/public_data_sharing/mercury_prep_n_batch.md
    - Retrieve_SRR_Metadata: workflows/public_data_sharing/retrieve_srr_metadata.md
    - Terra_2_GISAID: workflows/public_data_sharing/terra_2_gisaid.md
    - Terra_2_NCBI: workflows/public_data_sharing/terra_2_ncbi.md
  - Exporting Data from Terra:
52 changes: 52 additions & 0 deletions tasks/utilities/data_handling/task_fetch_srr_metadata.wdl
@@ -0,0 +1,52 @@
version 1.0

task fetch_srr_metadata {
  input {
    String sample_accession
    String docker = "us-docker.pkg.dev/general-theiagen/biocontainers/fastq-dl:2.0.4--pyhdfd78af_0"
    Int disk_size = 10
    Int cpu = 2
    Int memory = 8
  }
  meta {
    volatile: true
  }

  command <<<
    mkdir -p metadata_output
    date -u | tee DATE

    # Debug output to show the sample being processed
    echo "Fetching metadata for sample accession: ~{sample_accession}"

    # Use fastq-dl to fetch metadata only
    fastq-dl --accession ~{sample_accession} --provider SRA --outdir metadata_output --only-download-metadata --verbose

    if [[ -f metadata_output/fastq-run-info.tsv ]]; then
      echo "Metadata written for ~{sample_accession}:"
      echo "TSV content:"
      cat metadata_output/fastq-run-info.tsv

      # Extract the SRR accession(s) from the first column (assumed), skipping the header row,
      # and join multiple accessions with commas so the output is a single-line string
      SRR_accessions=$(awk -F'\t' 'NR > 1 && $1 != "" { printf "%s%s", sep, $1; sep = "," } END { print "" }' metadata_output/fastq-run-info.tsv)
      echo "Extracted SRR accessions: ${SRR_accessions}"

      # Output the SRR accessions as a single string
      echo "${SRR_accessions}" > metadata_output/srr_accession.txt
    else
      echo "No metadata found for ~{sample_accession}"
      exit 1
    fi
  >>>

  output {
    String srr_accession = read_string("metadata_output/srr_accession.txt")
  }
  runtime {
    docker: docker
    memory: memory + " GB"
    cpu: cpu
    disks: "local-disk " + disk_size + " SSD"
    disk: disk_size + " GB"
    preemptible: 1
  }
}
23 changes: 23 additions & 0 deletions workflows/utilities/data_import/wf_update_srr_metadata.wdl
@@ -0,0 +1,23 @@
version 1.0

import "../../../tasks/utilities/data_handling/task_fetch_srr_metadata.wdl" as srr_task

workflow wf_retrieve_srr {
  meta {
    description: "This workflow retrieves the Sequence Read Archive (SRA) accession (SRR) associated with a given sample accession. It uses the fastq-dl tool to fetch metadata from SRA and outputs the SRR accession that can be used for downstream analysis."
  }
  input {
    String sample_accession
    String retrieve_srr_docker = "us-docker.pkg.dev/general-theiagen/biocontainers/fastq-dl:2.0.4--pyhdfd78af_0"
  }

  call srr_task.fetch_srr_metadata {
    input:
      sample_accession = sample_accession,
      docker = retrieve_srr_docker
  }

  output {
    String srr_accession = fetch_srr_metadata.srr_accession
  }
}