Update SRR metadata workflow and documentation for clarity and accuracy
fraser-combe committed Nov 8, 2024
1 parent 4eeb546 commit a0b9fec
Showing 7 changed files with 22 additions and 14 deletions.
2 changes: 1 addition & 1 deletion .dockstore.yml
@@ -195,7 +195,7 @@ workflows:
primaryDescriptorPath: /workflows/utilities/data_import/wf_terra_2_bq.wdl
testParameterFiles:
- /tests/inputs/empty.json
- name: Update_srr_metadata_PHB
- name: Update_SRR_Metadata_PHB
subclass: WDL
primaryDescriptorPath: /workflows/utilities/data_import/wf_update_srr_metadata.wdl
testParameterFiles:
10 changes: 6 additions & 4 deletions docs/workflows/public_data_sharing/retrieve_srr_metadata.md
@@ -4,19 +4,21 @@

| **Workflow Type** | **Applicable Kingdom** | **Last Known Changes** | **Command-line Compatibility** | **Workflow Level** |
|---|---|---|---|---|
| [Public Data Sharing](../../workflows_overview/workflows_type.md/#public-data-sharing) | [Any Taxa](../../workflows_overview/workflows_kingdom.md/#any-taxa) | PHB v2.2.1 | Yes | Sample-level |
| [Public Data Sharing](../../workflows_overview/workflows_type.md/#public-data-sharing) | [Any Taxa](../../workflows_overview/workflows_kingdom.md/#any-taxa) | PHB v2.3.0 | Yes | Sample-level |

## Retrieve SRR Metadata

This workflow retrieves the Sequence Read Archive (SRA) accession (SRR) associated with a given sample accession. It utilizes the `fastq-dl` tool to fetch metadata from SRA and outputs the SRR accession.
This workflow is designed to retrieve the Sequence Read Archive (SRA) accession (SRR) associated with a given sample accession. The primary inputs are BioSample IDs (e.g., SAMN00000000) or SRA Experiment IDs (e.g., SRX000000), which link to sequencing data in the SRA repository.

The workflow uses the `fastq-dl` tool to fetch metadata from SRA, then parses that metadata to extract and output the associated SRR accession.
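
The accession forms named above can be told apart by prefix. The following is a minimal sketch, not part of the workflow itself: the helper name `classify_accession` is hypothetical, and the `SRR` branch is an assumption added for illustration (the documentation only names the `SAMN` and `SRX` forms as inputs).

```shell
# Hypothetical helper: classify an accession string by its prefix.
# SAMN and SRX come from the documentation; the SRR branch is an
# illustrative assumption, since SRR is what the workflow outputs.
classify_accession() {
  case "$1" in
    SAMN[0-9]*) echo "biosample" ;;
    SRX[0-9]*)  echo "experiment" ;;
    SRR[0-9]*)  echo "run" ;;
    *)          echo "unknown" ;;
  esac
}

classify_accession "SAMN00000000"
classify_accession "SRX000000"
```

A check like this could be run before submitting a batch of samples to catch malformed accessions early; the workflow as committed does not appear to perform such validation.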

### Inputs

<div class="searchable-table" markdown="1">

| **Terra Task Name** | **Variable** | **Type** | **Description**| **Default Value** | **Terra Status** |
| --- | --- | --- | --- | --- | --- |
| fetch_srr_metadata | **sample_accession** | String | SRA-compatible accession, such as a **BioSample ID** (e.g., "SAMN00000000") or **SRA Experiment ID** (e.g., "SRX000000"), used to retrieve SRR metadata. | N/A | Required |
| fetch_srr_metadata | **sample_accession** | String | SRA-compatible accession, such as a **BioSample ID** (e.g., "SAMN00000000") or **SRA Experiment ID** (e.g., "SRX000000"), used to retrieve SRR metadata. | | Required |
| fetch_srr_metadata | **docker**| String | Docker image for metadata retrieval. | `us-docker.pkg.dev/general-theiagen/biocontainers/fastq-dl:2.0.4--pyhdfd78af_0` | Optional |
| fetch_srr_metadata | **disk_size** | Int | Disk space in GB allocated for the task. | 10 | Optional |
| fetch_srr_metadata | **cpu** | Int | Number of CPUs allocated for the task. | 2 | Optional |
@@ -43,7 +45,7 @@ This workflow has a single task that performs metadata retrieval for the specifi

| **Variable** | **Type** | **Description**|
|---|---|---|
| srr_accession| String | The SRR accession associated with the sample ID.|
| srr_accession| String | The SRR accession associated with the input sample accession.|

## References

1 change: 1 addition & 0 deletions docs/workflows_overview/workflows_alphabetically.md
@@ -47,6 +47,7 @@ title: Alphabetical Workflows
| [**TheiaValidate**](../workflows/standalone/theiavalidate.md)| This workflow performs basic comparisons between user-designated columns in two separate tables. | Any taxa | | No | v2.0.0 | [TheiaValidate_PHB](https://dockstore.org/workflows/github.com/theiagen/public_health_bioinformatics/TheiaValidate_PHB:main?tab=info) |
| [**Transfer_Column_Content**](../workflows/data_export/transfer_column_content.md)| Transfer contents of a specified Terra data table column for many samples ("entities") to a GCP storage bucket location | Any taxa | Set-level | Yes | v1.3.0 | [Transfer_Column_Content_PHB](https://dockstore.org/workflows/github.com/theiagen/public_health_bioinformatics/Transfer_Column_Content_PHB:main?tab=info) |
| [**Usher_PHB**](../workflows/phylogenetic_placement/usher.md)| Use UShER to rapidly and accurately place your samples on any existing phylogenetic tree | Monkeypox virus, SARS-CoV-2, Viral | Sample-level, Set-level | Yes | v2.1.0 | [Usher_PHB](https://dockstore.org/workflows/github.com/theiagen/public_health_bioinformatics/Usher_PHB:main?tab=info) |
| [**Update_SRR_Metadata**](../workflows/public_data_sharing/update_srr_metadata.md)| Update SRR metadata in a Terra data table | Any taxa | | Yes | v2.3.0 | [Update_SRR_Metadata_PHB](https://dockstore.org/workflows/github.com/theiagen/public_health_bioinformatics/Update_SRR_Metadata_PHB:main?tab=info) |
| [**VADR_Update**](../workflows/genomic_characterization/vadr_update.md)| Update VADR assignments | HAV, Influenza, Monkeypox virus, RSV-A, RSV-B, SARS-CoV-2, Viral, WNV | Sample-level | Yes | v1.2.1 | [VADR_Update_PHB](https://dockstore.org/workflows/github.com/theiagen/public_health_bioinformatics/VADR_Update_PHB:main?tab=info) |
| [**Zip_Column_Content**](../workflows/data_export/zip_column_content.md)| Zip contents of a specified Terra data table column for many samples ("entities") | Any taxa | Set-level | Yes | v2.1.0 | [Zip_Column_Content_PHB](https://dockstore.org/workflows/github.com/theiagen/public_health_bioinformatics/Zip_Column_Content_PHB:main?tab=info) |

1 change: 1 addition & 0 deletions docs/workflows_overview/workflows_kingdom.md
@@ -24,6 +24,7 @@ title: Workflows by Kingdom
| [**TheiaMeta**](../workflows/genomic_characterization/theiameta.md) | Genome assembly and QC from metagenomic sequencing | Any taxa | Sample-level | Yes | v2.0.0 | [TheiaMeta_Illumina_PE_PHB](https://dockstore.org/workflows/github.com/theiagen/public_health_bioinformatics/TheiaMeta_Illumina_PE_PHB:main?tab=info) |
| [**TheiaValidate**](../workflows/standalone/theiavalidate.md)| This workflow performs basic comparisons between user-designated columns in two separate tables. | Any taxa | | No | v2.0.0 | [TheiaValidate_PHB](https://dockstore.org/workflows/github.com/theiagen/public_health_bioinformatics/TheiaValidate_PHB:main?tab=info) |
| [**Transfer_Column_Content**](../workflows/data_export/transfer_column_content.md)| Transfer contents of a specified Terra data table column for many samples ("entities") to a GCP storage bucket location | Any taxa | Set-level | Yes | v1.3.0 | [Transfer_Column_Content_PHB](https://dockstore.org/workflows/github.com/theiagen/public_health_bioinformatics/Transfer_Column_Content_PHB:main?tab=info) |
| [**Update_SRR_Metadata**](../workflows/public_data_sharing/update_srr_metadata.md)| Update SRR metadata in a Terra data table | Any taxa | | Yes | v2.3.0 | [Update_SRR_Metadata_PHB](https://dockstore.org/workflows/github.com/theiagen/public_health_bioinformatics/Update_SRR_Metadata_PHB:main?tab=info) |
| [**Zip_Column_Content**](../workflows/data_export/zip_column_content.md)| Zip contents of a specified Terra data table column for many samples ("entities") | Any taxa | Set-level | Yes | v2.1.0 | [Zip_Column_Content_PHB](https://dockstore.org/workflows/github.com/theiagen/public_health_bioinformatics/Zip_Column_Content_PHB:main?tab=info) |

</div>
1 change: 1 addition & 0 deletions docs/workflows_overview/workflows_type.md
@@ -75,6 +75,7 @@ title: Workflows by Type
| [**Mercury_Prep_N_Batch**](../workflows/public_data_sharing/mercury_prep_n_batch.md)| Prepare metadata and sequence data for submission to NCBI and GISAID | Influenza, Monkeypox virus, SARS-CoV-2, Viral | Set-level | No | v2.2.0 | [Mercury_Prep_N_Batch_PHB](https://dockstore.org/workflows/github.com/theiagen/public_health_bioinformatics/Mercury_Prep_N_Batch_PHB:main?tab=info) |
| [**Terra_2_GISAID**](../workflows/public_data_sharing/terra_2_gisaid.md)| Upload of assembly data to GISAID | SARS-CoV-2, Viral | Set-level | Yes | v1.2.1 | [Terra_2_GISAID_PHB](https://dockstore.org/workflows/github.com/theiagen/public_health_bioinformatics/Terra_2_GISAID_PHB:main?tab=info) |
| [**Terra_2_NCBI**](../workflows/public_data_sharing/terra_2_ncbi.md)| Upload of sequence data to NCBI | Bacteria, Mycotics, Viral | Set-level | No | v2.1.0 | [Terra_2_NCBI_PHB](https://dockstore.org/workflows/github.com/theiagen/public_health_bioinformatics/Terra_2_NCBI_PHB:main?tab=info) |
| [**Update_SRR_Metadata**](../workflows/public_data_sharing/update_srr_metadata.md)| Update SRR metadata in a Terra data table | Any taxa | | Yes | v2.3.0 | [Update_SRR_Metadata_PHB](https://dockstore.org/workflows/github.com/theiagen/public_health_bioinformatics/Update_SRR_Metadata_PHB:main?tab=info) |

</div>

20 changes: 12 additions & 8 deletions tasks/utilities/data_handling/task_fetch_srr_metadata.wdl
@@ -20,22 +20,25 @@ task fetch_srr_metadata {
echo "Fetching metadata for sample accession: ~{sample_accession}"

# Use fastq-dl to fetch metadata only
fastq-dl --accession ~{sample_accession} --provider SRA --outdir metadata_output --only-download-metadata --verbose
fastq-dl --accession ~{sample_accession} --outdir metadata_output --only-download-metadata --verbose


if [[ -f metadata_output/fastq-run-info.tsv ]]; then
echo "Metadata written for ~{sample_accession}:"
echo "TSV content:"
cat metadata_output/fastq-run-info.tsv

# Extract the SRR accession (assuming it's in the first column)
# Extract the SRR accession (typically in the first column)
SRR_accessions=$(awk -F'\t' 'NR>1 {print $1}' metadata_output/fastq-run-info.tsv)
echo "Extracted SRR accessions: ${SRR_accessions}"

# Output the SRR accessions as a single string
echo "${SRR_accessions}" > metadata_output/srr_accession.txt
if [[ -z "${SRR_accessions}" ]]; then
echo "No SRR accession found for ~{sample_accession}" > metadata_output/srr_accession.txt
else
echo "Extracted SRR accessions: ${SRR_accessions}"
echo "${SRR_accessions}" > metadata_output/srr_accession.txt
fi
else
echo "No metadata found for ${sample_accession}"
exit 1
echo "No SRR accession found" > metadata_output/srr_accession.txt
fi
>>>

@@ -44,8 +47,9 @@ task fetch_srr_metadata {
}
runtime {
docker: docker
memory: memory + " GB"
memory: "~{memory} GB"
cpu: cpu
disks: "local-disk " + disk_size + " SSD"
disk: disk_size + " GB"
preemptible: 1
}
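
The extraction-with-fallback behavior introduced in the command block above can be tried outside WDL. This is a simplified sketch under stated assumptions: the function name `extract_srr`, the temporary file, and the sample TSV header names are hypothetical, and both failure cases print the same short message rather than including the sample accession as the task does.

```shell
# Sketch of the fallback logic: print the first column of every data row,
# or a placeholder message when the file is missing or has no data rows.
# The function name and its argument are hypothetical.
extract_srr() {
  tsv="$1"
  if [ -f "$tsv" ]; then
    # Skip the header row and print the first tab-separated column
    srr=$(awk -F'\t' 'NR>1 {print $1}' "$tsv")
    if [ -z "$srr" ]; then
      echo "No SRR accession found"
    else
      echo "$srr"
    fi
  else
    echo "No SRR accession found"
  fi
}

# Usage with a made-up two-column TSV (the header names are assumptions)
demo=$(mktemp)
printf 'run_accession\tsample_accession\nSRR0000001\tSAMN00000000\n' > "$demo"
extract_srr "$demo"
rm -f "$demo"
```

Writing a placeholder instead of exiting with status 1 lets the task finish and still produce an output string for samples without metadata, at the cost of downstream consumers having to recognize the placeholder text.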
1 change: 0 additions & 1 deletion workflows/utilities/data_import/wf_update_srr_metadata.wdl
@@ -8,7 +8,6 @@ workflow wf_retrieve_srr {
}
input {
String sample_accession
String retrieve_srr_docker = "us-docker.pkg.dev/general-theiagen/biocontainers/fastq-dl:2.0.4--pyhdfd78af_0"
}

call srr_task.fetch_srr_metadata {