[Freyja] Update freyja to version 1.5.2, expose pathogen flag and minor update to docs #684

Merged on Dec 11, 2024 (6 commits merged; this view shows changes from 5 commits)
Binary file modified docs/assets/figures/Freyja_FASTQ.png
48 changes: 29 additions & 19 deletions docs/workflows/genomic_characterization/freyja.md
@@ -1,16 +1,10 @@
# Freyja Workflow Series

!!! dna inline end "Wastewater and more"
The typical use case of Freyja is to **analyze mixed SARS-CoV-2 samples** from a sequencing dataset, most often **wastewater**.

!!! warning "Default Values"
The defaults included in the Freyja workflows reflect this use case but **can be adjusted for other pathogens**. See the [**Running Freyja on other pathogens**](freyja.md#running-freyja-on-other-pathogens) section for more information.

## Quick Facts

| **Workflow Type** | **Applicable Kingdom** | **Last Known Changes** | **Command-line Compatibility** | **Workflow Level** |
|---|---|---|---|---|
| [Genomic Characterization](../../workflows_overview/workflows_type.md/#genomic-characterization) | [Viral](../../workflows_overview/workflows_kingdom.md/#viral) | PHB v2.2.0 | Yes | Sample-level, Set-level |
| [Genomic Characterization](../../workflows_overview/workflows_type.md/#genomic-characterization) | [Viral](../../workflows_overview/workflows_kingdom.md/#viral) | PHB v2.3.0 | Yes | Sample-level, Set-level |

## Freyja Overview

@@ -21,9 +15,15 @@

Additional post-processing steps can produce visualizations of aggregated samples.

!!! dna "Wastewater and more"
The typical use case of Freyja is to **analyze mixed SARS-CoV-2 samples** from a sequencing dataset, most often **wastewater**.

!!! warning "Default Values"
The defaults included in the Freyja workflows reflect this use case but **can be adjusted for other pathogens**. See the [**Running Freyja on other pathogens**](freyja.md#running-freyja-on-other-pathogens) section for more information.

!!! caption "Figure 1: Workflow Diagram for Freyja_FASTQ_PHB workflow"
##### Figure 1 { #figure1 }
![**Figure 1: Workflow diagram for Freyja_FASTQ_PHB workflow.**](../../assets/figures/Freyja_FASTQ.png){width=25%}
![**Figure 1: Workflow diagram for Freyja_FASTQ_PHB workflow.**](../../assets/figures/Freyja_FASTQ.png){width=100%}

Depending on the type of data (Illumina or Oxford Nanopore), the Read QC and Filtering steps, as well as the Read Alignment steps use different software. The user can specify if the barcodes and lineages file should be updated with `freyja update` before running Freyja or if bootstrapping is to be performed with `freyja boot`.

@@ -63,7 +63,7 @@ We recommend running this workflow with **"Run inputs defined by file paths"** s
| freyja_update | **gcp_uri** | String | The path where you want the Freyja reference files to be stored. Include gs:// at the beginning of the string. Full example with a Terra workspace bucket: "gs://fc-87ddd67a-c674-45a8-9651-f91e3d2f6bb7" | | Required |
| freyja_update_refs | **cpu** | Int | Number of CPUs to allocate to the task | 4 | Optional |
| freyja_update_refs | **disk_size** | Int | Amount of storage (in GB) to allocate to the task | 100 | Optional |
| freyja_update_refs | **docker** | String | The Docker container to use for the task | "us-docker.pkg.dev/general-theiagen/staphb/freyja:1.5.1-07_02_2024-01-27-2024-07-22" | Optional |
| freyja_update_refs | **docker** | String | The Docker container to use for the task | "us-docker.pkg.dev/general-theiagen/staphb/freyja:1.5.2-11_30_2024-02-00-2024-12-02" | Optional |
| freyja_update_refs | **memory** | Int | Amount of memory/RAM (in GB) to allocate to the task | 16 | Optional |
| transfer_files | **cpu** | Int | Number of CPUs to allocate to the task | 2 | Optional |
| transfer_files | **disk_size** | Int | Amount of storage (in GB) to allocate to the task | 100 | Optional |
@@ -110,12 +110,14 @@ This workflow runs on the sample level.
| freyja | **confirmed_only** | Boolean | Include only confirmed SARS-CoV-2 lineages | FALSE | Optional |
| freyja | **cpu** | Int | Number of CPUs to allocate to the task | 2 | Optional |
| freyja | **disk_size** | Int | Amount of storage (in GB) to allocate to the task | 100 | Optional |
| freyja | **docker** | String | The Docker container to use for the task | "us-docker.pkg.dev/general-theiagen/staphb/freyja:1.5.1-07_02_2024-01-27-2024-07-22" | Optional |
| freyja | **docker** | String | The Docker container to use for the task | "us-docker.pkg.dev/general-theiagen/staphb/freyja:1.5.2-11_30_2024-02-00-2024-12-02" | Optional |
| freyja | **eps** | Float | The minimum lineage abundance cut-off value | 0.001 | Optional |
| freyja | **freyja_lineage_metadata** | File | (found in the optional section, but is required) File containing the lineage metadata; the "curated_lineages.json" file found <https://github.com/andersen-lab/Freyja/tree/main/freyja/data> can be used for this variable. Does not need to be provided if update_db is true. | None | Optional, Required |
| freyja | **freyja_barcodes** | String | Custom barcode file. Does not need to be provided if update_db is true or if freyja_pathogen is provided. | None | Optional |
| freyja | **freyja_lineage_metadata** | File | File containing the lineage metadata; the "curated_lineages.json" file found <https://github.com/andersen-lab/Freyja/tree/main/freyja/data> can be used for this variable. Does not need to be provided if update_db is true or if the freyja_pathogen is provided. | None | Optional, Required |
| freyja | **freyja_pathogen** | String | Pathogen of interest, used if not providing the barcodes and lineage metadata files. Options: SARS-CoV-2, MPXV, H5NX, H1N1pdm, FLU-B-VIC, MEASLESN450, MEASLES, RSVa, RSVb | None | Optional |
| freyja | **memory** | Int | Amount of memory/RAM (in GB) to allocate to the task | 4 | Optional |
| freyja | **number_bootstraps** | Int | The number of bootstraps to perform (only used if bootstrap = true) | 100 | Optional |
| freyja | **update_db** | Boolean | Updates the Freyja reference files (the usher barcodes and lineage metadata files) but will not save them as output (use Freyja_Update for that purpose). If set to true, the `freyja_lineage_metadata` and `freyja_usher_barcodes` files are not required. | FALSE | Optional |
| freyja | **update_db** | Boolean | Updates the Freyja reference files (the usher barcodes and lineage metadata files) but will not save them as output (use Freyja_Update for that purpose). If set to true, the `freyja_lineage_metadata` and `freyja_barcodes` files are not required. | FALSE | Optional |
| freyja_fastq | **depth_cutoff** | Int | The minimum coverage depth; sites below this value are excluded and identical barcodes are grouped | 10 | Optional |
| freyja_fastq | **kraken2_target_organism** | String | The organism whose abundance the user wants to check in their reads. This should be a proper taxonomic name recognized by the Kraken database. | "Severe acute respiratory syndrome coronavirus 2" | Optional |
| freyja_fastq | **ont** | Boolean | Indicates if the input data is derived from an ONT instrument. | FALSE | Optional |
@@ -364,7 +366,7 @@ The main output file used in subsequent Freyja workflows is found under the `fre
| freyja_fastq_wf_version | String | The version of the Public Health Bioinformatics (PHB) repository used | ONT, PE, SE |
| freyja_lineage_metadata_file | File | Lineage metadata JSON file used. Can be the one provided as input or downloaded by Freyja if update_db is true | ONT, PE, SE |
| freyja_metadata_version | String | Name of lineage metadata file used, or the date if update_db is true | ONT, PE, SE |
| freyja_usher_barcode_file | File | USHER barcode feather file used. Can be the one provided as input or downloaded by Freyja if update_db is true | ONT, PE, SE |
| freyja_barcode_file | File | Barcode feather file used. Can be the one provided as input or downloaded by Freyja if update_db is true | ONT, PE, SE |
| freyja_variants | File | The TSV file containing the variants identified by Freyja | ONT, PE, SE |
| freyja_version | String | Version of Freyja used | ONT, PE, SE |
| ivar_version_primtrim | String | Version of iVar for running the iVar trim command | ONT, PE, SE |
@@ -431,7 +433,7 @@ This workflow runs on the set level.
| freyja_plot | **collection_date** | Array[String] | An array containing the collection dates for the sample (YYYY-MM-DD format) | | Optional |
| freyja_plot_task | **cpu** | Int | Number of CPUs to allocate to the task | 2 | Optional |
| freyja_plot_task | **disk_size** | Int | Amount of storage (in GB) to allocate to the task | 100 | Optional |
| freyja_plot_task | **docker** | String | The Docker container to use for the task | us-docker.pkg.dev/general-theiagen/staphb/freyja:1.5.1-07_02_2024-01-27-2024-07-22 | Optional |
| freyja_plot_task | **docker** | String | The Docker container to use for the task | us-docker.pkg.dev/general-theiagen/staphb/freyja:1.5.2-11_30_2024-02-00-2024-12-02 | Optional |
| freyja_plot_task | **memory** | Int | Amount of memory/RAM (in GB) to allocate to the task | 4 | Optional |
| freyja_plot_task | **mincov** | Int | The minimum genome coverage used as a cut-off of data to include in the plot | 60 | Optional |
| freyja_plot_task | **plot_day_window** | Int | The width of the rolling average window; only used if plot_time_interval is "D" | 14 | Optional |
@@ -492,7 +494,7 @@ This workflow runs on the set level.
| freyja_dashboard | **dashboard_intro_text** | File | A file containing the text to be displayed at the top of the dashboard. | SARS-CoV-2 lineage de-convolution performed by the Freyja workflow (<https://github.com/andersen-lab/Freyja>). | Optional |
| freyja_dashboard_task | **config** | File | (found in the optional section, but is required) A yaml file that applies various configurations to the dashboard, such as grouping lineages together, applying colorings, etc. See also <https://github.com/andersen-lab/Freyja/blob/main/freyja/data/plot_config.yml>. | None | Optional, Required |
| freyja_dashboard_task | **cpu** | Int | Number of CPUs to allocate to the task | 2 | Optional |
| freyja_dashboard_task | **docker** | String | The Docker container to use for the task | us-docker.pkg.dev/general-theiagen/staphb/freyja:1.5.1-07_02_2024-01-27-2024-07-22 | Optional |
| freyja_dashboard_task | **docker** | String | The Docker container to use for the task | us-docker.pkg.dev/general-theiagen/staphb/freyja:1.5.2-11_30_2024-02-00-2024-12-02 | Optional |
| freyja_dashboard_task | **headerColor** | String | A hex color code to change the color of the header | | Optional |
| freyja_dashboard_task | **memory** | Int | Amount of memory/RAM (in GB) to allocate to the task | 4 | Optional |
| freyja_dashboard_task | **mincov** | Float | The minimum genome coverage used as a cut-off of data to include in the dashboard. Default is set to 60 by the freyja command-line tool (not a WDL task default, per se) | None | Optional |
@@ -532,26 +534,34 @@ This workflow runs on the set level.

The main requirement to run Freyja on other pathogens is **the existence of a barcode file for your pathogen of interest**. Currently, barcodes exist for the following organisms:

- MEASLES
- SARS-CoV-2 (default)
- MPXV
- H5NX
- H1N1pdm
- FLU-B-VIC
- MEASLESN450
- MEASLES
- RSVa
- RSVb
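
As a rough illustration, a pre-flight check against the names listed above could look like the following sketch. The function name `is_supported_pathogen` is hypothetical and is not part of Freyja or the PHB workflows; the accepted values are copied from the `freyja_pathogen` options in this documentation, and treating them as case-sensitive is an assumption.

```shell
# Hypothetical helper, not part of Freyja or PHB.
# Returns 0 if the name matches one of the documented freyja_pathogen options.
is_supported_pathogen() {
  case "$1" in
    SARS-CoV-2|MPXV|H5NX|H1N1pdm|FLU-B-VIC|MEASLESN450|MEASLES|RSVa|RSVb)
      return 0 ;;
    *)
      return 1 ;;
  esac
}

# Example: one documented name and one name with different casing.
for name in MPXV rsva; do
  if is_supported_pathogen "$name"; then
    echo "$name: found in the documented list"
  else
    echo "$name: not in the documented list (exact spelling assumed)"
  fi
done
```

A check like this only guards against typos before a run is launched; it does not confirm that a barcode file is actually available for the chosen pathogen.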

The appropriate barcode file and reference sequence need to be downloaded and uploaded to your [Terra.bio](http://Terra.bio) workspace.

!!! warning "Freyja barcodes for other pathogens"

Data for various pathogens can be found in the following repository: [Freyja Barcodes](https://github.com/gp201/Freyja-barcodes)

Folders are organized by pathogen, with each subfolder named after the date the barcode was generated, using the format YYYY-MM-DD. Barcode files are named `barcode.csv`, and reference genome files are named `reference.fasta`.
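
The folder layout described above can be sketched as a small path builder. This is a minimal sketch under stated assumptions: `barcode_paths` is a hypothetical helper, the repository root is assumed to be a local copy, and the pathogen folder name and date in the example are illustrative rather than taken from the repository.

```shell
# Hypothetical helper: builds the expected file paths for one pathogen/date folder.
# $1 = path to a local copy of the barcode repository (assumption)
# $2 = pathogen folder name (illustrative; check the repository for exact names)
# $3 = barcode generation date in YYYY-MM-DD format
barcode_paths() {
  # Reject dates that are not shaped like YYYY-MM-DD.
  case "$3" in
    [0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]) ;;
    *) echo "date must be YYYY-MM-DD, got: $3" >&2; return 1 ;;
  esac
  printf '%s/%s/%s/barcode.csv\n' "$1" "$2" "$3"
  printf '%s/%s/%s/reference.fasta\n' "$1" "$2" "$3"
}

# Example call with illustrative values.
barcode_paths "Freyja-barcodes" "MPXV" "2024-01-01"
```

The two printed paths correspond to the barcode file and the reference genome that would then be uploaded to the workspace.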

There's two ways of
The appropriate barcode file and reference sequence need to be downloaded and uploaded to your [Terra.bio](http://Terra.bio) workspace.



When running **Freyja_FASTQ_PHB**, the appropriate reference and barcodes files need to be passed as inputs. The first is a required input and will show up at the top of the workflow's inputs page on [Terra.bio](http://Terra.bio) ([Figure 2](freyja.md/#figure2)).

!!! caption "Figure 2: Required input for Freyja_FASTQ_PHB to provide the reference genome to be used by Freyja"
##### Figure 2 { #figure2 }
![**Figure 2: Required input for Freyja_FASTQ_PHB to provide the reference genome to be used by Freyja.**](../../assets/figures/Freyja_figure2.png)

The barcodes file can be passed directly to Freyja by the `freyja_usher_barcodes` optional input ([Figure 3](freyja.md/#figure3)).
The barcodes file can be passed directly to Freyja by the `freyja_barcodes` optional input ([Figure 3](freyja.md/#figure3)).

!!! caption "Figure 3: Optional input for Freyja_FASTQ_PHB to provide the barcodes file to be used by Freyja"
##### Figure 3 {#figure3}
18 changes: 10 additions & 8 deletions tasks/taxon_id/freyja/task_freyja.wdl
Review comment from a contributor:

The programmatic changes look good and make sense. The Docker version is bumped, `freyja_pathogen` is exposed as an optional string, and `freyja_usher_barcodes` is renamed to `freyja_barcodes`. `--pathogen` is added to the `freyja boot` command. Again, the `freyja_usher_barcodes` variable is switched for the more general `freyja_barcodes`.
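
To make the optional-flag pattern in the hunks below easier to follow, here is a minimal shell sketch of how a WDL placeholder such as `~{"--barcodes " + freyja_barcodes}` renders: when the optional input is unset, the whole expression evaluates to an empty string; otherwise the prefix and the value are concatenated as-is. The `opt_flag` helper and the example values are hypothetical and only emulate that behaviour.

```shell
# Hypothetical emulation of WDL's ~{"--flag " + optional_value}.
# $1 = flag prefix (must carry its own trailing space), $2 = optional value
opt_flag() {
  # Print nothing at all when the value is unset, mirroring WDL's behaviour.
  if [ -n "$2" ]; then
    printf '%s%s' "$1" "$2"
  fi
}

freyja_pathogen="MPXV"   # illustrative value
freyja_barcodes=""       # left unset in this example

# Dry run: show the command that would be assembled, without executing freyja.
echo "freyja boot $(opt_flag '--pathogen ' "$freyja_pathogen") $(opt_flag '--barcodes ' "$freyja_barcodes")"
```

Because the prefix and value are joined without any separator, a prefix written without the trailing space would yield a single fused token such as `--pathogenMPXV`.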

@@ -5,7 +5,8 @@ task freyja_one_sample {
File primer_trimmed_bam
String samplename
File reference_genome
File? freyja_usher_barcodes
String? freyja_pathogen
File? freyja_barcodes
File? freyja_lineage_metadata
Float? eps
Float? adapt
@@ -16,7 +17,7 @@ task freyja_one_sample {
Int? depth_cutoff
Int memory = 8
Int cpu = 2
String docker = "us-docker.pkg.dev/general-theiagen/staphb/freyja:1.5.1-07_02_2024-01-27-2024-07-22"
String docker = "us-docker.pkg.dev/general-theiagen/staphb/freyja:1.5.2-11_30_2024-02-00-2024-12-02"
Int disk_size = 100
}
command <<<
@@ -44,9 +45,9 @@ task freyja_one_sample {
freyja_metadata_version="freyja update: $(date +"%Y-%m-%d")"
else
# configure barcode
if [[ ! -z "~{freyja_usher_barcodes}" ]]; then
echo "User freyja usher barcodes identified; ~{freyja_usher_barcodes} will be utilized for freyja demixing"
freyja_usher_barcode_version=$(basename -- "~{freyja_usher_barcodes}")
if [[ ! -z "~{freyja_barcodes}" ]]; then
echo "User freyja usher barcodes identified; ~{freyja_barcodes} will be utilized for freyja demixing"
freyja_usher_barcode_version=$(basename -- "~{freyja_barcodes}")
else
freyja_usher_barcode_version="unmodified from freyja container: ~{docker}"
fi
@@ -74,9 +75,10 @@ task freyja_one_sample {
# Calculate Bootstraps, if specified
if ~{bootstrap}; then
freyja boot \
~{"--pathogen " + freyja_pathogen} \
~{"--eps " + eps} \
~{"--meta " + freyja_lineage_metadata} \
~{"--barcodes " + freyja_usher_barcodes} \
~{"--barcodes " + freyja_barcodes} \
~{"--depthcutoff " + depth_cutoff} \
~{"--nb " + number_bootstraps } \
~{true='--confirmedonly' false='' confirmed_only} \
@@ -91,7 +93,7 @@ task freyja_one_sample {
freyja demix \
~{'--eps ' + eps} \
~{'--meta ' + freyja_lineage_metadata} \
~{'--barcodes ' + freyja_usher_barcodes} \
~{'--barcodes ' + freyja_barcodes} \
~{'--depthcutoff ' + depth_cutoff} \
~{true='--confirmedonly' false='' confirmed_only} \
~{'--adapt ' + adapt} \
@@ -144,7 +146,7 @@ task freyja_one_sample {
File? freyja_bootstrap_summary = "~{samplename}_summarized.csv"
File? freyja_bootstrap_summary_pdf = "~{samplename}_summarized.pdf"
# capture barcode file - first is user supplied, second appears if the user did not supply a barcode file
File freyja_usher_barcode_file = select_first([freyja_usher_barcodes, "usher_barcodes.feather"])
File freyja_barcode_file = select_first([freyja_barcodes, "usher_barcodes.feather"])
File freyja_lineage_metadata_file = select_first([freyja_lineage_metadata, "curated_lineages.json"])
String freyja_barcode_version = read_string("FREYJA_BARCODES")
String freyja_metadata_version = read_string("FREYJA_METADATA")
2 changes: 1 addition & 1 deletion tasks/taxon_id/freyja/task_freyja_dashboard.wdl
@@ -13,7 +13,7 @@ task freyja_dashboard_task {
Boolean scale_by_viral_load = false
String freyja_dashboard_title
File? dashboard_intro_text
String docker = "us-docker.pkg.dev/general-theiagen/staphb/freyja:1.5.1-07_02_2024-01-27-2024-07-22"
String docker = "us-docker.pkg.dev/general-theiagen/staphb/freyja:1.5.2-11_30_2024-02-00-2024-12-02"
Int disk_size = 100
Int memory = 4
Int cpu = 2
2 changes: 1 addition & 1 deletion tasks/taxon_id/freyja/task_freyja_plot.wdl
@@ -10,7 +10,7 @@ task freyja_plot_task {
String plot_time_interval="MS"
Int plot_day_window=14
String freyja_plot_name
String docker = "us-docker.pkg.dev/general-theiagen/staphb/freyja:1.5.1-07_02_2024-01-27-2024-07-22"
String docker = "us-docker.pkg.dev/general-theiagen/staphb/freyja:1.5.2-11_30_2024-02-00-2024-12-02"
Int disk_size = 100
Int mincov = 60
Int memory = 4
2 changes: 1 addition & 1 deletion tasks/taxon_id/freyja/task_freyja_update.wdl
@@ -2,7 +2,7 @@ version 1.0

task freyja_update_refs {
input {
String docker = "us-docker.pkg.dev/general-theiagen/staphb/freyja:1.5.1-07_02_2024-01-27-2024-07-22"
String docker = "us-docker.pkg.dev/general-theiagen/staphb/freyja:1.5.2-11_30_2024-02-00-2024-12-02"
Int disk_size = 100
Int memory = 16
Int cpu = 4
2 changes: 1 addition & 1 deletion workflows/freyja/wf_freyja_fastq.wdl
@@ -208,7 +208,7 @@ workflow freyja_fastq {
File freyja_depths = freyja.freyja_depths
File freyja_demixed = freyja.freyja_demixed
Float freyja_coverage = freyja.freyja_coverage
File freyja_usher_barcode_file = freyja.freyja_usher_barcode_file
File freyja_barcode_file = freyja.freyja_barcode_file
File freyja_lineage_metadata_file = freyja.freyja_lineage_metadata_file
String freyja_barcode_version = freyja.freyja_barcode_version
String freyja_metadata_version = freyja.freyja_metadata_version