Merge branch 'main' into smw-flye-dev
fraser-combe authored Dec 17, 2024
2 parents fcd39b9 + 4fbb73e commit 47155c0
Showing 128 changed files with 2,834 additions and 1,047 deletions.
12 changes: 11 additions & 1 deletion .dockstore.yml
@@ -195,6 +195,11 @@ workflows:
primaryDescriptorPath: /workflows/utilities/data_import/wf_terra_2_bq.wdl
testParameterFiles:
- /tests/inputs/empty.json
- name: Fetch_SRR_Accession_PHB
subclass: WDL
primaryDescriptorPath: /workflows/utilities/data_import/wf_fetch_srr_accession.wdl
testParameterFiles:
- /tests/inputs/empty.json
- name: Concatenate_Column_Content_PHB
subclass: WDL
primaryDescriptorPath: /workflows/utilities/file_handling/wf_concatenate_column.wdl
@@ -282,4 +287,9 @@ workflows:
subclass: WDL
primaryDescriptorPath: /workflows/phylogenetics/wf_snippy_streamline_fasta.wdl
testParameterFiles:
- /tests/inputs/empty.json
- /tests/inputs/empty.json
- name: Concatenate_Illumina_Lanes_PHB
subclass: WDL
primaryDescriptorPath: /workflows/utilities/file_handling/wf_concatenate_illumina_lanes.wdl
testParameterFiles:
- /tests/inputs/empty.json
3 changes: 2 additions & 1 deletion .github/PULL_REQUEST_TEMPLATE.md
@@ -45,7 +45,8 @@ This PR uses an element that could cause duplicate runs to have different result
- [ ] The workflow/task has been tested and results, including file contents, are as anticipated
- [ ] The CI/CD has been adjusted and tests are passing (Theiagen developers)
- [ ] Code changes follow the [style guide](https://theiagen.notion.site/Style-Guide-WDL-Workflow-Development-51b66a47dde54c798f35d673fff80249)
- [ ] Documentation and/or workflow diagrams have been updated if applicable (Theiagen developers only)
- [ ] Documentation and/or workflow diagrams have been updated if applicable
- [ ] You have updated the latest version for any affected workflows in the respective workflow documentation page and for every entry in the three `workflows_overview` tables.

## 🎯 Reviewer Checklist
<!-- Indicate NA when not applicable -->
3 changes: 2 additions & 1 deletion .gitignore
@@ -1,3 +1,4 @@
cromwell*
_LAST
2024*
site/
10 changes: 6 additions & 4 deletions README.md
@@ -42,30 +42,32 @@ You can expect a careful review of every PR and feedback as needed before mergin

### Authorship

(Ordered by contribution [# of lines changed] as of 2024-08-01)
(Ordered by contribution [# of lines changed] as of 2024-12-04)

* **Sage Wright** ([@sage-wright](https://github.com/sage-wright)) - Conceptualization, Software, Validation, Supervision
* **Inês Mendes** ([@cimendes](https://github.com/cimendes)) - Software, Validation
* **Curtis Kapsak** ([@kapsakcj](https://github.com/kapsakcj)) - Conceptualization, Software, Validation
* **James Otieno** ([@jrotieno](https://github.com/jrotieno)) - Software, Validation
* **Frank Ambrosio** ([@frankambrosio3](https://github.com/frankambrosio3)) - Conceptualization, Software, Validation
* **Michelle Scribner** ([@michellescribner](https://github.com/michellescribner)) - Software, Validation
* **Kevin Libuit** ([@kevinlibuit](https://github.com/kevinlibuit)) - Conceptualization, Project Administration, Software, Validation, Supervision
* **Emma Doughty** ([@emmadoughty](https://github.com/emmadoughty)) - Software, Validation
* **Fraser Combe** ([@fraser-combe](https://github.com/fraser-combe)) - Software, Validation
* **Andrew Page** ([@andrewjpage](https://github.com/andrewjpage)) - Project Administration, Software, Supervision
* **Michal Babinski** ([@Michal-Babins](https://github.com/Michal-Babins)) - Software, Validation
* **Andrew Lang** ([@AndrewLangVt](https://github.com/AndrewLangVt)) - Software, Supervision
* **Kelsey Kropp** ([@kelseykropp](https://github.com/kelseykropp)) - Validation
* **Emily Smith** ([@emily-smith1](https://github.com/emily-smith1)) - Validation
* **Joel Sevinsky** ([@sevinsky](https://github.com/sevinsky)) - Conceptualization, Project Administration, Supervision

### External Contributors

We would like to gratefully acknowledge the following individuals from the public health community for their contributions to the PHB repository:

* **James Otieno** ([@jrotieno](https://github.com/jrotieno))
* **Robert Petit** ([@rpetit3](https://github.com/rpetit3))
* **Emma Doughty** ([@emmadoughty](https://github.com/emmadoughty))
* **Ash O'Farrell** ([@aofarrel](https://github.com/aofarrel))
* **Sam Baird** ([@sam-baird](https://github.com/sam-baird))
* **Holly Halstead** ([@HNHalstead](https://github.com/HNHalstead))
* **Emily Smith** ([@emily-smith1](https://github.com/emily-smith1))

### Maintaining PHB Pipelines

Binary file modified docs/assets/figures/Freyja_FASTQ.png
Binary file modified docs/assets/figures/TheiaProk.png
10 changes: 6 additions & 4 deletions docs/assets/new_workflow_template.md
@@ -4,14 +4,16 @@

| **Workflow Type** | **Applicable Kingdom** | **Last Known Changes** | **Command-line Compatibility** | **Workflow Level** |
|---|---|---|---|---|
| [Workflow Type](../../workflows_overview/workflows_type.md/#link-to-workflow-type) | [Applicable Kingdom](../../workflows_overview/workflows_kingdom.md/#link-to-applicable-kingdom) | PHB <version with last changes> | <command-line compatibility> | <workflow level on terra> |
| [Link to Workflow Type](../../workflows_overview/workflows_type.md/#link-to-workflow-type) | [Link to Applicable Kingdom](../../workflows_overview/workflows_kingdom.md/#link-to-applicable-kingdom) | PHB <version with last changes\> | <command-line compatibility\> | <workflow level on terra (set or sample)\> |

## Workflow_Name_On_Terra

Description of the workflow.

### Inputs

Inputs should be ordered as they appear on Terra

| **Terra Task Name** | **Variable** | **Type** | **Description** | **Default Value** | **Terra Status** |
|---|---|---|---|---|---|
| task_name | **variable_name** | Type | Description | Default Value | Required/Optional |
@@ -24,12 +26,12 @@ Description of the workflow tasks
Description of the task

!!! techdetails "Tool Name Technical Details"
| | Links |
| --- | --- |
| | Links |
| --- | --- |
| Task | [link to task on GitHub] |
| Software Source Code | [link to tool's source code] |
| Software Documentation | [link to tool's documentation] |
| Original Publication | [link to tool's publication] |
| Original Publication(s) | [link to tool's publication] |

### Outputs

400 changes: 243 additions & 157 deletions docs/contributing/code_contribution.md

Large diffs are not rendered by default.

114 changes: 67 additions & 47 deletions docs/contributing/doc_contribution.md
@@ -14,7 +14,7 @@ To test your documentation changes, you will need to have the following packages
pip install mkdocs-material mkdocs-material-extensions mkdocs-git-revision-date-localized-plugin mike mkdocs-glightbox
```

The live preview server can be activated by running the following command:
Once installed, navigate to the top directory in PHB. The live preview server can be activated by running the following command:

```bash
mkdocs serve
@@ -34,49 +34,7 @@ Here are some VSCode Extensions that can help you write and edit your markdown files

- [Excel to Markdown Table](https://tableconvert.com/excel-to-markdown) - This website will convert an Excel table into markdown format, which can be copied and pasted into your markdown file.
- [Material for MkDocs Reference](https://squidfunk.github.io/mkdocs-material/reference/) - This is the official reference for the Material for MkDocs theme, which will help you understand how to use the theme's features.
- [Broken Link Check](https://www.brokenlinkcheck.com/) - This website will scan your website to ensure that all links are working correctly. This will only work on the deployed version of the documentation, not the local version.

## Documentation Structure

A brief description of the documentation structure is as follows:

- `docs/` - Contains the Markdown files for the documentation.
- `assets/` - Contains images and other files used in the documentation.
- `figures/` - Contains images, figures, and workflow diagrams used in the documentation. For workflows that contain many images (such as BaseSpace_Fetch), it is recommended to create a subdirectory for the workflow.
- `files/` - Contains files that are used in the documentation. This may include example outputs or templates. For workflows that contain many files (such as TheiaValidate), it is recommended to create a subdirectory for the workflow.
- `logos/` - Contains Theiagen logos and symbols used int he documentation.
- `metadata_formatters/` - Contains the most up-to-date metadata formatters for our submission workflows.
- `new_workflow_template.md` - A template for adding a new workflow page to the documentation.
- `contributing/` - Contains the Markdown files for our contribution guides, such as this file
- `javascripts/` - Contains JavaScript files used in the documentation.
- `tablesort.js` - A JavaScript file used to enable table sorting in the documentation.
- `overrides/` - Contains HTMLs used to override theme defaults
- `main.html` - Contains the HTML used to display a warning when the latest version is not selected
- `stylesheets/` - Contains CSS files used in the documentation.
- `extra.css` - A custom CSS file used to style the documentation; contains all custom theme elements (scrollable tables, resizable columns, Theiagen colors), and custom admonitions.
- `workflows/` - Contains the Markdown files for each workflow, organized into subdirectories by workflow category
- `workflows_overview/` - Contains the Markdown files for the overview tables for each display type: alphabetically, by applicable kingdom, and by workflow type.
- `index.md` - The home/landing page for our documentation.

### Adding a Page for a New Workflow {#new-page}

If you are adding a new workflow, there are a number of things to do in order to include the page in the documentation:

1. Add a page with the title of the workflow to the appropriate subdirectory in `docs/workflows/`. Feel free to use the template found in the `assets/` folder.
2. Collect the following information for your new workflow:
- Workflow Name - Link the name with a relative path to the workflow page in appropriate `docs/workflows/` subdirectory
- Workflow Description - Brief description of the workflow
- Applicable Kingdom - Options: "Any taxa", "Bacteria", "Mycotics", "Viral"
- Workflow Level (_on Terra_) - Options: "Sample-level", "Set-level", or neither
- Command-line compatibility - Options: "Yes", "No", and/or "Some optional features incompatible"
- The version where the last known changes occurred (likely the upcoming version if it is a new workflow)
- Link to the workflow on Dockstore (if applicable) - Workflow name linked to the information tab on Dockstore.
3. Format this information in a table.
4. Copy the previously gathered information to ==**ALL THREE**== overview tables in `docs/workflows_overview/`:
- `workflows_alphabetically.md` - Add the workflow in the appropriate spot based on the workflow name.
- `workflows_kingdom.md` - Add the workflow in the appropriate spot(s) based on the kingdom(s) the workflow is applicable to. Make sure it is added alphabetically within the appropriate subsection(s).
- `workflows_type.md` - Add the workflow in the appropriate spot based on the workflow type. Make sure it is added alphabetically within the appropriate subsection.
5. Copy the path to the workflow to ==**ALL**== of the appropriate locations in the `mkdocs.yml` file (under the `nav:` section) in the main directory of this repository. These should be the exact same spots as in the overview tables but without additional information. This ensures the workflow can be accessed from the navigation sidebar.
- [Dead Link Check](https://www.deadlinkchecker.com/) - This website will scan your website to ensure that all links are working correctly. This will only work on the deployed version of the documentation, not the local version.

## Standard Language & Formatting Conventions

@@ -98,10 +56,11 @@ The following language conventions should be followed when writing documentation
- **Bold Text** - Use `**bold text**` to indicate text that should be bolded.
- _Italicized Text_ - Use `_italicized text_` to indicate text that should be italicized.
- ==Highlighted Text== - Use `==highlighted text==` to indicate text that should be highlighted.
- `Code` - Use \`code\` to indicate text that should be formatted as code.
- `Code` - Use `` `code` `` (backticks) to indicate text that should be formatted as code.
- ^^Underlined Text^^ - Use `^^underlined text^^` to indicate text that should be underlined (works with our theme; not all Markdown renderers support this).
- > Citations
- Use a `>` to activate quote formatting for a citation. Make sure to separate multiple citations with a comment line (`<!-- -->`) to prevent the citations from running together.
- Use a reputable citation style (e.g., Vancouver, Nature, etc.) for all citations.
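    For example, two adjacent citations separated by a comment line might look like this (citation text is illustrative):

    ```markdown
    > Author A, et al. First example citation. Example Journal. 2020.

    <!-- -->

    > Author B, et al. Second example citation. Example Journal. 2021.
    ```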
- Callouts/Admonitions - These features are called "call-outs" in Notion, but are "Admonitions" in MkDocs. [I highly recommend referring to the Material for MkDocs documentation page on Admonitions to learn how best to use this feature](https://squidfunk.github.io/mkdocs-material/reference/admonitions/). Use the following syntax to create a callout:

```markdown
@@ -116,26 +75,45 @@ The following language conventions should be followed when writing documentation
!!! dna
This is a DNA admonition. Admire the cute green DNA emoji. You can create this with the `!!! dna` syntax.

Use this admonition when wanting to convey general information or highlight specific facts.

???+ toggle
This is a toggle-able section. The emoji is an arrow pointing to the right downward. You can create this with the `??? toggle` syntax. I have added a `+` at the end of the question marks to make it open by default.

Use this admonition when wanting to provide additional _optional_ information or details that are not strictly necessary, or take up a lot of space.

???+ task
This is a toggle-able section **for a workflow task**. The emoji is a gear. Use the `??? task` syntax to create this admonition. Use `!!! task` if you want to have it be permanently expanded. I have added a `+` at the end of the question marks to make this admonition open by default and still enable its collapse.

Use this admonition when providing details on a workflow, task, or tool.

!!! caption
This is a caption. The emoji is a painting. You can create this with the `!!! caption` syntax. This is used to enclose an image in a box and looks nice. A caption can be added beneath the picture and will also look nice.
This is a caption. The emoji is a painting. You can create this with the `!!! caption` syntax. A caption can be added beneath the picture and will also look nice.

Use this admonition when including images or diagrams in the documentation.

!!! techdetails
This is where you will put technical details for a workflow task. You can create this with the `!!! techdetails` syntax.

Use this admonition when providing technical details for a workflow task or tool. These admonitions should include the following table:

| | Links |
| --- | --- |
| Task | [link to the task file in the PHB repository on GitHub] |
| Software Source Code | [link to tool's source code] |
| Software Documentation | [link to tool's documentation] |
| Original Publication(s) | [link to tool's publication] |

If any of these items are unfillable, delete the row.

- Images - Use the following syntax to insert an image:

```markdown
!!! caption "Image Title"
![Alt Text](/path/to/image.png)
```

- Indentation - **_FOUR_** spaces are required instead of the typical two. This is a side effect of using this theme. If you use two spaces, the list and/or indentations will not render correctly. This will make your linter sad :(
- Indentation - **_FOUR_** spaces are required instead of the typical two. This is a side effect of using this theme. If you use two spaces, the list and/or indentations will not render correctly. This will make your linter sad :(

```markdown
- first item
@@ -160,3 +138,45 @@ The following language conventions should be followed when writing documentation
```

- End all pages with an empty line

## Documentation Structure

A brief description of the documentation structure is as follows:

- `docs/` - Contains the Markdown files for the documentation.
- `assets/` - Contains images and other files used in the documentation.
- `figures/` - Contains images, figures, and workflow diagrams used in the documentation. For workflows that contain many images (such as BaseSpace_Fetch), it is recommended to create a subdirectory for the workflow.
- `files/` - Contains files that are used in the documentation. This may include example outputs or templates. For workflows that contain many files (such as TheiaValidate), it is recommended to create a subdirectory for the workflow.
- `logos/` - Contains Theiagen logos and symbols used in the documentation.
- `metadata_formatters/` - Contains the most up-to-date metadata formatters for our submission workflows.
- `new_workflow_template.md` - A template for adding a new workflow page to the documentation. You can see this template [here](../assets/new_workflow_template.md).
- `contributing/` - Contains the Markdown files for our contribution guides, such as this file
- `javascripts/` - Contains JavaScript files used in the documentation.
- `tablesort.js` - A JavaScript file used to enable table sorting in the documentation.
- `overrides/` - Contains HTMLs used to override theme defaults
- `main.html` - Contains the HTML used to display a warning when the latest version is not selected
- `stylesheets/` - Contains CSS files used in the documentation.
- `extra.css` - A custom CSS file used to style the documentation; contains all custom theme elements (scrollable tables, resizable columns, Theiagen colors), and custom admonitions.
- `workflows/` - Contains the Markdown files for each workflow, organized into subdirectories by workflow category
- `workflows_overview/` - Contains the Markdown files for the overview tables for each display type: alphabetically, by applicable kingdom, and by workflow type.
- `index.md` - The home/landing page for our documentation.

### Adding a Page for a New Workflow {#new-page}

If you are adding a new workflow, there are a number of things to do in order to include the page in the documentation:

1. Add a page with the title of the workflow to the appropriate subdirectory in `docs/workflows/`. Feel free to use the template found in the `assets/` folder.
2. Collect the following information for your new workflow:
- Workflow Name - Link the name with a relative path to the workflow page in appropriate `docs/workflows/` subdirectory
- Workflow Description - Brief description of the workflow
- Applicable Kingdom - Options: "Any taxa", "Bacteria", "Mycotics", "Viral"
- Workflow Level (_on Terra_) - Options: "Sample-level", "Set-level", or neither
- Command-line compatibility - Options: "Yes", "No", and/or "Some optional features incompatible"
- The version where the last known changes occurred (likely the upcoming version if it is a new workflow)
- Link to the workflow on Dockstore (if applicable) - Workflow name linked to the information tab on Dockstore.
3. Format this information in a table.
4. Copy the previously gathered information to ==**ALL THREE**== overview tables in `docs/workflows_overview/`:
- `workflows_alphabetically.md` - Add the workflow in the appropriate spot based on the workflow name.
- `workflows_kingdom.md` - Add the workflow in the appropriate spot(s) based on the kingdom(s) the workflow is applicable to. Make sure it is added alphabetically within the appropriate subsection(s).
- `workflows_type.md` - Add the workflow in the appropriate spot based on the workflow type. Make sure it is added alphabetically within the appropriate subsection.
5. Copy the path to the workflow to ==**ALL**== of the appropriate locations in the `mkdocs.yml` file (under the `nav:` section) in the main directory of this repository. These should be the exact same spots as in the overview tables but without additional information. This ensures the workflow can be accessed from the navigation sidebar.
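For illustration, a minimal sketch of what a `nav:` entry in `mkdocs.yml` might look like (the workflow name, grouping, and path are placeholders; match the nesting already present in the file):

```yaml
nav:
  - Workflows:
      - Genomic Characterization:
          - TheiaProk: workflows/genomic_characterization/theiaprok.md
```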
12 changes: 7 additions & 5 deletions docs/index.md
@@ -46,7 +46,7 @@ When undertaking genomic analysis using the command-line, via Terra, or other da
We continuously work to improve our codebase and usability of our workflows by the public health community, so changes from version to version are expected. This documentation page reflects the state of the workflow at the version stated in the title.

!!! dna "What's new?"
You can see the changes since PHB v2.2.0 [**here**](https://theiagen.notion.site/Public-Health-Bioinformatics-v2-2-1-Patch-Release-Notes-104cb013bc9380bcbd70dab04bf671a8?pvs=74)!
You can see the changes since PHB v2.2.1 [**here**](https://theiagen.notion.site/public-health-bioinformatics-v2-3-0-minor-release-notes?pvs=4)!

## Contributing to the PHB Repository

@@ -60,30 +60,32 @@ You can expect a careful review of every PR and feedback as needed before mergin

### Authorship

(Ordered by contribution [# of lines changed] as of 2024-08-01)
(Ordered by contribution [# of lines changed] as of 2024-12-04)

- **Sage Wright** ([@sage-wright](https://github.com/sage-wright)) - Conceptualization, Software, Validation, Supervision
- **Inês Mendes** ([@cimendes](https://github.com/cimendes)) - Software, Validation
- **Curtis Kapsak** ([@kapsakcj](https://github.com/kapsakcj)) - Conceptualization, Software, Validation
- **James Otieno** ([@jrotieno](https://github.com/jrotieno)) - Software, Validation
- **Frank Ambrosio** ([@frankambrosio3](https://github.com/frankambrosio3)) - Conceptualization, Software, Validation
- **Michelle Scribner** ([@michellescribner](https://github.com/michellescribner)) - Software, Validation
- **Kevin Libuit** ([@kevinlibuit](https://github.com/kevinlibuit)) - Conceptualization, Project Administration, Software, Validation, Supervision
- **Emma Doughty** ([@emmadoughty](https://github.com/emmadoughty)) - Software, Validation
- **Fraser Combe** ([@fraser-combe](https://github.com/fraser-combe)) - Software, Validation
- **Andrew Page** ([@andrewjpage](https://github.com/andrewjpage)) - Project Administration, Software, Supervision
- **Michal Babinski** ([@Michal-Babins](https://github.com/Michal-Babins)) - Software, Validation
- **Andrew Lang** ([@AndrewLangVt](https://github.com/AndrewLangVt)) - Software, Supervision
- **Kelsey Kropp** ([@kelseykropp](https://github.com/kelseykropp)) - Validation
- **Emily Smith** ([@emily-smith1](https://github.com/emily-smith1)) - Validation
- **Joel Sevinsky** ([@sevinsky](https://github.com/sevinsky)) - Conceptualization, Project Administration, Supervision

### External Contributors

We would like to gratefully acknowledge the following individuals from the public health community for their contributions to the PHB repository:

- **James Otieno** ([@jrotieno](https://github.com/jrotieno))
- **Robert Petit** ([@rpetit3](https://github.com/rpetit3))
- **Emma Doughty** ([@emmadoughty](https://github.com/emmadoughty))
- **Ash O'Farrell** ([@aofarrel](https://github.com/aofarrel))
- **Sam Baird** ([@sam-baird](https://github.com/sam-baird))
- **Holly Halstead** ([@HNHalstead](https://github.com/HNHalstead))
- **Emily Smith** ([@emily-smith1](https://github.com/emily-smith1))

### On the Shoulder of Giants

64 changes: 64 additions & 0 deletions docs/javascripts/table-search.js
@@ -0,0 +1,64 @@
function addTableSearch() {
// Select all containers with the class 'searchable-table'
const containers = document.querySelectorAll('.searchable-table');

containers.forEach((container) => {
// Find the table within this container
const table = container.querySelector('table');

if (table) {
// Ensure we don't add multiple search boxes
if (!container.querySelector('input[type="search"]')) {
// Create the search input element
const searchInput = document.createElement("input");
searchInput.setAttribute("type", "search");
searchInput.setAttribute("placeholder", "Search table...");
searchInput.classList.add('table-search-input');
searchInput.style.marginBottom = "10px";
searchInput.style.display = "block";

// Insert the search input before the table
container.insertBefore(searchInput, container.firstChild);

// Add event listener for table search
searchInput.addEventListener("input", function () {
const filter = searchInput.value.toUpperCase();
const rows = table.getElementsByTagName("tr");

for (let i = 1; i < rows.length; i++) { // Skip header row
const cells = rows[i].getElementsByTagName("td");
let match = false;

for (let j = 0; j < cells.length; j++) {
if (cells[j].innerText.toUpperCase().includes(filter)) {
match = true;
break;
}
}

rows[i].style.display = match ? "" : "none";
}
});
}
} else {
console.log('Table not found within container.');
}
});
}

// Run on page load
addTableSearch();

// Reapply search bar on page change
function observeDOMChanges() {
const targetNode = document.querySelector('body');
const config = { childList: true, subtree: true };

const observer = new MutationObserver(() => {
addTableSearch();
});

observer.observe(targetNode, config);
}

observeDOMChanges();
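This script pairs with the `searchable-table` wrapper class added to the workflow documentation pages in this commit. A minimal sketch of the markdown side (the table row is illustrative):

```markdown
<div class="searchable-table" markdown="1">

| **Terra Task Name** | **Variable** | **Type** |
|---|---|---|
| example_task | **example_variable** | String |

</div>
```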
5 changes: 0 additions & 5 deletions docs/overrides/main.html
@@ -6,8 +6,3 @@
<strong>Click here to go to the latest version release.</strong>
</a>
{% endblock %}


{% block announce %}
<center>🏗️ I'm under construction! Pardon the dust while we remodel! 👷</center>
{% endblock %}
29 changes: 29 additions & 0 deletions docs/stylesheets/extra.css
@@ -184,5 +184,34 @@ th {
td {
word-break: break-all;
}
/* Base styles for the search box */
div.searchable-table input.table-search-input {
width: 25%;
padding: 10px;
margin-bottom: 12px;
font-size: 12px;
box-sizing: border-box;
border-radius: 2px;
}

/* Light mode styles */
[data-md-color-scheme="light"] div.searchable-table input.table-search-input {
background-color: #fff;
color: #000;
border: 1px solid #E0E1E1;
}
[data-md-color-scheme="light"] div.searchable-table input.table-search-input::placeholder {
color: #888;
font-style: italic;
}

/* Dark mode styles */
[data-md-color-scheme="slate"] div.searchable-table input.table-search-input {
background-color: #1d2125;
color: #fff;
border: 1px solid #373B40;
}
[data-md-color-scheme="slate"] div.searchable-table input.table-search-input::placeholder {
color: #bbb;
font-style: italic;
}
4 changes: 4 additions & 0 deletions docs/workflows/data_export/concatenate_column_content.md
@@ -16,6 +16,8 @@ This set-level workflow will create a file containing all of the items from a gi

This workflow runs on the set level.

<div class="searchable-table" markdown="1">

| **Terra Task Name** | **Variable** | **Type** | **Description** | **Default Value** | **Terra Status** |
|---|---|---|---|---|---|
| concatenate_column_content | **concatenated_file_name** | String | The name of the output file. ***Include the extension***, such as ".fasta" or ".txt". | | Required |
@@ -28,6 +30,8 @@ This workflow runs on the set level.
| version_capture | **docker** | String | The Docker container to use for the task | "us-docker.pkg.dev/general-theiagen/theiagen/alpine-plus-bash:3.20.0" | Optional |
| version_capture | **timezone** | String | Set the time zone to get an accurate date of analysis (uses UTC by default) | | Optional |

</div>

### Outputs

!!! info
4 changes: 4 additions & 0 deletions docs/workflows/data_export/transfer_column_content.md
@@ -25,6 +25,8 @@ This set-level workflow will transfer all of the items from a given column in a

This workflow runs on the set level.

<div class="searchable-table" markdown="1">

| **Terra Task name** | **input_variable** | **Type** | **Description** | **Default attribute** | **Status** |
|---|---|---|---|---|---|
| transfer_column_content | **files_to_transfer** | Array[File] | The column that has the files you want to concatenate. | | Required |
@@ -36,6 +38,8 @@ This workflow runs on the set level.
| version_capture | **docker** | String | The Docker container to use for the task | "us-docker.pkg.dev/general-theiagen/theiagen/alpine-plus-bash:3.20.0" | Optional |
| version_capture | **timezone** | String | Set the time zone to get an accurate date of analysis (uses UTC by default) | | Optional |

</div>

### Outputs

!!! info
4 changes: 4 additions & 0 deletions docs/workflows/data_export/zip_column_content.md
@@ -16,6 +16,8 @@ This workflow will create a zip file that contains all of the items in a column

This workflow runs on the set level.

<div class="searchable-table" markdown="1">

| **Terra Task Name** | **Variable** | **Type** | **Description** | **Default Value** | **Terra Status** |
|---|---|---|---|---|---|
| zip_column_content | **files_to_zip** | Array[File] | The column that has the files you want to zip. | | Required |
@@ -27,6 +29,8 @@ This workflow runs on the set level.
| version_capture | **docker** | String | The Docker container to use for the task | "us-docker.pkg.dev/general-theiagen/theiagen/alpine-plus-bash:3.20.0" | Optional |
| version_capture | **timezone** | String | Set the time zone to get an accurate date of analysis (uses UTC by default) | | Optional |

</div>

### Outputs

!!! info
10 changes: 9 additions & 1 deletion docs/workflows/data_import/assembly_fetch.md
@@ -23,6 +23,8 @@ Assembly_Fetch requires the input samplename, and either the accession for a ref

This workflow runs on the sample level.

<div class="searchable-table" markdown="1">

| **Terra Task Name** | **Variable** | **Type** | **Description** | **Default Value** | **Terra Status** |
|---|---|---|---|---|---|
| reference_fetch | **samplename** | String | Your sample's name | | Required |
@@ -44,6 +46,8 @@ This workflow runs on the sample level.
| version_capture | **docker** | String | The Docker container to use for the task | "us-docker.pkg.dev/general-theiagen/theiagen/alpine-plus-bash:3.20.0" | Optional |
| version_capture | **timezone** | String | Set the time zone to get an accurate date of analysis (uses UTC by default) | | Optional |

</div>

### Analysis Tasks

??? task "ReferenceSeeker (optional) Details"
@@ -90,6 +94,8 @@ This workflow runs on the sample level.

### Outputs

<div class="searchable-table" markdown="1">

| **Variable** | **Type** | **Description** |
|---|---|---|
| assembly_fetch_analysis_date | String | Date of assembly download |
@@ -101,11 +107,13 @@ This workflow runs on the sample level.
| assembly_fetch_ncbi_datasets_version | String | NCBI datasets version used |
| assembly_fetch_referenceseeker_database | String | ReferenceSeeker database used |
| assembly_fetch_referenceseeker_docker | String | Docker file used for ReferenceSeeker |
| assembly_fetch_referenceseeker_top_hit_ncbi_accession | String | NCBI Accession for the top it identified by Assembly_Fetch |
| assembly_fetch_referenceseeker_top_hit_ncbi_accession | String | NCBI Accession for the top hit identified by Assembly_Fetch |
| assembly_fetch_referenceseeker_tsv | File | TSV file of the top hits between the query genome and the Reference Seeker database |
| assembly_fetch_referenceseeker_version | String | ReferenceSeeker version used |
| assembly_fetch_version | String | The version of the repository the Assembly Fetch workflow is in |

</div>

## References

> **ReferenceSeeker:** Schwengers O, Hain T, Chakraborty T, Goesmann A. ReferenceSeeker: rapid determination of appropriate reference genomes. J Open Source Softw. 2020 Feb 4;5(46):1994.
4 changes: 4 additions & 0 deletions docs/workflows/data_import/basespace_fetch.md
@@ -153,6 +153,8 @@ This process must be performed on a command-line (ideally on a Linux or MacOS co

This workflow runs on the sample level.

<div class="searchable-table" markdown="1">

| **Terra Task Name** | **Variable** | **Type** | **Description** | **Default Value** | **Terra Status** |
|---|---|---|---|---|---|
| basespace_fetch | **access_token** | String | The access token is used in place of a username and password to allow the workflow to access the user account in BaseSpace from which the data is to be transferred. It is an alphanumeric string that is 32 characters in length. Example: 9e08a96471df44579b72abf277e113b7 | | Required |
@@ -168,6 +170,8 @@ This workflow runs on the sample level.
| version_capture | **docker** | String | The Docker container to use for the task | "us-docker.pkg.dev/general-theiagen/theiagen/alpine-plus-bash:3.20.0" | Optional |
| version_capture | **timezone** | String | Set the time zone to get an accurate date of analysis (uses UTC by default) | | Optional |

</div>

### **Outputs**

The outputs of this workflow will be the fastq files imported from BaseSpace into the data table where the sample ID information had originally been uploaded.
4 changes: 4 additions & 0 deletions docs/workflows/data_import/create_terra_table.md
@@ -19,6 +19,8 @@ The manual creation of Terra tables can be tedious and error-prone. This workflo

**_This can be changed_** by providing information in the `file_ending` optional input parameter. See below for more information.

<div class="searchable-table" markdown="1">

| **Terra Task Name** | **Variable** | **Type** | **Description** | **Default Value** | **Terra Status** |
|---|---|---|---|---|---|
| create_terra_table | **assembly_data** | Boolean | Set to true if your data is in FASTA format; set to false if your data is FASTQ format | | Required |
@@ -33,6 +35,8 @@ The manual creation of Terra tables can be tedious and error-prone. This workflo
| make_table | **docker** | String | The Docker container to use for the task | "us-docker.pkg.dev/general-theiagen/theiagen/terra-tools:2023-06-21" | Optional |
| make_table | **memory** | Int | Amount of memory/RAM (in GB) to allocate to the task | 4 | Optional |

</div>

### Finding the `data_location_path`

#### Using the Terra data uploader
8 changes: 8 additions & 0 deletions docs/workflows/data_import/sra_fetch.md
@@ -16,6 +16,8 @@ Read files associated with the SRA run accession provided as input are copied to

This workflow runs on the sample level.

<div class="searchable-table" markdown="1">

| **Terra Task Name** | **Variable** | **Type** | **Description** | **Default Value** | **Terra Status** |
|---|---|---|---|---|---|
| fetch_sra_to_fastq | **sra_accession** | String | SRA, ENA, or DRA accession number | | Required |
@@ -25,6 +27,8 @@ This workflow runs on the sample level.
| fetch_sra_to_fastq | **fastq_dl_options** | String | Additional parameters to pass to fastq_dl from [here](https://github.com/rpetit3/fastq-dl?tab=readme-ov-file#usage) | "--provider sra" | Optional |
| fetch_sra_to_fastq | **memory** | Int | Amount of memory/RAM (in GB) to allocate to the task | 8 | Optional |

</div>

The only required input for the SRA_Fetch workflow is an SRA run accession beginning with "SRR", an ENA run accession beginning with "ERR", or a DRA run accession beginning with "DRR".

Please see the [NCBI Metadata and Submission Overview](https://www.ncbi.nlm.nih.gov/sra/docs/submitmeta/) for assistance with identifying accessions. Briefly, NCBI-accessioned objects have the following naming scheme:
@@ -41,6 +45,8 @@ Read data are available either with full base quality scores (**SRA Normalized F

Given the limited usefulness of SRA Lite-formatted FASTQ files, we try to avoid them by selecting SRA directly as the download provider (the SRA-Lite copy is more likely to be the file synced to other repositories), but sometimes downloading these files is unavoidable. To make the user aware of this, a warning column is populated whenever an SRA-Lite file is detected.

<div class="searchable-table" markdown="1">

| **Variable** | **Type** | **Description** | **Production Status** |
|---|---|---|---|
| read1 | File | File containing the forward reads | Always produced |
@@ -51,6 +57,8 @@ Given the lack of usefulness of SRA Lite formatted FASTQ files, we try to avoid
| fastq_dl_version | String | Fastq_dl version used | Always produced |
| fastq_dl_warning | String | This warning field is populated if SRA-Lite files are detected. These files contain all quality encoding as Phred-30 or Phred-3. | Depends on internal workflow logic |

</div>

## References

> This workflow relies on [fastq-dl](https://github.com/rpetit3/fastq-dl), a very handy bioinformatics tool by Robert A. Petit III
76 changes: 55 additions & 21 deletions docs/workflows/genomic_characterization/freyja.md

Large diffs are not rendered by default.

12 changes: 12 additions & 0 deletions docs/workflows/genomic_characterization/pangolin_update.md
@@ -14,6 +14,8 @@ The Pangolin_Update workflow re-runs Pangolin updating prior lineage calls from

This workflow runs on the sample level.

<div class="searchable-table" markdown="1">

| **Terra Task Name** | **Variable** | **Type** | **Description** | **Default Value** | **Terra Status** |
|---|---|---|---|---|---|
| pangolin_update | **assembly_fasta** | File | SARS-CoV-2 assembly file in FASTA format | | Required |
@@ -42,8 +44,12 @@ This workflow runs on the sample level.
| version_capture | **docker** | String | The Docker container to use for the task | "us-docker.pkg.dev/general-theiagen/theiagen/alpine-plus-bash:3.20.0" | Optional |
| version_capture | **timezone** | String | Set the time zone to get an accurate date of analysis (uses UTC by default) | | Optional |

</div>

### Outputs

<div class="searchable-table" markdown="1">

| **Variable** | **Type** | **Description** |
|---|---|---|
| **pango_lineage** | String | Pango lineage as determined by Pangolin |
@@ -58,3 +64,9 @@ This workflow runs on the sample level.
| **pangolin_update_version** | String | Version of the Public Health Bioinformatics (PHB) repository used |
| **pangolin_updates** | String | Result of Pangolin Update (lineage changed versus unchanged) with lineage assignment and date of analysis |
| **pangolin_versions** | String | All Pangolin software and database versions |

</div>

## References

> **Pangolin**: Rambaut A, Holmes EC, O'Toole Á, Hill V, McCrone JT, Ruis C, du Plessis L, Pybus OG. A dynamic nomenclature proposal for SARS-CoV-2 lineages to assist genomic epidemiology. Nat Microbiol. 2020 Nov;5(11):1403-1407. doi: 10.1038/s41564-020-0770-5. Epub 2020 Jul 15. PMID: 32669681; PMCID: PMC7610519.
174 changes: 132 additions & 42 deletions docs/workflows/genomic_characterization/theiacov.md

Large diffs are not rendered by default.

289 changes: 220 additions & 69 deletions docs/workflows/genomic_characterization/theiaeuk.md

Large diffs are not rendered by default.

80 changes: 66 additions & 14 deletions docs/workflows/genomic_characterization/theiameta.md

Large diffs are not rendered by default.

193 changes: 148 additions & 45 deletions docs/workflows/genomic_characterization/theiaprok.md

Large diffs are not rendered by default.

6 changes: 5 additions & 1 deletion docs/workflows/genomic_characterization/vadr_update.md
@@ -5,7 +5,7 @@

| **Workflow Type** | **Applicable Kingdom** | **Last Known Changes** | **Command-line Compatibility** | **Workflow Level** |
|---|---|---|---|---|
| [Genomic Characterization](../../workflows_overview/workflows_type.md/#genomic-characterization) | [Viral](../../workflows_overview/workflows_kingdom.md/#viral) | PHB v1.2.1 | Yes | Sample-level |
| [Genomic Characterization](../../workflows_overview/workflows_type.md/#genomic-characterization) | [Viral](../../workflows_overview/workflows_kingdom.md/#viral) | PHB v2.2.0 | Yes | Sample-level |

## Vadr_Update_PHB

@@ -29,6 +29,8 @@ Please note the default values are for SARS-CoV-2.

This workflow runs on the sample level.

<div class="searchable-table" markdown="1">

| **Terra Task Name** | **Variable** | **Type** | **Description** | **Default Value** | **Terra Status** |
|---|---|---|---|---|---|
| vadr_update | **assembly_length_unambiguous** | Int | Number of unambiguous basecalls within the consensus assembly | | Required |
@@ -44,6 +46,8 @@ This workflow runs on the sample level.
| version_capture | **docker** | String | The Docker container to use for the task | "us-docker.pkg.dev/general-theiagen/theiagen/alpine-plus-bash:3.20.0" | Optional |
| version_capture | **timezone** | String | Set the time zone to get an accurate date of analysis (uses UTC by default) | | Optional |

</div>

### Outputs

| **Variable** | **Type** | **Description** |
36 changes: 27 additions & 9 deletions docs/workflows/phylogenetic_construction/augur.md
@@ -4,7 +4,7 @@

| **Workflow Type** | **Applicable Kingdom** | **Last Known Changes** | **Command-line Compatibility** | **Workflow Level** |
|---|---|---|---|---|
| [Phylogenetic Construction](../../workflows_overview/workflows_type.md/#phylogenetic-construction) | [Viral](../../workflows_overview/workflows_kingdom.md/#viral) | PHB v2.1.0 | Yes | Sample-level, Set-level |
| [Phylogenetic Construction](../../workflows_overview/workflows_type.md/#phylogenetic-construction) | [Viral](../../workflows_overview/workflows_kingdom.md/#viral) | PHB v2.3.0 | Yes | Sample-level, Set-level |

## Augur Workflows

@@ -14,10 +14,10 @@ Two workflows are offered: **Augur_Prep_PHB** and **Augur_PHB**. These must be r

!!! dna "**Helpful resources for epidemiological interpretation**"

- [introduction to Nextstrain](https://www.cdc.gov/amd/training/covid-toolkit/module3-1.html) (which includes Auspice)
- guide to Nextstrain [interactive trees](https://www.cdc.gov/amd/training/covid-toolkit/module3-4.html)
- an [introduction to UShER](https://www.cdc.gov/amd/training/covid-toolkit/module3-3.html)
- a video about [how to read trees](https://www.cdc.gov/amd/training/covid-toolkit/module1-3.html) if this is new to you
- [introduction to Nextstrain](https://www.cdc.gov/advanced-molecular-detection/php/training/module-3-1.html) (which includes Auspice)
- guide to Nextstrain [interactive trees](https://www.cdc.gov/advanced-molecular-detection/php/training/module-3-4.html)
- an [introduction to UShER](https://www.cdc.gov/advanced-molecular-detection/php/training/module-3-3.html)
- a video about [how to read trees](https://www.cdc.gov/advanced-molecular-detection/php/training/module-1-3.html) if this is new to you
- documentation on [how to identify SARS-CoV-2 recombinants](https://github.com/pha4ge/pipeline-resources/blob/main/docs/sc2-recombinants.md)

### Augur_Prep_PHB
@@ -30,6 +30,8 @@ The Augur_Prep_PHB workflow takes assembly FASTA files and associated metadata f

This workflow runs on the sample level.

<div class="searchable-table" markdown="1">

| **Terra Task Name** | **Variable** | **Type** | **Description** | **Default Value** | **Terra Status** |
|---|---|---|---|---|---|
| augur_prep | **assembly** | File | Assembly/consensus file (single FASTA file per sample) | | Required |
@@ -48,6 +50,8 @@ This workflow runs on the sample level.
| version_capture | **docker** | String | The Docker container to use for the task | "us-docker.pkg.dev/general-theiagen/theiagen/alpine-plus-bash:3.20.0" | Optional |
| version_capture | **timezone** | String | Set the time zone to get an accurate date of analysis (uses UTC by default) | | Optional |

</div>

#### Augur_Prep Outputs

| **Variable** | **Type** | **Description** |
@@ -70,7 +74,7 @@ The Augur_PHB workflow takes in a ***set*** of SARS-CoV-2 (or any other viral
!!! dna "Optional Inputs"
There are **many** optional user inputs. For SARS-CoV-2, Flu, rsv-a, rsv-b, and mpxv, default values that mimic the NextStrain builds have been preselected. To use these defaults, you must write either `"sars-cov-2"`,`"flu"`, `"rsv-a"`, `"rsv-b"`, or `"mpxv"` for the `organism` variable.

For Flu - it is **required** to set `flu_segment` to either `"HA"` or `"NA"` & `flu_subtype` to either `"H1N1"` or `"H3N2"` or `"Victoria"` or `"Yamagata"` depending on your set of samples.
For Flu - it is **required** to set `flu_segment` to either `"HA"` or `"NA"` & `flu_subtype` to either `"H1N1"` or `"H3N2"` or `"Victoria"` or `"Yamagata"` or `"H5N1"` (`"H5N1"` will only work with `"HA"`) depending on your set of samples.

???+ toggle "A Note on Optional Inputs"
??? toggle "Default values for SARS-CoV-2"
@@ -121,6 +125,11 @@ The Augur_PHB workflow takes in a ***set*** of SARS-CoV-2 (or any other viral
- clades_tsv = `"gs://theiagen-public-files-rp/terra/flu-references/clades_yam_ha.tsv"`
- NA
- reference_fasta = `"gs://theiagen-public-files-rp/terra/flu-references/reference_yam_na.gb"`
??? toggle "H5N1"
- auspice_config = `"gs://theiagen-public-files-rp/terra/flu-references/auspice_config_h5n1.json"`
- HA
- reference_fasta = `"gs://theiagen-public-files-rp/terra/flu-references/reference_h5n1_ha.gb"`
- clades_tsv = `"gs://theiagen-public-files-rp/terra/flu-references/h5nx-clades.tsv"`

??? toggle "Default values for MPXV"
- min_num_unambig = 150000
@@ -165,6 +174,8 @@ The Augur_PHB workflow takes in a ***set*** of SARS-CoV-2 (or any other viral

This workflow runs on the set level. Please note that for every task, runtime parameters are modifiable (cpu, disk_size, docker, and memory); most of these values have been excluded from the table below for convenience.

<div class="searchable-table" markdown="1" width=100vw>

| **Terra Task Name** | **Variable** | **Type** | **Description** | **Default Value** | **Terra Status** |
|---|---|---|---|---|---|
| augur | **assembly_fastas** | Array[File] | An array of the assembly files to use; use either the HA or NA segment for flu samples | | Required |
@@ -173,7 +184,7 @@ This workflow runs on the set level. Please note that for every task, runtime pa
| augur | **clades_tsv** | File | TSV file containing clade mutation positions in four columns | Defaults are organism-specific. Please find default values for all organisms (and for Flu - their respective genome segments and subtypes) here: <https://github.com/theiagen/public_health_bioinformatics/blob/main/workflows/utilities/wf_organism_parameters.wdl>. For an organism without set defaults, an empty clades file is provided to prevent workflow failure, "gs://theiagen-public-files-rp/terra/augur-defaults/minimal-clades.tsv", but will not be as useful as an organism specific clades file. | Optional, Required |
| augur | **distance_tree_only** | Boolean | Create only a distance tree (skips all Augur steps after augur_tree) | TRUE | Optional |
| augur | **flu_segment** | String | Required if organism = "flu". The name of the segment to be analyzed; options: "HA" or "NA" | "HA" (only used if organism = "flu") | Optional, Required |
| augur | **flu_subtype** | String | Required if organism = "flu". The subtype of the flu samples being analyzed; options: "H1N1", "H3N2", "Victoria", "Yamagata" | | Optional, Required |
| augur | **flu_subtype** | String | Required if organism = "flu". The subtype of the flu samples being analyzed; options: "H1N1", "H3N2", "Victoria", "Yamagata", "H5N1" | | Optional, Required |
| augur | **lat_longs_tsv** | File | Tab-delimited file of geographic location names with corresponding latitude and longitude values | Defaults are organism-specific. Please find default values for all organisms (and for Flu - their respective genome segments and subtypes) here: <https://github.com/theiagen/public_health_bioinformatics/blob/main/workflows/utilities/wf_organism_parameters.wdl>. For an organism without set defaults, a minimal lat-long file is provided to prevent workflow failure, "gs://theiagen-public-files-rp/terra/augur-defaults/minimal-lat-longs.tsv", but will not be as useful as a detailed lat-longs file covering all the locations for the samples to be visualized. | Optional |
| augur | **min_date** | Float | Minimum date to begin filtering or frequencies calculations | Defaults are organism-specific. Please find default values for all organisms (and for Flu - their respective genome segments and subtypes) here: <https://github.com/theiagen/public_health_bioinformatics/blob/main/workflows/utilities/wf_organism_parameters.wdl>. For an organism without set defaults, the default value is 0.0 | Optional |
| augur | **min_num_unambig** | Int | Minimum number of called bases in genome to pass prefilter | Defaults are organism-specific. Please find default values for all organisms (and for Flu - their respective genome segments and subtypes) here: <https://github.com/theiagen/public_health_bioinformatics/blob/main/workflows/utilities/wf_organism_parameters.wdl>. For an organism without set defaults, the default value is 0 | Optional |
@@ -187,7 +198,7 @@ This workflow runs on the set level. Please note that for every task, runtime pa
| augur_ancestral | **inference** | String | Calculate joint or marginal maximum likelihood ancestral sequence states; options: "joint", "marginal" | joint | Optional |
| augur_ancestral | **keep_ambiguous** | Boolean | If true, do not infer nucleotides at ambiguous (N) sides | FALSE | Optional |
| augur_ancestral | **keep_overhangs** | Boolean | If true, do not infer nucleotides for gaps on either side of the alignment | FALSE | Optional |
| augur_export | **colors_tsv** | File | Custom color definitions, one per line in the format TRAIT_TYPE \| TRAIT_VALUE\tHEX_CODE | | Optional |
| augur_export | **colors_tsv** | File | Custom color definitions, one per line in TSV format with the following fields: TRAIT_TYPE TRAIT_VALUE HEX_CODE | | Optional |
| augur_export | **description_md** | File | Markdown file with description of build and/or acknowledgements | | Optional |
| augur_export | **include_root_sequence** | Boolean | Export an additional JSON containing the root sequence used to identify mutations | FALSE | Optional |
| augur_export | **title** | String | Title to be displayed by Auspice | | Optional |
@@ -209,7 +220,7 @@ This workflow runs on the set level. Please note that for every task, runtime pa
| augur_tree | **exclude_sites** | File | File of one-based sites to exclude for raw tree building (BED format in .bed files, DRM format in tab-delimited files, or one position per line) | | Optional |
| augur_tree | **method** | String | Which method to use to build the tree; options: "fasttree", "raxml", "iqtree" | iqtree | Optional |
| augur_tree | **override_default_args** | Boolean | If true, override default tree builder arguments instead of augmenting them | FALSE | Optional |
| augur_tree | **substitution_model** | String | The substitution model to use; only available for iqtree. Specify "auto" to run ModelTest; options: "GTR" | GTR | Optional |
| augur_tree | **substitution_model** | String | The substitution model to use; only available for iqtree. Specify "auto" to run ModelTest; model options can be found [here](http://www.iqtree.org/doc/Substitution-Models) | GTR | Optional |
| augur_tree | **tree_builder_args** | String | Additional tree builder arguments either augmenting or overriding the default arguments. FastTree defaults: "-nt -nosupport". RAxML defaults: "-f d -m GTRCAT -c 25 -p 235813". IQ-TREE defaults: "-ninit 2 -n 2 -me 0.05 -nt AUTO -redo" | | Optional |
| sc2_defaults | **nextstrain_ncov_repo_commit** | String | The version of the <https://github.com/nextstrain/ncov/> from which to draw default values for SARS-CoV-2. | `23d1243127e8838a61b7e5c1a72bc419bf8c5a0d` | Optional |
| organism_parameters | **gene_locations_bed_file** | File | Use to provide locations of interest where average coverage will be calculated | Defaults are organism-specific. Please find default values for some organisms here: <https://github.com/theiagen/public_health_bioinformatics/blob/main/workflows/utilities/wf_organism_parameters.wdl>. For an organism without set defaults, an empty file is provided, "gs://theiagen-public-files/terra/theiacov-files/empty.bed", but will not be as useful as an organism specific gene locations bed file. | Optional |
@@ -230,6 +241,8 @@ This workflow runs on the set level. Please note that for every task, runtime pa
| mutation_context | **docker** | String | Docker image used for the mutation_context task that is specific to Mpox. Do not modify. | us-docker.pkg.dev/general-theiagen/theiagen/nextstrain-mpox-mutation-context:2024-06-27 | Do Not Modify, Optional |
| mutation_context | **memory** | Int | Memory size in GB requested for the mutation_context task that is specific to Mpox. | 4 | Optional |

</div>

??? task "Workflow Tasks"
##### Augur Workflow Tasks {#augur-tasks}

@@ -271,8 +284,13 @@ The Nextstrain team hosts documentation surrounding the Augur workflow → Auspi
| **Variable** | **Type** | **Description** |
| --- | --- | --- |
| aligned_fastas | File | A FASTA file of the aligned genomes |
| augur_fasttree_version | String | The fasttree version used, blank if other tree method used |
| augur_iqtree_model_used | String | The iqtree model used during augur tree, blank if iqtree not used |
| augur_iqtree_version | String | The iqtree version used during augur tree (default), blank if other tree method used |
| augur_mafft_version | String | The mafft version used in augur align |
| augur_phb_analysis_date | String | The date the analysis was run |
| augur_phb_version | String | The version of the Public Health Bioinformatics (PHB) repository used |
| augur_raxml_version | String | The version of raxml used during augur tree, blank if other tree method used |
| augur_version | String | Version of Augur used |
| auspice_input_json | File | JSON file used as input to Auspice |
| combined_assemblies | File | Concatenated FASTA file containing all samples |
8 changes: 8 additions & 0 deletions docs/workflows/phylogenetic_construction/core_gene_snp.md
@@ -22,6 +22,8 @@ For further detail regarding Pirate options, please see [PIRATE's documentation]

This workflow runs on the set level.

<div class="searchable-table" markdown="1">

| **Terra Task Name** | **Variable** | **Type** | **Description** | **Default Value** | **Terra Status** |
|---|---|---|---|---|---|
| core_gene_snp_workflow | **cluster_name** | String | Name of sample set | | Required |
@@ -84,6 +86,8 @@ This workflow runs on the set level.
| version_capture | **docker** | String | The Docker container to use for the task | "us-docker.pkg.dev/general-theiagen/theiagen/alpine-plus-bash:3.20.0" | Optional |
| version_capture | **timezone** | String | Set the time zone to get an accurate date of analysis (uses UTC by default) | | Optional |

</div>

### Workflow Tasks

By default, the Core_Gene_SNP workflow will begin by analyzing the input sample set using [PIRATE](https://github.com/SionBayliss/PIRATE). Pirate takes in GFF3 files and classifies the genes into gene families by sequence identity, outputting a pangenome summary file. The workflow will instruct Pirate to create core gene and pangenome alignments using this gene family data. Setting the "align" input variable to false will turn off this behavior, and the workflow will output only the pangenome summary. The workflow will then use the core gene alignment from `Pirate` to infer a phylogenetic tree using `IQ-TREE`. It will also produce an SNP distance matrix from this alignment using [snp-dists](https://github.com/tseemann/snp-dists). This behavior can be turned off by setting the `core_tree` input variable to false. The workflow will not create a pangenome tree or SNP-matrix by default, but this behavior can be turned on by setting the `pan_tree` input variable to true.
@@ -98,6 +102,8 @@ By default, this task appends a Phandango coloring tag to color all items from t

### Outputs

<div class="searchable-table" markdown="1">

| **Variable** | **Type** | **Description** |
|---|---|---|
| core_gene_snp_wf_analysis_date | String | Date of analysis using Core_Gene_SNP workflow |
@@ -118,6 +124,8 @@ By default, this task appends a Phandango coloring tag to color all items from t
| pirate_snp_dists_version | String | Version of snp-dists used |
| pirate_summarized_data | File | The presence/absence matrix generated by the summarize_data task from the list of columns provided |

</div>

## References

>Sion C Bayliss, Harry A Thorpe, Nicola M Coyle, Samuel K Sheppard, Edward J Feil, PIRATE: A fast and scalable pangenomics toolbox for clustering diverged orthologues in bacteria, *GigaScience*, Volume 8, Issue 10, October 2019, giz119, <https://doi.org/10.1093/gigascience/giz119>
6 changes: 5 additions & 1 deletion docs/workflows/phylogenetic_construction/czgenepi_prep.md
@@ -18,13 +18,15 @@ Variables with both the "Optional" and "Required" tag require the column (regard

This workflow runs on the set level.

<div class="searchable-table" markdown="1">

| **Terra Task Name** | **Variable** | **Type** | **Description** | **Default Value** | **Terra Status** |
|---|---|---|---|---|---|
| czgenepi_prep | **sample_names** | Array[String] | The array of sample ids you want to prepare for CZ GEN EPI | | Required |
| czgenepi_prep | **terra_table_name** | String | The name of the Terra table where the data is hosted | | Required |
| czgenepi_prep | **terra_project_name** | String | The name of the Terra project where the data is hosted | | Required |
| czgenepi_prep | **terra_workspace_name** | String | The name of the Terra workspace where the data is hosted | | Required |
| download_terra_table | **memory** | Int | Amount of memory/RAM (in GB) to allocate to the task | 10 | Optional |
| download_terra_table | **memory** | Int | Amount of memory/RAM (in GB) to allocate to the task | 2 | Optional |
| download_terra_table | **docker** | String | The Docker container to use for the task | quay.io/theiagen/terra-tools:2023-06-21 | Optional |
| download_terra_table | **disk_size** | String | The size of the disk used when running this task | 1 | Optional |
| download_terra_table | **cpu** | Int | Number of CPUs to allocate to the task | 1 | Optional |
@@ -46,6 +48,8 @@ This workflow runs on the set level.
| version_capture | **docker** | String | The Docker container to use for the task | "us-docker.pkg.dev/general-theiagen/theiagen/alpine-plus-bash:3.20.0" | Optional |
| version_capture | **timezone** | String | Set the time zone to get an accurate date of analysis (uses UTC by default) | | Optional |

</div>

### Outputs

The concatenated_czgenepi_fasta and concatenated_czgenepi_metadata files can be uploaded directly to CZ GEN EPI without any adjustments.
@@ -20,6 +20,8 @@ The primary intended input of the workflow is the `snippy_variants_results` outp

All variant data included in the sample set should be generated from aligning sequencing reads to the **same reference genome**. If variant data was generated using different reference genomes, shared variants cannot be identified and results will be less useful.

<div class="searchable-table" markdown="1">

| **Terra Task Name** | **Variable** | **Type** | **Description** | **Default Value** | **Terra Status** |
| --- | --- | --- | --- | --- | --- |
| shared_variants_wf | **concatenated_file_name** | String | String of your choice to prefix output files | | Required |
@@ -33,6 +35,8 @@ All variant data included in the sample set should be generated from aligning se
| version_capture | **docker** | String | The Docker container to use for the task | "us-docker.pkg.dev/general-theiagen/theiagen/alpine-plus-bash:3.20.0" | Optional |
| version_capture | **timezone** | String | Set the time zone to get an accurate date of analysis (uses UTC by default) | | Optional |

</div>

### Tasks

??? task "Concatenate Variants"
8 changes: 8 additions & 0 deletions docs/workflows/phylogenetic_construction/ksnp3.md
@@ -19,6 +19,8 @@ You can learn more about the kSNP3 workflow, including how to visualize the outp

### Inputs

<div class="searchable-table" markdown="1">

| **Terra Task Name** | **Variable** | **Type** | **Description** | **Default Value** | **Terra Status** |
|---|---|---|---|---|---|
| ksnp3_workflow | **assembly_fasta** | Array[File] | The assembly files to be analyzed | | Required |
@@ -62,6 +64,8 @@ You can learn more about the kSNP3 workflow, including how to visualize the outp
| version_capture | **docker** | String | The Docker container to use for the task | "us-docker.pkg.dev/general-theiagen/theiagen/alpine-plus-bash:3.20.0" | Optional |
| version_capture | **timezone** | String | Set the time zone to get an accurate date of analysis (uses UTC by default) | | Optional |

</div>

### Workflow Actions

The `ksnp3` workflow is run on the set of assembly files to produce both pan-genome and core-genome phylogenies. This also results in alignment files, which are used by [`snp-dists`](https://github.com/tseemann/snp-dists) to produce a pairwise SNP distance matrix for both the pan-genome and core genomes.
@@ -86,6 +90,8 @@ If you fill out the `data_summary_*` and `sample_names` optional variables, you

### Outputs

<div class="searchable-table" markdown="1">

| **Variable** | **Type** | **Description** |
|---|---|---|
| ksnp3_core_snp_matrix | File | The SNP matrix made with the core genome; formatted for Phandango if `phandango_coloring` input is `true` |
@@ -109,6 +115,8 @@ If you fill out the `data_summary_*` and `sample_names` optional variables, you
| ksnp3_wf_analysis_date | String | The date the workflow was run |
| ksnp3_wf_version | String | The version of the repository the workflow is hosted in |

</div>

## References

>Shea N Gardner, Tom Slezak, Barry G. Hall, kSNP3.0: SNP detection and phylogenetic analysis of genomes without genome alignment or reference genome, *Bioinformatics*, Volume 31, Issue 17, 1 September 2015, Pages 2877–2878, <https://doi.org/10.1093/bioinformatics/btv271>
4 changes: 4 additions & 0 deletions docs/workflows/phylogenetic_construction/lyve_set.md
@@ -17,6 +17,8 @@ The Lyve_SET WDL workflow runs the [Lyve-SET](https://github.com/lskatz/lyve-SET

### Inputs

<div class="searchable-table" markdown="1">

| **Terra Task Name** | **Variable** | **Type** | **Description** | **Default Value** | **Terra Status** |
|---|---|---|---|---|---|
| lyveset_workflow | **dataset_name** | String | Free text string used to label output files | | Required |
@@ -45,6 +47,8 @@ The Lyve_SET WDL workflow runs the [Lyve-SET](https://github.com/lskatz/lyve-SET
| version_capture | **docker** | String | The Docker container to use for the task | "us-docker.pkg.dev/general-theiagen/theiagen/alpine-plus-bash:3.20.0" | Optional |
| version_capture | **timezone** | String | Set the time zone to get an accurate date of analysis (uses UTC by default) | | Optional |

</div>

### Workflow Actions

The Lyve_SET WDL workflow is run using read data from a set of samples. The workflow will produce a pairwise SNP matrix for the sample set and a maximum likelihood phylogenetic tree. Details regarding the default implementation of Lyve_SET and optional modifications are listed below.
8 changes: 8 additions & 0 deletions docs/workflows/phylogenetic_construction/mashtree_fasta.md
@@ -16,6 +16,8 @@ This workflow also features an optional module, `summarize_data`, that creates a

### Inputs

<div class="searchable-table" markdown="1">

| **Terra Task Name** | **Variable** | **Type** | **Description** | **Default Value** | **Terra Status** |
|---|---|---|---|---|---|
| mashtree_fasta | **assembly_fasta** | Array[File] | The set of assembly fastas | | Required |
@@ -49,6 +51,8 @@ This workflow also features an optional module, `summarize_data`, that creates a
| version_capture | **docker** | String | The Docker container to use for the task | "us-docker.pkg.dev/general-theiagen/theiagen/alpine-plus-bash:3.20.0" | Optional |
| version_capture | **timezone** | String | Set the time zone to get an accurate date of analysis (uses UTC by default) | | Optional |

</div>

### Workflow Actions

`MashTree_Fasta` is run on a set of assembly fastas and creates a phylogenetic tree and matrix. These outputs are passed to a task that will rearrange the matrix to match the order of the terminal ends in the phylogenetic tree.
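
That reordering step is straightforward to reproduce outside of Terra. Below is a minimal sketch, assuming a Newick tree and a tab-delimited matrix whose first column holds the sample names; the file names are hypothetical:

```python
# Reorder a distance matrix so rows/columns follow the tree's tip order.
# Requires biopython and pandas; file names here are hypothetical.
import pandas as pd
from Bio import Phylo

tree = Phylo.read("mashtree.nwk", "newick")
tip_order = [tip.name for tip in tree.get_terminals()]

matrix = pd.read_csv("mashtree_matrix.tsv", sep="\t", index_col=0)
reordered = matrix.loc[tip_order, tip_order]  # same sample order as the tree
reordered.to_csv("mashtree_matrix_reordered.tsv", sep="\t")
```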
@@ -63,6 +67,8 @@ By default, this task appends a Phandango coloring tag to color all items from t

### Outputs

<div class="searchable-table" markdown="1">

| **Variable** | **Type** | **Description** |
|---|---|---|
| mashtree_docker | String | The Docker image used to run the mashtree task |
@@ -74,6 +80,8 @@ By default, this task appends a Phandango coloring tag to color all items from t
| mashtree_wf_analysis_date | String | The date the workflow was run |
| mashtree_wf_version | String | The version of PHB the workflow is hosted in |

</div>

## References

> Katz, L. S., Griswold, T., Morrison, S., Caravas, J., Zhang, S., den Bakker, H.C., Deng, X., and Carleton, H. A., (2019). Mashtree: a rapid comparison of whole genome sequence files. Journal of Open Source Software, 4(44), 1762, <https://doi.org/10.21105/joss.01762>
41 changes: 40 additions & 1 deletion docs/workflows/phylogenetic_construction/snippy_streamline.md
@@ -4,7 +4,7 @@

| **Workflow Type** | **Applicable Kingdom** | **Last Known Changes** | **Command-line Compatibility** | **Workflow Level** |
|---|---|---|---|---|
| [Phylogenetic Construction](../../workflows_overview/workflows_type.md/#phylogenetic-construction) | [Bacteria](../../workflows_overview/workflows_kingdom.md/#bacteria) | PHB v2.2.0 | Yes; some optional features incompatible | Set-level |
| [Phylogenetic Construction](../../workflows_overview/workflows_type.md/#phylogenetic-construction) | [Bacteria](../../workflows_overview/workflows_kingdom.md/#bacteria) | PHB v2.3.0 | Yes; some optional features incompatible | Set-level |

## Snippy_Streamline_PHB

@@ -65,6 +65,8 @@ To run Snippy_Streamline, either a reference genome must be provided (`reference
- Using the core genome
- `core_genome` = true (as default)

<div class="searchable-table" markdown="1">

| **Terra Task Name** | **Variable** | **Type** | **Description** | **Default Value** | **Terra Status** |
|---|---|---|---|---|---|
| snippy_streamline | **read1** | Array[File] | The forward read files | | Required |
@@ -133,6 +135,8 @@ To run Snippy_Streamline, either a reference genome must be provided (`reference
| version_capture | **docker** | String | The Docker container to use for the task | "us-docker.pkg.dev/general-theiagen/theiagen/alpine-plus-bash:3.20.0" | Optional |
| version_capture | **timezone** | String | Set the time zone to get an accurate date of analysis (uses UTC by default) | | Optional |

</div>

### Workflow Tasks

For automatic reference selection by the workflow (optional):
@@ -169,6 +173,36 @@ For all cases:

`Snippy_Variants` aligns reads for each sample against the reference genome. As part of `Snippy_Streamline`, the only output from this workflow is the `snippy_variants_outdir_tarball` which is provided in the set-level data table. Please see the full documentation for [Snippy_Variants](./snippy_variants.md) for more information.

This task also extracts QC metrics from the Snippy output for each sample and saves them in per-sample TSV files (`snippy_variants_qc_metrics`). These per-sample QC metrics include the following columns:

- **samplename**: The name of the sample.
- **reads_aligned_to_reference**: The number of reads that aligned to the reference genome.
- **total_reads**: The total number of reads in the sample.
- **percent_reads_aligned**: The percentage of reads that aligned to the reference genome.
- **variants_total**: The total number of variants detected between the sample and the reference genome.
- **percent_ref_coverage**: The percentage of the reference genome covered by reads with a depth greater than or equal to the `min_coverage` threshold (default is 10).
- **#rname**: Reference sequence name (e.g., chromosome or contig name).
- **startpos**: Starting position of the reference sequence.
- **endpos**: Ending position of the reference sequence.
- **numreads**: Number of reads covering the reference sequence.
- **covbases**: Number of bases with coverage.
- **coverage**: Percentage of the reference sequence covered (depth ≥ 1).
- **meandepth**: Mean depth of coverage over the reference sequence.
- **meanbaseq**: Mean base quality over the reference sequence.
- **meanmapq**: Mean mapping quality over the reference sequence.

These per-sample QC metrics are then combined into a single file (`snippy_combined_qc_metrics`). The combined QC metrics file includes the same columns as above for all samples. Note that the last set of columns (`#rname` to `meanmapq`) may repeat for each chromosome or contig in the reference genome.

!!! tip "QC Metrics for Phylogenetic Analysis"
These QC metrics provide valuable insights into the quality and coverage of your sequencing data relative to the reference genome. Monitoring these metrics can help identify samples with low coverage, poor alignment, or potential issues that may affect downstream analyses.

!!! techdetails "Snippy Variants Technical Details"
| | Links |
| --- | --- |
| Task | [task_snippy_variants.wdl](https://github.com/theiagen/public_health_bioinformatics/blob/main/tasks/gene_typing/variant_detection/task_snippy_variants.wdl) |
| Software Source Code | [Snippy on GitHub](https://github.com/tseemann/snippy) |
| Software Documentation | [Snippy on GitHub](https://github.com/tseemann/snippy) |

??? task "Snippy_Tree workflow"

##### Snippy_Tree {#snippy_tree}
@@ -179,6 +213,8 @@ For all cases:

### Outputs

<div class="searchable-table" markdown="1">

| **Variable** | **Type** | **Description** |
|---|---|---|
| snippy_centroid_docker | String | Docker file used for Centroid |
@@ -188,6 +224,7 @@ For all cases:
| snippy_centroid_version | String | Centroid version used |
| snippy_cg_snp_matrix | File | CSV file of core genome pairwise SNP distances between samples, calculated from the final alignment |
| snippy_concatenated_variants | File | The concatenated variants file |
| snippy_combined_qc_metrics | File | Combined QC metrics file containing concatenated QC metrics from all samples. |
| snippy_filtered_metadata | File | TSV recording the columns of the Terra data table that were used in the summarize_data task |
| snippy_final_alignment | File | Final alignment (FASTA file) used to generate the tree (either after snippy alignment, gubbins recombination removal, and/or core site selection with SNP-sites) |
| snippy_final_tree | File | Final phylogenetic tree produced by Snippy_Streamline |
@@ -223,3 +260,5 @@ For all cases:
| snippy_variants_snippy_docker | Array[String] | Docker file used for Snippy in the Snippy_Variants subworkflow |
| snippy_variants_snippy_version | Array[String] | Version of Snippy used in the Snippy_Variants subworkflow |
| snippy_wg_snp_matrix | File | CSV file of whole genome pairwise SNP distances between samples, calculated from the final alignment |

</div>
@@ -4,7 +4,7 @@

| **Workflow Type** | **Applicable Kingdom** | **Last Known Changes** | **Command-line Compatibility** | **Workflow Level** |
|---|---|---|---|---|
| [Phylogenetic Construction](../../workflows_overview/workflows_type.md/#phylogenetic-construction) | [Bacteria](../../workflows_overview/workflows_kingdom.md/#bacteria) | PHB v2.2.0 | Yes; some optional features incompatible | Set-level |
| [Phylogenetic Construction](../../workflows_overview/workflows_type.md/#phylogenetic-construction) | [Bacteria](../../workflows_overview/workflows_kingdom.md/#bacteria) | PHB v2.3.0 | Yes; some optional features incompatible | Set-level |

## Snippy_Streamline_FASTA_PHB

@@ -37,8 +37,46 @@ The `Snippy_Streamline_FASTA` workflow is an all-in-one approach to generating a

**If reference genomes have multiple contigs, they will not be compatible with using Gubbins** to mask recombination in the phylogenetic tree. The automatic selection of a reference genome by the workflow may result in a reference with multiple contigs. In this case, an alternative reference genome should be sought.

### Workflow Tasks

??? task "Snippy_Variants QC Metrics Concatenation (optional)"

##### Snippy_Variants QC Metrics Concatenation (optional) {#snippy_variants}

Optionally, the user can provide the `snippy_variants_qc_metrics` file produced by the Snippy_Variants workflow as input to the workflow to concatenate the reports for each sample in the tree. These per-sample QC metrics include the following columns:

- **samplename**: The name of the sample.
- **reads_aligned_to_reference**: The number of reads that aligned to the reference genome.
- **total_reads**: The total number of reads in the sample.
- **percent_reads_aligned**: The percentage of reads that aligned to the reference genome.
- **variants_total**: The total number of variants detected between the sample and the reference genome.
- **percent_ref_coverage**: The percentage of the reference genome covered by reads with a depth greater than or equal to the `min_coverage` threshold (default is 10).
- **#rname**: Reference sequence name (e.g., chromosome or contig name).
- **startpos**: Starting position of the reference sequence.
- **endpos**: Ending position of the reference sequence.
- **numreads**: Number of reads covering the reference sequence.
- **covbases**: Number of bases with coverage.
- **coverage**: Percentage of the reference sequence covered (depth ≥ 1).
- **meandepth**: Mean depth of coverage over the reference sequence.
- **meanbaseq**: Mean base quality over the reference sequence.
- **meanmapq**: Mean mapping quality over the reference sequence.

The combined QC metrics file includes the same columns as above for all samples. Note that the last set of columns (`#rname` to `meanmapq`) may repeat for each chromosome or contig in the reference genome.

!!! tip "QC Metrics for Phylogenetic Analysis"
These QC metrics provide valuable insights into the quality and coverage of your sequencing data relative to the reference genome. Monitoring these metrics can help identify samples with low coverage, poor alignment, or potential issues that may affect downstream analyses, and we recommend examining them before proceeding with phylogenetic analysis if performing Snippy_Variants and Snippy_Tree separately.

!!! techdetails "Snippy Variants Technical Details"
| | Links |
| --- | --- |
| Task | [task_snippy_variants.wdl](https://github.com/theiagen/public_health_bioinformatics/blob/main/tasks/gene_typing/variant_detection/task_snippy_variants.wdl) |
| Software Source Code | [Snippy on GitHub](https://github.com/tseemann/snippy) |
| Software Documentation | [Snippy on GitHub](https://github.com/tseemann/snippy) |

### Inputs

<div class="searchable-table" markdown="1">

| **Terra Task Name** | **Variable** | **Type** | **Description** | **Default Value** | **Terra Status** |
|---|---|---|---|---|---|
| snippy_streamline_fasta | **assembly_fasta** | Array[File] | The assembly files for your samples | | Required |
@@ -107,8 +145,12 @@ The `Snippy_Streamline_FASTA` workflow is an all-in-one approach to generating a
| version_capture | **docker** | String | The Docker container to use for the task | "us-docker.pkg.dev/general-theiagen/theiagen/alpine-plus-bash:3.20.0" | Optional |
| version_capture | **timezone** | String | Set the time zone to get an accurate date of analysis (uses UTC by default) | | Optional |

</div>

### Outputs

<div class="searchable-table" markdown="1">

| **Variable** | **Type** | **Description** |
|---|---|---|
| snippy_centroid_docker | String | Docker file used for Centroid |
@@ -117,6 +159,7 @@ The `Snippy_Streamline_FASTA` workflow is an all-in-one approach to generating a
| snippy_centroid_samplename | String | Name of the centroid sample |
| snippy_centroid_version | String | Centroid version used |
| snippy_cg_snp_matrix | File | CSV file of core genome pairwise SNP distances between samples, calculated from the final alignment |
| snippy_combined_qc_metrics | File | Combined QC metrics file containing concatenated QC metrics from all samples. |
| snippy_concatenated_variants | File | The concatenated variants file |
| snippy_filtered_metadata | File | TSV recording the columns of the Terra data table that were used in the summarize_data task |
| snippy_final_alignment | File | Final alignment (FASTA file) used to generate the tree (either after snippy alignment, gubbins recombination removal, and/or core site selection with SNP-sites) |
@@ -151,3 +194,5 @@ The `Snippy_Streamline_FASTA` workflow is an all-in-one approach to generating a
| snippy_variants_snippy_docker | Array[String] | Docker file used for Snippy in the Snippy_Variants subworkflow |
| snippy_variants_snippy_version | Array[String] | Version of Snippy used in the Snippy_Variants subworkflow |
| snippy_wg_snp_matrix | File | CSV file of whole genome pairwise SNP distances between samples, calculated from the final alignment |

</div>
47 changes: 45 additions & 2 deletions docs/workflows/phylogenetic_construction/snippy_tree.md
@@ -4,7 +4,7 @@

| **Workflow Type** | **Applicable Kingdom** | **Last Known Changes** | **Command-line Compatibility** | **Workflow Level** |
|---|---|---|---|---|
| [Phylogenetic Construction](../../workflows_overview/workflows_type.md/#phylogenetic-construction) | [Bacteria](../../workflows_overview/workflows_kingdom.md/#bacteria) | PHB v2.1.0 | Yes; some optional features incompatible | Set-level |
| [Phylogenetic Construction](../../workflows_overview/workflows_type.md/#phylogenetic-construction) | [Bacteria](../../workflows_overview/workflows_kingdom.md/#bacteria) | PHB v2.3.0 | Yes; some optional features incompatible | Set-level |

## Snippy_Tree_PHB

@@ -53,6 +53,8 @@ Sequencing data used in the Snippy_Tree workflow must:
- Using the core genome
- `core_genome` = true (as default)

<div class="searchable-table" markdown="1">

| **Terra Task Name** | **Variable** | **Type** | **Description** | **Default Value** | **Terra Status** |
|---|---|---|---|---|---|
| snippy_tree_wf | **tree_name_updated** | String | Internal component, do not modify. Used for replacing spaces with underscores | | Do not modify |
@@ -123,6 +125,8 @@ Sequencing data used in the Snippy_Tree workflow must:
| wg_snp_dists | **disk_size** | Int | Amount of storage (in GB) to allocate to the task | 50 | Optional |
| wg_snp_dists | **memory** | Int | Amount of memory/RAM (in GB) to allocate to the task | 2 | Optional |

</div>

### Workflow Tasks

??? task "Snippy"
@@ -262,7 +266,7 @@ Sequencing data used in the Snippy_Tree workflow must:

| | Links |
| --- | --- |
| Task | [task_summarize_data.wdl](https://github.com/theiagen/public_health_bioinformatics/blob/main/tasks/utilities/task_summarize_data.wdl) |
| Task | [task_summarize_data.wdl](https://github.com/theiagen/public_health_bioinformatics/blob/main/tasks/utilities/data_handling/task_summarize_data.wdl) |

??? task "Concatenate Variants (optional)"

@@ -306,12 +310,49 @@ Sequencing data used in the Snippy_Tree workflow must:
| Task | task_shared_variants.wdl |
| Software Source Code | [task_shared_variants.wdl](https://github.com/theiagen/public_health_bioinformatics/blob/main/tasks/phylogenetic_inference/utilities/task_shared_variants.wdl) |

??? task "Snippy_Variants QC Metrics Concatenation (optional)"

##### Snippy_Variants QC Metrics Concatenation (optional) {#snippy_variants}

Optionally, the user can provide the `snippy_variants_qc_metrics` file produced by the Snippy_Variants workflow as input to the workflow to concatenate the reports for each sample in the tree. These per-sample QC metrics include the following columns:

- **samplename**: The name of the sample.
- **reads_aligned_to_reference**: The number of reads that aligned to the reference genome.
- **total_reads**: The total number of reads in the sample.
- **percent_reads_aligned**: The percentage of reads that aligned to the reference genome.
- **variants_total**: The total number of variants detected between the sample and the reference genome.
- **percent_ref_coverage**: The percentage of the reference genome covered by reads with a depth greater than or equal to the `min_coverage` threshold (default is 10).
- **#rname**: Reference sequence name (e.g., chromosome or contig name).
- **startpos**: Starting position of the reference sequence.
- **endpos**: Ending position of the reference sequence.
- **numreads**: Number of reads covering the reference sequence.
- **covbases**: Number of bases with coverage.
- **coverage**: Percentage of the reference sequence covered (depth ≥ 1).
- **meandepth**: Mean depth of coverage over the reference sequence.
- **meanbaseq**: Mean base quality over the reference sequence.
- **meanmapq**: Mean mapping quality over the reference sequence.

The combined QC metrics file includes the same columns as above for all samples. Note that the last set of columns (`#rname` to `meanmapq`) may repeat for each chromosome or contig in the reference genome.

!!! tip "QC Metrics for Phylogenetic Analysis"
These QC metrics provide valuable insights into the quality and coverage of your sequencing data relative to the reference genome. Monitoring these metrics can help identify samples with low coverage, poor alignment, or potential issues that may affect downstream analyses, and we recommend examining them before proceeding with phylogenetic analysis if performing Snippy_Variants and Snippy_Tree separately.

!!! techdetails "Snippy Variants Technical Details"
| | Links |
| --- | --- |
| Task | [task_snippy_variants.wdl](https://github.com/theiagen/public_health_bioinformatics/blob/main/tasks/gene_typing/variant_detection/task_snippy_variants.wdl) |
| Software Source Code | [Snippy on GitHub](https://github.com/tseemann/snippy) |
| Software Documentation | [Snippy on GitHub](https://github.com/tseemann/snippy) |

### Outputs

<div class="searchable-table" markdown="1">

| **Variable** | **Type** | **Description** |
|---|---|---|
| snippy_cg_snp_matrix | File | CSV file of core genome pairwise SNP distances between samples, calculated from the final alignment |
| snippy_concatenated_variants | File | Concatenated snippy_results file across all samples in the set |
| snippy_combined_qc_metrics | File | Combined QC metrics file containing concatenated QC metrics from all samples. |
| snippy_filtered_metadata | File | TSV recording the columns of the Terra data table that were used in the summarize_data task |
| snippy_final_alignment | File | Final alignment (FASTA file) used to generate the tree (either after snippy alignment, gubbins recombination removal, and/or core site selection with SNP-sites) |
| snippy_final_tree | File | Newick tree produced from the final alignment. Depending on user input for core_genome, the tree could be a core genome tree (default when core_genome is true) or whole genome tree (if core_genome is false) |
@@ -336,6 +377,8 @@ Sequencing data used in the Snippy_Tree workflow must:
| snippy_tree_version | String | Version of Snippy_Tree workflow |
| snippy_wg_snp_matrix | File | CSV file of whole genome pairwise SNP distances between samples, calculated from the final alignment |

</div>

## References

> **Gubbins:** Croucher, Nicholas J., Andrew J. Page, Thomas R. Connor, Aidan J. Delaney, Jacqueline A. Keane, Stephen D. Bentley, Julian Parkhill, and Simon R. Harris. 2015. "Rapid Phylogenetic Analysis of Large Samples of Recombinant Bacterial Whole Genome Sequences Using Gubbins." Nucleic Acids Research 43 (3): e15.
46 changes: 44 additions & 2 deletions docs/workflows/phylogenetic_construction/snippy_variants.md
@@ -4,7 +4,7 @@

| **Workflow Type** | **Applicable Kingdom** | **Last Known Changes** | **Command-line Compatibility** | **Workflow Level** |
|---|---|---|---|---|
| [Phylogenetic Construction](../../workflows_overview/workflows_type.md/#phylogenetic-construction) | [Bacteria](../../workflows_overview/workflows_kingdom.md/#bacteria), [Mycotics](../../workflows_overview/workflows_kingdom.md#mycotics), [Viral](../../workflows_overview/workflows_kingdom.md/#viral) | PHB v2.2.0 | Yes | Sample-level |
| [Phylogenetic Construction](../../workflows_overview/workflows_type.md/#phylogenetic-construction) | [Bacteria](../../workflows_overview/workflows_kingdom.md/#bacteria), [Mycotics](../../workflows_overview/workflows_kingdom.md#mycotics), [Viral](../../workflows_overview/workflows_kingdom.md/#viral) | PHB v2.3.0 | Yes | Sample-level |

## Snippy_Variants_PHB

@@ -29,6 +29,8 @@ The `Snippy_Variants` workflow aligns single-end or paired-end reads (in FASTQ f
!!! info "Query String"
The query string can be a gene or any other annotation that matches the GenBank file/output VCF **EXACTLY**

<div class="searchable-table" markdown="1">

| **Terra Task Name** | **Variable** | **Type** | **Description** | **Default Value** | **Terra Status** |
|---|---|---|---|---|---|
| snippy_variants_wf | **reference_genome_file** | File | Reference genome (GenBank file or fasta) | | Required |
@@ -54,9 +56,44 @@ The `Snippy_Variants` workflow aligns single-end or paired-end reads (in FASTQ f
| version_capture | **docker** | String | The Docker container to use for the task | "us-docker.pkg.dev/general-theiagen/theiagen/alpine-plus-bash:3.20.0" | Optional |
| version_capture | **timezone** | String | Set the time zone to get an accurate date of analysis (uses UTC by default) | | Optional |

</div>

### Workflow Tasks

`Snippy_Variants` uses the snippy tool to align reads to the reference and call SNPs, MNPs and INDELs according to optional input parameters. The output includes a file of variants that is then queried using the `grep` bash command to identify any mutations in specified genes or annotations of interest. The query string MUST match the gene name or annotation as specified in the GenBank file and provided in the output variant file in the `snippy_results` column.
`Snippy_Variants` uses Snippy to align reads to the reference and call SNPs, MNPs and INDELs according to optional input parameters. The output includes a file of variants that is then queried using the `grep` bash command to identify any mutations in specified genes or annotations of interest. The query string MUST match the gene name or annotation as specified in the GenBank file and provided in the output variant file in the `snippy_results` column.
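
As an illustration of the query step, here is a minimal sketch that applies the same exact-match filtering in Python instead of `grep`; the file name and query strings are hypothetical:

```python
# Keep rows of the Snippy results CSV that exactly contain a query string,
# mirroring the workflow's grep-based gene query.
import csv

queries = ["gyrA", "parC"]  # hypothetical genes of interest
with open("snippy_results.csv") as handle:  # hypothetical file name
    reader = csv.reader(handle)
    header = next(reader)
    hits = [row for row in reader
            if any(query in field for field in row for query in queries)]

for hit in hits:
    print(dict(zip(header, hit)))
```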

!!! info "Quality Control Metrics"
Additionally, `Snippy_Variants` extracts quality control (QC) metrics from the Snippy output for each sample. These per-sample QC metrics are saved in TSV files (`snippy_variants_qc_metrics`). The QC metrics include:

- **samplename**: The name of the sample.
- **reads_aligned_to_reference**: The number of reads that aligned to the reference genome.
- **total_reads**: The total number of reads in the sample.
- **percent_reads_aligned**: The percentage of reads that aligned to the reference genome; also available in the `snippy_variants_percent_reads_aligned` output column.
- **variants_total**: The total number of variants detected between the sample and the reference genome.
- **percent_ref_coverage**: The percentage of the reference genome covered by reads with a depth greater than or equal to the `min_coverage` threshold (default is 10); also available in the `snippy_variants_percent_ref_coverage` output column.
- **#rname**: Reference sequence name (e.g., chromosome or contig name).
- **startpos**: Starting position of the reference sequence.
- **endpos**: Ending position of the reference sequence.
- **numreads**: Number of reads covering the reference sequence.
- **covbases**: Number of bases with coverage.
- **coverage**: Percentage of the reference sequence covered (depth ≥ 1).
- **meandepth**: Mean depth of coverage over the reference sequence.
- **meanbaseq**: Mean base quality over the reference sequence.
- **meanmapq**: Mean mapping quality over the reference sequence.

Note that the last set of columns (`#rname` to `meanmapq`) may repeat for each chromosome or contig in the reference genome.

!!! tip "QC Metrics for Phylogenetic Analysis"
These QC metrics provide valuable insights into the quality and coverage of your sequencing data relative to the reference genome. Monitoring these metrics can help identify samples with low coverage, poor alignment, or potential issues that may affect downstream analyses, and we recommend examining them before proceeding with phylogenetic analysis if performing Snippy_Variants and Snippy_Tree separately.

These per-sample QC metrics can also be combined into a single file (`snippy_combined_qc_metrics`) in downstream workflows, such as `snippy_tree`, providing an overview of QC metrics across all samples.
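
A minimal sketch of that combination step, assuming the per-sample TSVs share a single header row and sit in the working directory (the file-naming pattern is hypothetical):

```python
# Concatenate per-sample QC metric TSVs, keeping the header only once.
import glob

tsv_paths = sorted(glob.glob("*_qc_metrics.tsv"))  # hypothetical pattern
with open("combined_qc_metrics.tsv", "w") as out:
    for index, path in enumerate(tsv_paths):
        with open(path) as handle:
            header = handle.readline()
            if index == 0:
                out.write(header)
            out.writelines(handle)
```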

!!! techdetails "Snippy Variants Technical Details"
| | Links |
| --- | --- |
| Task | [task_snippy_variants.wdl](https://github.com/theiagen/public_health_bioinformatics/blob/main/tasks/gene_typing/variant_detection/task_snippy_variants.wdl)<br>[task_snippy_gene_query.wdl](https://github.com/theiagen/public_health_bioinformatics/blob/main/tasks/gene_typing/variant_detection/task_snippy_gene_query.wdl) |
| Software Source Code | [Snippy on GitHub](https://github.com/tseemann/snippy) |
| Software Documentation | [Snippy on GitHub](https://github.com/tseemann/snippy) |

### Outputs

@@ -66,6 +103,8 @@ The `Snippy_Variants` workflow aligns single-end or paired-end reads (in FASTQ f
!!! warning "Note on coverage calculations"
The outputs from `samtools coverage` (found in the `snippy_variants_coverage_tsv` file) may differ from the `snippy_variants_percent_ref_coverage` due to different calculation methods. `samtools coverage` computes genome-wide coverage metrics (e.g., the proportion of bases covered at depth ≥ 1), while `snippy_variants_percent_ref_coverage` uses a user-defined minimum coverage threshold (default is 10), calculating the proportion of the reference genome with a depth greater than or equal to this threshold.
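
The difference is easiest to see on a toy depth profile; the sketch below compares the two calculations and is illustrative only, not the workflow's actual code:

```python
# Contrast samtools-style coverage (depth >= 1) with percent_ref_coverage
# (depth >= min_coverage, default 10) on a toy per-base depth profile.
depths = [0, 3, 12, 15, 9, 30, 0, 11]
min_coverage = 10

coverage_any = sum(depth >= 1 for depth in depths) / len(depths)
coverage_min = sum(depth >= min_coverage for depth in depths) / len(depths)

print(f"coverage (depth >= 1): {coverage_any:.1%}")                          # 75.0%
print(f"percent_ref_coverage (depth >= {min_coverage}): {coverage_min:.1%}")  # 50.0%
```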

<div class="searchable-table" markdown="1">

| **Variable** | **Type** | **Description** |
|---|---|---|
| snippy_variants_bai | File | Indexed bam file of the reads aligned to the reference |
@@ -79,9 +118,12 @@ The `Snippy_Variants` workflow aligns single-end or paired-end reads (in FASTQ f
| snippy_variants_outdir_tarball | File | A compressed file containing the whole directory of snippy output files. This is used when running Snippy_Tree |
| snippy_variants_percent_reads_aligned | Float | Percentage of reads aligned to the reference genome |
| snippy_variants_percent_ref_coverage | Float | Proportion of the reference genome covered by reads with a depth greater than or equal to the `min_coverage` threshold (default is 10). |
| snippy_variants_qc_metrics | File | TSV file containing quality control metrics for the sample |
| snippy_variants_query | String | Query strings specified by the user when running the workflow |
| snippy_variants_query_check | String | Verification that query strings are found in the reference genome |
| snippy_variants_results | File | CSV file detailing results for all mutations identified in the query sequence relative to the reference |
| snippy_variants_summary | File | A summary TXT file showing the number of mutations identified for each mutation type |
| snippy_variants_version | String | Version of Snippy used |
| snippy_variants_wf_version | String | Version of Snippy_Variants used |

</div>
8 changes: 8 additions & 0 deletions docs/workflows/phylogenetic_placement/samples_to_ref_tree.md
@@ -17,6 +17,8 @@ However, nextclade can be used on any organism as long as an existing, high-q

### Inputs

<div class="searchable-table" markdown="1">

| **Terra Task Name** | **Variable** | **Type** | **Description** | **Default Value** | **Terra Status** |
|---|---|---|---|---|---|
| nextclade_addToRefTree | **assembly_fasta** | File | A fasta file with query sequence(s) to be placed onto the global tree | | Required |
@@ -34,8 +36,12 @@ However, nextclade can be used on any organism as long as an an existing, high-q
| version_capture | **docker** | String | The Docker container to use for the task | "us-docker.pkg.dev/general-theiagen/theiagen/alpine-plus-bash:3.20.0" | Optional |
| version_capture | **timezone** | String | Set the time zone to get an accurate date of analysis (uses UTC by default) | | Optional |

</div>

### Outputs

<div class="searchable-table" markdown="1">

| **Variable** | **Type** | **Description** |
|---|---|---|
| treeUpdate_auspice_json | File | Phylogenetic tree with user placed samples |
@@ -45,3 +51,5 @@ However, nextclade can be used on any organism as long as an an existing, high-q
| treeUpdate_nextclade_version | String | Nextclade version used |
| samples_to_ref_tree_analysis_date | String | Date of analysis |
| samples_to_ref_tree_version | String | Version of the Public Health Bioinformatics (PHB) repository used |

</div>
8 changes: 8 additions & 0 deletions docs/workflows/phylogenetic_placement/usher.md
@@ -14,6 +14,8 @@

While this workflow is technically a set-level workflow, it works on the sample-level too. When run on the set-level, the samples are placed with respect to each other.

<div class="searchable-table" markdown="1">

| **Terra Task Name** | **Variable** | **Type** | **Description** | **Default Value** | **Terra Status** |
|---|---|---|---|---|---|
| usher_workflow | **assembly_fasta** | Array[File] | The assembly files for the samples you want to place on the pre-existing tree; this can either be a set of samples, an individual sample, or multiple individual samples | | Required |
@@ -29,8 +31,12 @@ While this workflow is technically a set-level workflow, it works on the sample-
| version_capture | **docker** | String | The Docker container to use for the task | "us-docker.pkg.dev/general-theiagen/theiagen/alpine-plus-bash:3.20.0" | Optional |
| version_capture | **timezone** | String | Set the time zone to get an accurate date of analysis (uses UTC by default) | | Optional |

</div>

### Outputs

<div class="searchable-table" markdown="1">

| **Variable** | **Type** | **Description** |
|---|---|---|
| usher_clades | File | The clades predicted for the samples |
@@ -41,3 +47,5 @@ While this workflow is technically a set-level workflow, it works on the sample-
| usher_subtrees | Array[File] | An array of subtrees where your samples have been placed |
| usher_uncondensed_tree | File | The entire global tree with your samples included (warning: may be a very large file if the organism is "sars-cov-2") |
| usher_version | String | The version of UShER used |

</div>
52 changes: 52 additions & 0 deletions docs/workflows/public_data_sharing/fetch_srr_accession.md
@@ -0,0 +1,52 @@
# Fetch SRR Accession Workflow

## Quick Facts

| **Workflow Type** | **Applicable Kingdom** | **Last Known Changes** | **Command-line Compatibility** | **Workflow Level** |
|---|---|---|---|---|
| [Data Import](../../workflows_overview/workflows_type.md/#data-import) | [Any Taxa](../../workflows_overview/workflows_kingdom.md/#any-taxa) | PHB v2.3.0 | Yes | Sample-level |

## Fetch_SRR_Accession_PHB

This workflow retrieves the Sequence Read Archive (SRA) accession (SRR) associated with a given sample accession. The primary inputs are BioSample IDs (e.g., SAMN00000000) or SRA Experiment IDs (e.g., SRX000000), which link to sequencing data in the SRA repository.

The workflow uses the `fastq-dl` tool to fetch metadata from SRA and parses this metadata to extract and output the associated SRR accession.

### Inputs

| **Terra Task Name** | **Variable** | **Type** | **Description** | **Default Value** | **Terra Status** |
| --- | --- | --- | --- | --- | --- |
| fetch_srr_metadata | **sample_accession** | String | SRA-compatible accession, such as a **BioSample ID** (e.g., "SAMN00000000") or **SRA Experiment ID** (e.g., "SRX000000"), used to retrieve SRR metadata. | | Required |
| fetch_srr_metadata | **cpu** | Int | Number of CPUs allocated for the task. | 2 | Optional |
| fetch_srr_metadata | **disk_size** | Int | Disk space in GB allocated for the task. | 10 | Optional |
| fetch_srr_metadata | **docker** | String | Docker image for metadata retrieval. | `us-docker.pkg.dev/general-theiagen/biocontainers/fastq-dl:2.0.4--pyhdfd78af_0` | Optional |
| fetch_srr_metadata | **memory** | Int | Memory in GB allocated for the task. | 8 | Optional |
| version_capture | **docker** | String | The Docker container to use for the task | "us-docker.pkg.dev/general-theiagen/theiagen/alpine-plus-bash:3.20.0" | Optional |
| version_capture | **timezone** | String | Set the time zone to get an accurate date of analysis (uses UTC by default) | | Optional |

### Workflow Tasks

This workflow has a single task that performs metadata retrieval for the specified sample accession.

??? task "`fastq-dl`: Fetches SRR metadata for sample accession"
When provided with a BioSample accession or SRA Experiment ID, `fastq-dl` collects metadata and returns the appropriate SRR accession.
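
A minimal sketch of the parsing step, assuming `fastq-dl` has already written a run-info TSV; the file name and column name below are assumptions for illustration, not guaranteed by the tool:

```python
# Pull SRR run accessions out of a fastq-dl metadata TSV.
# "fastq-run-info.tsv" and "run_accession" are assumed names.
import csv

with open("fastq-run-info.tsv") as handle:
    reader = csv.DictReader(handle, delimiter="\t")
    srr_accessions = [row["run_accession"] for row in reader]

print(",".join(srr_accessions))  # e.g. "SRR0000001,SRR0000002"
```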

!!! techdetails "fastq-dl Technical Details"
| | Links |
| --- | --- |
| Task | [task_fetch_srr_metadata.wdl](https://github.com/theiagen/public_health_bioinformatics/blob/main/tasks/utilities/data_handling/task_fetch_srr_metadata.wdl) |
| Software Source Code | [fastq-dl on GitHub](https://github.com/rpetit3/fastq-dl) |
| Software Documentation | [fastq-dl on GitHub](https://github.com/rpetit3/fastq-dl) |

### Outputs

| **Variable** | **Type** | **Description** |
|---|---|---|
| srr_accession | String | The SRR accession(s) associated with the input sample accession. |
| fetch_srr_accession_version | String | The version of the fetch_srr_accession workflow. |
| fetch_srr_accession_analysis_date | String | The date the fetch_srr_accession analysis was run. |

## References

> Petit, R. A., III. fastq-dl: download FASTQ files from SRA or ENA repositories. <https://github.com/rpetit3/fastq-dl>
31 changes: 21 additions & 10 deletions docs/workflows/public_data_sharing/mercury_prep_n_batch.md
@@ -4,7 +4,7 @@

| **Workflow Type** | **Applicable Kingdom** | **Last Known Changes** | **Command-line Compatibility** | **Workflow Level** |
|---|---|---|---|---|
| [Public Data Sharing](../../workflows_overview/workflows_type.md/#public-data-sharing) | [Viral](../../workflows_overview/workflows_kingdom.md/#viral) | PHB v2.2.0 | Yes | Set-level |
| [Public Data Sharing](../../workflows_overview/workflows_type.md/#public-data-sharing) | [Viral](../../workflows_overview/workflows_kingdom.md/#viral) | PHB v2.3.0 | Yes | Set-level |

## Mercury_Prep_N_Batch_PHB

@@ -52,36 +52,41 @@ To help users collect all required metadata, we have created the following Excel

The `using_clearlabs_data` and `using_reads_dehosted` arguments change the default values for the `read1_column_name`, `assembly_fasta_column_name`, and `assembly_mean_coverage_column_name` metadata columns. The default values are shown in the table below in addition to what they are changed to depending on what arguments are used.

| Variable | Default Value | with `using_clearlabs_data` | with `using_reads_dehosted` | with both  `using_clearlabs_data` ***and*** `using_reads_dehosted` |
| Variable | Default Value | with `using_clearlabs_data` | with `using_reads_dehosted` | with both  `using_clearlabs_data` **_and_** `using_reads_dehosted` |
| --- | --- | --- | --- | --- |
| `read1_column_name` | `"read1_dehosted"` | `"clearlabs_fastq_gz"` | `"reads_dehosted"` | `"reads_dehosted"` |
| `assembly_fasta_column_name` | `"assembly_fasta"` | `"clearlabs_fasta"` | `"assembly_fasta"` | `"clearlabs_fasta"` |
| `assembly_mean_coverage_column_name` | `"assembly_mean_coverage"` | `"clearlabs_assembly_coverage"` | `"assembly_mean_coverage"` | `"clearlabs_assembly_coverage"` |
| `assembly_mean_coverage_column_name` | `"assembly_mean_coverage"` | `"clearlabs_sequencing_depth"` | `"assembly_mean_coverage"` | `"clearlabs_sequencing_depth"` |
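
A minimal sketch of that selection logic in Python (the helper function is hypothetical; the workflow implements the equivalent inside the `mercury` task):

```python
# Resolve the three metadata column names from the two Boolean inputs,
# following the table above. Hypothetical helper for illustration only.
def resolve_columns(using_clearlabs_data=False, using_reads_dehosted=False):
    read1, fasta, coverage = "read1_dehosted", "assembly_fasta", "assembly_mean_coverage"
    if using_clearlabs_data:
        read1, fasta, coverage = "clearlabs_fastq_gz", "clearlabs_fasta", "clearlabs_sequencing_depth"
    if using_reads_dehosted:
        read1 = "reads_dehosted"  # takes priority over the Clear Labs read column
    return read1, fasta, coverage

print(resolve_columns(using_clearlabs_data=True, using_reads_dehosted=True))
# ('reads_dehosted', 'clearlabs_fasta', 'clearlabs_sequencing_depth')
```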

### Inputs

!!! tip "Use the sample table for the `terra_table_name` input"
Make sure your entry for `terra_table_name` is for the _sample_ table! While the root entity needs to be the set table, the input value for `terra_table_name` should be the sample table.

This workflow runs on the set-level.

<div class="searchable-table" markdown="1">

| **Terra Task Name** | **Variable** | **Type** | **Description** | **Default Value** | **Terra Status** |
|---|---|---|---|---|---|
| mercury_prep_n_batch | **gcp_bucket_uri** | String | Google bucket where your SRA reads will be temporarily stored before transferring to SRA. Example: "gs://theiagen_sra_transfer" | | Required |
| mercury_prep_n_batch | **sample_names** | Array[String] | The samples you want to submit | | Required |
| mercury_prep_n_batch | **terra_project_name** | String | The name of your Terra project. You can find this information in the URL of the webpage of your Terra dashboard. For example, if your URL contains #workspaces/example/my_workspace/ then your project name is example | | Required |
| mercury_prep_n_batch | **terra_table_name** | String | The name of the Terra table where your samples can be found. Do not include the entity: prefix or the _id suffix, just the name of the table as listed in the sidebar on lefthand side of the Terra Data tab. | | Required |
| mercury_prep_n_batch | **terra_project_name** | String | The name of your Terra project. You can find this information in the URL of the webpage of your Terra dashboard. For example, if your URL contains `#workspaces/example/my_workspace/` then your project name is `example` | | Required |
| mercury_prep_n_batch | **terra_table_name** | String | The name of the Terra table where your **samples** can be found. Do not include the `entity:` prefix, the `_id` suffix, or the `_set_id` suffix, just the name of the sample-level data table as listed in the sidebar on lefthand side of the Terra Data tab. | | Required |
| mercury_prep_n_batch | **terra_workspace_name** | String | The name of your Terra workspace where your samples can be found. For example, if your URL contains #workspaces/example/my_workspace/ then your project name is my_workspace | | Required |
| download_terra_table | **cpu** | Int | Number of CPUs to allocate to the task | 1 | Optional |
| download_terra_table | **disk_size** | Int | Amount of storage (in GB) to allocate to the task | 10 | Optional |
| download_terra_table | **docker** | String | The Docker container to use for the task | us-docker.pkg.dev/general-theiagen/theiagen/terra-tools:2023-06-21 | Optional |
| download_terra_table | **memory** | Int | Amount of memory/RAM (in GB) to allocate to the task | 1 | Optional |
| download_terra_table | **memory** | Int | Amount of memory/RAM (in GB) to allocate to the task | 2 | Optional |
| mercury | **cpu** | Int | Number of CPUs to allocate to the task | 2 | Optional |
| mercury | **disk_size** | Int | Amount of storage (in GB) to allocate to the task | 50 | Optional |
| mercury | **docker** | String | The Docker container to use for the task | us-docker.pkg.dev/general-theiagen/theiagen/mercury:1.0.7 | Optional |
| mercury | **memory** | Int | Amount of memory/RAM (in GB) to allocate to the task | 2 | Optional |
| mercury | **disk_size** | Int | Amount of storage (in GB) to allocate to the task | 100 | Optional |
| mercury | **docker** | String | The Docker container to use for the task | us-docker.pkg.dev/general-theiagen/theiagen/mercury:1.0.9 | Optional |
| mercury | **memory** | Int | Amount of memory/RAM (in GB) to allocate to the task | 8 | Optional |
| mercury | **number_N_threshold** | Int | Only for "sars-cov-2" submissions; used to filter out any samples that contain more than the indicated number of Ns in the assembly file | 5000 | Optional |
| mercury | **single_end** | Boolean | Set to true if your data is single-end; this ensures that a read2 column is not included in the metadata | FALSE | Optional |
| mercury | **skip_county** | Boolean | Use if your Terra table contains a county column that you do not want to include in your submission. | FALSE | Optional |
| mercury | **usa_territory** | Boolean | If true, the "state" column will be used in place of the "country" column. For example, if "state" is Puerto Rico, then the GISAID virus name will be `hCoV-19/Puerto Rico/<name>/<year>`. The NCBI `geo_loc_name` will be "USA: Puerto Rico". This optional Boolean variable should only be used with clear understanding of what it does. | FALSE | Optional |
| mercury | **using_clearlabs_data** | Boolean | When set to true will change read1_dehosted → clearlabs_fastq_gz; assembly_fasta → clearlabs_fasta; assembly_mean_coverage → clearlabs_assembly_coverage | FALSE | Optional |
| mercury | **using_clearlabs_data** | Boolean | When set to `true` will change `read1_dehosted``clearlabs_fastq_gz`; `assembly_fasta``clearlabs_fasta`; `assembly_mean_coverage``clearlabs_sequencing_depth` | FALSE | Optional |
| mercury | **using_reads_dehosted** | Boolean | When set to true will only change read1_dehosted → reads_dehosted. Takes priority over the replacement for read1_dehosted made with the using_clearlabs_data Boolean input | FALSE | Optional |
| mercury | **vadr_alert_limit** | Int | Only for "sars-cov-2" submissions; used to filter out any samples that contain more than the indicated number of vadr alerts | 0 | Optional |
| mercury_prep_n_batch | **authors_sbt** | File | Only for "mpox" submissions; a file that contains author information. This file can be created here: <https://submit.ncbi.nlm.nih.gov/genbank/template/submission/> | | Optional |
@@ -101,8 +106,12 @@ This workflow runs on the set-level.
| version_capture | **docker** | String | The Docker container to use for the task | "us-docker.pkg.dev/general-theiagen/theiagen/alpine-plus-bash:3.20.0" | Optional |
| version_capture | **timezone** | String | Set the time zone to get an accurate date of analysis (uses UTC by default) | | Optional |

</div>

### Outputs

<div class="searchable-table" markdown="1">

| **Variable** | **Type** | **Description** |
|---|---|---|
| bankit_sqn_to_email | File | **Only for mpox submission**: the sqn file that you will use to submit mpox assembly files to NCBI via email |
@@ -117,6 +126,8 @@ This workflow runs on the set-level.
| mercury_script_version | String | Version of the Mercury tool that was used in this workflow |
| sra_metadata | File | SRA metadata TSV file for upload |

</div>

???+ toggle "An example excluded_samples.tsv file"

##### An example excluded_samples.tsv file {#example-excluded-samples}
8 changes: 8 additions & 0 deletions docs/workflows/public_data_sharing/terra_2_gisaid.md
@@ -28,6 +28,8 @@ The optional variable `frameshift_notification` has three options that correspon

This workflow runs on the sample level.

<div class="searchable-table" markdown="1">

| **Terra Task Name** | **Variable** | **Type** | **Description** | **Default Value** | **Terra Status** |
|---|---|---|---|---|---|
| Terra_2_GISAID | **client_id** | String | This value should be filled with the client-ID provided by GISAID | | Required |
@@ -43,12 +45,18 @@ This workflow runs on the sample level.
| version_capture | **docker** | String | The Docker container to use for the task | "us-docker.pkg.dev/general-theiagen/theiagen/alpine-plus-bash:3.20.0" | Optional |
| version_capture | **timezone** | String | Set the time zone to get an accurate date of analysis (uses UTC by default) | | Optional |

</div>

### Outputs

<div class="searchable-table" markdown="1">

| **Variable** | **Type** | **Description** |
|---|---|---|
| failed_uploads | Boolean | The metadata for any failed uploads |
| gisaid_cli_version | String | The version of the GISAID CLI tool |
| gisaid_logs | File | The log files regarding the submission |
| terra_2_gisaid_analysis_date | String | The date of the analysis |
| terra_2_gisaid_version | String | The version of the PHB repository that this workflow is hosted in |

</div>
10 changes: 9 additions & 1 deletion docs/workflows/public_data_sharing/terra_2_ncbi.md
@@ -4,7 +4,7 @@

| **Workflow Type** | **Applicable Kingdom** | **Last Known Changes** | **Command-line Compatibility** | **Workflow Level** |
|---|---|---|---|---|
| [Public Data Sharing](../../workflows_overview/workflows_type.md/#public-data-sharing) | [Bacteria](../../workflows_overview/workflows_kingdom.md#bacteria), [Mycotics](../../workflows_overview/workflows_kingdom.md#mycotics) [Viral](../../workflows_overview/workflows_kingdom.md/#viral) | PHB v2.1.0 | No | Set-level |
| [Public Data Sharing](../../workflows_overview/workflows_type.md/#public-data-sharing) | [Bacteria](../../workflows_overview/workflows_kingdom.md#bacteria), [Mycotics](../../workflows_overview/workflows_kingdom.md#mycotics), [Viral](../../workflows_overview/workflows_kingdom.md/#viral) | PHB v2.3.0 | No | Set-level |

## Terra_2_NCBI_PHB

@@ -103,6 +103,8 @@ This workflow runs on set-level data tables.
!!! info "Production Submissions"
Please note that an optional Boolean variable, `submit_to_production`, is **required** for a production submission.

<div class="searchable-table" markdown="1">

| **Terra Task Name** | **Variable** | **Type** | **Description** | **Default Value** | **Terra Status** |
| --- | --- | --- | --- | --- | --- |
| Terra_2_NCBI | **bioproject** | String | BioProject accession that the samples will be submitted to | | Required |
@@ -143,6 +145,8 @@ This workflow runs on set-level data tables.
| version_capture | **docker** | String | The Docker container to use for the task | "us-docker.pkg.dev/general-theiagen/theiagen/alpine-plus-bash:3.20.0" | Optional |
| version_capture | **timezone** | String | Set the time zone to get an accurate date of analysis (uses UTC by default) | | Optional |

</div>

??? task "Workflow Tasks"

##### Workflow Tasks {#workflow-tasks}
@@ -178,6 +182,8 @@ If the workflow ends unsuccessfully, no outputs will be shown on Terra and the `

The output files contain information mostly for debugging purposes. Additionally, if your submission is successful, the point of contact for the submission should also receive an email from NCBI notifying them of their submission success.

<div class="searchable-table" markdown="1">

| Variable | Description | Type |
| --- | --- | --- |
| biosample_failures | Text file listing samples that failed BioSample submission | File |
@@ -193,6 +199,8 @@ The output files contain information mostly for debugging purposes. Additionally
| terra_2_ncbi_analysis_date | Date that the workflow was run | String |
| terra_2_ncbi_version | Version of the PHB repository where the workflow is hosted | String |

</div>

???+ toggle "An example excluded_samples.tsv file"

##### An example excluded_samples.tsv file {#example-excluded-samples}
47 changes: 47 additions & 0 deletions docs/workflows/standalone/concatenate_illumina_lanes.md
@@ -0,0 +1,47 @@
# Concatenate Illumina Lanes

## Quick Facts

| **Workflow Type** | **Applicable Kingdom** | **Last Known Changes** | **Command-line Compatibility** | **Workflow Level** |
|---|---|---|---|---|
| [Standalone](../../workflows_overview/workflows_type.md/#standalone) | [Any Taxa](../../workflows_overview/workflows_kingdom.md/#any-taxa) | PHB v2.3.0 | Yes | Sample-level |

## Concatenate_Illumina_Lanes_PHB

Some Illumina machines produce multi-lane FASTQ files for a single sample. This workflow concatenates the multiple lanes into a single FASTQ file per read type (forward or reverse).

### Inputs

| **Terra Task Name** | **Variable** | **Type** | **Description** | **Default Value** | **Terra Status** |
|---|---|---|---|---|---|
| concatenate_illumina_lanes | **read1_lane1** | File | The first lane for the forward reads | | Required |
| concatenate_illumina_lanes | **read1_lane2** | File | The second lane for the forward reads | | Required |
| concatenate_illumina_lanes | **samplename** | String | The name of the sample, used to name the output files | | Required |
| cat_lanes | **cpu** | Int | Number of CPUs to allocate to the task | 2 | Optional |
| cat_lanes | **disk_size** | Int | Amount of storage (in GB) to allocate to the task | 50 | Optional |
| cat_lanes | **docker** | String | The Docker container to use for the task | "us-docker.pkg.dev/general-theiagen/theiagen/utility:1.2" | Optional |
| cat_lanes | **memory** | Int | Amount of memory/RAM (in GB) to allocate to the task | 4 | Optional |
| concatenate_illumina_lanes | **read1_lane3** | File | The third lane for the forward reads | | Optional |
| concatenate_illumina_lanes | **read1_lane4** | File | The fourth lane for the forward reads | | Optional |
| concatenate_illumina_lanes | **read2_lane1** | File | The first lane for the reverse reads | | Optional |
| concatenate_illumina_lanes | **read2_lane2** | File | The second lane for the reverse reads | | Optional |
| concatenate_illumina_lanes | **read2_lane3** | File | The third lane for the reverse reads | | Optional |
| concatenate_illumina_lanes | **read2_lane4** | File | The fourth lane for the reverse reads | | Optional |
| version_capture | **docker** | String | The Docker container to use for the task | "us-docker.pkg.dev/general-theiagen/theiagen/alpine-plus-bash:3.20.0" | Optional |
| version_capture | **timezone** | String | Set the time zone to get an accurate date of analysis (uses UTC by default) | | Optional |

### Workflow Tasks

This workflow concatenates the Illumina lanes for forward and (if provided) reverse reads. The output files are named as follows:

- Forward reads: `<samplename>_merged_R1.fastq.gz`
- Reverse reads: `<samplename>_merged_R2.fastq.gz`
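
Conceptually, merging lanes is just a byte-level concatenation: concatenated gzip members form a valid gzip stream, so the lane files can be joined without decompressing. A minimal command-line sketch of the same operation (filenames are illustrative):

```bash
# join lanes in order; the result is a valid multi-member gzip stream
cat sample_L001_R1.fastq.gz sample_L002_R1.fastq.gz > sample_merged_R1.fastq.gz
cat sample_L001_R2.fastq.gz sample_L002_R2.fastq.gz > sample_merged_R2.fastq.gz
```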

### Outputs

| **Variable** | **Type** | **Description** |
|---|---|---|
| concatenate_illumina_lanes_analysis_date | String | Date of analysis |
| concatenate_illumina_lanes_version | String | Version of PHB used for the analysis |
| read1_concatenated | File | Concatenated forward reads |
| read2_concatenated | File | Concatenated reverse reads |
10 changes: 9 additions & 1 deletion docs/workflows/standalone/gambit_query.md
@@ -12,6 +12,8 @@ The GAMBIT_Query_PHB workflow performs taxon assignment of a genome assembly usi

### Inputs

<div class="searchable-table" markdown="1">

| **Terra Task Name** | **Variable** | **Type** | **Description** | **Default Value** | **Terra Status** |
|---|---|---|---|---|---|
| gambit_query | **assembly_fasta** | File | Assembly file in FASTA format | | Required |
@@ -23,6 +25,8 @@ The GAMBIT_Query_PHB workflow performs taxon assignment of a genome assembly usi
| gambit | **gambit_db_genomes** | File | Database of metadata for assembled query genomes; requires complementary signatures file. If not provided, uses default database "/gambit-db" | "gs://gambit-databases-rp/2.0.0/gambit-metadata-2.0.0-20240628.gdb" | Optional |
| gambit | **gambit_db_signatures** | File | Signatures file; requires complementary genomes file. If not specified, the file from the docker container will be used. | "gs://gambit-databases-rp/2.0.0/gambit-signatures-2.0.0-20240628.gs" | Optional |

</div>

### Workflow Tasks

[`GAMBIT`](https://github.com/jlumpe/gambit) determines the taxon of the genome assembly using a k-mer based approach to match the assembly sequence to the closest complete genome in a database, thereby predicting its identity. Sometimes, GAMBIT can confidently designate the organism to the species level. Other times, it is more conservative and assigns it to a higher taxonomic rank.
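
For orientation, a local GAMBIT query of a single assembly might look like the sketch below. The database paths mirror the defaults listed above; the `-d`/`query`/`-o` usage follows GAMBIT's upstream documentation and should be confirmed with `gambit --help` for the pinned version:

```bash
# point GAMBIT at a directory containing the .gdb metadata and .gs signatures
# files, then classify one assembly and write a CSV report
gambit -d /path/to/gambit-db query -o sample_gambit.csv assembly.fasta
```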
@@ -40,6 +44,8 @@ For additional details regarding the GAMBIT tool and a list of available GAMBIT

### Outputs

<div class="searchable-table" markdown="1">

| **Variable** | **Type** | **Description** |
|---|---|---|
| gambit_closest_genomes | File | CSV file listing genomes in the GAMBIT database that are most similar to the query assembly |
@@ -50,6 +56,8 @@ For additional details regarding the GAMBIT tool and a list of available GAMBIT
| gambit_query_wf_analysis_date | String | Date of analysis |
| gambit_query_wf_version | String | PHB repository version |
| gambit_report | File | GAMBIT report in a machine-readable format |
| gambit_version | String | Version of gambit software used
| gambit_version | String | Version of gambit software used |

</div>

> GAMBIT (Genomic Approximation Method for Bacterial Identification and Tracking): A methodology to rapidly leverage whole genome sequencing of bacterial isolates for clinical identification. Lumpe et al. PLOS ONE, 2022. DOI: [10.1371/journal.pone.0277575](https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0277575)
14 changes: 13 additions & 1 deletion docs/workflows/standalone/kraken2.md
Original file line number Diff line number Diff line change
@@ -4,7 +4,7 @@

| **Workflow Type** | **Applicable Kingdom** | **Last Known Changes** | **Command-line Compatibility** | **Workflow Level** |
|---|---|---|---|---|
| [Standalone](../../workflows_overview/workflows_type.md/#standalone) | [Any Taxa](../../workflows_overview/workflows_kingdom.md/#any-taxa) | PHB v2.0.0 | Yes | Sample-level |
| [Standalone](../../workflows_overview/workflows_type.md/#standalone) | [Any Taxa](../../workflows_overview/workflows_kingdom.md/#any-taxa) | PHB v2.3.0 | Yes | Sample-level |

## Kraken2 Workflows

@@ -30,6 +30,8 @@ Besides the data input types, there are minimal differences between these two wo

#### Suggested databases

<div class="searchable-table" markdown="1">

| Database name | Database Description | Suggested Applications | GCP URI (for usage in Terra) | Source | Database Size (GB) | Date of Last Update |
| --- | --- | --- | --- | --- | --- | --- |
| **Kalamari v5.1** | Kalamari is a database of complete public assemblies that has been fine-tuned for enteric pathogens and is backed by trusted institutions. [Full list available here (in chromosomes.tsv and plasmids.tsv)](https://github.com/lskatz/Kalamari/tree/master/src) | Single-isolate enteric bacterial pathogen analysis (Salmonella, Escherichia, Shigella, Listeria, Campylobacter, Vibrio, Yersinia) | **`gs://theiagen-large-public-files-rp/terra/databases/kraken2/kraken2.kalamari_5.1.tar.gz`** || 1.5 | 18/5/2022 |
@@ -40,8 +42,12 @@ Besides the data input types, there are minimal differences between these two wo
| **EuPathDB48** | Eukaryotic pathogen genomes with contaminants removed. [Full list available here](https://genome-idx.s3.amazonaws.com/kraken/k2_eupathdb48_20201113/EuPathDB48_Contents.txt) | Eukaryotic organisms (Candida spp., Aspergillus spp., etc) | **`gs://theiagen-public-files-rp/terra/theiaprok-files/k2_eupathdb48_20201113.tar.gz`** | https://benlangmead.github.io/aws-indexes/k2 | 30.3 | 13/11/2020 |
| **EuPathDB48** | Eukaryotic pathogen genomes with contaminants removed. [Full list available here](https://genome-idx.s3.amazonaws.com/kraken/k2_eupathdb48_20201113/EuPathDB48_Contents.txt) | Eukaryotic organisms (Candida spp., Aspergillus spp., etc) | **`gs://theiagen-large-public-files-rp/terra/databases/kraken/k2_eupathdb48_20230407.tar.gz`** | https://benlangmead.github.io/aws-indexes/k2 | 11 | 7/4/2023 |

</div>

### Inputs

<div class="searchable-table" markdown="1">

| **Terra Task Name** | **Variable** | **Type** | **Description** | **Default Value** | **Terra Status** | **Workflow** |
|---|---|---|---|---|---|---|
| *workflow_name | **kraken2_db** | File | A Kraken2 database in .tar.gz format | | Required | ONT, PE, SE |
@@ -67,8 +73,12 @@ Besides the data input types, there are minimal differences between these two wo
| version_capture | **docker** | String | The Docker container to use for the task | "us-docker.pkg.dev/general-theiagen/theiagen/alpine-plus-bash:3.20.0" | Optional | ONT, PE, SE |
| version_capture | **timezone** | String | Set the time zone to get an accurate date of analysis (uses UTC by default) | | Optional | ONT, PE, SE |

</div>

### Outputs

<div class="searchable-table" markdown="1">

| **Variable** | **Type** | **Description** |
|---|---|---|
| kraken2_classified_read1 | File | FASTQ file of classified forward/R1 reads |
@@ -85,6 +95,8 @@ Besides the data input types, there are minimal differences between these two wo
| krona_html | File | HTML report of krona with visualisation of taxonomic classification of reads (if PE or SE) |
| krona_version | String | krona version (if PE or SE) |

</div>

#### Interpretation of results

The most important outputs of the Kraken2 workflows are the `kraken2_report` files. These include a breakdown of the number of sequences assigned to each taxon and the percentage of reads assigned. [A complete description of the report format can be found here](https://github.com/DerrickWood/kraken2/blob/master/docs/MANUAL.markdown#standard-kraken-output-format).
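
As a quick triage step, the species-level lines of a `kraken2_report` can be pulled out with standard shell tools. A sketch, assuming the standard six-column report layout described in the Kraken2 manual:

```bash
# report columns: 1 = % of reads in clade, 2 = clade read count,
# 3 = reads assigned directly, 4 = rank code, 5 = NCBI taxid, 6 = name
awk -F'\t' '$4 == "S" { gsub(/^ +/, "", $6); print $1 "%\t" $6 }' sample.kraken2_report.txt \
  | sort -t$'\t' -k1,1nr | head -n 10
```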
8 changes: 8 additions & 0 deletions docs/workflows/standalone/ncbi_amrfinderplus.md
Original file line number Diff line number Diff line change
@@ -19,6 +19,8 @@ You can check if a gene or point mutation is in the AMRFinderPlus database [here

### Inputs

<div class="searchable-table" markdown="1">

| **Terra Task Name** | **Variable** | **Type** | **Description** | **Default Value** | **Terra Status** |
|---|---|---|---|---|---|
| amrfinderplus_wf | **assembly** | File | Genome assembly file in FASTA format. Can be generated by TheiaProk workflow or other bioinformatics workflows. | | Required |
@@ -35,8 +37,12 @@ You can check if a gene or point mutation is in the AMRFinderPlus database [here
| version_capture | **docker** | String | The Docker container to use for the task | "us-docker.pkg.dev/general-theiagen/theiagen/alpine-plus-bash:3.20.0" | Optional |
| version_capture | **timezone** | String | Set the time zone to get an accurate date of analysis (uses UTC by default) | | Optional |

</div>

### Outputs

<div class="searchable-table" markdown="1">

| **Variable** | **Type** | **Description** |
|---|---|---|
| amrfinderplus_all_report | File | Output TSV file from AMRFinderPlus (described [here](https://github.com/ncbi/amr/wiki/Running-AMRFinderPlus#fields)) |
@@ -54,6 +60,8 @@ You can check if a gene or point mutation is in the AMRFinderPlus database [here
| amrfinderplus_wf_analysis_date | String | Date of analysis |
| amrfinderplus_wf_version | String | Version of PHB used for the analysis |

</div>

## References

>Feldgarden M, Brover V, Gonzalez-Escalona N, Frye JG, Haendiges J, Haft DH, Hoffmann M, Pettengill JB, Prasad AB, Tillman GE, Tyson GH, Klimke W. AMRFinderPlus and the Reference Gene Catalog facilitate examination of the genomic links among antimicrobial resistance, stress response, and virulence. Sci Rep. 2021 Jun 16;11(1):12728. doi: 10.1038/s41598-021-91456-0. PMID: 34135355; PMCID: PMC8208984. <https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8208984/>
10 changes: 9 additions & 1 deletion docs/workflows/standalone/ncbi_scrub.md
@@ -16,11 +16,14 @@ There are three Kraken2 workflows:

### Inputs

<div class="searchable-table" markdown="1">

| **Terra Task Name** | **Variable** | **Type** | **Description** | **Default Value** | **Terra Status** | **Workflow** |
|---|---|---|---|---|---|---|
| dehost_pe or dehost_se | **read1** | File | Forward reads in FASTQ format | | Required | PE, SE |
| dehost_pe or dehost_se | **read2** | File | Reverse reads in FASTQ format | | Required | PE |
| dehost_pe or dehost_se | **samplename** | String | Name of the sample; used to name output files | | Required | PE, SE |
| dehost_pe or dehost_se | **target_organism** | String | Target organism for Kraken2 reporting | "Severe acute respiratory syndrome coronavirus 2" | Optional | PE, SE |
| kraken2 | **cpu** | Int | Number of CPUs to allocate to the task | 4 | Optional | PE, SE |
| kraken2 | **disk_size** | Int | Amount of storage (in GB) to allocate to the task. Increase this when using large (>30GB kraken2 databases such as the "k2_standard" database) | 100 | Optional | PE, SE |
| kraken2 | **docker_image** | String | The Docker container to use for the task | us-docker.pkg.dev/general-theiagen/staphb/kraken2:2.0.8-beta_hv | Optional | PE, SE |
@@ -35,6 +38,8 @@ There are three Kraken2 workflows:
| version_capture | **docker** | String | The Docker container to use for the task | "us-docker.pkg.dev/general-theiagen/theiagen/alpine-plus-bash:3.20.0" | Optional | PE, SE |
| version_capture | **timezone** | String | Set the time zone to get an accurate date of analysis (uses UTC by default) | | Optional | PE, SE |

</div>

### Workflow Tasks

This workflow is composed of two tasks: one to dehost the input reads, and another to screen the cleaned reads with Kraken2 against the viral+human database.
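
For reference, the screening step amounts to a standard paired-end Kraken2 call; a sketch, assuming a local copy of the viral+human database and illustrative file names:

```bash
# classify dehosted paired-end reads and write the standard Kraken2 report
kraken2 \
  --db /path/to/kraken2_viral_human_db \
  --paired --gzip-compressed \
  --report sample.kraken2_report.txt \
  --output sample.kraken2_hits.txt \
  sample_R1_dehosted.fastq.gz sample_R2_dehosted.fastq.gz
```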
@@ -62,13 +67,15 @@ This workflow is composed of two tasks, one to dehost the input reads and anothe
| | Links |
| --- | --- |
| Task | [task_kraken2.wdl](https://github.com/theiagen/public_health_bioinformatics/blob/main/tasks/taxon_id/task_kraken2.wdl) |
| Task | [task_kraken2.wdl](https://github.com/theiagen/public_health_bioinformatics/blob/main/tasks/taxon_id/contamination/task_kraken2.wdl) |
| Software Source Code | [Kraken2 on GitHub](https://github.com/DerrickWood/kraken2/) |
| Software Documentation | <https://github.com/DerrickWood/kraken2/wiki> |
| Original Publication(s) | [Improved metagenomic analysis with Kraken 2](https://genomebiology.biomedcentral.com/articles/10.1186/s13059-019-1891-0) |

### Outputs

<div class="searchable-table" markdown="1">

| **Variable** | **Type** | **Description** | **Workflow** |
|---|---|---|---|
| kraken_human_dehosted | Float | Percent of human read data detected using the Kraken2 software after host removal | PE, SE |
@@ -82,3 +89,4 @@ This workflow is composed of two tasks, one to dehost the input reads and anothe
| read1_dehosted | File | Dehosted forward reads | PE, SE |
| read2_dehosted | File | Dehosted reverse reads | PE |

</div>
8 changes: 8 additions & 0 deletions docs/workflows/standalone/rasusa.md
Original file line number Diff line number Diff line change
@@ -27,6 +27,8 @@ RASUSA functions to randomly downsample the number of raw reads to a user-define

### Inputs

<div class="searchable-table" markdown="1">

| **Terra Task Name** | **Variable** | **Type** | **Description** | **Default Attribute** | **Terra Status** |
|---|---|---|---|---|---|
| rasusa_workflow | **coverage** | Float | The desired coverage of reads after downsampling. The actual coverage of the subsampled reads will not be exact and may be slightly higher; when necessary, verify it by checking the estimated clean coverage reported by downstream workflows | | Required |
@@ -45,8 +47,12 @@ RASUSA functions to randomly downsample the number of raw reads to a user-define
| version_capture | **docker** | String | The Docker container to use for the task | "us-docker.pkg.dev/general-theiagen/theiagen/alpine-plus-bash:3.20.0" | Optional |
| version_capture | **timezone** | String | Set the time zone to get an accurate date of analysis (uses UTC by default) | | Optional |

</div>

### Outputs

<div class="searchable-table" markdown="1">

| **Variable** | **Type** | **Description** |
|---|---|---|
| rasusa_version | String | Version of RASUSA used for the analysis |
@@ -55,6 +61,8 @@ RASUSA functions to randomly downsample the number of raw reads to a user-define
| read1_subsampled | File | New read1 FASTQ files downsampled to desired coverage |
| read2_subsampled | File | New read2 FASTQ files downsampled to desired coverage |

</div>

!!! tip "Don't Forget!"
Remember to use the subsampled reads in downstream analyses with `this.read1_subsampled` and `this.read2_subsampled` inputs.
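
Outside Terra, an equivalent downsampling call might look like the sketch below; `-i`/`-o` are repeated once per read file, and flag spellings vary between RASUSA releases, so confirm against `rasusa --help` for the version pinned in the task:

```bash
# randomly subsample a read pair to roughly 100x over a 5 Mb genome
rasusa \
  -i sample_R1.fastq.gz -i sample_R2.fastq.gz \
  --coverage 100 --genome-size 5mb \
  -o sample_R1_subsampled.fastq.gz -o sample_R2_subsampled.fastq.gz
```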

4 changes: 4 additions & 0 deletions docs/workflows/standalone/rename_fastq.md
@@ -12,6 +12,8 @@ This sample-level workflow receives a read file or a pair of read files (FASTQ),

### Inputs

<div class="searchable-table" markdown="1">

| **Terra Task Name** | **Variable** | **Type** | **Description** | **Default Value** | **Terra Status** |
|---|---|---|---|---|---|
| rename_fastq_files | **new_filename** | String | New name for the FASTQ file(s) | | Required |
@@ -24,6 +26,8 @@ This sample-level workflow receives a read file or a pair of read files (FASTQ),
| version_capture | **docker** | String | The Docker container to use for the task | "us-docker.pkg.dev/general-theiagen/theiagen/alpine-plus-bash:3.20.0" | Optional |
| version_capture | **timezone** | String | Set the time zone to get an accurate date of analysis (uses UTC by default) | | Optional |

</div>

### Outputs

If a reverse read (`read2`) is provided, the files are renamed to the provided `new_filename` input with the notation `<new_filename>_R1.fastq.gz` and `<new_filename>_R2.fastq.gz`. If only `read1` is provided, the file is renamed to `<new_filename>.fastq.gz`.
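
The renaming itself is a straightforward copy under the new name; a minimal local sketch of the same convention (variable values are illustrative):

```bash
read1="old_name_R1.fastq.gz"
read2="old_name_R2.fastq.gz"   # leave unset for single-end data
new_filename="my_isolate"

if [ -n "${read2:-}" ]; then
  # paired-end: add _R1/_R2 suffixes
  cp "$read1" "${new_filename}_R1.fastq.gz"
  cp "$read2" "${new_filename}_R2.fastq.gz"
else
  # single-end: plain rename
  cp "$read1" "${new_filename}.fastq.gz"
fi
```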
12 changes: 10 additions & 2 deletions docs/workflows/standalone/tbprofiler_tngs.md
@@ -4,14 +4,16 @@

| **Workflow Type** | **Applicable Kingdom** | **Last Known Changes** | **Command-line Compatibility** | **Workflow Level** |
|---|---|---|---|---|
| [Standalone](../../workflows_overview/workflows_type.md/#standalone) | [Bacteria](../../workflows_overview/workflows_kingdom.md/#bacteria) | PHB v2.0.0 | Yes | Sample-level |
| [Standalone](../../workflows_overview/workflows_type.md/#standalone) | [Bacteria](../../workflows_overview/workflows_kingdom.md/#bacteria) | PHB v2.3.0 | Yes | Sample-level |

## TBProfiler_tNGS_PHB

This workflow is still in the experimental research stage. Documentation is minimal while the code remains subject to change; it will be fleshed out once a stable state has been achieved.

### Inputs

<div class="searchable-table" markdown="1">

| **Terra Task Name** | **Variable** | **Type** | **Description** | **Default Value** | **Terra Status** |
|---|---|---|---|---|---|
| tbprofiler_tngs | **read1** | File | Illumina forward read file in FASTQ file format (compression optional) | | Required |
@@ -21,7 +23,7 @@ This workflow is still in experimental research stages. Documentation is minimal
| tbp_parser | **coverage_threshold** | Int | The minimum percentage of a region to exceed the minimum depth for a region to pass QC in tbp_parser | 100 | Optional |
| tbp_parser | **cpu** | Int | Number of CPUs to allocate to the task | 1 | Optional |
| tbp_parser | **disk_size** | Int | Amount of storage (in GB) to allocate to the task | 100 | Optional |
| tbp_parser | **docker** | String | The Docker container to use for the task | us-docker.pkg.dev/general-theiagen/theiagen/tbp-parser:1.6.0 | Optional |
| tbp_parser | **docker** | String | The Docker container to use for the task | us-docker.pkg.dev/general-theiagen/theiagen/tbp-parser:2.2.2 | Optional |
| tbp_parser | **etha237_frequency** | Float | Minimum frequency for a mutation in ethA at protein position 237 to pass QC in tbp-parser | 0.1 | Optional |
| tbp_parser | **expert_rule_regions_bed** | File | A file that contains the regions where R mutations and expert rules are applied | | Optional |
| tbp_parser | **memory** | Int | Amount of memory/RAM (in GB) to allocate to the task | 4 | Optional |
@@ -62,8 +64,12 @@ This workflow is still in experimental research stages. Documentation is minimal
| version_capture | **docker** | String | The Docker container to use for the task | "us-docker.pkg.dev/general-theiagen/theiagen/alpine-plus-bash:3.20.0" | Optional |
| version_capture | **timezone** | String | Set the time zone to get an accurate date of analysis (uses UTC by default) | | Optional |

</div>

### Terra Outputs

<div class="searchable-table" markdown="1">

| **Variable** | **Type** | **Description** |
|---|---|---|
| tbp_parser_average_genome_depth | Float | The mean depth of coverage across all target regions included in the analysis |
@@ -95,3 +101,5 @@ This workflow is still in experimental research stages. Documentation is minimal
| trimmomatic_read2_trimmed | File | The read2 file post trimming |
| trimmomatic_stats | File | The read trimming statistics |
| trimmomatic_version | String | The version of trimmomatic used in this analysis |

</div>
8 changes: 8 additions & 0 deletions docs/workflows/standalone/theiavalidate.md
@@ -39,6 +39,8 @@ If a column consists of only GCP URIs (Google Cloud file paths), the files will

### Inputs

<div class="searchable-table" markdown="1">

Please note that all string inputs **must** be enclosed in quotation marks; for example, "column1,column2" or "workspace1".

| **Terra Task Name** | **Variable** | **Type** | **Description** | **Default Value** | **Terra Status** |
@@ -62,6 +64,8 @@ Please note that all string inputs **must** be enclosed in quotation marks; for
| export_two_tsvs | **cpu** | Int | Number of CPUs to allocate to the task | 1 | Optional |
| export_two_tsvs | **disk_size** | Int | Amount of storage (in GB) to allocate to the task | 10 | Optional |

</div>

The optional `validation_criteria_tsv` file takes the following format (tab-delimited; _a header line is required_):

```text linenums="1"
@@ -95,6 +99,8 @@ Please note that the name in the **second column** will be displayed and used in

### Outputs

<div class="searchable-table" markdown="1">

| **Variable** | **Type** | **Description** |
|---|---|---|
| theiavalidate_criteria_differences | File | A TSV file that lists only the differences that fail to meet the validation criteria |
@@ -108,6 +114,8 @@ Please note that the name in the **second column** will be displayed and used in
| theiavalidate_version | String | The version of the TheiaValidate Python Docker |
| theiavalidate_wf_version | String | The version of the PHB repository |

</div>

### Example Data and Outputs

To help demonstrate how TheiaValidate works, please observe the following example and outputs:
35 changes: 21 additions & 14 deletions docs/workflows_overview/workflows_alphabetically.md

Large diffs are not rendered by default.

52 changes: 34 additions & 18 deletions docs/workflows_overview/workflows_kingdom.md

Large diffs are not rendered by default.

58 changes: 44 additions & 14 deletions docs/workflows_overview/workflows_type.md

Large diffs are not rendered by default.

21 changes: 13 additions & 8 deletions mkdocs.yml
@@ -23,7 +23,7 @@ nav:
- Freyja Workflow Series: workflows/genomic_characterization/freyja.md
- Pangolin_Update: workflows/genomic_characterization/pangolin_update.md
- TheiaCoV Workflow Series: workflows/genomic_characterization/theiacov.md
- TheiaEuk: workflows/genomic_characterization/theiaeuk.md
- TheiaEuk Workflow Series: workflows/genomic_characterization/theiaeuk.md
- TheiaMeta: workflows/genomic_characterization/theiameta.md
- TheiaProk Workflow Series: workflows/genomic_characterization/theiaprok.md
- VADR_Update: workflows/genomic_characterization/vadr_update.md
@@ -43,6 +43,7 @@ nav:
- Samples_to_Ref_Tree: workflows/phylogenetic_placement/samples_to_ref_tree.md
- Usher_PHB: workflows/phylogenetic_placement/usher.md
- Public Data Sharing:
- Fetch_SRR_Accession: workflows/public_data_sharing/fetch_srr_accession.md
- Mercury_Prep_N_Batch: workflows/public_data_sharing/mercury_prep_n_batch.md
- Terra_2_GISAID: workflows/public_data_sharing/terra_2_gisaid.md
- Terra_2_NCBI: workflows/public_data_sharing/terra_2_ncbi.md
@@ -52,6 +53,7 @@ nav:
- Zip_Column_Content: workflows/data_export/zip_column_content.md
- Standalone:
- Cauris_CladeTyper: workflows/standalone/cauris_cladetyper.md
- Concatenate_Illumina_Lanes: workflows/standalone/concatenate_illumina_lanes.md
- GAMBIT_Query: workflows/standalone/gambit_query.md
- Kraken2: workflows/standalone/kraken2.md
- NCBI-AMRFinderPlus: workflows/standalone/ncbi_amrfinderplus.md
@@ -65,7 +67,8 @@ nav:
- Any Taxa:
- Assembly_Fetch: workflows/data_import/assembly_fetch.md
- BaseSpace_Fetch: workflows/data_import/basespace_fetch.md
- Concatenate_Column_Content: workflows/data_export/concatenate_column_content.md
- Concatenate_Column_Content: workflows/data_export/concatenate_column_content.md
- Concatenate_Illumina_Lanes: workflows/standalone/concatenate_illumina_lanes.md
- Create_Terra_Table: workflows/data_import/create_terra_table.md
- Kraken2: workflows/standalone/kraken2.md
- NCBI-Scrub: workflows/standalone/ncbi_scrub.md
@@ -100,7 +103,7 @@ nav:
- NCBI-AMRFinderPlus: workflows/standalone/ncbi_amrfinderplus.md
- Snippy_Variants: workflows/phylogenetic_construction/snippy_variants.md
- Terra_2_NCBI: workflows/public_data_sharing/terra_2_ncbi.md
- TheiaEuk: workflows/genomic_characterization/theiaeuk.md
- TheiaEuk Workflow Series: workflows/genomic_characterization/theiaeuk.md
- Viral:
- Augur: workflows/phylogenetic_construction/augur.md
- CZGenEpi_Prep: workflows/phylogenetic_construction/czgenepi_prep.md
@@ -123,6 +126,7 @@ nav:
- BaseSpace_Fetch: workflows/data_import/basespace_fetch.md
- Cauris_CladeTyper: workflows/standalone/cauris_cladetyper.md
- Concatenate_Column_Content: workflows/data_export/concatenate_column_content.md
- Concatenate_Illumina_Lanes: workflows/standalone/concatenate_illumina_lanes.md
- Core_Gene_SNP: workflows/phylogenetic_construction/core_gene_snp.md
- Create_Terra_Table: workflows/data_import/create_terra_table.md
- CZGenEpi_Prep: workflows/phylogenetic_construction/czgenepi_prep.md
@@ -149,7 +153,7 @@ nav:
- Terra_2_GISAID: workflows/public_data_sharing/terra_2_gisaid.md
- Terra_2_NCBI: workflows/public_data_sharing/terra_2_ncbi.md
- TheiaCoV Workflow Series: workflows/genomic_characterization/theiacov.md
- TheiaEuk: workflows/genomic_characterization/theiaeuk.md
- TheiaEuk Workflow Series: workflows/genomic_characterization/theiaeuk.md
- TheiaMeta: workflows/genomic_characterization/theiameta.md
- TheiaProk Workflow Series: workflows/genomic_characterization/theiaprok.md
- TheiaValidate: workflows/standalone/theiavalidate.md
@@ -230,11 +234,12 @@ plugins:
# - section-index

extra_javascript:
- https://unpkg.com/tablesort@5.3.0/dist/tablesort.min.js
- javascripts/tablesort.js
- https://unpkg.com/tablesort@5.3.0/dist/tablesort.min.js
- javascripts/tablesort.js
- javascripts/table-search.js

extra_css:
- stylesheets/extra.css
- stylesheets/extra.css

extra:
social:
@@ -251,4 +256,4 @@ extra:
homepage: https://www.theiagen.com

copyright: |
&copy; 2022-2024 <a href="https://www.theiagen.com" target="_blank" rel="noopener">Theiagen Genomics</a>
&copy; 2022-2024 <a href="https://www.theiagen.com" target="_blank" rel="noopener">Theiagen Genomics</a>
10 changes: 8 additions & 2 deletions tasks/assembly/task_artic_consensus.wdl
@@ -12,7 +12,7 @@ task consensus {
Int memory = 16
Int disk_size = 100
String medaka_model = "r941_min_high_g360"
String docker = "us-docker.pkg.dev/general-theiagen/staphb/artic-ncov2019-epi2me"
String docker = "us-docker.pkg.dev/general-theiagen/staphb/artic:1.2.4-1.12.0"
}
String primer_name = basename(primer_bed)
command <<<
@@ -61,7 +61,13 @@ task consensus {
# version control
echo "Medaka via $(artic -v)" | tee VERSION
echo "~{primer_name}" | tee PRIMER_NAME
artic minion --medaka --medaka-model ~{medaka_model} --normalise ~{normalise} --threads ~{cpu} --scheme-directory ./primer-schemes --read-file ~{read1} ${scheme_name} ~{samplename}
artic minion \
--medaka \
--medaka-model ~{medaka_model} \
--normalise ~{normalise} \
--threads ~{cpu} \
--scheme-directory ./primer-schemes \
--read-file ~{read1} ${scheme_name} ~{samplename}
gunzip -f ~{samplename}.pass.vcf.gz

# clean up fasta header
23 changes: 20 additions & 3 deletions tasks/assembly/task_irma.wdl
@@ -87,9 +87,26 @@ task irma {
echo "Type_"$(basename "$(echo "$(find ~{samplename}/*.fasta | head -n1)")" | cut -d_ -f1) > IRMA_TYPE
# set irma_type bash variable which is used later
irma_type=$(cat IRMA_TYPE)
# concatenate consensus assemblies into single file with all genome segments
echo "DEBUG: creating IRMA FASTA file containing all segments...."
cat ~{samplename}/*.fasta > ~{samplename}.irma.consensus.fasta

# flu segments from largest to smallest
segments=("PB2" "PB1" "PA" "HA" "NP" "NA" "MP" "NS")

echo "DEBUG: creating IRMA FASTA file containing all segments in order (largest to smallest)...."

# initialize an empty file
touch ~{samplename}.irma.consensus.fasta

# concatenate files in the order of the segments array
for segment in "${segments[@]}"; do
segment_file=$(find "~{samplename}" -name "*${segment}*.fasta")
if [ -n "$segment_file" ]; then
echo "DEBUG: Adding $segment_file to consensus FASTA"
cat "$segment_file" >> ~{samplename}.irma.consensus.fasta
else
echo "WARNING: No file containing ${segment} found for ~{samplename}"
fi
done

echo "DEBUG: editing IRMA FASTA file to include sample name in FASTA headers...."
sed -i "s/>/>~{samplename}_/g" ~{samplename}.irma.consensus.fasta

40 changes: 37 additions & 3 deletions tasks/gene_typing/variant_detection/task_snippy_variants.wdl
@@ -89,19 +89,52 @@ task snippy_variants {
if [ "$reference_length" -eq 0 ]; then
echo "Could not compute percent reference coverage: reference length is 0" > PERCENT_REF_COVERAGE
else
# compute percent reference coverage
echo $reference_length_passed_depth $reference_length | awk '{ print ($1/$2)*100 }' > PERCENT_REF_COVERAGE
echo $reference_length_passed_depth $reference_length | awk '{ printf("%.2f", ($1/$2)*100) }' > PERCENT_REF_COVERAGE
fi

# Compute percentage of reads aligned
reads_aligned=$(cat READS_ALIGNED_TO_REFERENCE)
total_reads=$(samtools view -c "~{samplename}/~{samplename}.bam")
echo $total_reads > TOTAL_READS
if [ "$total_reads" -eq 0 ]; then
echo "Could not compute percent reads aligned: total reads is 0" > PERCENT_READS_ALIGNED
else
echo $reads_aligned $total_reads | awk '{ print ($1/$2)*100 }' > PERCENT_READS_ALIGNED
echo $reads_aligned $total_reads | awk '{ printf("%.2f", ($1/$2)*100) }' > PERCENT_READS_ALIGNED
fi

# Create QC metrics file
line_count=$(wc -l < "~{samplename}/~{samplename}_coverage.tsv")
# Check the number of lines in the coverage file to handle organisms with multiple chromosomes (e.g. V. cholerae), which produce per-chromosome coverage metrics
if [ "$line_count" -eq 2 ]; then
head -n 1 "~{samplename}/~{samplename}_coverage.tsv" | tr ' ' '\t' > COVERAGE_HEADER
sed -n '2p' "~{samplename}/~{samplename}_coverage.tsv" | tr ' ' '\t' > COVERAGE_VALUES
elif [ "$line_count" -gt 2 ]; then
# Multiple chromosomes (header + multiple data lines)
header=$(head -n 1 "~{samplename}/~{samplename}_coverage.tsv")
output_header=""
output_values=""
# while loop to iterate over each line in the coverage file
while read -r line; do
if [ -z "$output_header" ]; then
output_header="$header"
output_values="$line"
else
output_header="$output_header\t$header"
output_values="$output_values\t$line"
fi
done < <(tail -n +2 "~{samplename}/~{samplename}_coverage.tsv")
echo "$output_header" | tr ' ' '\t' > COVERAGE_HEADER
echo "$output_values" | tr ' ' '\t' > COVERAGE_VALUES
else
# Coverage file has insufficient data
echo "Coverage file has insufficient data." > COVERAGE_HEADER
echo "" > COVERAGE_VALUES
fi

# Build the QC metrics file
echo -e "samplename\treads_aligned_to_reference\ttotal_reads\tpercent_reads_aligned\tvariants_total\tpercent_ref_coverage\t$(cat COVERAGE_HEADER)" > "~{samplename}/~{samplename}_qc_metrics.tsv"
echo -e "~{samplename}\t$reads_aligned\t$total_reads\t$(cat PERCENT_READS_ALIGNED)\t$(cat VARIANTS_TOTAL)\t$(cat PERCENT_REF_COVERAGE)\t$(cat COVERAGE_VALUES)" >> "~{samplename}/~{samplename}_qc_metrics.tsv"

>>>
output {
String snippy_variants_version = read_string("VERSION")
@@ -120,6 +153,7 @@ task snippy_variants {
String snippy_variants_ref_length = read_string("REFERENCE_LENGTH")
String snippy_variants_ref_length_passed_depth = read_string("REFERENCE_LENGTH_PASSED_DEPTH")
String snippy_variants_percent_ref_coverage = read_string("PERCENT_REF_COVERAGE")
File snippy_variants_qc_metrics = "~{samplename}/~{samplename}_qc_metrics.tsv"
String snippy_variants_percent_reads_aligned = read_string("PERCENT_READS_ALIGNED")
}
runtime {
6 changes: 6 additions & 0 deletions tasks/phylogenetic_inference/augur/task_augur_align.wdl
@@ -12,8 +12,13 @@ task augur_align {
String docker = "us-docker.pkg.dev/general-theiagen/biocontainers/augur:22.0.2--pyhdfd78af_0"
}
command <<<
set -euo pipefail

# capture version information
augur version > VERSION
echo
echo "mafft version:"
mafft --version 2>&1 | tee MAFFT_VERSION

# run augur align
augur align \
@@ -26,6 +31,7 @@ task augur_align {
output {
File aligned_fasta = "alignment.fasta"
String augur_version = read_string("VERSION")
String mafft_version = read_string("MAFFT_VERSION")
}
runtime {
docker: docker
44 changes: 44 additions & 0 deletions tasks/phylogenetic_inference/augur/task_augur_tree.wdl
@@ -16,8 +16,30 @@ task augur_tree {
String docker = "us-docker.pkg.dev/general-theiagen/biocontainers/augur:22.0.2--pyhdfd78af_0"
}
command <<<
set -euo pipefail

# capture version information
augur version > VERSION
echo

# touch the version files to ensure they exist (so that read_string output function doesn't fail)
touch IQTREE_VERSION FASTTREE_VERSION RAXML_VERSION

# capture version information only for the method selected by user OR default of iqtree
if [ "~{method}" == "iqtree" ]; then
echo "iqtree version:"
iqtree --version | grep version | sed 's/.*version/version/;s/ for Linux.*//' | tee IQTREE_VERSION
elif [ "~{method}" == "fasttree" ]; then
echo "fasttree version:"
# fasttree prints to STDERR, so we need to redirect it to STDOUT, then grep for line with version info, then cut to extract version number (and nothing else)
fasttree -help 2>&1 | grep -m 1 "FastTree" | cut -d ' ' -f 2 | tee FASTTREE_VERSION
elif [ "~{method}" == "raxml" ]; then
echo "raxml version:"
raxmlHPC -v | grep RAxML | sed -e 's/.*RAxML version //' -e 's/released.*//' | tee RAXML_VERSION
fi

echo
echo "Running augur tree now..."

AUGUR_RECURSION_LIMIT=10000 augur tree \
--alignment "~{aligned_fasta}" \
@@ -28,10 +50,32 @@ task augur_tree {
~{"--tree-builder-args " + tree_builder_args} \
~{true="--override-default-args" false="" override_default_args} \
--nthreads auto

# If iqtree, get the model used
if [ "~{method}" == "iqtree" ]; then
if [ "~{substitution_model}" == "auto" ]; then
FASTA_BASENAME=$(basename ~{aligned_fasta} .fasta)
FASTA_DIR=$(dirname ~{aligned_fasta})
MODEL=$(grep "Best-fit model:" ${FASTA_DIR}/*${FASTA_BASENAME}-delim.iqtree.log | sed 's|Best-fit model: ||g;s|chosen.*||' | tr -d '\n\r')
else
MODEL="~{substitution_model}"
fi
echo "$MODEL" > FINAL_MODEL.txt
else
echo "" > FINAL_MODEL.txt
fi

echo
echo "DEBUG: FINAL_MODEL.txt is: $(cat FINAL_MODEL.txt)"
>>>

output {
File aligned_tree = "~{build_name}_~{method}.nwk"
String augur_version = read_string("VERSION")
String iqtree_version = read_string("IQTREE_VERSION")
String fasttree_version = read_string("FASTTREE_VERSION")
String raxml_version = read_string("RAXML_VERSION")
String iqtree_model_used = read_string("FINAL_MODEL.txt")
}
runtime {
docker: docker
35 changes: 33 additions & 2 deletions tasks/quality_control/basic_statistics/task_assembly_metrics.wdl
@@ -14,11 +14,11 @@ task stats_n_coverage {
samtools --version | head -n1 | tee VERSION

samtools stats ~{bamfile} > ~{samplename}.stats.txt

samtools coverage ~{bamfile} -m -o ~{samplename}.cov.hist
samtools coverage ~{bamfile} -o ~{samplename}.cov.txt
samtools flagstat ~{bamfile} > ~{samplename}.flagstat.txt

# Extracting coverage, depth, meanbaseq, and meanmapq
coverage=$(cut -f 6 ~{samplename}.cov.txt | tail -n 1)
depth=$(cut -f 7 ~{samplename}.cov.txt | tail -n 1)
meanbaseq=$(cut -f 8 ~{samplename}.cov.txt | tail -n 1)
@@ -33,6 +33,34 @@ task stats_n_coverage {
echo $depth | tee DEPTH
echo $meanbaseq | tee MEANBASEQ
echo $meanmapq | tee MEANMAPQ

# Parsing stats.txt for total and mapped reads
total_reads=$(grep "^SN" ~{samplename}.stats.txt | grep "raw total sequences:" | cut -f 3)
mapped_reads=$(grep "^SN" ~{samplename}.stats.txt | grep "reads mapped:" | cut -f 3)

# Check for empty values and set defaults to avoid errors
if [ -z "$total_reads" ]; then total_reads="1"; fi # Avoid division by zero
if [ -z "$mapped_reads" ]; then mapped_reads="0"; fi

# Calculate the percentage of mapped reads
percentage_mapped_reads=$(awk "BEGIN {printf \"%.2f\", ($mapped_reads / $total_reads) * 100}")

# If the percentage calculation fails, default to 0.0
if [ -z "$percentage_mapped_reads" ]; then percentage_mapped_reads="0.0"; fi

# Output the result
echo $percentage_mapped_reads | tee PERCENTAGE_MAPPED_READS

# output all metrics in one txt file
# output header row (tab-separated)
echo -e "Statistic\tValue" > ~{samplename}_metrics.txt

# Output each statistic as a row
echo -e "Coverage\t$coverage" >> ~{samplename}_metrics.txt
echo -e "Depth\t$depth" >> ~{samplename}_metrics.txt
echo -e "Mean Base Quality\t$meanbaseq" >> ~{samplename}_metrics.txt
echo -e "Mean Mapping Quality\t$meanmapq" >> ~{samplename}_metrics.txt
echo -e "Percentage Mapped Reads\t$percentage_mapped_reads" >> ~{samplename}_metrics.txt
>>>
output {
String date = read_string("DATE")
@@ -45,6 +73,9 @@ task stats_n_coverage {
Float depth = read_string("DEPTH")
Float meanbaseq = read_string("MEANBASEQ")
Float meanmapq = read_string("MEANMAPQ")
Float percentage_mapped_reads = read_string("PERCENTAGE_MAPPED_READS")
File metrics_txt = "~{samplename}_metrics.txt"

}
runtime {
docker: docker
@@ -55,4 +86,4 @@ task stats_n_coverage {
preemptible: 0
maxRetries: 3
}
}
}
65 changes: 41 additions & 24 deletions tasks/quality_control/basic_statistics/task_fastq_scan.wdl
@@ -6,14 +6,16 @@ task fastq_scan_pe {
File read2
String read1_name = basename(basename(basename(read1, ".gz"), ".fastq"), ".fq")
String read2_name = basename(basename(basename(read2, ".gz"), ".fastq"), ".fq")
Int disk_size = 100
String docker = "quay.io/biocontainers/fastq-scan:0.4.4--h7d875b9_1"
Int disk_size = 50
String docker = "us-docker.pkg.dev/general-theiagen/biocontainers/fastq-scan:1.0.1--h4ac6f70_3"
Int memory = 2
Int cpu = 2
Int cpu = 1
}
command <<<
# capture date and version
date | tee DATE
# exit task in case anything fails in one-liners or variables are unset
set -euo pipefail

# capture version
fastq-scan -v | tee VERSION

# set cat command based on compression
@@ -24,11 +26,21 @@ task fastq_scan_pe {
fi

# capture forward read stats
echo "DEBUG: running fastq-scan on $(basename ~{read1})"
eval "${cat_reads} ~{read1}" | fastq-scan | tee ~{read1_name}_fastq-scan.json
cat ~{read1_name}_fastq-scan.json | jq .qc_stats.read_total | tee READ1_SEQS
# using simple redirect so STDOUT is not confusing
jq .qc_stats.read_total ~{read1_name}_fastq-scan.json > READ1_SEQS
echo "DEBUG: number of reads in $(basename ~{read1}): $(cat READ1_SEQS)"
read1_seqs=$(cat READ1_SEQS)
echo

# capture reverse read stats
echo "DEBUG: running fastq-scan on $(basename ~{read2})"
eval "${cat_reads} ~{read2}" | fastq-scan | tee ~{read2_name}_fastq-scan.json
cat ~{read2_name}_fastq-scan.json | jq .qc_stats.read_total | tee READ2_SEQS

# using simple redirect so STDOUT is not confusing
jq .qc_stats.read_total ~{read2_name}_fastq-scan.json > READ2_SEQS
echo "DEBUG: number of reads in $(basename ~{read2}): $(cat READ2_SEQS)"
read2_seqs=$(cat READ2_SEQS)

# capture number of read pairs
@@ -37,26 +49,27 @@ task fastq_scan_pe {
else
read_pairs="Uneven pairs: R1=${read1_seqs}, R2=${read2_seqs}"
fi

echo $read_pairs | tee READ_PAIRS

# use simple redirect so STDOUT is not confusing
echo "$read_pairs" > READ_PAIRS
echo "DEBUG: number of read pairs: $(cat READ_PAIRS)"
>>>
output {
File read1_fastq_scan_report = "~{read1_name}_fastq-scan.json"
File read2_fastq_scan_report = "~{read2_name}_fastq-scan.json"
File read1_fastq_scan_json = "~{read1_name}_fastq-scan.json"
File read2_fastq_scan_json = "~{read2_name}_fastq-scan.json"
Int read1_seq = read_int("READ1_SEQS")
Int read2_seq = read_int("READ2_SEQS")
String read_pairs = read_string("READ_PAIRS")
String version = read_string("VERSION")
String pipeline_date = read_string("DATE")
String fastq_scan_docker = docker
}
runtime {
docker: docker
memory: memory + " GB"
cpu: cpu
disks: "local-disk " + disk_size + " SSD"
disk: disk_size + " GB" # TES
preemptible: 0
disk: disk_size + " GB"
preemptible: 1
maxRetries: 3
}
}
@@ -65,14 +78,16 @@ task fastq_scan_se {
input {
File read1
String read1_name = basename(basename(basename(read1, ".gz"), ".fastq"), ".fq")
Int disk_size = 100
Int disk_size = 50
Int memory = 2
Int cpu = 2
String docker = "quay.io/biocontainers/fastq-scan:0.4.4--h7d875b9_1"
Int cpu = 1
String docker = "us-docker.pkg.dev/general-theiagen/biocontainers/fastq-scan:1.0.1--h4ac6f70_3"
}
command <<<
# capture date and version
date | tee DATE
# exit task in case anything fails in one-liners or variables are unset
set -euo pipefail

# capture version
fastq-scan -v | tee VERSION

# set cat command based on compression
@@ -83,23 +98,25 @@ task fastq_scan_se {
fi

# capture forward read stats
echo "DEBUG: running fastq-scan on $(basename ~{read1})"
eval "${cat_reads} ~{read1}" | fastq-scan | tee ~{read1_name}_fastq-scan.json
cat ~{read1_name}_fastq-scan.json | jq .qc_stats.read_total | tee READ1_SEQS
# using simple redirect so STDOUT is not confusing
jq .qc_stats.read_total ~{read1_name}_fastq-scan.json > READ1_SEQS
echo "DEBUG: number of reads in $(basename ~{read1}): $(cat READ1_SEQS)"
>>>
output {
File fastq_scan_report = "~{read1_name}_fastq-scan.json"
File fastq_scan_json = "~{read1_name}_fastq-scan.json"
Int read1_seq = read_int("READ1_SEQS")
String version = read_string("VERSION")
String pipeline_date = read_string("DATE")
String fastq_scan_docker = docker
}
runtime {
docker: docker
memory: memory + " GB"
cpu: cpu
disks: "local-disk " + disk_size + " SSD"
disk: disk_size + " GB" # TES
preemptible: 0
disk: disk_size + " GB"
preemptible: 1
maxRetries: 3
}
}
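
The per-file read count that this task parses can be reproduced outside WDL with the same pipeline the command block uses; a sketch, assuming gzipped input and `jq` on the PATH:

```bash
# count reads in a gzipped FASTQ exactly as the task does
zcat sample_R1.fastq.gz | fastq-scan | jq .qc_stats.read_total
```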
51 changes: 26 additions & 25 deletions tasks/quality_control/comparisons/task_screen.wdl
@@ -20,6 +20,9 @@ task check_reads {
Int cpu = 1
}
command <<<
# just in case anything fails, throw an error
set -euo pipefail

flag="PASS"

# initialize estimated genome length
@@ -34,13 +37,13 @@ task check_reads {
fi

# check one: number of reads
read1_num=`eval "$cat_reads ~{read1}" | awk '{s++}END{print s/4}'`
read2_num=`eval "$cat_reads ~{read2}" | awk '{s++}END{print s/4}'`
# awk '{s++}END{print s/4' counts the number of lines and divides them by 4
# key assumption: in fastq there will be four lines per read
# sometimes fastqs do not have 4 lines per read, so this might fail one day
read1_num=$($cat_reads ~{read1} | fastq-scan | grep 'read_total' | sed 's/[^0-9]*\([0-9]\+\).*/\1/')
read2_num=$($cat_reads ~{read2} | fastq-scan | grep 'read_total' | sed 's/[^0-9]*\([0-9]\+\).*/\1/')
echo "DEBUG: Number of reads in R1: ${read1_num}"
echo "DEBUG: Number of reads in R2: ${read2_num}"

reads_total=$(expr $read1_num + $read2_num)
echo "DEBUG: Number of reads total in R1 and R2: ${reads_total}"

if [ "${reads_total}" -le "~{min_reads}" ]; then
flag="FAIL; the total number of reads is below the minimum of ~{min_reads}"
@@ -51,13 +54,11 @@ task check_reads {
# checks two and three: number of basepairs and proportion of sequence
if [ "${flag}" == "PASS" ]; then
# count number of basepairs
# this only works if the fastq has 4 lines per read, so this might fail one day
read1_bp=`eval "${cat_reads} ~{read1}" | paste - - - - | cut -f2 | tr -d '\n' | wc -c`
read2_bp=`eval "${cat_reads} ~{read2}" | paste - - - - | cut -f2 | tr -d '\n' | wc -c`
# paste - - - - (print 4 consecutive lines in one row, tab delimited)
# cut -f2 print only the second column (the second line of the fastq 4-line)
# tr -d '\n' removes line endings
# wc -c counts characters
# using fastq-scan to count the number of basepairs in each fastq
read1_bp=$(eval "${cat_reads} ~{read1}" | fastq-scan | grep 'total_bp' | sed 's/[^0-9]*\([0-9]\+\).*/\1/')
read2_bp=$(eval "${cat_reads} ~{read2}" | fastq-scan | grep 'total_bp' | sed 's/[^0-9]*\([0-9]\+\).*/\1/')
echo "DEBUG: Number of basepairs in R1: $read1_bp"
echo "DEBUG: Number of basepairs in R2: $read2_bp"

# set proportion variables for easy comparison
# removing the , 2) to make these integers instead of floats
@@ -147,7 +148,8 @@ task check_reads {
flag="FAIL; the estimated coverage (${estimated_coverage}) is less than the minimum of ~{min_coverage}x"
else
flag="PASS"
echo $estimated_genome_length | tee EST_GENOME_LENGTH
echo ${estimated_genome_length} | tee EST_GENOME_LENGTH
echo "DEBUG: estimated_genome_length: ${estimated_genome_length}"
fi
fi
fi
@@ -190,6 +192,9 @@ task check_reads_se {
Int cpu = 1
}
command <<<
# just in case anything fails, throw an error
set -euo pipefail

flag="PASS"

# initialize estimated genome length
@@ -203,11 +208,9 @@ task check_reads_se {
cat_reads="cat"
fi

# check one: number of reads
read1_num=`eval "$cat_reads ~{read1}" | awk '{s++}END{print s/4}'`
# awk '{s++}END{print s/4' counts the number of lines and divides them by 4
# key assumption: in fastq there will be four lines per read
# sometimes fastqs do not have 4 lines per read, so this might fail one day
# check one: number of reads via fastq-scan
read1_num=$($cat_reads ~{read1} | fastq-scan | grep 'read_total' | sed 's/[^0-9]*\([0-9]\+\).*/\1/')
echo "DEBUG: Number of reads in R1: ${read1_num}"

if [ "${read1_num}" -le "~{min_reads}" ] ; then
flag="FAIL; the number of reads (${read1_num}) is below the minimum of ~{min_reads}"
@@ -218,12 +221,9 @@ task check_reads_se {
# checks two and three: number of basepairs and proportion of sequence
if [ "${flag}" == "PASS" ]; then
# count number of basepairs
# this only works if the fastq has 4 lines per read, so this might fail one day
read1_bp=`eval "${cat_reads} ~{read1}" | paste - - - - | cut -f2 | tr -d '\n' | wc -c`
# paste - - - - (print 4 consecutive lines in one row, tab delimited)
# cut -f2 print only the second column (the second line of the fastq 4-line)
# tr -d '\n' removes line endings
# wc -c counts characters
# using fastq-scan to count the number of basepairs in each fastq
read1_bp=$(eval "${cat_reads} ~{read1}" | fastq-scan | grep 'total_bp' | sed 's/[^0-9]*\([0-9]\+\).*/\1/')
echo "DEBUG: Number of basepairs in R1: $read1_bp"

if [ "$flag" == "PASS" ] ; then
if [ "${read1_bp}" -le "~{min_basepairs}" ] ; then
@@ -309,7 +309,8 @@ task check_reads_se {
fi

echo $flag | tee FLAG
echo $estimated_genome_length | tee EST_GENOME_LENGTH
echo ${estimated_genome_length} | tee EST_GENOME_LENGTH
echo "DEBUG: estimated_genome_length: ${estimated_genome_length}"
>>>
output {
String read_screen = read_string("FLAG")
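
The screen's coverage gate reduces to integer arithmetic on fastq-scan's `total_bp` field; a condensed sketch of that logic, with `estimated_genome_length` and `min_coverage` standing in for the task's values:

```bash
estimated_genome_length=5000000
min_coverage=10

# total basepairs per read file, extracted the same way the task does it
read1_bp=$(zcat sample_R1.fastq.gz | fastq-scan | grep 'total_bp' | sed 's/[^0-9]*\([0-9]\+\).*/\1/')
read2_bp=$(zcat sample_R2.fastq.gz | fastq-scan | grep 'total_bp' | sed 's/[^0-9]*\([0-9]\+\).*/\1/')

# estimated coverage = total bp / estimated genome length (integer division)
estimated_coverage=$(( (read1_bp + read2_bp) / estimated_genome_length ))
if [ "${estimated_coverage}" -lt "${min_coverage}" ]; then
  echo "FAIL; the estimated coverage (${estimated_coverage}x) is less than the minimum of ${min_coverage}x"
fi
```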
4 changes: 2 additions & 2 deletions tasks/quality_control/read_filtering/task_trimmomatic.wdl
@@ -40,9 +40,9 @@ task trimmomatic_pe {
-threads ~{cpu} \
~{read1} ~{read2} \
-baseout ~{samplename}.fastq.gz \
"${CROPPING_VAR}" \
SLIDINGWINDOW:~{trimmomatic_window_size}:~{trimmomatic_quality_trim_score} \
MINLEN:~{trimmomatic_min_length} &> ~{samplename}.trim.stats.txt \
"${CROPPING_VAR}"
MINLEN:~{trimmomatic_min_length} &> ~{samplename}.trim.stats.txt

>>>
output {
122 changes: 122 additions & 0 deletions tasks/species_typing/escherichia_shigella/task_stxtyper.wdl
Original file line number Diff line number Diff line change
@@ -0,0 +1,122 @@
version 1.0

task stxtyper {
input {
File assembly
String samplename
Boolean enable_debugging = false # Additional messages are printed and files in $TMPDIR are not removed after running
String docker = "us-docker.pkg.dev/general-theiagen/staphb/stxtyper:1.0.24"
Int disk_size = 50
Int cpu = 1
Int memory = 4
}
command <<<
# fail task if any commands below fail, since there are lots of bash conditionals below (AGH!)
set -eo pipefail

# capture version info
stxtyper --version | tee VERSION.txt

# NOTE: by default stxtyper uses $TMPDIR or /tmp, so if we run into issues we may need to adjust in the future. Could potentially use PWD as the TMPDIR.
echo "DEBUG: TMPDIR is set to: $TMPDIR"

echo "DEBUG: running StxTyper now..."
# run StxTyper on assembly; may need to add/remove options in the future if they change
# NOTE: stxtyper can accept gzipped assemblies, so no need to unzip
stxtyper \
--nucleotide ~{assembly} \
--name ~{samplename} \
--output ~{samplename}_stxtyper.tsv \
~{true='--debug' false='' enable_debugging} \
--log ~{samplename}_stxtyper.log

# parse output TSV
echo "DEBUG: Parsing StxTyper output TSV..."

# check for output file with only 1 line (meaning no hits found); exit cleanly if so
if [ "$(wc -l < ~{samplename}_stxtyper.tsv)" -eq 1 ]; then
echo "No hits found by StxTyper" > stxtyper_hits.txt
echo "0" > stxtyper_num_hits.txt
echo "DEBUG: No hits found in StxTyper output TSV. Exiting task with exit code 0 now."

# create empty output files
touch stxtyper_all_hits.txt stxtyper_complete_operons.txt stxtyper_partial_hits.txt stxtyper_stx_frameshifts_or_internal_stop_hits.txt stx_novel_hits.txt
# put "none" into all of them so task does not fail
echo "None" | tee stxtyper_all_hits.txt stxtyper_complete_operons.txt stxtyper_partial_hits.txt stxtyper_stx_frameshifts_or_internal_stop_hits.txt stx_novel_hits.txt
exit 0
fi

# check for output file with more than 1 line (meaning hits found); count lines & parse output TSV if so
if [ "$(wc -l < ~{samplename}_stxtyper.tsv)" -gt 1 ]; then
echo "Hits found by StxTyper. Counting lines & parsing output TSV now..."
# count number of lines in output TSV (excluding header)
wc -l < ~{samplename}_stxtyper.tsv | awk '{print $1-1}' > stxtyper_num_hits.txt
# remove header line
sed '1d' ~{samplename}_stxtyper.tsv > ~{samplename}_stxtyper_noheader.tsv

##### parse output TSV #####
### complete operons
echo "DEBUG: Parsing complete operons..."
awk -F'\t' -v OFS=, '$4 == "COMPLETE" {print $3}' ~{samplename}_stxtyper.tsv | paste -sd, - | tee stxtyper_complete_operons.txt
# if grep for COMPLETE (matched as a whole word so COMPLETE_NOVEL does not count) fails, write "None" to file for output string
if [[ "$(grep --silent -w 'COMPLETE' ~{samplename}_stxtyper.tsv; echo $?)" -gt 0 ]]; then
echo "None" > stxtyper_complete_operons.txt
fi

### complete_novel operons
echo "DEBUG: Parsing complete novel hits..."
awk -F'\t' -v OFS=, '$4 == "COMPLETE_NOVEL" {print $3}' ~{samplename}_stxtyper.tsv | paste -sd, - | tee stx_novel_hits.txt
# if grep for COMPLETE_NOVEL fails, write "None" to file for output string
if [ "$(grep --silent 'COMPLETE_NOVEL' ~{samplename}_stxtyper.tsv; echo $?)" -gt 0 ]; then
echo "None" > stx_novel_hits.txt
fi

### partial hits (to any gene in stx operon)
echo "DEBUG: Parsing stxtyper partial hits..."
# explanation: if "operon" column contains "PARTIAL" (either PARTIAL or PARTIAL_CONTIG_END possible); print either "stx1" or "stx2" or "stx1,stx2"
awk -F'\t' -v OFS=, '$4 ~ "PARTIAL.*" {print $3}' ~{samplename}_stxtyper.tsv | sort | uniq | paste -sd, - | tee stxtyper_partial_hits.txt
# if no stx partial hits found, write "None" to file for output string
if [ "$(grep --silent 'stx' stxtyper_partial_hits.txt; echo $?)" -gt 0 ]; then
echo "None" > stxtyper_partial_hits.txt
fi

### frameshifts or internal stop codons in stx genes
echo "DEBUG: Parsing stx frameshifts or internal stop codons..."
# explanation: if operon column contains "FRAME_SHIFT" or "INTERNAL_STOP", print the "operon" in a sorted/unique list
awk -F'\t' -v OFS=, '$4 == "FRAMESHIFT" || $4 == "INTERNAL_STOP" {print $3}' ~{samplename}_stxtyper.tsv | sort | uniq | paste -sd, - | tee stxtyper_stx_frameshifts_or_internal_stop_hits.txt
# if no frameshifts or internal stop codons found, write "None" to file for output string
if [ "$(grep --silent -E 'FRAMESHIFT|INTERNAL_STOP' ~{samplename}_stxtyper.tsv; echo $?)" -gt 0 ]; then
echo "None" > stxtyper_stx_frameshifts_or_internal_stop_hits.txt
fi

echo "DEBUG: generating stx_type_all string output now..."
# sort and uniq so there are no duplicates; then paste into a single comma-separated line with commas
# sed is to remove any instances of "None" from the output
cat stxtyper_complete_operons.txt stxtyper_partial_hits.txt stxtyper_stx_frameshifts_or_internal_stop_hits.txt stx_novel_hits.txt | sed '/None/d' | sort | uniq | paste -sd, - > stxtyper_all_hits.txt

fi
echo "DEBUG: Finished parsing StxTyper output TSV."
>>>
output {
File stxtyper_report = "~{samplename}_stxtyper.tsv"
File stxtyper_log = "~{samplename}_stxtyper.log"
String stxtyper_docker = docker
String stxtyper_version = read_string("VERSION.txt")
# outputs parsed from stxtyper output TSV
Int stxtyper_num_hits = read_int("stxtyper_num_hits.txt")
String stxtyper_all_hits = read_string("stxtyper_all_hits.txt")
String stxtyper_complete_operon_hits = read_string("stxtyper_complete_operons.txt")
String stxtyper_partial_hits = read_string("stxtyper_partial_hits.txt")
String stxtyper_frameshifts_or_internal_stop_hits = read_string("stxtyper_stx_frameshifts_or_internal_stop_hits.txt")
String stxtyper_novel_hits = read_string("stx_novel_hits.txt")
}
runtime {
docker: "~{docker}"
memory: "~{memory} GB"
cpu: cpu
disks: "local-disk " + disk_size + " SSD"
disk: disk_size + " GB"
preemptible: 1 # does not take long (usually <3 min) to run stxtyper on 1 genome, preemptible is fine
maxRetries: 3
}
}
25 changes: 13 additions & 12 deletions tasks/species_typing/mycobacterium/task_tbp_parser.wdl
@@ -9,29 +9,30 @@ task tbp_parser {

String? sequencing_method
String? operator

Int? min_depth # default 10
Int? coverage_threshold # default 100 (--min_percent_coverage)
File? coverage_regions_bed
Float? min_frequency # default 0.1
Int? min_read_support # default 10
Int? coverage_threshold # default 100 (--min_percent_coverage)
File? coverage_regions_bed

Boolean tbp_parser_debug = false

Boolean add_cycloserine_lims = false

Boolean tbp_parser_debug = true
Boolean tngs_data = false

Float? rrs_frequency # default 0.1
Int? rrs_read_support # default 10
Float? rrl_frequency # default 0.1
Int? rrl_read_support # default 10
Float? rpob449_frequency # default 0.1
Float? etha237_frequency # default 0.1
File? expert_rule_regions_bed

String docker = "us-docker.pkg.dev/general-theiagen/theiagen/tbp-parser:1.6.0"
Int disk_size = 100
Int memory = 4

Int cpu = 1
Int disk_size = 100
String docker = "us-docker.pkg.dev/general-theiagen/theiagen/tbp-parser:2.2.2"
Int memory = 4
}
command <<<
# get version
@@ -42,10 +43,10 @@ task tbp_parser {
~{"--sequencing_method " + sequencing_method} \
~{"--operator " + operator} \
~{"--min_depth " + min_depth} \
~{"--min_percent_coverage " + coverage_threshold} \
~{"--coverage_regions " + coverage_regions_bed} \
~{"--min_frequency " + min_frequency} \
~{"--min_read_support " + min_read_support} \
~{"--min_percent_coverage " + coverage_threshold} \
~{"--coverage_regions " + coverage_regions_bed} \
~{"--tngs_expert_regions " + expert_rule_regions_bed} \
~{"--rrs_frequency " + rrs_frequency} \
~{"--rrs_read_support " + rrs_read_support} \
@@ -63,7 +64,7 @@ task tbp_parser {
echo 0.0 > AVG_DEPTH

# get genome percent coverage for the entire reference genome length over min_depth
genome=$(samtools depth -J ~{tbprofiler_bam} | awk -F "\t" '{if ($3 >= ~{min_depth}) print;}' | wc -l )
genome=$(samtools depth -J ~{tbprofiler_bam} | awk -F "\t" -v min_depth=~{min_depth} '{if ($3 >= min_depth) print;}' | wc -l )
python3 -c "print ( ($genome / 4411532 ) * 100 )" | tee GENOME_PC

# get genome average depth
127 changes: 57 additions & 70 deletions tasks/species_typing/mycobacterium/task_tbprofiler.wdl
@@ -5,84 +5,74 @@ task tbprofiler {
File read1
File? read2
String samplename

# logic
Boolean ont_data = false
Boolean tbprofiler_run_custom_db = false
File? tbprofiler_custom_db
# minimum thresholds
Int cov_frac_threshold = 1
Float min_af = 0.1
Float min_af_pred = 0.1
Int min_depth = 10
# tool options within tbprofiler

String mapper = "bwa"
String variant_caller = "freebayes"
String variant_caller = "gatk"
String? variant_calling_params
# runtime

String? additional_parameters # for tbprofiler
Int min_depth = 10
Float min_af = 0.1

File? tbprofiler_custom_db
Boolean tbprofiler_run_cdph_db = false
Boolean tbprofiler_run_custom_db = false

Int cpu = 8
Int disk_size = 100
String docker = "us-docker.pkg.dev/general-theiagen/staphb/tbprofiler:4.4.2"
String docker = "us-docker.pkg.dev/general-theiagen/staphb/tbprofiler:6.4.1"
Int memory = 16
}
command <<<
# Print and save date
date | tee DATE

# Print and save version
tb-profiler version > VERSION && sed -i -e 's/TBProfiler version //' VERSION && sed -n -i '$p' VERSION

# check if read2 is missing or empty
if [ -z "~{read2}" ] || [ ! -s "~{read2}" ] ; then
if [ -z "~{read2}" ] || [ ! -s "~{read2}" ]; then
INPUT_READS="-1 ~{read1}"
else
INPUT_READS="-1 ~{read1} -2 ~{read2}"
fi

if [ "~{ont_data}" = true ]; then
mode="--platform nanopore"
export ont_data="true"
else
export ont_data="false"
fi

# check if new database file is provided and not empty
if [ "~{tbprofiler_run_custom_db}" = true ] ; then
echo "Found new database file ~{tbprofiler_custom_db}"
prefix=$(basename "~{tbprofiler_custom_db}" | sed 's/\.tar\.gz$//')
echo "New database will be created with prefix $prefix"

echo "Inflating the new database..."
tar xfv ~{tbprofiler_custom_db}
if ~{tbprofiler_run_custom_db}; then
if [ ! -s ~{tbprofiler_custom_db} ]; then
echo "Custom database file is empty"
TBDB=""
else
echo "Found new database file ~{tbprofiler_custom_db}"
prefix=$(basename "~{tbprofiler_custom_db}" | sed 's/\.tar\.gz$//')
tar xfv ~{tbprofiler_custom_db}

tb-profiler load_library ./"$prefix"/"$prefix"

tb-profiler load_library ./"$prefix"/"$prefix"

TBDB="--db $prefix"
else
TBDB=""
TBDB="--db $prefix"
fi
elif ~{tbprofiler_run_cdph_db}; then
tb-profiler update_tbdb --branch CaliforniaDPH
TBDB="--db CaliforniaDPH"
fi

# Run tb-profiler on the input reads with samplename prefix
tb-profiler profile \
${mode} \
${INPUT_READS} \
--prefix ~{samplename} \
--mapper ~{mapper} \
--caller ~{variant_caller} \
--calling_params "~{variant_calling_params}" \
--min_depth ~{min_depth} \
--depth ~{min_depth} \
--af ~{min_af} \
--reporting_af ~{min_af_pred} \
--coverage_fraction_threshold ~{cov_frac_threshold} \
--threads ~{cpu} \
--csv --txt \
$TBDB
~{true="--platform nanopore" false="" ont_data} \
~{additional_parameters} \
${TBDB}

# Collate results
tb-profiler collate --prefix ~{samplename}

# touch optional output files because wdl
touch GENE_NAME LOCUS_TAG VARIANT_SUBSTITUTIONS OUTPUT_SEQ_METHOD_TYPE

# merge all vcf files if multiple are present
bcftools index ./vcf/*bcf
bcftools index ./vcf/*gz
@@ -97,35 +87,32 @@ task tbprofiler {
tsv_reader=csv.reader(tsv_file, delimiter="\t")
tsv_data=list(tsv_reader)
tsv_dict=dict(zip(tsv_data[0], tsv_data[1]))
with open ("MAIN_LINEAGE", 'wt') as Main_Lineage:
main_lin=tsv_dict['main_lineage']
Main_Lineage.write(main_lin)
with open ("SUB_LINEAGE", 'wt') as Sub_Lineage:
sub_lin=tsv_dict['sub_lineage']
Sub_Lineage.write(sub_lin)
with open ("DR_TYPE", 'wt') as DR_Type:
dr_type=tsv_dict['DR_type']
DR_Type.write(dr_type)
with open ("NUM_DR_VARIANTS", 'wt') as Num_DR_Variants:
num_dr_vars=tsv_dict['num_dr_variants']
Num_DR_Variants.write(num_dr_vars)
with open ("NUM_OTHER_VARIANTS", 'wt') as Num_Other_Variants:
num_other_vars=tsv_dict['num_other_variants']
Num_Other_Variants.write(num_other_vars)
with open ("RESISTANCE_GENES", 'wt') as Resistance_Genes:
res_genes_list=['rifampicin', 'isoniazid', 'pyrazinamide', 'ethambutol', 'streptomycin', 'fluoroquinolones', 'moxifloxacin', 'ofloxacin', 'levofloxacin', 'ciprofloxacin', 'aminoglycosides', 'amikacin', 'kanamycin', 'capreomycin', 'ethionamide', 'para-aminosalicylic_acid', 'cycloserine', 'linezolid', 'bedaquiline', 'clofazimine', 'delamanid']
with open ("MAIN_LINEAGE", 'wt') as main_lineage:
main_lineage.write(tsv_dict['main_lineage'])
with open ("SUB_LINEAGE", 'wt') as sublineage:
sublineage.write(tsv_dict['sub_lineage'])
with open ("DR_TYPE", 'wt') as dr_type:
dr_type.write(tsv_dict['drtype'])
with open ("NUM_DR_VARIANTS", 'wt') as num_dr_variants:
num_dr_variants.write(tsv_dict['num_dr_variants'])
with open ("NUM_OTHER_VARIANTS", 'wt') as num_other_variants:
num_other_variants.write(tsv_dict['num_other_variants'])
with open ("RESISTANCE_GENES", 'wt') as resistance_genes:
res_genes_list=['rifampicin', 'isoniazid', 'ethambutol', 'pyrazinamide', 'moxifloxacin', 'levofloxacin', 'bedaquiline', 'delamanid', 'pretomanid', 'linezolid', 'streptomycin', 'amikacin', 'kanamycin', 'capreomycin', 'clofazimine', 'ethionamide', 'para-aminosalicylic_acid', 'cycloserine']
res_genes=[]
for i in res_genes_list:
if tsv_dict[i] != '-':
res_genes.append(tsv_dict[i])
res_genes_string=';'.join(res_genes)
Resistance_Genes.write(res_genes_string)
with open ("MEDIAN_COVERAGE", 'wt') as Median_Coverage:
median_coverage=tsv_dict['median_coverage']
Median_Coverage.write(median_coverage)
with open ("PCT_READS_MAPPED", 'wt') as Pct_Reads_Mapped:
pct_reads_mapped=tsv_dict['pct_reads_mapped']
Pct_Reads_Mapped.write(pct_reads_mapped)
resistance_genes.write(res_genes_string)
with open ("MEDIAN_DEPTH", 'wt') as median_depth:
median_depth.write(tsv_dict['target_median_depth'])
with open ("PCT_READS_MAPPED", 'wt') as pct_reads_mapped:
pct_reads_mapped.write(tsv_dict['pct_reads_mapped'])
CODE
>>>
output {
@@ -134,15 +121,15 @@ task tbprofiler {
File tbprofiler_output_json = "./results/~{samplename}.results.json"
File tbprofiler_output_bam = "./bam/~{samplename}.bam"
File tbprofiler_output_bai = "./bam/~{samplename}.bam.bai"
File tbprofiler_output_vcf = "./vcf/~{samplename}.targets.csq.merged.vcf"
File? tbprofiler_output_vcf = "./vcf/~{samplename}.targets.csq.merged.vcf"
String version = read_string("VERSION")
String tbprofiler_main_lineage = read_string("MAIN_LINEAGE")
String tbprofiler_sub_lineage = read_string("SUB_LINEAGE")
String tbprofiler_dr_type = read_string("DR_TYPE")
String tbprofiler_num_dr_variants = read_string("NUM_DR_VARIANTS")
String tbprofiler_num_other_variants = read_string("NUM_OTHER_VARIANTS")
String tbprofiler_resistance_genes = read_string("RESISTANCE_GENES")
Int tbprofiler_median_coverage = read_int("MEDIAN_COVERAGE")
Float tbprofiler_median_depth = read_float("MEDIAN_DEPTH")
Float tbprofiler_pct_reads_mapped = read_float("PCT_READS_MAPPED")
}
runtime {
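Editor's note on the rewritten CODE block above: it parses the tb-profiler collate TSV by zipping the header row with the single data row into a dict, then writes one output file per field. For reference, a rough bash equivalent of that lookup; get_col and sample_collate.txt are illustrative names, not part of the task:

# print the value under a named header from a two-row TSV (header + one data row)
get_col() {
  awk -F '\t' -v col="$1" '
    NR == 1 { for (i = 1; i <= NF; i++) if ($i == col) idx = i }
    NR == 2 { if (idx) print $idx }
  ' "$2"
}
get_col main_lineage sample_collate.txt > MAIN_LINEAGE
get_col target_median_depth sample_collate.txt > MEDIAN_DEPTH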
2 changes: 1 addition & 1 deletion tasks/task_versioning.wdl
Original file line number Diff line number Diff line change
@@ -9,7 +9,7 @@ task version_capture {
volatile: true
}
command {
PHB_Version="PHB v2.2.1"
PHB_Version="PHB v2.3.0"
~{default='' 'export TZ=' + timezone}
date +"%Y-%m-%d" > TODAY
echo "$PHB_Version" > PHB_VERSION
80 changes: 54 additions & 26 deletions tasks/taxon_id/contamination/task_kraken2.wdl
Original file line number Diff line number Diff line change
@@ -5,48 +5,69 @@ task kraken2_theiacov {
File read1
File? read2
String samplename
String kraken2_db = "/kraken2-db"
File kraken2_db = "gs://theiagen-large-public-files-rp/terra/databases/kraken2/kraken2_humanGRCh38_viralRefSeq_20240828.tar.gz"
Int cpu = 4
Int memory = 8
String? target_organism
Int disk_size = 100
String docker_image = "us-docker.pkg.dev/general-theiagen/staphb/kraken2:2.0.8-beta_hv"
String docker_image = "us-docker.pkg.dev/general-theiagen/staphb/kraken2:2.1.2-no-db"
}
command <<<
# date and version control
date | tee DATE
kraken2 --version | head -n1 | tee VERSION
num_reads=$(ls *fastq.gz 2> /dev/null | wc -l)

# Decompress the Kraken2 database
mkdir db
tar -C ./db/ -xzvf ~{kraken2_db}

if ! [ -z ~{read2} ]; then
mode="--paired"
fi
echo $mode
kraken2 $mode \

# determine if reads are compressed
if [[ ~{read1} == *.gz ]]; then
echo "Reads are compressed..."
compressed="--gzip-compressed"
fi
echo $compressed

# Run Kraken2
kraken2 $mode $compressed \
--threads ~{cpu} \
--db ~{kraken2_db} \
--db ./db/ \
~{read1} ~{read2} \
--report ~{samplename}_kraken2_report.txt \
--output ~{samplename}.classifiedreads.txt

# Compress and cleanup
gzip ~{samplename}.classifiedreads.txt

# capture human percentage
percentage_human=$(grep "Homo sapiens" ~{samplename}_kraken2_report.txt | cut -f 1)
# | tee PERCENT_HUMAN
percentage_sc2=$(grep "Severe acute respiratory syndrome coronavirus 2" ~{samplename}_kraken2_report.txt | cut -f1 )
# | tee PERCENT_COV
if [ -z "$percentage_human" ] ; then percentage_human="0" ; fi
if [ -z "$percentage_sc2" ] ; then percentage_sc2="0" ; fi
echo $percentage_human | tee PERCENT_HUMAN
echo $percentage_sc2 | tee PERCENT_SC2
# capture target org percentage

# capture target org percentage
if [ ! -z "~{target_organism}" ]; then
echo "Target org designated: ~{target_organism}"
percent_target_organism=$(grep "~{target_organism}" ~{samplename}_kraken2_report.txt | cut -f1 | head -n1 )
if [ -z "$percent_target_organism" ] ; then percent_target_organism="0" ; fi
else
# if the target organism is sc2, report it in a special legacy column called PERCENT_SC2
if [[ "~{target_organism}" == "Severe acute respiratory syndrome coronavirus 2" ]]; then
percentage_sc2=$(grep "Severe acute respiratory syndrome coronavirus 2" ~{samplename}_kraken2_report.txt | cut -f1 )
percent_target_organism=""
if [ -z "$percentage_sc2" ] ; then percentage_sc2="0" ; fi
else
percentage_sc2=""
percent_target_organism=$(grep "~{target_organism}" ~{samplename}_kraken2_report.txt | cut -f1 | head -n1 )
if [ -z "$percent_target_organism" ] ; then percent_target_organism="0" ; fi
fi
else
percent_target_organism=""
percentage_sc2=""
fi
echo $percentage_sc2 | tee PERCENT_SC2
echo $percent_target_organism | tee PERCENT_TARGET_ORGANISM

>>>
@@ -55,7 +76,7 @@ task kraken2_theiacov {
String version = read_string("VERSION")
File kraken_report = "~{samplename}_kraken2_report.txt"
Float percent_human = read_float("PERCENT_HUMAN")
Float percent_sc2 = read_float("PERCENT_SC2")
String percent_sc2 = read_string("PERCENT_SC2")
String percent_target_organism = read_string("PERCENT_TARGET_ORGANISM")
String? kraken_target_organism = target_organism
File kraken2_classified_report = "~{samplename}.classifiedreads.txt.gz"
@@ -205,30 +226,37 @@ task kraken2_parse_classified {
CODE
# theiacov parsing blocks - percent human, sc2 and target organism
# capture human percentage
percentage_human=$(grep "Homo sapiens" ~{samplename}.report_parsed.txt | cut -f 1)
percentage_sc2=$(grep "Severe acute respiratory syndrome coronavirus 2" ~{samplename}.report_parsed.txt | cut -f1 )
if [ -z "$percentage_human" ] ; then percentage_human="0" ; fi
if [ -z "$percentage_sc2" ] ; then percentage_sc2="0" ; fi
echo $percentage_human | tee PERCENT_HUMAN
echo $percentage_sc2 | tee PERCENT_SC2
# capture target org percentage
if [ ! -z "~{target_organism}" ]; then
# capture target org percentage
if [ ! -z "~{target_organism}" ]; then
echo "Target org designated: ~{target_organism}"
percent_target_organism=$(grep "~{target_organism}" ~{samplename}.report_parsed.txt | cut -f1 | head -n1 )
if [ -z "$percent_target_organism" ] ; then percent_target_organism="0" ; fi
else
# if the target organism is sc2, report it in a special legacy column called PERCENT_SC2
if [[ "~{target_organism}" == "Severe acute respiratory syndrome coronavirus 2" ]]; then
percentage_sc2=$(grep "Severe acute respiratory syndrome coronavirus 2" ~{samplename}.report_parsed.txt | cut -f1 )
percent_target_organism=""
if [ -z "$percentage_sc2" ] ; then percentage_sc2="0" ; fi
else
percentage_sc2=""
percent_target_organism=$(grep "~{target_organism}" ~{samplename}.report_parsed.txt | cut -f1 | head -n1 )
if [ -z "$percent_target_organism" ] ; then percent_target_organism="0" ; fi
fi
else
percent_target_organism=""
percentage_sc2=""
fi
echo $percent_target_organism | tee PERCENT_TARGET_ORG
echo $percentage_sc2 | tee PERCENT_SC2
echo $percent_target_organism | tee PERCENT_TARGET_ORGANISM
>>>
output {
File kraken_report = "~{samplename}.report_parsed.txt"
Float percent_human = read_float("PERCENT_HUMAN")
Float percent_sc2 = read_float("PERCENT_SC2")
String percent_target_organism = read_string("PERCENT_TARGET_ORG")
String percent_sc2 = read_string("PERCENT_SC2")
String percent_target_organism = read_string("PERCENT_TARGET_ORGANISM")
String? kraken_target_organism = target_organism
}
runtime {
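Editor's note on the two kraken2 tasks above: both now share the same extraction pattern — grep the report for a taxon, take the percent column (field 1), and fall back to 0 when the taxon is absent — with PERCENT_SC2 left empty unless SARS-CoV-2 is the designated target. The empty-to-zero fallback can be written compactly with parameter expansion; report.txt and the taxon here are placeholders:

# percent abundance is field 1 of a kraken2 report; default to 0 when the taxon is missing
pct=$(grep "Homo sapiens" report.txt | head -n1 | cut -f1)
echo "${pct:-0}" | tee PERCENT_HUMAN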
18 changes: 10 additions & 8 deletions tasks/taxon_id/freyja/task_freyja.wdl
Original file line number Diff line number Diff line change
@@ -5,7 +5,8 @@ task freyja_one_sample {
File primer_trimmed_bam
String samplename
File reference_genome
File? freyja_usher_barcodes
String? freyja_pathogen
File? freyja_barcodes
File? freyja_lineage_metadata
Float? eps
Float? adapt
@@ -16,7 +17,7 @@ task freyja_one_sample {
Int? depth_cutoff
Int memory = 8
Int cpu = 2
String docker = "us-docker.pkg.dev/general-theiagen/staphb/freyja:1.5.1-07_02_2024-01-27-2024-07-22"
String docker = "us-docker.pkg.dev/general-theiagen/staphb/freyja:1.5.2-11_30_2024-02-00-2024-12-02"
Int disk_size = 100
}
command <<<
@@ -44,9 +45,9 @@ task freyja_one_sample {
freyja_metadata_version="freyja update: $(date +"%Y-%m-%d")"
else
# configure barcode
if [[ ! -z "~{freyja_usher_barcodes}" ]]; then
echo "User freyja usher barcodes identified; ~{freyja_usher_barcodes} will be utilized for freyja demixing"
freyja_usher_barcode_version=$(basename -- "~{freyja_usher_barcodes}")
if [[ ! -z "~{freyja_barcodes}" ]]; then
echo "User freyja usher barcodes identified; ~{freyja_barcodes} will be utilized for freyja demixing"
freyja_usher_barcode_version=$(basename -- "~{freyja_barcodes}")
else
freyja_usher_barcode_version="unmodified from freyja container: ~{docker}"
fi
@@ -74,9 +75,10 @@ task freyja_one_sample {
# Calculate Boostraps, if specified
if ~{bootstrap}; then
freyja boot \
~{"--pathogen" + freyja_pathogen} \
~{"--eps " + eps} \
~{"--meta " + freyja_lineage_metadata} \
~{"--barcodes " + freyja_usher_barcodes} \
~{"--barcodes " + freyja_barcodes} \
~{"--depthcutoff " + depth_cutoff} \
~{"--nb " + number_bootstraps } \
~{true='--confirmedonly' false='' confirmed_only} \
@@ -91,7 +93,7 @@ task freyja_one_sample {
freyja demix \
~{'--eps ' + eps} \
~{'--meta ' + freyja_lineage_metadata} \
~{'--barcodes ' + freyja_usher_barcodes} \
~{'--barcodes ' + freyja_barcodes} \
~{'--depthcutoff ' + depth_cutoff} \
~{true='--confirmedonly' false='' confirmed_only} \
~{'--adapt ' + adapt} \
@@ -144,7 +146,7 @@ task freyja_one_sample {
File? freyja_bootstrap_summary = "~{samplename}_summarized.csv"
File? freyja_bootstrap_summary_pdf = "~{samplename}_summarized.pdf"
# capture barcode file - first is user supplied, second appears if the user did not supply a barcode file
File freyja_usher_barcode_file = select_first([freyja_usher_barcodes, "usher_barcodes.feather"])
File freyja_barcode_file = select_first([freyja_barcodes, "usher_barcodes.feather"])
File freyja_lineage_metadata_file = select_first([freyja_lineage_metadata, "curated_lineages.json"])
String freyja_barcode_version = read_string("FREYJA_BARCODES")
String freyja_metadata_version = read_string("FREYJA_METADATA")
2 changes: 1 addition & 1 deletion tasks/taxon_id/freyja/task_freyja_dashboard.wdl
Original file line number Diff line number Diff line change
@@ -13,7 +13,7 @@ task freyja_dashboard_task {
Boolean scale_by_viral_load = false
String freyja_dashboard_title
File? dashboard_intro_text
String docker = "us-docker.pkg.dev/general-theiagen/staphb/freyja:1.5.1-07_02_2024-01-27-2024-07-22"
String docker = "us-docker.pkg.dev/general-theiagen/staphb/freyja:1.5.2-11_30_2024-02-00-2024-12-02"
Int disk_size = 100
Int memory = 4
Int cpu = 2
2 changes: 1 addition & 1 deletion tasks/taxon_id/freyja/task_freyja_plot.wdl
Original file line number Diff line number Diff line change
@@ -10,7 +10,7 @@ task freyja_plot_task {
String plot_time_interval="MS"
Int plot_day_window=14
String freyja_plot_name
String docker = "us-docker.pkg.dev/general-theiagen/staphb/freyja:1.5.1-07_02_2024-01-27-2024-07-22"
String docker = "us-docker.pkg.dev/general-theiagen/staphb/freyja:1.5.2-11_30_2024-02-00-2024-12-02"
Int disk_size = 100
Int mincov = 60
Int memory = 4
2 changes: 1 addition & 1 deletion tasks/taxon_id/freyja/task_freyja_update.wdl
Original file line number Diff line number Diff line change
@@ -2,7 +2,7 @@ version 1.0

task freyja_update_refs {
input {
String docker = "us-docker.pkg.dev/general-theiagen/staphb/freyja:1.5.1-07_02_2024-01-27-2024-07-22"
String docker = "us-docker.pkg.dev/general-theiagen/staphb/freyja:1.5.2-11_30_2024-02-00-2024-12-02"
Int disk_size = 100
Int memory = 16
Int cpu = 4
13 changes: 11 additions & 2 deletions tasks/utilities/data_export/task_broad_terra_tools.wdl
Original file line number Diff line number Diff line change
@@ -35,6 +35,10 @@ task export_taxon_tables {
Int? num_reads_raw2
String? num_reads_raw_pairs
String? fastq_scan_version
File? fastq_scan_raw1_json
File? fastq_scan_raw2_json
File? fastq_scan_clean1_json
File? fastq_scan_clean2_json
Int? num_reads_clean1
Int? num_reads_clean2
String? num_reads_clean_pairs
@@ -390,7 +394,8 @@ task export_taxon_tables {
volatile: true
}
command <<<

set -euo pipefail

# capture taxon and corresponding table names from input taxon_tables
taxon_array=($(cut -f1 ~{taxon_tables} | tail -n +2))
echo "Taxon array: ${taxon_array[*]}"
@@ -446,6 +451,10 @@ task export_taxon_tables {
"num_reads_raw2": "~{num_reads_raw2}",
"num_reads_raw_pairs": "~{num_reads_raw_pairs}",
"fastq_scan_version": "~{fastq_scan_version}",
"fastq_scan_raw1_json": "~{fastq_scan_raw1_json}",
"fastq_scan_raw2_json": "~{fastq_scan_raw2_json}",
"fastq_scan_clean1_json": "~{fastq_scan_clean1_json}",
"fastq_scan_clean2_json": "~{fastq_scan_clean2_json}",
"num_reads_clean1": "~{num_reads_clean1}",
"num_reads_clean2": "~{num_reads_clean2}",
"num_reads_clean_pairs": "~{num_reads_clean_pairs}",
@@ -778,7 +787,7 @@ task export_taxon_tables {
"agrvate_version": "~{agrvate_version}",
"agrvate_docker": "~{agrvate_docker}",
"srst2_vibrio_detailed_tsv": "~{srst2_vibrio_detailed_tsv}",
"srst2_vibrio_version": "~{srst2_vibrio_version}",~
"srst2_vibrio_version": "~{srst2_vibrio_version}",
"srst2_vibrio_docker": "~{srst2_vibrio_docker}",
"srst2_vibrio_database": "~{srst2_vibrio_database}",
"srst2_vibrio_ctxA": "~{srst2_vibrio_ctxA}",
7 changes: 6 additions & 1 deletion tasks/utilities/data_export/task_download_terra_table.wdl
Original file line number Diff line number Diff line change
@@ -12,11 +12,14 @@ task download_terra_table {
String terra_workspace_name
String terra_project_name
Int disk_size = 10
Int memory = 1
Int memory = 2
Int cpu = 1
String docker = "us-docker.pkg.dev/general-theiagen/theiagen/terra-tools:2023-06-21"
}
command <<<
# set -euo pipefail to avoid silent failure
set -euo pipefail

python3 /scripts/export_large_tsv/export_large_tsv.py --project ~{terra_project_name} --workspace ~{terra_workspace_name} --entity_type ~{terra_table_name} --tsv_filename "~{terra_table_name}.tsv"
>>>
output {
@@ -29,5 +32,7 @@ task download_terra_table {
disks: "local-disk " + disk_size + " HDD"
disk: disk_size + " GB"
dx_instance_type: "mem1_ssd1_v2_x2"
preemptible: 0 # this task may take a long time and shouldn't be preempted
maxRetries: 3
}
}
1 change: 1 addition & 0 deletions tasks/utilities/data_export/task_export_two_tsvs.wdl
Original file line number Diff line number Diff line change
@@ -18,6 +18,7 @@ task export_two_tsvs {
volatile: true
}
command <<<
set -euo pipefail
python3 /scripts/export_large_tsv/export_large_tsv.py --project ~{terra_project1} --workspace ~{terra_workspace1} --entity_type ~{datatable1} --tsv_filename "~{datatable1}_table1.tsv"

# check if second project is provided; if not, use first
62 changes: 62 additions & 0 deletions tasks/utilities/data_handling/task_fetch_srr_accession.wdl
Original file line number Diff line number Diff line change
@@ -0,0 +1,62 @@
version 1.0

task fetch_srr_accession {
input {
String sample_accession
String docker = "us-docker.pkg.dev/general-theiagen/biocontainers/fastq-dl:2.0.4--pyhdfd78af_0"
Int disk_size = 10
Int cpu = 2
Int memory = 8
}
meta {
volatile: true
}
command <<<
set -euo pipefail

# Output the current date and fastq-dl version for debugging
date -u | tee DATE
fastq-dl --version | tee VERSION

echo "Fetching metadata for accession: ~{sample_accession}"

# Run fastq-dl and capture stderr
fastq-dl --accession ~{sample_accession} --only-download-metadata -m 2 --verbose 2> stderr.log || true

# Determine from stderr whether the accession is valid and has SRR metadata
if grep -q "No results found for" stderr.log; then
echo "No SRR accession found" > srr_accession.txt
echo "No SRR accession found for accession: ~{sample_accession}"
elif grep -q "received an empty response" stderr.log; then
echo "No SRR accession found" > srr_accession.txt
echo "No SRR accession found for accession: ~{sample_accession}"
elif grep -q "is not a Study, Sample, Experiment, or Run accession" stderr.log; then
echo "Invalid accession: ~{sample_accession}" >&2
exit 1
elif [[ ! -f fastq-run-info.tsv ]]; then
echo "No metadata file found for accession: ~{sample_accession}" >&2
exit 1
else
# Extract SRR accessions from the TSV file if it exists
SRR_accessions=$(awk -F'\t' 'NR>1 {print $1}' fastq-run-info.tsv | paste -sd ',' -)
if [[ -z "${SRR_accessions}" ]]; then
echo "No SRR accession found" > srr_accession.txt
else
echo "Extracted SRR accessions: ${SRR_accessions}"
echo "${SRR_accessions}" > srr_accession.txt
fi
fi
>>>
output {
String srr_accession = read_string("srr_accession.txt")
String fastq_dl_version = read_string("VERSION")
}
runtime {
docker: docker
memory: "~{memory} GB"
cpu: cpu
disks: "local-disk " + disk_size + " SSD"
disk: disk_size + " GB"
preemptible: 1
}
}
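Editor's note on the extraction step in the new task above: it flattens the run column of fastq-dl's fastq-run-info.tsv into one comma-separated string. The awk/paste combination is compact enough to be easy to misread, so here is the same pipeline on a toy two-run TSV (the SRR values are made up):

# build a stand-in metadata TSV: header row, then one accession per line
printf 'run_accession\tstudy\nSRR0000001\tPRJX\nSRR0000002\tPRJX\n' > fastq-run-info.tsv

# skip the header (NR>1), keep column 1, and join the lines with commas
awk -F'\t' 'NR>1 {print $1}' fastq-run-info.tsv | paste -sd ',' -
# prints: SRR0000001,SRR0000002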
2 changes: 2 additions & 0 deletions tasks/utilities/data_handling/task_summarize_data.wdl
Original file line number Diff line number Diff line change
@@ -23,6 +23,8 @@ task summarize_data {
volatile: true
}
command <<<
set -euo pipefail

# when running on terra, comment out all input_table mentions
python3 /scripts/export_large_tsv/export_large_tsv.py --project "~{terra_project}" --workspace "~{terra_workspace}" --entity_type ~{terra_table} --tsv_filename ~{terra_table}-data.tsv

2 changes: 2 additions & 0 deletions tasks/utilities/data_handling/task_theiacov_fasta_batch.wdl
Original file line number Diff line number Diff line change
@@ -28,6 +28,8 @@ task sm_theiacov_fasta_wrangling { # the sm stands for supermassive
Int memory = 4
}
command <<<
set -euo pipefail

# check if nextclade json file exists
if [ -f ~{nextclade_json} ]; then
# this line splits into individual json files
4 changes: 4 additions & 0 deletions tasks/utilities/data_import/task_create_terra_table.wdl
Original file line number Diff line number Diff line change
@@ -146,6 +146,10 @@ task create_terra_table {
done <filelist-fullpath.txt

echo "DEBUG: terra table created, now beginning upload"

# set error handling to exit if the subsequent import_large_tsv.py task fails
set -euo pipefail

python3 /scripts/import_large_tsv/import_large_tsv.py --project "~{terra_project}" --workspace "~{terra_workspace}" --tsv terra_table_to_upload.tsv
>>>
output {
54 changes: 54 additions & 0 deletions tasks/utilities/file_handling/task_cat_lanes.wdl
Original file line number Diff line number Diff line change
@@ -0,0 +1,54 @@
version 1.0

task cat_lanes {
input {
String samplename

File read1_lane1
File read1_lane2
File? read1_lane3
File? read1_lane4

File? read2_lane1
File? read2_lane2
File? read2_lane3
File? read2_lane4

Int cpu = 2
Int disk_size = 50
String docker = "us-docker.pkg.dev/general-theiagen/theiagen/utility:1.2"
Int memory = 4
}
meta {
volatile: true
}
command <<<
# exit task if anything throws an error (important for proper gzip format)
set -euo pipefail

exists() { [[ -f $1 ]]; }

cat ~{read1_lane1} ~{read1_lane2} ~{read1_lane3} ~{read1_lane4} > "~{samplename}_merged_R1.fastq.gz"

if exists ~{read2_lane1} ; then
cat ~{read2_lane1} ~{read2_lane2} ~{read2_lane3} ~{read2_lane4} > "~{samplename}_merged_R2.fastq.gz"
fi

# ensure newly merged FASTQs are valid gzipped format
gzip -t *merged*.gz
>>>
output {
File read1_concatenated = "~{samplename}_merged_R1.fastq.gz"
File? read2_concatenated = "~{samplename}_merged_R2.fastq.gz"
}
runtime {
docker: "~{docker}"
memory: memory + " GB"
cpu: cpu
disks: "local-disk " + disk_size + " SSD"
disk: disk_size + " GB"
preemptible: 1
}
}
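Editor's note on the new task above: concatenating gzip files with plain cat works because the format allows multiple members per file — the concatenation of valid gzip streams is itself a valid gzip stream — which is why the task can merge lanes with cat and then verify the result with gzip -t. A quick demonstration with throwaway files:

# two small gzip members
printf 'lane1-read\n' | gzip > l1.fastq.gz
printf 'lane2-read\n' | gzip > l2.fastq.gz

# their concatenation is a valid multi-member gzip file
cat l1.fastq.gz l2.fastq.gz > merged.fastq.gz
gzip -t merged.fastq.gz           # exits 0: the merged file is intact
gzip -dc merged.fastq.gz          # prints both lines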
2 changes: 2 additions & 0 deletions tasks/utilities/file_handling/task_transfer_files.wdl
Original file line number Diff line number Diff line change
@@ -14,6 +14,8 @@ task transfer_files {
volatile: true
}
command <<<
set -euo pipefail

file_path_array="~{sep=' ' files_to_transfer}"

gsutil -m cp -n ${file_path_array[@]} ~{target_bucket}
5 changes: 4 additions & 1 deletion tasks/utilities/submission/task_mercury.wdl
Original file line number Diff line number Diff line change
@@ -23,12 +23,15 @@ task mercury {
Int cpu = 2
Int disk_size = 100
Int memory = 8
String docker = "us-docker.pkg.dev/general-theiagen/theiagen/mercury:1.0.8"
String docker = "us-docker.pkg.dev/general-theiagen/theiagen/mercury:1.0.9"
}
meta {
volatile: true
}
command <<<
# set -euo pipefail to avoid silent failure
set -euo pipefail

python3 /mercury/mercury/mercury.py -v | tee VERSION

python3 /mercury/mercury/mercury.py \
4 changes: 3 additions & 1 deletion tasks/utilities/submission/task_submission.wdl
Original file line number Diff line number Diff line change
@@ -23,6 +23,8 @@ task prune_table {
volatile: true
}
command <<<
set -euo pipefail

# when running on terra, comment out all input_table mentions
python3 /scripts/export_large_tsv/export_large_tsv.py --project "~{project_name}" --workspace "~{workspace_name}" --entity_type ~{table_name} --tsv_filename ~{table_name}-data.tsv

@@ -54,7 +56,7 @@ task prune_table {
# read export table into pandas
tablename = "~{table_name}-data.tsv"
table = pd.read_csv(tablename, delimiter='\t', header=0, dtype={"~{table_name}_id": 'str'}) # ensure sample_id is always a string)
table = pd.read_csv(tablename, delimiter='\t', header=0, dtype={"~{table_name}_id": 'str', "collection_date": 'str'}) # ensure sample_id is always a string)
# extract the samples for upload from the entire table
table = table[table["~{table_name}_id"].isin("~{sep='*' sample_names}".split("*"))]
1 change: 0 additions & 1 deletion tests/config/environment.yml
Original file line number Diff line number Diff line change
@@ -2,7 +2,6 @@ name: pytest-env-CI
channels:
- conda-forge
- bioconda
- defaults
dependencies:
- python >=3.7
- cromwell=86
Binary file not shown.
4 changes: 3 additions & 1 deletion tests/inputs/theiacov/wf_theiacov_clearlabs.json
Original file line number Diff line number Diff line change
@@ -3,5 +3,7 @@
"theiacov_clearlabs.read1": "tests/data/theiacov/fastqs/clearlabs/clearlabs.fastq.gz",
"theiacov_clearlabs.primer_bed": "tests/data/theiacov/primers/artic-v3.primers.bed",
"theiacov_clearlabs.reference_genome": "tests/data/theiacov/reference/MN908947.fasta",
"theiacov_clearlabs.organism_parameters.gene_locations_bed_file": "tests/inputs/sc2_gene_locations.bed"
"theiacov_clearlabs.organism_parameters.gene_locations_bed_file": "tests/inputs/sc2_gene_locations.bed",
"theiacov_clearlabs.kraken2_raw.kraken2_db": "tests/data/theiacov/databases/github_kraken2_test_db.tar.gz",
"theiacov_clearlabs.kraken2_dehosted.kraken2_db": "tests/data/theiacov/databases/github_kraken2_test_db.tar.gz"
}
3 changes: 2 additions & 1 deletion tests/inputs/theiacov/wf_theiacov_illumina_pe.json
Original file line number Diff line number Diff line change
@@ -5,5 +5,6 @@
"theiacov_illumina_pe.primer_bed": "tests/data/theiacov/primers/artic-v3.primers.bed",
"theiacov_illumina_pe.reference_genome": "tests/data/theiacov/reference/MN908947.fasta",
"theiacov_illumina_pe.reference_gff": "tests/inputs/completely-empty-for-test.txt",
"theiacov_illumina_pe.reference_gene_locations_bed": "tests/inputs/sc2_gene_locations.bed"
"theiacov_illumina_pe.reference_gene_locations_bed": "tests/inputs/sc2_gene_locations.bed",
"theiacov_illumina_pe.read_QC_trim.kraken_db": "tests/data/theiacov/databases/github_kraken2_test_db.tar.gz"
}
3 changes: 2 additions & 1 deletion tests/inputs/theiacov/wf_theiacov_illumina_se.json
Original file line number Diff line number Diff line change
@@ -4,5 +4,6 @@
"theiacov_illumina_se.primer_bed": "tests/data/theiacov/primers/artic-v3.primers.bed",
"theiacov_illumina_se.reference_genome": "tests/data/theiacov/reference/MN908947.fasta",
"theiacov_illumina_se.reference_gff": "tests/inputs/completely-empty-for-test.txt",
"theiacov_illumina_se.reference_gene_locations_bed": "tests/inputs/sc2_gene_locations.bed"
"theiacov_illumina_se.reference_gene_locations_bed": "tests/inputs/sc2_gene_locations.bed",
"theiacov_illumina_se.read_QC_trim.kraken_db": "tests/data/theiacov/databases/github_kraken2_test_db.tar.gz"
}
3 changes: 2 additions & 1 deletion tests/inputs/theiacov/wf_theiacov_ont.json
Original file line number Diff line number Diff line change
@@ -3,5 +3,6 @@
"theiacov_ont.read1": "tests/data/theiacov/fastqs/ont/ont.fastq.gz",
"theiacov_ont.primer_bed": "tests/data/theiacov/primers/artic-v3.primers.bed",
"theiacov_ont.reference_genome": "tests/data/theiacov/reference/MN908947.fasta",
"theiacov_ont.reference_gene_locations_bed": "tests/inputs/sc2_gene_locations.bed"
"theiacov_ont.reference_gene_locations_bed": "tests/inputs/sc2_gene_locations.bed",
"theiacov_ont.read_qc_trim.kraken_db": "tests/data/theiacov/databases/github_kraken2_test_db.tar.gz"
}
44 changes: 21 additions & 23 deletions tests/workflows/theiacov/test_wf_theiacov_clearlabs.yml
Original file line number Diff line number Diff line change
@@ -17,7 +17,7 @@
- wf_theiacov_clearlabs_miniwdl
files:
- path: miniwdl_run/call-consensus/command
md5sum: a8e200703dedf732b45dd92b0af15f1c
md5sum: b19d5ce485c612036064c07f0a1d6a18
- path: miniwdl_run/call-consensus/inputs.json
contains: ["read1", "samplename", "fastq"]
- path: miniwdl_run/call-consensus/outputs.json
@@ -115,17 +115,16 @@
- path: miniwdl_run/call-fastq_scan_clean_reads/inputs.json
contains: ["read1", "clearlabs"]
- path: miniwdl_run/call-fastq_scan_clean_reads/outputs.json
contains: ["fastq_scan_se", "pipeline_date", "read1_seq"]
contains: ["fastq_scan_se", "read1_seq"]
- path: miniwdl_run/call-fastq_scan_clean_reads/stderr.txt
- path: miniwdl_run/call-fastq_scan_clean_reads/stderr.txt.offset
- path: miniwdl_run/call-fastq_scan_clean_reads/stdout.txt
- path: miniwdl_run/call-fastq_scan_clean_reads/task.log
contains: ["wdl", "theiacov_clearlabs", "fastq_scan_clean_reads", "done"]
- path: miniwdl_run/call-fastq_scan_clean_reads/work/DATE
- path: miniwdl_run/call-fastq_scan_clean_reads/work/READ1_SEQS
md5sum: 097e79b36919c8377c56088363e3d8b7
- path: miniwdl_run/call-fastq_scan_clean_reads/work/VERSION
md5sum: 8e4e9cdfbacc9021a3175ccbbbde002b
md5sum: a59bb42644e35c09b8fa8087156fa4c2
- path: miniwdl_run/call-fastq_scan_clean_reads/work/_miniwdl_inputs/0/clearlabs_R1_dehosted.fastq.gz
- path: miniwdl_run/call-fastq_scan_clean_reads/work/clearlabs_R1_dehosted_fastq-scan.json
md5sum: 869dd2e934c600bba35f30f08e2da7c9
@@ -134,22 +133,21 @@
- path: miniwdl_run/call-fastq_scan_raw_reads/inputs.json
contains: ["read1", "clearlabs"]
- path: miniwdl_run/call-fastq_scan_raw_reads/outputs.json
contains: ["fastq_scan_se", "pipeline_date", "read1_seq"]
contains: ["fastq_scan_se", "read1_seq"]
- path: miniwdl_run/call-fastq_scan_raw_reads/stderr.txt
- path: miniwdl_run/call-fastq_scan_raw_reads/stderr.txt.offset
- path: miniwdl_run/call-fastq_scan_raw_reads/stdout.txt
- path: miniwdl_run/call-fastq_scan_raw_reads/task.log
contains: ["wdl", "theiacov_clearlabs", "fastq_scan_raw_reads", "done"]
- path: miniwdl_run/call-fastq_scan_raw_reads/work/DATE
- path: miniwdl_run/call-fastq_scan_raw_reads/work/READ1_SEQS
md5sum: 097e79b36919c8377c56088363e3d8b7
- path: miniwdl_run/call-fastq_scan_raw_reads/work/VERSION
md5sum: 8e4e9cdfbacc9021a3175ccbbbde002b
md5sum: a59bb42644e35c09b8fa8087156fa4c2
- path: miniwdl_run/call-fastq_scan_raw_reads/work/_miniwdl_inputs/0/clearlabs.fastq.gz
- path: miniwdl_run/call-fastq_scan_raw_reads/work/clearlabs_fastq-scan.json
md5sum: 869dd2e934c600bba35f30f08e2da7c9
- path: miniwdl_run/call-kraken2_dehosted/command
md5sum: 0f9db3341b5f58fb8d145d6d94222827
md5sum: 4306699c67306b103561adf31c3754e3
- path: miniwdl_run/call-kraken2_dehosted/inputs.json
contains: ["read1", "samplename"]
- path: miniwdl_run/call-kraken2_dehosted/outputs.json
@@ -161,18 +159,18 @@
contains: ["wdl", "theiacov_clearlabs", "kraken2_dehosted", "done"]
- path: miniwdl_run/call-kraken2_dehosted/work/DATE
- path: miniwdl_run/call-kraken2_dehosted/work/PERCENT_HUMAN
md5sum: 4fd4dcef994592f9865e9bc8807f32f4
md5sum: 897316929176464ebc9ad085f31e7284
- path: miniwdl_run/call-kraken2_dehosted/work/PERCENT_SC2
md5sum: 9fc4759d176a0e0d240c418dbaaafeb2
md5sum: 86b6b8aa9ad17f169f04c02b0e2bf1b1
- path: miniwdl_run/call-kraken2_dehosted/work/PERCENT_TARGET_ORGANISM
md5sum: 68b329da9893e34099c7d8ad5cb9c940
- path: miniwdl_run/call-kraken2_dehosted/work/VERSION
md5sum: 379b99c23325315c502e74614c035e7d
md5sum: 7ad46f90cd0ffa94f32a6e06299ed05c
- path: miniwdl_run/call-kraken2_dehosted/work/_miniwdl_inputs/0/clearlabs_R1_dehosted.fastq.gz
- path: miniwdl_run/call-kraken2_dehosted/work/clearlabs_kraken2_report.txt
md5sum: 35841fa2d77ec202c275b1de548b8d98
md5sum: b66dbcf8d229c1b6fcfff4dd786068bd
- path: miniwdl_run/call-kraken2_raw/command
md5sum: a9dabf08bff8e183fd792901ce24fc57
md5sum: d6e217901b67290466eec97f13564022
- path: miniwdl_run/call-kraken2_raw/inputs.json
contains: ["read1", "samplename"]
- path: miniwdl_run/call-kraken2_raw/outputs.json
@@ -184,16 +182,16 @@
contains: ["wdl", "theiacov_clearlabs", "kraken2_raw", "done"]
- path: miniwdl_run/call-kraken2_raw/work/DATE
- path: miniwdl_run/call-kraken2_raw/work/PERCENT_HUMAN
md5sum: 4fd4dcef994592f9865e9bc8807f32f4
md5sum: 897316929176464ebc9ad085f31e7284
- path: miniwdl_run/call-kraken2_raw/work/PERCENT_SC2
md5sum: 9fc4759d176a0e0d240c418dbaaafeb2
md5sum: 86b6b8aa9ad17f169f04c02b0e2bf1b1
- path: miniwdl_run/call-kraken2_raw/work/PERCENT_TARGET_ORGANISM
md5sum: 68b329da9893e34099c7d8ad5cb9c940
- path: miniwdl_run/call-kraken2_raw/work/VERSION
md5sum: 379b99c23325315c502e74614c035e7d
md5sum: 7ad46f90cd0ffa94f32a6e06299ed05c
- path: miniwdl_run/call-kraken2_raw/work/_miniwdl_inputs/0/clearlabs.fastq.gz
- path: miniwdl_run/call-kraken2_raw/work/clearlabs_kraken2_report.txt
md5sum: 35841fa2d77ec202c275b1de548b8d98
md5sum: b66dbcf8d229c1b6fcfff4dd786068bd
- path: miniwdl_run/call-ncbi_scrub_se/command
contains: ["read1", "scrubber", "gzip"]
- path: miniwdl_run/call-ncbi_scrub_se/inputs.json
@@ -236,7 +234,7 @@
- path: miniwdl_run/call-nextclade_v3/work/nextclade_dataset_dir/genome_annotation.gff3
md5sum: 4dff84d2d6ada820e0e3a8bc6798d402
- path: miniwdl_run/call-nextclade_v3/work/nextclade_dataset_dir/pathogen.json
md5sum: a51a91e0b5e16590c1afd0c7897ad071
md5sum: 32f20640f926d5b59fed6b954541792d
- path: miniwdl_run/call-nextclade_v3/work/nextclade_dataset_dir/reference.fasta
md5sum: c7ce05f28e4ec0322c96f24e064ef55c
- path: miniwdl_run/call-nextclade_v3/work/nextclade_dataset_dir/sequences.fasta
@@ -310,15 +308,15 @@
- path: miniwdl_run/call-pangolin4/work/PANGOLIN_NOTES
md5sum: 59478efddde2191ead1b46b1f121bbc9
- path: miniwdl_run/call-pangolin4/work/PANGO_ASSIGNMENT_VERSION
md5sum: 0803245359027bd3017d2bd9a9c9c8e3
md5sum: 36f64a1cd7c6844309e8ad2121358088
- path: miniwdl_run/call-pangolin4/work/VERSION_PANGOLIN_ALL
md5sum: b5dbf2ba7480effea8c656099df0e78e
md5sum: dfd90750c8776f46bad1de214c1d1a57
- path: miniwdl_run/call-pangolin4/work/_miniwdl_inputs/0/clearlabs.medaka.consensus.fasta
md5sum: d41d8cd98f00b204e9800998ecf8427e
- path: miniwdl_run/call-pangolin4/work/clearlabs.pangolin_report.csv
md5sum: 151390c419b00ca44eb83e2bbfb96996
md5sum: 0370f24c270c44f6023dd98af79501e7
- path: miniwdl_run/call-stats_n_coverage/command
md5sum: 51da320ddc7de2ffeb263f0ddd85ced6
md5sum: ac020678f99ac145b11d3dbc7b9fe9ba
- path: miniwdl_run/call-stats_n_coverage/inputs.json
contains: ["bamfile", "samplename"]
- path: miniwdl_run/call-stats_n_coverage/outputs.json
@@ -350,7 +348,7 @@
- path: miniwdl_run/call-stats_n_coverage/work/clearlabs.stats.txt
md5sum: bfed5344c91ce6f4db1f688cac0a3ab9
- path: miniwdl_run/call-stats_n_coverage_primtrim/command
md5sum: a84f90b8877babe54bf8c068d244fbe8
md5sum: 2974f886e1959cd5eaae5e495c91f7cc
- path: miniwdl_run/call-stats_n_coverage_primtrim/inputs.json
contains: ["bamfile", "samplename"]
- path: miniwdl_run/call-stats_n_coverage_primtrim/outputs.json