Commit

Merge branch 'main' into fc-dorado-workflow-standalone-dev
sage-wright authored Nov 14, 2024
2 parents 41c41a8 + ad49a36 commit c94f59d
Showing 110 changed files with 1,455 additions and 329 deletions.
3 changes: 2 additions & 1 deletion .gitignore
@@ -1,3 +1,4 @@
cromwell*
_LAST
2024*
2024*site/
site/
2 changes: 1 addition & 1 deletion README.md
@@ -28,7 +28,7 @@ Many more workflows are available, and are documented in detail in the [Theiagen

The PHB repository would not be possible without its predecessors. We would like to acknowledge the following repositories, individuals, and contributors for their influence on the development of these workflows:

The PHB repository originated from collaborative work with Andrew Lang, PhD & his [Genomic Analysis WDL workflows](https://github.com/AndrewLangvt/genomic_analyses). The workflows and task development were influenced by The Broad's [Viral Pipes](https://github.com/broadinstitute/viral-pipelines) repository. The TheiaCoV workflows for viral genomic characterization were influenced by UPHL's [Cecret](https://github.com/UPHL-BioNGS/Cecret) & StaPH-B's [Monroe](https://staph-b.github.io/staphb_toolkit/workflow_docs/monroe/) workflows. The TheiaProk workflows for bacterial genomic characterization were influenced by Robert Petit's [bactopia](https://github.com/bactopia/bactopia). Most importantly, the PHB user community drove the development of these workflows and we are grateful for their feedback and contributions.
The PHB repository originated from collaborative work with Andrew Lang, PhD & his [Genomic Analysis WDL workflows](https://github.com/AndrewLangvt/genomic_analyses). The workflows and task development were influenced by The Broad's [Viral Pipes](https://github.com/broadinstitute/viral-pipelines) repository. The TheiaCoV workflows for viral genomic characterization were influenced by UPHL's [Cecret](https://github.com/UPHL-BioNGS/Cecret) & StaPH-B's Monroe (now deprecated) workflows. The TheiaProk workflows for bacterial genomic characterization were influenced by Robert Petit's [bactopia](https://github.com/bactopia/bactopia). Most importantly, the PHB user community drove the development of these workflows and we are grateful for their feedback and contributions.

If you would like to provide feedback, please raise a [GitHub issue](https://github.com/theiagen/public_health_bioinformatics/issues/new).

Binary file modified docs/assets/figures/TheiaProk.png
@@ -0,0 +1 @@
{"theiacov_fasta.organism":"rsv_a"}
@@ -0,0 +1 @@
{"theiacov_fasta.organism":"rsv_b"}
@@ -0,0 +1 @@
{"theiacov_fasta.organism":"WNV"}
@@ -0,0 +1 @@
{"theiacov_fasta.organism":"flu"}
@@ -0,0 +1 @@
{"theiacov_illumina_pe.organism":"HIV","theiacov_illumina_pe.hiv_primer_version":"v1"}
@@ -0,0 +1 @@
{"theiacov_illumina_pe.organism":"HIV","theiacov_illumina_pe.hiv_primer_version":"v2"}
@@ -0,0 +1 @@
{"theiacov_illumina_pe.organism":"rsv_a"}
@@ -0,0 +1 @@
{"theiacov_illumina_pe.organism":"rsv_b"}
@@ -0,0 +1 @@
{"theiacov_illumina_pe.organism":"WNV"}
@@ -0,0 +1 @@
{"theiacov_illumina_pe.organism":"flu"}
@@ -0,0 +1 @@
{"theiacov_illumina_se.organism":"WNV"}
@@ -0,0 +1 @@
{"theiacov_ont.organism":"HIV","theiacov_ont.hiv_primer_version":"v1"}
@@ -0,0 +1 @@
{"theiacov_ont.organism":"HIV","theiacov_ont.hiv_primer_version":"v2"}
@@ -0,0 +1 @@
{"theiacov_ont.organism":"flu"}
1 change: 1 addition & 0 deletions docs/contributing/doc_contribution.md
@@ -34,6 +34,7 @@ Here are some VSCode Extensions can help you write and edit your markdown files

- [Excel to Markdown Table](https://tableconvert.com/excel-to-markdown) - This website will convert an Excel table into markdown format, which can be copied and pasted into your markdown file.
- [Material for MkDocs Reference](https://squidfunk.github.io/mkdocs-material/reference/) - This is the official reference for the Material for MkDocs theme, which will help you understand how to use the theme's features.
- [Broken Link Check](https://www.brokenlinkcheck.com/) - This website will scan your website to ensure that all links are working correctly. This will only work on the deployed version of the documentation, not the local version.

## Documentation Structure

7 changes: 3 additions & 4 deletions docs/index.md
@@ -6,8 +6,7 @@ title: Home

The PHB repository contains workflows for the characterization, genomic epidemiology, and sharing of pathogen genomes of public health concern. Workflows are available for viruses, bacteria, and fungi.

All workflows in the PHB repository end with `_PHB` in order to differentiate them from earlier versions and from the original tools they
incorporate.
All workflows in the PHB repository end with `_PHB` in order to differentiate them from earlier versions and from the original tools they incorporate.

<center>[Explore our workflows](workflows_overview/workflows_type.md){ .md-button .md-button--primary }</center>

@@ -28,7 +27,7 @@ incorporate.
</div>

!!! dna "Our Open Source Philosophy"
PHB source code is publicly available on [GitHub](https://github.com/theiagen/public_health_bioinformatics) and available under [GNU Affero General Public License v3.0](https://github.com/theiagen/public_health_viral_genomics/blob/main/LICENSE)!
PHB source code is publicly available on [GitHub](https://github.com/theiagen/public_health_bioinformatics) and available under [GNU Affero General Public License v3.0](https://github.com/theiagen/public_health_bioinformatics/blob/main/LICENSE)!

All workflows can be imported directly to [Terra](https://terra.bio/) via the [**Dockstore PHB collection**](https://dockstore.org/organizations/Theiagen/collections/public-health-bioinformatics)!

@@ -90,7 +89,7 @@ We would like to gratefully acknowledge the following individuals from the publi

The PHB repository would not be possible without its predecessors. We would like to acknowledge the following repositories, individuals, and contributors for their influence on the development of these workflows:

The PHB repository originated from collaborative work with Andrew Lang, PhD & his [Genomic Analysis WDL workflows](https://github.com/AndrewLangvt/genomic_analyses). The workflows and task development were influenced by The Broad's [Viral Pipes](https://github.com/broadinstitute/viral-pipelines) repository. The TheiaCoV workflows for viral genomic characterization were influenced by UPHL's [Cecret](https://github.com/UPHL-BioNGS/Cecret) & StaPH-B's [Monroe](https://staph-b.github.io/staphb_toolkit/workflow_docs/monroe/) workflows. The TheiaProk workflows for bacterial genomic characterization were influenced by Robert Petit's [bactopia](https://github.com/bactopia/bactopia). Most importantly, the PHB user community drove the development of these workflows and we are grateful for their feedback and contributions.
The PHB repository originated from collaborative work with Andrew Lang, PhD & his [Genomic Analysis WDL workflows](https://github.com/AndrewLangvt/genomic_analyses). The workflows and task development were influenced by The Broad's [Viral Pipes](https://github.com/broadinstitute/viral-pipelines) repository. The TheiaCoV workflows for viral genomic characterization were influenced by UPHL's [Cecret](https://github.com/UPHL-BioNGS/Cecret) & StaPH-B's Monroe (now deprecated) workflows. The TheiaProk workflows for bacterial genomic characterization were influenced by Robert Petit's [bactopia](https://github.com/bactopia/bactopia). Most importantly, the PHB user community drove the development of these workflows and we are grateful for their feedback and contributions.

If you would like to provide feedback, please raise a [GitHub issue](https://github.com/theiagen/public_health_bioinformatics/issues/new) or contact us at <[email protected]>.

64 changes: 64 additions & 0 deletions docs/javascripts/table-search.js
@@ -0,0 +1,64 @@
function addTableSearch() {
  // Select all containers with the class 'searchable-table'
  const containers = document.querySelectorAll('.searchable-table');

  containers.forEach((container) => {
    // Find the table within this container
    const table = container.querySelector('table');

    if (table) {
      // Ensure we don't add multiple search boxes
      if (!container.querySelector('input[type="search"]')) {
        // Create the search input element
        const searchInput = document.createElement("input");
        searchInput.setAttribute("type", "search");
        searchInput.setAttribute("placeholder", "Search table...");
        searchInput.classList.add('table-search-input');
        searchInput.style.marginBottom = "10px";
        searchInput.style.display = "block";

        // Insert the search input before the table
        container.insertBefore(searchInput, container.firstChild);

        // Add event listener for table search
        searchInput.addEventListener("input", function () {
          const filter = searchInput.value.toUpperCase();
          const rows = table.getElementsByTagName("tr");

          for (let i = 1; i < rows.length; i++) { // Skip header row
            const cells = rows[i].getElementsByTagName("td");
            let match = false;

            for (let j = 0; j < cells.length; j++) {
              if (cells[j].innerText.toUpperCase().includes(filter)) {
                match = true;
                break;
              }
            }

            rows[i].style.display = match ? "" : "none";
          }
        });
      }
    } else {
      console.log('Table not found within container.');
    }
  });
}

// Run on page load
addTableSearch();

// Reapply search bar on page change
function observeDOMChanges() {
  const targetNode = document.querySelector('body');
  const config = { childList: true, subtree: true };

  const observer = new MutationObserver(() => {
    addTableSearch();
  });

  observer.observe(targetNode, config);
}

observeDOMChanges();
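
The row-matching rule used by the `input` listener above can be separated from the DOM so it is easy to check in isolation. The following is a minimal sketch and is not part of this commit: `rowMatches` and `filterRows` are hypothetical helper names, and rows are modelled as plain arrays of cell strings rather than `<tr>` elements.

```javascript
// Hypothetical helper (assumption, not in the commit): true when any cell
// contains the query, compared case-insensitively like the committed listener.
function rowMatches(cellTexts, query) {
  const filter = query.toUpperCase();
  return cellTexts.some((text) => text.toUpperCase().includes(filter));
}

// Hypothetical helper: maps each row to the value the listener would assign
// to row.style.display ("" keeps the row visible, "none" hides it).
function filterRows(rows, query) {
  return rows.map((cells) => (rowMatches(cells, query) ? "" : "none"));
}

// Example rows borrowed from the input tables in this diff.
const rows = [
  ["version_capture", "docker", "String"],
  ["version_capture", "timezone", "String"],
  ["make_table", "memory", "Int"],
];

// Only the first row contains "docker", so only it stays visible.
console.log(filterRows(rows, "Docker"));
```

As in the committed code, an empty query matches every row that has at least one cell, and a row with no `td` cells is always hidden.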
31 changes: 31 additions & 0 deletions docs/stylesheets/extra.css
@@ -184,5 +184,36 @@ th {
td {
  word-break: break-all;
}
/* Base styles for the search box */
div.searchable-table input.table-search-input {
  width: 25%;
  padding: 10px;
  margin-bottom: 12px;
  font-size: 12px;
  box-sizing: border-box;
  border-radius: 2px;
}

/* Light mode styles */
[data-md-color-scheme="light"] div.searchable-table input.table-search-input {
  background-color: #fff;
  color: #000;
  border: 1px solid #E0E1E1;
}

[data-md-color-scheme="light"] div.searchable-table input.table-search-input::placeholder {
  color: #888;
  font-style: italic;
}

/* Dark mode styles */
[data-md-color-scheme="slate"] div.searchable-table input.table-search-input {
  background-color: #1d2125;
  color: #fff;
  border: 1px solid #373B40;
}

[data-md-color-scheme="slate"] div.searchable-table input.table-search-input::placeholder {
  color: #bbb;
  font-style: italic;
}
4 changes: 4 additions & 0 deletions docs/workflows/data_export/concatenate_column_content.md
@@ -16,6 +16,8 @@ This set-level workflow will create a file containing all of the items from a gi

This workflow runs on the set level.

<div class="searchable-table" markdown="1">

| **Terra Task Name** | **Variable** | **Type** | **Description** | **Default Value** | **Terra Status** |
|---|---|---|---|---|---|
| concatenate_column_content | **concatenated_file_name** | String | The name of the output file. ***Include the extension***, such as ".fasta" or ".txt". | | Required |
@@ -28,6 +30,8 @@ This workflow runs on the set level.
| version_capture | **docker** | String | The Docker container to use for the task | "us-docker.pkg.dev/general-theiagen/theiagen/alpine-plus-bash:3.20.0" | Optional |
| version_capture | **timezone** | String | Set the time zone to get an accurate date of analysis (uses UTC by default) | | Optional |

</div>

### Outputs

!!! info
4 changes: 4 additions & 0 deletions docs/workflows/data_export/transfer_column_content.md
@@ -25,6 +25,8 @@ This set-level workflow will transfer all of the items from a given column in a

This workflow runs on the set level.

<div class="searchable-table" markdown="1">

| **Terra Task name** | **input_variable** | **Type** | **Description** | **Default attribute** | **Status** |
|---|---|---|---|---|---|
| transfer_column_content | **files_to_transfer** | Array[File] | The column that has the files you want to concatenate. | | Required |
@@ -36,6 +38,8 @@ This workflow runs on the set level.
| version_capture | **docker** | String | The Docker container to use for the task | "us-docker.pkg.dev/general-theiagen/theiagen/alpine-plus-bash:3.20.0" | Optional |
| version_capture | **timezone** | String | Set the time zone to get an accurate date of analysis (uses UTC by default) | | Optional |

</div>

### Outputs

!!! info
4 changes: 4 additions & 0 deletions docs/workflows/data_export/zip_column_content.md
@@ -16,6 +16,8 @@ This workflow will create a zip file that contains all of the items in a column

This workflow runs on the set level.

<div class="searchable-table" markdown="1">

| **Terra Task Name** | **Variable** | **Type** | **Description** | **Default Value** | **Terra Status** |
|---|---|---|---|---|---|
| zip_column_content | **files_to_zip** | Array[File] | The column that has the files you want to zip. | | Required |
@@ -27,6 +29,8 @@ This workflow runs on the set level.
| version_capture | **docker** | String | The Docker container to use for the task | "us-docker.pkg.dev/general-theiagen/theiagen/alpine-plus-bash:3.20.0" | Optional |
| version_capture | **timezone** | String | Set the time zone to get an accurate date of analysis (uses UTC by default) | | Optional |

</div>

### Outputs

!!! info
10 changes: 9 additions & 1 deletion docs/workflows/data_import/assembly_fetch.md
@@ -23,6 +23,8 @@ Assembly_Fetch requires the input samplename, and either the accession for a ref

This workflow runs on the sample level.

<div class="searchable-table" markdown="1">

| **Terra Task Name** | **Variable** | **Type** | **Description** | **Default Value** | **Terra Status** |
|---|---|---|---|---|---|
| reference_fetch | **samplename** | String | Your sample's name | | Required |
@@ -44,6 +46,8 @@ This workflow runs on the sample level.
| version_capture | **docker** | String | The Docker container to use for the task | "us-docker.pkg.dev/general-theiagen/theiagen/alpine-plus-bash:3.20.0" | Optional |
| version_capture | **timezone** | String | Set the time zone to get an accurate date of analysis (uses UTC by default) | | Optional |

</div>

### Analysis Tasks

??? task "ReferenceSeeker (optional) Details"
@@ -90,6 +94,8 @@ This workflow runs on the sample level.

### Outputs

<div class="searchable-table" markdown="1">

| **Variable** | **Type** | **Description** |
|---|---|---|
| assembly_fetch_analysis_date | String | Date of assembly download |
@@ -101,11 +107,13 @@ This workflow runs on the sample level.
| assembly_fetch_ncbi_datasets_version | String | NCBI datasets version used |
| assembly_fetch_referenceseeker_database | String | ReferenceSeeker database used |
| assembly_fetch_referenceseeker_docker | String | Docker file used for ReferenceSeeker |
| assembly_fetch_referenceseeker_top_hit_ncbi_accession | String | NCBI Accession for the top it identified by Assembly_Fetch |
| assembly_fetch_referenceseeker_top_hit_ncbi_accession | String | NCBI Accession for the top hit identified by Assembly_Fetch |
| assembly_fetch_referenceseeker_tsv | File | TSV file of the top hits between the query genome and the Reference Seeker database |
| assembly_fetch_referenceseeker_version | String | ReferenceSeeker version used |
| assembly_fetch_version | String | The version of the repository the Assembly Fetch workflow is in |

</div>

## References

> **ReferenceSeeker:** Schwengers O, Hain T, Chakraborty T, Goesmann A. ReferenceSeeker: rapid determination of appropriate reference genomes. J Open Source Softw. 2020 Feb 4;5(46):1994.
4 changes: 4 additions & 0 deletions docs/workflows/data_import/basespace_fetch.md
@@ -153,6 +153,8 @@ This process must be performed on a command-line (ideally on a Linux or MacOS co

This workflow runs on the sample level.

<div class="searchable-table" markdown="1">

| **Terra Task Name** | **Variable** | **Type** | **Description** | **Default Value** | **Terra Status** |
|---|---|---|---|---|---|
| basespace_fetch | **access_token** | String | The access token is used in place of a username and password to allow the workflow to access the user account in BaseSpace from which the data is to be transferred. It is an alphanumeric string that is 32 characters in length. Example: 9e08a96471df44579b72abf277e113b7 | | Required |
@@ -168,6 +170,8 @@ This workflow runs on the sample level.
| version_capture | **docker** | String | The Docker container to use for the task | "us-docker.pkg.dev/general-theiagen/theiagen/alpine-plus-bash:3.20.0" | Optional |
| version_capture | **timezone** | String | Set the time zone to get an accurate date of analysis (uses UTC by default) | | Optional |

</div>

### **Outputs**

The outputs of this workflow will be the fastq files imported from BaseSpace into the data table where the sample ID information had originally been uploaded.
4 changes: 4 additions & 0 deletions docs/workflows/data_import/create_terra_table.md
@@ -19,6 +19,8 @@ The manual creation of Terra tables can be tedious and error-prone. This workflo

**_This can be changed_** by providing information in the `file_ending` optional input parameter. See below for more information.

<div class="searchable-table" markdown="1">

| **Terra Task Name** | **Variable** | **Type** | **Description** | **Default Value** | **Terra Status** |
|---|---|---|---|---|---|
| create_terra_table | **assembly_data** | Boolean | Set to true if your data is in FASTA format; set to false if your data is FASTQ format | | Required |
@@ -33,6 +35,8 @@ The manual creation of Terra tables can be tedious and error-prone. This workflo
| make_table | **docker** | String | The Docker container to use for the task | "us-docker.pkg.dev/general-theiagen/theiagen/terra-tools:2023-06-21" | Optional |
| make_table | **memory** | Int | Amount of memory/RAM (in GB) to allocate to the task | 4 | Optional |

</div>

### Finding the `data_location_path`

#### Using the Terra data uploader