Merge pull request #15 from PacificBiosciences/develop
Merging develop into main for release
gconcepcion authored Feb 14, 2024
2 parents 6eb5bb6 + 2ab2d8e commit 3a95564
Showing 7 changed files with 162 additions and 14 deletions.
38 changes: 26 additions & 12 deletions README.md
@@ -1,8 +1,6 @@
# DISCLAIMER
<h1 align="center"><img width="300px" src="images/logo_wdl_workflows.svg"/></h1>

TO THE GREATEST EXTENT PERMITTED BY APPLICABLE LAW, THIS WEBSITE AND ITS CONTENT, INCLUDING ALL SOFTWARE, SOFTWARE CODE, SITE-RELATED SERVICES, AND DATA, ARE PROVIDED "AS IS," WITH ALL FAULTS, WITH NO REPRESENTATIONS OR WARRANTIES OF ANY KIND, EITHER EXPRESS OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, ANY WARRANTIES OF MERCHANTABILITY, SATISFACTORY QUALITY, NON-INFRINGEMENT OR FITNESS FOR A PARTICULAR PURPOSE. ALL WARRANTIES ARE REJECTED AND DISCLAIMED. YOU ASSUME TOTAL RESPONSIBILITY AND RISK FOR YOUR USE OF THE FOREGOING. PACBIO IS NOT OBLIGATED TO PROVIDE ANY SUPPORT FOR ANY OF THE FOREGOING, AND ANY SUPPORT PACBIO DOES PROVIDE IS SIMILARLY PROVIDED WITHOUT REPRESENTATION OR WARRANTY OF ANY KIND. NO ORAL OR WRITTEN INFORMATION OR ADVICE SHALL CREATE A REPRESENTATION OR WARRANTY OF ANY KIND. ANY REFERENCES TO SPECIFIC PRODUCTS OR SERVICES ON THE WEBSITES DO NOT CONSTITUTE OR IMPLY A RECOMMENDATION OR ENDORSEMENT BY PACBIO.

# wdl-humanassembly
<h1 align="center">PacBio Human Assembly pipeline</h1>

Workflow for running de novo assembly using human PacBio whole genome sequencing (WGS) data. Written using [Workflow Description Language (WDL)](https://openwdl.org/).

@@ -15,11 +13,19 @@ Workflow for running de novo assembly using human PacBio whole genome sequencing

The assembly workflow performs _de novo_ assembly on samples and trios.

![De novo assembly workflow diagram](workflows/main.graphviz.svg "De novo assembly workflow diagram")
![De novo assembly workflow diagram](images/main.graphviz.svg "De novo assembly workflow diagram")

## Setup

Some tasks and workflows are pulled in from other repositories. Ensure you have initialized submodules following cloning by running `git submodule update --init --recursive`.
Clone a tagged version of the git repository. Use the `--branch` flag to pull the desired version, and the `--recursive` flag to pull code from any submodules.

```
# --branch pins a tagged release for reproducibility;
# --recursive also clones the submodules.
git clone \
  --depth 1 --branch v1.0.0 \
  --recursive \
  https://github.com/PacificBiosciences/HiFi-human-assembly-WDL.git
```


## Resource requirements

@@ -47,10 +53,12 @@ For backend-specific configuration, see the relevant documentation:
- [GCP](backends/gcp)
- [HPC](backends/hpc)

## Configuring a workflow engine
## Configuring a workflow engine and container runtime

An execution engine is required to run workflows. Two popular engines for running WDL-based workflows are [`miniwdl`](https://miniwdl.readthedocs.io/en/latest/getting_started.html) and [`Cromwell`](https://cromwell.readthedocs.io/en/stable/tutorials/FiveMinuteIntro/).

Because workflow dependencies are containerized, a container runtime is required. This workflow has been tested with [Docker](https://docs.docker.com/get-docker/) and [Singularity](https://docs.sylabs.io/guides/3.10/user-guide/) container runtimes.

See [backend-specific documentation](backends) for details on setting up an engine.

| Engine | Azure | AWS | GCP | HPC |
@@ -115,7 +123,7 @@ A cohort can include one or more samples. Samples need not be related.

| Type | Name | Description | Notes |
| :- | :- | :- | :- |
| String | cohort_id | A unique name for the cohort; used to name outputs | |
| String | cohort_id | A unique name for the cohort; used to name outputs. Alphanumeric characters, underscore (`_`), and dash (`-`) are allowed. | |
| Array[[Sample](#sample)] | samples | The set of samples for the cohort. At least one sample must be defined. | |
| Boolean | run_de_novo_assembly_trio | Run trio binned _de novo_ assembly. | Cohort must contain at least one valid trio (child and both parents present in the cohort) |

@@ -125,10 +133,10 @@ Sample information for each sample in the workflow run.

| Type | Name | Description | Notes |
| :- | :- | :- | :- |
| String | sample_id | A unique name for the sample; used to name outputs | |
| String | sample_id | A unique name for the sample; used to name outputs. Alphanumeric characters, underscore (`_`), and dash (`-`) are allowed. | |
| Array[[IndexData](https://github.com/PacificBiosciences/wdl-common/blob/main/wdl/structs.wdl)] | movie_bams | The set of unaligned movie BAMs associated with this sample | |
| String? | father_id | Paternal `sample_id` | |
| String? | mother_id | Maternal `sample_id` | |
| String? | father_id | Paternal `sample_id`. Alphanumeric characters, underscore (`_`), and dash (`-`) are allowed. | |
| String? | mother_id | Maternal `sample_id`. Alphanumeric characters, underscore (`_`), and dash (`-`) are allowed. | |
| Boolean | run_de_novo_assembly | If true, run single-sample _de novo_ assembly for this sample | \[true, false\] |
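
The naming rule stated in the tables above (alphanumeric characters, underscore, and dash) can be checked before launching a run. The following is a minimal sketch rather than part of the workflow: the function names `is_valid_id` and `invalid_ids` are hypothetical, and it assumes "alphanumeric" means ASCII letters and digits and that IDs must be non-empty.

```python
import re

# Assumption: "alphanumeric" means ASCII letters and digits; empty IDs are rejected.
_ID_PATTERN = re.compile(r"[A-Za-z0-9_-]+")


def is_valid_id(identifier: str) -> bool:
    """Return True if the ID contains only letters, digits, '_' and '-'."""
    return _ID_PATTERN.fullmatch(identifier) is not None


def invalid_ids(cohort: dict) -> list:
    """Collect every cohort, sample, and parent ID that breaks the naming rule."""
    ids = [cohort["cohort_id"]]
    for sample in cohort["samples"]:
        ids.append(sample["sample_id"])
        # father_id and mother_id are optional (String? in the table above).
        for key in ("father_id", "mother_id"):
            if sample.get(key) is not None:
                ids.append(sample[key])
    return [identifier for identifier in ids if not is_valid_id(identifier)]
```

Calling `invalid_ids` on the `de_novo_assembly.cohort` object of an inputs file returns an empty list when every ID conforms.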

## [ReferenceData](workflows/humanwgs_structs.wdl)
@@ -197,6 +205,12 @@ The Docker image used by a particular step of the workflow can be identified by
| hifiasm | <ul><li>[hifiasm 0.19.4](https://github.com/chhylp123/hifiasm/releases/tag/0.19.4)</li></ul> | [Dockerfile](https://github.com/PacificBiosciences/wdl-dockerfiles/tree/3560fcc5a84e044067cea9c9a7669cfc2659178e/docker/hifiasm) |
| htslib | <ul><li>[htslib 1.14](https://github.com/samtools/htslib/releases/tag/1.14)</li></ul> | [Dockerfile](https://github.com/PacificBiosciences/wdl-dockerfiles/tree/3560fcc5a84e044067cea9c9a7669cfc2659178e/docker/htslib) |
| paftools | <ul><li>[paftools 2.26-r1182-dirty](https://github.com/lh3/minimap2/blob/bc588c0eeb26426d0d90a93fb0877358a389c515/misc/paftools.js)</li></ul> | [Dockerfile](https://github.com/PacificBiosciences/wdl-dockerfiles/tree/3560fcc5a84e044067cea9c9a7669cfc2659178e/docker/align_hifiasm) |
| parse-cohort | <ul><li>python 3.8.10; custom scripts</li></ul> | [Dockerfile](https://github.com/PacificBiosciences/wdl-dockerfiles/tree/3560fcc5a84e044067cea9c9a7669cfc2659178e/docker/parse-cohort) |
| pyyaml | <ul><li>python 3.8.10; custom scripts</li></ul> | [Dockerfile](https://github.com/PacificBiosciences/wdl-dockerfiles/tree/f72e862bca2f209b9909e6043ef0197975762f27/docker/pyyaml) |
| samtools | <ul><li>[samtools 1.14](https://github.com/samtools/samtools/releases/tag/1.14)</li></ul> | [Dockerfile](https://github.com/PacificBiosciences/wdl-dockerfiles/tree/3560fcc5a84e044067cea9c9a7669cfc2659178e/docker/samtools) |
| yak | <ul><li>[yak 0.1](https://github.com/lh3/yak/releases/tag/v0.1)</li></ul> | [Dockerfile](https://github.com/PacificBiosciences/wdl-dockerfiles/tree/3560fcc5a84e044067cea9c9a7669cfc2659178e/docker/yak) |

---

## DISCLAIMER

TO THE GREATEST EXTENT PERMITTED BY APPLICABLE LAW, THIS WEBSITE AND ITS CONTENT, INCLUDING ALL SOFTWARE, SOFTWARE CODE, SITE-RELATED SERVICES, AND DATA, ARE PROVIDED "AS IS," WITH ALL FAULTS, WITH NO REPRESENTATIONS OR WARRANTIES OF ANY KIND, EITHER EXPRESS OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, ANY WARRANTIES OF MERCHANTABILITY, SATISFACTORY QUALITY, NON-INFRINGEMENT OR FITNESS FOR A PARTICULAR PURPOSE. ALL WARRANTIES ARE REJECTED AND DISCLAIMED. YOU ASSUME TOTAL RESPONSIBILITY AND RISK FOR YOUR USE OF THE FOREGOING. PACBIO IS NOT OBLIGATED TO PROVIDE ANY SUPPORT FOR ANY OF THE FOREGOING, AND ANY SUPPORT PACBIO DOES PROVIDE IS SIMILARLY PROVIDED WITHOUT REPRESENTATION OR WARRANTY OF ANY KIND. NO ORAL OR WRITTEN INFORMATION OR ADVICE SHALL CREATE A REPRESENTATION OR WARRANTY OF ANY KIND. ANY REFERENCES TO SPECIFIC PRODUCTS OR SERVICES ON THE WEBSITES DO NOT CONSTITUTE OR IMPLY A RECOMMENDATION OR ENDORSEMENT BY PACBIO.
17 changes: 17 additions & 0 deletions backends/example/single.json
@@ -0,0 +1,17 @@
{
  "de_novo_assembly.cohort": {
    "cohort_id": "HG002",
    "samples": [
      {
        "movie_bams": [
          "/path/to/input1.bam",
          "/path/to/input2.bam",
          "/path/to/input3.bam"
        ],
        "run_de_novo_assembly": true,
        "sample_id": "HG002"
      }
    ],
    "run_de_novo_assembly_trio": false
  }
}
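
An inputs file of the shape shown in `single.json` can also be generated programmatically. The sketch below is illustrative only: `build_single_sample_inputs` is a hypothetical helper, and it assumes, as in the example file, that the cohort ID is the same as the sample ID and that `movie_bams` is a list of plain path strings.

```python
import json


def build_single_sample_inputs(sample_id, movie_bams):
    """Build a single-sample inputs dict mirroring backends/example/single.json."""
    return {
        "de_novo_assembly.cohort": {
            # Assumption: reuse the sample ID as the cohort ID, as the example does.
            "cohort_id": sample_id,
            "samples": [
                {
                    "movie_bams": list(movie_bams),
                    "run_de_novo_assembly": True,
                    "sample_id": sample_id,
                }
            ],
            "run_de_novo_assembly_trio": False,
        }
    }


if __name__ == "__main__":
    inputs = build_single_sample_inputs("HG002", ["/path/to/input1.bam"])
    # json.dumps converts Python True/False to JSON true/false.
    print(json.dumps(inputs, indent=2))
```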
34 changes: 34 additions & 0 deletions backends/example/trio.json
Original file line number Diff line number Diff line change
@@ -0,0 +1,34 @@
{
  "de_novo_assembly.cohort": {
    "cohort_id": "HG002",
    "samples": [
      {
        "father_id": "HG003",
        "mother_id": "HG004",
        "movie_bams": [
          "/path/to/sampleA_1.bam",
          "/path/to/sampleA_2.bam"
        ],
        "run_de_novo_assembly": false,
        "sample_id": "HG002"
      },
      {
        "movie_bams": [
          "/path/to/sampleB_1.bam",
          "/path/to/sampleB_2.bam"
        ],
        "run_de_novo_assembly": true,
        "sample_id": "HG003"
      },
      {
        "movie_bams": [
          "/path/to/sampleC_1.bam",
          "/path/to/sampleC_2.bam"
        ],
        "run_de_novo_assembly": true,
        "sample_id": "HG004"
      }
    ],
    "run_de_novo_assembly_trio": true
  }
}
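
The README requires that a cohort run with `run_de_novo_assembly_trio` contain at least one valid trio, meaning a child whose father and mother are both present in the cohort. The sketch below illustrates that rule on the cohort object from an inputs file. It is not the workflow's own `parse_families` logic, which is not shown in this commit, and `find_trios` is a hypothetical name.

```python
def find_trios(cohort):
    """Return (child, father, mother) tuples for every complete trio in the cohort."""
    sample_ids = {sample["sample_id"] for sample in cohort["samples"]}
    trios = []
    for sample in cohort["samples"]:
        # Parent IDs are optional; .get() yields None when they are absent.
        father = sample.get("father_id")
        mother = sample.get("mother_id")
        # A trio is valid only if both parents are themselves samples in the cohort.
        if father in sample_ids and mother in sample_ids:
            trios.append((sample["sample_id"], father, mother))
    return trios
```

For the `trio.json` example above, this yields one trio with child `HG002`, father `HG003`, and mother `HG004`. An empty result would indicate that the trio assembly precondition is not met.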
83 changes: 83 additions & 0 deletions images/logo_wdl_workflows.svg
File renamed without changes
2 changes: 1 addition & 1 deletion wdl-ci.config.json
@@ -79,7 +79,7 @@
"tasks": {
"parse_families": {
"key": "parse_families",
"digest": "rprxafsnidgno35awynatngwbnuw6suo",
"digest": "rbuiru23pdiayrbc4zmrqcjyqay4c2aa",
"tests": [
{
"inputs": {
2 changes: 1 addition & 1 deletion workflows/de_novo_assembly_trio/de_novo_assembly_trio.wdl
@@ -191,7 +191,7 @@ task parse_families {
}

runtime {
docker: "~{runtime_attributes.container_registry}/parse-cohort@sha256:e6a8ac24ada706644e62878178790a0006db9a6abec7a312232052bb0666fe8f"
docker: "~{runtime_attributes.container_registry}/pyyaml@sha256:af6f0689a7412b1edf76bd4bf6434e7fa6a86192eebf19573e8618880d9c1dbb"
cpu: 2
memory: "4 GB"
disk: "20 GB"