Adds updated Dorado 0.8.3 #1107

Merged
merged 6 commits into from
Nov 20, 2024
2 changes: 1 addition & 1 deletion README.md
@@ -151,7 +151,7 @@ To learn more about the docker pull rate limits and the open source software pro
| [datasets-sars-cov-2](https://github.com/CDCgov/datasets-sars-cov-2) <br/> [![docker pulls](https://badgen.net/docker/pulls/staphb/datasets-sars-cov-2)](https://hub.docker.com/r/staphb/datasets-sars-cov-2) | <ul><li>0.6.2</li><li>0.6.3</li><li>0.7.2</li></ul> | https://github.com/CDCgov/datasets-sars-cov-2 |
| [diamond](https://github.com/bbuchfink/diamond) <br/> [![docker pulls](https://badgen.net/docker/pulls/staphb/diamond)](https://hub.docker.com/r/staphb/diamond) | <ul><li>[2.1.9](./diamond/2.1.9)</li><li>[2.1.10](./diamond/2.1.10)</li></ul> | https://github.com/bbuchfink/diamond|
| [dnaapler](https://hub.docker.com/r/staphb/dnaapler) <br/> [![docker pulls](https://badgen.net/docker/pulls/staphb/dnaapler)](https://hub.docker.com/r/staphb/dnaapler) | <ul><li>[0.1.0](dnaapler/0.1.0/)</li></ul> <ul><li>[0.4.0](dnaapler/0.4.0/)</li><li>[0.5.0](./dnaapler/0.5.0/)</li><li>[0.5.1](./dnaapler/0.5.1/)</li><li>[0.7.0](./dnaapler/0.7.0/)</li><li>[0.8.0](./dnaapler/0.8.0/)</li><li>[0.8.1](./dnaapler/0.8.1/)</li></ul> | https://github.com/gbouras13/dnaapler |
| [dorado](https://hub.docker.com/r/staphb/dorado) <br/> [![docker pulls](https://badgen.net/docker/pulls/staphb/dorado)](https://hub.docker.com/r/staphb/dorado) | <ul><li>[0.8.0](dorado/0.8.0/)</li></ul> | [https://github.com/nanoporetech/dorado](https://github.com/nanoporetech/dorado) |
| [dorado](https://hub.docker.com/r/staphb/dorado) <br/> [![docker pulls](https://badgen.net/docker/pulls/staphb/dorado)](https://hub.docker.com/r/staphb/dorado) | <ul><li>[0.8.0](dorado/0.8.0/)</li><li>[0.8.3](dorado/0.8.3/)</li></ul> | [https://github.com/nanoporetech/dorado](https://github.com/nanoporetech/dorado) |
| [dragonflye](https://hub.docker.com/r/staphb/dragonflye) <br/> [![docker pulls](https://badgen.net/docker/pulls/staphb/dragonflye)](https://hub.docker.com/r/staphb/dragonflye) | <ul><li>[1.0.14](./dragonflye/1.0.14/)</li><li>[1.1.1](./dragonflye/1.1.1/)</li><li>[1.1.2](./dragonflye/1.1.2/)</li><li>[1.2.0](./dragonflye/1.2.0/)</li><li>[1.2.1](./dragonflye/1.2.1/)</li></ul> | https://github.com/rpetit3/dragonflye |
| [Dr. PRG ](https://hub.docker.com/r/staphb/drprg) <br/> [![docker pulls](https://badgen.net/docker/pulls/staphb/drprg)](https://hub.docker.com/r/staphb/drprg) | <ul><li>[0.1.1](drprg/0.1.1/)</li></ul> | https://mbh.sh/drprg/ |
| [DSK](https://hub.docker.com/r/staphb/dsk) <br/> [![docker pulls](https://badgen.net/docker/pulls/staphb/dsk)](https://hub.docker.com/r/staphb/dsk) | <ul><li>[0.0.100](./dsk/0.0.100/)</li><li>[2.3.3](./dsk/2.3.3/)</li></ul> | https://gatb.inria.fr/software/dsk/ |
64 changes: 64 additions & 0 deletions dorado/0.8.3/Dockerfile
@@ -0,0 +1,64 @@
# Use NVIDIA CUDA image as the base image
FROM nvidia/cuda:12.2.0-devel-ubuntu20.04 AS app

ARG DORADO_VER=0.8.3

# Metadata
LABEL base.image="nvidia/cuda:12.2.0-devel-ubuntu20.04"
LABEL dockerfile.version="1"
LABEL software="dorado ${DORADO_VER}"
LABEL software.version="${DORADO_VER}"
LABEL description="A tool for basecalling Fast5/Pod5 files from Oxford Nanopore sequencing"
LABEL website="https://github.com/nanoporetech/dorado"
LABEL license="https://github.com/nanoporetech/dorado/blob/master/LICENSE"
LABEL original.website="https://nanoporetech.github.io/dorado/"
LABEL maintainer="Fraser Combe"
LABEL maintainer.email="[email protected]"

# Set working directory
WORKDIR /usr/src/app

# Install dependencies
RUN apt-get update && \
    apt-get install -y --no-install-recommends wget ca-certificates pigz && \
    rm -rf /var/lib/apt/lists/* && apt-get autoclean

# Download and extract Dorado package
RUN wget https://cdn.oxfordnanoportal.com/software/analysis/dorado-${DORADO_VER}-linux-x64.tar.gz \
    && tar -xzvf dorado-${DORADO_VER}-linux-x64.tar.gz -C /opt \
    && rm dorado-${DORADO_VER}-linux-x64.tar.gz

# Set environment variables for Dorado binary
ENV PATH="/opt/dorado-${DORADO_VER}-linux-x64/bin:${PATH}"

# Download basecalling models
RUN mkdir /dorado_models && \
    cd /dorado_models && \
    dorado download --model all

# Default command
CMD ["dorado"]

# -----------------------------
# Test Stage
# -----------------------------
FROM app AS test


# Download the specific Pod5 test file
RUN wget -O /usr/src/app/dna_r10.4.1_e8.2_260bps-FLO_PRO114-SQK_NBD114_96_260-4000.pod5 \
https://github.com/nanoporetech/dorado/raw/release-v0.7/tests/data/pod5/dna_r10.4.1_e8.2_260bps/\
dna_r10.4.1_e8.2_260bps-FLO_PRO114-SQK_NBD114_96_260-4000.pod5

# Set working directory
WORKDIR /usr/src/app

# Run test command (using CPU mode)
RUN dorado basecaller \
    --device cpu \
    /dorado_models/[email protected] \
    dna_r10.4.1_e8.2_260bps-FLO_PRO114-SQK_NBD114_96_260-4000.pod5 \
    --emit-moves --max-reads 10 > basecalled.sam

# Verify the output file exists and is not empty
RUN test -s basecalled.sam
220 changes: 220 additions & 0 deletions dorado/0.8.3/README.md
@@ -0,0 +1,220 @@
# Dorado Docker Image

This Dockerfile sets up an environment for running **Dorado**, a tool for basecalling Fast5/Pod5 files from Oxford Nanopore sequencing.

## Table of Contents

- [Introduction](#introduction)
- [Requirements](#requirements)
- [Running the Docker Container](#running-the-docker-container)
- [Testing the Docker Image](#testing-the-docker-image)
  - [Basecalling Test](#basecalling-test)
  - [Verifying the Output](#verifying-the-output)
- [Additional Notes](#additional-notes)
- [License](#license)

## Introduction

This Docker image includes:

- **Dorado**: Version **0.8.3**, a tool for basecalling Oxford Nanopore sequencing data.
- **NVIDIA CUDA**: Version **12.2.0**, for GPU acceleration (requires NVIDIA GPU).
- **Pigz**: Installed from the Ubuntu 20.04 package repositories (the Dockerfile does not pin a version), for parallel compression and decompression.
- **Pre-downloaded basecalling models**: All models are downloaded during the build process for basecalling.

## Requirements

- **Docker**: Installed on your system.
- **NVIDIA GPU and Drivers**: Installed and configured.
- **NVIDIA Container Toolkit**: To enable GPU support in Docker containers.
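
The requirements above can be checked on the host before pulling or building anything. The following is a minimal sketch, not part of the image: `missing_tools` is a hypothetical helper name, and it assumes that `docker` and `nvidia-smi` (which ships with the NVIDIA driver) being on `PATH` is a reasonable proxy for the first two requirements.

```bash
# Print the name of each listed command that is not found on PATH.
# Prints nothing when every command is available.
missing_tools() {
  local tool
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

# Example (run on the host):
#   missing_tools docker nvidia-smi
```

This only confirms that the commands exist. Whether the NVIDIA Container Toolkit is wired into Docker is easier to confirm by running `nvidia-smi` inside a container started with `--gpus all`.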

## Running the Docker Container

To run the Dorado tool within the Docker container, use the following command:

```bash
docker run --gpus all -it dorado-image dorado --help
```

This command will display the help information for Dorado, confirming that it's installed correctly.
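
The commands in this README refer to an image tagged `dorado-image`. One way to produce that tag locally is sketched below, assuming the build is started from the repository root so that `dorado/0.8.3` is the build context. `build_dorado_image` and the `DOCKER` override are hypothetical names introduced for illustration. `--target app` selects the `app` stage defined in the Dockerfile, so the `test` stage is not built.

```bash
# Build the app stage of the Dockerfile and tag it as dorado-image.
# $1: build context directory (defaults to dorado/0.8.3, relative to the repository root).
# DOCKER is a hypothetical override, e.g. DOCKER=echo for a dry run; it defaults to docker.
build_dorado_image() {
  local context=${1:-dorado/0.8.3}
  "${DOCKER:-docker}" build --target app -t dorado-image "$context"
}

# Example:
#   build_dorado_image
```

Replacing `--target app` with `--target test` would also run the CPU basecalling check from the Dockerfile during the build.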

## Testing the Docker Image

To test that Dorado is working correctly, you will need to download a sample Pod5 file and perform a basecalling operation using the pre-downloaded basecalling models.

```bash
wget -O dna_r10.4.1_e8.2_260bps-FLO_PRO114-SQK_NBD114_96_260-4000.pod5 \
  https://github.com/nanoporetech/dorado/raw/release-v0.7/tests/data/pod5/dna_r10.4.1_e8.2_260bps/dna_r10.4.1_e8.2_260bps-FLO_PRO114-SQK_NBD114_96_260-4000.pod5
```

### Basecalling Test

Run the following command:

```bash
docker run --gpus all -v $(pwd):/usr/src/app -it dorado-image bash -c "\
dorado basecaller /dorado_models/[email protected] \
/usr/src/app/dna_r10.4.1_e8.2_260bps-FLO_PRO114-SQK_NBD114_96_260-4000.pod5 \
--emit-moves > /usr/src/app/basecalled.sam"
```

**Explanation:**

- `--gpus all`: Enables GPU support.
- `-v $(pwd):/usr/src/app`: Mounts the current directory to `/usr/src/app` inside the container.
- `bash -c "..."`: Runs the basecalling command inside the container.
- `> /usr/src/app/basecalled.sam`: Redirects the output to `basecalled.sam` in your current directory.
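
The Dockerfile's test stage runs the basecaller with `--device cpu` and `--max-reads 10`, which suggests a way to try the image on a host without a GPU. The sketch below follows that pattern under a few assumptions: the image is tagged `dorado-image`, the Pod5 file sits in the current directory, and the model argument is one of the directory names under `/dorado_models`. `cpu_basecall` and the `DOCKER` override are hypothetical names, not part of Dorado or this image.

```bash
# Basecall the first 10 reads on the CPU, mirroring the Dockerfile test stage.
# $1: model directory name under /dorado_models
# $2: Pod5 file name in the current directory
# DOCKER is a hypothetical override, e.g. DOCKER=echo for a dry run; it defaults to docker.
cpu_basecall() {
  local model=$1 pod5=$2
  "${DOCKER:-docker}" run --rm -v "$(pwd)":/usr/src/app dorado-image bash -c \
    "dorado basecaller --device cpu /dorado_models/${model} /usr/src/app/${pod5} --emit-moves --max-reads 10 > /usr/src/app/basecalled.sam"
}

# Example:
#   cpu_basecall <model-name> dna_r10.4.1_e8.2_260bps-FLO_PRO114-SQK_NBD114_96_260-4000.pod5
```

Compared with the GPU command, this drops `--gpus all` and `-it`. CPU basecalling is much slower, which is why the read limit is kept.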

### Verifying the Output

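Check the output file to confirm that basecalling succeeded. This step assumes `samtools` is installed on the host, because the Dockerfile does not install it in the image: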

```bash
samtools view basecalled.sam
```

You should see SAM-formatted basecalling results.
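
If `samtools` is not available, a rougher check is possible with standard tools. The sketch below mirrors the `test -s` check from the Dockerfile and counts the records in the file. `count_sam_records` is a hypothetical helper name, and it relies on the SAM convention that header lines start with `@` while read names cannot.

```bash
# Count the records in a SAM file, i.e. every line that is not an @-prefixed header line.
# grep exits non-zero when the count is 0, but still prints the count.
count_sam_records() {
  grep -vc '^@' "$1"
}

# Example, after the basecalling test has written basecalled.sam:
#   test -s basecalled.sam && count_sam_records basecalled.sam
```

A non-empty file with at least one record indicates that basecalling produced output. It does not validate the SAM fields themselves.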

## Additional Notes

- **Sample Data**: The Dockerfile downloads the sample Pod5 file to `/usr/src/app` only in its `test` build stage, so an image built from the `app` stage does not contain it.
- **Internal Testing**: An internal test stage is included in the Dockerfile to verify installation.
- **Basecalling Models**: All models are downloaded to `/dorado_models` during the build process.
Below is the list of basecalling models included in the Docker image:
```yaml

modification models:
- "[email protected][email protected]"
- "[email protected][email protected]"
- "[email protected][email protected]"
- "[email protected]_5mCG_5hmCG@v0"
- "[email protected]_5mCG_5hmCG@v0"
- "[email protected]_5mCG_5hmCG@v0"
- "[email protected]_5mCG@v2"
- "[email protected]_5mCG@v2"
- "[email protected]_5mCG@v2"
- "[email protected]_5mCG@v2"
- "[email protected]_5mCG@v2"
- "[email protected]_5mCG@v2"
- "[email protected]_5mCG_5hmCG@v2"
- "[email protected]_5mCG_5hmCG@v2"
- "[email protected]_5mCG_5hmCG@v2"
- "[email protected]_5mCG_5hmCG@v2"
- "[email protected]_5mCG_5hmCG@v2"
- "[email protected]_5mCG_5hmCG@v2"
- "[email protected]_5mCG_5hmCG@v2"
- "[email protected]_5mCG_5hmCG@v2"
- "[email protected]_5mCG_5hmCG@v2"
- "[email protected]_5mCG_5hmCG@v2"
- "[email protected]_5mCG_5hmCG@v2"
- "[email protected]_5mCG_5hmCG@v2"
- "[email protected]_5mCG_5hmCG@v2"
- "[email protected]_5mCG_5hmCG@v2"
- "[email protected]_5mCG_5hmCG@v2"
- "[email protected][email protected]"
- "[email protected]_5mC@v2"
- "[email protected]_6mA@v2"
- "[email protected]_6mA@v3"
- "[email protected]_5mC_5hmC@v1"
- "[email protected]_5mC_5hmC@v1"
- "[email protected]_5mC_5hmC@v1"
- "[email protected]_6mA@v1"
- "[email protected]_6mA@v1"
- "[email protected]_6mA@v2"
- "[email protected]_6mA@v2"
- "[email protected]_5mCG_5hmCG@v1"
- "[email protected]_5mCG_5hmCG@v1"
- "[email protected]_4mC_5mC@v1"
- "[email protected]_4mC_5mC@v1"
- "[email protected]_4mC_5mC@v2"
- "[email protected]_4mC_5mC@v2"
- "[email protected]_5mC_5hmC@v1"
- "[email protected]_5mC_5hmC@v1"
- "[email protected]_5mC_5hmC@v2"
- "[email protected]_5mC_5hmC@v2"
- "[email protected]_5mCG_5hmCG@v1"
- "[email protected]_5mCG_5hmCG@v1"
- "[email protected]_5mCG_5hmCG@v2"
- "[email protected]_5mCG_5hmCG@v2"
- "[email protected]_6mA@v1"
- "[email protected]_6mA@v1"
- "[email protected]_6mA@v2"
- "[email protected]_6mA@v2"
- "[email protected]_m6A_DRACH@v1"
- "[email protected]_m6A@v1"
- "[email protected]_m6A@v1"
- "[email protected]_m6A_DRACH@v1"
- "[email protected]_m6A_DRACH@v1"
- "[email protected]_pseU@v1"
- "[email protected]_pseU@v1"
- "[email protected]_m5C@v1"
- "[email protected]_m5C@v1"
- "[email protected]_inosine_m6A@v1"
- "[email protected]_inosine_m6A@v1"
- "[email protected]_m6A_DRACH@v1"
- "[email protected]_m6A_DRACH@v1"
- "[email protected]_pseU@v1"
- "[email protected]_pseU@v1"
stereo models:
- "[email protected]"
- "[email protected]"
- "[email protected]"
- "[email protected]"
- "[email protected]"
simplex models:
- "[email protected]"
- "[email protected]"
- "[email protected]"
- "[email protected]"
- "[email protected]"
- "[email protected]"
- "[email protected]"
- "[email protected]"
- "[email protected]"
- "[email protected]"
- "[email protected]"
- "[email protected]"
- "[email protected]"
- "[email protected]"
- "[email protected]"
- "[email protected]"
- "[email protected]"
- "[email protected]"
- "[email protected]"
- "[email protected]"
- "[email protected]"
- "[email protected]"
- "[email protected]"
- "[email protected]"
- "[email protected]"
- "[email protected]"
- "[email protected]"
- "[email protected]"
- "[email protected]"
- "[email protected]"
- "[email protected]"
- "[email protected]"
- "rna002_70bps_fast@v3"
- "rna002_70bps_hac@v3"
- "[email protected]"
- "[email protected]"
- "[email protected]"
- "[email protected]"
- "[email protected]"
- "[email protected]"
- "[email protected]"
- "[email protected]"
- "[email protected]"
```
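
Because the list is long, it can help to filter it by accuracy tier when choosing a model. The following is a small sketch that assumes model names carry the tier as `_fast@`, `_hac@` or `_sup@`, as in `rna002_70bps_hac@v3` above. `models_of_tier` is a hypothetical helper name.

```bash
# Keep only the model names read from stdin that contain "_<tier>@".
# $1: tier, e.g. fast, hac or sup
models_of_tier() {
  grep -- "_$1@"
}

# Example (requires the image; lists the model directories it contains):
#   docker run --rm dorado-image ls /dorado_models | models_of_tier hac
```

Modification models embed the name of their simplex model, so they match the same tier filter.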

## License

Dorado is licensed under [Oxford Nanopore Technologies' License](https://github.com/nanoporetech/dorado/blob/master/LICENSE).


---

**Note**: Please ensure that you have the necessary NVIDIA drivers and the NVIDIA Container Toolkit installed to utilize GPU acceleration.

---