This Dockerfile sets up an environment for running Dorado, a tool for basecalling Fast5/Pod5 files from Oxford Nanopore sequencing.
- Introduction
- Requirements
- Building the Docker Image
- Running the Docker Container
- Testing the Docker Image
- Basecalling Test
- Verifying the Output
- Additional Notes
- License

## Introduction

This Docker image includes:
- Dorado: Version 0.8.3, a tool for basecalling Oxford Nanopore sequencing data.
- NVIDIA CUDA: Version 12.2.0, for GPU acceleration (requires NVIDIA GPU).
- Pigz: Version 2.6, for parallel compression and decompression.
- Pre-downloaded basecalling models: All basecalling models are downloaded during the build process.

## Requirements

- Docker: Installed on your system.
- NVIDIA GPU and Drivers: Installed and configured.
- NVIDIA Container Toolkit: To enable GPU support in Docker containers.
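
Before building, it can help to confirm that the host tools listed above are on the `PATH`. The sketch below is only a convenience check. The helper name `have_cmd` is made up for this example, and the build command in the comment is an assumption: it presumes the Dockerfile sits in the current directory and that the image is tagged `dorado-image`, the name used by the run commands in this README.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print "ok" if a command is on PATH, otherwise "missing".
have_cmd() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "ok"
    else
        echo "missing"
    fi
}

echo "docker:     $(have_cmd docker)"
echo "nvidia-smi: $(have_cmd nvidia-smi)"

# If both report "ok", the image can presumably be built with (assumed command):
#   docker build -t dorado-image .
```

This does not check the NVIDIA Container Toolkit itself; an error from `docker run --gpus all ...` is the usual symptom when the toolkit is missing.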

## Running the Docker Container

To run Dorado inside the Docker container, use the following command:

```bash
docker run --gpus all -it dorado-image dorado --help
```

This command displays Dorado's help text, confirming that it is installed correctly.

## Testing the Docker Image

To test that Dorado works correctly, download a sample Pod5 file and basecall it with one of the pre-downloaded basecalling models:

```bash
wget -O dna_r10.4.1_e8.2_260bps-FLO_PRO114-SQK_NBD114_96_260-4000.pod5 \
  https://github.com/nanoporetech/dorado/raw/release-v0.7/tests/data/pod5/dna_r10.4.1_e8.2_260bps/dna_r10.4.1_e8.2_260bps-FLO_PRO114-SQK_NBD114_96_260-4000.pod5
```

### Basecalling Test
Run the following command:
```bash
docker run --gpus all -v $(pwd):/usr/src/app -it dorado-image bash -c "\
dorado basecaller /dorado_models/[email protected] \
/usr/src/app/dna_r10.4.1_e8.2_260bps-FLO_PRO114-SQK_NBD114_96_260-4000.pod5 \
--emit-moves > /usr/src/app/basecalled.sam"
```

Explanation:

- `--gpus all`: Enables GPU support.
- `-v $(pwd):/usr/src/app`: Mounts the current directory to `/usr/src/app` inside the container.
- `bash -c "..."`: Runs the basecalling command inside the container.
- `> /usr/src/app/basecalled.sam`: Redirects the output to `basecalled.sam` in your current directory.

### Verifying the Output

Check the output file to confirm that basecalling succeeded (this requires `samtools` on the host):

```bash
samtools view basecalled.sam
```

You should see SAM-formatted basecalling results.
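
If `samtools` is not available, a rough check is possible with standard tools. This sketch assumes `basecalled.sam` is plain-text SAM; Dorado may write BAM when its output is redirected to a file, in which case the `samtools view` command is the better check. The function names are illustrative, and `mv:B:c` is assumed to be the tag that carries the move table requested by `--emit-moves`.

```bash
#!/usr/bin/env bash
# Count alignment records, i.e. lines that are not '@' header lines.
count_sam_records() {
    grep -vc '^@' "$1"
}

# Count records that carry a move table tag (assumed to be mv:B:c).
count_move_tables() {
    grep -v '^@' "$1" | grep -c 'mv:B:c'
}

# Only report if the output file from the basecalling test is present.
if [ -f basecalled.sam ]; then
    echo "records:     $(count_sam_records basecalled.sam)"
    echo "move tables: $(count_move_tables basecalled.sam)"
fi
```

A record count of zero, or a move-table count lower than the record count, would suggest the run did not complete as expected.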

## Additional Notes

- Sample Data: The sample Pod5 file is downloaded to `/usr/src/app` during the build.
- Internal Testing: An internal test stage is included in the Dockerfile to verify installation.
- Basecalling Models: All models are downloaded to `/dorado_models` during the build process.

Below is the list of basecalling models included in the Docker image:

modification models:

- "[email protected][email protected]"
- "[email protected][email protected]"
- "[email protected][email protected]"
- "[email protected]_5mCG_5hmCG@v0"
- "[email protected]_5mCG_5hmCG@v0"
- "[email protected]_5mCG_5hmCG@v0"
- "[email protected]_5mCG@v2"
- "[email protected]_5mCG@v2"
- "[email protected]_5mCG@v2"
- "[email protected]_5mCG@v2"
- "[email protected]_5mCG@v2"
- "[email protected]_5mCG@v2"
- "[email protected]_5mCG_5hmCG@v2"
- "[email protected]_5mCG_5hmCG@v2"
- "[email protected]_5mCG_5hmCG@v2"
- "[email protected]_5mCG_5hmCG@v2"
- "[email protected]_5mCG_5hmCG@v2"
- "[email protected]_5mCG_5hmCG@v2"
- "[email protected]_5mCG_5hmCG@v2"
- "[email protected]_5mCG_5hmCG@v2"
- "[email protected]_5mCG_5hmCG@v2"
- "[email protected]_5mCG_5hmCG@v2"
- "[email protected]_5mCG_5hmCG@v2"
- "[email protected]_5mCG_5hmCG@v2"
- "[email protected]_5mCG_5hmCG@v2"
- "[email protected]_5mCG_5hmCG@v2"
- "[email protected]_5mCG_5hmCG@v2"
- "[email protected][email protected]"
- "[email protected]_5mC@v2"
- "[email protected]_6mA@v2"
- "[email protected]_6mA@v3"
- "[email protected]_5mC_5hmC@v1"
- "[email protected]_5mC_5hmC@v1"
- "[email protected]_5mC_5hmC@v1"
- "[email protected]_6mA@v1"
- "[email protected]_6mA@v1"
- "[email protected]_6mA@v2"
- "[email protected]_6mA@v2"
- "[email protected]_5mCG_5hmCG@v1"
- "[email protected]_5mCG_5hmCG@v1"
- "[email protected]_4mC_5mC@v1"
- "[email protected]_4mC_5mC@v1"
- "[email protected]_4mC_5mC@v2"
- "[email protected]_4mC_5mC@v2"
- "[email protected]_5mC_5hmC@v1"
- "[email protected]_5mC_5hmC@v1"
- "[email protected]_5mC_5hmC@v2"
- "[email protected]_5mC_5hmC@v2"
- "[email protected]_5mCG_5hmCG@v1"
- "[email protected]_5mCG_5hmCG@v1"
- "[email protected]_5mCG_5hmCG@v2"
- "[email protected]_5mCG_5hmCG@v2"
- "[email protected]_6mA@v1"
- "[email protected]_6mA@v1"
- "[email protected]_6mA@v2"
- "[email protected]_6mA@v2"
- "[email protected]_m6A_DRACH@v1"
- "[email protected]_m6A@v1"
- "[email protected]_m6A@v1"
- "[email protected]_m6A_DRACH@v1"
- "[email protected]_m6A_DRACH@v1"
- "[email protected]_pseU@v1"
- "[email protected]_pseU@v1"
- "[email protected]_m5C@v1"
- "[email protected]_m5C@v1"
- "[email protected]_inosine_m6A@v1"
- "[email protected]_inosine_m6A@v1"
- "[email protected]_m6A_DRACH@v1"
- "[email protected]_m6A_DRACH@v1"
- "[email protected]_pseU@v1"
- "[email protected]_pseU@v1"

stereo models:

- "[email protected]"
- "[email protected]"
- "[email protected]"
- "[email protected]"
- "[email protected]"

simplex models:

- "[email protected]"
- "[email protected]"
- "[email protected]"
- "[email protected]"
- "[email protected]"
- "[email protected]"
- "[email protected]"
- "[email protected]"
- "[email protected]"
- "[email protected]"
- "[email protected]"
- "[email protected]"
- "[email protected]"
- "[email protected]"
- "[email protected]"
- "[email protected]"
- "[email protected]"
- "[email protected]"
- "[email protected]"
- "[email protected]"
- "[email protected]"
- "[email protected]"
- "[email protected]"
- "[email protected]"
- "[email protected]"
- "[email protected]"
- "[email protected]"
- "[email protected]"
- "[email protected]"
- "[email protected]"
- "[email protected]"
- "[email protected]"
- "rna002_70bps_fast@v3"
- "rna002_70bps_hac@v3"
- "[email protected]"
- "[email protected]"
- "[email protected]"
- "[email protected]"
- "[email protected]"
- "[email protected]"
- "[email protected]"
- "[email protected]"
- "[email protected]"
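
To check which models a built image actually contains, the models directory can be listed directly. This is a sketch, assuming the image is tagged `dorado-image` and accepts a command on the `docker run` line as in the other examples in this README. `list_image_models` and `filter_models` are illustrative names, not part of Dorado.

```bash
#!/usr/bin/env bash
# Image tag is an assumption; override with IMAGE=... if yours differs.
IMAGE="${IMAGE:-dorado-image}"

# Print the entries under /dorado_models in the image, one per line.
list_image_models() {
    docker run --rm "$IMAGE" ls -1 /dorado_models
}

# Keep only the names matching a pattern, e.g. "sup" or "rna004".
filter_models() {
    grep -- "$1"
}

# Example: list_image_models | filter_models sup
```

Listing does not need the GPU, so `--gpus all` is left out here.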

## License

Dorado is licensed under Oxford Nanopore Technologies' License.

Note: Please ensure that you have the necessary NVIDIA drivers and the NVIDIA Container Toolkit installed to utilize GPU acceleration.