updated FastANI to 1.34 #752

Merged
merged 6 commits into from
Sep 15, 2023
2 changes: 1 addition & 1 deletion README.md
@@ -65,7 +65,7 @@ To learn more about the docker pull rate limits and the open source software pro
| [emmtyper](https://hub.docker.com/r/staphb/emmtyper) <br/> [![docker pulls](https://badgen.net/docker/pulls/staphb/emmtyper)](https://hub.docker.com/r/staphb/emmtyper) | <ul><li>0.2.0</li></ul> | https://github.com/MDU-PHL/emmtyper |
| [emm-typing-tool](https://hub.docker.com/r/staphb/emm-typing-tool) <br/> [![docker pulls](https://badgen.net/docker/pulls/staphb/emm-typing-tool)](https://hub.docker.com/r/staphb/emm-typing-tool) | <ul><li>0.0.1 (no version)</li></ul> | https://github.com/phe-bioinformatics/emm-typing-tool |
| [EToKi](https://hub.docker.com/r/staphb/etoki) <br/> [![docker pulls](https://badgen.net/docker/pulls/staphb/etoki)](https://hub.docker.com/r/staphb/etoki) | <ul><li>1.2.1</li></ul> | https://github.com/zheminzhou/EToKi |
-| [FastANI](https://hub.docker.com/r/staphb/fastani) <br/> [![docker pulls](https://badgen.net/docker/pulls/staphb/fastani)](https://hub.docker.com/r/staphb/fastani) | <ul><li>1.1</li><li>1.32</li><li>1.33</li><li>1.33 + RGDv2</li></ul> | https://github.com/ParBLiSS/FastANI |
+| [FastANI](https://hub.docker.com/r/staphb/fastani) <br/> [![docker pulls](https://badgen.net/docker/pulls/staphb/fastani)](https://hub.docker.com/r/staphb/fastani) | <ul><li>1.1</li><li>1.32</li><li>1.33</li><li>1.33 + RGDv2</li><li>1.34</li><li>1.34 + RGDv2</li></ul> | https://github.com/ParBLiSS/FastANI |
| [Fastp](https://hub.docker.com/r/staphb/fastp) <br/> [![docker pulls](https://badgen.net/docker/pulls/staphb/fastp)](https://hub.docker.com/r/staphb/fastp) | <ul><li>0.23.2</li><li>[0.23.4](fastp/0.23.4/)</li></ul> | http://opengene.org/fastp/ <br/> https://github.com/OpenGene/fastp |
| [FastTree](https://hub.docker.com/r/staphb/fasttree) <br/> [![docker pulls](https://badgen.net/docker/pulls/staphb/fasttree)](https://hub.docker.com/r/staphb/fasttree) | <ul><li>2.1.11</li></ul> | http://www.microbesonline.org/fasttree/ |
| [FastQC](https://hub.docker.com/r/staphb/fastqc) <br/> [![docker pulls](https://badgen.net/docker/pulls/staphb/fastqc)](https://hub.docker.com/r/staphb/fastqc) | <ul><li>0.11.8</li><li>0.11.9</li><li>0.12.1</li></ul> | https://www.bioinformatics.babraham.ac.uk/projects/fastqc/ <br/> https://github.com/s-andrews/FastQC |
77 changes: 77 additions & 0 deletions fastani/1.34-RGDV2/Dockerfile
@@ -0,0 +1,77 @@
## build RGDv2 ##
FROM staphb/ncbi-datasets:15.11.0 as stage

# copy in list of NCBI accessions and species list
COPY RGDv2-NCBI-assembly-accessions.txt /RGDv2/RGDv2-NCBI-assembly-accessions.txt
COPY RGDv2-NCBI-assembly-accessions-and-species.txt /RGDv2/RGDv2-NCBI-assembly-accessions-and-species.txt

# download RGD genomes using NCBI datasets tools; clean up unnecessary files;
# move and re-name assemblies to include Species in the filename
# make fasta files readable to all users; create File Of FileNames for all 43 assemblies (to be used with fastANI)
RUN for ID in $(cat /RGDv2/RGDv2-NCBI-assembly-accessions.txt); do \
SPECIES=$(grep "${ID}" /RGDv2/RGDv2-NCBI-assembly-accessions-and-species.txt | cut -f 1) && \
echo "downloading ${ID}, species ${SPECIES}, from NCBI..."; \
datasets download genome accession ${ID} --filename ${ID}.zip; \
unzip ${ID}.zip; \
rm ${ID}.zip; \
mv -v ncbi_dataset/data/${ID}/${ID}*.fna /RGDv2/${ID}.${SPECIES}.fasta; \
rm -rfv ncbi_dataset/; \
rm -v README.md; \
done && \
ls /RGDv2/*.fasta >/RGDv2/FOFN-RGDv2.txt &&\
chmod 664 /RGDv2/*

## App ##
FROM ubuntu:jammy as app

# for easy upgrade later. ARG variables only persist at build time
ARG FASTANI_VER="v1.34"

LABEL base.image="ubuntu:jammy"
LABEL dockerfile.version="1"
LABEL software="FastANI"
LABEL software.version=${FASTANI_VER}
LABEL description="Fast alignment-free computation of whole-genome Average Nucleotide Identity"
LABEL website="https://github.com/ParBLiSS/FastANI"
LABEL license="https://github.com/ParBLiSS/FastANI/blob/master/LICENSE"
LABEL maintainer="Kelsey Florek"
LABEL maintainer.email="[email protected]"
LABEL maintainer2="Curtis Kapsak"
LABEL maintainer2.email="[email protected]"
LABEL maintainer3="Kutluhan Incekara"
LABEL maintainer3.email="[email protected]"

# install dependencies; cleanup apt garbage
RUN apt-get update && apt-get install --no-install-recommends -y \
wget \
unzip \
libgomp1 && \
apt-get clean && rm -rf /var/lib/apt/lists/*

# download pre-compiled binary; unzip; put binary in /usr/local/bin
# apt dependencies: libgomp1 unzip wget
RUN wget --no-check-certificate https://github.com/ParBLiSS/FastANI/releases/download/${FASTANI_VER}/fastANI-Linux64-${FASTANI_VER}.zip && \
unzip fastANI-Linux64-${FASTANI_VER}.zip -d /usr/local/bin && \
rm fastANI-Linux64-${FASTANI_VER}.zip

# copy RGDv2 from stage
COPY --from=stage /RGDv2/ /RGDv2/

# default run command
CMD fastANI -h

# singularity compatibility
ENV LC_ALL=C

# set working directory
WORKDIR /data

## Test ##
FROM app as test

# test against RGDv2
RUN wget --no-check-certificate -P /data https://github.com/ParBLiSS/FastANI/raw/master/tests/data/Escherichia_coli_str_K12_MG1655.fna && \
fastANI -t 8 -q /data/Escherichia_coli_str_K12_MG1655.fna --rl /RGDv2/FOFN-RGDv2.txt -o fastANI.RGDv2.out.tsv &&\
echo "output TSV from fastANI test:" && \
cat fastANI.RGDv2.out.tsv

21 changes: 21 additions & 0 deletions fastani/1.34-RGDV2/README.md
@@ -0,0 +1,21 @@
# fastANI container

Main tool : [fastANI](https://github.com/ParBLiSS/FastANI)

Full documentation: https://github.com/ParBLiSS/FastANI

FastANI was developed for fast alignment-free computation of whole-genome Average Nucleotide Identity (ANI). ANI is defined as the mean nucleotide identity of orthologous gene pairs shared between two microbial genomes.

This docker image contains the Reference Genome Database version 2 (RGDv2) from the Enteric Diseases Laboratory Branch at the CDC. It contains the genomes of 43 enteric bacterial isolates that are used for species identification of bacterial isolate WGS data. This database is NOT meant to be comprehensive; it contains the genomes of enteric pathogens commonly sequenced by EDLB and some closely related species.

The FASTA files for RGDv2 can be found within `/RGDv2/` inside the docker image.
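
Because the Dockerfile renames each assembly to `<accession>.<species>.fasta`, the species of a reference can be read back from its path. The following is a minimal sketch, assuming that naming scheme holds for every file in `/RGDv2/`; the function name `species_from_path` is hypothetical and is not shipped in the image.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of the image): print the species encoded
# in an RGDv2 FASTA path.
# Assumes paths look like /RGDv2/GCA_008011635.1.Campylobacter_coli.fasta
species_from_path() {
  local name
  name=$(basename "$1" .fasta)   # e.g. GCA_008011635.1.Campylobacter_coli
  printf '%s\n' "${name##*.}"    # keep only the text after the last dot
}
```

For example, `species_from_path /RGDv2/GCA_008011635.1.Campylobacter_coli.fasta` prints `Campylobacter_coli`. This is handy for labelling rows of the fastANI output, which reports references by file path.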

## Example Usage

```bash
# query one genome against another genome
fastANI -t 8 -q bacterial-genome1.fasta -r bacterial-genome2.fasta -o fastANI.out.tsv

# query one genome against the 43 genomes in RGDv2 (requires a File Of FileNames as input)
fastANI -t 8 -q bacterial-genome.fasta --rl /RGDv2/FOFN-RGDv2.txt -o fastANI.RGDv2.out.tsv
```
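
When querying against all 43 references, the output TSV has one row per reported query/reference pair. The sketch below picks the reference with the highest ANI. It assumes the tab-delimited column order described in the FastANI documentation (query, reference, ANI, mapped fragments, total query fragments); the function name `best_hit` is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the reference path and ANI of the top hit
# in a fastANI output TSV.
# Assumed columns: query, reference, ANI, mapped fragments, total query fragments.
best_hit() {
  # sort numerically on column 3 (ANI), highest first, then keep columns 2-3
  LC_ALL=C sort -t $'\t' -k3,3nr "$1" | head -n 1 | cut -f 2,3
}
```

A call such as `best_hit fastANI.RGDv2.out.tsv` prints the reference file and its ANI separated by a tab. Any species-level cutoff applied to that value is left to the caller.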
43 changes: 43 additions & 0 deletions fastani/1.34-RGDV2/RGDv2-NCBI-assembly-accessions-and-species.txt
@@ -0,0 +1,43 @@
Campylobacter_coli GCA_008011635.1
Campylobacter_fetus GCA_000015085.1
Campylobacter_fetus GCA_000495505.1
Campylobacter_fetus GCA_000759515.1
Campylobacter_hyointestinalis GCA_001643955.1
Campylobacter_jejuni GCA_000017485.1
Campylobacter_jejuni GCA_008011525.1
Campylobacter_lari GCA_000019205.1
Campylobacter_lari GCA_000816225.1
Campylobacter_upsaliensis GCA_008011615.1
Escherichia_albertii GCA_000512125.1
Escherichia_coli GCA_002741475.1
Escherichia_fergusonii GCA_000026225.1
Grimontia_hollisae GCA_009665295.1
Listeria_innocua GCA_017363615.1
Listeria_innocua GCA_017363655.1
Listeria_ivanovii GCA_000252975.1
Listeria_marthii GCA_017363645.1
Listeria_monocytogenes GCA_001466295.1
Listeria_monocytogenes GCA_013625895.1
Listeria_monocytogenes GCA_013625995.1
Listeria_monocytogenes GCA_013626145.1
Listeria_monocytogenes GCA_014526935.1
Listeria_seeligeri GCA_017363605.1
Listeria_welshimeri GCA_002489005.1
Photobacterium_damselae GCA_009665375.1
Salmonella_bongori GCA_013588055.1
Salmonella_enterica GCA_011388235.1
Vibrio_alginolyticus GCA_009665435.1
Vibrio_cholerae GCA_009665515.2
Vibrio_cidicii GCA_009665415.1
Vibrio_cincinnatiensis GCA_009665395.1
Vibrio_fluvialis GCA_009665355.1
Vibrio_furnissii GCA_009665335.1
Vibrio_harveyi GCA_009665315.1
Vibrio_metoecus GCA_009665255.1
Vibrio_metoecus GCA_009665275.1
Vibrio_metschnikovii GCA_009665235.1
Vibrio_mimicus GCA_009665195.1
Vibrio_navarrensis GCA_009665215.1
Vibrio_parahaemolyticus GCA_009665495.1
Vibrio_vulnificus GCA_009665455.1
Vibrio_vulnificus GCA_009665475.1
43 changes: 43 additions & 0 deletions fastani/1.34-RGDV2/RGDv2-NCBI-assembly-accessions.txt
@@ -0,0 +1,43 @@
GCA_008011635.1
GCA_000015085.1
GCA_000495505.1
GCA_000759515.1
GCA_001643955.1
GCA_000017485.1
GCA_008011525.1
GCA_000816225.1
GCA_000019205.1
GCA_008011615.1
GCA_000512125.1
GCA_002741475.1
GCA_000026225.1
GCA_009665295.1
GCA_017363655.1
GCA_017363615.1
GCA_000252975.1
GCA_017363645.1
GCA_001466295.1
GCA_014526935.1
GCA_013626145.1
GCA_013625995.1
GCA_013625895.1
GCA_017363605.1
GCA_002489005.1
GCA_009665375.1
GCA_013588055.1
GCA_011388235.1
GCA_009665435.1
GCA_009665515.2
GCA_009665415.1
GCA_009665395.1
GCA_009665355.1
GCA_009665335.1
GCA_009665315.1
GCA_009665275.1
GCA_009665255.1
GCA_009665235.1
GCA_009665195.1
GCA_009665215.1
GCA_009665495.1
GCA_009665475.1
GCA_009665455.1
44 changes: 44 additions & 0 deletions fastani/1.34-RGDV2/RGDv2-metadata.tsv
@@ -0,0 +1,44 @@
Species BioSample NCBI Assembly Strain ID
Campylobacter coli SAMN12323645 GCA_008011635.1 2013D-9606
Campylobacter fetus SAMN02604050 GCA_000015085.1 82-40
Campylobacter fetus SAMN02604287 GCA_000495505.1 03-427
Campylobacter fetus SAMN02870596 GCA_000759515.1 97-608
Campylobacter hyointestinalis SAMN03737973 GCA_001643955.1 LMG 9260
Campylobacter jejuni SAMN02604056 GCA_000017485.1 NC_009707
Campylobacter jejuni SAMN12323651 GCA_008011525.1 D0133
Campylobacter lari SAMN02604025 GCA_000019205.1 RM2100
Campylobacter lari SAMN03248542 GCA_000816225.1 LMG 11760
Campylobacter upsaliensis SAMN12323647 GCA_008011615.1 D1914
Escherichia albertii SAMN02641387 GCA_000512125.1 KF1
Escherichia coli SAMN07731009 GCA_002741475.1 B4103-1
Escherichia fergusonii SAMEA3138228 GCA_000026225.1 ATCC_35469
Grimontia hollisae SAMN10812938 GCA_009665295.1 2013V-1029
Listeria innocua SAMN10869157 GCA_017363615.1 2010L-2059
Listeria innocua SAMN10869156 GCA_017363655.1 H0996 L
Listeria ivanovii SAMEA3138408 GCA_000252975.1 PAM55
Listeria marthii SAMN10869158 GCA_017363645.1 FSL S4-696
Listeria monocytogenes SAMN02944835 GCA_001466295.1 G4599
Listeria monocytogenes SAMN02847829 GCA_013625895.1 2014L-6256
Listeria monocytogenes SAMN03067768 GCA_013625995.1 J0099
Listeria monocytogenes SAMN02950479 GCA_013626145.1 2014L-6393
Listeria monocytogenes SAMN03761815 GCA_014526935.1 2011L-2626
Listeria seeligeri SAMN10869159 GCA_017363605.1 F5761
Listeria welshimeri SAMN03462185 GCA_002489005.1 SLCC5334
Photobacterium damselae SAMN10702680 GCA_009665375.1 2012V-1072
Salmonella bongori SAMN13207407 GCA_013588055.1 04-0440
Salmonella enterica SAMN08167480 GCA_011388235.1 2010K-2370
Vibrio alginolyticus SAMN10702675 GCA_009665435.1 2013V-1302
Vibrio cholerae SAMN10863496 GCA_009665515.2 2010EL-1786
Vibrio cidicii SAMN10863497 GCA_009665415.1 2423-01
Vibrio cincinnatiensis SAMN10812936 GCA_009665395.1 2409-02
Vibrio fluvialis SAMN10812937 GCA_009665355.1 2013V-1049
Vibrio furnissii SAMN10702681 GCA_009665335.1 2419-04
Vibrio harveyi SAMN10702676 GCA_009665315.1 2011V-1164
Vibrio metoecus SAMN10702677 GCA_009665255.1 2011V-1169
Vibrio metoecus SAMN10863498 GCA_009665275.1 08-2459
Vibrio metschnikovii SAMN10702671 GCA_009665235.1 2012V-1020
Vibrio mimicus SAMN10812939 GCA_009665195.1 2011V-1073
Vibrio navarrensis SAMN10863499 GCA_009665215.1 08-2462
Vibrio parahaemolyticus SAMN10702672 GCA_009665495.1 2012AW-0154
Vibrio vulnificus SAMN10702674 GCA_009665455.1 2009V-1035
Vibrio vulnificus SAMN10702673 GCA_009665475.1 2142-77
51 changes: 51 additions & 0 deletions fastani/1.34/Dockerfile
@@ -0,0 +1,51 @@
FROM ubuntu:jammy as app

# for easy upgrade later. ARG variables only persist at build time
ARG FASTANI_VER="v1.34"

LABEL base.image="ubuntu:jammy"
LABEL dockerfile.version="1"
LABEL software="FastANI"
LABEL software.version=${FASTANI_VER}
LABEL description="Fast alignment-free computation of whole-genome Average Nucleotide Identity"
LABEL website="https://github.com/ParBLiSS/FastANI"
LABEL license="https://github.com/ParBLiSS/FastANI/blob/master/LICENSE"
LABEL maintainer="Kelsey Florek"
LABEL maintainer.email="[email protected]"
LABEL maintainer2="Curtis Kapsak"
LABEL maintainer2.email="[email protected]"
LABEL maintainer3="Kutluhan Incekara"
LABEL maintainer3.email="[email protected]"

# install dependencies; cleanup apt garbage
RUN apt-get update && apt-get install --no-install-recommends -y \
wget \
unzip \
libgomp1 && \
apt-get clean && rm -rf /var/lib/apt/lists/*

# download pre-compiled binary; unzip; put binary in /usr/local/bin
# apt dependencies: libgomp1 unzip wget
RUN wget --no-check-certificate https://github.com/ParBLiSS/FastANI/releases/download/${FASTANI_VER}/fastANI-Linux64-${FASTANI_VER}.zip && \
unzip fastANI-Linux64-${FASTANI_VER}.zip -d /usr/local/bin && \
rm fastANI-Linux64-${FASTANI_VER}.zip

# default run command
CMD fastANI -h

# singularity compatibility
ENV LC_ALL=C

# set working directory
WORKDIR /data

## Test ##
FROM app as test

# download 2 genomes from fastANI GitHub; compare the 2; cat the output file
RUN wget --no-check-certificate -P /data https://github.com/ParBLiSS/FastANI/raw/master/tests/data/Escherichia_coli_str_K12_MG1655.fna && \
wget --no-check-certificate -P /data https://github.com/ParBLiSS/FastANI/raw/master/tests/data/Shigella_flexneri_2a_01.fna && \
fastANI -q /data/Shigella_flexneri_2a_01.fna -r /data/Escherichia_coli_str_K12_MG1655.fna -o /data/fastANI-test-ShigellaFlexneri-EcoliK12.tsv && \
echo "output TSV from fastANI test:" && \
cat fastANI-test-ShigellaFlexneri-EcoliK12.tsv

19 changes: 19 additions & 0 deletions fastani/1.34/README.md
@@ -0,0 +1,19 @@
# fastANI container

Main tool : [fastANI](https://github.com/ParBLiSS/FastANI)

Full documentation: https://github.com/ParBLiSS/FastANI

FastANI was developed for fast alignment-free computation of whole-genome Average Nucleotide Identity (ANI). ANI is defined as the mean nucleotide identity of orthologous gene pairs shared between two microbial genomes.

This docker image contains no references.

## Example Usage

```bash
# query one genome against another genome
fastANI -t 8 -q bacterial-genome1.fasta -r bacterial-genome2.fasta -o fastANI.out.tsv

# query one genome against many reference genomes (requires a File Of FileNames as input)
# this image does not include RGDv2; reference-genomes-FOFN.txt is a placeholder for your own list
fastANI -t 8 -q bacterial-genome.fasta --rl reference-genomes-FOFN.txt -o fastANI.out.tsv
```
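
Since this image ships no reference genomes, a File Of FileNames for `--rl` has to be built from user-supplied references. The sketch below mirrors the `ls ... > FOFN` step used in the RGDv2 Dockerfile; the function name `make_fofn` and the directory layout are assumptions.

```bash
#!/usr/bin/env bash
# Hypothetical helper: write one FASTA path per line to a File Of FileNames.
# Usage: make_fofn <dir-with-.fasta-files> <output-fofn>
make_fofn() {
  local dir=$1 out=$2
  # Paths are printed relative to the directory argument as given, so pass
  # an absolute directory if absolute paths are wanted in the list.
  find "$dir" -maxdepth 1 -type f -name '*.fasta' | LC_ALL=C sort > "$out"
}
```

The resulting file can then be passed to fastANI, for example `make_fofn /data/refs /data/refs.txt` followed by `fastANI -q bacterial-genome.fasta --rl /data/refs.txt -o fastANI.out.tsv`.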