Add bindashtree v0.1.0 #1146

Merged
merged 14 commits into from
Dec 30, 2024
1 change: 1 addition & 0 deletions Program_Licenses.md
@@ -21,6 +21,7 @@ The licenses of the open-source software that is contained in these Docker image
| BBTools | non-standard - see `licence.txt` and `legal.txt` that is included in docker image under `/bbmap/docs/`; Also on sourceforge repo for BBTools | https://jgi.doe.gov/disclaimer/ |
| bcftools | MIT & **GNU GPLv3** | https://github.com/samtools/bcftools/blob/develop/LICENSE |
| bedtools | MIT | https://github.com/arq5x/bedtools2/blob/master/LICENSE |
| bindashtree | MIT | https://github.com/jianshu93/bindashtree?tab=MIT-1-ov-file#readme |
| blast+ | Public Domain | https://www.ncbi.nlm.nih.gov/IEB/ToolBox/CPP_DOC/lxr/source/scripts/projects/blast/LICENSE |
| bowtie2 | GNU GPLv3 | https://github.com/BenLangmead/bowtie2/blob/master/LICENSE |
| Bracken | GNU GPLv3 | https://github.com/jenniferlu717/Bracken/blob/master/LICENSE |
1 change: 1 addition & 0 deletions README.md
@@ -129,6 +129,7 @@ To learn more about the docker pull rate limits and the open source software pro
| [bcftools](https://hub.docker.com/r/staphb/bcftools/) <br/> [![docker pulls](https://badgen.net/docker/pulls/staphb/bcftools)](https://hub.docker.com/r/staphb/bcftools) | <ul><li>[1.10.2](./bcftools/1.10.2/)</li><li>[1.11](./bcftools/1.11/)</li><li>[1.12](./bcftools/1.12/)</li><li>[1.13](./bcftools/1.13/)</li><li>[1.14](./bcftools/1.14/)</li><li>[1.15](./bcftools/1.15/)</li><li>[1.16](./bcftools/1.16/)</li><li>[1.17](./bcftools/1.17/)</li><li>[1.18](bcftools/1.18/)</li><li>[1.19](./bcftools/1.19/)</li><li>[1.20](./bcftools/1.20/)</li><li>[1.20.c](./bcftools/1.20.c/)</li><li>[1.21](./bcftools/1.21/)</li></ul> | https://github.com/samtools/bcftools |
| [bedtools](https://hub.docker.com/r/staphb/bedtools/) <br/> [![docker pulls](https://badgen.net/docker/pulls/staphb/bedtools)](https://hub.docker.com/r/staphb/bedtools) | <ul><li>2.29.2</li><li>2.30.0</li><li>[2.31.0](bedtools/2.31.0/)</li><li>[2.31.1](bedtools/2.31.1/)</li></ul> | https://bedtools.readthedocs.io/en/latest/ <br/>https://github.com/arq5x/bedtools2 |
| [berrywood-report-env](https://hub.docker.com/r/staphb/berrywood-report-env/) <br/> [![docker pulls](https://badgen.net/docker/pulls/staphb/berrywood-report-env)](https://hub.docker.com/r/staphb/berrywood-report-env) | <ul><li>1.0</li></ul> | none |
| [bindashtree](https://hub.docker.com/r/staphb/bindashtree/) <br/> [![docker pulls](https://badgen.net/docker/pulls/staphb/bindashtree)](https://hub.docker.com/r/staphb/bindashtree) | <ul><li>[0.1.0](./build-files/bindashtree/0.1.0/)</li></ul> | https://github.com/jianshu93/bindashtree |
| [blast+](https://hub.docker.com/r/staphb/blast/) <br/> [![docker pulls](https://badgen.net/docker/pulls/staphb/blast)](https://hub.docker.com/r/staphb/blast) | <ul><li>[2.13.0](blast/2.13.0/)</li><li>[2.14.0](blast/2.14.0/)</li><li>[2.14.1](blast/2.14.1/)</li><li>[2.15.0](blast/2.15.0/)</li><li>[2.16.0](./blast/2.16.0/)</li></ul> | https://www.ncbi.nlm.nih.gov/books/NBK279690/ |
| [bowtie2](https://hub.docker.com/r/staphb/bowtie2/) <br/> [![docker pulls](https://badgen.net/docker/pulls/staphb/bowtie2)](https://hub.docker.com/r/staphb/bowtie2) | <ul><li>[2.4.4](./bowtie2/2.4.4/)</li><li>[2.4.5](./bowtie2/2.4.5/)</li><li>[2.5.1](./bowtie2/2.5.1/)</li><li>[2.5.3](./bowtie2/2.5.3/)</li><li>[2.5.4](./bowtie2/2.5.4/)</li></ul> | http://bowtie-bio.sourceforge.net/bowtie2/manual.shtml <br/>https://github.com/BenLangmead/bowtie2 |
| [Bracken](https://hub.docker.com/r/staphb/bracken/) <br/> [![docker pulls](https://badgen.net/docker/pulls/staphb/bracken)](https://hub.docker.com/r/staphb/bracken) | <ul><li>[2.9](./bracken/2.9)</li></ul> | https://ccb.jhu.edu/software/bracken/index.shtml?t=manual <br/>https://github.com/jenniferlu717/Bracken |
81 changes: 81 additions & 0 deletions build-files/bindashtree/0.1.0/Dockerfile
@@ -0,0 +1,81 @@
# Stage 1: Build stage
FROM ubuntu:focal AS builder

# Set global variables
ARG BINDASHTREE_VER="0.1.0"

# Update package manager and install necessary tools
RUN apt-get update && apt-get install -y --no-install-recommends \
wget \
curl \
build-essential \
gcc \
pkg-config \
libssl-dev \
ca-certificates \
&& apt-get clean && rm -rf /var/lib/apt/lists/*

# Install Rust and Cargo using rustup
RUN curl https://sh.rustup.rs -sSf | sh -s -- -y && \
export PATH="$HOME/.cargo/bin:$PATH" && \
rustup default stable

# Ensure Rust and Cargo are available
ENV PATH="/root/.cargo/bin:$PATH"

# Download, extract, and build bindashtree
RUN wget https://github.com/jianshu93/bindashtree/archive/refs/tags/v${BINDASHTREE_VER}.tar.gz && \
tar -xzvf v${BINDASHTREE_VER}.tar.gz && \
cd bindashtree-${BINDASHTREE_VER} && \
/root/.cargo/bin/cargo build --release

# Stage 2: Final image
FROM ubuntu:focal AS app
ARG BINDASHTREE_VER="0.1.0"

# Install wget for test stage compatibility
RUN apt-get update && apt-get install -y --no-install-recommends \
wget \
ca-certificates \
&& apt-get clean && rm -rf /var/lib/apt/lists/*

# Labels for metadata
LABEL base.image="ubuntu:focal" \
dockerfile.version="1" \
software="bindashtree" \
software.version="${BINDASHTREE_VER}" \
description="Binwise Densified MinHash and Rapid Neighbor-joining Tree Construction for microbial genomes." \
website="https://github.com/jianshu93/bindashtree" \
license.url="https://github.com/jianshu93/bindashtree?tab=MIT-1-ov-file#readme" \
maintainer="Taylor K. Paisie" \
maintainer.email="[email protected]"

# Copy built binaries from the builder stage
COPY --from=builder /bindashtree-${BINDASHTREE_VER}/target/release/bindashtree /usr/local/bin/

CMD ["bindashtree", "--help"]

WORKDIR /data

# Stage 3: Test stage
FROM app AS test

# Set working directory
WORKDIR /data/test

# Install wget if not installed (redundancy for safety)
RUN apt-get update && apt-get install -y --no-install-recommends \
wget \
&& apt-get clean && rm -rf /var/lib/apt/lists/*

# Download test files
RUN wget https://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/002/587/385/GCA_002587385.1_ASM258738v1/GCA_002587385.1_ASM258738v1_genomic.fna.gz && \
wget https://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/002/596/765/GCA_002596765.1_ASM259676v1/GCA_002596765.1_ASM259676v1_genomic.fna.gz && \
wget https://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/002/598/005/GCA_002598005.1_ASM259800v1/GCA_002598005.1_ASM259800v1_genomic.fna.gz

RUN ls /data/test/*.fna.gz > name.txt

# For highly similar genomes (e.g., > 99.9% ANI), use a larger sketch size; -s 10240 works well for ANI below 99%.
RUN bindashtree -i name.txt -k 16 -s 10240 -d 1 -t 8 --output_tree try.nwk

FROM app
59 changes: 59 additions & 0 deletions build-files/bindashtree/0.1.0/README.md
@@ -0,0 +1,59 @@
# bindashtree container

Main tool: [bindashtree](https://github.com/jianshu93/bindashtree)

Code repository: https://github.com/jianshu93/bindashtree

Basic information on how to use this tool:
- executable: `bindashtree`
- help: `-h`, `--help`
- version: `-V`, `--version`

```
Binwise Densified MinHash and Rapid Neighbor-joining Tree Construction

Usage: bindashtree [OPTIONS] --input <INPUT_LIST_FILE> --output_tree <OUTPUT_TREE_FILE>

Options:
  -i, --input <INPUT_LIST_FILE>
          Genome list file (one FASTA/FNA file per line), gz supported
  -k, --kmer_size <KMER_SIZE>
          K-mer size [default: 16]
  -s, --sketch_size <SKETCH_SIZE>
          MinHash sketch size [default: 10240]
  -d, --densification <DENS_OPT>
          Densification strategy: 0=Optimal Densification, 1=Reverse Optimal Densification/faster Densification [default: 0]
  -t, --threads <THREADS>
          Number of threads to use in parallel [default: 1]
      --tree <TREE_METHOD>
          Tree construction method: naive, rapidnj, hybrid [default: rapidnj]
      --chunk_size <chunk_size>
          Chunk size for RapidNJ/Hybrid methods [default: 30]
      --naive_percentage <naive_percentage>
          Percentage of steps naive for hybrid method [default: 90]
      --output_matrix <OUTPUT_MATRIX_FILE>
          Output the phylip distance matrix to a file
      --output_tree <OUTPUT_TREE_FILE>
          Output the resulting tree in Newick format to a file
  -h, --help
          Print help
  -V, --version
          Print version
```
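The help output lists three `--tree` methods. We assume `naive` denotes textbook neighbor-joining (Saitou and Nei, 1987), which rescans every pair of nodes at each join; RapidNJ speeds up that search, and `hybrid` runs naive steps for the share set by `--naive_percentage`. The minimal Python sketch below shows only the textbook procedure on a symmetric distance matrix; the function name `neighbor_joining` and its Newick formatting are hypothetical and are not bindashtree's code.

```python
def neighbor_joining(names, dist):
    """Textbook (naive) neighbor-joining on a symmetric distance matrix.

    names: taxon labels; dist: n x n list of lists with a zero diagonal (n >= 3).
    Returns an unrooted tree as a Newick string. Each step scans every pair,
    so the whole run costs O(n^3) time.
    """
    if len(names) < 3:
        raise ValueError("neighbor-joining needs at least 3 taxa")
    # Distance lookup keyed by node label; merged clusters use their
    # Newick substring as their label.
    d = {a: {b: float(dist[i][j]) for j, b in enumerate(names)}
         for i, a in enumerate(names)}
    nodes = list(names)

    while len(nodes) > 3:
        n = len(nodes)
        # Net divergence r(i) = sum over k of d(i, k)
        r = {a: sum(d[a][k] for k in nodes) for a in nodes}
        # Pick the pair minimising Q(i, j) = (n - 2) d(i, j) - r(i) - r(j)
        best = None
        for x in range(n):
            for y in range(x + 1, n):
                a, b = nodes[x], nodes[y]
                q = (n - 2) * d[a][b] - r[a] - r[b]
                if best is None or q < best[0]:
                    best = (q, a, b)
        _, a, b = best
        # Branch lengths from a and b to their new parent u
        la = 0.5 * d[a][b] + (r[a] - r[b]) / (2 * (n - 2))
        lb = d[a][b] - la
        u = f"({a}:{la:.4f},{b}:{lb:.4f})"
        # Distance from u to every remaining node
        d[u] = {u: 0.0}
        for k in nodes:
            if k not in (a, b):
                d[u][k] = d[k][u] = 0.5 * (d[a][k] + d[b][k] - d[a][b])
        nodes = [k for k in nodes if k not in (a, b)] + [u]

    # Join the last three nodes at a single internal node
    a, b, c = nodes
    la = 0.5 * (d[a][b] + d[a][c] - d[b][c])
    lb = 0.5 * (d[a][b] + d[b][c] - d[a][c])
    lc = 0.5 * (d[a][c] + d[b][c] - d[a][b])
    return f"({a}:{la:.4f},{b}:{lb:.4f},{c}:{lc:.4f});"
```

Because every iteration scans all remaining pairs, this naive form becomes the bottleneck for thousands of genomes, which is the motivation for RapidNJ-style search.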

Additional information:
One Permutation Hashing with Optimal Densification can be used to estimate genomic distance (1-ANI), and rapid neighbor-joining can then be performed on those distances. We also provide a new densification strategy, called faster densification (or reverse optimal densification), which is more accurate and faster for large sketch sizes.


Full documentation: https://github.com/jianshu93/bindashtree

## Testing for bindashtree

```
# Download test files
wget https://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/002/587/385/GCA_002587385.1_ASM258738v1/GCA_002587385.1_ASM258738v1_genomic.fna.gz && \
wget https://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/002/596/765/GCA_002596765.1_ASM259676v1/GCA_002596765.1_ASM259676v1_genomic.fna.gz && \
wget https://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/002/598/005/GCA_002598005.1_ASM259800v1/GCA_002598005.1_ASM259800v1_genomic.fna.gz

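# Write the genome list (one file per line)
ls /data/test/*.fna.gz > name.txt

# Build the tree: k=16, sketch size 10240, faster densification (-d 1), 8 threads
bindashtree -i name.txt -k 16 -s 10240 -d 1 -t 8 --output_tree try.nwk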
```
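The test uses `-s 10240`, the default sketch size. How much precision a sketch size buys can be estimated with a delta-method approximation. The Python sketch below assumes the Mash formula D = -ln(2J/(1+J))/k links distance and Jaccard index, and treats the s bins as independent Bernoulli trials; both are simplifying assumptions, and the helper names are hypothetical.

```python
import math


def jaccard_from_distance(d, k):
    # Invert D = -ln(2J / (1 + J)) / k  =>  J = 1 / (2 e^{kD} - 1)
    return 1.0 / (2.0 * math.exp(k * d) - 1.0)


def distance_std_error(d, k, s):
    """Approximate standard error of the estimated distance for true distance d."""
    j = jaccard_from_distance(d, k)
    se_j = math.sqrt(j * (1.0 - j) / s)  # binomial error over s bins
    # Delta method: |dD/dJ| = 1 / (k J (1 + J))
    return se_j / (k * j * (1.0 + j))


for ani in (0.99, 0.999):
    d = 1.0 - ani
    for s in (10240, 40960):
        se = distance_std_error(d, 16, s)
        print(f"ANI {ani:.3f}  s={s:5d}  SE(D)={se:.2e}  relative={se / d:.1%}")
```

Under these assumptions, quadrupling the sketch size halves the error, and the relative error at 99.9% ANI is larger than at 99%, which is consistent with raising `-s` for highly similar genomes.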