Add bindashtree v0.1.0 #1146

Merged: 14 commits merged into StaPH-B:master on Dec 30, 2024


taylorpaisie (Contributor)

Pull Request (PR) checklist:

  • Include a description of what is in this pull request in this message.
  • The Dockerfile successfully builds to a test target for the user creating the PR (e.g. docker build --tag samtools:1.15test --target test docker-builds/samtools/1.15)
  • Directory structure is the name of the tool in lower case with special characters removed, with a subdirectory for the version number (e.g. spades/3.12.0/Dockerfile)
    • (optional) All test files are located in the same directory as the Dockerfile (e.g. shigatyper/2.0.1/test.sh)
  • Create a simple container-specific README.md in the same directory as the Dockerfile (e.g. spades/3.12.0/README.md)
    • If this README is longer than 30 lines, there is an explanation as to why more detail was needed
  • Dockerfile includes the recommended LABELS
  • Main README.md has been updated to include the tool and/or version of the Dockerfile(s) in this PR
  • Program_Licenses.md contains the tool(s) used in this PR and has been updated for any that were missing

erinyoung (Contributor)

It looks like the tests worked

#15 [test 5/5] RUN bindashtree -i name.txt -k 16 -s 10240 -d 1 -t 8 --output_tree try.nwk
#15 0.051 
#15 0.051  ************** initializing logger *****************
#15 0.051 
#15 0.052 Sketching all genomes...
#15 0.054 Building PHYLIP distance matrix...
#15 0.055 Constructing the tree...
#15 DONE 0.1s

Comment on lines 10 to 40
```
Binwise Densified MinHash and Rapid Neighbor-joining Tree Construction

Usage: bindashtree [OPTIONS] --input <INPUT_LIST_FILE> --output_tree <OUTPUT_TREE_FILE>

Options:
  -i, --input <INPUT_LIST_FILE>
          Genome list file (one FASTA/FNA file per line), gz supported
  -k, --kmer_size <KMER_SIZE>
          K-mer size [default: 16]
  -s, --sketch_size <SKETCH_SIZE>
          MinHash sketch size [default: 10240]
  -d, --densification <DENS_OPT>
          Densification strategy: 0=Optimal Densification, 1=Reverse Optimal Densification/faster Densification [default: 0]
  -t, --threads <THREADS>
          Number of threads to use in parallel [default: 1]
      --tree <TREE_METHOD>
          Tree construction method: naive, rapidnj, hybrid [default: rapidnj]
      --chunk_size <chunk_size>
          Chunk size for RapidNJ/Hybrid methods [default: 30]
      --naive_percentage <naive_percentage>
          Percentage of steps naive for hybrid method [default: 90]
      --output_matrix <OUTPUT_MATRIX_FILE>
          Output the phylip distance matrix to a file
      --output_tree <OUTPUT_TREE_FILE>
          Output the resulting tree in Newick format to a file
  -h, --help
          Print help
  -V, --version
          Print version
```
Contributor

This looks like the printed help message. I'm not going to make you change this, but I recommend following this format:

# <program> container

Main tool: [<program>](link to program)
  
Code repository: <url for code>

Additional tools:
- list: version

Basic information on how to use this tool:
- executable: <tool>
- help: <-h>
- version: <-v>
- description: <tool does something>

Additional information:

<Container contains X database at Y>
  
Full documentation: link to documentation or wiki

It helps some of the external parsers.
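The help text quoted above describes `--input` as a list file with one FASTA/FNA path per line (gzip supported), and the Dockerfile test stage runs bindashtree with `-i name.txt`. A minimal sketch of those two steps as reusable shell functions follows; the function names and the accepted file extensions are assumptions for illustration, and the bindashtree options simply mirror the test stage.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: write one genome path per line, the layout --input
# expects (plain or gzip-compressed FASTA/FNA). Paths are sorted so the
# list is reproducible between runs.
make_genome_list() {
  local dir="$1" out="$2"
  find "$dir" -maxdepth 1 -type f \
    \( -name '*.fna' -o -name '*.fna.gz' -o -name '*.fasta' -o -name '*.fasta.gz' \) \
    | sort > "$out"
  # Non-zero status if no genomes were found.
  [[ -s "$out" ]]
}

# Hypothetical wrapper using the same options as the Dockerfile test stage:
# k-mer size 16, sketch size 10240, densification strategy 1, 8 threads.
run_bindashtree() {
  local list="$1" tree="$2"
  bindashtree -i "$list" -k 16 -s 10240 -d 1 -t 8 --output_tree "$tree"
}
```

For example, `make_genome_list genomes name.txt && run_bindashtree name.txt try.nwk`, where `genomes/` is a placeholder directory.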


# Copy built binaries from the builder stage
COPY --from=builder /bindashtree-${BINDASHTREE_VER}/target/release/bindashtree /usr/local/bin/

Contributor

Could you add a CMD line and a WORKDIR line at the end of the 'app' stage?

It'd be something like:

CMD ["bindashtree", "--help"]

WORKDIR /data
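With `WORKDIR /data` as the default working directory, a common pattern is to bind-mount the host's current directory at `/data` so relative paths such as `name.txt` resolve inside the container. A minimal sketch, assuming the published image is `staphb/bindashtree:0.1.0` (the tag is not stated in this thread) and adding a hypothetical `DRY_RUN` switch that prints the command instead of running it:

```shell
#!/usr/bin/env bash
set -euo pipefail

# Assumed image reference; substitute the tag that was actually published.
IMAGE="${IMAGE:-staphb/bindashtree:0.1.0}"

# Hypothetical wrapper: mount $PWD at /data (the suggested WORKDIR) and pass
# all arguments through to bindashtree inside the container.
bindashtree_docker() {
  local cmd=(docker run --rm -v "$PWD:/data" "$IMAGE" bindashtree "$@")
  if [[ "${DRY_RUN:-0}" == 1 ]]; then
    echo "${cmd[*]}"   # show the command without running it
  else
    "${cmd[@]}"
  fi
}
```

Paths listed inside the input file must also be valid inside the container, so relative paths (or paths under `/data`) are the safer choice. With the suggested `CMD ["bindashtree", "--help"]`, running the image with no command should print the help text.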

Contributor (Author)

OK, I added it in the latest commit. Thanks for the help!

erinyoung (Contributor)

This looks great! But it does look like the 'app' stage is missing the CMD and WORKDIR layers. Can you add them?

erinyoung (Contributor)

@taylorpaisie, I don't want to cause you any kind of alarm, but we updated the file structure of the repo... which probably messed you up in all kinds of ways.

WE ARE SO SORRY

But also... there wasn't a great time to do it. Thank you for your efforts.

Dockerfiles and READMEs are now in StaPH-B/docker-builds/build-files/<tool> instead of StaPH-B/docker-builds/<tool>.

Also, this may mean that there are more GitHub Actions (GA) errors. PLEASE LET US KNOW IF YOU ENCOUNTER THEM!!!
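For a branch opened before this change, the tool directory has to move under `build-files/`. A minimal sketch of that move, assuming it is run from the repository root and that the version subdirectory (`bindashtree/0.1.0/`) travels with the tool directory; the `relocate_tool` helper is hypothetical:

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: move <tool>/ (with its version subdirectories) to
# build-files/<tool>/. Inside a git work tree, git mv stages the rename
# (it expects the files to be tracked); elsewhere, plain mv is used.
relocate_tool() {
  local tool="$1" dest="build-files/$1"
  if [[ ! -d "$tool" ]]; then
    echo "no directory named $tool" >&2
    return 1
  fi
  if [[ -e "$dest" ]]; then
    echo "$dest already exists" >&2
    return 1
  fi
  mkdir -p build-files
  if git rev-parse --is-inside-work-tree >/dev/null 2>&1; then
    git mv "$tool" "$dest"
  else
    mv "$tool" "$dest"
  fi
}
```

For example, `relocate_tool bindashtree`, then commit the result.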

erinyoung (Contributor)

The tests still work

#15 [test 3/5] RUN wget https://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/002/587/385/GCA_002587385.1_ASM258738v1/GCA_002587385.1_ASM258738v1_genomic.fna.gz &&     wget https://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/002/596/765/GCA_002596765.1_ASM259676v1/GCA_002596765.1_ASM259676v1_genomic.fna.gz &&     wget https://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/002/598/005/GCA_002598005.1_ASM259800v1/GCA_002598005.1_ASM259800v1_genomic.fna.gz
#15 0.055 --2024-12-27 20:05:16--  https://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/002/587/385/GCA_002587385.1_ASM258738v1/GCA_002587385.1_ASM258738v1_genomic.fna.gz
#15 0.056 Resolving ftp.ncbi.nlm.nih.gov (ftp.ncbi.nlm.nih.gov)... 130.14.250.13, 130.14.250.10, 130.14.250.11, ...
#15 0.083 Connecting to ftp.ncbi.nlm.nih.gov (ftp.ncbi.nlm.nih.gov)|130.14.250.13|:443... connected.
#15 0.106 HTTP request sent, awaiting response... 200 OK
#15 0.124 Length: 1922 (1.9K) [application/x-gzip]
#15 0.125 Saving to: 'GCA_002587385.1_ASM258738v1_genomic.fna.gz'
#15 0.125 
#15 0.125      0K .                                                     100% 88.5M=0s
#15 0.125 
#15 0.125 2024-12-27 20:05:16 (88.5 MB/s) - 'GCA_002587385.1_ASM258738v1_genomic.fna.gz' saved [1922/1922]
#15 0.125 
#15 0.128 --2024-12-27 20:05:16--  https://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/002/596/765/GCA_002596765.1_ASM259676v1/GCA_002596765.1_ASM259676v1_genomic.fna.gz
#15 0.129 Resolving ftp.ncbi.nlm.nih.gov (ftp.ncbi.nlm.nih.gov)... 130.14.250.12, 130.14.250.13, 130.14.250.7, ...
#15 0.160 Connecting to ftp.ncbi.nlm.nih.gov (ftp.ncbi.nlm.nih.gov)|130.14.250.12|:443... connected.
#15 0.184 HTTP request sent, awaiting response... 200 OK
#15 0.194 Length: 1890 (1.8K) [application/x-gzip]
#15 0.194 Saving to: 'GCA_002596765.1_ASM259676v1_genomic.fna.gz'
#15 0.194 
#15 0.194      0K .                                                     100% 88.1M=0s
#15 0.194 
#15 0.194 2024-12-27 20:05:17 (88.1 MB/s) - 'GCA_002596765.1_ASM259676v1_genomic.fna.gz' saved [1890/1890]
#15 0.194 
#15 0.197 --2024-12-27 20:05:17--  https://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/002/598/005/GCA_002598005.1_ASM259800v1/GCA_002598005.1_ASM259800v1_genomic.fna.gz
#15 0.198 Resolving ftp.ncbi.nlm.nih.gov (ftp.ncbi.nlm.nih.gov)... 130.14.250.12, 130.14.250.10, 130.14.250.11, ...
#15 0.220 Connecting to ftp.ncbi.nlm.nih.gov (ftp.ncbi.nlm.nih.gov)|130.14.250.12|:443... connected.
#15 0.243 HTTP request sent, awaiting response... 200 OK
#15 0.331 Length: 1928 (1.9K) [application/x-gzip]
#15 0.332 Saving to: 'GCA_002598005.1_ASM259800v1_genomic.fna.gz'
#15 0.332 
#15 0.332      0K .                                                     100% 95.5M=0s
#15 0.332 
#15 0.332 2024-12-27 20:05:17 (95.5 MB/s) - 'GCA_002598005.1_ASM259800v1_genomic.fna.gz' saved [1928/1928]
#15 0.332 
#15 DONE 0.3s

#16 [test 4/5] RUN ls /data/test/*.fna.gz > name.txt
#16 DONE 0.1s

#17 [test 5/5] RUN bindashtree -i name.txt -k 16 -s 10240 -d 1 -t 8 --output_tree try.nwk
#17 0.049 
#17 0.049  ************** initializing logger *****************
#17 0.049 
#17 0.051 Sketching all genomes...
#17 0.052 Building PHYLIP distance matrix...
#17 0.052 Constructing the tree...
#17 DONE 0.1s
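The test stage passes as long as `bindashtree` exits successfully. A slightly stricter check, offered here as a suggestion rather than anything in the PR, could also confirm that `try.nwk` is non-empty and ends with `;`, the terminator every Newick tree carries. A minimal sketch, with a hypothetical `check_newick` helper:

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical check: the tree file must exist, be non-empty, and its last
# non-whitespace character must be ';', which terminates a Newick tree.
check_newick() {
  local tree="$1" last
  [[ -s "$tree" ]] || { echo "missing or empty: $tree" >&2; return 1; }
  # Drop all whitespace (including the trailing newline), keep the last byte.
  last=$(tr -d '[:space:]' < "$tree" | tail -c 1)
  [[ "$last" == ";" ]] || { echo "not terminated by ';': $tree" >&2; return 1; }
}
```

In the test stage this could run right after the tree is built, for example as `check_newick try.nwk`.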

erinyoung (Contributor) left a comment:

Thank you for making those changes! I adjusted the hyperlink in the main README, and once the GitHub Actions finish I will merge this PR and get it deployed.

erinyoung merged commit 1420679 into StaPH-B:master on Dec 30, 2024
2 checks passed
erinyoung (Contributor)

Thank you for putting this together! You can check the status of the deployment at https://github.com/StaPH-B/docker-builds/actions/runs/12550397445
