adds dockerfile for pangolin 4.3.1 and pdata 1.23.1 #777

Merged 2 commits on Oct 27, 2023
2 changes: 1 addition & 1 deletion README.md
@@ -189,7 +189,7 @@ To learn more about the docker pull rate limits and the open source software pro
| [NCBI table2asn](https://hub.docker.com/r/staphb/ncbi-table2asn) <br/> [![docker pulls](https://badgen.net/docker/pulls/staphb/ncbi-table2asn)](https://hub.docker.com/r/staphb/ncbi-table2asn) | <ul><li>1.26.678</li></ul> | [https://www.ncbi.nlm.nih.gov/genbank/table2asn/](https://www.ncbi.nlm.nih.gov/genbank/table2asn/) <br/>[https://ftp.ncbi.nlm.nih.gov/asn1-converters/versions/2022-06-14/by_program/table2asn/](https://ftp.ncbi.nlm.nih.gov/asn1-converters/versions/2022-06-14/by_program/table2asn/) |
| [OrthoFinder](https://hub.docker.com/r/staphb/OrthoFinder) <br/> [![docker pulls](https://badgen.net/docker/pulls/staphb/orthofinder)](https://hub.docker.com/r/staphb/orthofinder) | <ul><li>2.17</li></ul> | https://github.com/davidemms/OrthoFinder |
| [Panaroo](https://hub.docker.com/r/staphb/panaroo) <br/> [![docker pulls](https://badgen.net/docker/pulls/staphb/panaroo)](https://hub.docker.com/r/staphb/panaroo) | <ul><li>1.2.10</li></ul> | https://github.com/gtonkinhill/panaroo |
| [Pangolin](https://hub.docker.com/r/staphb/pangolin) <br/> [![docker pulls](https://badgen.net/docker/pulls/staphb/pangolin)](https://hub.docker.com/r/staphb/pangolin) | <details><summary> Click to see Pangolin v4.2 and older versions! </summary> **Pangolin version & pangoLEARN data release date** <ul><li>1.1.14</li><li>2.0.4 & 2020-07-20</li><li>2.0.5 & 2020-07-20</li><li>2.1.1 & 2020-12-17</li><li>2.1.3 & 2020-12-17</li><li>2.1.6 & 2021-01-06</li><li>2.1.7 & 2021-01-11</li><li>2.1.7 & 2021-01-20</li><li>2.1.8 & 2021-01-22</li><li>2.1.10 & 2021-02-01</li><li>2.1.11 & 2021-02-01</li><li>2.1.11 & 2021-02-05</li><li>2.2.1 & 2021-02-06</li><li>2.2.2 & 2021-02-06</li><li>2.2.2 & 2021-02-11</li><li>2.2.2 & 2021-02-12</li><li>2.3.0 & 2021-02-12</li><li>2.3.0 & 2021-02-18</li><li>2.3.0 & 2021-02-21</li><li>2.3.2 & 2021-02-21</li><li>2.3.3 & 2021-03-16</li><li>2.3.4 & 2021-03-16</li><li>2.3.5 & 2021-03-16</li><li>2.3.6 & 2021-03-16</li><li>2.3.6 & 2021-03-29</li><li>2.3.8 & 2021-04-01</li><li>2.3.8 & 2021-04-14</li><li>2.3.8 & 2021-04-21</li><li>2.3.8 & 2021-04-23</li><li>2.4 & 2021-04-28</li><li>2.4.1 & 2021-04-28</li><li>2.4.2 & 2021-04-28</li><li>2.4.2 & 2021-05-10</li><li>2.4.2 & 2021-05-11</li><li>2.4.2 & 2021-05-19</li><li>3.0.5 & 2021-06-05</li><li>3.1.3 & 2021-06-15</li><li>3.1.5 & 2021-06-15</li><li>3.1.5 & 2021-07-07-2</li><li>3.1.7 & 2021-07-09</li><li>3.1.8 & 2021-07-28</li><li>3.1.10 & 2021-07-28</li><li>3.1.11 & 2021-08-09</li><li>3.1.11 & 2021-08-24</li><li>3.1.11 & 2021-09-17</li><li>3.1.14 & 2021-09-28</li><li>3.1.14 & 2021-10-13</li><li>3.1.16 & 2021-10-18</li><li>3.1.16 & 2021-11-04</li><li>3.1.16 & 2021-11-09</li><li>3.1.16 & 2021-11-18</li><li>3.1.16 & 2021-11-25</li><li>3.1.17 & 2021-11-25</li><li>3.1.17 & 2021-12-06</li><li>3.1.17 & 2022-01-05</li><li>3.1.18 & 2022-01-20</li><li>3.1.19 & 2022-01-20</li><li>3.1.20 & 2022-02-02</li><li>3.1.20 & 2022-02-28</li></ul> **Pangolin version & pangolin-data version** <ul><li>4.0 & 1.2.133</li><li>4.0.1 & 
1.2.133</li><li>4.0.2 & 1.2.133</li><li>4.0.3 & 1.2.133</li><li>4.0.4 & 1.2.133</li><li>4.0.5 & 1.3</li><li>4.0.6 & 1.6</li><li>4.0.6 & 1.8</li><li>4.0.6 & 1.9</li><li>4.1.1 & 1.11</li><li>4.1.2 & 1.12</li><li>4.1.2 & 1.13</li><li>4.1.2 & 1.14</li><li>4.1.3 & 1.15.1</li><li>4.1.3 & 1.16</li><li>4.1.3 & 1.17</li><li>4.2 & 1.18</li><li>4.2 & 1.18.1</li><li>4.2 & 1.18.1.1</li><li>4.2 & 1.19</li></ul> </details> **Pangolin version & pangolin-data version** <ul><li>[4.3 & 1.20](pangolin/4.3-pdata-1.20/)</li><li>[4.3 & 1.21](pangolin/4.3-pdata-1.21/)</li><li>[4.3.1 & 1.22](pangolin/4.3.1-pdata-1.22/)</li><li>[4.3.1 & 1.23](pangolin/4.3.1-pdata-1.23/)</li></ul> | https://github.com/cov-lineages/pangolin<br/>https://github.com/cov-lineages/pangoLEARN<br/>https://github.com/cov-lineages/pango-designation<br/>https://github.com/cov-lineages/scorpio<br/>https://github.com/cov-lineages/constellations<br/>https://github.com/cov-lineages/lineages (archived)<br/>https://github.com/hCoV-2019/pangolin (archived) |
| [Pangolin](https://hub.docker.com/r/staphb/pangolin) <br/> [![docker pulls](https://badgen.net/docker/pulls/staphb/pangolin)](https://hub.docker.com/r/staphb/pangolin) | <details><summary> Click to see Pangolin v4.2 and older versions! </summary> **Pangolin version & pangoLEARN data release date** <ul><li>1.1.14</li><li>2.0.4 & 2020-07-20</li><li>2.0.5 & 2020-07-20</li><li>2.1.1 & 2020-12-17</li><li>2.1.3 & 2020-12-17</li><li>2.1.6 & 2021-01-06</li><li>2.1.7 & 2021-01-11</li><li>2.1.7 & 2021-01-20</li><li>2.1.8 & 2021-01-22</li><li>2.1.10 & 2021-02-01</li><li>2.1.11 & 2021-02-01</li><li>2.1.11 & 2021-02-05</li><li>2.2.1 & 2021-02-06</li><li>2.2.2 & 2021-02-06</li><li>2.2.2 & 2021-02-11</li><li>2.2.2 & 2021-02-12</li><li>2.3.0 & 2021-02-12</li><li>2.3.0 & 2021-02-18</li><li>2.3.0 & 2021-02-21</li><li>2.3.2 & 2021-02-21</li><li>2.3.3 & 2021-03-16</li><li>2.3.4 & 2021-03-16</li><li>2.3.5 & 2021-03-16</li><li>2.3.6 & 2021-03-16</li><li>2.3.6 & 2021-03-29</li><li>2.3.8 & 2021-04-01</li><li>2.3.8 & 2021-04-14</li><li>2.3.8 & 2021-04-21</li><li>2.3.8 & 2021-04-23</li><li>2.4 & 2021-04-28</li><li>2.4.1 & 2021-04-28</li><li>2.4.2 & 2021-04-28</li><li>2.4.2 & 2021-05-10</li><li>2.4.2 & 2021-05-11</li><li>2.4.2 & 2021-05-19</li><li>3.0.5 & 2021-06-05</li><li>3.1.3 & 2021-06-15</li><li>3.1.5 & 2021-06-15</li><li>3.1.5 & 2021-07-07-2</li><li>3.1.7 & 2021-07-09</li><li>3.1.8 & 2021-07-28</li><li>3.1.10 & 2021-07-28</li><li>3.1.11 & 2021-08-09</li><li>3.1.11 & 2021-08-24</li><li>3.1.11 & 2021-09-17</li><li>3.1.14 & 2021-09-28</li><li>3.1.14 & 2021-10-13</li><li>3.1.16 & 2021-10-18</li><li>3.1.16 & 2021-11-04</li><li>3.1.16 & 2021-11-09</li><li>3.1.16 & 2021-11-18</li><li>3.1.16 & 2021-11-25</li><li>3.1.17 & 2021-11-25</li><li>3.1.17 & 2021-12-06</li><li>3.1.17 & 2022-01-05</li><li>3.1.18 & 2022-01-20</li><li>3.1.19 & 2022-01-20</li><li>3.1.20 & 2022-02-02</li><li>3.1.20 & 2022-02-28</li></ul> **Pangolin version & pangolin-data version** <ul><li>4.0 & 1.2.133</li><li>4.0.1 & 
1.2.133</li><li>4.0.2 & 1.2.133</li><li>4.0.3 & 1.2.133</li><li>4.0.4 & 1.2.133</li><li>4.0.5 & 1.3</li><li>4.0.6 & 1.6</li><li>4.0.6 & 1.8</li><li>4.0.6 & 1.9</li><li>4.1.1 & 1.11</li><li>4.1.2 & 1.12</li><li>4.1.2 & 1.13</li><li>4.1.2 & 1.14</li><li>4.1.3 & 1.15.1</li><li>4.1.3 & 1.16</li><li>4.1.3 & 1.17</li><li>4.2 & 1.18</li><li>4.2 & 1.18.1</li><li>4.2 & 1.18.1.1</li><li>4.2 & 1.19</li></ul> </details> **Pangolin version & pangolin-data version** <ul><li>[4.3 & 1.20](pangolin/4.3-pdata-1.20/)</li><li>[4.3 & 1.21](pangolin/4.3-pdata-1.21/)</li><li>[4.3.1 & 1.22](pangolin/4.3.1-pdata-1.22/)</li><li>[4.3.1 & 1.23](pangolin/4.3.1-pdata-1.23/)</li><li>[4.3.1 & 1.23.1](pangolin/4.3.1-pdata-1.23.1/)</li></ul> | https://github.com/cov-lineages/pangolin<br/>https://github.com/cov-lineages/pangoLEARN<br/>https://github.com/cov-lineages/pango-designation<br/>https://github.com/cov-lineages/scorpio<br/>https://github.com/cov-lineages/constellations<br/>https://github.com/cov-lineages/lineages (archived)<br/>https://github.com/hCoV-2019/pangolin (archived) |
| [parallel-perl](https://hub.docker.com/r/staphb/parallel-perl) <br/> [![docker pulls](https://badgen.net/docker/pulls/staphb/parallel-perl)](https://hub.docker.com/r/staphb/parallel-perl) | <ul><li>20200722</li></ul> | https://www.gnu.org/software/parallel |
| [pasty](https://hub.docker.com/r/staphb/pasty) <br/> [![docker pulls](https://badgen.net/docker/pulls/staphb/pasty)](https://hub.docker.com/r/staphb/pasty) | <ul><li>1.0.2</li></ul> | https://github.com/rpetit3/pasty |
| [pbptyper](https://hub.docker.com/r/staphb/pbptyper) <br/> [![docker pulls](https://badgen.net/docker/pulls/staphb/pbptyper)](https://hub.docker.com/r/staphb/pbptyper) | <ul><li>1.0.0</li><li>1.0.1</li><li>1.0.4</li></ul> | https://github.com/rpetit3/pbptyper |
165 changes: 165 additions & 0 deletions pangolin/4.3.1-pdata-1.23.1/Dockerfile
@@ -0,0 +1,165 @@
FROM mambaorg/micromamba:1.5.1 as app

# build and run as root, since the micromamba image sets 'mambauser' as the default $USER
USER root
# set workdir to default for building; set to /data at the end
WORKDIR /

# ARG variables only persist during build time
# had to include the v for some of these due to GitHub tags.
# using pangolin-data github tag, NOT what is in the GH release title "v1.2.133"
ARG PANGOLIN_VER="v4.3.1"
ARG PANGOLIN_DATA_VER="v1.23.1"
ARG SCORPIO_VER="v0.3.19"
ARG CONSTELLATIONS_VER="v0.1.12"
ARG USHER_VER="0.6.2"

# metadata labels
LABEL base.image="mambaorg/micromamba:1.5.1"
LABEL dockerfile.version="1"
LABEL software="pangolin"
LABEL software.version=${PANGOLIN_VER}
LABEL description="Conda environment for Pangolin. Pangolin: Software package for assigning SARS-CoV-2 genome sequences to global lineages."
LABEL website="https://github.com/cov-lineages/pangolin"
LABEL license="GNU General Public License v3.0"
LABEL license.url="https://github.com/cov-lineages/pangolin/blob/master/LICENSE.txt"
LABEL maintainer="Curtis Kapsak"
LABEL maintainer.email="[email protected]"

# install dependencies; cleanup apt garbage
RUN apt-get update && apt-get install -y --no-install-recommends \
wget \
ca-certificates \
git \
procps \
bsdmainutils && \
apt-get autoclean && rm -rf /var/lib/apt/lists/*

# get the pangolin repo
RUN wget "https://github.com/cov-lineages/pangolin/archive/${PANGOLIN_VER}.tar.gz" && \
tar -xf ${PANGOLIN_VER}.tar.gz && \
rm -v ${PANGOLIN_VER}.tar.gz && \
mv -v pangolin-* pangolin

# set the environment; PATH is unnecessary here, but leaving anyways. It's reset later in dockerfile
ENV PATH="$PATH" \
LC_ALL=C.UTF-8

# modify environment.yml to pin specific versions during install
# create the conda environment using modified environment.yml
RUN sed -i "s|usher.*|usher=${USHER_VER}|" /pangolin/environment.yml && \
sed -i "s|scorpio.git|scorpio.git@${SCORPIO_VER}|" /pangolin/environment.yml && \
sed -i "s|pangolin-data.git|pangolin-data.git@${PANGOLIN_DATA_VER}|" /pangolin/environment.yml && \
sed -i "s|constellations.git|constellations.git@${CONSTELLATIONS_VER}|" /pangolin/environment.yml && \
micromamba create -n pangolin -y -f /pangolin/environment.yml

# so that mamba/conda env is active when running below commands
ENV ENV_NAME="pangolin"
ARG MAMBA_DOCKERFILE_ACTIVATE=1

WORKDIR /pangolin

# run pip install step; download optional pre-computed assignment hashes for UShER (useful for running on large batches of samples)
# for speed, best to skip the assignment cache when running on a single sample
# print versions
RUN pip install . && \
pangolin --add-assignment-cache && \
micromamba clean -a -y && \
mkdir /data && \
pangolin --all-versions && \
usher --version

WORKDIR /data

# hardcode pangolin executable into the PATH variable
ENV PATH="${PATH}:/opt/conda/envs/pangolin/bin/"

# default command is to pull up help options for pangolin; can be overridden of course
CMD ["pangolin", "-h"]

# new base for testing
FROM app as test

# so that mamba/conda env is active when running below commands
ENV ENV_NAME="pangolin"
ARG MAMBA_DOCKERFILE_ACTIVATE=1

# test on test sequences supplied with Pangolin code
RUN pangolin /pangolin/pangolin/test/test_seqs.fasta --analysis-mode usher -o /data/test_seqs-output-pusher && \
column -t -s, /data/test_seqs-output-pusher/lineage_report.csv

# test functionality of assignment-cache option
RUN pangolin --use-assignment-cache /pangolin/pangolin/test/test_seqs.fasta

# download B.1.1.7 genome from Utah
ADD https://raw.githubusercontent.com/StaPH-B/docker-builds/master/tests/SARS-CoV-2/SRR13957123.consensus.fa /test-data/SRR13957123.consensus.fa

# test on a B.1.1.7 genome
RUN pangolin /test-data/SRR13957123.consensus.fa --analysis-mode usher -o /test-data/SRR13957123-pusher && \
column -t -s, /test-data/SRR13957123-pusher/lineage_report.csv

# install unzip for unzipping zip archive from NCBI
RUN apt-get update && apt-get install -y --no-install-recommends unzip

# install ncbi datasets tool (pre-compiled binary); place in $PATH
RUN wget https://ftp.ncbi.nlm.nih.gov/pub/datasets/command-line/LATEST/linux-amd64/datasets && \
chmod +x datasets && \
mv -v datasets /usr/local/bin

# download assembly for a BA.1 from Florida (https://www.ncbi.nlm.nih.gov/biosample?term=SAMN29506515 and https://www.ncbi.nlm.nih.gov/nuccore/ON924087)
# run pangolin in usher analysis mode
RUN datasets download virus genome accession ON924087.1 --filename ON924087.1.zip && \
unzip ON924087.1.zip && rm ON924087.1.zip && \
mv -v ncbi_dataset/data/genomic.fna ON924087.1.genomic.fna && \
rm -vr ncbi_dataset/ README.md && \
pangolin ON924087.1.genomic.fna --analysis-mode usher -o ON924087.1-usher && \
column -t -s, ON924087.1-usher/lineage_report.csv

# test specific for new lineage, XBB.1.16, introduced in pangolin-data v1.19
# using this assembly: https://www.ncbi.nlm.nih.gov/nuccore/2440446687
# biosample here: https://www.ncbi.nlm.nih.gov/biosample?term=SAMN33060589
# one of the samples included in the initial pango-designation issue here: https://github.com/cov-lineages/pango-designation/issues/1723
RUN datasets download virus genome accession OQ381818.1 --filename OQ381818.1.zip && \
unzip OQ381818.1.zip && rm OQ381818.1.zip && \
mv -v ncbi_dataset/data/genomic.fna OQ381818.1.genomic.fna && \
rm -vr ncbi_dataset/ README.md && \
pangolin OQ381818.1.genomic.fna --analysis-mode usher -o OQ381818.1-usher && \
column -t -s, OQ381818.1-usher/lineage_report.csv

# testing another XBB.1.16, trying to test scorpio functionality. Want pangolin to NOT assign lineage based on pango hash match.
# this test runs as expected, uses scorpio to check for constellation of mutations, then assign using PUSHER placement
RUN datasets download virus genome accession OR177999.1 --filename OR177999.1.zip && \
unzip OR177999.1.zip && rm OR177999.1.zip && \
mv -v ncbi_dataset/data/genomic.fna OR177999.1.genomic.fna && \
rm -vr ncbi_dataset/ README.md && \
pangolin OR177999.1.genomic.fna --analysis-mode usher -o OR177999.1-usher && \
column -t -s, OR177999.1-usher/lineage_report.csv

## test for BA.2.86
# virus identified in MI: https://www.ncbi.nlm.nih.gov/nuccore/OR461132.1
RUN datasets download virus genome accession OR461132.1 --filename OR461132.1.zip && \
unzip OR461132.1.zip && rm OR461132.1.zip && \
mv -v ncbi_dataset/data/genomic.fna OR461132.1.genomic.fna && \
rm -vr ncbi_dataset/ README.md && \
pangolin OR461132.1.genomic.fna --analysis-mode usher -o OR461132.1-usher && \
column -t -s, OR461132.1-usher/lineage_report.csv

## test for JN.2 (BA.2.86 sublineage) JN.2 is an alias of B.1.1.529.2.86.1.2
# NY CDC Quest sample: https://www.ncbi.nlm.nih.gov/nuccore/OR598183
RUN datasets download virus genome accession OR598183.1 --filename OR598183.1.zip && \
unzip OR598183.1.zip && rm OR598183.1.zip && \
mv -v ncbi_dataset/data/genomic.fna OR598183.1.genomic.fna && \
rm -vr ncbi_dataset/ README.md && \
pangolin OR598183.1.genomic.fna --analysis-mode usher -o OR598183.1-usher && \
column -t -s, OR598183.1-usher/lineage_report.csv

## test for JQ.1 (BA.2.86.3 sublineage); JQ.1 is an alias of B.1.1.529.2.86.3.1
# THANK YOU ERIN AND UPHL!! https://www.ncbi.nlm.nih.gov/nuccore/OR716684
# this test is important because this lineage was included in the UShER tree, despite being designated after the pangolin-designation 1.23 release
# it previously caused an error/bug in pangolin, but that is now fixed
RUN datasets download virus genome accession OR716684.1 --filename OR716684.1.zip && \
unzip OR716684.1.zip && rm OR716684.1.zip && \
mv -v ncbi_dataset/data/genomic.fna OR716684.1.genomic.fna && \
rm -vr ncbi_dataset/ README.md && \
pangolin OR716684.1.genomic.fna --analysis-mode usher -o OR716684.1-usher && \
column -t -s, OR716684.1-usher/lineage_report.csv
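
The test stage above prints each `lineage_report.csv` with `column -t` but does not assert which lineage was assigned, so a wrong call would still build successfully. A minimal sketch of how such a check might look follows. The `check_lineage` helper is hypothetical (it is not part of this PR or of pangolin), and it assumes the report has a header row containing a `lineage` column and no quoted commas in the fields it reads.

```shell
#!/bin/sh
# Hypothetical helper, not part of the PR: succeed only if every data row of a
# pangolin lineage_report.csv carries the expected lineage.
# Assumption: plain comma-separated fields (no quoted commas before or in the
# lineage column) and a header row that names a "lineage" column.
check_lineage() {
  report="$1"
  expected="$2"
  awk -F, -v want="$expected" '
    NR == 1 {
      # locate the lineage column by its header name
      for (i = 1; i <= NF; i++) if ($i == "lineage") col = i
      if (!col) { print "no lineage column in header" > "/dev/stderr"; bad = 2; exit }
      next
    }
    $col != want {
      # report the mismatch, keep scanning so all bad rows are listed
      printf "%s: got %s, expected %s\n", $1, $col, want > "/dev/stderr"
      bad = 1
    }
    END { exit bad }
  ' "$report"
}
```

In the test stage, a call such as `check_lineage /test-data/SRR13957123-pusher/lineage_report.csv B.1.1.7` could follow the corresponding `pangolin` run, assuming the sample is still assigned exactly B.1.1.7 by this pangolin-data release. A non-zero exit status would then fail the `RUN` step and the build.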