Move phylogenetic workflow from top-level to folder phylogenetic

nextstrain · Nov 17, 2023 · 20eb826
1 parent 7b03fb0
commit 20eb826

Showing 13 changed files with 95 additions and 83 deletions.
diff --git a/README.md b/README.md
@@ -1,88 +1,12 @@
-# nextstrain.org/zika
+# Nextstrain repository for Zika virus
 
-This is the [Nextstrain](https://nextstrain.org) build for Zika, visible at
-[nextstrain.org/zika](https://nextstrain.org/zika).
+This repository contains two workflows for the analysis of Zika virus data:
 
-The build encompasses fetching data, preparing it for analysis, doing quality
-control, performing analyses, and saving the results in a format suitable for
-visualization (with [auspice][]).  This involves running components of
-Nextstrain such as [fauna][] and [augur][].
+- [`ingest/`](./ingest) - Download data from GenBank, clean and curate it and upload it to S3
+- [`phylogenetic/`](./phylogenetic) - Make phylogenetic trees for nextstrain.org
 
-All Zika-specific steps and functionality for the Nextstrain pipeline should be
-housed in this repository.
+Each folder contains a README.md with more information.
 
-_This build requires Augur v6._
+## Documentation
 
-[![Build Status](https://github.com/nextstrain/zika/actions/workflows/ci.yaml/badge.svg?branch=main)](https://github.com/nextstrain/zika/actions/workflows/ci.yaml)
-
-## Usage
-
-If you're unfamiliar with Nextstrain builds, you may want to follow our
-[quickstart guide][] first and then come back here.
-
-There are two main ways to run & visualise the output from this build:
-
-The first, and easiest, way to run this pathogen build is using the [Nextstrain
-command-line tool][nextstrain-cli]:
-```
-nextstrain build . 
-nextstrain view auspice/
-```
-
-See the [nextstrain-cli README][] for how to install the `nextstrain` command.
-
-The second is to install augur & auspice using conda, following [these instructions](https://nextstrain.org/docs/getting-started/local-installation#install-augur--auspice-with-conda-recommended).
-The build may then be run via:
-```
-snakemake
-auspice --datasetDir auspice/
-```
-
-Build output goes into the directories `data/`, `results/` and `auspice/`.
-
-## Configuration
-
-Configuration takes place entirely with the `Snakefile`. This can be read top-to-bottom, each rule
-specifies its file inputs and output and also its parameters. There is little redirection and each
-rule should be able to be reasoned with on its own.
-
-
-## Input data
-
-This build starts by downloading sequences from
-https://data.nextstrain.org/files/zika/sequences.fasta.xz
-and metadata from
-https://data.nextstrain.org/files/zika/metadata.tsv.gz.
-These are publicly provisioned data by the Nextstrain team by pulling sequences
-from NCBI GenBank via ViPR and performing 
-[additional bespoke curation](https://github.com/nextstrain/fauna/blob/master/builds/ZIKA.md).
-
-Data from GenBank follows Open Data principles, such that we can make input data
-and intermediate files available for further analysis. Open Data is data that
-can be freely used, re-used and redistributed by anyone - subject only, at most,
-to the requirement to attribute and sharealike.
-
-We gratefully acknowledge the authors, originating and submitting laboratories
-of the genetic sequences and metadata for sharing their work in open databases.
-Please note that although data generators have generously shared data in an open
-fashion, that does not mean there should be free license to publish on this
-data. Data generators should be cited where possible and collaborations should
-be sought in some circumstances. Please try to avoid scooping someone else's
-work. Reach out if uncertain. Authors, paper references (where available) and
-links to GenBank entries are provided in the metadata file.
-
-A faster build process can be run working from example data by copying over
-sequences and metadata from `example_data/` to `data/` via:
-```
-mkdir -p data/
-cp -v example_data/* data/
-```
-
-[Nextstrain]: https://nextstrain.org
-[fauna]: https://github.com/nextstrain/fauna
-[augur]: https://github.com/nextstrain/augur
-[auspice]: https://github.com/nextstrain/auspice
-[snakemake cli]: https://snakemake.readthedocs.io/en/stable/executable.html#all-options
-[nextstrain-cli]: https://github.com/nextstrain/cli
-[nextstrain-cli README]: https://github.com/nextstrain/cli/blob/master/README.md
-[quickstart guide]: https://nextstrain.org/docs/getting-started/quickstart
+- [Contributor documentation](./CONTRIBUTING.md)
diff --git a/phylogenetic/README.md b/phylogenetic/README.md
@@ -0,0 +1,88 @@
+# nextstrain.org/zika
+
+This is the [Nextstrain](https://nextstrain.org) build for Zika, visible at
+[nextstrain.org/zika](https://nextstrain.org/zika).
+
+The build encompasses fetching data, preparing it for analysis, doing quality
+control, performing analyses, and saving the results in a format suitable for
+visualization (with [auspice][]).  This involves running components of
+Nextstrain such as [fauna][] and [augur][].
+
+All Zika-specific steps and functionality for the Nextstrain pipeline should be
+housed in this repository.
+
+_This build requires Augur v6._
+
+[![Build Status](https://github.com/nextstrain/zika/actions/workflows/ci.yaml/badge.svg?branch=main)](https://github.com/nextstrain/zika/actions/workflows/ci.yaml)
+
+## Usage
+
+If you're unfamiliar with Nextstrain builds, you may want to follow our
+[quickstart guide][] first and then come back here.
+
+There are two main ways to run & visualise the output from this build:
+
+The first, and easiest, way to run this pathogen build is using the [Nextstrain
+command-line tool][nextstrain-cli]:
+```
+nextstrain build . 
+nextstrain view auspice/
+```
+
+See the [nextstrain-cli README][] for how to install the `nextstrain` command.
+
+The second is to install augur & auspice using conda, following [these instructions](https://nextstrain.org/docs/getting-started/local-installation#install-augur--auspice-with-conda-recommended).
+The build may then be run via:
+```
+snakemake
+auspice --datasetDir auspice/
+```
+
+Build output goes into the directories `data/`, `results/` and `auspice/`.
+
+## Configuration
+
+Configuration takes place entirely with the `Snakefile`. This can be read top-to-bottom, each rule
+specifies its file inputs and output and also its parameters. There is little redirection and each
+rule should be able to be reasoned with on its own.
+
+
+## Input data
+
+This build starts by downloading sequences from
+https://data.nextstrain.org/files/zika/sequences.fasta.xz
+and metadata from
+https://data.nextstrain.org/files/zika/metadata.tsv.gz.
+These are publicly provisioned data by the Nextstrain team by pulling sequences
+from NCBI GenBank via ViPR and performing 
+[additional bespoke curation](https://github.com/nextstrain/fauna/blob/master/builds/ZIKA.md).
+
+Data from GenBank follows Open Data principles, such that we can make input data
+and intermediate files available for further analysis. Open Data is data that
+can be freely used, re-used and redistributed by anyone - subject only, at most,
+to the requirement to attribute and sharealike.
+
+We gratefully acknowledge the authors, originating and submitting laboratories
+of the genetic sequences and metadata for sharing their work in open databases.
+Please note that although data generators have generously shared data in an open
+fashion, that does not mean there should be free license to publish on this
+data. Data generators should be cited where possible and collaborations should
+be sought in some circumstances. Please try to avoid scooping someone else's
+work. Reach out if uncertain. Authors, paper references (where available) and
+links to GenBank entries are provided in the metadata file.
+
+A faster build process can be run working from example data by copying over
+sequences and metadata from `example_data/` to `data/` via:
+```
+mkdir -p data/
+cp -v example_data/* data/
+```
+
+[Nextstrain]: https://nextstrain.org
+[fauna]: https://github.com/nextstrain/fauna
+[augur]: https://github.com/nextstrain/augur
+[auspice]: https://github.com/nextstrain/auspice
+[snakemake cli]: https://snakemake.readthedocs.io/en/stable/executable.html#all-options
+[nextstrain-cli]: https://github.com/nextstrain/cli
+[nextstrain-cli README]: https://github.com/nextstrain/cli/blob/master/README.md
+[quickstart guide]: https://nextstrain.org/docs/getting-started/quickstart
diff --git a/Snakefile → phylogenetic/Snakefile b/Snakefile → phylogenetic/Snakefile
diff --git a/config/auspice_config.json → phylogenetic/config/auspice_config.json b/config/auspice_config.json → phylogenetic/config/auspice_config.json
diff --git a/config/colors.tsv → phylogenetic/config/colors.tsv b/config/colors.tsv → phylogenetic/config/colors.tsv
diff --git a/config/config_zika.yaml → phylogenetic/config/config_zika.yaml b/config/config_zika.yaml → phylogenetic/config/config_zika.yaml
diff --git a/config/description.md → phylogenetic/config/description.md b/config/description.md → phylogenetic/config/description.md
diff --git a/config/dropped_strains.txt → phylogenetic/config/dropped_strains.txt b/config/dropped_strains.txt → phylogenetic/config/dropped_strains.txt
diff --git a/config/zika_reference.gb → phylogenetic/config/zika_reference.gb b/config/zika_reference.gb → phylogenetic/config/zika_reference.gb
diff --git a/example_data/metadata.tsv → phylogenetic/example_data/metadata.tsv b/example_data/metadata.tsv → phylogenetic/example_data/metadata.tsv
diff --git a/example_data/sequences.fasta → phylogenetic/example_data/sequences.fasta b/example_data/sequences.fasta → phylogenetic/example_data/sequences.fasta
diff --git a/scripts/check-countries-have-colors.sh → ...ic/scripts/check-countries-have-colors.sh b/scripts/check-countries-have-colors.sh → ...ic/scripts/check-countries-have-colors.sh
diff --git a/scripts/set_final_strain_name.py → ...ogenetic/scripts/set_final_strain_name.py b/scripts/set_final_strain_name.py → ...ogenetic/scripts/set_final_strain_name.py
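
The renames listed above follow one pattern: each moved path keeps its relative location and gains a `phylogenetic/` prefix, while the top-level `README.md` stays in place. The following is a minimal sketch of that mapping, not code from this commit. It assumes that `Snakefile`, `config`, `example_data` and `scripts` are the only top-level entries moved; `relocate` and `MOVED_TOP_LEVEL` are hypothetical names introduced for illustration.

```python
from pathlib import PurePosixPath

# Top-level entries that appear in the rename list of this commit.
# Assumption: nothing else at the top level was moved.
MOVED_TOP_LEVEL = {"Snakefile", "config", "example_data", "scripts"}


def relocate(path: str, folder: str = "phylogenetic") -> str:
    """Return where a repository-relative path would live after the move."""
    p = PurePosixPath(path)
    # Only paths whose first component was moved gain the new prefix.
    if p.parts and p.parts[0] in MOVED_TOP_LEVEL:
        return str(PurePosixPath(folder) / p)
    # Everything else (for example the top-level README.md) stays put.
    return path


if __name__ == "__main__":
    for old in ["Snakefile", "config/colors.tsv",
                "example_data/metadata.tsv", "README.md"]:
        print(f"{old} -> {relocate(old)}")
```

A helper like this could be used to update paths in scripts or CI configuration that still point at the old top-level layout. Since the `Snakefile` now lives in `phylogenetic/`, the build commands in the moved README are presumably meant to be run from inside that folder.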