From 20eb826064c5441eccd96cef85810053df2fcdb2 Mon Sep 17 00:00:00 2001
From: Jennifer Chang <jennifer.chang.bioinform@gmail.com>
Date: Fri, 17 Nov 2023 10:18:45 -0800
Subject: [PATCH] Move phylogenetic workflow from top-level to folder
 phylogenetic

---
 README.md                                     | 90 ++-----------------
 phylogenetic/README.md                        | 88 ++++++++++++++++++
 Snakefile => phylogenetic/Snakefile           |  0
 .../config}/auspice_config.json               |  0
 {config => phylogenetic/config}/colors.tsv    |  0
 .../config}/config_zika.yaml                  |  0
 .../config}/description.md                    |  0
 .../config}/dropped_strains.txt               |  0
 .../config}/zika_reference.gb                 |  0
 .../example_data}/metadata.tsv                |  0
 .../example_data}/sequences.fasta             |  0
 .../scripts}/check-countries-have-colors.sh   |  0
 .../scripts}/set_final_strain_name.py         |  0
 13 files changed, 95 insertions(+), 83 deletions(-)
 create mode 100644 phylogenetic/README.md
 rename Snakefile => phylogenetic/Snakefile (100%)
 rename {config => phylogenetic/config}/auspice_config.json (100%)
 rename {config => phylogenetic/config}/colors.tsv (100%)
 rename {config => phylogenetic/config}/config_zika.yaml (100%)
 rename {config => phylogenetic/config}/description.md (100%)
 rename {config => phylogenetic/config}/dropped_strains.txt (100%)
 rename {config => phylogenetic/config}/zika_reference.gb (100%)
 rename {example_data => phylogenetic/example_data}/metadata.tsv (100%)
 rename {example_data => phylogenetic/example_data}/sequences.fasta (100%)
 rename {scripts => phylogenetic/scripts}/check-countries-have-colors.sh (100%)
 rename {scripts => phylogenetic/scripts}/set_final_strain_name.py (100%)
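
Note for reviewers: the Snakefile and its `config/`, `example_data/` and `scripts/` directories now live under `phylogenetic/`, so the commands in the moved README are presumably meant to be run from inside that folder. The sketch below is a hypothetical helper, not part of this patch, that checks that the files listed in the summary above exist under `phylogenetic/` in a checkout. The function name `check_phylogenetic_layout` is an assumption made for illustration.

```shell
#!/bin/sh
# Hypothetical helper (not part of this patch): confirm that every file
# created or renamed by this patch exists under phylogenetic/.
# Usage: check_phylogenetic_layout [repo_root]   (defaults to ".")
check_phylogenetic_layout() {
    root="${1:-.}"
    missing=0
    for rel in \
        README.md \
        Snakefile \
        config/auspice_config.json \
        config/colors.tsv \
        config/config_zika.yaml \
        config/description.md \
        config/dropped_strains.txt \
        config/zika_reference.gb \
        example_data/metadata.tsv \
        example_data/sequences.fasta \
        scripts/check-countries-have-colors.sh \
        scripts/set_final_strain_name.py
    do
        # Report each absent file on stderr and remember the failure.
        if [ ! -f "$root/phylogenetic/$rel" ]; then
            echo "missing: phylogenetic/$rel" >&2
            missing=1
        fi
    done
    # Exit status 0 when everything is in place, 1 otherwise.
    return "$missing"
}
```

For example, `check_phylogenetic_layout . && cd phylogenetic && nextstrain build .` would start the build only if every moved file is in place, assuming the `nextstrain` command is installed.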

diff --git a/README.md b/README.md
index 568bb03..ee1a6e6 100644
--- a/README.md
+++ b/README.md
@@ -1,88 +1,12 @@
-# nextstrain.org/zika
+# Nextstrain repository for Zika virus
 
-This is the [Nextstrain](https://nextstrain.org) build for Zika, visible at
-[nextstrain.org/zika](https://nextstrain.org/zika).
+This repository contains two workflows for the analysis of Zika virus data:
 
-The build encompasses fetching data, preparing it for analysis, doing quality
-control, performing analyses, and saving the results in a format suitable for
-visualization (with [auspice][]).  This involves running components of
-Nextstrain such as [fauna][] and [augur][].
+- [`ingest/`](./ingest) - Download data from GenBank, clean and curate it, and upload it to S3
+- [`phylogenetic/`](./phylogenetic) - Make phylogenetic trees for nextstrain.org
 
-All Zika-specific steps and functionality for the Nextstrain pipeline should be
-housed in this repository.
+Each folder contains a README.md with more information.
 
-_This build requires Augur v6._
+## Documentation
 
-[![Build Status](https://github.com/nextstrain/zika/actions/workflows/ci.yaml/badge.svg?branch=main)](https://github.com/nextstrain/zika/actions/workflows/ci.yaml)
-
-## Usage
-
-If you're unfamiliar with Nextstrain builds, you may want to follow our
-[quickstart guide][] first and then come back here.
-
-There are two main ways to run & visualise the output from this build:
-
-The first, and easiest, way to run this pathogen build is using the [Nextstrain
-command-line tool][nextstrain-cli]:
-```
-nextstrain build . 
-nextstrain view auspice/
-```
-
-See the [nextstrain-cli README][] for how to install the `nextstrain` command.
-
-The second is to install augur & auspice using conda, following [these instructions](https://nextstrain.org/docs/getting-started/local-installation#install-augur--auspice-with-conda-recommended).
-The build may then be run via:
-```
-snakemake
-auspice --datasetDir auspice/
-```
-
-Build output goes into the directories `data/`, `results/` and `auspice/`.
-
-## Configuration
-
-Configuration takes place entirely with the `Snakefile`. This can be read top-to-bottom, each rule
-specifies its file inputs and output and also its parameters. There is little redirection and each
-rule should be able to be reasoned with on its own.
-
-
-## Input data
-
-This build starts by downloading sequences from
-https://data.nextstrain.org/files/zika/sequences.fasta.xz
-and metadata from
-https://data.nextstrain.org/files/zika/metadata.tsv.gz.
-These are publicly provisioned data by the Nextstrain team by pulling sequences
-from NCBI GenBank via ViPR and performing 
-[additional bespoke curation](https://github.com/nextstrain/fauna/blob/master/builds/ZIKA.md).
-
-Data from GenBank follows Open Data principles, such that we can make input data
-and intermediate files available for further analysis. Open Data is data that
-can be freely used, re-used and redistributed by anyone - subject only, at most,
-to the requirement to attribute and sharealike.
-
-We gratefully acknowledge the authors, originating and submitting laboratories
-of the genetic sequences and metadata for sharing their work in open databases.
-Please note that although data generators have generously shared data in an open
-fashion, that does not mean there should be free license to publish on this
-data. Data generators should be cited where possible and collaborations should
-be sought in some circumstances. Please try to avoid scooping someone else's
-work. Reach out if uncertain. Authors, paper references (where available) and
-links to GenBank entries are provided in the metadata file.
-
-A faster build process can be run working from example data by copying over
-sequences and metadata from `example_data/` to `data/` via:
-```
-mkdir -p data/
-cp -v example_data/* data/
-```
-
-[Nextstrain]: https://nextstrain.org
-[fauna]: https://github.com/nextstrain/fauna
-[augur]: https://github.com/nextstrain/augur
-[auspice]: https://github.com/nextstrain/auspice
-[snakemake cli]: https://snakemake.readthedocs.io/en/stable/executable.html#all-options
-[nextstrain-cli]: https://github.com/nextstrain/cli
-[nextstrain-cli README]: https://github.com/nextstrain/cli/blob/master/README.md
-[quickstart guide]: https://nextstrain.org/docs/getting-started/quickstart
+- [Contributor documentation](./CONTRIBUTING.md)
diff --git a/phylogenetic/README.md b/phylogenetic/README.md
new file mode 100644
index 0000000..568bb03
--- /dev/null
+++ b/phylogenetic/README.md
@@ -0,0 +1,88 @@
+# nextstrain.org/zika
+
+This is the [Nextstrain](https://nextstrain.org) build for Zika, visible at
+[nextstrain.org/zika](https://nextstrain.org/zika).
+
+The build encompasses fetching data, preparing it for analysis, doing quality
+control, performing analyses, and saving the results in a format suitable for
+visualization (with [auspice][]).  This involves running components of
+Nextstrain such as [fauna][] and [augur][].
+
+All Zika-specific steps and functionality for the Nextstrain pipeline should be
+housed in this repository.
+
+_This build requires Augur v6._
+
+[![Build Status](https://github.com/nextstrain/zika/actions/workflows/ci.yaml/badge.svg?branch=main)](https://github.com/nextstrain/zika/actions/workflows/ci.yaml)
+
+## Usage
+
+If you're unfamiliar with Nextstrain builds, you may want to follow our
+[quickstart guide][] first and then come back here.
+
+There are two main ways to run and visualize the output from this build:
+
+The first, and easiest, way to run this pathogen build is using the [Nextstrain
+command-line tool][nextstrain-cli]:
+```
+nextstrain build .
+nextstrain view auspice/
+```
+
+See the [nextstrain-cli README][] for how to install the `nextstrain` command.
+
+The second is to install augur & auspice using conda, following [these instructions](https://nextstrain.org/docs/getting-started/local-installation#install-augur--auspice-with-conda-recommended).
+The build may then be run via:
+```
+snakemake
+auspice --datasetDir auspice/
+```
+
+Build output goes into the directories `data/`, `results/` and `auspice/`.
+
+## Configuration
+
+Configuration takes place entirely within the `Snakefile`. It can be read top-to-bottom: each rule
+specifies its file inputs and outputs as well as its parameters. There is little indirection, and each
+rule should be understandable on its own.
+
+
+## Input data
+
+This build starts by downloading sequences from
+https://data.nextstrain.org/files/zika/sequences.fasta.xz
+and metadata from
+https://data.nextstrain.org/files/zika/metadata.tsv.gz.
+These data are publicly provisioned by the Nextstrain team, who pull sequences
+from NCBI GenBank via ViPR and perform
+[additional bespoke curation](https://github.com/nextstrain/fauna/blob/master/builds/ZIKA.md).
+
+Data from GenBank follows Open Data principles, such that we can make input data
+and intermediate files available for further analysis. Open Data is data that
+can be freely used, re-used and redistributed by anyone - subject only, at most,
+to the requirement to attribute and sharealike.
+
+We gratefully acknowledge the authors, originating and submitting laboratories
+of the genetic sequences and metadata for sharing their work in open databases.
+Please note that although data generators have generously shared data in an open
+fashion, that does not mean there should be free license to publish on this
+data. Data generators should be cited where possible and collaborations should
+be sought in some circumstances. Please try to avoid scooping someone else's
+work. Reach out if uncertain. Authors, paper references (where available) and
+links to GenBank entries are provided in the metadata file.
+
+For a faster build, work from the example data by copying the
+sequences and metadata from `example_data/` to `data/` via:
+```
+mkdir -p data/
+cp -v example_data/* data/
+```
+
+[Nextstrain]: https://nextstrain.org
+[fauna]: https://github.com/nextstrain/fauna
+[augur]: https://github.com/nextstrain/augur
+[auspice]: https://github.com/nextstrain/auspice
+[snakemake cli]: https://snakemake.readthedocs.io/en/stable/executable.html#all-options
+[nextstrain-cli]: https://github.com/nextstrain/cli
+[nextstrain-cli README]: https://github.com/nextstrain/cli/blob/master/README.md
+[quickstart guide]: https://nextstrain.org/docs/getting-started/quickstart
diff --git a/Snakefile b/phylogenetic/Snakefile
similarity index 100%
rename from Snakefile
rename to phylogenetic/Snakefile
diff --git a/config/auspice_config.json b/phylogenetic/config/auspice_config.json
similarity index 100%
rename from config/auspice_config.json
rename to phylogenetic/config/auspice_config.json
diff --git a/config/colors.tsv b/phylogenetic/config/colors.tsv
similarity index 100%
rename from config/colors.tsv
rename to phylogenetic/config/colors.tsv
diff --git a/config/config_zika.yaml b/phylogenetic/config/config_zika.yaml
similarity index 100%
rename from config/config_zika.yaml
rename to phylogenetic/config/config_zika.yaml
diff --git a/config/description.md b/phylogenetic/config/description.md
similarity index 100%
rename from config/description.md
rename to phylogenetic/config/description.md
diff --git a/config/dropped_strains.txt b/phylogenetic/config/dropped_strains.txt
similarity index 100%
rename from config/dropped_strains.txt
rename to phylogenetic/config/dropped_strains.txt
diff --git a/config/zika_reference.gb b/phylogenetic/config/zika_reference.gb
similarity index 100%
rename from config/zika_reference.gb
rename to phylogenetic/config/zika_reference.gb
diff --git a/example_data/metadata.tsv b/phylogenetic/example_data/metadata.tsv
similarity index 100%
rename from example_data/metadata.tsv
rename to phylogenetic/example_data/metadata.tsv
diff --git a/example_data/sequences.fasta b/phylogenetic/example_data/sequences.fasta
similarity index 100%
rename from example_data/sequences.fasta
rename to phylogenetic/example_data/sequences.fasta
diff --git a/scripts/check-countries-have-colors.sh b/phylogenetic/scripts/check-countries-have-colors.sh
similarity index 100%
rename from scripts/check-countries-have-colors.sh
rename to phylogenetic/scripts/check-countries-have-colors.sh
diff --git a/scripts/set_final_strain_name.py b/phylogenetic/scripts/set_final_strain_name.py
similarity index 100%
rename from scripts/set_final_strain_name.py
rename to phylogenetic/scripts/set_final_strain_name.py