From 20eb826064c5441eccd96cef85810053df2fcdb2 Mon Sep 17 00:00:00 2001
From: Jennifer Chang
Date: Fri, 17 Nov 2023 10:18:45 -0800
Subject: [PATCH] Move phylogenetic workflow from top-level to folder phylogenetic

---
 README.md                                   | 90 ++-----------------
 phylogenetic/README.md                      | 88 ++++++++++++++++++
 Snakefile => phylogenetic/Snakefile         |  0
 .../config}/auspice_config.json             |  0
 {config => phylogenetic/config}/colors.tsv  |  0
 .../config}/config_zika.yaml                |  0
 .../config}/description.md                  |  0
 .../config}/dropped_strains.txt             |  0
 .../config}/zika_reference.gb               |  0
 .../example_data}/metadata.tsv              |  0
 .../example_data}/sequences.fasta           |  0
 .../scripts}/check-countries-have-colors.sh |  0
 .../scripts}/set_final_strain_name.py       |  0
 13 files changed, 95 insertions(+), 83 deletions(-)
 create mode 100644 phylogenetic/README.md
 rename Snakefile => phylogenetic/Snakefile (100%)
 rename {config => phylogenetic/config}/auspice_config.json (100%)
 rename {config => phylogenetic/config}/colors.tsv (100%)
 rename {config => phylogenetic/config}/config_zika.yaml (100%)
 rename {config => phylogenetic/config}/description.md (100%)
 rename {config => phylogenetic/config}/dropped_strains.txt (100%)
 rename {config => phylogenetic/config}/zika_reference.gb (100%)
 rename {example_data => phylogenetic/example_data}/metadata.tsv (100%)
 rename {example_data => phylogenetic/example_data}/sequences.fasta (100%)
 rename {scripts => phylogenetic/scripts}/check-countries-have-colors.sh (100%)
 rename {scripts => phylogenetic/scripts}/set_final_strain_name.py (100%)

diff --git a/README.md b/README.md
index 568bb03..ee1a6e6 100644
--- a/README.md
+++ b/README.md
@@ -1,88 +1,12 @@
-# nextstrain.org/zika
+# Nextstrain repository for Zika virus
 
-This is the [Nextstrain](https://nextstrain.org) build for Zika, visible at
-[nextstrain.org/zika](https://nextstrain.org/zika).
+This repository contains two workflows for the analysis of Zika virus data:
 
-The build encompasses fetching data, preparing it for analysis, doing quality
-control, performing analyses, and saving the results in a format suitable for
-visualization (with [auspice][]). This involves running components of
-Nextstrain such as [fauna][] and [augur][].
+- [`ingest/`](./ingest) - Download data from GenBank, clean and curate it and upload it to S3
+- [`phylogenetic/`](./phylogenetic) - Make phylogenetic trees for nextstrain.org
 
-All Zika-specific steps and functionality for the Nextstrain pipeline should be
-housed in this repository.
+Each folder contains a README.md with more information.
 
-_This build requires Augur v6._
+## Documentation
 
-[![Build Status](https://github.com/nextstrain/zika/actions/workflows/ci.yaml/badge.svg?branch=main)](https://github.com/nextstrain/zika/actions/workflows/ci.yaml)
-
-## Usage
-
-If you're unfamiliar with Nextstrain builds, you may want to follow our
-[quickstart guide][] first and then come back here.
-
-There are two main ways to run & visualise the output from this build:
-
-The first, and easiest, way to run this pathogen build is using the [Nextstrain
-command-line tool][nextstrain-cli]:
-```
-nextstrain build .
-nextstrain view auspice/
-```
-
-See the [nextstrain-cli README][] for how to install the `nextstrain` command.
-
-The second is to install augur & auspice using conda, following [these instructions](https://nextstrain.org/docs/getting-started/local-installation#install-augur--auspice-with-conda-recommended).
-The build may then be run via:
-```
-snakemake
-auspice --datasetDir auspice/
-```
-
-Build output goes into the directories `data/`, `results/` and `auspice/`.
-
-## Configuration
-
-Configuration takes place entirely with the `Snakefile`. This can be read top-to-bottom, each rule
-specifies its file inputs and output and also its parameters. There is little redirection and each
-rule should be able to be reasoned with on its own.
-
-
-## Input data
-
-This build starts by downloading sequences from
-https://data.nextstrain.org/files/zika/sequences.fasta.xz
-and metadata from
-https://data.nextstrain.org/files/zika/metadata.tsv.gz.
-These are publicly provisioned data by the Nextstrain team by pulling sequences
-from NCBI GenBank via ViPR and performing
-[additional bespoke curation](https://github.com/nextstrain/fauna/blob/master/builds/ZIKA.md).
-
-Data from GenBank follows Open Data principles, such that we can make input data
-and intermediate files available for further analysis. Open Data is data that
-can be freely used, re-used and redistributed by anyone - subject only, at most,
-to the requirement to attribute and sharealike.
-
-We gratefully acknowledge the authors, originating and submitting laboratories
-of the genetic sequences and metadata for sharing their work in open databases.
-Please note that although data generators have generously shared data in an open
-fashion, that does not mean there should be free license to publish on this
-data. Data generators should be cited where possible and collaborations should
-be sought in some circumstances. Please try to avoid scooping someone else's
-work. Reach out if uncertain. Authors, paper references (where available) and
-links to GenBank entries are provided in the metadata file.
-
-A faster build process can be run working from example data by copying over
-sequences and metadata from `example_data/` to `data/` via:
-```
-mkdir -p data/
-cp -v example_data/* data/
-```
-
-[Nextstrain]: https://nextstrain.org
-[fauna]: https://github.com/nextstrain/fauna
-[augur]: https://github.com/nextstrain/augur
-[auspice]: https://github.com/nextstrain/auspice
-[snakemake cli]: https://snakemake.readthedocs.io/en/stable/executable.html#all-options
-[nextstrain-cli]: https://github.com/nextstrain/cli
-[nextstrain-cli README]: https://github.com/nextstrain/cli/blob/master/README.md
-[quickstart guide]: https://nextstrain.org/docs/getting-started/quickstart
+- [Contributor documentation](./CONTRIBUTING.md)
diff --git a/phylogenetic/README.md b/phylogenetic/README.md
new file mode 100644
index 0000000..568bb03
--- /dev/null
+++ b/phylogenetic/README.md
@@ -0,0 +1,88 @@
+# nextstrain.org/zika
+
+This is the [Nextstrain](https://nextstrain.org) build for Zika, visible at
+[nextstrain.org/zika](https://nextstrain.org/zika).
+
+The build encompasses fetching data, preparing it for analysis, doing quality
+control, performing analyses, and saving the results in a format suitable for
+visualization (with [auspice][]). This involves running components of
+Nextstrain such as [fauna][] and [augur][].
+
+All Zika-specific steps and functionality for the Nextstrain pipeline should be
+housed in this repository.
+
+_This build requires Augur v6._
+
+[![Build Status](https://github.com/nextstrain/zika/actions/workflows/ci.yaml/badge.svg?branch=main)](https://github.com/nextstrain/zika/actions/workflows/ci.yaml)
+
+## Usage
+
+If you're unfamiliar with Nextstrain builds, you may want to follow our
+[quickstart guide][] first and then come back here.
+
+There are two main ways to run & visualise the output from this build:
+
+The first, and easiest, way to run this pathogen build is using the [Nextstrain
+command-line tool][nextstrain-cli]:
+```
+nextstrain build .
+nextstrain view auspice/
+```
+
+See the [nextstrain-cli README][] for how to install the `nextstrain` command.
+
+The second is to install augur & auspice using conda, following [these instructions](https://nextstrain.org/docs/getting-started/local-installation#install-augur--auspice-with-conda-recommended).
+The build may then be run via:
+```
+snakemake
+auspice --datasetDir auspice/
+```
+
+Build output goes into the directories `data/`, `results/` and `auspice/`.
+
+## Configuration
+
+Configuration takes place entirely with the `Snakefile`. This can be read top-to-bottom, each rule
+specifies its file inputs and output and also its parameters. There is little redirection and each
+rule should be able to be reasoned with on its own.
+
+
+## Input data
+
+This build starts by downloading sequences from
+https://data.nextstrain.org/files/zika/sequences.fasta.xz
+and metadata from
+https://data.nextstrain.org/files/zika/metadata.tsv.gz.
+These are publicly provisioned data by the Nextstrain team by pulling sequences
+from NCBI GenBank via ViPR and performing
+[additional bespoke curation](https://github.com/nextstrain/fauna/blob/master/builds/ZIKA.md).
+
+Data from GenBank follows Open Data principles, such that we can make input data
+and intermediate files available for further analysis. Open Data is data that
+can be freely used, re-used and redistributed by anyone - subject only, at most,
+to the requirement to attribute and sharealike.
+
+We gratefully acknowledge the authors, originating and submitting laboratories
+of the genetic sequences and metadata for sharing their work in open databases.
+Please note that although data generators have generously shared data in an open
+fashion, that does not mean there should be free license to publish on this
+data. Data generators should be cited where possible and collaborations should
+be sought in some circumstances. Please try to avoid scooping someone else's
+work. Reach out if uncertain. Authors, paper references (where available) and
+links to GenBank entries are provided in the metadata file.
+
+A faster build process can be run working from example data by copying over
+sequences and metadata from `example_data/` to `data/` via:
+```
+mkdir -p data/
+cp -v example_data/* data/
+```
+
+[Nextstrain]: https://nextstrain.org
+[fauna]: https://github.com/nextstrain/fauna
+[augur]: https://github.com/nextstrain/augur
+[auspice]: https://github.com/nextstrain/auspice
+[snakemake cli]: https://snakemake.readthedocs.io/en/stable/executable.html#all-options
+[nextstrain-cli]: https://github.com/nextstrain/cli
+[nextstrain-cli README]: https://github.com/nextstrain/cli/blob/master/README.md
+[quickstart guide]: https://nextstrain.org/docs/getting-started/quickstart
diff --git a/Snakefile b/phylogenetic/Snakefile
similarity index 100%
rename from Snakefile
rename to phylogenetic/Snakefile
diff --git a/config/auspice_config.json b/phylogenetic/config/auspice_config.json
similarity index 100%
rename from config/auspice_config.json
rename to phylogenetic/config/auspice_config.json
diff --git a/config/colors.tsv b/phylogenetic/config/colors.tsv
similarity index 100%
rename from config/colors.tsv
rename to phylogenetic/config/colors.tsv
diff --git a/config/config_zika.yaml b/phylogenetic/config/config_zika.yaml
similarity index 100%
rename from config/config_zika.yaml
rename to phylogenetic/config/config_zika.yaml
diff --git a/config/description.md b/phylogenetic/config/description.md
similarity index 100%
rename from config/description.md
rename to phylogenetic/config/description.md
diff --git a/config/dropped_strains.txt b/phylogenetic/config/dropped_strains.txt
similarity index 100%
rename from config/dropped_strains.txt
rename to phylogenetic/config/dropped_strains.txt
diff --git a/config/zika_reference.gb b/phylogenetic/config/zika_reference.gb
similarity index 100%
rename from config/zika_reference.gb
rename to phylogenetic/config/zika_reference.gb
diff --git a/example_data/metadata.tsv b/phylogenetic/example_data/metadata.tsv
similarity index 100%
rename from example_data/metadata.tsv
rename to phylogenetic/example_data/metadata.tsv
diff --git a/example_data/sequences.fasta b/phylogenetic/example_data/sequences.fasta
similarity index 100%
rename from example_data/sequences.fasta
rename to phylogenetic/example_data/sequences.fasta
diff --git a/scripts/check-countries-have-colors.sh b/phylogenetic/scripts/check-countries-have-colors.sh
similarity index 100%
rename from scripts/check-countries-have-colors.sh
rename to phylogenetic/scripts/check-countries-have-colors.sh
diff --git a/scripts/set_final_strain_name.py b/phylogenetic/scripts/set_final_strain_name.py
similarity index 100%
rename from scripts/set_final_strain_name.py
rename to phylogenetic/scripts/set_final_strain_name.py