This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Move phylogenetic workflow from top-level to folder phylogenetic
Showing 13 changed files with 95 additions and 83 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,88 +1,12 @@ | ||
# nextstrain.org/zika | ||
# Nextstrain repository for Zika virus | ||
|
||
This is the [Nextstrain](https://nextstrain.org) build for Zika, visible at | ||
[nextstrain.org/zika](https://nextstrain.org/zika). | ||
This repository contains two workflows for the analysis of Zika virus data: | ||
|
||
The build encompasses fetching data, preparing it for analysis, doing quality | ||
control, performing analyses, and saving the results in a format suitable for | ||
visualization (with [auspice][]). This involves running components of | ||
Nextstrain such as [fauna][] and [augur][]. | ||
- [`ingest/`](./ingest) - Download data from GenBank, clean and curate it and upload it to S3 | ||
- [`phylogenetic/`](./phylogenetic) - Make phylogenetic trees for nextstrain.org | ||
|
||
All Zika-specific steps and functionality for the Nextstrain pipeline should be | ||
housed in this repository. | ||
Each folder contains a README.md with more information. | ||
|
||
_This build requires Augur v6._ | ||
## Documentation | ||
|
||
[![Build Status](https://github.com/nextstrain/zika/actions/workflows/ci.yaml/badge.svg?branch=main)](https://github.com/nextstrain/zika/actions/workflows/ci.yaml) | ||
|
||
## Usage | ||
|
||
If you're unfamiliar with Nextstrain builds, you may want to follow our | ||
[quickstart guide][] first and then come back here. | ||
|
||
There are two main ways to run & visualise the output from this build: | ||
|
||
The first, and easiest, way to run this pathogen build is using the [Nextstrain | ||
command-line tool][nextstrain-cli]: | ||
``` | ||
nextstrain build . | ||
nextstrain view auspice/ | ||
``` | ||
|
||
See the [nextstrain-cli README][] for how to install the `nextstrain` command. | ||
|
||
The second is to install augur & auspice using conda, following [these instructions](https://nextstrain.org/docs/getting-started/local-installation#install-augur--auspice-with-conda-recommended). | ||
The build may then be run via: | ||
``` | ||
snakemake | ||
auspice --datasetDir auspice/ | ||
``` | ||
|
||
Build output goes into the directories `data/`, `results/` and `auspice/`. | ||
|
||
## Configuration | ||
|
||
Configuration takes place entirely with the `Snakefile`. This can be read top-to-bottom, each rule | ||
specifies its file inputs and output and also its parameters. There is little redirection and each | ||
rule should be able to be reasoned with on its own. | ||
|
||
|
||
## Input data | ||
|
||
This build starts by downloading sequences from | ||
https://data.nextstrain.org/files/zika/sequences.fasta.xz | ||
and metadata from | ||
https://data.nextstrain.org/files/zika/metadata.tsv.gz. | ||
These are publicly provisioned data by the Nextstrain team by pulling sequences | ||
from NCBI GenBank via ViPR and performing | ||
[additional bespoke curation](https://github.com/nextstrain/fauna/blob/master/builds/ZIKA.md). | ||
|
||
Data from GenBank follows Open Data principles, such that we can make input data | ||
and intermediate files available for further analysis. Open Data is data that | ||
can be freely used, re-used and redistributed by anyone - subject only, at most, | ||
to the requirement to attribute and sharealike. | ||
|
||
We gratefully acknowledge the authors, originating and submitting laboratories | ||
of the genetic sequences and metadata for sharing their work in open databases. | ||
Please note that although data generators have generously shared data in an open | ||
fashion, that does not mean there should be free license to publish on this | ||
data. Data generators should be cited where possible and collaborations should | ||
be sought in some circumstances. Please try to avoid scooping someone else's | ||
work. Reach out if uncertain. Authors, paper references (where available) and | ||
links to GenBank entries are provided in the metadata file. | ||
|
||
A faster build process can be run working from example data by copying over | ||
sequences and metadata from `example_data/` to `data/` via: | ||
``` | ||
mkdir -p data/ | ||
cp -v example_data/* data/ | ||
``` | ||
|
||
[Nextstrain]: https://nextstrain.org | ||
[fauna]: https://github.com/nextstrain/fauna | ||
[augur]: https://github.com/nextstrain/augur | ||
[auspice]: https://github.com/nextstrain/auspice | ||
[snakemake cli]: https://snakemake.readthedocs.io/en/stable/executable.html#all-options | ||
[nextstrain-cli]: https://github.com/nextstrain/cli | ||
[nextstrain-cli README]: https://github.com/nextstrain/cli/blob/master/README.md | ||
[quickstart guide]: https://nextstrain.org/docs/getting-started/quickstart | ||
- [Contributor documentation](./CONTRIBUTING.md) |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,88 @@ | ||
# nextstrain.org/zika | ||
|
||
This is the [Nextstrain](https://nextstrain.org) build for Zika, visible at | ||
[nextstrain.org/zika](https://nextstrain.org/zika). | ||
|
||
The build encompasses fetching data, preparing it for analysis, doing quality | ||
control, performing analyses, and saving the results in a format suitable for | ||
visualization (with [auspice][]). This involves running components of | ||
Nextstrain such as [fauna][] and [augur][]. | ||
|
||
All Zika-specific steps and functionality for the Nextstrain pipeline should be | ||
housed in this repository. | ||
|
||
_This build requires Augur v6._ | ||
|
||
[![Build Status](https://github.com/nextstrain/zika/actions/workflows/ci.yaml/badge.svg?branch=main)](https://github.com/nextstrain/zika/actions/workflows/ci.yaml) | ||
|
||
## Usage | ||
|
||
If you're unfamiliar with Nextstrain builds, you may want to follow our | ||
[quickstart guide][] first and then come back here. | ||
|
||
There are two main ways to run & visualise the output from this build: | ||
|
||
The first, and easiest, way to run this pathogen build is using the [Nextstrain | ||
command-line tool][nextstrain-cli]: | ||
``` | ||
nextstrain build . | ||
nextstrain view auspice/ | ||
``` | ||
|
||
See the [nextstrain-cli README][] for how to install the `nextstrain` command. | ||
|
||
The second is to install augur & auspice using conda, following [these instructions](https://nextstrain.org/docs/getting-started/local-installation#install-augur--auspice-with-conda-recommended). | ||
The build may then be run via: | ||
``` | ||
snakemake | ||
auspice --datasetDir auspice/ | ||
``` | ||
|
||
Build output goes into the directories `data/`, `results/` and `auspice/`. | ||
|
||
## Configuration | ||
|
||
Configuration takes place entirely with the `Snakefile`. This can be read top-to-bottom, each rule | ||
specifies its file inputs and output and also its parameters. There is little redirection and each | ||
rule should be able to be reasoned with on its own. | ||
|
||
|
||
## Input data | ||
|
||
This build starts by downloading sequences from | ||
https://data.nextstrain.org/files/zika/sequences.fasta.xz | ||
and metadata from | ||
https://data.nextstrain.org/files/zika/metadata.tsv.gz. | ||
These are publicly provisioned data by the Nextstrain team by pulling sequences | ||
from NCBI GenBank via ViPR and performing | ||
[additional bespoke curation](https://github.com/nextstrain/fauna/blob/master/builds/ZIKA.md). | ||
|
||
Data from GenBank follows Open Data principles, such that we can make input data | ||
and intermediate files available for further analysis. Open Data is data that | ||
can be freely used, re-used and redistributed by anyone - subject only, at most, | ||
to the requirement to attribute and sharealike. | ||
|
||
We gratefully acknowledge the authors, originating and submitting laboratories | ||
of the genetic sequences and metadata for sharing their work in open databases. | ||
Please note that although data generators have generously shared data in an open | ||
fashion, that does not mean there should be free license to publish on this | ||
data. Data generators should be cited where possible and collaborations should | ||
be sought in some circumstances. Please try to avoid scooping someone else's | ||
work. Reach out if uncertain. Authors, paper references (where available) and | ||
links to GenBank entries are provided in the metadata file. | ||
|
||
A faster build process can be run working from example data by copying over | ||
sequences and metadata from `example_data/` to `data/` via: | ||
``` | ||
mkdir -p data/ | ||
cp -v example_data/* data/ | ||
``` | ||
|
||
[Nextstrain]: https://nextstrain.org | ||
[fauna]: https://github.com/nextstrain/fauna | ||
[augur]: https://github.com/nextstrain/augur | ||
[auspice]: https://github.com/nextstrain/auspice | ||
[snakemake cli]: https://snakemake.readthedocs.io/en/stable/executable.html#all-options | ||
[nextstrain-cli]: https://github.com/nextstrain/cli | ||
[nextstrain-cli README]: https://github.com/nextstrain/cli/blob/master/README.md | ||
[quickstart guide]: https://nextstrain.org/docs/getting-started/quickstart |
11 files renamed without changes.
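For readers who want to reproduce this kind of restructuring, the following is a minimal sketch of moving top-level workflow files into a `phylogenetic/` folder with `git mv`, so that Git records them as renames. It runs in a throwaway repository. The file names (`Snakefile`, `rules/example.smk`) and their contents are placeholders chosen for illustration, not the actual list of files touched by this commit.

```shell
# Sketch only: file names and contents are hypothetical placeholders.
set -eu

# Work in a temporary repository so nothing real is modified.
repo="$(mktemp -d)"
cd "$repo"
git init -q .
git config user.email "example@example.org"
git config user.name "Example"

# Create a stand-in top-level workflow layout (non-empty files, since
# Git's rename detection does not pair up empty files).
mkdir -p rules
printf 'rule all:\n    input: "auspice/zika.json"\n' > Snakefile
printf '# placeholder rule file\n' > rules/example.smk
git add .
git commit -q -m "Initial top-level layout"

# Move the workflow into phylogenetic/ without changing file contents.
mkdir -p phylogenetic
git mv Snakefile phylogenetic/Snakefile
git mv rules phylogenetic/rules
git commit -q -m "Move phylogenetic workflow from top-level to folder phylogenetic"

# Summarise the commit; -M asks Git to report moved files as renames.
git --no-pager show --stat --summary -M HEAD
```

Because the contents are untouched, a diff viewer can report each moved file as renamed without changes rather than as a deletion plus an addition.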