Simplify README instructions
j23414 committed Jan 12, 2024
1 parent 44825a3 commit aefdec1
Showing 1 changed file with 28 additions and 63 deletions.
91 changes: 28 additions & 63 deletions phylogenetic/README.md
@@ -3,86 +3,51 @@
This is the [Nextstrain](https://nextstrain.org) build for Zika, visible at
[nextstrain.org/zika](https://nextstrain.org/zika).

The build encompasses fetching data, preparing it for analysis, doing quality
control, performing analyses, and saving the results in a format suitable for
visualization (with [auspice][]). This involves running components of
Nextstrain such as [fauna][] and [augur][].
## Software requirements

All Zika-specific steps and functionality for the Nextstrain pipeline should be
housed in this repository.

_This build requires Augur v6._

[![Build Status](https://github.com/nextstrain/zika/actions/workflows/ci.yaml/badge.svg?branch=main)](https://github.com/nextstrain/zika/actions/workflows/ci.yaml)
Follow the [standard installation instructions](https://docs.nextstrain.org/en/latest/install.html) for Nextstrain's suite of software tools.

## Usage

If you're unfamiliar with Nextstrain builds, you may want to follow our
[quickstart guide][] first and then come back here.
[Running a Pathogen Workflow guide][] first and then come back here.

There are two main ways to run & visualise the output from this build:
The easiest way to run this pathogen build is using the Nextstrain
command-line tool:

The first, and easiest, way to run this pathogen build is using the [Nextstrain
command-line tool][nextstrain-cli]:
```
nextstrain build .
nextstrain view auspice/
```
nextstrain build .

See the [nextstrain-cli README][] for how to install the `nextstrain` command.
Build output goes into the directories `data/`, `results/` and `auspice/`.

The second is to install augur & auspice using conda, following [these instructions](https://nextstrain.org/docs/getting-started/local-installation#install-augur--auspice-with-conda-recommended).
The build may then be run via:
```
snakemake
auspice --datasetDir auspice/
```
Once you've run the build, you can view the results in auspice:

Build output goes into the directories `data/`, `results/` and `auspice/`.
nextstrain view auspice/

## Configuration

Configuration takes place entirely with the `Snakefile`. This can be read top-to-bottom, each rule
specifies its file inputs and output and also its parameters. There is little redirection and each
rule should be able to be reasoned with on its own.

### Using GenBank data

This build starts by pulling preprocessed sequence and metadata files from:

* https://data.nextstrain.org/files/zika/sequences.fasta.zst
* https://data.nextstrain.org/files/zika/metadata.tsv.zst

The above datasets have been preprocessed and cleaned from GenBank and are updated at regular intervals.

### Using example data

Alternatively, you can run the build using the
example data provided in this repository. To run the build by copying the
example sequences into the `data/` directory, use the following:

## Input data

This build starts by downloading sequences from
https://data.nextstrain.org/files/zika/sequences.fasta.xz
and metadata from
https://data.nextstrain.org/files/zika/metadata.tsv.gz.
These are publicly provisioned data by the Nextstrain team by pulling sequences
from NCBI GenBank via ViPR and performing
[additional bespoke curation](https://github.com/nextstrain/fauna/blob/master/builds/ZIKA.md).

Data from GenBank follows Open Data principles, such that we can make input data
and intermediate files available for further analysis. Open Data is data that
can be freely used, re-used and redistributed by anyone - subject only, at most,
to the requirement to attribute and sharealike.

We gratefully acknowledge the authors, originating and submitting laboratories
of the genetic sequences and metadata for sharing their work in open databases.
Please note that although data generators have generously shared data in an open
fashion, that does not mean there should be free license to publish on this
data. Data generators should be cited where possible and collaborations should
be sought in some circumstances. Please try to avoid scooping someone else's
work. Reach out if uncertain. Authors, paper references (where available) and
links to GenBank entries are provided in the metadata file.

A faster build process can be run working from example data by copying over
sequences and metadata from `example_data/` to `data/` via:
```
mkdir -p data/
cp -v example_data/* data/
```
nextstrain build . --configfile profiles/ci/profiles_config.yaml

[Nextstrain]: https://nextstrain.org
[fauna]: https://github.com/nextstrain/fauna
[augur]: https://github.com/nextstrain/augur
[auspice]: https://github.com/nextstrain/auspice
[snakemake cli]: https://snakemake.readthedocs.io/en/stable/executable.html#all-options
[nextstrain-cli]: https://github.com/nextstrain/cli
[nextstrain-cli README]: https://github.com/nextstrain/cli/blob/master/README.md
[quickstart guide]: https://nextstrain.org/docs/getting-started/quickstart
[augur]: https://docs.nextstrain.org/projects/augur/en/stable/
[auspice]: https://docs.nextstrain.org/projects/auspice/en/stable/index.html
[Installing Nextstrain guide]: https://docs.nextstrain.org/en/latest/install.html
[Running a Pathogen Workflow guide]: https://docs.nextstrain.org/en/latest/tutorials/running-a-workflow.html
