Run the workflow using custom data
==================================

In this tutorial, you will run the workflow using custom focal data in addition to the example reference data. The reference data will serve as background context for the new data.

.. contents:: Table of Contents
   :local:

Prerequisites
-------------

1. :doc:`example-data`. These instructions will set up the command line environment used in this tutorial.
2. You have a GISAID account. `Register <https://www.gisaid.org/registration/register/>`__ if you do not have an account yet. Note that registration may take a few days; if you wish to continue this tutorial in the meantime, follow `alternative data preparation methods <../guides/data-prep.html>`__ in place of the **Curate data from GISAID** section below.

Setup
-----

If you are not already there, change directory to the ``ncov`` directory:

.. code:: text

   cd ncov

Curate data from GISAID
-----------------------

We will retrieve 10 sequences from GISAID's EpiCoV database.

1. Navigate to `GISAID <https://www.gisaid.org/>`__ and select **Login**.

   .. image:: ../images/gisaid-homepage.png
      :width: 400
      :alt: GISAID login link

2. Log in to your GISAID account.

   .. image:: ../images/gisaid-login.png
      :width: 200
      :alt: GISAID login

3. In the top left navigation bar, select **EpiCoV** then **Search**.

   .. image:: ../images/gisaid-epicov-search.png
      :width: 400
      :alt: GISAID EpiCoV Search

4. Select the first 10 sequences.

   .. image:: ../images/gisaid-select-sequences.png
      :width: 700
      :alt: GISAID EpiCoV Search

5. Select **Download** in the bottom right of the search results.
6. Select **Input for the Augur pipeline** as the download format.

   .. image:: ../images/gisaid-augur-pipeline-download.png
      :width: 400
      :alt: GISAID EpiCoV Search

   .. note::

      You may see a different set of download options; this is fine as long as **Input for the Augur pipeline** is available.

7. Select **Download**.
8. Download or move the ``.tar`` file into the ``ncov/data/`` directory.
9. Extract the archive by opening the downloaded ``.tar`` file in your file explorer. It contains two files: one ending with ``.metadata.tsv`` and another with ``.sequences.fasta``.
10. Rename the files to ``custom.metadata.tsv`` and ``custom.sequences.fasta``. (If you prefer the command line, see the sketch after this list.)
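If you prefer to handle steps 8–10 from the command line instead of a file explorer, the following is a minimal sketch. The archive name and the ``~/Downloads/`` location are assumptions (the exact GISAID filename varies), so adjust the glob patterns and paths to match your system.

.. code:: text

   # Move the downloaded archive into the ncov data directory (download location is an assumption).
   mv ~/Downloads/gisaid_auspice_input_*.tar data/

   # Extract the archive into data/; it contains one *.metadata.tsv and one *.sequences.fasta file.
   tar -xvf data/gisaid_auspice_input_*.tar -C data/

   # Rename the extracted files to the names used in this tutorial.
   mv data/*.metadata.tsv data/custom.metadata.tsv
   mv data/*.sequences.fasta data/custom.sequences.fasta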
Run the workflow
----------------

From within the ``ncov/`` directory, run the ``ncov`` workflow using a pre-written ``--configfile``:

.. code:: text

   nextstrain build . --cores all --configfile ncov-tutorial/custom-data.yaml

Break down the command
~~~~~~~~~~~~~~~~~~~~~~

The workflow can take several minutes to run. While it is running, you can investigate the contents of ``custom-data.yaml`` (comments excluded):

.. code-block:: yaml

   inputs:
     - name: reference_data
       metadata: https://data.nextstrain.org/files/ncov/open/reference/metadata.tsv.xz
       sequences: https://data.nextstrain.org/files/ncov/open/reference/sequences.fasta.xz
     - name: custom_data
       metadata: data/custom.metadata.tsv
       sequences: data/custom.sequences.fasta
   refine:
     root: "Wuhan-Hu-1/2019"
   builds:
     custom-build:
       title: "Build with custom data and example data"
       subsampling_scheme: all
       auspice_config: ncov-tutorial/auspice-config-custom-data.json

This is the same as the previous file, with some additions:

1. A second input for the custom data, referencing the metadata and sequences files downloaded from GISAID.
2. A ``builds`` section that defines one output :term:`docs.nextstrain.org:dataset` using:

   1. A custom name ``custom-build``
   2. A custom title ``Build with custom data and example data``
   3. A pre-defined subsampling scheme ``all`` (TODO: add doc link)
   4. An Auspice config file with the contents:

      .. code-block:: json

         {
           "colorings": [
             {
               "key": "custom_data",
               "title": "Custom data",
               "type": "categorical"
             }
           ],
           "display_defaults": {
             "color_by": "custom_data"
           }
         }

This JSON does two things:

1. Creates a new coloring ``custom_data``, which reflects a special metadata column generated by the ncov workflow. Each data input produces a new column in the final metadata with categorical values ``yes`` or ``no`` representing whether the sequence came from that input (see the illustrative sketch after this list).
2. Sets the default Color By to the new ``custom_data`` coloring.
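To make the first point concrete, here is a hypothetical sketch of what the relevant columns of the combined metadata might look like after the workflow merges both inputs. The strain names and dates are made up for illustration, and the real metadata contains many more columns:

.. code-block:: text

   strain                          date        reference_data  custom_data
   Example/reference-strain/2020   2020-03-01  yes             no
   Example/custom-strain/2021      2021-05-02  no              yes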
.. note::

   **Build** is a widely used term with various meanings. In the context of the ncov workflow, the ``builds:`` section defines output :term:`datasets <docs.nextstrain.org:dataset>` to be generated by the workflow (i.e. "build" a dataset).

Visualize the results
---------------------

Run this command to start the :term:`docs.nextstrain.org:Auspice` server, providing ``auspice/`` as the directory containing output dataset files:

.. code:: text

   nextstrain view auspice/

Navigate to ``http://127.0.0.1:4000/ncov/custom-build``. The resulting :term:`docs.nextstrain.org:dataset` should have a phylogeny similar to the previous dataset, with additional sequences:

.. figure:: ../images/dataset-custom-data-highlighted.png
   :alt: Phylogenetic tree from the "custom data" tutorial as visualized in Auspice

1. The custom dataset name ``custom-build`` can be seen in the dataset selector as well as in the dataset URL.
2. The custom dataset title can be seen at the top of the page.
3. The custom coloring is used by default. You can see which sequences are from the custom data added in this tutorial.

.. note::

   You may not see all 10 custom sequences; some may be filtered out by quality checks built into the ncov workflow.
Run the workflow using example data
===================================

The aim of this first tutorial is to introduce our SARS-CoV-2 workflow.
To do this, we will run the workflow using a small set of reference data which we provide.
This tutorial leads on to subsequent tutorials where we will walk through more complex scenarios.

.. contents:: Table of Contents
   :local:

Prerequisites
-------------

1. :doc:`setup`. These instructions will install all of the software you need to complete this tutorial and others.

Setup
-----

1. Activate the ``nextstrain`` conda environment:

   .. code:: text

      conda activate nextstrain

2. Change directory to the ``ncov`` directory:

   .. code:: text

      cd ncov

3. Download the example tutorial repository into a new directory ``ncov-tutorial/``:

   .. code:: text

      git clone https://github.com/nextstrain/ncov-tutorial

Run the workflow
----------------

From within the ``ncov/`` directory, run the ``ncov`` workflow using a configuration file provided in the tutorial directory:

.. code:: text

   nextstrain build . --cores all --configfile ncov-tutorial/example-data.yaml

Break down the command
~~~~~~~~~~~~~~~~~~~~~~

The workflow can take several minutes to run. While it is running, you can learn about the parts of this command:

- ``nextstrain build .``

  - This tells the :term:`docs.nextstrain.org:Nextstrain CLI` to :term:`build <docs.nextstrain.org:build (verb)>` the workflow from ``.``, the current directory. All subsequent command-line parameters are passed to the workflow manager, Snakemake.

- ``--cores all``

  - This required Snakemake parameter specifies the number of CPU cores to use (`more info <https://snakemake.readthedocs.io/en/stable/executing/cli.html>`_).

- ``--configfile ncov-tutorial/example-data.yaml``

  - ``--configfile`` is another Snakemake parameter used to configure the ncov workflow.
  - ``ncov-tutorial/example-data.yaml`` is a YAML file which provides custom workflow configuration, including inputs and outputs. Contents with comments excluded:

    .. code-block:: yaml

       inputs:
         - name: reference_data
           metadata: https://data.nextstrain.org/files/ncov/open/reference/metadata.tsv.xz
           sequences: https://data.nextstrain.org/files/ncov/open/reference/sequences.fasta.xz
       refine:
         root: "Wuhan-Hu-1/2019"

This provides the workflow with one input named ``reference_data``, which is a small dataset maintained by the Nextstrain team. The metadata and sequences files are downloaded directly from the associated URLs. `See the complete list of SARS-CoV-2 datasets we provide through data.nextstrain.org <https://docs.nextstrain.org/projects/ncov/en/latest/reference/remote_inputs.html>`_.

The ``refine`` entry specifies the root sequence for the example GenBank data.

For more information, visit `the complete configuration guide <../reference/configuration.html>`_.

The workflow produces a new directory ``auspice/`` containing a file ``ncov_default-build.json``, which will be visualized in the following section. The workflow also produces intermediate files in a new ``results/`` directory.
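Once the run completes, a quick sanity check is to list these directories. This is a minimal sketch; the exact set of intermediate files in ``results/`` depends on the workflow configuration and is not enumerated here.

.. code:: text

   ls auspice/    # should include ncov_default-build.json
   ls results/    # intermediate files produced by the workflow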
Visualize the results
---------------------

Run this command to start the :term:`docs.nextstrain.org:Auspice` server, providing ``auspice/`` as the directory containing output dataset files:

.. code:: text

   nextstrain view auspice/

Navigate to ``http://127.0.0.1:4000/ncov/default-build``. The resulting :term:`docs.nextstrain.org:dataset` should show a phylogeny of ~200 sequences:

.. figure:: ../images/dataset-example-data.png
   :alt: Phylogenetic tree from the "example data" tutorial as visualized in Auspice

.. note::

   You can also view the results by dragging the file ``auspice/ncov_default-build.json`` onto `auspice.us <https://auspice.us>`__.