Update glossary and graphviz for repo/workflows
With the push for pathogen repos to adhere to the pathogen-repo-guide,
the glossary and graphviz for repositories and workflows need to be
updated.

This will also make it easier to standardize terminology in the
upcoming ingest tutorials.
joverlee521 committed Mar 2, 2024
1 parent 2ca0b72 commit a17497c
Showing 2 changed files with 100 additions and 33 deletions.
111 changes: 86 additions & 25 deletions src/learn/parts.rst
@@ -165,9 +165,18 @@ colloquially because they use a generic data format called JSON.
metadata -> filter;
}

Builds run several commands and are often automated by workflow managers such as `Snakemake <https://snakemake.readthedocs.io>`__, `Nextflow <https://nextflow.io>`__ and `WDL <https://openwdl.org>`__. A :term:`workflow` bundles one or more related :term:`builds<build>` which each produce a :term:`dataset` for visualization with :term:`Auspice`.
Builds run several commands and are often automated by workflow managers such as
`Snakemake <https://snakemake.readthedocs.io>`__, `Nextflow <https://nextflow.io>`__
and `WDL <https://openwdl.org>`__. A :term:`workflow` can bundle one or more related
:term:`builds<build>` which each produce a :term:`dataset` for visualization with :term:`Auspice`.

As an example, our core workflows are organized as `Git repositories <https://git-scm.com>`__ hosted on `GitHub <https://github.com/nextstrain>`__. Each contains a :doc:`Snakemake workflow </guides/bioinformatics/augur_snakemake>` using Augur, configuration, and data.
A workflow can also produce outputs that are not limited to Auspice datasets. For example,
ingest workflows produce curated metadata and sequences files and Nextclade workflows
produce :term:`Nextclade datasets<Nextclade dataset>`.

Our :term:`pathogen repositories<pathogen repository>` are organized as `Git repositories <https://git-scm.com>`__
hosted on `GitHub <https://github.com/nextstrain>`__. Each repository can contain
one or more workflows.

.. graphviz::
:align: center
@@ -176,44 +185,96 @@ As an example, our core workflows are organized as `Git repositories <https://gi
graph [
fontname="Lato, 'Helvetica Neue', sans-serif",
fontsize=12,
]
];
node [
shape=box,
style="rounded, filled",
fontname="Lato, 'Helvetica Neue', sans-serif",
fontsize=12,
height=0.1,
colorscheme=paired10,
pad=0.1,
margin=0.1,
];
rankdir=LR
rankdir=LR;

subgraph cluster_ncov {
label = "SARS-CoV-2 repository";
subgraph cluster_ncov_phylo {
label = "Phylogenetic workflow";
build0 [width=1, label="Global build"];
build1 [width=1, label="Africa build"];
build2 [width=1, label="Europe build"];
output0 [width=1, label="dataset"];
output1 [width=1, label="dataset"];
output2 [width=1, label="dataset"];
ellipses1 [width=1, label="...", penwidth=0, fillcolor="white"];
ellipses2 [width=1, label="...", penwidth=0, fillcolor="white"];
}
}

subgraph cluster_0 {
label = "Zika workflow";
build0 [width=1, label="Zika build"]
dataset0 [width=1, label="dataset"]
subgraph cluster_zika {
label = "Zika repository";
nojustify = true;
subgraph cluster_zika_ingest {
label = "Ingest workflow";
build3 [width=1, label="ingest build"];
output3 [width=1, label="output files"];
}
subgraph cluster_zika_phylo {
label = "Phylogenetic workflow";
build4 [width=1, label="phylogenetic build"];
output4 [width=1, label="dataset"];
}
}

subgraph cluster_1 {
label = "SARS-CoV-2 workflow";
build1 [width=1, label="Global build"]
build2 [width=1, label="Africa build"]
build3 [width=1, label="Europe build"]
dataset1 [width=1, label="dataset"]
dataset2 [width=1, label="dataset"]
dataset3 [width=1, label="dataset"]
ellipses1 [width=1, label="...", penwidth=0, fillcolor="white"]
ellipses2 [width=1, label="...", penwidth=0, fillcolor="white"]
subgraph cluster_mpox {
label = "Mpox repository";
subgraph cluster_mpox_ingest {
label = "Ingest workflow";
build5 [width=1, label="ingest build"];
output5 [width=1, label="output files"];
}
subgraph cluster_mpox_phylo {
label = "Phylogenetic workflow";
build6 [width=1, label="mpxv build"];
build7 [width=1, label="hmpxv1 build"];
build8 [width=1, label="hmpxv1_big build"];
output6 [width=1, label="dataset"];
output7 [width=1, label="dataset"];
output8 [width=1, label="dataset"];

}
subgraph cluster_mpox_nextclade {
label = "Nextclade workflow";
build9 [width=1, label="all-clades build"];
build10 [width=1, label="clade-iib build"];
build11 [width=1, label="lineage-b.1 build"];
output9 [width=1, label="nextclade dataset"];
output10 [width=1, label="nextclade dataset"];
output11 [width=1, label="nextclade dataset"];

}
}

build0 -> dataset0
build1 -> dataset1
build2 -> dataset2
build3 -> dataset3
build0 -> output0;
build1 -> output1;
build2 -> output2;
build3 -> output3;
build4 -> output4;
build5 -> output5;
build6 -> output6;
build7 -> output7;
build8 -> output8;
build9 -> output9;
build10 -> output10;
build11 -> output11;

{
edge[style=invis]
dataset0 -> build1 // arrange clusters on same row
ellipses1 -> ellipses2
edge[style=invis];
output0 -> build3; // arrange clusters on same row
output3 -> build5; // arrange clusters on same row
ellipses1 -> ellipses2;
}
}

22 changes: 14 additions & 8 deletions src/reference/glossary.rst
@@ -12,10 +12,16 @@ Glossary

A web application used for phylogenetic visualization and analysis. :doc:`Documentation<auspice:index>`

pathogen repository

A version-controlled folder containing all files necessary to run a pathogen's :term:`workflows<workflow>`.

workflow
also *pathogen workflow*, *pathogen analysis*, *Nextstrain workflow*
also *Nextstrain workflow*

A reproducible process comprised of one or more :term:`builds<build>` producing :term:`datasets<dataset>`, which can be visualized by :term:`Auspice`. Implementation varies per workflow, but generally they are run by workflow managers such as Snakemake.
A reproducible process comprised of one or more :term:`builds<build>` producing outputs.
The outputs produced are often :term:`datasets<dataset>`, which can be visualized by :term:`Auspice`.
Implementation varies per workflow, but generally they are run by workflow managers such as Snakemake.

Our :term:`core workflows<core workflow>` can be divided into two types:

@@ -30,15 +36,11 @@ Glossary

A :term:`workflow` maintained by the Nextstrain team.

workflow repository
also *pathogen workflow repository*

A version-controlled folder containing all files necessary to run a :term:`workflow`.

build
also *Nextstrain build*

*(noun)* A sequence of commands, parameters and input files which work together to reproducibly execute bioinformatic analyses and generate a :term:`dataset` for visualization with :term:`Auspice`.
*(noun)* A sequence of commands, parameters and input files which work together to reproducibly generate outputs.
Phylogenetic builds execute bioinformatic analyses and generate a :term:`dataset` for visualization with :term:`Auspice`.

build (verb)

@@ -60,6 +62,10 @@ Glossary

Some :term:`workflows<workflow>` produce a single, synonymous dataset, like Zika. Others, like seasonal flu, produce many datasets.

Nextclade dataset

A collection of input data files required for :doc:`Nextclade<nextclade:index>` to run an analysis. :doc:`Documentation<nextclade:user/datasets>`

narrative

A method of data-driven storytelling with interactive views of :term:`datasets <dataset>` displayed alongside multiple pages (or slides) of text and images.
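
The terminology this commit settles on (a pathogen repository contains one or more workflows, each workflow bundles one or more builds, and each build produces an output) can be read as a small nested data structure. The following is only an illustrative sketch of that hierarchy, using the Mpox repository from the updated diagram as sample data. The class names `PathogenRepository`, `Workflow`, and `Build` and the `outputs` method are hypothetical and do not correspond to any Nextstrain or Augur API.

```python
from dataclasses import dataclass, field


@dataclass
class Build:
    """One build and the kind of output it produces (hypothetical model)."""
    name: str
    output: str  # e.g. "dataset", "nextclade dataset", "output files"


@dataclass
class Workflow:
    """A workflow bundles one or more related builds."""
    name: str
    builds: list[Build] = field(default_factory=list)


@dataclass
class PathogenRepository:
    """A repository can contain one or more workflows."""
    name: str
    workflows: list[Workflow] = field(default_factory=list)

    def outputs(self) -> list[tuple[str, str, str]]:
        """Flatten the hierarchy into (workflow, build, output) triples."""
        return [
            (workflow.name, build.name, build.output)
            for workflow in self.workflows
            for build in workflow.builds
        ]


# Sample data copied from the labels in the Mpox cluster of the diagram.
mpox = PathogenRepository(
    name="Mpox repository",
    workflows=[
        Workflow("Ingest workflow", [Build("ingest build", "output files")]),
        Workflow(
            "Phylogenetic workflow",
            [
                Build("mpxv build", "dataset"),
                Build("hmpxv1 build", "dataset"),
                Build("hmpxv1_big build", "dataset"),
            ],
        ),
        Workflow(
            "Nextclade workflow",
            [
                Build("all-clades build", "nextclade dataset"),
                Build("clade-iib build", "nextclade dataset"),
                Build("lineage-b.1 build", "nextclade dataset"),
            ],
        ),
    ],
)

for workflow_name, build_name, output in mpox.outputs():
    print(f"{mpox.name} / {workflow_name} / {build_name} -> {output}")
```

Keeping the output type on the build rather than on the workflow mirrors the revised glossary entries, where a build "reproducibly generate[s] outputs" and only phylogenetic builds are said to produce an Auspice dataset.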