Merge pull request #15 from nextstrain/additional-docs
Additional docs
Showing 26 changed files with 519 additions and 93 deletions.
@@ -0,0 +1,23 @@
# This configuration file should contain all required configuration parameters
# for the ingest workflow to run with additional Nextstrain automation rules.

# Custom rules to run as part of the Nextstrain automated workflow
# The paths should be relative to the ingest directory.
custom_rules:
  - profiles/nextstrain_automation/upload.smk

# Nextstrain CloudFront domain to ensure that we invalidate CloudFront after the S3 uploads
# This is required as long as we are using the AWS CLI for uploads
cloudfront_domain: "data.nextstrain.org"

# Nextstrain AWS S3 Bucket with pathogen prefix
# Replace <pathogen> with the pathogen repo name.
s3_dst: "s3://nextstrain-data/files/workflows/<pathogen>"

# Mapping of files to upload
files_to_upload:
  ncbi.ndjson.zst: data/ncbi.ndjson
  metadata.tsv.zst: results/metadata.tsv
  sequences.fasta.zst: results/sequences.fasta
  alignments.fasta.zst: results/alignment.fasta
  translations.zip: results/translations.zip
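The config above pairs each remote file name with a local path and gives a shared `s3_dst` prefix. As a rough illustration of how the two combine into upload destinations, here is a minimal Python sketch. The `upload_plan` helper is hypothetical and not part of the workflow; the config values are copied from the hunk above, with the `<pathogen>` placeholder left as-is.

```python
# Hypothetical helper, not part of the workflow: it only illustrates how
# `s3_dst` and `files_to_upload` combine into upload destinations.
config = {
    "s3_dst": "s3://nextstrain-data/files/workflows/<pathogen>",
    "files_to_upload": {
        "ncbi.ndjson.zst": "data/ncbi.ndjson",
        "metadata.tsv.zst": "results/metadata.tsv",
    },
}


def upload_plan(config):
    """Return (local_path, remote_url) pairs for every configured file."""
    s3_dst = config["s3_dst"].rstrip("/")
    return [
        (local_path, f"{s3_dst}/{remote_file}")
        for remote_file, local_path in config["files_to_upload"].items()
    ]


for local_path, remote_url in upload_plan(config):
    print(f"{local_path} -> {remote_url}")
```

Note the direction of the mapping: keys are remote names and values are local paths, so the sketch iterates over items and builds the URL from the key.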
@@ -0,0 +1,47 @@
"""
This part of the workflow handles uploading files to AWS S3.

Files to upload must be defined in the `files_to_upload` config param, where
the keys are the remote files and the values are the local filepaths
relative to the ingest directory.

Produces a single file for each uploaded file:
    "results/upload/{remote_file}.upload"

The rule `upload_all` can be used as a target to upload all files.
"""
import os

slack_envvars_defined = "SLACK_CHANNELS" in os.environ and "SLACK_TOKEN" in os.environ
send_notifications = (
    config.get("send_slack_notifications", False) and slack_envvars_defined
)


rule upload_to_s3:
    input:
        file_to_upload=lambda wildcards: config["files_to_upload"][wildcards.remote_file],
    output:
        "results/upload/{remote_file}.upload",
    params:
        quiet="" if send_notifications else "--quiet",
        s3_dst=config["s3_dst"],
        cloudfront_domain=config["cloudfront_domain"],
    shell:
        """
        ./vendored/upload-to-s3 \
            {params.quiet} \
            {input.file_to_upload:q} \
            {params.s3_dst:q}/{wildcards.remote_file:q} \
            {params.cloudfront_domain} 2>&1 | tee {output}
        """


rule upload_all:
    input:
        uploads=[
            f"results/upload/{remote_file}.upload"
            for remote_file in config["files_to_upload"].keys()
        ],
    output:
        touch("results/upload_all.done")
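The module-level logic in this Snakefile can be read as three small decisions: whether Slack notifications are enabled, which quiet flag is passed to the upload script, and which marker files `upload_all` requests. The sketch below restates them as plain Python functions so they can be checked outside Snakemake. The function names are hypothetical, and the environment is passed in as a parameter (rather than read from `os.environ`) only to make the example testable.

```python
def notifications_enabled(config, environ):
    """Mirror of `send_notifications`: config opt-in plus both Slack env vars."""
    slack_envvars_defined = "SLACK_CHANNELS" in environ and "SLACK_TOKEN" in environ
    return bool(config.get("send_slack_notifications", False) and slack_envvars_defined)


def quiet_flag(config, environ):
    """Mirror of `params.quiet`: stay quiet unless notifications are enabled."""
    return "" if notifications_enabled(config, environ) else "--quiet"


def upload_targets(config):
    """Mirror of the `upload_all` inputs: one marker file per remote file name."""
    return [
        f"results/upload/{remote_file}.upload"
        for remote_file in config["files_to_upload"]
    ]


example_config = {
    "send_slack_notifications": True,
    "files_to_upload": {"metadata.tsv.zst": "results/metadata.tsv"},
}
example_environ = {"SLACK_CHANNELS": "example-channel", "SLACK_TOKEN": "example-token"}

print(repr(quiet_flag(example_config, example_environ)))
print(upload_targets(example_config))
```

Both conditions must hold for the upload output to be un-silenced: setting `send_slack_notifications` in the config without the two environment variables still yields `--quiet`.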
@@ -1,15 +1,46 @@
"""
This is the main Nextclade Snakefile that orchestrates the workflow to produce
a Nextclade dataset.
"""

# Use default configuration values. Override with Snakemake's --configfile/--config options.
configfile: "config/defaults.yaml"


# This is the default rule that Snakemake will run when there are no specified targets.
# The default output of the Nextclade workflow is usually the produced Nextclade dataset.
# See Nextclade docs on expected naming conventions of dataset files
# https://docs.nextstrain.org/projects/nextclade/page/user/datasets.html
rule all:
    input:
        # Fill in path to the final exported Auspice JSON
        auspice_json="",
        # Fill in paths to the final exported Nextclade dataset.


# These rules are imported in the order that they are expected to run.
# Each Snakefile will have documented inputs and outputs that should be kept as
# consistent interfaces across pathogen repos. This allows us to define typical
# steps that are required for a phylogenetic workflow, but still allow pathogen
# specific customizations within each step.
# Note that only PATHOGEN level customizations should be added to these
# core steps, meaning they are custom rules necessary for all builds of the pathogen.
# If there are build specific customizations, they should be added with the
# custom_rules imported below to ensure that the core workflow is not complicated
# by build specific rules.
include: "rules/preprocess.smk"
include: "rules/prepare_sequences.smk"
include: "rules/construct_phylogeny.smk"
include: "rules/annotate_phylogeny.smk"
include: "rules/export.smk"

# Allow users to import custom rules provided via the config.
# This allows users to run custom rules that can extend or override the workflow.
# A concrete example of using custom rules is the extension of the workflow with
# rules to do a test run of `nextclade run` with the produced Nextclade dataset.
# For extensions, the user will have to specify the custom rule targets when
# running the workflow.
# For overrides, the custom Snakefile will have to use the `ruleorder` directive
# to allow Snakemake to handle ambiguous rules
# https://snakemake.readthedocs.io/en/stable/snakefiles/rules.html#handling-ambiguous-rules
if "custom_rules" in config:
    for rule_file in config["custom_rules"]:

        include: rule_file
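The loop above includes every path listed under `custom_rules`, and the config comments say these paths are relative to the workflow directory. A typo in one of them would presumably only surface as an error when Snakemake tries to include the file. The following is a small pre-flight check one could run beforehand, sketched in plain Python; `missing_custom_rules` is a hypothetical helper and not something the workflow or Snakemake provides.

```python
from pathlib import Path


def missing_custom_rules(config, workflow_dir):
    """Return the configured custom rule files that do not exist on disk.

    Paths are resolved relative to `workflow_dir`, following the convention
    stated in the config comments. A config without `custom_rules` yields [].
    """
    return [
        rule_file
        for rule_file in config.get("custom_rules", [])
        if not (Path(workflow_dir) / rule_file).is_file()
    ]


# Example config shaped like the test profile in this commit.
example_config = {"custom_rules": ["profiles/test_dataset/test_dataset.smk"]}
for rule_file in missing_custom_rules(example_config, "."):
    print(f"custom rule file not found: {rule_file}")
```

Using `config.get("custom_rules", [])` matches the `if "custom_rules" in config` guard in the Snakefile: the key is optional, and its absence means there is nothing to include.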
@@ -0,0 +1,7 @@
# This configuration file should contain all the required configuration parameters
# for the Nextclade workflow to do a test run with a created dataset

# Custom rules to run as part of the testing workflow
# The paths should be relative to the phylogenetic directory.
custom_rules:
  - profiles/test_dataset/test_dataset.smk