Move nextstrain automation rules and configs to ingest/build-configs #50
j23414 authored Mar 16, 2024
2 parents 1654ca4 + b45b7b9 commit 7b3fe1a
Showing 7 changed files with 73 additions and 167 deletions.
7 changes: 6 additions & 1 deletion ingest/README.md
@@ -31,7 +31,12 @@ This will produce two files (within the `ingest` directory):
Run the complete ingest pipeline and upload results to AWS S3 with

```sh
-nextstrain build . --configfiles defaults/config.yaml defaults/optional.yaml
+nextstrain build \
+    --env AWS_ACCESS_KEY_ID \
+    --env AWS_SECRET_ACCESS_KEY \
+    . \
+        upload_all \
+        --configfile build-configs/nextstrain-automation/config.yaml
```

### Adding new sequences not from GenBank
20 changes: 20 additions & 0 deletions ingest/build-configs/nextstrain-automation/config.yaml
@@ -0,0 +1,20 @@
# This configuration file should contain all required configuration parameters
# for the ingest workflow to run with additional Nextstrain automation rules.

# Custom rules to run as part of the Nextstrain automated workflow
# The paths should be relative to the ingest directory.
custom_rules:
  - build-configs/nextstrain-automation/upload.smk

# Nextstrain CloudFront domain to ensure that we invalidate CloudFront after the S3 uploads
# This is required as long as we are using the AWS CLI for uploads
cloudfront_domain: "data.nextstrain.org"

# Nextstrain AWS S3 Bucket with pathogen prefix
# Replace <pathogen> with the pathogen repo name.
s3_dst: "s3://nextstrain-data/files/workflows/zika"

files_to_upload:
  metadata.tsv.zst: results/metadata.tsv
  sequences.fasta.zst: results/sequences.fasta
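
To make the mapping in `files_to_upload` concrete, here is a minimal Python sketch of how each entry resolves to an S3 destination. It assumes the destination is formed as `{s3_dst}/{remote_file}`, which is how `upload.smk` in this commit builds the argument passed to `./vendored/upload-to-s3`. The `config` dict is a hand-written stand-in for the parsed YAML, and `planned_uploads` is a hypothetical helper that is not part of the workflow.

```python
# Hypothetical stand-in for the parsed build-configs/nextstrain-automation/config.yaml.
config = {
    "s3_dst": "s3://nextstrain-data/files/workflows/zika",
    "files_to_upload": {
        "metadata.tsv.zst": "results/metadata.tsv",
        "sequences.fasta.zst": "results/sequences.fasta",
    },
}


def planned_uploads(config):
    """Return (local_path, remote_url) pairs implied by the config.

    Keys of `files_to_upload` are remote file names; values are local paths
    relative to the ingest directory.
    """
    return [
        (local_path, f"{config['s3_dst']}/{remote_file}")
        for remote_file, local_path in config["files_to_upload"].items()
    ]


for local_path, remote_url in planned_uploads(config):
    print(f"{local_path} -> {remote_url}")
```

A check like this can be handy before running `upload_all`, since a typo in a key changes the remote file name rather than causing an error.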

47 changes: 47 additions & 0 deletions ingest/build-configs/nextstrain-automation/upload.smk
@@ -0,0 +1,47 @@
"""
This part of the workflow handles uploading files to AWS S3.
Files to upload must be defined in the `files_to_upload` config param, where
the keys are the remote files and the values are the local filepaths
relative to the ingest directory.
Produces a single file for each uploaded file:
"results/upload/{remote_file}.upload"
The rule `upload_all` can be used as a target to upload all files.
"""
import os

slack_envvars_defined = "SLACK_CHANNELS" in os.environ and "SLACK_TOKEN" in os.environ
send_notifications = (
    config.get("send_slack_notifications", False) and slack_envvars_defined
)


rule upload_to_s3:
    input:
        file_to_upload=lambda wildcards: config["files_to_upload"][wildcards.remote_file],
    output:
        "results/upload/{remote_file}.upload",
    params:
        quiet="" if send_notifications else "--quiet",
        s3_dst=config["s3_dst"],
        cloudfront_domain=config["cloudfront_domain"],
    shell:
        """
        ./vendored/upload-to-s3 \
            {params.quiet} \
            {input.file_to_upload:q} \
            {params.s3_dst:q}/{wildcards.remote_file:q} \
            {params.cloudfront_domain} 2>&1 | tee {output}
        """


rule upload_all:
    input:
        uploads=[
            f"results/upload/{remote_file}.upload"
            for remote_file in config["files_to_upload"].keys()
        ],
    output:
        touch("results/upload_all.done")

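
The two pieces of logic in `upload.smk` that are easy to misread are the `--quiet` toggle and the target list behind `upload_all`. The sketch below restates them as plain Python outside Snakemake, as an illustration only. `quiet_flag` and `upload_targets` are hypothetical helper names, and the environment is passed in as a dict instead of being read from `os.environ`.

```python
def quiet_flag(config, environ):
    """Mirror the `quiet` param: stay quiet unless Slack notifications are
    enabled in the config AND both Slack environment variables are set."""
    slack_envvars_defined = "SLACK_CHANNELS" in environ and "SLACK_TOKEN" in environ
    send_notifications = (
        config.get("send_slack_notifications", False) and slack_envvars_defined
    )
    return "" if send_notifications else "--quiet"


def upload_targets(config):
    """Mirror the `upload_all` input: one marker file per remote file name."""
    return [
        f"results/upload/{remote_file}.upload"
        for remote_file in config["files_to_upload"].keys()
    ]


# Example values are made up for illustration.
example_config = {
    "files_to_upload": {
        "metadata.tsv.zst": "results/metadata.tsv",
        "sequences.fasta.zst": "results/sequences.fasta",
    }
}
print(quiet_flag(example_config, {}))
print(upload_targets(example_config))
```

Because the new automation config does not set `send_slack_notifications`, the sketch suggests uploads run with `--quiet` unless that key is added and both Slack variables are exported.
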
25 changes: 0 additions & 25 deletions ingest/defaults/optional.yaml

This file was deleted.

55 changes: 0 additions & 55 deletions ingest/rules/slack_notifications.smk

This file was deleted.

22 changes: 0 additions & 22 deletions ingest/rules/trigger_rebuild.smk

This file was deleted.

64 changes: 0 additions & 64 deletions ingest/rules/upload.smk

This file was deleted.

