Commit

Add columns in a separate rule
Narrows the scope of the append_usvi rule to just merging data files.
victorlin committed Dec 7, 2024
1 parent 649e495 commit 41991ff
Showing 1 changed file with 25 additions and 14 deletions.
39 changes: 25 additions & 14 deletions phylogenetic/rules/merge_sequences_usvi.smk
@@ -21,34 +21,45 @@ This part of the workflow usually includes the following steps:
"""

rule append_usvi:
"""Appending USVI sequences
rule add_metadata_columns:
"""Add columns to metadata
Notable columns:
- accession: Either the GenBank accession or USVI accession.
- genbank_accession: GenBank accession for Auspice to generate a URL to the NCBI GenBank record. Empty for USVI sequences.
- url: URL used in Auspice, to either link to the USVI github repo (https://github.com/blab/zika-usvi/) or link to the NCBI GenBank record ('https://www.ncbi.nlm.nih.gov/nuccore/*')
- genbank_accession: GenBank accession for Auspice to generate a URL to the NCBI GenBank record.
- [NEW] accession: The GenBank accession. Added to go alongside USVI accession.
- [NEW] url: URL linking to the NCBI GenBank record ('https://www.ncbi.nlm.nih.gov/nuccore/*'). Added to go alongside USVI url.
"""
input:
sequences = "data/sequences.fasta",
metadata = "data/metadata.tsv",
usvi_sequences = "data/sequences_usvi.fasta",
usvi_metadata = "data/metadata_usvi.tsv"
metadata = "data/metadata.tsv"
output:
sequences = "data/sequences_all.fasta",
metadata = "data/metadata_all.tsv"
metadata = "data/metadata_modified.tsv"
shell:
"""
cat {input.sequences} {input.usvi_sequences} > {output.sequences}
csvtk mutate2 -tl \
-n url \
-e '"https://www.ncbi.nlm.nih.gov/nuccore/" + $genbank_accession' \
{input.metadata} \
| csvtk mutate2 -tl \
-n accession \
-e '$genbank_accession' \
| csvtk concat -tl - {input.usvi_metadata} \
> {output.metadata}
"""

rule append_usvi:
"""Appending USVI sequences"""
input:
sequences = "data/sequences.fasta",
metadata = "data/metadata_modified.tsv",
usvi_sequences = "data/sequences_usvi.fasta",
usvi_metadata = "data/metadata_usvi.tsv"
output:
sequences = "data/sequences_all.fasta",
metadata = "data/metadata_all.tsv"
shell:
"""
cat {input.sequences} {input.usvi_sequences} > {output.sequences}
csvtk concat -tl {input.metadata} {input.usvi_metadata} \
| tsv-select -H -f accession --rest last \
> {output.metadata}
"""
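
To illustrate the split this commit makes, here is a minimal Python sketch of the two steps: deriving `url` and `accession` from `genbank_accession` (the role of `add_metadata_columns`), then concatenating rows and moving `accession` to the first column (the role of `append_usvi`). It is not the workflow's implementation, which uses `csvtk` and `tsv-select`. The function `append_rows`, the sample records, and the accession values `ABC123` and `USVI_01` are made up for illustration, and filling missing columns with empty strings is an assumption about how mismatched columns should be handled.

```python
NUCCORE_PREFIX = "https://www.ncbi.nlm.nih.gov/nuccore/"


def add_metadata_columns(rows):
    """Derive url and accession from genbank_accession, like the two mutate2 steps."""
    out = []
    for row in rows:
        new_row = dict(row)
        new_row["url"] = NUCCORE_PREFIX + row["genbank_accession"]
        new_row["accession"] = row["genbank_accession"]
        out.append(new_row)
    return out


def append_rows(rows, usvi_rows):
    """Concatenate both row sets and put the accession column first.

    Assumption: columns missing from a row are filled with empty strings.
    """
    combined = rows + usvi_rows
    columns = []
    for row in combined:
        for name in row:
            if name not in columns:
                columns.append(name)
    ordered = ["accession"] + [c for c in columns if c != "accession"]
    return [{c: row.get(c, "") for c in ordered} for row in combined]


# Hypothetical records; real metadata has many more columns.
genbank = [{"genbank_accession": "ABC123", "strain": "example_a"}]
usvi = [
    {
        "accession": "USVI_01",
        "strain": "example_b",
        "url": "https://github.com/blab/zika-usvi/",
    }
]

merged = append_rows(add_metadata_columns(genbank), usvi)
for row in merged:
    print(row)
```

Keeping the column derivation separate from the concatenation mirrors the commit: the merge step no longer needs to know how `url` and `accession` are computed.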
