diff --git a/phylogenetic/rules/prepare_sequences_E.smk b/phylogenetic/rules/prepare_sequences_E.smk
index 9235fc40..764c6c11 100644
--- a/phylogenetic/rules/prepare_sequences_E.smk
+++ b/phylogenetic/rules/prepare_sequences_E.smk
@@ -1,17 +1,17 @@
 """
-This part of the workflow prepares sequences for constructing the phylogenetic tree.
+This part of the workflow prepares reference files and sequences for constructing the gene phylogenetic trees.
 REQUIRED INPUTS:
-    metadata_url = url to metadata.tsv.zst
-    sequences_url = url to sequences.fasta.zst
     reference = path to reference sequence or genbank
+    sequences = path to all sequences from which gene sequences can be extracted
+
 OUTPUTS:
-    prepared_sequences = results/aligned.fasta
+    gene_fasta = reference fasta for the gene (e.g. E gene)
+    gene_genbank = reference genbank for the gene (e.g. E gene)
+    sequences = sequences with gene sequences extracted and aligned to the reference gene sequence
 This part of the workflow usually includes the following steps:
-    - augur index
-    - augur filter
-    - augur align
-    - augur mask
-See Augur's usage docs for these commands for more details.
+    - newreference.py: Creates new gene genbank and gene reference FASTA from the whole genome reference genbank
+    - nextclade: Aligns sequences to the reference gene sequence and extracts the gene sequences to ensure the reference files are valid
+See Nextclade or script usage docs for these commands for more details.
 """
 
 ruleorder: align_and_extract_E > decompress