Merge pull request #1348 from nextstrain/james/ancestral-translate-fixes
ancestral / translate improvements & tests
Showing 17 changed files with 427 additions and 90 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,39 @@ | ||
Setup | ||
|
||
$ source "$TESTDIR"/_setup.sh | ||
|
||
General running of augur ancestral. | ||
- Note that the (parsimonious) C18T on the branch of sample_C is inferred by augur | ||
as a root mutation C18T plus a T18C reversion on node_AB. This is reflected in the | ||
node-data JSON we diff against. | ||
- We supply the reference sequence so mutations are called at the root. | ||
- Ambiguous nucleotides (there are 3 Ns in sample_C) are inferred (default). | ||
|
||
$ ${AUGUR} ancestral \ | ||
> --tree "$TESTDIR/../data/simple-genome/tree.nwk" \ | ||
> --alignment "$TESTDIR/../data/simple-genome/sequences.fasta" \ | ||
> --root-sequence "$TESTDIR/../data/simple-genome/reference.fasta" \ | ||
> --output-node-data "nt_muts.ref-seq.json" \ | ||
> --inference marginal > /dev/null | ||
|
||
|
||
$ python3 "$TESTDIR/../../../../scripts/diff_jsons.py" \ | ||
> "$TESTDIR/../data/simple-genome/nt_muts.ref-seq.json" \ | ||
> "nt_muts.ref-seq.json" | ||
{} | ||
|
||
Same as above but without a root-sequence, so no mutations are inferred on the root node | ||
(and thus the inferred reference will be different). | ||
|
||
$ ${AUGUR} ancestral \ | ||
> --tree "$TESTDIR/../data/simple-genome/tree.nwk" \ | ||
> --alignment "$TESTDIR/../data/simple-genome/sequences.fasta" \ | ||
> --output-node-data "nt_muts.no-ref-seq.json" \ | ||
> --inference marginal > /dev/null | ||
|
||
|
||
$ python3 "$TESTDIR/../../../../scripts/diff_jsons.py" \ | ||
> "$TESTDIR/../data/simple-genome/nt_muts.ref-seq.json" \ | ||
> "nt_muts.no-ref-seq.json" \ | ||
> --exclude-paths "root['reference']['nuc']" "root['nodes']['node_root']['muts']" | ||
{} |
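The second comparison above passes only because two paths are excluded from the diff. A minimal, stdlib-only sketch of that idea follows. It is not the implementation of `diff_jsons.py` (the script itself is not shown in this diff); the function name `diff_paths` and the toy inputs are hypothetical, and only the path syntax (`root['reference']['nuc']`) is taken from the test.

```python
def diff_paths(expected, actual, exclude_paths=(), path="root"):
    """Return {path: (expected, actual)} for differing values, skipping excluded paths.

    Dicts are compared key by key; any other value (lists included) is
    compared as a whole with ==.
    """
    if path in exclude_paths:
        return {}
    if isinstance(expected, dict) and isinstance(actual, dict):
        differences = {}
        for key in sorted(set(expected) | set(actual)):
            child = f"{path}[{key!r}]"  # e.g. root['reference']
            if child in exclude_paths:
                continue
            if key not in expected or key not in actual:
                differences[child] = (expected.get(key), actual.get(key))
            else:
                differences.update(
                    diff_paths(expected[key], actual[key], exclude_paths, child)
                )
        return differences
    return {} if expected == actual else {path: (expected, actual)}


# The two paths excluded by the test above.
EXCLUDED = ("root['reference']['nuc']", "root['nodes']['node_root']['muts']")

# Toy stand-ins (hypothetical values, not real augur output).
with_ref = {
    "reference": {"nuc": "AAAAA"},
    "nodes": {"node_root": {"muts": ["A5C"], "sequence": "AAAAC"}},
}
without_ref = {
    "reference": {"nuc": "AAAAC"},
    "nodes": {"node_root": {"muts": [], "sequence": "AAAAC"}},
}

unfiltered = diff_paths(with_ref, without_ref)
filtered = diff_paths(with_ref, without_ref, exclude_paths=EXCLUDED)
print(sorted(unfiltered))
print(filtered)
```

With the exclusions applied, the toy comparison reports no differences, which mirrors the `{}` the cram test expects.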
54 changes: 54 additions & 0 deletions
tests/functional/ancestral/data/simple-genome/nt_muts.ref-seq.json
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,54 @@ | ||
{ | ||
"annotations": { | ||
"nuc": { | ||
"end": 50, | ||
"start": 1, | ||
"strand": "+", | ||
"type": "source" | ||
} | ||
}, | ||
"generated_by": { | ||
"program": "augur", | ||
"version": "23.1.1" | ||
}, | ||
"mask": "00000000000000000000000000000000000000000000000000", | ||
"nodes": { | ||
"node_AB": { | ||
"muts": [ | ||
"A7G", | ||
"C14T", | ||
"T18C", | ||
"A33G", | ||
"A43T" | ||
], | ||
"sequence": "AAAACAGAAATGCTCTGCGGGTAAAAAAAAAAGAACTACTTGTCCATAAA" | ||
}, | ||
"node_root": { | ||
"muts": [ | ||
"A5C", | ||
"C18T" | ||
], | ||
"sequence": "AAAACAAAAATGCCCTGTGGGTAAAAAAAAAAAAACTACTTGACCATAAA" | ||
}, | ||
"sample_A": { | ||
"muts": [ | ||
"G33C", | ||
"C39T" | ||
], | ||
"sequence": "AAAACAGAAATGCTCTGCGGGTAAAAAAAAAACAACTATTTGTCCATAAA" | ||
}, | ||
"sample_B": { | ||
"muts": [ | ||
"G42A" | ||
], | ||
"sequence": "AAAACAGAAATGCTCTGCGGGTAAAAAAAAAAGAACTACTTATCCATAAA" | ||
}, | ||
"sample_C": { | ||
"muts": [], | ||
"sequence": "AAAACAAAAATGCCCTGTGGGTAAAAAAAAAAAAACTACTTGACCATAAA" | ||
} | ||
}, | ||
"reference": { | ||
"nuc": "AAAAAAAAAATGCCCTGCGGGTAAAAAAAAAAAAACTACTTGACCATAAA" | ||
} | ||
} |
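The `muts` lists in this fixture can be cross-checked by comparing each node's sequence against its parent's, position by position (1-based). The sketch below does that with the sequences copied from the JSON above. The helper `call_mutations` is a hypothetical name for illustration and is not augur's code; the parent of each node is read off `tree.nwk`, with the reference standing in as the parent of `node_root`.

```python
def call_mutations(parent, child):
    """List substitutions as '<parent base><1-based position><child base>'."""
    if len(parent) != len(child):
        raise ValueError("sequences must be aligned to the same length")
    return [
        f"{p}{position}{c}"
        for position, (p, c) in enumerate(zip(parent, child), start=1)
        if p != c
    ]


# Sequences copied from nt_muts.ref-seq.json.
reference = "AAAAAAAAAATGCCCTGCGGGTAAAAAAAAAAAAACTACTTGACCATAAA"
node_root = "AAAACAAAAATGCCCTGTGGGTAAAAAAAAAAAAACTACTTGACCATAAA"
node_ab = "AAAACAGAAATGCTCTGCGGGTAAAAAAAAAAGAACTACTTGTCCATAAA"
sample_a = "AAAACAGAAATGCTCTGCGGGTAAAAAAAAAACAACTATTTGTCCATAAA"
sample_b = "AAAACAGAAATGCTCTGCGGGTAAAAAAAAAAGAACTACTTATCCATAAA"
sample_c = "AAAACAAAAATGCCCTGTGGGTAAAAAAAAAAAAACTACTTGACCATAAA"

root_muts = call_mutations(reference, node_root)
node_ab_muts = call_mutations(node_root, node_ab)
sample_a_muts = call_mutations(node_ab, sample_a)
sample_b_muts = call_mutations(node_ab, sample_b)
sample_c_muts = call_mutations(node_root, sample_c)

print("node_root:", root_muts)
print("node_AB:  ", node_ab_muts)
print("sample_A: ", sample_a_muts)
print("sample_B: ", sample_b_muts)
print("sample_C: ", sample_c_muts)
```

Position 18 shows the pattern described in the test notes: the reference has C, `node_root` has T (C18T), and `node_AB` reverts to C (T18C), so `sample_C` carries no mutations of its own.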
2 changes: 2 additions & 0 deletions
tests/functional/ancestral/data/simple-genome/reference.fasta
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,2 @@ | ||
>reference_name | ||
AAAAAAAAAATGCCCTGCGGGTAAAAAAAAAAAAACTACTTGACCATAAA |
6 changes: 6 additions & 0 deletions
tests/functional/ancestral/data/simple-genome/sequences.fasta
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,6 @@ | ||
>sample_A | ||
AAAACAGAAATGCTCTGCGGGTAAAAAAAAAACAACTATTTGTCCATAAA | ||
>sample_B | ||
AAAACAGAAATGCTCTGCGGGTAAAAAAAAAAGAACTACTTATCCATAAA | ||
>sample_C | ||
AAAACAAAAATGCCCTGTGGGTAAAAANNNAAAAACTACTTGACCATAAA |
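The effect of inferring ambiguous nucleotides can be seen by comparing the aligned `sample_C` record with the `sample_C` sequence stored in `nt_muts.ref-seq.json`. The following is a small sketch using those two strings; `filled_positions` is a hypothetical helper, not part of augur.

```python
# Aligned input (sequences.fasta) and inferred output (nt_muts.ref-seq.json).
aligned_sample_c = "AAAACAAAAATGCCCTGTGGGTAAAAANNNAAAAACTACTTGACCATAAA"
inferred_sample_c = "AAAACAAAAATGCCCTGTGGGTAAAAAAAAAAAAACTACTTGACCATAAA"


def filled_positions(aligned, inferred):
    """Return (1-based position, aligned base, inferred base) wherever the two differ."""
    return [
        (position, before, after)
        for position, (before, after) in enumerate(zip(aligned, inferred), start=1)
        if before != after
    ]


filled = filled_positions(aligned_sample_c, inferred_sample_c)
for position, before, after in filled:
    print(f"position {position}: {before} -> {after}")
```

The only differences are the three N positions, each of which is resolved to A in the node-data JSON.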
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1 @@ | ||
(sample_C:0.02,(sample_B:0.02,sample_A:0.02)node_AB:0.06)node_root:0.02; |
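To read the parent of each node off this Newick string without extra dependencies, a small stack-based scan is enough. The sketch below assumes every internal node is named (as in this tree) and ignores branch lengths; `parent_map` is a hypothetical helper, and a real pipeline would more likely use an established tree library.

```python
import re


def parent_map(newick):
    """Map each node name to its parent's name (None for the root).

    Assumes every internal node carries a label, as in the tree above.
    """
    tokens = re.findall(r"[(),;]|[^(),;]+", newick.strip())
    parents = {}
    open_clades = []  # one list of child names per unclosed '('
    pending = None  # children of the clade just closed by ')'
    for token in tokens:
        if token == "(":
            open_clades.append([])
        elif token == ")":
            pending = open_clades.pop()
        elif token in ",;":
            continue
        else:
            name = token.split(":")[0]  # drop the branch length
            if pending is not None:
                # A label directly after ')' names the clade that just closed.
                for child in pending:
                    parents[child] = name
                pending = None
            if open_clades:
                open_clades[-1].append(name)
            else:
                parents[name] = None  # nothing encloses it: the root
    return parents


tree = "(sample_C:0.02,(sample_B:0.02,sample_A:0.02)node_AB:0.06)node_root:0.02;"
parents = parent_map(tree)
for node, parent in parents.items():
    print(f"{node} <- {parent}")
```

The resulting map is what determines which pairs of sequences are compared when mutations are assigned to branches.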
This file was deleted.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,38 @@ | ||
Setup | ||
|
||
$ export AUGUR="${AUGUR:-$TESTDIR/../../../../bin/augur}" | ||
$ export SCRIPTS="$TESTDIR/../../../../scripts" | ||
$ export ANC_DATA="$TESTDIR/../../ancestral/data/simple-genome" | ||
$ export DATA="$TESTDIR/../data/simple-genome" | ||
|
||
General running of augur translate. See the cram test general.t for `augur ancestral` | ||
which uses many of the same files. | ||
NOTE: The GFF reference here contains a 'source' ID because, without it, downstream commands | ||
that validate the output will fail because the output lacks a 'nuc' annotation. | ||
|
||
$ ${AUGUR} translate \ | ||
> --tree "$ANC_DATA/tree.nwk" \ | ||
> --ancestral-sequences "$ANC_DATA/nt_muts.ref-seq.json" \ | ||
> --reference-sequence "$DATA/reference.source.gff" \ | ||
> --output-node-data "aa_muts.json" > /dev/null | ||
|
||
$ python3 "$SCRIPTS/diff_jsons.py" \ | ||
> "$DATA/aa_muts.json" \ | ||
> "aa_muts.json" \ | ||
> --exclude-regex-paths "root\['annotations'\]\['.+'\]\['seqid'\]" | ||
{} | ||
|
||
Same as above but using a GenBank file. This changes the 'type' of the annotations, | ||
but this is irrelevant for Auspice's use and simply reflects the reference source. | ||
|
||
$ ${AUGUR} translate \ | ||
> --tree "$ANC_DATA/tree.nwk" \ | ||
> --ancestral-sequences "$ANC_DATA/nt_muts.ref-seq.json" \ | ||
> --reference-sequence "$DATA/reference.gb" \ | ||
> --output-node-data "aa_muts.genbank.json" > /dev/null | ||
|
||
$ python3 "$SCRIPTS/diff_jsons.py" \ | ||
> "$DATA/aa_muts.json" \ | ||
> "aa_muts.genbank.json" \ | ||
> --exclude-regex-paths "root\['annotations'\]\['.+'\]\['seqid'\]" "root\['annotations'\]\['.+'\]\['type'\]" | ||
{} |
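The two `--exclude-regex-paths` patterns used above can be tried directly with Python's `re` module. This sketch assumes the patterns are applied with search semantics against path strings of the form `root['key']['key']`; the script's actual matching behaviour is not shown in this diff. The gene name `gene1` and the list of paths are hypothetical.

```python
import re

# Patterns copied from the GenBank comparison above.
EXCLUDE_PATTERNS = [
    r"root\['annotations'\]\['.+'\]\['seqid'\]",
    r"root\['annotations'\]\['.+'\]\['type'\]",
]


def is_excluded(path, patterns=EXCLUDE_PATTERNS):
    """True if any pattern matches somewhere in the path (assumed search semantics)."""
    return any(re.search(pattern, path) for pattern in patterns)


# Hypothetical paths of the kind a JSON diff might report.
paths = [
    "root['annotations']['nuc']['seqid']",
    "root['annotations']['gene1']['type']",
    "root['annotations']['gene1']['start']",
    "root['nodes']['sample_A']['aa_muts']",
]

kept = [path for path in paths if not is_excluded(path)]
print(kept)
```

Only the `seqid` and `type` entries under `annotations` are filtered out, so differences in coordinates or in per-node data would still be reported.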