-
Notifications
You must be signed in to change notification settings - Fork 128
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ancestral, translate incompatible under certain invocations #1345
Comments
jameshadfield
added a commit
that referenced
this issue
Dec 4, 2023
This is the implicit expectation, and is true for all of our pipelines. It is possible to translate without having the full sequence attached to the node (by reconstructing mutations on the root), but that is not the approach taken by the current code. Closes #1345
jameshadfield
added a commit
that referenced
this issue
Dec 4, 2023
This is the implicit expectation, and is true for all of our pipelines. In theory it's possible to translate without having the full sequence attached to the node by reconstructing mutations on the root, but that is not the approach taken by the current code. As we explicitly check the tree nodes against the sequences in the node-data JSON we can skip the automatic tests optionally performed when reading the node-data JSON. Closes #1345
2 tasks
jameshadfield
added a commit
that referenced
this issue
Dec 4, 2023
This is the implicit expectation, and is true for all of our pipelines. In theory it's possible to translate without having the full sequence attached to the node by reconstructing mutations on the root, but that is not the approach taken by the current code. As we explicitly check the tree nodes against the sequences in the node-data JSON we can skip the automatic tests optionally performed when reading the node-data JSON. Closes #1345
jameshadfield
added a commit
that referenced
this issue
Dec 5, 2023
This is the implicit expectation, and is true for all of our pipelines. In theory it's possible to translate without having the full sequence attached to the node by reconstructing mutations on the root, but that is not the approach taken by the current code. As we explicitly check the tree nodes against the sequences in the node-data JSON we can skip the automatic tests optionally performed when reading the node-data JSON. Closes #1345
jameshadfield
added a commit
that referenced
this issue
Dec 11, 2023
This is the implicit expectation, and is true for all of our pipelines. In theory it's possible to translate without having the full sequence attached to the node by reconstructing mutations on the root, but that is not the approach taken by the current code. As we explicitly check the tree nodes against the sequences in the node-data JSON we can skip the automatic tests optionally performed when reading the node-data JSON. Closes #1345
Closed by #1348. Please see comment #1348 (comment) |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Current Behaviour
augur translate
, when using a node-data JSON input (--ancestral-sequences
), requires the full sequence to be present on every node in the JSON.When
augur ancestral
produces this node-data JSON via a VCF input the resulting node-data JSON does not have sequences assigned to nodes.The resulting error message is misleading:
"Can't find M66 OR NODE_0000001 in the alignment provided!" (Sequence names provided as example only)
Code details
The sequences are only assigned to nodes if
full_sequences == True
, and this variable is defined via:full_sequences = not is_vcf
.Possible solution
augur translate
should check that all isolates in the tree have a corresponding sequence attached in the node-data JSON and error if not.Stepping back, this comes from deviating from the (implicit) path expected between
ancestral
andtranslate
¹: VCF inputs to ancestral should result in VCF outputs for consumption bytranslate
. Confusingly, thent-muts.json
can be used byexport
. So we can't block JSON output fromancestral
when VCF inputs are used.Additionally, schemas to validate the node-data JSON would be helpful here. They'd at least force us to consider whether node-data JSONs where nodes do not have sequences attached are valid.
Your environment: if running Nextstrain locally
augur 23.1.1
¹ Here i'm using TB as the guide, because it was the use-case the VCF code was written for and still the only (?) Nextstrain build to use VCFs.
The text was updated successfully, but these errors were encountered: