Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ancestral, translate incompatible under certain invocations #1345

Closed
jameshadfield opened this issue Nov 30, 2023 · 1 comment
Closed

ancestral, translate incompatible under certain invocations #1345

jameshadfield opened this issue Nov 30, 2023 · 1 comment
Assignees
Labels
bug Something isn't working VCF

Comments

@jameshadfield
Copy link
Member

jameshadfield commented Nov 30, 2023

Current Behaviour

augur translate, when using a node-data JSON input (--ancestral-sequences), requires the full sequence to be present on every node in the JSON.

When augur ancestral produces this node-data JSON via a VCF input the resulting node-data JSON does not have sequences assigned to nodes.

The resulting error message is misleading:
"Can't find M66 OR NODE_0000001 in the alignment provided!" (Sequence names provided as example only)

Code details

The sequences are only assigned to nodes if full_sequences == True, and this variable is defined via: full_sequences = not is_vcf.

Possible solution

augur translate should check that all isolates in the tree have a corresponding sequence attached in the node-data JSON and error if not.

Stepping back, this comes from deviating from the (implicit) path expected between ancestral and translate¹: VCF inputs to ancestral should result in VCF outputs for consumption by translate. Confusingly, the nt-muts.json can be used by export. So we can't block JSON output from ancestral when VCF inputs are used.

Additionally, schemas to validate the node-data JSON would be helpful here. They'd at least force us to consider whether node-data JSONs where nodes do not have sequences attached are valid.

Your environment: if running Nextstrain locally

augur 23.1.1

¹ Here i'm using TB as the guide, because it was the use-case the VCF code was written for and still the only (?) Nextstrain build to use VCFs.

@jameshadfield jameshadfield added bug Something isn't working VCF labels Nov 30, 2023
@jameshadfield jameshadfield self-assigned this Nov 30, 2023
jameshadfield added a commit that referenced this issue Dec 4, 2023
This is the implicit expectation, and is true for all of our pipelines.
It is possible to translate without having the full sequence attached
to the node (by reconstructing mutations on the root), but that is
not the approach taken by the current code.

Closes #1345
jameshadfield added a commit that referenced this issue Dec 4, 2023
This is the implicit expectation, and is true for all of our pipelines.
In theory it's possible to translate without having the full sequence
attached to the node by reconstructing mutations on the root, but that
is not the approach taken by the current code.

As we explicitly check the tree nodes against the sequences in the
node-data JSON we can skip the automatic tests optionally performed when
reading the node-data JSON.

Closes #1345
jameshadfield added a commit that referenced this issue Dec 4, 2023
This is the implicit expectation, and is true for all of our pipelines.
In theory it's possible to translate without having the full sequence
attached to the node by reconstructing mutations on the root, but that
is not the approach taken by the current code.

As we explicitly check the tree nodes against the sequences in the
node-data JSON we can skip the automatic tests optionally performed when
reading the node-data JSON.

Closes #1345
jameshadfield added a commit that referenced this issue Dec 5, 2023
This is the implicit expectation, and is true for all of our pipelines.
In theory it's possible to translate without having the full sequence
attached to the node by reconstructing mutations on the root, but that
is not the approach taken by the current code.

As we explicitly check the tree nodes against the sequences in the
node-data JSON we can skip the automatic tests optionally performed when
reading the node-data JSON.

Closes #1345
jameshadfield added a commit that referenced this issue Dec 11, 2023
This is the implicit expectation, and is true for all of our pipelines.
In theory it's possible to translate without having the full sequence
attached to the node by reconstructing mutations on the root, but that
is not the approach taken by the current code.

As we explicitly check the tree nodes against the sequences in the
node-data JSON we can skip the automatic tests optionally performed when
reading the node-data JSON.

Closes #1345
@jameshadfield
Copy link
Member Author

Closed by #1348. Please see comment #1348 (comment)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working VCF
Projects
None yet
Development

No branches or pull requests

1 participant