Improve Newick parsing #69
Replaces the basic Newick parser with an external one that handles single-quoted node names. Quoted names are allowed by the format¹ and occur in trees produced by NCBI Pathogens.²

The parser's API is a little awkward for our use case, but it's perfectly workable. Out of several parsers I tried on NPM, this was the only one which handled quoted names, so use it despite the slightly awkward API.

¹ See <https://en.wikipedia.org/wiki/Newick_format#Notes> for lack of any formal spec.
² <https://discussion.nextstrain.org/t/displaying-trees-from-ncbi-pathogen-browser-in-auspice-us/1456>
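To illustrate why quoted names need dedicated handling, here is a minimal sketch of a recursive-descent Newick reader that treats single-quoted labels as opaque text. It is not the external parser adopted in this PR and not the parser it replaces; the function name `parseNewick` and the node shape `{ name, length, children }` are assumptions made for the example. It follows the common convention that `''` inside a quoted label stands for one literal quote, and it ignores comments and other less common features.

```javascript
// Illustrative sketch only: a tiny Newick reader that understands
// single-quoted labels. Names and node shape are hypothetical.
function parseNewick(text) {
  let pos = 0;

  const skipWhitespace = () => {
    while (pos < text.length && /\s/.test(text[pos])) pos++;
  };

  // Read a node label, either 'single quoted' or unquoted.
  function readLabel() {
    skipWhitespace();
    if (text[pos] === "'") {
      pos++; // consume the opening quote
      let label = "";
      while (pos < text.length) {
        if (text[pos] === "'") {
          if (text[pos + 1] === "'") {
            label += "'"; // '' is an escaped literal quote
            pos += 2;
            continue;
          }
          pos++; // consume the closing quote
          return label;
        }
        label += text[pos++];
      }
      throw new Error("Unterminated quoted label");
    }
    // Unquoted labels end at whitespace or any structural character.
    const start = pos;
    while (pos < text.length && !"(),:;".includes(text[pos]) && !/\s/.test(text[pos])) pos++;
    return text.slice(start, pos);
  }

  // Read one subtree: optional "(child,child,...)", then label, then ":length".
  function readNode() {
    skipWhitespace();
    const node = { name: "", length: null, children: [] };
    if (text[pos] === "(") {
      do {
        pos++; // consume "(" or ","
        node.children.push(readNode());
        skipWhitespace();
      } while (text[pos] === ",");
      if (text[pos] !== ")") throw new Error(`Expected ")" at position ${pos}`);
      pos++;
    }
    node.name = readLabel();
    skipWhitespace();
    if (text[pos] === ":") {
      pos++;
      skipWhitespace();
      const start = pos;
      while (pos < text.length && /[0-9eE.+-]/.test(text[pos])) pos++;
      node.length = Number(text.slice(start, pos));
    }
    return node;
  }

  const root = readNode();
  skipWhitespace();
  if (text[pos] !== ";") throw new Error(`Expected ";" at position ${pos}`);
  return root;
}
```

With this sketch, `parseNewick("('A (x),1':0.1,B:0.2)root;")` yields a first child named `A (x),1`. A reader that splits on parentheses and commas without tracking quotes would instead cut that label apart, which is the kind of misparse quoted names can cause.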
f5cad6b to cabba98
I tested this by dragging a couple of example Newick files onto auspice.us and onto the review app for this PR (well, a local server). The particular example that's fixed by this PR is this Newick tree:
and it renders before/after like so:
Looks like switching the newick parser resolves #66 as well. Using the example tree.nwk that Cornelius provided in the test app, I see the auspice.us error notification
and the following error message in the console
tree.nwk failed to be read as a newick tree. Error: Error: End of buffer reached.
Should we catch errors raised by the parser and display a more specific error message?
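One way this could look is sketched below. It is only an assumption about the shape of such a wrapper, not code from auspice.us: `readNewickFile` is a hypothetical helper, and `parse` stands in for whichever Newick parsing function is in use. The sketch uses the error's `message` rather than stringifying the whole error object, which avoids a doubled `Error: Error:` prefix in the reported text.

```javascript
// Hypothetical wrapper: `parse` is any function that takes Newick text and
// either returns a tree or throws. Nothing here is the project's actual API.
function readNewickFile(fileName, text, parse) {
  try {
    return { tree: parse(text), error: null };
  } catch (err) {
    // Use the message only, so the user-facing text is not prefixed twice.
    const detail = err instanceof Error ? err.message : String(err);
    return {
      tree: null,
      error: `${fileName} could not be read as a Newick tree (${detail}) Please check the syntax of the file.`,
    };
  }
}
```

The caller can then show `error` in the notification when it is non-null, instead of relying on the console for the underlying parser message.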
Compared the following between review app and what's currently at auspice.us:
- The example file from discussion post does not show tip labels on auspice.us but does on the review app
- tree.nwk from local run of zika-tutorial - identical
- tree.nwk from local run of rsv - identical
@victorlin Not just tip labels for the first test case you list, but the actual structure of the tree is vastly different (because the current parser misparses completely).
If we could turn the obtuse parser errors (e.g. |
Merging this since it improves the situation. We can improve user-reported errors in a subsequent PR as desired. With some light prodding of
Grepping the source reveals two more that are applicable to its parsing routines:
These all basically boil down to "Check the syntax of your Newick tree."
Two recent issues (#71, #72) provide examples where the improved parsing either failed to parse a valid Newick tree or (much more worryingly) returned an entirely incorrect tree structure, including nodes not present in the Newick. See those issues for details, including the tree files. This reversion will re-introduce bugs such as #66 and the bug in <https://discussion.nextstrain.org/t/displaying-trees-from-ncbi-pathogen-browser-in-auspice-us/1456/4>, but they are lesser than the bugs introduced by #69. This reverts commit cabba98, although subsequent changes to package-lock.json mean it's not a clean revert.