Interop with Automa.jl #28
Hi Jakob, thanks for reaching out! I would love if
Regarding route 1 it should be possible to determine parts of CombinedParsers that can be expressed as Automa FST and generating code
Regarding code-gen, CP fully relies on Julia @generated functions and compiler optimization.
Regarding non-context-free grammars, CombinedParsers provides
I was looking into the use of stacks in parsers, also for handling left-recursion in EBNF.
What do you think? I am looking forward to your thoughts!
Hello Gregor! I'm the current maintainer of Automa.jl (by accident, I didn't write it, but ended up maintaining it, for now).
You mentioned potential integration/collaboration between Automa and CombinedParsers (CP) and I think that sounds like an excellent idea!
The core limitation of Automa is that it uses FSMs, and there are many formats that just can't be parsed using FSMs: JSON, Newick, Julia code, and so on. There has been a longstanding desire among some users of Automa to create an Automa-like package for pushdown automata in order to parse context-free grammars. But it seems no one who wants it has the skills and knowledge to implement it, so CP is probably the best bet for a good parsing package for that use case. I'll try out CP in the near future and see if it's a good fit for the BioJulia community :)
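To illustrate why nested formats such as Newick or JSON fall outside what an FSM can recognize, here is a minimal Python sketch (not Automa or CP code; the function name `nesting_ok` is made up for this example). Checking arbitrarily deep nesting needs unbounded memory, which is exactly the stack that a pushdown automaton adds on top of a finite set of states.

```python
def nesting_ok(text):
    """Return True if (), [] and {} in `text` are balanced and properly nested."""
    closers = {")": "(", "]": "[", "}": "{"}
    openers = set(closers.values())
    stack = []  # the "pushdown" memory that a finite-state machine lacks
    for ch in text:
        if ch in openers:
            stack.append(ch)
        elif ch in closers:
            # A closer must match the most recently opened bracket.
            if not stack or stack.pop() != closers[ch]:
                return False
    # Anything left on the stack was opened but never closed.
    return not stack
```

A Newick-like string such as `((A,B),(C,D));` passes this check, while `((A,B)` does not. A machine with a fixed number of states can only track nesting up to some fixed depth.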
As I see it, CP and Automa are complementary: Automa is faster and more generic (in that it executes arbitrary code), but it's limited to regular grammars, and its user experience is abysmal. I'm slowly pushing towards Automa v1.0, where I want to improve the interface: make it simpler to use, have better debugging and so on.
What did you envision with integration? Perhaps CP can leverage Automa for some of its codegen? Automa could perhaps be automatically used to create smaller functions that CP can call when parsing. Like, CP could automatically find smaller regular patterns, and create mini-parsers using Automa, which it then calls itself.
While this should be possible in principle, and should make CP faster, I don't know if it's possible practically. Let's see.
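The idea of a recursive parser delegating its regular sub-patterns to small, fast matchers can be sketched roughly as follows. This is a Python illustration under stated assumptions, not how CP or Automa work internally: the compiled regex `NUMBER` stands in for a generated FSM mini-parser, and `parse_value` is a hypothetical hand-written recursive parser for nested lists of numbers.

```python
import re

# Regular sub-pattern: a compiled regex stands in for a generated FSM mini-parser.
NUMBER = re.compile(r"-?\d+(?:\.\d+)?")

def parse_value(text, pos=0):
    """Parse a number or a bracketed, comma-separated list.

    Returns (value, next_pos). The nesting is handled by recursion,
    the flat number tokens by the regular matcher above.
    """
    if pos < len(text) and text[pos] == "[":
        items = []
        pos += 1
        if pos < len(text) and text[pos] == "]":
            return items, pos + 1  # empty list
        while True:
            item, pos = parse_value(text, pos)
            items.append(item)
            if pos < len(text) and text[pos] == ",":
                pos += 1
            elif pos < len(text) and text[pos] == "]":
                return items, pos + 1
            else:
                raise ValueError(f"expected ',' or ']' at position {pos}")
    match = NUMBER.match(text, pos)
    if match is None:
        raise ValueError(f"expected a number at position {pos}")
    return float(match.group()), match.end()
```

For example, `parse_value("[1,[2.5,-3],[]]")` yields the nested list `[1.0, [2.5, -3.0], []]` together with the position just past the closing bracket. Whether an automatic version of this split pays off in practice is the open question raised above.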
If you need any information about Automa internals, let me know. For what it's worth, Automa does sort of support UTF-8, actually. It views non-ASCII characters as a pattern of concatenated bytes.
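As a rough illustration of what "a pattern of concatenated bytes" means, here is a small Python sketch (not Automa code; `utf8_byte_pattern` and `matches_at` are hypothetical helper names). A non-ASCII character becomes a fixed sequence of bytes, and a byte-oriented state machine matches it by advancing one state per byte.

```python
def utf8_byte_pattern(char):
    """Return the UTF-8 encoding of `char` as a list of byte values."""
    return list(char.encode("utf-8"))

def matches_at(data, start, pattern):
    """Run a linear byte-level state machine for `pattern` on `data` from `start`.

    State i means "the first i bytes of the pattern have matched".
    """
    state = 0
    for byte in data[start:]:
        if state == len(pattern):
            break  # accepting state reached
        if byte != pattern[state]:
            return False  # no transition for this byte
        state += 1
    return state == len(pattern)
```

Here `utf8_byte_pattern("é")` is `[0xC3, 0xA9]`, so matching `é` in the bytes of `"café"` amounts to matching the byte `0xC3` followed by `0xA9`.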
Looking forward to hearing from you!