Generating parser code #213
Question 1: Is my understanding correct that the
Hi there, glad you're finding it useful. Yes definitely. There's a ticket tracking performance here. Those are some nice performance improvements, I like it. My biggest concern really is supporting the parser options, particularly lookahead. How are you dealing with that?
That's correct. It's probably the last remaining thing I would like to do on 2.x before moving it out of alpha.
I guess I'll know once I progress far enough to try to implement that option (first I'd like to do disjunction and at least some non-string field types). But the groups (? * +) as I implemented them in the generated code should basically be capable of utilizing infinite lookahead, and the option is just a matter of limiting that. Disjunction won't be trivial, but what worked for groups should mostly work there too. The generated code is still a recursive descent parser; it just has the benefit of knowing the rules, types and options at compile time - this saves a TON of allocations, which were responsible for most of the overhead, along with reflection. It basically calls
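A toy illustration of that compile-time advantage (hypothetical types; this is not participle's actual generated output): the reflective path has to set captured fields through the reflect package, while generated code can assign the captured value directly to a typed field.

```go
package main

import (
	"fmt"
	"reflect"
)

type Expr struct {
	Left string
	Op   string
}

// captureReflective: roughly what a runtime parser has to do —
// look the field up by name and set it through reflection,
// which is comparatively slow and boxes values in interfaces.
func captureReflective(target interface{}, field, value string) {
	v := reflect.ValueOf(target).Elem()
	v.FieldByName(field).SetString(value)
}

// captureGenerated: a code generator knows the struct and field at
// compile time, so it can emit a plain assignment instead.
func captureGenerated(e *Expr, value string) {
	e.Left = value // no reflection, no interface boxing
}

func main() {
	a, b := &Expr{}, &Expr{}
	captureReflective(a, "Left", "x")
	captureGenerated(b, "x")
	fmt.Println(a.Left == b.Left)
}
```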
OK, so I'll just leave their methods for parser code generation unimplemented for now.
A small update for this weekend's progress (can't really give it much time during working days):
With the features above, the generated code is now able to parse the Jinja templating language with a quite complex grammar (EBNF has 50 lines and 500 words) that actually uses every single node type except negation and positive lookahead group. It also uses a very representative subset of the field types. The test suite for our Jinja interpreter passes with it completely with the exception of tests that compare the parsing errors, which are currently not always consistent. Here's the benchmark of parsing alone, with a pre-created PeekingLexer:
That's a >20x improvement in speed and an even bigger improvement in allocations. By the way, I've also tried the experimental generated lexer, which was able to improve lexing quite a bit but has some issues - it's missing support for lazy quantifiers and even has a pretty significant bug (I will try to find it and make a PR later). I then wrote a lexer by hand; here are the results for comparison:
(the extra allocations in the hand-written version are something I will be able to optimize)
Very nice!
This doesn't surprise me, and unfortunately error handling isn't really that great anyway. My main concern, and why I've held off on writing a parser generator, is that there are a lot of features and improvements I'd still like to add, and I'm concerned about having to maintain two parallel implementations. That said, I think a generator is very important. But I do wonder if there isn't a hybrid approach where the runtime parser is refactored to drive both the reflection-based implementation and the code-generated implementations. The high level parsing logic would live in the parser runtime - error reporting, recovery, etc. - while the low level path would be accelerated by the code generation. PS. The two big improvements I'm thinking of are:
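One way to picture that hybrid split (purely illustrative; these interfaces are not part of participle): the runtime keeps ownership of error reporting, recovery, and options, and delegates the low-level matching to whichever backend is available - the reflection interpreter or generated code.

```go
package main

import "fmt"

// Token is a minimal stand-in for the lexer's token type.
type Token struct {
	Type  int
	Value string
}

// backend is the low-level matching path: either the reflection-based
// interpreter or a code-generated implementation of the same grammar.
type backend interface {
	ParseInto(tokens []Token, dst interface{}) error
}

// Runtime owns the high-level logic — error reporting, recovery,
// options — and hands the hot path to the backend.
type Runtime struct {
	impl backend
}

func (r *Runtime) Parse(tokens []Token, dst interface{}) error {
	if err := r.impl.ParseInto(tokens, dst); err != nil {
		return fmt.Errorf("parse error: %w", err) // shared error reporting
	}
	return nil
}

// reflBackend stands in for the existing reflection interpreter:
// here it just captures the first token's value into a *string.
type reflBackend struct{}

func (reflBackend) ParseInto(tokens []Token, dst interface{}) error {
	s, ok := dst.(*string)
	if !ok || len(tokens) == 0 {
		return fmt.Errorf("unexpected input")
	}
	*s = tokens[0].Value
	return nil
}

func main() {
	r := &Runtime{impl: reflBackend{}}
	var out string
	_ = r.Parse([]Token{{Value: "hello"}}, &out)
	fmt.Println(out)
}
```

A generated backend would satisfy the same interface, so error messages and recovery behavior stay identical between the two paths.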
This does not surprise me - it is definitely still experimental.
Hey, another update after a long time; I was sick for a few of the previous weeks, so my work on this slowed down quite a lot. First of all, I just noticed this branch with parser codegen exists 😅, which I didn't notice before. I duplicated some of the effort but took a quite different approach (function per struct with some even inlined, not per node; not returning slices of tokens but capturing them to struct fields ASAP). And my approach seems significantly faster and does fewer allocations, so it's probably for the better.
As for a status update, I'm getting close to a relatively stable implementation, so I might push something for your initial comments within a week. Error handling seems quite consistent with the reflection-based parser now, and lookahead works too. I'm just patching up an issue where a failed branch (disjunction/group) may leave some unwanted captured fields behind. A few questions for you, if you could spare the time:
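The field-rollback issue mentioned above could be handled the same way lexer checkpoints work: snapshot the target struct before entering a branch and restore it if the branch fails. A minimal sketch with made-up types (not participle code):

```go
package main

import "fmt"

type node struct {
	A string
	B []string
}

// tryBranch attempts a parse branch; if it fails, captured fields are
// restored from a checkpoint taken before the branch, mirroring how
// the lexer position is rolled back on a failed disjunction/group.
func tryBranch(dst *node, branch func(*node) bool) bool {
	checkpoint := *dst // shallow copy of scalar fields and slice headers
	savedLen := len(dst.B)
	if branch(dst) {
		return true
	}
	*dst = checkpoint
	dst.B = dst.B[:savedLen] // truncate anything the branch appended
	return false
}

func main() {
	n := &node{}
	ok := tryBranch(n, func(d *node) bool {
		d.A = "partial" // captured, then the branch fails...
		d.B = append(d.B, "junk")
		return false
	})
	fmt.Println(ok, n.A == "", len(n.B)) // ...so the capture is undone
}
```

The shallow copy is cheap for scalar-heavy structs; nested pointers would need deeper handling.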
Yep that should be fine.
From memory this is used when capturing into a
I've checked in code for generated lexers, but it hasn't been ideal. Maybe do one or two checked in, then we can consider what the best approach is.
PS. Thanks for the update, and I'm glad you're feeling better.
@petee-d - I'm very interested in trying out your prototype parser generator, offering some feedback on how it works for my use case, and helping if I can with any code changes to get it production ready. Is there a branch with your work in progress I could try with my prototype? I've got a prototype lexer/parser with participle (kudos to @alecthomas, I'm loving the interface!), and with a fairly complex grammar, I'm parsing hundreds of thousands of very small documents and performance is critical. Parsing about 200,000 documents that total about 256 MB of text (in aggregate) produces about 26 GB of total memory allocated (only <100 MB allocated at any given time, but gobs of memory being allocated in total over the 200,000 documents parsed, and about 600 GCs triggered during the parsing run). This is even with using the experimental generated lexer. The generated lexer by itself is very fast. This level of allocations with the reflection-based parser is causing a >20x slowdown vs just lexing the data. I think the generated parser would likely be very fast. I'd love to see if this is the case, rather than hand rolling a parser and using the generated lexer. I have a test suite with about 500 comprehensive tests that cover the grammar, so I'll be able to offer feedback on whether your experimental generated parser produces the same results as the reflection parser, etc.
@klondikedragon Thanks for the comment! My work priorities have quite shifted since the last time I wrote here and I didn't have any capacity to continue this. It's still very much on my mind and I wanted to finish it within personal time, but wasn't able to. You've definitely renewed my interest. :) I would be very happy to see what performance improvements it will give in your use case (I'm hopeful even 20x should be achievable) and whether the generated parser will pass your test suite. Unfortunately I just tried to rebase my WIP branch to current participle master and since @alecthomas has been quite busy (great changes, BTW!), there was a massive conflict. If you're not relying too much on the changes done in participle this year (my old base commit is Dec 30th 2021...), you could try using this newly pushed branch in my fork: master...petee-d:participle:old-master-parser-generation I will definitely try to resolve those conflicts soon and push a branch rebased to master, but it might be worth giving the old branch a shot. How to use it:
Good luck, eagerly waiting to hear about your results, if you're able to use it.
Also, please coordinate with me before contributing on top of my mess - I definitely would first like to rebase it, split it into more reasonable commits and discuss certain breaking changes (minor, but breaking) I made to core participle.
@petee-d -- that's awesome! I will try out this experimental branch and report back with the results. It might take me a few days. I'll share the perf/test-suite results and coordinate before attempting any PR changes.
@petee-d - I'm working through it and have it generating the parser code from the grammar. Note that in the older version of participle, about 40 of my test cases are failing, even without using the generated parser. It looks like it has to do with lexer changes/fixes in the newer version; I haven't dug into it too deeply yet. My lexer def has some very complex regular expressions and different states, so I haven't spent the time to completely understand it. If you have time to rebase the code against the latest, hopefully the newest code will pass cleanly. (Note that the generated lexer on the current version of participle also doesn't pass the test suite, so that also points to potential changes in the lexer since the auto-gen lexer code was crafted.)

I had it generate code in the same package, since I found that when generating in the mode for a different package, the "source" package wasn't auto-detected correctly (it had it as the participle package, instead of the application package that contained the grammar structures). No big deal; maybe that is hard to detect, and you could just ask the caller to specify the name of the package that contains the grammar structs (empty string indicates same package) instead of passing a bool.

One minor issue: I have in the grammar a rule like the following, where it's indicating zero or more elements, and it's stored into a slice of pointers to the struct representing the captured rule:

The generated code is assuming that it is a slice of the structs, instead of a slice of pointers to the struct:

This generated code had to be manually tweaked a little bit to compile:

It might be the case that it's actually more efficient to just use a slice of structs and the auto-gen code was expecting that? I tried changing the grammar definition to just have slices of structs instead of struct pointers and then the autogen code didn't have that issue.
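On the struct-vs-pointer question: a slice of values keeps its elements inline in the backing array, while a slice of pointers pays one extra heap allocation per element. A quick illustrative check with testing.AllocsPerRun (not from the thread; made-up element type):

```go
package main

import (
	"fmt"
	"testing"
)

type elem struct{ a, b int }

// buildVals appends values: elements live inline in the slice's
// backing array, so the loop itself does no per-element allocation.
func buildVals(n int) []elem {
	s := make([]elem, 0, n)
	for i := 0; i < n; i++ {
		s = append(s, elem{i, i})
	}
	return s
}

// buildPtrs appends pointers: every &elem{} is a separate heap
// allocation on top of the backing array.
func buildPtrs(n int) []*elem {
	s := make([]*elem, 0, n)
	for i := 0; i < n; i++ {
		s = append(s, &elem{i, i})
	}
	return s
}

func main() {
	va := testing.AllocsPerRun(10, func() { _ = buildVals(1000) })
	pa := testing.AllocsPerRun(10, func() { _ = buildPtrs(1000) })
	fmt.Println(va < pa) // value slice allocates far less
}
```

This is why a []T grammar field can be cheaper than []*T, at the cost of copying elements during append growth.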
Another small issue: it complained about

At this point, I got the generated parser compiling, but it was panicking running the test suite, complaining about an out of bounds index in

Here is a simpler version of what the grammar looks like:

type aeData struct {
	DataElements []aeDataElement `parser:"@@*"`
}

type aeDataElement struct {
	Pos    lexer.Position
	EndPos lexer.Position
	Rule1  *aeRule1 `parser:"( @@"`
	Rule2  *aeRule2 `parser:"| @@"`
	// otherwise it's garbage, just soak up any token
	Garbage string `parser:"| @~Whitespace | @Whitespace )"`
}

In the main generator function, inside the generated for loop, I see:

// capture Garbage from ~<whitespace>
// negation ~<whitespace>
{
	branchCheckpoint := p.Lex.MakeCheckpoint()
	// reference <whitespace>
	if current.Type != -42 {
		p.SetTokenError("")
		goto negation401Error
	}
	if vAeDataElement.Garbage == "" {
		vAeDataElement.Garbage = current.Value
	} else {
		vAeDataElement.Garbage += current.Value
	}
	_, _ = p.Lex.Next()
	current = p.Lex.Current()
	p.SetTokenError("")
	goto disjunction44Alt2Error
negation401Error:
	p.Lex.LoadCheckpoint(branchCheckpoint)
	current = p.Lex.Current()
	p.SetNoError(false)
	if vAeDataElement.Garbage == "" {
		vAeDataElement.Garbage = current.Value
	} else {
		vAeDataElement.Garbage += current.Value
	}
	_, _ = p.Lex.Next()
	current = p.Lex.Current()
}

Notice in the

At first, I tried changing the

func (p *PeekingLexer) Current() *Token {
	if int(p.rawCursor) >= len(p.tokens) {
		return &p.eof
	} else {
		return &p.tokens[p.rawCursor]
	}
}
for matches := 0; ; matches++ {
// =============== (manual edit begin)
if p.Lex.Current().EOF() {
break
}
// =============== (manual edit end)
branchCheckpoint := p.Lex.MakeCheckpoint() With this manual edit to the grammar, it's now able to run the full test suite! Interestingly, only 19 test cases failed with the auto-generated parser, instead of 40 with the reflection-based parser. The 19 cases that are failing look to all be related to the older lexer code producing wrong results (not related to the auto-gen parser). So the auto-generated parser somehow fixed some issues in the reflection parser for this older version of participle. (In the newer participle, all test cases are passing in the reflection parser.) Now that I have this running, I'll collect benchmark results and report back! |
Awesome result and thanks for the update!
@klondikedragon thanks for trying the old branch! Rebasing to master is going to take some effort, as there really were quite a few changes, plus I'd like to rework some stuff I initially did in a quick-and-dirty way into a version I would actually dare to submit as a merge request. :) But I already started working on it.
Very possible, I knew about one or two bugs in the generated lexer implementation in the December version of participle master I was using. It's very likely they have been fixed in the meantime. But do I understand correctly that generating the lexer with current master didn't help? Maybe I'll look into it later - I've actually been thinking of trying to improve the lexer code generation as well, but that's for much much later.
I see it, I suppose I was trying something and forgot to revert it as the example grammar I was testing the
Yep, that's a bug, already see it, will fix. Wasn't intentional at all, although it's true you are better off not wrapping the slice elements in pointers anyway.
I know about that issue actually, it just didn't bother me too much as in my tests
Thanks for describing the problem so well, will look into it. :) So far I'm thinking of just adding
No idea what's happening there, but I'll test this stuff thoroughly after the rebase to make sure there are no bugs either way.
Eagerly awaiting the results! :)
@petee-d - the results are pretty incredible: a huge speedup in wall clock time, CPU time, and total allocations saved. I ran various test runs as well as CPU/allocation-site pprof profiles to analyze the behavior. Here is the performance of the application parsing about 140,000 documents (totaling about 250 MB in aggregate) using various combos of reflection or generated lexers/parsers (all from your experimental branch):
(The input was the same each time, and it's a deterministic workload. Based on CPU profiling, the wall clock time is roughly equivalent to CPU time; the app is CPU-bound the whole time.) Note the last line shows the overhead of the application just to receive/decompress the document data, so to understand lexer/parsing overhead, we need to subtract that line. Also note the difference between

In other words, about 4 seconds (3 seconds plus (5 minus 4) seconds) of wall clock/CPU time is spent by the app doing things other than lexing/parsing, and about 1835 MB (745 MB plus (3649 - 2559) MB) of the total allocations are from the app doing other things. We can thus break down the wall/CPU time and total allocation performance of the various combinations as follows:
So with the code as-is, the CPU speedup is 11.6x, with a total allocated bytes reduction of 7.5x! This is already fantastic! Profiling the genlexer-genparser case, a large portion of the time and total allocations were from the formation of the

For reference, here is the code for the pooled peeking lexer:

var peekingLexerPool = sync.Pool{
	New: func() interface{} {
		return &PeekingLexer{
			elide: make(map[TokenType]bool, 4),
		}
	},
}

// Upgrade a Lexer to a PeekingLexer with arbitrary lookahead.
// Faster if you need to lex thousands of similar documents.
//
// "elide" is a slice of token types to elide from processing.
//
// You must call `PutBackPooledPeekingLexer` once done with the
// returned lexer in all cases (ok or error).
func UpgradePooled(lex Lexer, elide ...TokenType) (*PeekingLexer, error) {
	r := peekingLexerPool.Get().(*PeekingLexer)
	// reset the state of the PeekingLexer to empty (preserving any allocated capacity)
	r.Checkpoint.rawCursor = 0
	r.Checkpoint.cursor = 0
	// note: this preserves capacity
	r.tokens = r.tokens[:0]
	for k := range r.elide {
		delete(r.elide, k)
	}
	return fillInPeekingLexer(lex, r, elide...)
}

func PutBackPooledPeekingLexer(r *PeekingLexer) {
	if r != nil {
		peekingLexerPool.Put(r)
	}
}

// Upgrade a Lexer to a PeekingLexer with arbitrary lookahead.
//
// "elide" is a slice of token types to elide from processing.
func Upgrade(lex Lexer, elide ...TokenType) (*PeekingLexer, error) {
	r := &PeekingLexer{
		elide: make(map[TokenType]bool, len(elide)),
	}
	return fillInPeekingLexer(lex, r, elide...)
}

func fillInPeekingLexer(lex Lexer, r *PeekingLexer, elide ...TokenType) (*PeekingLexer, error) {
	for _, rn := range elide {
		r.elide[rn] = true
	}
	if batchLex, ok := lex.(BatchLexer); ok {
		for {
			batch, err := batchLex.NextBatch()
			if err != nil {
				return r, err
			}
			r.tokens = append(r.tokens, batch...)
			last := batch[len(batch)-1]
			if last.EOF() {
				r.eof = last
				break
			}
		}
	} else {
		for {
			t, err := lex.Next()
			if err != nil {
				return r, err
			}
			r.tokens = append(r.tokens, t)
			if t.EOF() {
				r.eof = t
				break
			}
		}
	}
	r.rawCursor = r.findNonElided(0) // Set rawCursor to the first non-elided token
	return r, nil
}
So the performance of your generated parser combined with the generated lexer is very impressive for my use case! It's able to parse/lex about 250 MB of text in ~1.5 seconds (excluding time to load from disk, etc.). The performance of the generated lexer/parser using sync.Pool for the PeekingLexer is very good for this application's needs, and further optimization wouldn't be needed. It seems the generated parser is also producing accurate results; it's the generated lexer that has more accuracy issues. Once the branch is rebased I can try it against the latest reflection parser to see if the generated parser passes the full test suite without any accuracy issues. I'm definitely motivated to help as I can, and I'll try to isolate some simple cases where the generated lexer is not accurate.

Since I'm curious, I profiled the

Note in the flamegraph that nearly all the orange blocks you see taking up CPU time are related to the runtime for allocation, rather than CPU time for actual parser work. This could potentially be optimized by using a

It's much appreciated if you have time to make progress on your branch! I'll be excited to try it out and report back on accuracy in combo with the current reflection lexer. In the meantime I'll try to isolate the bugs I'm seeing in the generated lexer.
P.S. Using the pooled PeekingLexer is easy enough:

data := &aeData{}
stringLexer, _ := aelexer.Lexer.(lexer.StringDefinition)
lex, err := stringLexer.LexString("", s)
if err != nil {
	return nil, err
}
peekingLex, err := lexer.UpgradePooled(lex)
defer lexer.PutBackPooledPeekingLexer(peekingLex)
if err != nil {
	return nil, err
}
err = ParseAeData(peekingLex, data, true)
...
@klondikedragon those are some really interesting results, thank you very much for sharing them. :) Could you also do a comparison benchmark for just the parsing step on some quite representative single sample? Meaning you would construct a single

Also, it would be nice to show the number of allocations instead of the allocated volume - that's much more indicative of allocation pressure and the related CPU slowdown. Anyway, it does indeed seem like the bottleneck is the allocations for the AST itself. You could also try optimizing the AST structs to avoid pointers whenever possible - copying is generally much cheaper than an allocation. As for pooling each pointer type used in the AST (and maybe even slices, although those are more tricky), this is actually something I totally already considered and will maybe do in the future. It might not be that complex - creating the
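The requested benchmark shape, sketched with a stand-in for the real PeekingLexer (MakeCheckpoint/LoadCheckpoint mirror the methods quoted earlier; the types here are made up): lex the sample once up front, snapshot the lexer, and rewind to the snapshot on every iteration so only parsing is measured.

```go
package main

import (
	"fmt"
	"testing"
)

// peekingLexer is a toy checkpointable token stream, standing in
// for participle's PeekingLexer in this benchmark sketch.
type peekingLexer struct {
	tokens []string
	cursor int
}

func (p *peekingLexer) MakeCheckpoint() int  { return p.cursor }
func (p *peekingLexer) LoadCheckpoint(c int) { p.cursor = c }
func (p *peekingLexer) Next() (string, bool) {
	if p.cursor >= len(p.tokens) {
		return "", false
	}
	t := p.tokens[p.cursor]
	p.cursor++
	return t, true
}

// parseAll stands in for the parse step being benchmarked.
func parseAll(p *peekingLexer) int {
	n := 0
	for _, ok := p.Next(); ok; _, ok = p.Next() {
		n++
	}
	return n
}

func main() {
	lex := &peekingLexer{tokens: []string{"a", "b", "c"}} // lex once, up front
	cp := lex.MakeCheckpoint()
	res := testing.Benchmark(func(b *testing.B) {
		for i := 0; i < b.N; i++ {
			lex.LoadCheckpoint(cp) // rewind instead of re-lexing
			parseAll(lex)
		}
	})
	fmt.Println(res.N > 0)
}
```

This isolates parser cost from both lexing and PeekingLexer construction, which is what the allocation comparison needs.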
Oops, LOL - sorry, I had the input size off by a factor of 10x in my previous post; the input size in the above tests was 25 MB, not 250 MB. Still, that means the app was doing about 16 MB/sec of lexing/parsing throughput for a mixed real-life workload, which is great. Pooling AST types (and possibly slices) would definitely be a nice bonus. Thanks for the tip on avoiding pointers in AST structs! This app will be operating in a low-memory container, so allocation sizes might also be relevant, as they could trigger more GC if the app backs up against the available memory. For a representative "large" doc (about 410 bytes), here are some results where the lexer/parser is generated each time:
Max throughput of 11.6 MB/sec. Observing a 16.8x CPU speedup and 36x allocs reduction. And again with a representative small doc (about 42 bytes):
Max throughput of 3.6 MB/sec. 10.4x CPU speedup and 13.7x allocs reduction. The above represents the app use case more directly, where it has to construct a lexer/parser each time. As requested, here are some benchmarks constructing the lexer/parser once and restoring the checkpoint each loop (timer reset after lex/parser construction, before the loop):

Large doc case:
25.2 MB/sec. 26.7x CPU speedup and 38.7x alloc reduction!

Small doc case:
5.9 MB/sec. 14.0x CPU speedup and 15.5x alloc reduction! Very impressive!
I'm loving these numbers! I have some concerns with the long term maintenance of both the lexer and parser code generators, which we should solve (or at least not make worse) before getting these changes into master. I've opened #264 with some initial thoughts.
- Add a CLI tool that can ingest the JSON and dump out the generated code. - Lexers can now be JSON marshalled. - Add a goreleaser step for the binary. As discussed in #213
(edit: I had an issue with the PooledLexer benchmarks before, this has been fixed!) After #263 merged, I've tried the latest WIP parsergen code. Lots of exciting progress!!
// group AeDataElement*
for matches := 0; ; {
	// =============== (manual edit begin)
	if c.Lex.Peek().EOF() {
		break
	}
	// =============== (manual edit end)
	branchCheckpoint := c.Lex.MakeCheckpoint()

This manual edit fixes the infinite loop. So adding a check for EOF at the top of a loop for group matches in the code generator is probably the simplest/easiest way to solve for that?
Results for the "large doc" (~410 bytes) case:
Small doc (42 bytes) case:
When profiling the latest code, memory allocation / GC is the largest bottleneck. The pooled lexer eliminates one of the largest sources of that bottleneck for my use case (where I'm parsing hundreds of thousands of relatively small docs). I'll submit a PR for lexer pooling for consideration. If the generated parser were to ever use pools for its allocations it could speed things up further, but it's already very speedy!
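A sketch of what pooling the parser's AST allocations could look like (hypothetical node type and helpers; not participle API): give each node type a sync.Pool, reset recycled nodes on the way out of the pool, and have the caller return the whole tree once done with the parse result.

```go
package main

import (
	"fmt"
	"sync"
)

type astNode struct {
	Kind     string
	Children []*astNode
}

var nodePool = sync.Pool{New: func() interface{} { return new(astNode) }}

// getNode hands out a recycled node with its fields reset,
// keeping the Children slice's allocated capacity.
func getNode(kind string) *astNode {
	n := nodePool.Get().(*astNode)
	n.Kind = kind
	n.Children = n.Children[:0]
	return n
}

// putTree returns a whole parse result to the pool once the
// caller is finished with it — the ownership question that makes
// pooling ASTs trickier than pooling the lexer.
func putTree(n *astNode) {
	for _, c := range n.Children {
		putTree(c)
	}
	nodePool.Put(n)
}

func main() {
	root := getNode("doc")
	root.Children = append(root.Children, getNode("elem"))
	fmt.Println(root.Kind, len(root.Children))
	putTree(root) // whole tree recycled for the next document
}
```

Like the pooled PeekingLexer above, this trades an explicit "put back" obligation for a large cut in per-document allocations.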
@klondikedragon thanks for sharing those results. I'm very glad you like the approach to that chicken/egg problem; it took quite some back-and-forth to arrive at that solution. I initially also wanted to somehow make it so that generating the parser code wouldn't even import the generated code, and thus you could maybe re-generate the code even if changes to the grammar made the code invalid - but I wasn't able to find any way to do that. Deleting the file seems like a reasonable compromise, as it can be added as a first step to some Make command the developer would use to regenerate the grammar.

As for that negation EOF bug, I forgot to look into it once I fixed the first issue with consuming EOF out of the peeking lexer. Fixed now on the branch, along with a second bug. It's possible the test failures you had will be fixed by that second fix. BTW, I actually didn't have test coverage for negation yet and my Jinja parser doesn't use it, so I basically never tried it. But now the branch includes test coverage for negation with many edge cases, so I think it should be OK. If not, some small demonstration of the remaining issue would be very appreciated. :)

Meanwhile, I'm continuing with writing a test suite for the generated parser. Since it will also test the reflective parser on the same grammar, and that grammar will test just about every participle feature once it's finished, I think it will give even better test coverage than all the existing test suites, especially for error reporting.
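The negation edge cases discussed above can be expressed as small checks against a stand-in matcher (none of this is participle code; the semantics are as described in the thread: match exactly one token of any other type, and never consume EOF):

```go
package main

import "fmt"

const eofType = -1

type token struct {
	typ   int
	value string
}

// matchNegation implements the semantics under test: consume one token
// if its type is NOT the negated type, and never consume the EOF token —
// the bug fixed above was a negation loop that kept eating EOF.
func matchNegation(tokens []token, pos int, negated int) (string, int, bool) {
	if pos >= len(tokens) || tokens[pos].typ == eofType {
		return "", pos, false // never consume EOF
	}
	if tokens[pos].typ == negated {
		return "", pos, false // the negated type must not match
	}
	return tokens[pos].value, pos + 1, true
}

func main() {
	tokens := []token{{1, "a"}, {2, " "}, {eofType, ""}}
	v, pos, ok := matchNegation(tokens, 0, 2)
	fmt.Println(v, pos, ok)
	_, _, ok = matchNegation(tokens, 2, 2) // at EOF: must fail, not loop
	fmt.Println(ok)
}
```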
@petee-d -- with your latest branch, it fixes the negation EOF bug, and also the test suite is now passing 100%! With all of the work over the past months + the pooled lexer patch, there is now an incredibly impressive 25x speedup in parsing throughput vs the original reflection lexer/parser!! (41971 docs/second vs 1644 docs/second). Participle takes the drudgery out of creating custom lexers/parsers, and now it delivers production-level performance also! |
This is super exciting, awesome work :)
If this one is better, let's delete the old test suite once this lands. |
Hi there again @alecthomas,
I'm already using this great parser in a large project (a GoLang implementation of Jinja, will be open sourced eventually) and one of the things that somewhat bothers me is the speed and GC pressure (allocations) of the parser. I've considered using the generated lexer to improve it, but it's just not enough. So what if this library could also generate code for the parser? Is this something you've already considered or even started playing with?
I actually already did and have a very ugly prototype that can generate the parser code for this subset of features:
strct, sequence, tokenreference, tokenliteral, capture (@) and group (?, *, +; not ! yet)

Example grammar it can parse consistently with the native parser (except for error messages):
For this grammar and with a pre-built PeekingLexer (so that I compare only parsing speed), here are the benchmarks at this moment:

The reason for being >30x faster for this particular grammar (likely a very cherry-picked example) is that the generated code:

- Stuff.A
- goto (responsibly) to solve things that would otherwise need calling a function or duplicating code

It's very possible I'll run into an issue I won't be able to overcome, but for now it seems like this should be very doable. The generated code isn't too long (160 LOC for the above grammar), is quite well isolated (adds one generic method to all nodes, plus a file for code generation utilities) and doesn't introduce any new dependencies. For now I would just like to let you know I'm working on this, so we can coordinate any similar efforts. :) I would also appreciate your support with a few questions (will post them in this issue later).

What do you think? Cheers!