-
Notifications
You must be signed in to change notification settings - Fork 3.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Easy to reproduce baffling parse failure when using semantic predicates. Bug in parser generator? #4660
Comments
Run
|
Run
|
Run
|
It is more likely that your grammar is slightly misconfigured. The semantic precedent is kind of an on/off switch, but you cannot usually use them to select an alt based on context because ANTLR will generate a a predict() for the alt, and not a simple token lookahead (most of the time). You could try SLL mode, but I think your grammar is likely incorrect. Sll mode: ``java
e: e '*' e
| e '+' e
| INT
| FLOAT
;
- if you are using it to try and lookahead to force an alt selection, this is likely not going to work. You will have to make your grammar unambiguous or pre-parse it.
It is also possible, that your language does not lend itself well to generated parsers. Mathematical formulae tend to be like that. |
I think my use of semantic predicates is correct, I use it to distinguish between syntactically equivalent but semantically different parses. I.e. the formula "expression SYMBOL expression" can become a predicate if the text lexed into a SYMBOL (eg "ip") is marked as a predicate operator, but can become an expression if the lexed SYMBOL ("ie") is marked as an expression operator. From the antlr3 manual:
The FORTRAN parsers uses a symbol table to check if ID is a variable or a function. |
This is ANTLR4. However, I will take a look at your grammar. If the grammar is not standalone though, it may be difficult to diagnose. In general, don't try to make semantic judgements in the parse. Create a second pass: parser -> parse tree -> AST -> Execute/Codegen. |
After a quick glance, I think your expression rule is specifying incorrect precedence: expression
: ETRUE # ExpressionTRUE
| EFALSE # ExpressionFALSE
| BOOL # BOOLSet
| {
_input.LT(1).getText().equals("x") || _input.LT(1).getText().equals("y")
/*symbol_table.isVariableSymbol(_input.LT(1).getText()) */
}? variable=SYMBOL # ExpressionVariable
| left=expression operator=ADD right=expression # Addition
| '(' inner=expression ')' # ExpressionParentheses
| left=expression operator=operator_ie { System.out.println("PASSED IE operator"); } right=expression # OperatorInfixExpression
; Try: // The earlier a variant is specified in this rule, the higher the precedence and will resolve first
expression
: '(' inner=expression ')' # ExpressionParentheses
| left=expression operator=ADD right=expression # Addition
| left=expression operator=operator_ie { System.out.println("PASSED IE operator"); } right=expression # OperatorInfixExpression
| {
_input.LT(1).getText().equals("x") || _input.LT(1).getText().equals("y")
/*symbol_table.isVariableSymbol(_input.LT(1).getText()) */
}? variable=SYMBOL # ExpressionVariable
| ETRUE # ExpressionTRUE
| EFALSE # ExpressionFALSE
| BOOL # BOOLSet
; However, without knowing more about your language, I cannot say for certain. Have you specified it, or is it already extant? When you must do tricks like this, it means the language is not well specified, or the parser grammar isn't. Or, because you expect the parser to do one thing in a certain semantic case and another in another case, it can become impossible to correctly cover all possibilities. Hence try not to do this in the parser itself. Accept anything that looks like it is valid syntax, then work out the semantics in the next pass. The other thing is that you are looking to hoist the predicates from the operator_ie rule. I don't think that will work, and anyway, this rule is going to produce a prediction rather than look at the next tokens (look at the generated code). You can keep playing, but I think that this is the wrong approach for a left recursive ANTLR4 expression chain. You might have better luckk expanding the expression not to be left recursive in a traditional LL hierarchy where the LHS and RHS reference the next rule in the chain. This is what we did in ANTLR3. |
I would also say, that if you specify without the predicates, you might find that ANTLR resolves it for via its own predictions. Edit to clarify... Decelare IP and IE as lexer tokens and add them as operators in your expression tree with the correct precedence. Then create a value rule: value: VALUE | IP | IE (| any more keyword); And use that in expression instead of just value. Make sure your expression is correct because what you have does not deal with DOT and does not seem to have the ability to parse: X Y without an operator as in your example. DOT you can do withL | expression DOT expression Also, I would tend to favor banning the use of your keywords as |
Thank you Jim for taking the time to look at this. I think I have found the reason for the problem based on your answers and on a detailed reading of the manuals. The important sentence from the antl4 manual is this: "Predicates can appear anywhere within a parser rule just like actions can, but only those appearing on the left edge of alternatives can affect prediction (choosing between alternatives)" All my previous uses of semantic predicates have placed the predicate on the left edge of >>alternatives<<. The grammar and language is fixed and pretty straightforward, the sempreds are used to detect variable types and for The new support I am writing for dynamically added operators that are prefix operators work fine with the semantic predicate detection, since the sempred is at the left edge of alternatives, ie like this: (pseudo code below) But for dynamically added infix operators we have: They look so similar, but alas apparantly work so differently, since the sempred now is not used to discern between alternatives, instead if becomes a value checker of sorts, which clearly can fail if the parse steers into this path the wrong way. This creates very weird parse failures. Also the antlr4 ALL(*) parser can apparently pickup tokens after the first expression to steer the parse right sometimes, which perhaps is why parsing "(a ie b)" fails, but "(a ie b) + b" succeeds. Perhaps the good old { }?=> syntax to indicate that you want to use the sempred for parsing decisions was a useful, since you knew if the sempred was choice-making or value-checking. Now you have to know that the position of the sempred influences its purpose. Fortunately there was a simple fix, move the predicate into the lexer instead, and override the token. SYMBOL: [a-zA-Z][a-zA-Z0-9_]* { This will override the symbols alread in the lexer for the infix operators, and then the grammar rule becomes: Do you agree with this analysis Jim? |
I have a grammar that parses mathematical formulas. It uses semantic predicates to steer parsing based on a dynamic symbol table. (I.e. is this string a variable or a constant or a set symbol etc...) This has worked very well.
Now I am trying to add infix dynamic operators. It ought to be a piece of cake, since the semantic predicates for the other kind of symbols has worked so well. However, there is a behaviour of the generated parser that looks baffling and is perhaps an error.
The minimized grammar can be seen here: https://github.com/viklauverk/EventBTool/blob/AntlrReproducer/src/main/antlr4/com/viklauverk/eventbtools/core/EvBFormula.g4
./reproduce.sh console -q 'eb.show.formula x+y' q
Builds an expression and prints: x+y
./reproduce.sh console -q 'eb.show.formula false & true' q
Builds a predicate and prints: ⊥∧⊤
Now test semantic pred that detects "ip" is infix predicate operator, just like &.
However this operator "ip" takes expressions as arguments instead of predicates.
./reproduce.sh console -q 'eb.show.formula x ip y' q
Builds a predicate and prints: x ip y
Now test semantic pred that detects "ie" an infix expression operator, just like +.
But here is the problem:
./reproduce.sh console -q 'eb.show.formula x ie y' q
Should build an expression and print x ie y
However it fails.
But very strangely this works:
./reproduce.sh console -q 'eb.show.formula (x ie y)+y' q
Builds an expression and prints: (x ie y) + y
This behaviour looks very strange. Why fail with
(x ie y)
but succeed with(x ie y) + y
Is this a bug in the generated parser, or have I made a mistake in the grammer?
Man thanks for your help!
How to build on Linux/Mac:
git clone https://github.com/viklauverk/EventBTool.git
cd EventBTool/
git checkout AntlrReproducer
mvn package
./reproduce.sh
(or just ./configure; make)
Update:
I have simplifie the grammar further, now the variable symbols
x
andy
, the operator symbolsip
andie
are hardcoded in the grammar. The grammar has no external java package dependencies any more.Update:
By changing the grammar from:
to:
then suddenly it succeeds in parsing
x ie y
The text was updated successfully, but these errors were encountered: