Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[postgresql] a_expr_qual/a_expr_qual_op is ambiguous. #4306

Closed
kaby76 opened this issue Nov 2, 2024 · 5 comments
Closed

[postgresql] a_expr_qual/a_expr_qual_op is ambiguous. #4306

kaby76 opened this issue Nov 2, 2024 · 5 comments

Comments

@kaby76
Copy link
Contributor

kaby76 commented Nov 2, 2024

Consider the test input file collate.sql. The parse is ambiguous for a_expr_qual_op. See the two parse trees in ambig.txt.

This ambiguity is a major contributor to the performance issues in the grammar. This file takes ~1.3s to parse cold start (~0.4s warm start), and it is only 276 lines long!

Note, I chose this file because it had the least number of non-zero ambiguities, which is 1. If you try to take a test input with a large number of ambiguities, trparse --ambig will take a long time. To find the file with the least non-zero number of ambiguities, I did $ dotnet trperf -c aF ../examples/*.sql | grep -v '^0' | awk '{sum[$2] += $1} END {for (key in sum) print sum[key], key}' | sort -k1 -n | head.

@kaby76 kaby76 changed the title [postgresql] a_expr_qual_op is ambiguous. [postgresql] a_expr_qual/a_expr_qual_op is ambiguous. Nov 2, 2024
@kaby76
Copy link
Contributor Author

kaby76 commented Nov 2, 2024

Analysis

In the input file colate.sql.

select x || y from collate_test10; -- ok, because || is not collation aware

This seemingly simple expression involving || can be parsed in two ways because qual_op appears in two places in expressions:

This is all fine, but the problem is the context target_el.

target_el
: a_expr (AS collabel | identifier |) # target_label
| STAR # target_star
;

We see here that || can be treated as a unary, postfix op, or a binary op. This seems quite wrong because the only postfix ops are ! and !!. https://www.postgresql.org/message-id/BE2DF53D-251A-4E26-972F-930E523580E9%40enterprisedb.com

target_el seems wrong because it does not look like https://github.com/postgres/postgres/blob/9be4e5d293b554d8a0800790c57fc707a3b5cf0f/src/backend/parser/gram.y#L17121-L17158

@kaby76
Copy link
Contributor Author

kaby76 commented Nov 2, 2024

Changing target_el : a_expr (AS collabel | identifier |) | STAR ; to target_el : a_expr (AS collabel |) | STAR ; as per the official gram.y grammar fixes the ambiguity, but breaks the parse in a few other places now. There are additional errors in the grammar.

Also, while this fixes the ambiguity for collate.sql, the parse time does not improve at all. This is because there is a huge max-k required. https://github.com/antlr/grammars-v4/blob/199a5121ece05d2f2e7eca330d0738220499e80c/sql/postgresql/examples/collate.sql#L87C9-L217C4. Rule target_list is the culprit. (There is a way to fix the grammar for fallbacks by duplication.)

First things first, though.

@kaby76
Copy link
Contributor Author

kaby76 commented Nov 3, 2024

Cloning the rule target_list in rule sql_expression (after applying #4299, #4301, #4303) has a large speed up (dotnet trparse -t ANTLRv4 PostgreSQLParser.g4 | dotnet trclonereplace " //parserRuleSpec[RULE_REF/text()='sql_expression']//RULE_REF[text()='target_list_']" -s "_1" | dotnet trsponge -c. NB: I had to remove the labels because Antlr chokes on identical # label across different rules). Before cloning, 35s. After cloning, 26s. This is because a full context is required for these elements. Rule cloning eliminates the fallbacks, but it doesn't come without a price: there are many more rules. There are still many ambiguities in the grammar, but I am confident that after resolving ambiguities and using cloning, the performance will be quite spectacular.

@kaby76
Copy link
Contributor Author

kaby76 commented Nov 3, 2024

Changing target_el : a_expr (AS collabel | identifier |) | STAR ; to target_el : a_expr (AS collabel |) | STAR ; as per the official gram.y grammar

I was in error. target_el is correct. The problem is expressions in the presence of the alias.

@kaby76
Copy link
Contributor Author

kaby76 commented Nov 3, 2024

The trick is to provide a semantic predicate before qual_op.

a_expr_qual
    : a_expr_lessless qual_op?
    ;

changed to

a_expr_qual
    : a_expr_lessless ({ this.OnlyAcceptableOps() }? qual_op | )
    ;

with

    public bool OnlyAcceptableOps()
    {
        var c = ((CommonTokenStream)this.InputStream).LT(1);
        var text = c.Text;
        return text == "!" || text == "!!";
    }

This disambiguates the use of the postfix operator, only allowing it for ! or !!. It is likely that there should be a companion semantic predicate to disallow these two operators as binary.

kaby76 added a commit to kaby76/grammars-v4 that referenced this issue Nov 3, 2024
kaby76 added a commit to kaby76/grammars-v4 that referenced this issue Nov 3, 2024
teverett pushed a commit that referenced this issue Nov 4, 2024
* Fix for #4306

* Workaround for the '!=-' operator.
@kaby76 kaby76 closed this as completed Nov 4, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant