Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[postgresql] Fix for #4291 -- non-idiomatic usage of "(foo | )" instead of "foo?" #4299

Merged
merged 2 commits into from
Nov 4, 2024

Conversation

kaby76
Copy link
Contributor

@kaby76 kaby76 commented Nov 1, 2024

This is a change to fix one of the types of problems with the postgresql grammar.

As noted in #4291, the grammar contains numerous non-idiomatic rule definitions. In particular, the grammar contains rules with empty alternatives, appearing to be derived from a grammar for another parser generator. Indeed, that conclusion is correct: this grammar is derived from the Bison grammar at https://github.com/postgres/postgres/blob/9be4e5d293b554d8a0800790c57fc707a3b5cf0f/src/backend/parser/gram.y/. Bison/yacc does not support an optional operator EBNF. But, Antlr does.

This change refactors empty production rules to non-empty rules and adds the ?-operator to the applied occurrences of the parser symbol. The names of these parser rules are changed to not have the "opt_" prefix, as they don't derive empty (generally) anymore.

The scripts to make these changes have been added to the sub-directory fixups/. detect.sh finds all productions that have an empty top-level alt. fix.sh refactors these productions by removing the empty alt, and adding the ?-operator to the applied occurrences. rename.sh renames rules that are not nullable.

These changes do not affect the performance.

There are two new warnings from the Antlr Tool ( rule stmt_case contains an optional block with at least one alternative that can match an empty string and rule stmt_return contains an optional block with at least one alternative that can match an empty string). These do not affect the performance or correctness of the parser. These will be delt with in a later fix to correct other warnings (non-fragment lexer rule AfterEscapeStringConstantMode_NotContinued can match the empty string and non-fragment lexer rule AfterEscapeStringConstantWithNewlineMode_NotContinued can match the empty string). In addition, the ambiguity in the grammar results in terrible performance, which also needs to be corrected.

I've updated the readme.md to note the actual grammar source and issues.

@kaby76 kaby76 marked this pull request as ready for review November 1, 2024 08:34
@kaby76 kaby76 marked this pull request as draft November 1, 2024 08:37
@kaby76 kaby76 marked this pull request as ready for review November 1, 2024 08:42
@kaby76 kaby76 changed the title Fix for #4291 [postgresql] Fix for #4291 Nov 1, 2024
@kaby76 kaby76 changed the title [postgresql] Fix for #4291 [postgresql] Fix for #4291 -- non-idiomatic usage of "(foo | )" instead of "foo?" Nov 4, 2024
@teverett
Copy link
Member

teverett commented Nov 4, 2024

@kaby76 thanks!

@teverett teverett merged commit 04721ce into antlr:master Nov 4, 2024
10 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants