[postgresql] Fix for #4291 -- non-idiomatic usage of "(foo | )" instead of "foo?" #4299
This is a change to fix one of the types of problems with the postgresql grammar.
As noted in #4291, the grammar contains numerous non-idiomatic rule definitions. In particular, it contains rules with empty alternatives, which suggests it was derived from a grammar for another parser generator. That conclusion is correct: this grammar is derived from the Bison grammar at https://github.com/postgres/postgres/blob/9be4e5d293b554d8a0800790c57fc707a3b5cf0f/src/backend/parser/gram.y/. Bison/yacc does not support the EBNF optional operator (`?`), but Antlr does.
This change refactors empty production rules to non-empty rules and adds the `?`-operator to the applied occurrences of the parser symbol. The names of these parser rules are changed to not have the "opt_" prefix, as they don't derive empty (generally) anymore.

The scripts to make these changes have been added to the sub-directory `fixups/`.

- `detect.sh` finds all productions that have an empty top-level alt.
- `fix.sh` refactors these productions by removing the empty alt, and adding the `?`-operator to the applied occurrences.
- `rename.sh` renames rules that are not nullable.

These changes do not affect the performance.
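As a rough illustration of the refactoring described above, here is a before/after sketch on a made-up pair of rules. The rule and token names (`opt_foo`, `foo`, `some_stmt`, `BAR`, `FOO`, `BAZ`) are hypothetical and are not taken from the postgresql grammar; the actual names produced by the scripts may differ.

```antlr
// Before: Bison-style optional rule with an empty top-level alternative.
// (Hypothetical names, for illustration only.)
some_stmt
    : BAR opt_foo BAZ
    ;

opt_foo
    : FOO
    |           // empty alt
    ;

// After: the empty alt is removed, the "opt_" prefix is dropped, and
// the ?-operator is added at each applied occurrence of the rule.
some_stmt
    : BAR foo? BAZ
    ;

foo
    : FOO
    ;
```

Both versions should accept the same input; the optionality simply moves from the rule definition to the places where the rule is used.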
There are two new warnings from the Antlr Tool (`rule stmt_case contains an optional block with at least one alternative that can match an empty string` and `rule stmt_return contains an optional block with at least one alternative that can match an empty string`). These do not affect the performance or correctness of the parser. They will be dealt with in a later fix that also corrects other warnings (`non-fragment lexer rule AfterEscapeStringConstantMode_NotContinued can match the empty string` and `non-fragment lexer rule AfterEscapeStringConstantWithNewlineMode_NotContinued can match the empty string`). In addition, the ambiguity in the grammar results in terrible performance, which also needs to be corrected.

I've updated the readme.md to note the actual grammar source and issues.
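For context on the two new parser warnings mentioned above, the following sketch shows the general shape that I would expect to trigger them: a `?` applied to a rule that can itself still match nothing. The rules below are hypothetical and are not the actual definitions of `stmt_case` or `stmt_return`.

```antlr
// Hypothetical rules, for illustration only.
// item_list is still nullable because of the *-operator, so the
// optional block "item_list?" has an alternative that can match
// the empty string, which the Antlr Tool warns about.
return_stmt
    : RETURN item_list? SEMI
    ;

item_list
    : ITEM*
    ;
```

Under that assumption, one possible later fix is to either drop the `?` at the use site or make the referenced rule non-nullable (for example `ITEM+`), so that the optionality is expressed in only one place.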