Java 20 bad Parser #4331

Open
HanzoDev1375 opened this issue Nov 18, 2024 · 13 comments

Comments

@HanzoDev1375

[Two screenshots attached]

As you can see in the screenshots, the Java 20 parser does not analyze the code well.

My code:

import org.antlr.v4.runtime.CharStreams;
import org.antlr.v4.runtime.CommonTokenStream;
import org.antlr.v4.runtime.tree.ErrorNode;
import org.antlr.v4.runtime.tree.ParseTreeWalker;

// ANTLRInputStream is deprecated since ANTLR 4.7; CharStreams replaces it.
Java20Lexer lexer = new Java20Lexer(CharStreams.fromString(content.toString()));
CommonTokenStream stream = new CommonTokenStream(lexer);
Java20Parser parser = new Java20Parser(stream);
Java20ParserBaseListener base =
    new Java20ParserBaseListener() {
      @Override
      public void visitErrorNode(ErrorNode node) {
        super.visitErrorNode(node);
        // ANTLR lines are 1-based and columns are 0-based. The raw line is
        // passed here, while enterTypeParameter below passes line - 1.
        int line = node.getSymbol().getLine();
        int col = node.getSymbol().getCharPositionInLine();
        int[] errorMatch = Utils.setErrorSpan(result, line, col);
      }

      @Override
      public void enterTypeParameter(Java20Parser.TypeParameterContext ctx) {
        // Highlight the first token of each type parameter.
        var token = ctx.getStart();
        int line = token.getLine() - 1;
        int column = token.getCharPositionInLine();
        Utils.setSpanEFO(result, line, column, EditorColorScheme.javatype);
      }
    };

// Parse from the start rule and walk the resulting tree with the listener.
ParseTreeWalker walker = new ParseTreeWalker();
walker.walk(base, parser.start_());
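The listener above passes the raw `getLine()` value in one callback and `getLine() - 1` in the other. ANTLR reports lines starting at 1 and columns starting at 0, so an editor that works with 0-based lines or absolute offsets needs one consistent conversion. A minimal sketch of such a conversion follows; the class `SourcePositions` and its method `toOffset` are hypothetical names, not part of ANTLR or of the code in this issue, and the sketch assumes `\n` line endings.

```java
public class SourcePositions {
    /**
     * Converts an ANTLR-style position (1-based line, 0-based column)
     * into a 0-based character offset into the text.
     * Hypothetical helper; assumes lines are separated by '\n'.
     */
    static int toOffset(String text, int antlrLine, int column) {
        int offset = 0;
        // Skip over the lines that precede the requested one.
        for (int line = 1; line < antlrLine; line++) {
            int newline = text.indexOf('\n', offset);
            if (newline < 0) {
                throw new IllegalArgumentException("line out of range: " + antlrLine);
            }
            offset = newline + 1;
        }
        return offset + column;
    }

    public static void main(String[] args) {
        String text = "ab\ncd\nef";
        // Line 2, column 1 is the character 'd'.
        System.out.println(text.charAt(toOffset(text, 2, 1)));
    }
}
```

Routing every span call through one helper like this keeps the 1-based to 0-based adjustment in a single place.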
@kaby76
Contributor

kaby76 commented Nov 18, 2024

Please post the entire input (or inputs) as text, or attach the input in a .txt file (or files). It's best to not post pictures. Thanks.

@HanzoDev1375
Author

@kaby76
Contributor

kaby76 commented Nov 18, 2024

The input is only 898 lines, yet it takes ~90 s to parse. The parse does succeed. Yes, this is terrible performance, but unfortunately it is expected.

This grammar is a direct implementation of the Java Language Spec 20 grammar in Chapter 19. It is very ambiguous.

Here is the ambiguity uncovered for the input. The tools used are part of the Trash Toolkit.

$ dotnet trperf -c afdr /c/Users/Kenne/Downloads/SettingAppActivity.java.txt | grep -v '^0' | sort -k1 -n
Time to parse: 00:01:29.1786105
1       1       10      classOrInterfaceType
2       2       20      classType
5       5       341     relationalExpression
8       8       84      unannClassOrInterfaceType
21      22      6       referenceType
97      104     82      unannReferenceType
165     199     36      packageName
174     174     272     primaryNoNewArray
223     223     315     methodInvocation

Output:

  • Column 1 is the number of ambiguities counted.
  • Column 2 is the number of fallbacks counted.
  • Column 3 is the NFA state number for the decision.
  • Column 4 is the rule that the NFA state appears in.

The good news, if you can say anything good about this, is that the ratio of ambiguities to fallbacks is more or less one-to-one. This means that most of the problem is with ambiguity and not DFA transition conflicts, which are generally harder to fix.
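To compare rules by these counts, the four-column rows can be read into a small structure and ranked. The following is only a sketch that assumes the whitespace-separated layout shown above; the class `AmbiguityReport` and its members are hypothetical and not part of the Trash Toolkit.

```java
import java.util.ArrayList;
import java.util.List;

public class AmbiguityReport {
    /** One row of the report: ambiguities, fallbacks, decision state, rule name. */
    static final class Row {
        final int ambiguities;
        final int fallbacks;
        final int state;
        final String rule;

        Row(int ambiguities, int fallbacks, int state, String rule) {
            this.ambiguities = ambiguities;
            this.fallbacks = fallbacks;
            this.state = state;
            this.rule = rule;
        }
    }

    /** Parses the rows, skipping non-data lines and rows with zero ambiguities. */
    static List<Row> parse(String output) {
        List<Row> rows = new ArrayList<>();
        for (String line : output.split("\\R")) {
            String[] f = line.trim().split("\\s+");
            if (f.length != 4) continue;
            if (!f[0].matches("\\d+") || !f[1].matches("\\d+") || !f[2].matches("\\d+")) continue;
            int ambiguities = Integer.parseInt(f[0]);
            if (ambiguities == 0) continue; // mirrors the grep -v '^0' filter
            rows.add(new Row(ambiguities, Integer.parseInt(f[1]), Integer.parseInt(f[2]), f[3]));
        }
        // Worst offenders first.
        rows.sort((a, b) -> Integer.compare(b.ambiguities, a.ambiguities));
        return rows;
    }

    public static void main(String[] args) {
        String sample = String.join("\n",
            "Time to parse: 00:01:29.1786105",
            "97      104     82      unannReferenceType",
            "165     199     36      packageName",
            "223     223     315     methodInvocation");
        for (Row r : parse(sample)) {
            System.out.printf("%-20s state=%d ambiguities=%d fallbacks per ambiguity=%.2f%n",
                r.rule, r.state, r.ambiguities, (double) r.fallbacks / r.ambiguities);
        }
    }
}
```

A fallbacks-per-ambiguity value close to 1.00 for a rule matches the one-to-one observation above; a value well above 1 would suggest that something other than plain ambiguity is also involved.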

methodInvocation seems to be the worst. Here is an example that exhibits the problem.

$ cat /c/Users/Kenne/Downloads/SettingAppActivity.java3.txt
public class SettingAppActivity extends BaseCompat {

    @Override
    protected void onCreate(Bundle _savedInstanceState) {
        super.onCreate(_savedInstanceState);
    }
}
11/18-07:22:02 ~/issues/g4-current/java/java20/Generated-CSharp

Decision state 315 is the problem.

[Image: graphviz diagram]

Decision 315, the first state after entry, has a choice among a half dozen alts. (I can't tell which alts because trparse --ambig is crashing. kaby76/Trash#507)

@HanzoDev1375
Author

So this is a problem with the grammar?

@kaby76
Contributor

kaby76 commented Nov 18, 2024

Yes. It's a grammar problem, not an "Antlr problem".

Many of the grammars in this repo can be slow because of ambiguity. This usually happens because someone derives the grammar from another grammar, then tries to use that with Antlr. Much of the time, the grammar requires a symbol table to disambiguate.

For better or worse, Antlr will accept an ambiguous grammar and generate a parser for it. But just as one can write atrocious code in Java, C#, JavaScript, etc., one can write an atrocious grammar for Antlr. People then say "Antlr is terrible," but it's usually the grammar that is the problem.

The solution is to eliminate ambiguity in the grammar.
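As an illustration of why a symbol table can be needed: in the Java Language Specification grammar, a call such as `x.m()` matches both the `TypeName . Identifier (...)` and the `ExpressionName . Identifier (...)` forms of a method invocation, because `x` alone looks the same either way. Syntax cannot decide; only knowledge of what `x` names can. The sketch below shows that decision with a toy symbol table. The class `QualifierClassifier`, its enum, and the two sets are hypothetical, and this is not claimed to be the specific ambiguity behind decision 315.

```java
import java.util.Set;

public class QualifierClassifier {
    enum Kind { TYPE_NAME, EXPRESSION_NAME, UNKNOWN }

    /**
     * Decides how the qualifier in "qualifier.method()" should be read,
     * using hypothetical sets of variable and type names that are in scope.
     * When a name could be either, the variable reading is preferred over
     * the type reading.
     */
    static Kind classify(String qualifier, Set<String> variables, Set<String> types) {
        if (variables.contains(qualifier)) return Kind.EXPRESSION_NAME;
        if (types.contains(qualifier)) return Kind.TYPE_NAME;
        return Kind.UNKNOWN;
    }

    public static void main(String[] args) {
        Set<String> variables = Set.of("list", "builder");
        Set<String> types = Set.of("Math", "String");
        // Both calls have the same token shape: Identifier . Identifier ( )
        System.out.println("list.size() -> " + classify("list", variables, types));
        System.out.println("Math.abs()  -> " + classify("Math", variables, types));
    }
}
```

A grammar meant for fast parsing typically merges such look-alike alternatives into one rule and leaves this classification to a later pass over the parse tree.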

@HanzoDev1375
Author

Is there a way to fix it?

@kaby76
Contributor

kaby76 commented Nov 21, 2024

Yes, the grammar should be fixed. I am working my way through the grammars and cleaning up ambiguity and fallbacks. I plan to do postgresql and mysql/Oracle first, then java/java20, likely a couple of weeks from now.

@HanzoDev1375
Author

Oh, thanks sir 🥰

@HanzoDev1375
Author

@kaby76 I have a question: is it possible to build a code formatter with a lexer and parser?

@kaby76
Contributor

kaby76 commented Nov 22, 2024

Yes, it is possible.

I ported Codebuff to C# but there is the original Java version. See https://scholar.google.com/scholar?hl=en&as_sdt=0%2C22&q=codebuff&btnG=

Also check out https://github.com/antlr/codebuff

Basically you need to provide a sampling of files that are formatted by hand, the grammar, and some input you want to format, and it outputs the formatted input file.
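For contrast, here is a much simpler technique than Codebuff: a rule-based re-indenter driven only by a token stream. It is a minimal sketch, not Codebuff's learned approach, and everything in it (the class `TinyFormatter`, the token regex, the spacing rules) is an assumption made for illustration. It does not handle comments, multi-character operators such as `==`, or character literals; a real formatter would take its tokens from the grammar's generated lexer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TinyFormatter {
    // Simplified tokens: string literals, words/numbers, or any single non-space character.
    private static final Pattern TOKEN =
        Pattern.compile("\"(?:\\\\.|[^\"\\\\])*\"|\\w+|\\S");
    private static final Set<String> NO_SPACE_BEFORE = Set.of(";", ",", ")", ".", "(");
    private static final Set<String> NO_SPACE_AFTER = Set.of("(", ".");

    static List<String> lex(String source) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(source);
        while (m.find()) tokens.add(m.group());
        return tokens;
    }

    static String format(String source) {
        StringBuilder out = new StringBuilder();
        int indent = 0;
        int parenDepth = 0;
        String prev = null; // null means the next token starts a new line
        for (String tok : lex(source)) {
            if (tok.equals("}")) indent = Math.max(0, indent - 1);
            if (prev == null) {
                out.append("    ".repeat(indent));
            } else if (!NO_SPACE_BEFORE.contains(tok) && !NO_SPACE_AFTER.contains(prev)) {
                out.append(' ');
            }
            out.append(tok);
            prev = tok;
            if (tok.equals("(")) parenDepth++;
            if (tok.equals(")")) parenDepth = Math.max(0, parenDepth - 1);
            if (tok.equals("{")) indent++;
            // Break after braces, and after ';' unless inside parentheses (e.g. a for header).
            boolean endOfLine = tok.equals("{") || tok.equals("}")
                || (tok.equals(";") && parenDepth == 0);
            if (endOfLine) {
                out.append('\n');
                prev = null;
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.print(format("class A{void f(){x=1;}}"));
    }
}
```

The same loop structure carries over when the tokens come from an ANTLR lexer; a parser becomes useful once layout decisions depend on the construct being formatted rather than on single tokens.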

@HanzoDev1375
Author

@kaby76 I saw that this source is very old, and I couldn't understand it. Is there a simpler source that provides a lexer and parser so that I can easily format my own Java code?

@kaby76
Contributor

kaby76 commented Nov 22, 2024

There is nothing newer written in Java. I ported it to C# but that was several years ago. The papers on Codebuff are a good intro to the code.

@HanzoDev1375
Author

Could you port a plain version for me? There are a lot of Java files and most of them are for Swing. I tried to port it to Android, but it didn't work. If you could help me port it to Android I would appreciate it; you have much more experience than I do 😅🌹
