Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

refactor: downgrade MLIR generic parser from syntax categories to closed syntax #364

Draft
wants to merge 14 commits into
base: main
Choose a base branch
from

Conversation

alexkeizer
Copy link
Collaborator

@alexkeizer alexkeizer commented May 31, 2024

Currently almost all parts of the generic MLIR parser uses syntax categories, meaning these parts of the (generic!) grammar are in theory extensible.

In theory this can be very convenient: if a dialect wanted to customize only their operands it could choose to hook into just that part of the parser. In practice, I would expect that even the low level elements (such as mlir_op_operand) being extensible is an anti-pattern:

  • Such extensions would not be scoped to just that dialect, it would affect parsing of all generic MLIR syntax
  • The parser doesn't really seem to be written for such extensions anyway. I tried to come up with an example of extending just the operands, but the most obvious ways that came to mind didn't actually work (macro "&" n:num : mlir_op_operand => `(mlir_op_operand| % $n:num)).
  • Controlling which parts of the grammar are extensible would make it easier to eventually write an elaborator that outputs a meta-level AST, instead of a Lean expression of type AST, if we eventually decide to go that way.

Thus, I propose we keep mlir_op (and potentially mlir_type) as the top-level syntax category that can be hooked into, but make low-level elements of the grammar closed. This PR is still a draft, but it show-cases what this change would look like for the case of mlir_op_operand.

This matches Lean itself, where term and command are extensible, but things like def are closed syntax.
Also note that this is exactly the level at which the current LLVM pretty syntax operates: that only extends mlir_op and would still work as-is after this refactor.

@alexkeizer
Copy link
Collaborator Author

@goens @bollu @tobiasgrosser I'm curious what you think about this refactor?

Copy link

Alive Statistics: 54 / 93 (39 failed)

@tobiasgrosser
Copy link
Collaborator

I think this is a great idea.

@bollu
Copy link
Collaborator

bollu commented May 31, 2024

@alexkeizer thanks for thinking about this, I'm generally happy with this change in terms of API surface area.

I don't quite understand:

Controlling which parts of the grammar are extensible would make it easier to eventually write an elaborator that outputs a meta-level AST, instead of a Lean expression of type AST, if we eventually decide to go that way.

Could you elaborate on this? Is it that each syntax category is a construction for our meta-level-AST, and having many constructors makes this unmanagable?

Copy link

Alive Statistics: 54 / 93 (39 failed)

@alexkeizer
Copy link
Collaborator Author

alexkeizer commented May 31, 2024

@alexkeizer thanks for thinking about this, I'm generally happy with this change in terms of API surface area.

I don't quite understand:

Controlling which parts of the grammar are extensible would make it easier to eventually write an elaborator that outputs a meta-level AST, instead of a Lean expression of type AST, if we eventually decide to go that way.

Could you elaborate on this? Is it that each syntax category is a construction for our meta-level-AST, and having many constructors makes this unmanagable?

So what I'm thinking about is having a function like TSyntax `mlir_op_operand -> MetaM SSAVal . 1 If mlir_op_operand is closed, we can just match on exactly the syntax we define, and nothing else is possible. If it's a category, though, we're promising the syntax is extensible, we would then have to somehow hook into macro expansion to make that promise actually work.
Of course, we could just say that, even though it's a category, it's not actually intended to be extensible and throw some kind of error if we encounter unexpected syntax, but I'd rather codify this.

Footnotes

  1. With the caveat that SSAVal might have to become some MetaSSAVal that can possibly store a Lean expression for something, to support antiquotations

Apparently closed `syntax` is namespaced, while syntax categories live in the root namespace
Copy link

Alive Statistics: 54 / 93 (39 failed)

Copy link

Alive Statistics: 54 / 93 (39 failed)

1 similar comment
Copy link

Alive Statistics: 54 / 93 (39 failed)

Apparently the closed syntax breaks `opt`
@goens
Copy link
Collaborator

goens commented Jun 1, 2024

oh that's pretty cool! I didn't know about this difference (between closed syntax and extensible syntax categories).

I think it makes a lot of sense, thanks for the very thorough explanation @alexkeizer!

About the meta level ast, I also don't quite understand the point. Even if we choose to do things at elaboration time, what's the drawback of using our (current) AST data structure?

Or do you mean just that we can do (full) pattern matching on the closed syntax categories, so it's about the TSyntax mlirsoandso types? When talking about these self-made compilers from custom syntax, Mac once asked us why we wanted our own data structure and not just used TSyntax. Maybe using closed syntax is what we were missing for that, if I'm understanding you correctly

@alexkeizer
Copy link
Collaborator Author

About the meta level ast, I also don't quite understand the point. Even if we choose to do things at elaboration time, what's the drawback of using our (current) AST data structure?

So the main point is to correctly communicate what is and isn't supported.

With a closed syntax we know exactly what forms of syntax to expect when we pattern-match on the TSyntax. I don't think the exhaustive check knows about this, so it definitely still makes sense to parse the TSyntax into our AST datastructure, but this parsing would have to do a pattern match on the syntax to do so, and presumably throw an error when it sees syntax that doesn't match the option we've given (which, if we wrote the match correctly, should never happen, because we control what syntax gets parsed).

An option that totally works, is to have a syntax category and just only match on the syntax we've defined. Then, when the category is extended, and this new syntax is actually used, we just throw the error that says "well, this syntax category is not actually meant to be extended". This option would be strictly easiest to implement, but it's unsatisfying IMO.

If we have syntax categories, and would like to properly support extensions, then in the function that parses the TSyntax to produce our AST we'd have to hook into the macro expansion system before pattern matching. This seems excessively complicated.


A slight road bump: apparently the machinery in the opt executable that parses things at runtime works specifically with a category parser. I'm not sure how to change that to work with a ParserDescr (as obtained from the closed syntax). There's compileParserDescr, but this runs in ImportM. I might keep mlir_region as a category for now to avoid dealing with this

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants