Experiences

I've been using MacroParse for a few different things now. These are some thoughts on how it feels to use:

Certain patterns come up ALL. OF. THE. TIME.

  • Ignore whitespace.
  • Parse numbers.
  • Deal well with delimited lists of nonterminals.

Macros do a fine job with higher-order patterns within a language. But a Swiss-Army knife of language-processing tools might benefit from a library of pre-defined common linguistic patterns, including the relevant mix-ins to scan/parse them into native Python objects.
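
To make that concrete, here is roughly the shape such a library could take. Everything in this sketch is invented for illustration; MacroParse ships nothing by these names today:

```python
# Hypothetical pattern library -- none of these names exist in MacroParse.

# Pre-defined named patterns a scanner definition could pull in:
COMMON_PATTERNS = {
    'ignore_whitespace': r'\s+',                        # match and discard
    'integer':           r'-?\d+',                      # signed decimal integer
    'real':              r'-?\d+\.\d+([eE][-+]?\d+)?',  # simple float literal
}

# Matching mix-in scan actions that yield native Python objects:
class NumberScanMixin:
    def scan_integer(self, text): return int(text)
    def scan_real(self, text): return float(text)
```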

Even though mid-pattern actions are supported, I don't tend to use them much.

Rather, I tend toward building an overt abstract syntax tree (via the bottom-up nature of LR parsing) and then walking/transducing that tree in various subsequent passes. When this pattern applies, most of a driver's parse_foo methods just delegate to one or another NamedTuple type, as in the sketch below.
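
The node type, rule name, and driver class here are made up, not taken from any real grammar, but the shape of the pattern is exactly this:

```python
from typing import NamedTuple

class BinaryExpr(NamedTuple):
    """One kind of AST node; the LR parser assembles these bottom-up."""
    left: object
    op: str
    right: object

class Driver:
    # Most parse_* methods end up being one-liners like this:
    def parse_binary_expr(self, left, op, right):
        return BinaryExpr(left, op, right)
```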

For the moment, this is workable, but there seems to be a lot of redundant activity in that pattern. What if a DSL defined the structure of the AST nodes (including optional location-tracking slots) with a good interface to grammar production rules? That might tighten things up, so long as there's ALSO a good interface between AST nodes and the implementation language for transforming the AST into the next intermediate form. Generated code might be the answer.

That is more of an idea for version 2.0.
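
To give a flavor of the idea, the generated code might come out looking something like the following; every name here is speculative:

```python
from typing import NamedTuple, Optional

class Assignment(NamedTuple):
    target: str
    value: object
    location: Optional[int] = None  # optional location-tracking slot

    def transduce(self, next_pass):
        # The interface to the implementation language: dispatch into
        # whatever pass turns this node into the next intermediate form.
        return next_pass.on_assignment(self)
```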

The temptation is strong to jettison the landing gear.

MacroParse is built atop the foundation that MiniParse and MiniScan provide; and MiniScan relies on MiniParse to grok regular expressions. Not that I couldn't write such things by hand, but the complexity is high enough that I wouldn't want to.

In a real sense, MiniParse is sort of the "boot loader" for the rest of the system. You can analyze a production rule into constituent parts with the least amount of magic: just split strings on whitespace, for example.
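
Concretely, the bootstrap-level trick amounts to something like this (the rule text is made up for illustration):

```python
# A production rule analyzed with nothing fancier than str.split:
rule = "expr -> expr PLUS term"
lhs, arrow, *rhs = rule.split()
assert arrow == '->'
# lhs == 'expr'; rhs == ['expr', 'PLUS', 'term']
```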

Bootstrapping MiniScan is actually rather complex: The meta-parser (miniscan.rex) is only about three dozen production rules, but the meta-scanner (miniscan.META) currently takes 90 lines of densely-packed Python code that literally works by hand-constructing the regex ASTs for a scanner definition object and doing the whole regex -> NFA -> DFA -> Profit! dance.
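
For a taste of what that hand-construction involves, imagine something along these lines; the class names are invented stand-ins, not miniscan's actual internals:

```python
# Toy regex-AST node types, standing in for whatever miniscan.META uses:
class Literal:
    def __init__(self, char): self.char = char

class Sequence:
    def __init__(self, *parts): self.parts = parts

class Star:
    def __init__(self, sub): self.sub = sub

# The pattern ab* built with no regex parser in sight:
pattern = Sequence(Literal('a'), Star(Literal('b')))
# The real code then runs the regex -> NFA -> DFA dance on dozens of
# these to produce a scanner's state tables.
```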

It starts to get slightly weird with MacroParse in the picture. MacroParse re-uses most of the MiniScan pattern compiler, but for production rules, macros, and directives it has its own grammar and scanner, implemented with MiniParse and MiniScan along with some rather less elegant hacky bits.

Now, MacroParse is powerful enough that a replacement front-end for ALMOST the entire system would be straightforward to write down as a MacroParse document and a new-model driver object. That rewrite would probably also make some desirable new features easy to incorporate along the way.

The major downside is that you then get a meta-circular definition: the front-end would need a working copy of itself in order to build, so you could no longer build a self-hosting compiler from scratch, and that defeats the educational purpose of this project.