diff --git a/book/src/SUMMARY.md b/book/src/SUMMARY.md index f7e9f6da..d2ffd9a3 100644 --- a/book/src/SUMMARY.md +++ b/book/src/SUMMARY.md @@ -4,9 +4,8 @@ - [`formality_core`: the Formality system](./formality_core.md) - [Defining your lang](./formality_core/lang.md) - [Defining terms with the `term` macro](./formality_core/terms.md) - - [Parsing reference](./formality_core/parse.md) - - [Implementing `Fold` and `Visit` by hand](./formality_core/impl_fold_visit.md) - - [Implementing `Parse` by hand](./formality_core/impl_parse.md) + - [Parsing](./formality_core/parse.md) + - [Customizing debug](./formality_core/debug.md) - [Variables](./formality_core/variables.md) - [Collections](./formality_core/collections.md) - [Judgment functions and inference rules](./formality_core/judgment_fn.md) diff --git a/book/src/formality_core/debug.md b/book/src/formality_core/debug.md new file mode 100644 index 00000000..deffcaa5 --- /dev/null +++ b/book/src/formality_core/debug.md @@ -0,0 +1,14 @@ +# Customizing the debug + +By default, the `#[term]` macro will generate a `Debug` impl that is guided by the `#[grammar]` attributes on your type (see the [parsing](./parse.md) section for more details). But sometimes you want to generate custom logic. You can include a `#[customize(debug)]` declaration to allow that. Most of the type, when you do this, you will also want to [customize parsing](./parse.md#customizing-the-parse), as the `RigidTy` does: + +```rust +{{#include ../../../crates/formality-types/src/grammar/ty.rs:RigidTy_decl}} +``` + +Now you must simply implement `Debug` in the usual way. Here is the `RigidTy` declaration: + + +```rust +{{#include ../../../crates/formality-types/src/grammar/ty/debug_impls.rs:RigidTy_impl}} +``` \ No newline at end of file diff --git a/book/src/formality_core/impl_fold_visit.md b/book/src/formality_core/impl_fold_visit.md deleted file mode 100644 index 5fede039..00000000 --- a/book/src/formality_core/impl_fold_visit.md +++ /dev/null @@ -1,22 +0,0 @@ -# Implementing `Fold` and `Visit` by hand - -The `#[term]` macro auto-generates impls of `Fold`, `Visit`, and `Parse`. -But sometimes you don't want to use the macro. -Sometimes you want to write your own code. -One common reason is for substitution of a variable. -For example, in the Rust model, -the code that replaces type variables with types from a substitution -is defined by a manual impl of `Fold`. -Because `Fold` and `Visit` are trait aliases, you need to implement the underlying -trait (`CoreFold`) by hand. -Here is the custom impl of fold for `Ty` from `formality_types`: - -```rust -{{#include ../../../crates/formality-types/src/grammar/ty/term_impls.rs:core_fold_ty}} -``` - -That same module contains other examples, such as impls of `CoreVisit`. - -## Derives - -You can also manually derive `Visit` and `Fold` instead of using `#[term]`. \ No newline at end of file diff --git a/book/src/formality_core/parse.md b/book/src/formality_core/parse.md index c7928e13..b82f7a5f 100644 --- a/book/src/formality_core/parse.md +++ b/book/src/formality_core/parse.md @@ -19,9 +19,14 @@ struct MyEnum { } ``` -### Ambiguity, +### Ambiguity and precedence -When parsing an enum there will be multiple possibilities. We will attempt to parse them all. If more than one succeeds, the parse is deemed ambiguous and an error is reported. If zero succeed, we will report back a parsing error, attempting to localize the problem as best we can. +When parsing an enum there will be multiple possibilities. We will attempt to parse them all. If more than one succeeds, the parser will attempt to resolve the ambiguity. Ambiguity can be resolved in two ways: + +* Explicit precedence: By default, every variant has precedence 0, but you can override this by annotating variants with `#[precedence(N)]` (where `N` is some integer). This will override the precedence for that variant. Variants with higher precedences are preferred. +* Reduction prefix: When parsing, we track the list of things we had to parse. If there are two variants at the same precedence level, but one of them had to parse strictly more things than the other and in the same way, we'll prefer the longer one. So for example if one variant parsed a `Ty` and the other parsed a `Ty Ty`, we'd take the `Ty Ty`. + +Otherwise, the parser will panic and report ambiguity. The parser panics rather than returning an error because ambiguity doesn't mean that there is no way to parse the given text as the nonterminal -- rather that there are multiple ways. Errors mean that the text does not match the grammar for that nonterminal. ### Symbols @@ -37,15 +42,12 @@ A grammar consists of a series of *symbols*. Each symbol matches some text in th * Nonterminals can also accept modes: * `$field` -- just parse the field's type * `$*field` -- the field must be a `Vec` -- parse any number of `T` instances. Something like `[ $*field ]` would parse `[f1 f2 f3]`, assuming `f1`, `f2`, and `f3` are valid values for `field`. - * If the next token after `$*field` is a terminal, `$*` uses it as lookahead. The grammar `[ $*field ]`, for example, would stop parsing once a `]` is observed. - * If the `$*field` appears at the end of the grammar or the next symbol is a nonterminal, then the type of `field` must define a grammar that begins with some terminal or else your parsing will likely be ambiguous or infinite. * `$,field` -- similar to the above, but uses a comma separated list (with optional trailing comma). So `[ $,field ]` will parse something like `[f1, f2, f3]`. - * Lookahead is required for `$,field` (this could be changed). * `$?field` -- will parse `field` and use `Default::default()` value if not present. ### Greediness -Parsing is generally greedy. So `$*x` and `$,x`, for example, consume as many entries as they can, subject to lookahead. In a case like `$*f $*g`, the parser would parse as many `f` values as it could before parsing `g`. If some entry can be parsed as either `f` or `g`, the parser will not explore all possibilities. The caveat here is lookahead: given `$*f #`, for example, parsing will stop when `#` is observed, even if `f` could parse `#`. +Parsing is generally greedy. So `$*x` and `$,x`, for example, consume as many entries as they can. Typically this works best if `x` begins with some symbol that indicates whether it is present. ### Default grammar @@ -53,3 +55,20 @@ If no grammar is supplied, the default grammar is determined as follows: * If a `#[cast]` or `#[variable]` annotation is present, then the default grammar is just `$v0`. * Otherwise, the default grammar is the name of the type (for structs) or variant (for enums), followed by `()`, with the values for the fields in order. So `Mul(Expr, Expr)` would have a default grammar `mul($v0, $v1)`. + +### Customizing the parse + +If you prefer, you can customize the parse by annotating your term with `#[customize(parse)]`. In the Rust case, for example, the parsing of `RigidTy` is customized ([as is the debug impl](./debug.md)): + +```rust +{{#include ../../../crates/formality-types/src/grammar/ty.rs:RigidTy_decl}} +``` + +You must then supply an impl of `Parse` yourself. Because `Parse` is a trait alias, you actually want to implement `CoreParse` for your language type `L`. Inside the code you will want to instantiate a `Parser` and then invoke `parse_variant` for every variant, finally invoking `finish`. + +In the Rust code, the impl for `RigidTy` looks as follows: + + +```rust +{{#include ../../../crates/formality-types/src/grammar/ty/parse_impls.rs:RigidTy_impl}} +``` \ No newline at end of file diff --git a/crates/formality-types/src/grammar/ty/debug_impls.rs b/crates/formality-types/src/grammar/ty/debug_impls.rs index 551b11a2..06f6e0ca 100644 --- a/crates/formality-types/src/grammar/ty/debug_impls.rs +++ b/crates/formality-types/src/grammar/ty/debug_impls.rs @@ -1,6 +1,7 @@ use super::{AliasName, AliasTy, AssociatedTyName, Parameter, RefKind, RigidName, RigidTy}; use std::fmt::Debug; +// ANCHOR: RigidTy_impl impl Debug for RigidTy { fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result { let RigidTy { name, parameters } = self; @@ -42,6 +43,7 @@ impl Debug for RigidTy { } } } +// ANCHOR_END: RigidTy_impl impl Debug for AliasTy { fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result { diff --git a/crates/formality-types/src/grammar/ty/parse_impls.rs b/crates/formality-types/src/grammar/ty/parse_impls.rs index 26bb8046..07789d1a 100644 --- a/crates/formality-types/src/grammar/ty/parse_impls.rs +++ b/crates/formality-types/src/grammar/ty/parse_impls.rs @@ -12,12 +12,18 @@ use super::{AliasTy, AssociatedTyName, Lt, Parameter, RigidTy, ScalarId, Ty}; use crate::rust::FormalityLang as Rust; +// ANCHOR: RigidTy_impl +// Implement custom parsing for rigid types. impl CoreParse for RigidTy { fn parse<'t>(scope: &Scope, text: &'t str) -> ParseResult<'t, Self> { let mut parser: Parser<'_, '_, RigidTy, Rust> = Parser::new(scope, text, "AliasTy"); + // Parse a `ScalarId` (and upcast it to `RigidTy`) with the highest + // precedence. If someone writes `u8`, we always interpret it as a + // scalar-id. parser.parse_variant_cast::(1); + // Parse something like `Id<...>` as an ADT. parser.parse_variant("Adt", 0, |p| { let name: AdtId = p.nonterminal()?; let parameters: Vec = parse_parameters(p)?; @@ -27,6 +33,7 @@ impl CoreParse for RigidTy { }) }); + // Parse `&` parser.parse_variant("Ref", 0, |p| { p.expect_char('&')?; let lt: Lt = p.nonterminal()?; @@ -64,6 +71,7 @@ impl CoreParse for RigidTy { parser.finish() } } +// ANCHOR_END: RigidTy_impl impl CoreParse for AliasTy { fn parse<'t>(scope: &Scope, text: &'t str) -> ParseResult<'t, Self> {