Merge KDL v2 (#286)

Co-authored-by: Danielle Smith <[email protected]>
Co-authored-by: Basile Henry <[email protected]>
Co-authored-by: Bram Gotink <[email protected]>
Co-authored-by: Nathan West <[email protected]>
Co-authored-by: Hannah Kolbeck <[email protected]>
Co-authored-by: Lars Willighagen <[email protected]>
Co-authored-by: Tab Atkins-Bittner <[email protected]>
Co-authored-by: Christopher Durham <[email protected]>
Co-authored-by: Corey Powell <[email protected]>
Co-authored-by: wackbyte <[email protected]>
Co-authored-by: Bannerets <[email protected]>
Co-authored-by: Romain Delamare <[email protected]>
Co-authored-by: Thomas Jollans <[email protected]>
kdl-org · Nov 29, 2024 · c8632b7
1 parent d8d583a

Showing 268 changed files with 1,365 additions and 690 deletions.
diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -0,0 +1,95 @@
+# KDL Changelog
+
+## 2.0.0-draft.5 (2024-11-28)
+
+* Equals signs other than `=` are no longer supported in properties.
+* 128-bit integer type annotations have been added to the list of "well-known"
+  type annotations.
+* Multiline string escape rules have been tweaked significantly.
+* `\s` is now a valid escape within a string, representing a space character.
+* Slashdash (`/-`)-compatible locations and related grammar adjusted to be more
+  clear and intuitive. This includes some changes relating to whitespace,
+  including comments and newlines, which are breaking changes.
+* Various updates to test suite to reflect changes.
+
+## 2.0.0 (Unreleased)
+
+### Grammar
+
+* Solidus/Forward slash (`/`) is no longer an escaped character.
+* Space (`U+0020`) can now be written into quoted strings with the `\s`
+  escape.
+* Single line comments (`//`) can now be immediately followed by a newline.
+* All literal whitespace following a `\` in a string is now discarded.
+* Vertical tabs (`U+000B`) are now considered to be whitespace.
+* The grammar syntax itself has been described, and some confusing definitions
+  in the grammar have been fixed accordingly (mostly related to escaped
+  characters).
+* `,`, `<`, and `>` are now legal identifier characters. They were previously
+  reserved for KQL but this is no longer necessary.
+* Code points under `0x20` (except newline and whitespace code points), code
+  points above `0x10FFFF`, Delete control character (`0x7F`), and the [unicode
+  "direction control"
+  characters](https://www.w3.org/International/questions/qa-bidi-unicode-controls)
+  are now completely banned from appearing literally in KDL documents. They
+  can now only be represented in regular strings, and there are no facilities to
+  represent them in raw strings. This should be considered a security
+  improvement.
+* Raw strings no longer require an `r` prefix: they are now specified by using
+  `#""#`.
+* Line continuations can be followed by an EOF now, instead of requiring a
+  newline (or comment). `node \<EOF>` is now a legal KDL document.
+* `#` is no longer a legal identifier character.
+* `null`, `true`, and `false` are now `#null`, `#true`, and `#false`. Using
+  the unprefixed versions of these values is a syntax error.
+* The spec prose has more explicitly stated that whitespace and newlines are
+  not valid identifier characters, even though the grammar already expressed
+  this.
+* Bare identifiers can now be used as values in Arguments and Properties, and are interpreted as string values.
+* The spec prose now more explicitly states that strings and raw strings can
+  be used as type annotations.
+* Removed a statement in the spec prose that said "It is reasonable for an
+  implementation to ignore null values altogether when deserializing". This is
+  no longer encouraged or desired.
+* Code points have been constrained to [Unicode Scalar
+  Values](https://unicode.org/glossary/#unicode_scalar_value) only, including
+  values used in string escapes (`\u{}`). All KDL documents and string values
+  should be valid UTF-8 now, as was intended.
+* The last node in a child block no longer needs to be terminated with `;`,
+  even if the closing `}` is on the same line, so this is now a legal node:
+  `node {foo;bar;baz}`
+* More places allow whitespace (node-spaces, specifically) now. With great
+  power comes great responsibility:
+  * Inside `(foo)` annotations (so, `( foo )` would be legal (`( f oo )` would
+    not be, since it has two identifiers))
+  * Between annotations and the thing they're annotating (`(blah) node (thing)
+    1 y= (who) 2`)
+  * Around `=` for props (`x = 1`)
+* The BOM is now only allowed as the first character in a document. It was
+  previously treated as generic whitespace.
+* Multi-line strings are now automatically dedented, according to the common
+  whitespace matching the whitespace prefix of the closing line. Multiline
+  strings and raw strings now must have a newline immediately following their
+  opening `"`, and a final newline plus whitespace preceding the closing `"`.
+* `.1`, `+.1` etc are no longer valid identifiers, to prevent confusion and
+  conflicts with numbers.
+* Multi-line strings' literal Newline sequences are now normalized to single
+  `LF`s.
+* `#inf`, `#-inf`, and `#nan` have been added in order to properly support
+  IEEE floats for implementations that choose to represent their decimals that
+  way.
+* Correspondingly, the identifiers `inf`, `-inf`, and `nan` are now syntax
+  errors.
+* `u128` and `i128` have been added as well-known number type annotations.
+* Slashdash (`/-`)-compatible locations have been adjusted to be more clear and intuitive.
+
+### KQL
+
+* There's now a _required_ descendant selector (`>>`), instead of using plain
+  spaces for that purpose.
+* The "any sibling" selector is now `++` instead of `~`, for consistency with
+  the new descendant selector.
+* Some parsing logic around the grammar has changed.
+* Multi- and single-line comments are now supported, as well as line
+  continuations with `\`.
+* Map operators have been removed entirely.
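
Note on the multi-line string entries in the changelog above: the dedent rule (strip the closing line's whitespace prefix from every content line, then normalize newlines to `LF`) can be sketched in Python as follows. This is only an illustration under stated assumptions, not the reference algorithm: `dedent_multiline` is a hypothetical helper, it receives the text between the opening and closing quotes, it treats only space and tab as whitespace, it turns whitespace-only lines into empty lines, and it ignores escape processing entirely.

```python
import re

# Newline sequences recognised by this sketch (CRLF is tried first so that it
# collapses to a single LF). The exact set is an assumption.
_NEWLINE = re.compile(r'\r\n|[\r\n\x0c\x85\u2028\u2029]')


def dedent_multiline(body):
    """Dedent the text found between the quotes of a multi-line string.

    Hypothetical helper: the whitespace on the closing line is the prefix
    that every non-blank content line must start with.
    """
    lines = _NEWLINE.split(body)
    if len(lines) < 2 or lines[0] != '':
        raise ValueError('a newline must immediately follow the opening quote')
    prefix = lines[-1]
    if prefix.strip(' \t') != '':
        raise ValueError('the closing line may contain only whitespace')
    out = []
    for line in lines[1:-1]:
        if line.strip(' \t') == '':
            out.append('')                 # whitespace-only line becomes empty
        elif line.startswith(prefix):
            out.append(line[len(prefix):])  # drop the common prefix
        else:
            raise ValueError('line does not start with the closing-line prefix')
    return '\n'.join(out)                  # newlines normalized to LF


if __name__ == '__main__':
    print(repr(dedent_multiline('\n    foo\r\n      bar\n\n    baz\n    ')))
```

With a four-space closing line, `foo` and `baz` lose their indentation, `bar` keeps its two extra spaces, and the `CRLF` in the input comes out as a single `LF`.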
diff --git a/JSON-IN-KDL.md b/JSON-IN-KDL.md
@@ -3,15 +3,15 @@ JSON-in-KDL (JiK)
 
 This specification describes a canonical way to losslessly encode [JSON](https://json.org) in [KDL](https://kdl.dev). While this isn't a very useful thing to want to do on its own, it's occasionally useful when using a KDL toolchain while speaking with a JSON-consuming or -emitting service.
 
-This is version 3.0.1 of JiK.
+This is version 4.0.0 of JiK.
 
 JSON-in-KDL (JiK from now on) is a kdl microsyntax consisting of named nodes that represent objects, arrays, or literal values.
 
 ----
 
-JSON literals are, luckily, a subset of KDL's literals. There are two ways to write a JSON literal into JiK:
+There are two ways to write a JSON literal into JiK:
 
-* As a node with any nodename and a single argument, like `- true` (for the JSON `true`) or `foo 5` (for the JSON `5`).
+* As a node with any nodename and a single argument, like `- #true` (for the JSON `true`) or `foo 5` (for the JSON `5`).
 * When nested in arrays or objects, literals can usually be written as arguments (for array nodes) or properties (for object nodes). See below for details.
 
 ----
@@ -25,7 +25,7 @@ Children can encode literals and/or nested arrays and objects. For example, the
 ```kdl
 - {
 	- 1
-	- true false
+	- #true #false
 	- 3
 }
 ```
@@ -36,7 +36,7 @@ Arguments and children can be mixed, if desired. The preceding example could als
 
 ```kdl
 - 1 {
-	- true false
+	- #true #false
 	- 3
 }
 ```
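
Note on the two hunks above: since JSON keywords are no longer valid KDL literals, a JiK emitter has to map `true`, `false`, and `null` to their `#`-prefixed forms. The Python sketch below shows one way that mapping might look for literals and for flat arrays of literals. All function names are hypothetical, the string escaper covers only the code points the changelog says are banned from appearing literally, and always emitting the `(array)` annotation is a choice made here to keep short and empty arrays unambiguous, not something this diff mandates.

```python
import json
import math

# Code points escaped by this sketch: C0 controls, DEL, the Unicode
# direction-control characters, and the BOM.
_BIDI = {0x200E, 0x200F} | set(range(0x202A, 0x202F)) | set(range(0x2066, 0x206A))
_SHORT = {'\\': '\\\\', '"': '\\"', '\n': '\\n', '\r': '\\r',
          '\t': '\\t', '\b': '\\b', '\f': '\\f'}


def kdl_string(s):
    """Quote a Python string as a KDL string (hypothetical helper)."""
    out = []
    for ch in s:
        cp = ord(ch)
        if ch in _SHORT:
            out.append(_SHORT[ch])
        elif 0xD800 <= cp <= 0xDFFF:
            # KDL v2 restricts code points to Unicode scalar values.
            raise ValueError('lone surrogates cannot be represented')
        elif cp < 0x20 or cp == 0x7F or cp == 0xFEFF or cp in _BIDI:
            out.append('\\u{%x}' % cp)
        else:
            out.append(ch)
    return '"' + ''.join(out) + '"'


def jik_literal(value):
    """Render one decoded JSON literal as a KDL v2 value."""
    if value is None:
        return '#null'
    if value is True:
        return '#true'
    if value is False:
        return '#false'
    if isinstance(value, float) and not math.isfinite(value):
        raise ValueError('NaN and infinities are not JSON values')
    if isinstance(value, (int, float)):
        return repr(value)
    if isinstance(value, str):
        return kdl_string(value)
    raise TypeError('not a JSON literal: %r' % (value,))


def jik_array(values):
    """Render a flat JSON array of literals as one annotated JiK node."""
    return ' '.join(['(array)-'] + [jik_literal(v) for v in values])


if __name__ == '__main__':
    print(jik_array(json.loads('[1, true, false, 3]')))
```

Nested arrays and objects would need child nodes and are deliberately out of scope for this sketch.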
@@ -54,10 +54,11 @@ The `(array)` type annotation can be used on any other valid array node if desir
 
 JSON objects are represented in JiK as a node with any nodename, with zero or more properties and/or zero or more children with any nodenames.
 
-Properties can encode literals - for example, the JSON `{"foo": 1, "bar": true}` can be written in JiK as `- foo=1 bar=true`.
+Properties can encode literals - for example, the JSON `{"foo": 1, "bar": true}` can be written in JiK as `- foo=1 bar=#true`.
 
 Children can encode literals and/or nested arrays and objects,
 using the nodename for the item's key. 
+
 For example, the JSON `{"foo": 1, "bar": [2, {"baz": 3}], "qux":4}` can be written in JiK as:
 
 ```kdl

diff --git a/QUERY-SPEC.md b/QUERY-SPEC.md
@@ -5,20 +5,20 @@ documents to extract nodes and even specific data. It is loosely based on CSS
 selectors for familiarity and ease of use. Think of it as CSS Selectors or
 XPath, but for KDL!
 
-This document describes KQL `1.0.0`. It was released on September 11, 2021.
+This document describes KQL `next`. It is unreleased.
 
 ## Selectors
 
 Selectors use selection operators to filter nodes that will be returned by an
 API using KQL. The main differences between this and CSS selectors are the
-lack of `*` (use `[]` instead), and the specific syntax for
+lack of `*` (use `[]` instead), the specific syntax for descendants and siblings, and the specific syntax for
 [matchers](#matchers) (the stuff between `[` and `]`), which is similar, but not identical to CSS.
 
 * `a > b`: Selects any `b` element that is a direct child of an `a` element.
-* `a b`: Selects any `b` element that is a _descendant_ of an `a` element.
-* `a b || a c`: Selects all `b` and `c` elements that are descendants of an `a` element. Any selector may be on either side of the `||`. Multiple `||` are supported.
+* `a >> b`: Selects any `b` element that is a _descendant_ of an `a` element.
+* `a >> b || a >> c`: Selects all `b` and `c` elements that are descendants of an `a` element. Any selector may be on either side of the `||`. Multiple `||` are supported.
 * `a + b`: Selects any `b` element that is placed immediately after a sibling `a` element.
-* `a ~ b`: Selects any `b` element that follows an `a` element as a sibling, either immediately or later.
+* `a ++ b`: Selects any `b` element that follows an `a` element as a sibling, either immediately or later.
 * `[accessor()]`: Selects any element, filtered by [an accessor](#accessors). (`accessor()` is a placeholder, not an actual accessor)
 * `a[accessor()]`: Selects any `a` element, filtered by an accessor.
 * `[]`: Selects any element.
@@ -30,6 +30,11 @@ properties, node names, etc). With the exception of `top()` and `()`, they are a
 used inside a `[]` selector. Some matchers are unary, but most of them involve
 binary operators.
 
+The `top()` matcher can only be used as the first matcher of a selector. This means
+that it cannot be the right operand of the `>`, `>>`, `+`, or `++` operators. As `||`
+combines selectors, the `top()` can appear just after it. For instance,
+ `a > b || top() > b` is valid, but `a > top()` is not.
+
 * `top()`: Returns all toplevel children of the current document.
 * `top() > []`: Equivalent to `top()` on its own.
 * `(foo)`: Selects any element whose type annotation is `foo`.
@@ -44,8 +49,8 @@ Attribute matchers support certain binary operators:
 * `[val() = 1]`: Selects any element whose first value is 1.
 * `[prop(name) = 1]`: Selects any element with a property `name` whose value is 1.
 * `[name = 1]`: Equivalent to the above.
-* `[name() = "hi"]`: Selects any element whose _node name_ is `"hi"`. Equivalent to just `hi`, but more useful when using string operators.
-* `[tag() = "hi"]`: Selects any element whose type annotation is `"hi"`. Equivalent to just `(hi)`, but more useful when using string operators.
+* `[name() = hi]`: Selects any element whose _node name_ is "hi". Equivalent to just `hi`, but more useful when using string operators.
+* `[tag() = hi]`: Selects any element whose tag is "hi". Equivalent to just `(hi)`, but more useful when using string operators.
 * `[val() != 1]`: Selects any element whose first value exists, and is not 1.
 
 The following operators work with any `val()` or `prop()` values.
@@ -60,64 +65,37 @@ never coerced to 1, and there is no "universal" ordering across all types.):
 The following operators work only with string `val()`, `prop()`, `tag()`, or `name()` values.
 If the value is not a string, the matcher will always fail:
 
-* `[val() ^= "foo"]`: Selects any element whose first value starts with "foo".
-* `[val() $= "foo"]`: Selects any element whose first value ends with "foo".
-* `[val() *= "foo"]`: Selects any element whose first value contains "foo".
+* `[val() ^= foo]`: Selects any element whose first value starts with "foo".
+* `[val() $= foo]`: Selects any element whose first value ends with "foo".
+* `[val() *= foo]`: Selects any element whose first value contains "foo".
 
 The following operators work only with `val()` or `prop()` values. If the value
 is not one of those, the matcher will always fail:
 
 * `[val() = (foo)]`: Selects any element whose type annotation is `foo`.
 
-## Map Operator
-
-KQL implementations MAY support a "map operator", `=>`, that allows selection
-of specific parts of the selected notes, essentially "mapping" over a
-selector's result set.
-
-Only a single map operator may be used, and it must be the last element in a
-selector string.
-
-The map operator's right hand side is either an [`accessor`](#accessors) on
-its own, or a tuple of accessors, denoted by a comma-separated list wrapped in
-`()` (for example, `(a, b, c)`).
-
-## Accessors
-
-Accessors access/extract specific parts of a node. They are used with the [map
-operator](#map-operator), and have syntactic overlap with some
-[matchers](#matchers).
-
-* `name()`: Returns the name of the node itself.
-* `val(2)`: Returns the third value in a node.
-* `val()`: Equivalent to `val(0)`.
-* `prop(foo)`: Returns the value of the property `foo` in the node.
-* `foo`: Equivalent to `prop(foo)`.
-* `props()`: Returns all properties of the node as an object.
-* `values()`: Returns all values of the node as an array.
-
 ## Examples
 
 Given this document:
 
 ```kdl
 package {
-    name "foo"
+    name foo
     version "1.0.0"
-    dependencies platform="windows" {
+    dependencies platform=windows {
         winapi "1.0.0" path="./crates/my-winapi-fork"
     }
     dependencies {
-        miette "2.0.0" dev=true
+        miette "2.0.0" dev=#true integrity=(sri)sha512-deadbeef
     }
 }
 ```
 
 Then the following queries are valid:
 
-* `package name`
+* `package >> name`
     * -> fetches the `name` node itself
-* `top() > package name`
+* `top() > package >> name`
     * -> fetches the `name` node, guaranteeing that `package` is in the document root.
 * `dependencies`
     * -> deep-fetches both `dependencies` nodes
@@ -129,14 +107,25 @@ Then the following queries are valid:
     * -> fetches all direct-child nodes of any `dependencies` nodes in the
          document. In this case, it will fetch both `miette` and `winapi` nodes.
 
-If using an API that supports the [map operator](#map-operator), the following
-are valid queries:
-
-* `package name => val()`
-    * -> `["foo"]`.
-* `dependencies[platform] => platform`
-    * -> `["windows"]`
-* `dependencies > [] => (name(), val(), path)`
-    * -> `[("winapi", "1.0.0", "./crates/my-winapi-fork"), ("miette", "2.0.0", None)]`
-* `dependencies > [] => (name(), values(), props())`
-    * -> `[("winapi", ["1.0.0"], {"platform": "windows"}), ("miette", ["2.0.0"], {"dev": true})]`
+## Full Grammar
+
+Rules that are not defined in this grammar are prefixed with `$`, see [the KDL
+grammar](https://github.com/kdl-org/kdl/blob/main/SPEC.md#full-grammar) for
+what they expand to.
+
+```
+query-str := $bom? query
+query := selector q-ws* "||" q-ws* query | selector
+selector := filter q-ws* selector-operator q-ws* selector-subsequent | filter
+selector-subsequent := matchers q-ws* selector-operator q-ws* selector-subsequent | matchers
+selector-operator := ">>" | ">" | "++" | "+"
+filter := "top(" q-ws* ")" | matchers
+matchers := type-matcher $string? accessor-matcher* | $string accessor-matcher* | accessor-matcher+
+type-matcher := "(" q-ws* ")" | $type
+accessor-matcher := "[" q-ws* (comparison | accessor)? q-ws* "]"
+comparison := accessor q-ws* matcher-operator q-ws* ($type | $string | $number | $keyword)
+accessor := "val(" q-ws* $integer q-ws* ")" | "prop(" q-ws* $string q-ws* ")" | "name(" q-ws* ")" | "tag(" q-ws* ")" | "values(" q-ws* ")" | "props(" q-ws* ")" | $string
+matcher-operator := "=" | "!=" | ">" | "<" | ">=" | "<=" | "^=" | "$=" | "*="
+
+q-ws := $plain-node-space
+```
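
Note on the KQL changes above: the difference between `>` and `>>`, and between `+` and `++`, may be easier to see in a small evaluator. The Python sketch below is a toy model, not a KQL implementation: `Node` and `select` are hypothetical names, tokens must be separated by whitespace (the grammar makes that whitespace optional), and the only matchers understood are bare node names, `[]`, and a leading `top()`.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)
class Node:
    """Hypothetical stand-in for a parsed KDL node (name and children only)."""
    name: str
    children: list = field(default_factory=list)


def _descendants(node):
    for child in node.children:
        yield child
        yield from _descendants(child)


def select(doc, query):
    """Evaluate a simplified KQL query against a list of top-level nodes."""
    root = Node('', doc)            # virtual parent that stands in for top()
    everything, siblings = [], {id(root): [root]}

    def index(nodes):
        for n in nodes:
            everything.append(n)    # document order
            siblings[id(n)] = nodes
            index(n.children)
    index(doc)

    def following(n):
        sibs = siblings[id(n)]
        pos = next(i for i, s in enumerate(sibs) if s is n)
        return sibs[pos + 1:]

    step = {
        '>': lambda n: n.children,              # direct children
        '>>': lambda n: list(_descendants(n)),  # any depth
        '+': lambda n: following(n)[:1],        # immediately following sibling
        '++': following,                        # any later sibling
    }

    def matches(n, matcher):
        return matcher == '[]' or n.name == matcher

    picked = set()
    for selector in query.split('||'):
        tokens = selector.split()
        if not tokens or len(tokens) % 2 == 0:
            raise ValueError('malformed selector: %r' % selector)
        first, rest = tokens[0], tokens[1:]
        if first == 'top()':
            current = [root] if rest else list(doc)
        else:
            current = [n for n in everything if matches(n, first)]
        for op, matcher in zip(rest[0::2], rest[1::2]):
            if matcher == 'top()':
                raise ValueError('top() may only start a selector')
            current = [c for n in current for c in step[op](n)
                       if matches(c, matcher)]
        picked.update(id(n) for n in current)
    return [n for n in everything if id(n) in picked]


if __name__ == '__main__':
    doc = [Node('package', [
        Node('name'), Node('version'),
        Node('dependencies', [Node('winapi')]),
        Node('dependencies', [Node('miette')]),
    ])]
    for q in ('package >> name', 'dependencies > []', 'name ++ dependencies'):
        print(q, '->', [n.name for n in select(doc, q)])
```

The tree mirrors the example document from the spec with values and properties dropped. Modelling `top()` as a virtual parent means `top() > package` reduces to an ordinary child step, and `a > top()` is rejected as the new prose requires.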