Commit

Merge KDL v2 (#286)
Co-authored-by: Danielle Smith <[email protected]>
Co-authored-by: Basile Henry <[email protected]>
Co-authored-by: Bram Gotink <[email protected]>
Co-authored-by: Nathan West <[email protected]>
Co-authored-by: Hannah Kolbeck <[email protected]>
Co-authored-by: Lars Willighagen <[email protected]>
Co-authored-by: Tab Atkins-Bittner <[email protected]>
Co-authored-by: Christopher Durham <[email protected]>
Co-authored-by: Corey Powell <[email protected]>
Co-authored-by: wackbyte <[email protected]>
Co-authored-by: Bannerets <[email protected]>
Co-authored-by: Romain Delamare <[email protected]>
Co-authored-by: Thomas Jollans <[email protected]>
14 people authored Nov 29, 2024
1 parent d8d583a commit c8632b7
Showing 268 changed files with 1,365 additions and 690 deletions.
95 changes: 95 additions & 0 deletions CHANGELOG.md
@@ -0,0 +1,95 @@
# KDL Changelog

## 2.0.0-draft.5 (2024-11-28)

* Equals signs other than `=` are no longer supported in properties.
* 128-bit integer type annotations have been added to the list of "well-known"
type annotations.
* Multiline string escape rules have been tweaked significantly.
* `\s` is now a valid escape within a string, representing a space character.
* Slashdash (`/-`)-compatible locations and related grammar have been adjusted
to be clearer and more intuitive. This includes some changes relating to
whitespace, including comments and newlines, which are breaking changes.
* Various updates to the test suite to reflect these changes.
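To make the escape changes concrete, here is a minimal Python sketch of decoding a quoted (non-raw) string body under these rules. It is an illustration, not the reference parser: `decode_escapes` is a hypothetical helper, the escape table is our reading of this changelog (`\s` added, `\/` dropped, `\u{...}` limited to Unicode Scalar Values), and only ASCII whitespace and newlines are treated as discardable after a backslash, whereas the spec's whitespace set is larger.

```python
import re

# Simple escapes as we read the v2 changelog: "\s" is new, "\/" is gone.
SIMPLE_ESCAPES = {
    '"': '"',
    "\\": "\\",
    "b": "\b",
    "f": "\f",
    "n": "\n",
    "r": "\r",
    "t": "\t",
    "s": " ",
}

# Assumed (ASCII-only) whitespace/newline set swallowed by a "\" escape.
WS_ESCAPE = re.compile(r"[ \t\n\r\v\f]+")
# "\u{...}" with one to six hex digits.
UNICODE_ESCAPE = re.compile(r"u\{([0-9A-Fa-f]{1,6})\}")


def decode_escapes(body: str) -> str:
    """Decode escapes in the body of a quoted (non-raw) KDL v2 string."""
    out = []
    i = 0
    while i < len(body):
        ch = body[i]
        if ch != "\\":
            out.append(ch)
            i += 1
            continue
        i += 1  # step past the backslash
        if i >= len(body):
            raise ValueError("dangling backslash")
        nxt = body[i]
        if nxt in SIMPLE_ESCAPES:
            out.append(SIMPLE_ESCAPES[nxt])
            i += 1
        elif (m := WS_ESCAPE.match(body, i)) is not None:
            # Whitespace escape: all literal whitespace after "\" is discarded.
            i = m.end()
        elif (m := UNICODE_ESCAPE.match(body, i)) is not None:
            cp = int(m.group(1), 16)
            # Only Unicode Scalar Values: no surrogates, nothing above 0x10FFFF.
            if cp > 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
                raise ValueError(f"not a Unicode scalar value: {cp:#x}")
            out.append(chr(cp))
            i = m.end()
        else:
            raise ValueError(f"invalid escape: \\{nxt}")
    return "".join(out)


print(decode_escapes(r"one\stwo"))  # one two
```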

## 2.0.0 (Unreleased)

### Grammar

* Solidus/Forward slash (`/`) is no longer an escaped character.
* Space (`U+0020`) can now be written into quoted strings with the `\s`
escape.
* Single line comments (`//`) can now be immediately followed by a newline.
* All literal whitespace following a `\` in a string is now discarded.
* Vertical tabs (`U+000B`) are now considered to be whitespace.
* The grammar syntax itself has been described, and some confusing definitions
in the grammar have been fixed accordingly (mostly related to escaped
characters).
* `,`, `<`, and `>` are now legal identifier characters. They were previously
reserved for KQL but this is no longer necessary.
* Code points under `0x20` (except newline and whitespace code points), code
points above `0x10FFFF`, the Delete control character (`0x7F`), and the [Unicode
"direction control"
characters](https://www.w3.org/International/questions/qa-bidi-unicode-controls)
are now completely banned from appearing literally in KDL documents. They
can now only be represented in regular strings, via escapes, and there are no
facilities to represent them in raw strings. This should be considered a
security improvement.
* Raw strings no longer require an `r` prefix: they are now specified by using
`#""#`.
* Line continuations can be followed by an EOF now, instead of requiring a
newline (or comment). `node \<EOF>` is now a legal KDL document.
* `#` is no longer a legal identifier character.
* `null`, `true`, and `false` are now `#null`, `#true`, and `#false`. Using
the unprefixed versions of these values is a syntax error.
* The spec prose has more explicitly stated that whitespace and newlines are
not valid identifier characters, even though the grammar already expressed
this.
* Bare identifiers can now be used as values in arguments and properties, and
are interpreted as string values.
* The spec prose now more explicitly states that strings and raw strings can
be used as type annotations.
* Removed a statement in the spec prose that said "It is reasonable for an
implementation to ignore null values altogether when deserializing". This is
no longer encouraged or desired.
* Code points have been constrained to [Unicode Scalar
Values](https://unicode.org/glossary/#unicode_scalar_value) only, including
values used in string escapes (`\u{}`). All KDL documents and string values
should be valid UTF-8 now, as was intended.
* The last node in a child block no longer needs to be terminated with `;`,
even if the closing `}` is on the same line, so this is now a legal node:
`node {foo;bar;baz}`
* More places allow whitespace (node-spaces, specifically) now. With great
power comes great responsibility:
  * Inside `(foo)` annotations: `( foo )` is legal, but `( f oo )` is not,
    since it contains two identifiers.
  * Between annotations and the thing they're annotating (`(blah) node (thing)
    1 y= (who) 2`).
  * Around `=` for props (`x = 1`).
* The BOM is now only allowed as the first character in a document. It was
previously treated as generic whitespace.
* Multi-line strings are now automatically dedented: the whitespace prefix of
the closing line is removed from each content line. Multi-line strings and
raw strings now must have a newline immediately following their opening `"`,
and a final newline plus whitespace preceding the closing `"`.
* `.1`, `+.1`, etc. are no longer valid identifiers, to prevent confusion and
conflicts with numbers.
* Literal newline sequences in multi-line strings are now normalized to single
`LF`s.
* `#inf`, `#-inf`, and `#nan` have been added in order to properly support
IEEE floats for implementations that choose to represent their decimals that
way.
* Correspondingly, the identifiers `inf`, `-inf`, and `nan` are now syntax
errors.
* `u128` and `i128` have been added as well-known number type annotations.
* Slashdash (`/-`)-compatible locations have been adjusted to be clearer and
more intuitive.
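The dedent and newline-normalization rules for multi-line strings can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the normative algorithm: `dedent_multiline` is a hypothetical helper that receives the text between the quotes (escapes not yet processed), only CRLF, CR, and LF count as newlines, and whitespace-only content lines become empty rather than being required to carry the prefix.

```python
import re

# Assumed subset of KDL newline sequences; CRLF is matched as one newline.
NEWLINE = re.compile(r"\r\n|\r|\n")


def dedent_multiline(body: str) -> str:
    """Dedent the text between the quotes of a multi-line KDL string."""
    lines = NEWLINE.split(body)
    # The opening quote must be followed directly by a newline,
    # so the first "line" is empty.
    if len(lines) < 2 or lines[0] != "":
        raise ValueError("multi-line string must start with a newline")
    # The closing line holds only whitespace; that whitespace is the prefix.
    prefix = lines[-1]
    if prefix.strip(" \t"):
        raise ValueError("closing line may contain only whitespace")
    result = []
    for line in lines[1:-1]:
        if not line.strip(" \t"):
            result.append("")  # whitespace-only lines become empty
        elif line.startswith(prefix):
            result.append(line[len(prefix):])
        else:
            raise ValueError(f"line lacks the closing-line prefix: {line!r}")
    # Joining with "\n" normalizes every original newline to a single LF.
    return "\n".join(result)


print(repr(dedent_multiline("\n    hello\r\n      world\n    ")))  # 'hello\n  world'
```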

### KQL

* There's now a _required_ descendant selector (`>>`), instead of using plain
spaces for that purpose.
* The "any sibling" selector is now `++` instead of `~`, for consistency with
the new descendant selector.
* Some parsing logic around the grammar has changed.
* Multi- and single-line comments are now supported, as well as line
continuations with `\`.
* Map operators have been removed entirely.
13 changes: 7 additions & 6 deletions JSON-IN-KDL.md
@@ -3,15 +3,15 @@ JSON-in-KDL (JiK)

This specification describes a canonical way to losslessly encode [JSON](https://json.org) in [KDL](https://kdl.dev). While this isn't a very useful thing to want to do on its own, it's occasionally useful when using a KDL toolchain while speaking with a JSON-consuming or -emitting service.

This is version 3.0.1 of JiK.
This is version 4.0.0 of JiK.

JSON-in-KDL (JiK from now on) is a KDL microsyntax consisting of named nodes that represent objects, arrays, or literal values.

----

JSON literals are, luckily, a subset of KDL's literals. There are two ways to write a JSON literal into JiK:
There are two ways to write a JSON literal into JiK:

* As a node with any nodename and a single argument, like `- true` (for the JSON `true`) or `foo 5` (for the JSON `5`).
* As a node with any nodename and a single argument, like `- #true` (for the JSON `true`) or `foo 5` (for the JSON `5`).
* When nested in arrays or objects, literals can usually be written as arguments (for array nodes) or properties (for object nodes). See below for details.
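Reading JiK back into JSON starts with mapping each literal to a JSON value. The Python sketch below is illustrative only: `kdl_literal_to_json` is a hypothetical helper that covers the `#`-prefixed keywords and plain decimal numbers as we read the v2 grammar (hex, octal, binary, and string literals are left out), and it assumes `#inf`, `#-inf`, and `#nan` must be rejected because JSON has no equivalent.

```python
import re

# KDL v2 decimal numbers; "_" separators may follow the first digit of each part.
DECIMAL = re.compile(r"[+-]?[0-9][0-9_]*(\.[0-9][0-9_]*)?([eE][+-]?[0-9][0-9_]*)?")
KEYWORDS = {"#true": True, "#false": False, "#null": None}


def kdl_literal_to_json(token: str):
    """Map a KDL v2 keyword or decimal-number token to a JSON-compatible value."""
    if token in KEYWORDS:
        return KEYWORDS[token]
    if token in ("#inf", "#-inf", "#nan"):
        raise ValueError(f"{token} has no JSON equivalent")
    if token in ("true", "false", "null", "inf", "-inf", "nan"):
        # The unprefixed forms are syntax errors in KDL v2.
        raise SyntaxError(f"bare {token!r} is not valid KDL v2; use #{token}")
    m = DECIMAL.fullmatch(token)
    if m is None:
        raise ValueError(f"literal not covered by this sketch: {token!r}")
    text = token.replace("_", "")
    # A fraction or exponent makes it a float; otherwise keep an exact int.
    return float(text) if m.group(1) or m.group(2) else int(text)


print(kdl_literal_to_json("1_000"), kdl_literal_to_json("#true"))  # 1000 True
```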

----
@@ -25,7 +25,7 @@ Children can encode literals and/or nested arrays and objects. For example, the
```kdl
- {
- 1
- true false
- #true #false
- 3
}
```
@@ -36,7 +36,7 @@ Arguments and children can be mixed, if desired. The preceding example could als

```kdl
- 1 {
- true false
- #true #false
- 3
}
```
@@ -54,10 +54,11 @@ The `(array)` type annotation can be used on any other valid array node if desir

JSON objects are represented in JiK as a node with any nodename, with zero or more properties and/or zero or more children with any nodenames.

Properties can encode literals - for example, the JSON `{"foo": 1, "bar": true}` can be written in JiK as `- foo=1 bar=true`.
Properties can encode literals - for example, the JSON `{"foo": 1, "bar": true}` can be written in JiK as `- foo=1 bar=#true`.

Children can encode literals and/or nested arrays and objects,
using the nodename for the item's key.
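Combining the array and object rules, a JSON value can be encoded into JiK recursively. The Python sketch below is one possible encoder, not a reference implementation. Its names (`to_jik`, `literal`, `kdl_string`, `node_name`, `forbidden`) are hypothetical, and it makes several simplifying assumptions: every nested item goes into children rather than arguments or properties, array items use `-` as their nodename, every array and object carries an `(array)` or `(object)` annotation (the `(object)` form is assumed from the wider JiK spec, not shown in this diff), keys are quoted unless they are plain ASCII identifiers, and the list of code points that must be escaped is an approximation.

```python
import json
import re

# Conservative bare-identifier check; anything else gets quoted.
BARE = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")
RESERVED = {"true", "false", "null", "inf", "nan"}  # bare forms are errors in v2
ESCAPES = {'"': '\\"', "\\": "\\\\", "\n": "\\n", "\r": "\\r",
           "\t": "\\t", "\b": "\\b", "\f": "\\f"}


def forbidden(cp: int) -> bool:
    """Assumed set of code points that may not appear literally in KDL v2."""
    return (cp < 0x20 or cp == 0x7F or cp in (0x200E, 0x200F, 0xFEFF)
            or 0x202A <= cp <= 0x202E or 0x2066 <= cp <= 0x2069)


def kdl_string(s: str) -> str:
    """Quote a string, escaping characters that cannot appear literally."""
    out = []
    for ch in s:
        if ch in ESCAPES:
            out.append(ESCAPES[ch])
        elif forbidden(ord(ch)):
            out.append(f"\\u{{{ord(ch):x}}}")  # e.g. \u{1b}
        else:
            out.append(ch)
    return '"' + "".join(out) + '"'


def node_name(key: str) -> str:
    return key if BARE.fullmatch(key) and key not in RESERVED else kdl_string(key)


def literal(value) -> str:
    if value is None:
        return "#null"
    if isinstance(value, bool):  # check bool before int: bool subclasses int
        return "#true" if value else "#false"
    if isinstance(value, (int, float)):
        return repr(value)
    if isinstance(value, str):
        return kdl_string(value)
    raise TypeError(f"not a JSON value: {value!r}")


def to_jik(value, name: str = "-", indent: int = 0) -> str:
    """Encode a decoded JSON value as a JiK node called `name`."""
    pad = "    " * indent
    if isinstance(value, list):
        head = f"{pad}(array){name}"
        items = [to_jik(v, "-", indent + 1) for v in value]
    elif isinstance(value, dict):
        head = f"{pad}(object){name}"
        items = [to_jik(v, node_name(k), indent + 1) for k, v in value.items()]
    else:
        return f"{pad}{name} {literal(value)}"
    if not items:
        return head  # the annotation alone marks an empty container
    return head + " {\n" + "\n".join(items) + "\n" + pad + "}"


if __name__ == "__main__":
    print(to_jik(json.loads('{"foo": 1, "bar": [2, {"baz": true}], "qux": 4}')))
```

Always annotating containers is a deliberate trade-off: the output is noisier than hand-written JiK, but a node with no arguments, properties, or children is never ambiguous.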

For example, the JSON `{"foo": 1, "bar": [2, {"baz": 3}], "qux":4}` can be written in JiK as:

```kdl
```
95 changes: 42 additions & 53 deletions QUERY-SPEC.md
@@ -5,20 +5,20 @@ documents to extract nodes and even specific data. It is loosely based on CSS
selectors for familiarity and ease of use. Think of it as CSS Selectors or
XPath, but for KDL!

This document describes KQL `1.0.0`. It was released on September 11, 2021.
This document describes KQL `next`. It is unreleased.

## Selectors

Selectors use selection operators to filter nodes that will be returned by an
API using KQL. The main differences between this and CSS selectors are the
lack of `*` (use `[]` instead), and the specific syntax for
lack of `*` (use `[]` instead), the specific syntax for descendants and siblings, and the specific syntax for
[matchers](#matchers) (the stuff between `[` and `]`), which is similar, but not identical to CSS.

* `a > b`: Selects any `b` element that is a direct child of an `a` element.
* `a b`: Selects any `b` element that is a _descendant_ of an `a` element.
* `a b || a c`: Selects all `b` and `c` elements that are descendants of an `a` element. Any selector may be on either side of the `||`. Multiple `||` are supported.
* `a >> b`: Selects any `b` element that is a _descendant_ of an `a` element.
* `a >> b || a >> c`: Selects all `b` and `c` elements that are descendants of an `a` element. Any selector may be on either side of the `||`. Multiple `||` are supported.
* `a + b`: Selects any `b` element that is placed immediately after a sibling `a` element.
* `a ~ b`: Selects any `b` element that follows an `a` element as a sibling, either immediately or later.
* `a ++ b`: Selects any `b` element that follows an `a` element as a sibling, either immediately or later.
* `[accessor()]`: Selects any element, filtered by [an accessor](#accessors). (`accessor()` is a placeholder, not an actual accessor)
* `a[accessor()]`: Selects any `a` element, filtered by an accessor.
* `[]`: Selects any element.
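The four combinators differ only in which related nodes they look at. As a rough illustration, this Python sketch evaluates `a > b`, `a >> b`, `a + b`, and `a ++ b` over a toy tree where every selector is just a node name; `Node` and `select` are hypothetical, and matchers, type annotations, and `||` are ignored.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    """A toy node: just a name and children (no values, props, or tags)."""
    name: str
    children: list[Node] = field(default_factory=list)


def walk(nodes):
    """Yield (node, siblings, index) for every node, in document order."""
    for i, node in enumerate(nodes):
        yield node, nodes, i
        yield from walk(node.children)


def descendants(node):
    for child in node.children:
        yield child
        yield from descendants(child)


def select(doc, a, op, b):
    """Nodes named `b` related by `op` to some node named `a`."""
    result = []
    for node, siblings, i in walk(doc):
        if node.name != a:
            continue
        if op == ">":  # direct children
            candidates = node.children
        elif op == ">>":  # any descendant
            candidates = list(descendants(node))
        elif op == "+":  # only the immediately following sibling
            candidates = siblings[i + 1 : i + 2]
        elif op == "++":  # any later sibling
            candidates = siblings[i + 1 :]
        else:
            raise ValueError(f"unknown operator: {op}")
        for c in candidates:
            if c.name == b and all(c is not r for r in result):
                result.append(c)
    return result


doc = [Node("a", [Node("b"), Node("c", [Node("b")])]), Node("b"), Node("x"), Node("b")]
print([len(select(doc, "a", op, "b")) for op in (">", ">>", "+", "++")])  # [1, 2, 1, 2]
```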
@@ -30,6 +30,11 @@ properties, node names, etc). With the exception of `top()` and `()`, they are a
used inside a `[]` selector. Some matchers are unary, but most of them involve
binary operators.

The `top()` matcher can only be used as the first matcher of a selector. This means
that it cannot be the right operand of the `>`, `>>`, `+`, or `++` operators. Because `||`
combines whole selectors, `top()` can appear immediately after it. For instance,
`a > b || top() > b` is valid, but `a > top()` is not.
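One way to enforce this rule is to split the query into selectors and operators and check what precedes each `top()`. The Python sketch below is an assumption-laden illustration, not a full KQL parser: `split_query` and `check_top_placement` are hypothetical, and the splitter only tracks brackets, parentheses, and double-quoted strings, which is enough to avoid splitting on the `>` inside a matcher like `[val() > 1]`.

```python
from __future__ import annotations

# Longest operators first so ">>" is not read as two ">" tokens.
OPERATORS = (">>", "||", "++", ">", "+")


def split_query(query: str) -> list[str]:
    """Split a query into alternating [selector, operator, selector, ...].

    Operators inside [...], (...), or "..." are ignored.
    """
    parts, current, depth, in_string, i = [], "", 0, False, 0
    while i < len(query):
        ch = query[i]
        if in_string:
            current += ch
            if ch == "\\" and i + 1 < len(query):
                current += query[i + 1]  # keep the escaped character as-is
                i += 1
            elif ch == '"':
                in_string = False
            i += 1
            continue
        if ch == '"':
            in_string = True
        elif ch in "[(":
            depth += 1
        elif ch in "])":
            depth -= 1
        elif depth == 0:
            op = next((o for o in OPERATORS if query.startswith(o, i)), None)
            if op is not None:
                parts += [current.strip(), op]
                current = ""
                i += len(op)
                continue
        current += ch
        i += 1
    parts.append(current.strip())
    return parts


def check_top_placement(query: str) -> bool:
    """True if every top() starts the query or directly follows ||."""
    parts = split_query(query)
    for idx in range(0, len(parts), 2):  # even indices hold selectors
        if parts[idx].startswith("top(") and idx > 0 and parts[idx - 1] != "||":
            return False
    return True


print(check_top_placement("a > b || top() > b"), check_top_placement("a > top()"))  # True False
```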

* `top()`: Returns all top-level children of the current document.
* `top() > []`: Equivalent to `top()` on its own.
* `(foo)`: Selects any element whose type annotation is `foo`.
@@ -44,8 +49,8 @@ Attribute matchers support certain binary operators:
* `[val() = 1]`: Selects any element whose first value is 1.
* `[prop(name) = 1]`: Selects any element with a property `name` whose value is 1.
* `[name = 1]`: Equivalent to the above.
* `[name() = "hi"]`: Selects any element whose _node name_ is `"hi"`. Equivalent to just `hi`, but more useful when using string operators.
* `[tag() = "hi"]`: Selects any element whose type annotation is `"hi"`. Equivalent to just `(hi)`, but more useful when using string operators.
* `[name() = hi]`: Selects any element whose _node name_ is "hi". Equivalent to just `hi`, but more useful when using string operators.
* `[tag() = hi]`: Selects any element whose tag is "hi". Equivalent to just `(hi)`, but more useful when using string operators.
* `[val() != 1]`: Selects any element whose first value exists, and is not 1.

The following operators work with any `val()` or `prop()` values.
@@ -60,64 +65,37 @@ never coerced to 1, and there is no "universal" ordering across all types.):
The following operators work only with string `val()`, `prop()`, `tag()`, or `name()` values.
If the value is not a string, the matcher will always fail:

* `[val() ^= "foo"]`: Selects any element whose first value starts with "foo".
* `[val() $= "foo"]`: Selects any element whose first value ends with "foo".
* `[val() *= "foo"]`: Selects any element whose first value contains "foo".
* `[val() ^= foo]`: Selects any element whose first value starts with "foo".
* `[val() $= foo]`: Selects any element whose first value ends with "foo".
* `[val() *= foo]`: Selects any element whose first value contains "foo".
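The comparison rules above can be read as a small evaluation function. This Python sketch is hypothetical and deliberately partial: `matches` receives a value already extracted by `val()`, `prop()`, `name()`, or `tag()` (with `MISSING` standing for an absent value), never coerces between kinds, assumes integers and floats compare numerically, and leaves out the ordering operators and the `(foo)` type comparison.

```python
MISSING = object()  # stands in for "the node has no such value"


def same_kind(a, b) -> bool:
    """No coercion: booleans, numbers, strings, and null only match their own kind."""
    if isinstance(a, bool) or isinstance(b, bool):
        return isinstance(a, bool) and isinstance(b, bool)
    if isinstance(a, (int, float)) or isinstance(b, (int, float)):
        return isinstance(a, (int, float)) and isinstance(b, (int, float))
    return type(a) is type(b)


def matches(actual, op: str, expected) -> bool:
    """Evaluate `[accessor op expected]` for an already-extracted value."""
    if actual is MISSING:
        return False  # even "!=" requires the value to exist
    if op in ("=", "!="):
        equal = same_kind(actual, expected) and actual == expected
        return equal if op == "=" else not equal
    if op in ("^=", "$=", "*="):
        if not (isinstance(actual, str) and isinstance(expected, str)):
            return False  # string operators fail on non-strings
        if op == "^=":
            return actual.startswith(expected)
        if op == "$=":
            return actual.endswith(expected)
        return expected in actual
    raise ValueError(f"operator not covered by this sketch: {op}")


print(matches("foobar", "^=", "foo"), matches(True, "=", 1), matches(MISSING, "!=", 1))
# True False False
```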

The following operators work only with `val()` or `prop()` values. If the value
is not one of those, the matcher will always fail:

* `[val() = (foo)]`: Selects any element whose first value has the type annotation `foo`.

## Map Operator

KQL implementations MAY support a "map operator", `=>`, that allows selection
of specific parts of the selected nodes, essentially "mapping" over a
selector's result set.

Only a single map operator may be used, and it must be the last element in a
selector string.

The map operator's right hand side is either an [`accessor`](#accessors) on
its own, or a tuple of accessors, denoted by a comma-separated list wrapped in
`()` (for example, `(a, b, c)`).

## Accessors

Accessors access/extract specific parts of a node. They are used with the [map
operator](#map-operator), and have syntactic overlap with some
[matchers](#matchers).

* `name()`: Returns the name of the node itself.
* `val(2)`: Returns the third value in a node.
* `val()`: Equivalent to `val(0)`.
* `prop(foo)`: Returns the value of the property `foo` in the node.
* `foo`: Equivalent to `prop(foo)`.
* `props()`: Returns all properties of the node as an object.
* `values()`: Returns all values of the node as an array.

## Examples

Given this document:

```kdl
package {
name "foo"
name foo
version "1.0.0"
dependencies platform="windows" {
dependencies platform=windows {
winapi "1.0.0" path="./crates/my-winapi-fork"
}
dependencies {
miette "2.0.0" dev=true
miette "2.0.0" dev=#true integrity=(sri)sha512-deadbeef
}
}
```

Then the following queries are valid:

* `package name`
* `package >> name`
  * -> fetches the `name` node itself
* `top() > package name`
* `top() > package >> name`
  * -> fetches the `name` node, guaranteeing that `package` is in the document root.
* `dependencies`
  * -> deep-fetches both `dependencies` nodes
@@ -129,14 +107,25 @@ Then the following queries are valid:
  * -> fetches all direct-child nodes of any `dependencies` nodes in the
    document. In this case, it will fetch both `miette` and `winapi` nodes.

If using an API that supports the [map operator](#map-operator), the following
are valid queries:

* `package name => val()`
  * -> `["foo"]`.
* `dependencies[platform] => platform`
  * -> `["windows"]`
* `dependencies > [] => (name(), val(), path)`
  * -> `[("winapi", "1.0.0", "./crates/my-winapi-fork"), ("miette", "2.0.0", None)]`
* `dependencies > [] => (name(), values(), props())`
  * -> `[("winapi", ["1.0.0"], {"platform": "windows"}), ("miette", ["2.0.0"], {"dev": true})]`
## Full Grammar

Rules that are not defined in this grammar are prefixed with `$`, see [the KDL
grammar](https://github.com/kdl-org/kdl/blob/main/SPEC.md#full-grammar) for
what they expand to.

```
query-str := $bom? query
query := selector q-ws* "||" q-ws* query | selector
selector := filter q-ws* selector-operator q-ws* selector-subsequent | filter
selector-subsequent := matchers q-ws* selector-operator q-ws* selector-subsequent | matchers
selector-operator := ">>" | ">" | "++" | "+"
filter := "top(" q-ws* ")" | matchers
matchers := type-matcher $string? accessor-matcher* | $string accessor-matcher* | accessor-matcher+
type-matcher := "(" q-ws* ")" | $type
accessor-matcher := "[" q-ws* (comparison | accessor)? q-ws* "]"
comparison := accessor q-ws* matcher-operator q-ws* ($type | $string | $number | $keyword)
accessor := "val(" q-ws* $integer q-ws* ")" | "prop(" q-ws* $string q-ws* ")" | "name(" q-ws* ")" | "tag(" q-ws* ")" | "values(" q-ws* ")" | "props(" q-ws* ")" | $string
matcher-operator := "=" | "!=" | ">" | "<" | ">=" | "<=" | "^=" | "$=" | "*="
q-ws := $plain-node-space
```
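As an illustration of the `accessor-matcher`, `accessor`, and `matcher-operator` rules, the Python sketch below splits the text inside `[...]` into an accessor and an optional comparison. `parse_accessor_matcher` is hypothetical and simplified: it returns the operand as raw text, only accepts `prop(...)` names without spaces, also accepts `val()` with no index (as the matcher examples do, although the `accessor` rule above asks for an integer), and tries two-character operators before one-character ones so that `>=` is not read as `>`.

```python
import re

# Two-character operators come first so ">=" is not read as ">".
MATCHER_OPERATORS = ("!=", ">=", "<=", "^=", "$=", "*=", "=", ">", "<")
ACCESSOR = re.compile(
    r"val\(\s*(?P<index>\d*)\s*\)"
    r"|prop\(\s*(?P<prop>[^\s)]+)\s*\)"
    r"|(?P<fn>name|tag|values|props)\(\s*\)"
    r"|(?P<bare>[^\s()=!<>^$*\[\]]+)"
)


def parse_accessor_matcher(inner: str):
    """Parse the text between "[" and "]" into (accessor, operator, operand)."""
    text = inner.strip()
    if not text:
        return None  # "[]" matches any node
    m = ACCESSOR.match(text)
    if m is None:
        raise ValueError(f"expected an accessor: {text!r}")
    if m.group("index") is not None:
        accessor = ("val", int(m.group("index") or 0))  # val() means val(0)
    elif m.group("prop") is not None:
        accessor = ("prop", m.group("prop"))
    elif m.group("fn") is not None:
        accessor = (m.group("fn"),)
    else:
        accessor = ("prop", m.group("bare"))  # a bare string means prop(string)
    rest = text[m.end():].strip()
    if not rest:
        return accessor, None, None
    op = next((o for o in MATCHER_OPERATORS if rest.startswith(o)), None)
    if op is None:
        raise ValueError(f"expected a matcher operator: {rest!r}")
    operand = rest[len(op):].strip()
    if not operand:
        raise ValueError("missing operand")
    return accessor, op, operand


print(parse_accessor_matcher("val() >= 5"))  # (('val', 0), '>=', '5')
```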