From 910f6e90a7559113f1a96abb497965700c7fb8f1 Mon Sep 17 00:00:00 2001 From: Danielle Smith Date: Sun, 28 Aug 2022 21:59:26 +0200 Subject: [PATCH 001/105] Do not escape / (Solidus, Forwardslash) (#197) --- SPEC.md | 3 +-- tests/test_cases/input/all_escapes.kdl | 2 +- 2 files changed, 2 insertions(+), 3 deletions(-) diff --git a/SPEC.md b/SPEC.md index e2fd106..f625ba1 100644 --- a/SPEC.md +++ b/SPEC.md @@ -319,7 +319,6 @@ interpreted as described in the following table: | Carriage Return | `\r` | `U+000D` | | Character Tabulation (Tab) | `\t` | `U+0009` | | Reverse Solidus (Backslash) | `\\` | `U+005C` | -| Solidus (Forwardslash) | `\/` | `U+002F` | | Quotation Mark (Double Quote) | `\"` | `U+0022` | | Backspace | `\b` | `U+0008` | | Form Feed | `\f` | `U+000C` | @@ -461,7 +460,7 @@ type := '(' identifier ')' string := raw-string | escaped-string escaped-string := '"' character* '"' character := '\' escape | [^\"] -escape := ["\\/bfnrt] | 'u{' hex-digit{1, 6} '}' +escape := ["\\bfnrt] | 'u{' hex-digit{1, 6} '}' hex-digit := [0-9a-fA-F] raw-string := 'r' raw-string-hash diff --git a/tests/test_cases/input/all_escapes.kdl b/tests/test_cases/input/all_escapes.kdl index 5bb1dc3..024cda2 100644 --- a/tests/test_cases/input/all_escapes.kdl +++ b/tests/test_cases/input/all_escapes.kdl @@ -1 +1 @@ -node "\"\\\/\b\f\n\r\t" +node "\"\\\b\f\n\r\t" From 69ac280bf058ad0003a807577585570e8e646723 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Kat=20March=C3=A1n?= Date: Sun, 28 Aug 2022 13:01:07 -0700 Subject: [PATCH 002/105] KQL: require operator and change operator grammar a bit (#221) --- QUERY-SPEC.md | 16 ++++++++-------- 1 file changed, 8 insertions(+), 8 deletions(-) diff --git a/QUERY-SPEC.md b/QUERY-SPEC.md index 766794f..829f978 100644 --- a/QUERY-SPEC.md +++ b/QUERY-SPEC.md @@ -5,20 +5,20 @@ documents to extract nodes and even specific data. It is loosely based on CSS selectors for familiarity and ease of use. Think of it as CSS Selectors or XPath, but for KDL! 
-This document describes KQL `1.0.0`. It was released on September 11, 2021. +This document describes KQL `next`. It is unreleased. ## Selectors Selectors use selection operators to filter nodes that will be returned by an API using KQL. The main differences between this and CSS selectors are the -lack of `*` (use `[]` instead), and the specific syntax for +lack of `*` (use `[]` instead), the specific syntax for descendants and siblings, and the specific syntax for [matchers](#matchers) (the stuff between `[` and `]`), which is similar, but not identical to CSS. * `a > b`: Selects any `b` element that is a direct child of an `a` element. -* `a b`: Selects any `b` element that is a _descendant_ of an `a` element. -* `a b || a c`: Selects all `b` and `c` elements that are descendants of an `a` element. Any selector may be on either side of the `||`. Multiple `||` are supported. +* `a >> b`: Selects any `b` element that is a _descendant_ of an `a` element. +* `a >> b || a >> c`: Selects all `b` and `c` elements that are descendants of an `a` element. Any selector may be on either side of the `||`. Multiple `||` are supported. * `a + b`: Selects any `b` element that is placed immediately after a sibling `a` element. -* `a ~ b`: Selects any `b` element that follows an `a` element as a sibling, either immediately or later. +* `a ++ b`: Selects any `b` element that follows an `a` element as a sibling, either immediately or later. * `[accessor()]`: Selects any element, filtered by [an accessor](#accessors). (`accessor()` is a placeholder, not an actual accessor) * `a[accessor()]`: Selects any `a` element, filtered by an accessor. * `[]`: Selects any element. 
@@ -108,16 +108,16 @@ package { winapi "1.0.0" path="./crates/my-winapi-fork" } dependencies { - miette "2.0.0" dev=true + miette "2.0.0" dev=true integrity=(sri)"sha512-deadbeef" } } ``` Then the following queries are valid: -* `package name` +* `package >> name` * -> fetches the `name` node itself -* `top() > package name` +* `top() > package >> name` * -> fetches the `name` node, guaranteeing that `package` is in the document root. * `dependencies` * -> deep-fetches both `dependencies` nodes From 2d5e543bbe5ad68e54ea0394368c30d4be4313a6 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Kat=20March=C3=A1n?= Date: Sun, 28 Aug 2022 13:01:53 -0700 Subject: [PATCH 003/105] KQL: remove map operator and accessors (#222) Honestly, they're just too implementation-specific --- QUERY-SPEC.md | 39 --------------------------------------- 1 file changed, 39 deletions(-) diff --git a/QUERY-SPEC.md b/QUERY-SPEC.md index 829f978..7ab1b46 100644 --- a/QUERY-SPEC.md +++ b/QUERY-SPEC.md @@ -69,33 +69,6 @@ is not one of those, the matcher will always fail: * `[val() = (foo)]`: Selects any element whose tag is "foo". -## Map Operator - -KQL implementations MAY support a "map operator", `=>`, that allows selection -of specific parts of the selected notes, essentially "mapping" over a -selector's result set. - -Only a single map operator may be used, and it must be the last element in a -selector string. - -The map operator's right hand side is either an [`accessor`](#accessors) on -its own, or a tuple of accessors, denoted by a comma-separated list wrapped in -`()` (for example, `(a, b, c)`). - -## Accessors - -Accessors access/extract specific parts of a node. They are used with the [map -operator](#map-operator), and have syntactic overlap with some -[matchers](#matchers). - -* `name()`: Returns the name of the node itself. -* `val(2)`: Returns the third value in a node. -* `val()`: Equivalent to `val(0)`. -* `prop(foo)`: Returns the value of the property `foo` in the node. 
-* `foo`: Equivalent to `prop(foo)`. -* `props()`: Returns all properties of the node as an object. -* `values()`: Returns all values of the node as an array. - ## Examples Given this document: @@ -128,15 +101,3 @@ Then the following queries are valid: * `dependencies > []` * -> fetches all direct-child nodes of any `dependencies` nodes in the document. In this case, it will fetch both `miette` and `winapi` nodes. - -If using an API that supports the [map operator](#map-operator), the following -are valid queries: - -* `package name => val()` - * -> `["foo"]`. -* `dependencies[platform] => platform` - * -> `["windows"]` -* `dependencies > [] => (name(), val(), path)` - * -> `[("winapi", "1.0.0", "./crates/my-winapi-fork"), ("miette", "2.0.0", None)]` -* `dependencies > [] => (name(), values(), props())` - * -> `[("winapi", ["1.0.0"], {"platform": "windows"}), ("miette", ["2.0.0"], {"dev": true})]` From 1bf4d740faad46299f55bd4711f96b850663156e Mon Sep 17 00:00:00 2001 From: Basile Henry Date: Sun, 28 Aug 2022 22:07:17 +0200 Subject: [PATCH 004/105] Allow "empty" single line comments in the spec (#234) As I read the grammar in the spec, `"//"` wouldn't parse as a single-line-comment as it requires at least one non-newline character after the slashes. 
--- SPEC.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/SPEC.md b/SPEC.md index f625ba1..01d7570 100644 --- a/SPEC.md +++ b/SPEC.md @@ -493,7 +493,7 @@ bom := '\u{FEFF}' unicode-space := See Table (All White_Space unicode characters which are not `newline`) -single-line-comment := '//' ^newline+ (newline | eof) +single-line-comment := '//' ^newline* (newline | eof) multi-line-comment := '/*' commented-block commented-block := '*/' | (multi-line-comment | '*' | '/' | [^*/]+) commented-block ``` From 78a2d5f5ed821f82acc097d1f7ccba914d08dbef Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Kat=20March=C3=A1n?= Date: Sun, 28 Aug 2022 13:14:09 -0700 Subject: [PATCH 005/105] Draft changelog --- CHANGELOG.md | 16 ++++++++++++++++ 1 file changed, 16 insertions(+) create mode 100644 CHANGELOG.md diff --git a/CHANGELOG.md b/CHANGELOG.md new file mode 100644 index 0000000..6f9ded7 --- /dev/null +++ b/CHANGELOG.md @@ -0,0 +1,16 @@ +# KDL Changelog + +## 2.0.0 (2022-08-28) + +### Grammar + +* Solidus/Forward slash (`/`) is no longer an escaped character. +* Single line comments (`//`) can now be immediately followed by a newline. + +### KQL + +* There's now a _required_ descendant selector (`>>`), instead of using plain + spaces for that purpose. +* The "any sibling" selector is now `++` instead of `~`, for consistency with + the new descendant selector. +* Map operators have been removed entirely. 
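The first Grammar entry in the changelog above (no more `\/` escape) can be checked mechanically against the `escape` production as it stands at this point in the series, `escape := ["\\bfnrt] | 'u{' hex-digit{1, 6} '}'`. The Python sketch below is an illustration only: `is_valid_escape` is a hypothetical helper name, not part of the spec or of any KDL implementation, and it deliberately ignores the whitespace escape that a later patch in this series adds.

```python
import re

# escape := ["\\bfnrt] | 'u{' hex-digit{1, 6} '}'
# Transcribed from the grammar after PATCH 001; the solidus is no longer listed.
_ESCAPE_RE = re.compile(r'["\\bfnrt]|u\{([0-9a-fA-F]{1,6})\}')


def is_valid_escape(after_backslash):
    """Return True if the text that follows a backslash is an accepted escape."""
    match = _ESCAPE_RE.fullmatch(after_backslash)
    if match is None:
        return False
    hex_digits = match.group(1)
    # The escapes table in SPEC.md caps Unicode escapes at U+10FFFF.
    if hex_digits is not None and int(hex_digits, 16) > 0x10FFFF:
        return False
    return True
```

Under these assumptions, `is_valid_escape("n")` holds while `is_valid_escape("/")` does not, which is the behaviour the `no_solidus_escape.kdl` test case added later in the series is meant to exercise.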
From f38edc765d4d35238cbef0991153765ae84f337e Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Kat=20March=C3=A1n?= Date: Sun, 28 Aug 2022 13:28:47 -0700 Subject: [PATCH 006/105] add failing test for removed solidus escape --- tests/test_cases/input/no_solidus_escape.kdl | 1 + 1 file changed, 1 insertion(+) create mode 100644 tests/test_cases/input/no_solidus_escape.kdl diff --git a/tests/test_cases/input/no_solidus_escape.kdl b/tests/test_cases/input/no_solidus_escape.kdl new file mode 100644 index 0000000..5702080 --- /dev/null +++ b/tests/test_cases/input/no_solidus_escape.kdl @@ -0,0 +1 @@ +node "\\" From ffeea8e5aa86edba2ac3fba40ce96755a152d8c6 Mon Sep 17 00:00:00 2001 From: Bram Gotink Date: Tue, 30 Aug 2022 17:11:51 +0200 Subject: [PATCH 007/105] Use forward slash in solidus-escape test (#288) --- tests/test_cases/input/no_solidus_escape.kdl | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/tests/test_cases/input/no_solidus_escape.kdl b/tests/test_cases/input/no_solidus_escape.kdl index 5702080..2dbc2d1 100644 --- a/tests/test_cases/input/no_solidus_escape.kdl +++ b/tests/test_cases/input/no_solidus_escape.kdl @@ -1 +1 @@ -node "\\" +node "\/" From 337bd1bccf2e0fb141e5d4e41ff58eb97d05f892 Mon Sep 17 00:00:00 2001 From: Bram Gotink Date: Tue, 30 Aug 2022 19:44:44 +0200 Subject: [PATCH 008/105] Update expected output of test with changed input (#289) --- tests/test_cases/expected_kdl/all_escapes.kdl | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/tests/test_cases/expected_kdl/all_escapes.kdl b/tests/test_cases/expected_kdl/all_escapes.kdl index c25f434..024cda2 100644 --- a/tests/test_cases/expected_kdl/all_escapes.kdl +++ b/tests/test_cases/expected_kdl/all_escapes.kdl @@ -1 +1 @@ -node "\"\\/\b\f\n\r\t" +node "\"\\\b\f\n\r\t" From 825ff2c17d201688331afc020751b3c9de6de3e4 Mon Sep 17 00:00:00 2001 From: Nathan West Date: Thu, 1 Sep 2022 00:49:01 -0400 Subject: [PATCH 009/105] Add escaped whitespace to KDL strings (#290) * Add escaped 
whitespace to KDL spec * Add test cases for escaped whitespace * Spelling error --- SPEC.md | 33 ++++++++++++++++++- .../expected_kdl/escaped_whitespace.kdl | 1 + tests/test_cases/input/escaped_whitespace.kdl | 15 +++++++++ 3 files changed, 48 insertions(+), 1 deletion(-) create mode 100644 tests/test_cases/expected_kdl/escaped_whitespace.kdl create mode 100644 tests/test_cases/input/escaped_whitespace.kdl diff --git a/SPEC.md b/SPEC.md index 01d7570..cfeac86 100644 --- a/SPEC.md +++ b/SPEC.md @@ -309,6 +309,8 @@ String Value can encompass multiple lines without behaving like a Newline for Strings _MUST_ be represented as UTF-8 values. +#### Escapes + In addition to literal code points, a number of "escapes" are supported. "Escapes" are the character `\` followed by another character, and are interpreted as described in the following table: @@ -323,6 +325,35 @@ interpreted as described in the following table: | Backspace | `\b` | `U+0008` | | Form Feed | `\f` | `U+000C` | | Unicode Escape | `\u{(1-6 hex chars)}` | Code point described by hex characters, up to `10FFFF` | +| Whitespace Escape | See below | N/A | + +##### Escaped Whitespace + +In addition to escaping individual characters, `\` can also escape whitespace. +When a `\` is followed by one or more literal whitespace characters, the `\` +and all of that whitespace are discarded. For example, `"Hello World"` and +`"Hello \ World"` are semantically identical. See [whitespace](#whitespace) +and [newlines](#newlines) for how whitespace is defined. + +Note that only literal whitespace is escaped; *escaped* whitespace is retained. +For example, these strings are all semantically identical: + +```kdl +"Hello\ \nWorld" + + "Hello\n\ + World" + +"Hello\nWorld" + +"Hello +World" +``` + +##### Invalid escapes + +Except as described in the escapes table, above, `\` *MUST NOT* precede any +other characters in a string. 
### Raw String @@ -460,7 +491,7 @@ type := '(' identifier ')' string := raw-string | escaped-string escaped-string := '"' character* '"' character := '\' escape | [^\"] -escape := ["\\bfnrt] | 'u{' hex-digit{1, 6} '}' +escape := ["\\bfnrt] | 'u{' hex-digit{1, 6} '}' | (unicode-space | newline)+ hex-digit := [0-9a-fA-F] raw-string := 'r' raw-string-hash diff --git a/tests/test_cases/expected_kdl/escaped_whitespace.kdl b/tests/test_cases/expected_kdl/escaped_whitespace.kdl new file mode 100644 index 0000000..a97d10a --- /dev/null +++ b/tests/test_cases/expected_kdl/escaped_whitespace.kdl @@ -0,0 +1 @@ +node "Hello\n\tWorld" "Hello\n\tWorld" "Hello\n\tWorld" "Hello\n\tWorld" "Hello\n\tWorld" "Hello\n\tWorld" diff --git a/tests/test_cases/input/escaped_whitespace.kdl b/tests/test_cases/input/escaped_whitespace.kdl new file mode 100644 index 0000000..1f2e67c --- /dev/null +++ b/tests/test_cases/input/escaped_whitespace.kdl @@ -0,0 +1,15 @@ +// All of these strings are the same +node \ + "Hello\n\tWorld" \ + "Hello + World" \ + "Hello\n\ \tWorld" \ + "Hello\n\ + \tWorld" \ + "Hello +\ \tWorld" \ + "Hello\n\t\ + World" + +// Note that this file deliberately mixes space and newline indentation for +// test purposes From 0a4a14d87a4f87fb3fb424d23e021aa6df17d346 Mon Sep 17 00:00:00 2001 From: Hannah Kolbeck Date: Thu, 1 Sep 2022 13:05:53 -0700 Subject: [PATCH 010/105] Add escaped whitespace note to v2 changelog (#291) * Add escaped whitespace note to v2 changelog * Make changelog note on escaping whitespace more detailed --- CHANGELOG.md | 1 + 1 file changed, 1 insertion(+) diff --git a/CHANGELOG.md b/CHANGELOG.md index 6f9ded7..aa97ac5 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -6,6 +6,7 @@ * Solidus/Forward slash (`/`) is no longer an escaped character. * Single line comments (`//`) can now be immediately followed by a newline. +* All literal whitespace following a `\` in a string is now discarded. 
### KQL From d437cf228b62cf91263b81af590f823ed46ef5c3 Mon Sep 17 00:00:00 2001 From: Bram Gotink Date: Fri, 2 Sep 2022 16:37:10 +0200 Subject: [PATCH 011/105] Add test for empty single-line comment (#292) --- tests/test_cases/expected_kdl/empty_line_comment.kdl | 1 + tests/test_cases/input/empty_line_comment.kdl | 2 ++ 2 files changed, 3 insertions(+) create mode 100644 tests/test_cases/expected_kdl/empty_line_comment.kdl create mode 100644 tests/test_cases/input/empty_line_comment.kdl diff --git a/tests/test_cases/expected_kdl/empty_line_comment.kdl b/tests/test_cases/expected_kdl/empty_line_comment.kdl new file mode 100644 index 0000000..64f5a0a --- /dev/null +++ b/tests/test_cases/expected_kdl/empty_line_comment.kdl @@ -0,0 +1 @@ +node diff --git a/tests/test_cases/input/empty_line_comment.kdl b/tests/test_cases/input/empty_line_comment.kdl new file mode 100644 index 0000000..e62ef84 --- /dev/null +++ b/tests/test_cases/input/empty_line_comment.kdl @@ -0,0 +1,2 @@ +// +node \ No newline at end of file From 06d1d67359e1be070050922cd202f053039e6171 Mon Sep 17 00:00:00 2001 From: Lars Willighagen Date: Sun, 9 Oct 2022 21:04:10 +0200 Subject: [PATCH 012/105] Add draft grammar for KQL 1.0.0 (#303) * Add draft grammar for KQL 1.0.0 * Change whitespace in KQL grammar * Update KQL grammar to use new operators --- QUERY-SPEC.md | 18 ++++++++++++++++++ 1 file changed, 18 insertions(+) diff --git a/QUERY-SPEC.md b/QUERY-SPEC.md index 7ab1b46..bf918e7 100644 --- a/QUERY-SPEC.md +++ b/QUERY-SPEC.md @@ -101,3 +101,21 @@ Then the following queries are valid: * `dependencies > []` * -> fetches all direct-child nodes of any `dependencies` nodes in the document. In this case, it will fetch both `miette` and `winapi` nodes. + +## Full Grammar + +For rules that are not defined in this grammar, see [the KDL grammar](https://github.com/kdl-org/kdl/blob/main/SPEC.md#full-grammar). 
+ +``` +query := selector q-ws* "||" q-ws* query | selector +selector := filter q-ws* selector-operator q-ws* selector | filter +selector-operator := ">>" | ">" | "++" | "+" +filter := matcher+ +matcher := "top()"| "()" | identifier | type | accessor-matcher +accessor-matcher := "[" (comparison | accessor)? "]" +comparison := accessor q-ws* matcher-operator q-ws* (type | string | number | keyword) +accessor := "val(" number ")" | "prop(" identifier ")" | "name()" | "tag()" | "values()" | "props()" | identifier +matcher-operator := "=" | "!=" | ">" | "<" | ">=" | "<=" | "^=" | "$=" | "*=" + +q-ws := bom | unicode-space +``` From 3b39e29feecabe80af70a765e596673ce761cb29 Mon Sep 17 00:00:00 2001 From: Tab Atkins-Bittner Date: Fri, 6 Oct 2023 14:13:43 -0700 Subject: [PATCH 013/105] Add vertical tab to whitespace. Closes #331 --- SPEC.md | 1 + 1 file changed, 1 insertion(+) diff --git a/SPEC.md b/SPEC.md index cfeac86..55a4d1a 100644 --- a/SPEC.md +++ b/SPEC.md @@ -427,6 +427,7 @@ space](https://www.unicode.org/Public/UCD/latest/ucd/PropList.txt): | Name | Code Pt | |----------------------|---------| | Character Tabulation | `U+0009` | +| Line Tabulation | `U+000B` | | Space | `U+0020` | | No-Break Space | `U+00A0` | | Ogham Space Mark | `U+1680` | From 568c096465693d3a4bd58e853bdb9c335b135f63 Mon Sep 17 00:00:00 2001 From: Tab Atkins-Bittner Date: Fri, 6 Oct 2023 14:30:18 -0700 Subject: [PATCH 014/105] Document the vertical tab addition. --- CHANGELOG.md | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/CHANGELOG.md b/CHANGELOG.md index aa97ac5..a3bc032 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -6,7 +6,8 @@ * Solidus/Forward slash (`/`) is no longer an escaped character. * Single line comments (`//`) can now be immediately followed by a newline. -* All literal whitespace following a `\` in a string is now discarded. +* All literal whitespace following a `\` in a string is now discarded. 
+* Vertical tabs (`U+000B`) are now considered to be whitespace. ### KQL From 0836df1c192e9586bb6b54795ebd69cbeb127715 Mon Sep 17 00:00:00 2001 From: Tab Atkins-Bittner Date: Fri, 6 Oct 2023 14:32:01 -0700 Subject: [PATCH 015/105] Restrict idents from looking like raw strings. Closes #200, closes #204, closes #241 --- CHANGELOG.md | 1 + SPEC.md | 5 ++++- 2 files changed, 5 insertions(+), 1 deletion(-) diff --git a/CHANGELOG.md b/CHANGELOG.md index a3bc032..cd30307 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -8,6 +8,7 @@ * Single line comments (`//`) can now be immediately followed by a newline. * All literal whitespace following a `\` in a string is now discarded. * Vertical tabs (`U+000B`) are now considered to be whitespace. +* Identifiers can't start with `r#`, so they're easy to distinguish from raw strings. (They already similarly can't start with a digit, or a sign+digit, so they're easy to distinguish from numbers.) ### KQL diff --git a/SPEC.md b/SPEC.md index 55a4d1a..cbd90c7 100644 --- a/SPEC.md +++ b/SPEC.md @@ -482,7 +482,10 @@ node-space := ws* escline ws* | ws+ node-terminator := single-line-comment | newline | ';' | eof identifier := string | bare-identifier -bare-identifier := ((identifier-char - digit - sign) identifier-char* | sign ((identifier-char - digit) identifier-char*)?) - keyword +bare-identifier := (unambiguous-ident | numberish-ident | stringish-ident) - keyword +unambiguous-ident := (identifier-char - digit - sign - "r") identifier-char* +numberish-ident := sign ((identifier-char - digit) identifier-char*)? +stringish-ident := "r" ((identifier-char - "#") identifier-char*)? 
identifier-char := unicode - linespace - [\/(){}<>;[]=,"] keyword := boolean | 'null' prop := identifier '=' value From eb55930264a347d4656aae9d1aa82cc7dc1cfd7f Mon Sep 17 00:00:00 2001 From: Christopher Durham Date: Sun, 10 Dec 2023 20:44:55 -0500 Subject: [PATCH 016/105] Update formal grammar for KDL 2.0 (#285) MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Fixes: https://github.com/kdl-org/kdl/issues/284 * Update formal grammar * Update SPEC.md for KDL 2.0 preview * Update SPEC.md Co-authored-by: Christopher Durham --------- Co-authored-by: Tab Atkins Jr Co-authored-by: Kat Marchán --- SPEC.md | 22 ++++++++++++++-------- 1 file changed, 14 insertions(+), 8 deletions(-) diff --git a/SPEC.md b/SPEC.md index cbd90c7..b06c311 100644 --- a/SPEC.md +++ b/SPEC.md @@ -473,12 +473,20 @@ Note that for the purpose of new lines, CRLF is considered _a single newline_. ## Full Grammar ``` -nodes := linespace* (node nodes?)? linespace* +nodes := (line-space* node)* line-space* -node := ('/-' node-space*)? type? identifier (node-space+ node-prop-or-arg)* (node-space* node-children ws*)? node-space* node-terminator -node-prop-or-arg := ('/-' node-space*)? (prop | value) -node-children := ('/-' node-space*)? '{' nodes '}' -node-space := ws* escline ws* | ws+ +plain-line-space := newline | ws | single-line-comment +plain-node-space := ws* escline ws* | ws+ + +line-space := plain-line-space+ ('/-' plain-node-space* node)? +node-space := plain-node-space+ ('/-' plain-node-space* (node-prop-or-arg | node-children))? + +required-node-space := node-space* plain-node-space+ +optional-node-space := node-space* + +node := type? identifier (required-node-space node-prop-or-arg)* (required-node-space node-children)? 
optional-node-space node-terminator +node-prop-or-arg := prop | value +node-children := '{' nodes '}' node-terminator := single-line-comment | newline | ';' | eof identifier := string | bare-identifier @@ -486,7 +494,7 @@ bare-identifier := (unambiguous-ident | numberish-ident | stringish-ident) - key unambiguous-ident := (identifier-char - digit - sign - "r") identifier-char* numberish-ident := sign ((identifier-char - digit) identifier-char*)? stringish-ident := "r" ((identifier-char - "#") identifier-char*)? -identifier-char := unicode - linespace - [\/(){}<>;[]=,"] +identifier-char := unicode - line-space - [\/(){}<>;[]=,"] keyword := boolean | 'null' prop := identifier '=' value value := type? (string | number | keyword) @@ -518,8 +526,6 @@ boolean := 'true' | 'false' escline := '\\' ws* (single-line-comment | newline) -linespace := newline | ws | single-line-comment - newline := See Table (All line-break white_space) ws := bom | unicode-space | multi-line-comment From 99abeef6d3b86615161bd6f2de68b06789136ece Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Kat=20March=C3=A1n?= Date: Tue, 12 Dec 2023 20:20:15 -0800 Subject: [PATCH 017/105] fix some confusion in grammar syntax, and actually specify the syntax itself (#351) Fixes: https://github.com/kdl-org/kdl/issues/345 --- CHANGELOG.md | 1 + SPEC.md | 30 ++++++++++++++++++++++++++++-- 2 files changed, 29 insertions(+), 2 deletions(-) diff --git a/CHANGELOG.md b/CHANGELOG.md index cd30307..0e75f51 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -9,6 +9,7 @@ * All literal whitespace following a `\` in a string is now discarded. * Vertical tabs (`U+000B`) are now considered to be whitespace. * Identifiers can't start with `r#`, so they're easy to distinguish from raw strings. (They already similarly can't start with a digit, or a sign+digit, so they're easy to distinguish from numbers.) 
+* The grammar syntax itself has been described, and some confusing definitions in the grammar have been fixed accordingly (mostly related to escaped characters). ### KQL diff --git a/SPEC.md b/SPEC.md index b06c311..3b5a782 100644 --- a/SPEC.md +++ b/SPEC.md @@ -98,7 +98,7 @@ codepoints other than [non-identifier characters](#non-identifier-characters), so long as this doesn't produce something confusable for a [Number](#number), [Boolean](#boolean), or [Null](#null). For example, both a [Number](#number) and an Identifier can start with `-`, but when an Identifier starts with `-` -the second character cannot be a digit. This is precicely specified in the +the second character cannot be a digit. This is precicely specified in the [Full Grammar](#full-grammar) below. Identifiers are terminated by [Whitespace](#whitespace) or @@ -472,6 +472,10 @@ Note that for the purpose of new lines, CRLF is considered _a single newline_. ## Full Grammar +This is the full official grammar for KDL and should be considered +authoritative if something seems to disagree with the text above. The [grammar +language syntax](#grammar-language) is defined below. + ``` nodes := (line-space* node)* line-space* @@ -494,7 +498,7 @@ bare-identifier := (unambiguous-ident | numberish-ident | stringish-ident) - key unambiguous-ident := (identifier-char - digit - sign - "r") identifier-char* numberish-ident := sign ((identifier-char - digit) identifier-char*)? stringish-ident := "r" ((identifier-char - "#") identifier-char*)? -identifier-char := unicode - line-space - [\/(){}<>;[]=,"] +identifier-char := unicode - line-space - [\\/(){}<>;\[\]=,"] keyword := boolean | 'null' prop := identifier '=' value value := type? 
(string | number | keyword) @@ -538,3 +542,25 @@ single-line-comment := '//' ^newline* (newline | eof) multi-line-comment := '/*' commented-block commented-block := '*/' | (multi-line-comment | '*' | '/' | [^*/]+) commented-block ``` + +### Grammar language + +The grammar language syntax is a combination of ABNF with some regex spice thrown in. +Specifically: + +* Single quotes (`'`) are used to denote literal text. `\` within a literal + string is used for escaping other single-quotes, for initiating unicode + characters using hex values (`\u{FEFF}`), and for escaping `\` itself + (`\\`). +* `*` is used for "zero or more", `+` is used for "one or more", and `?` is + used for "zero or one". +* `()` can be used to group matches that must be matched together. +* `a | b` means `a or b`, whichever matches first. If multipe items are before + a `|`, they are a single group. `a b c | d` is equivalent to `(a b c) | d`. +* `[]` are used for regex-style character matches, where any character between + the brackets will be a single match. `\` is used to escape `\`, `[`, and + `]`. They also support character ranges (`0-9`), and negation (`^`) +* `-` is used for "except for" or "minus" whatever follows it. For example, `a + - `'x'` means "any `a`, except something that matches the literal `'x'`". +* The prefix `^` means "something that does not match" whatever follows it. + For example, `^foo` means "must not match `foo`". 
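The `[]` bullet of the "Grammar language" section above (backslash escapes for `\`, `[` and `]`, ranges such as `0-9`, and `^` negation) can be read as a small parsing routine. The Python sketch below is one possible reading, offered as an illustration only; `parse_char_class` is a hypothetical name, and the handling of `-` as a range marker only when it sits between two items is an assumption the spec text does not spell out.

```python
def parse_char_class(src):
    """Parse a bracketed class such as '[0-9a-fA-F]' into (negated, members)."""
    if len(src) < 2 or src[0] != "[" or src[-1] != "]":
        raise ValueError("expected a class of the form [...]")
    body = src[1:-1]
    negated = body.startswith("^")
    if negated:
        body = body[1:]

    # Step 1: resolve backslash escapes into (char, was_escaped) pairs.
    items = []
    i = 0
    while i < len(body):
        ch = body[i]
        if ch == "\\":
            if i + 1 >= len(body) or body[i + 1] not in "\\[]":
                raise ValueError("backslash may only escape \\, [ and ]")
            items.append((body[i + 1], True))
            i += 2
        else:
            items.append((ch, False))
            i += 1

    # Step 2: expand lo-hi ranges; an unescaped '-' between two items is a range.
    members = set()
    j = 0
    while j < len(items):
        ch = items[j][0]
        if j + 2 < len(items) and items[j + 1] == ("-", False):
            hi = items[j + 2][0]
            members.update(chr(c) for c in range(ord(ch), ord(hi) + 1))
            j += 3
        else:
            members.add(ch)
            j += 1
    return negated, members
```

With this reading, the corrected class `[\\/(){}<>;\[\]=,"]` from the hunk above parses to the fourteen characters it is meant to exclude, whereas the earlier spelling `[\/(){}<>;[]=,"]` is rejected at `\/`, which is consistent with the changelog's remark about escaped characters.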
From e6356d5a03416cd8f504b61024100d6b5b7896c5 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Kat=20March=C3=A1n?= Date: Tue, 12 Dec 2023 20:27:37 -0800 Subject: [PATCH 018/105] =?UTF-8?q?allow=20,<>=20as=20identifier=20charact?= =?UTF-8?q?ers=20since=20they=20no=20longer=20need=20to=20be=20re=E2=80=A6?= =?UTF-8?q?=20(#352)?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit * fix some confusion in grammar syntax, and actually specify the syntax itself Fixes: https://github.com/kdl-org/kdl/issues/345 * allow ,<> as identifier characters since they no longer need to be reserved * fix typo * disallow more code points and outright ban certain ones from KDL documents altogether (#353) Fixes: https://github.com/kdl-org/kdl/issues/250 * `r` prefix is no longer required for raw strings (#354) Fixes: https://github.com/kdl-org/kdl/issues/337 --- CHANGELOG.md | 23 +++++++++++++++-- SPEC.md | 70 +++++++++++++++++++++++++++++++++++----------------- 2 files changed, 69 insertions(+), 24 deletions(-) diff --git a/CHANGELOG.md b/CHANGELOG.md index 0e75f51..bc2c41e 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -8,8 +8,27 @@ * Single line comments (`//`) can now be immediately followed by a newline. * All literal whitespace following a `\` in a string is now discarded. * Vertical tabs (`U+000B`) are now considered to be whitespace. -* Identifiers can't start with `r#`, so they're easy to distinguish from raw strings. (They already similarly can't start with a digit, or a sign+digit, so they're easy to distinguish from numbers.) -* The grammar syntax itself has been described, and some confusing definitions in the grammar have been fixed accordingly (mostly related to escaped characters). +* Identifiers can't start with `r#`, so they're easy to distinguish from raw + strings. (They already similarly can't start with a digit, or a sign+digit, + so they're easy to distinguish from numbers.) 
+* The grammar syntax itself has been described, and some confusing definitions + in the grammar have been fixed accordingly (mostly related to escaped + characters). +* `,`, `<`, and `>` are now legal identifier characters. They were previously + reserved for KQL but this is no longer necessary. +* Code points under `0x20`, code points above `0x10FFFF`, Delete control + character (`0x7F`), and the [unicode "direction control" + characters](https://www.w3.org/International/questions/qa-bidi-unicode-controls) + are now completely banned from appearing literally in KDL documents. They + can now only be represented in regular strings, and there's no facilities to + represent them in raw strings. This should be considered a security + improvement. +* Raw strings no longer require an `r` prefix: they are now specified by using + `#""#`. +* `#` is an illegal initial identifier character, but is allowed in other + places in identifiers. +* Line continuations can be followed by an EOF now, instead of requiring a + newline (or comment). `node \` is now a legal KDL document. ### KQL diff --git a/SPEC.md b/SPEC.md index 3b5a782..9480301 100644 --- a/SPEC.md +++ b/SPEC.md @@ -94,7 +94,7 @@ foo 1 key="val" 3 { A bare Identifier is composed of any Unicode codepoint other than [non-initial characters](#non-initial-characters), followed by any number of Unicode -codepoints other than [non-identifier characters](#non-identifier-characters), +code points other than [non-identifier characters](#non-identifier-characters), so long as this doesn't produce something confusable for a [Number](#number), [Boolean](#boolean), or [Null](#null). For example, both a [Number](#number) and an Identifier can start with `-`, but when an Identifier starts with `-` @@ -122,9 +122,9 @@ of having an identifier look like a negative number. The following characters cannot be used anywhere in a bare [Identifier](#identifier): -* Any codepoint with hexadecimal value `0x20` or below. 
-* Any codepoint with hexadecimal value higher than `0x10FFFF`. -* Any of `\/(){}<>;[]=,"` +* Any of `\/(){};[]="` +* Any [disallowed literal code points](#disallowed-literal-code-points) in KDL + documents. ### Line Continuation @@ -137,6 +137,7 @@ characters and an optional single-line comment. It must be terminated by a Following a line continuation, processing of a Node can continue as usual. #### Example + ```kdl my-node 1 2 \ // comments are ok after \ 3 4 // This is the actual end of the Node. @@ -309,6 +310,10 @@ String Value can encompass multiple lines without behaving like a Newline for Strings _MUST_ be represented as UTF-8 values. +Strings _MUST NOT_ include the code points for [disallowed literal +code points](#disallowed-literal-code-points) directly. If needed, they can be +specified with their corresponding `\u{}` escape. + #### Escapes In addition to literal code points, a number of "escapes" are supported. @@ -362,17 +367,27 @@ support `\`-escapes. They otherwise share the same properties as far as literal [Newline](#newline) characters go, and the requirement of UTF-8 representation. -Raw String literals are represented as `r`, followed by zero or more `#` -characters, followed by `"`, followed by any number of UTF-8 literals. The string is then -closed by a `"` followed by a _matching_ number of `#` characters. This means -that the string sequence `"` or `"#` and such must not match the closing `"` -with the same or more `#` characters as the opening `r`. +Raw String literals are represented with one or more `#` characters, followed +by `"`, followed by any number of UTF-8 literals. The string is then closed by +a `"` followed by a _matching_ number of `#` characters. This means that the +string sequence `"` or `"#` and such must not match the closing `"` with the +same or more `#` characters as the opening `#`, in the body of the string. 
+ +Like Strings, Raw Strings _MUST NOT_ include any of the [disallowed literal +code-points](#disallowed-literal-code-points) as code points in their body. +Unlike with Strings, these cannot simply be escaped, and are thus +unrepresentable when using Raw Strings. + +Like Strings, Raw Strings _MUST NOT_ include any of the [disallowed literal +code-points](#disallowed-literal-code-points) as code points in their body. +Unlike with Strings, these cannot simply be escaped, and are thus +unrepresentable when using Raw Strings. #### Example ```kdl -just-escapes r"\n will be literal" -quotes-and-escapes r#"hello\n\r\asd"world"# +just-escapes #"\n will be literal"# +quotes-and-escapes ##"hello\n\r\asd"#world"## ``` ### Number @@ -470,6 +485,16 @@ lines](https://www.unicode.org/versions/Unicode13.0.0/ch05.pdf): Note that for the purpose of new lines, CRLF is considered _a single newline_. +### Disallowed Literal Code Points + +The following code points may not appear literally anywhere in the document. +They may be represented in Strings (but not Raw Strings) using `\u{}`. + +* Any codepoint with hexadecimal value `0x20` or below (various control characters). +* `0x7F` (the Delete control character). +* Any codepoint with hexadecimal value higher than `0x10FFFF`. 
+* `0x2066-2069` and `0x202A-202E`, the [unicode "direction control" characters](https://www.w3.org/International/questions/qa-bidi-unicode-controls) + ## Full Grammar This is the full official grammar for KDL and should be considered @@ -494,25 +519,24 @@ node-children := '{' nodes '}' node-terminator := single-line-comment | newline | ';' | eof identifier := string | bare-identifier -bare-identifier := (unambiguous-ident | numberish-ident | stringish-ident) - keyword -unambiguous-ident := (identifier-char - digit - sign - "r") identifier-char* +bare-identifier := (unambiguous-ident | numberish-ident) - keyword +unambiguous-ident := (identifier-char - digit - sign - "#") identifier-char* numberish-ident := sign ((identifier-char - digit) identifier-char*)? -stringish-ident := "r" ((identifier-char - "#") identifier-char*)? -identifier-char := unicode - line-space - [\\/(){}<>;\[\]=,"] +identifier-char := unicode - line-space - [\\/(){};\[\]="] - disallowed-literal-code-points + keyword := boolean | 'null' -prop := identifier '=' value +prop := identifier '=' valuel value := type? (string | number | keyword) type := '(' identifier ')' string := raw-string | escaped-string -escaped-string := '"' character* '"' -character := '\' escape | [^\"] +escaped-string := '"' string-character* '"' +string-character := '\' escape | [^\"] - disallowed-literal-code-points escape := ["\\bfnrt] | 'u{' hex-digit{1, 6} '}' | (unicode-space | newline)+ hex-digit := [0-9a-fA-F] -raw-string := 'r' raw-string-hash -raw-string-hash := '#' raw-string-hash '#' | raw-string-quotes -raw-string-quotes := '"' .* '"' +raw-string := '#' raw-string-quotes '#' | '#' raw-string '#' +raw-string-quotes := '"' (unicode - disallowed-literal-code-points) '"' number := decimal | hex | octal | binary @@ -528,7 +552,7 @@ binary := sign? 
'0b' ('0' | '1') ('0' | '1' | '_')* boolean := 'true' | 'false' -escline := '\\' ws* (single-line-comment | newline) +escline := '\\' ws* (single-line-comment | newline | eof) newline := See Table (All line-break white_space) @@ -536,6 +560,8 @@ ws := bom | unicode-space | multi-line-comment bom := '\u{FEFF}' +disallowed-literal-code-points := See Table (Disallowed Literal Code Points) + unicode-space := See Table (All White_Space unicode characters which are not `newline`) single-line-comment := '//' ^newline* (newline | eof) From 85aa3a09abd618c2bcbc2843771e40e262db7db7 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Kat=20March=C3=A1n?= Date: Tue, 12 Dec 2023 21:03:30 -0800 Subject: [PATCH 019/105] treat bare identifiers and strings in value locations (#358) Fixes: https://github.com/kdl-org/kdl/issues/339 --- CHANGELOG.md | 12 +++++++++ SPEC.md | 75 ++++++++++++++++++++++++++++++---------------------- 2 files changed, 55 insertions(+), 32 deletions(-) diff --git a/CHANGELOG.md b/CHANGELOG.md index bc2c41e..36fbe9a 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -29,6 +29,18 @@ places in identifiers. * Line continuations can be followed by an EOF now, instead of requiring a newline (or comment). `node \` is now a legal KDL document. +* `#` is no longer a legal identifier character. +* `null`, `true`, and `false` are now `#null`, `#true`, and `#false`. Using + the unprefixed versions of these values is a syntax error. +* The spec prose has more explicitly stated that whitespace and newlines are + not valid identifier characters, even though the grammar already expressed + this. +* Bare identifiers can now be used as values in Arguments and Properties, and are interpreted as string values. +* The spec prose now more explicitly states that strings and raw strings can + be used as type annotations. +* A statement in the spec prose that said "It is reasonable for an + implementation to ignore null values altogether when deserializing". 
This is + no longer encouraged or desired. ### KQL diff --git a/SPEC.md b/SPEC.md index 9480301..3457aaa 100644 --- a/SPEC.md +++ b/SPEC.md @@ -93,17 +93,27 @@ foo 1 key="val" 3 { ### Identifier A bare Identifier is composed of any Unicode codepoint other than [non-initial -characters](#non-initial-characters), followed by any number of Unicode -code points other than [non-identifier characters](#non-identifier-characters), -so long as this doesn't produce something confusable for a [Number](#number), -[Boolean](#boolean), or [Null](#null). For example, both a [Number](#number) -and an Identifier can start with `-`, but when an Identifier starts with `-` -the second character cannot be a digit. This is precicely specified in the -[Full Grammar](#full-grammar) below. +characters](#non-initial-characters), followed by any number of Unicode code +points other than [non-identifier characters](#non-identifier-characters), so +long as this doesn't produce something confusable for a [Number](#number). For +example, both a [Number](#number) and an Identifier can start with `-`, but +when an Identifier starts with `-` the second character cannot be a digit. +This is precicely specified in the [Full Grammar](#full-grammar) below. + +When Identifiers are used as the values in [Arguments](#argument) and +[Properties](#property), they are treated as strings, just like they are with +node names and property keys. Identifiers are terminated by [Whitespace](#whitespace) or [Newlines](#newline). +In all places where Identifiers are used, [Strings](#string) and [Raw +Strings](#raw-string) can be used in the same place, without an Identifier's +character restrictions. + +The literal identifiers `true`, `false`, and `null` are illegal identifiers, +and _MUST_ be treated as a syntax error. 
+ ### Non-initial characters The following characters cannot be the first character in a bare @@ -112,17 +122,18 @@ The following characters cannot be the first character in a bare * Any decimal digit (0-9) * Any [non-identifier characters](#non-identifier-characters) -Be aware that the `-` character can only be used as an initial -character if the second character is not a digit. This allows -identifiers to look like `--this`, and removes the ambiguity -of having an identifier look like a negative number. +Additionally, the `-` character can only be used as an initial character if +the second character is *not* a digit. This allows identifiers to look like +`--this`, and removes the ambiguity of having an identifier look like a +negative number. ### Non-identifier characters The following characters cannot be used anywhere in a bare [Identifier](#identifier): -* Any of `\/(){};[]="` +* Any of `(){}[]/\="#;` +* Any [Whitespace](#whitespace) or [Newline](#newline). * Any [disallowed literal code points](#disallowed-literal-code-points) in KDL documents. @@ -180,7 +191,7 @@ make it act as plain whitespace, even if it spreads across multiple lines. #### Example ```kdl -my-node 1 2 3 "a" "b" "c" +my-node 1 2 3 a b c ``` ### Children Block @@ -205,8 +216,9 @@ parent { child1; child2; } ### Value -A value is either: a [String](#string), a [Raw String](#raw-string), a -[Number](#number), a [Boolean](#boolean), or [Null](#null) +A value is either: an [Identifier](#identifier), a [String](#string), a [Raw +String](#raw-string), a [Number](#number), a [Boolean](#boolean), or +[Null](#null) Values _MUST_ be either [Arguments](#argument) or values of [Properties](#property). @@ -221,9 +233,9 @@ or as a _context-specific elaboration_ of the more generic type the node name indicates. Type annotations are written as a set of `(` and `)` with a single -[Identifier](#identifier) in it. Any valid identifier is considered a valid -type annotation. 
There must be no whitespace between a type annotation and its -associated Node Name or Value. +[Identifier](#identifier) in it. Any valid identifier or string is considered +a valid type annotation. There must be no whitespace between a type annotation +and its associated Node Name or Value. KDL does not specify any restrictions on what implementations might do with these annotations. They are free to ignore them, or use them to make decisions @@ -295,7 +307,7 @@ IEEE 754-2008 decimal floating point numbers ```kdl node (u8)123 -node prop=(regex)".*" +node prop=(regex).* (published)date "1970-01-01" (contributor)person name="Foo McBar" ``` @@ -411,27 +423,26 @@ There are four syntaxes for Numbers: Decimal, Hexadecimal, Octal, and Binary. ### Boolean -A boolean [Value](#value) is either the symbol `true` or `false`. These +A boolean [Value](#value) is either the symbol `#true` or `#false`. These _SHOULD_ be represented by implementation as boolean logical values, or some approximation thereof. #### Example ```kdl -my-node true value=false +my-node true value=#false ``` ### Null -The symbol `null` represents a null [Value](#value). It's up to the +The symbol `#null` represents a null [Value](#value). It's up to the implementation to decide how to represent this, but it generally signals the -"absence" of a value. It is reasonable for an implementation to ignore null -values altogether when deserializing. +"absence" of a value. 
#### Example ```kdl -my-node null key=null +my-node #null key=#null ``` ### Whitespace @@ -519,19 +530,19 @@ node-children := '{' nodes '}' node-terminator := single-line-comment | newline | ';' | eof identifier := string | bare-identifier -bare-identifier := (unambiguous-ident | numberish-ident) - keyword -unambiguous-ident := (identifier-char - digit - sign - "#") identifier-char* +bare-identifier := (unambiguous-ident - boolean - 'null') | numberish-ident +unambiguous-ident := (identifier-char - digit - sign) identifier-char* numberish-ident := sign ((identifier-char - digit) identifier-char*)? -identifier-char := unicode - line-space - [\\/(){};\[\]="] - disallowed-literal-code-points +identifier-char := unicode - line-space - [\\/(){};\[\]="#] - disallowed-literal-code-points -keyword := boolean | 'null' -prop := identifier '=' valuel -value := type? (string | number | keyword) +keyword := '#' (boolean | 'null') +prop := identifier '=' value +value := type? (identifier | string | number | keyword) type := '(' identifier ')' string := raw-string | escaped-string escaped-string := '"' string-character* '"' -string-character := '\' escape | [^\"] - disallowed-literal-code-points +string-character := '\' escape | [^\\"] - disallowed-literal-code-points escape := ["\\bfnrt] | 'u{' hex-digit{1, 6} '}' | (unicode-space | newline)+ hex-digit := [0-9a-fA-F] From 2694146af4fd2fb027c362080302e36923037ffc Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Kat=20March=C3=A1n?= Date: Tue, 12 Dec 2023 21:12:11 -0800 Subject: [PATCH 020/105] # is just plain illegal now --- CHANGELOG.md | 2 -- 1 file changed, 2 deletions(-) diff --git a/CHANGELOG.md b/CHANGELOG.md index 36fbe9a..98e1561 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -25,8 +25,6 @@ improvement. * Raw strings no longer require an `r` prefix: they are now specified by using `#""#`. -* `#` is an illegal initial identifier character, but is allowed in other - places in identifiers. 
* Line continuations can be followed by an EOF now, instead of requiring a newline (or comment). `node \` is now a legal KDL document. * `#` is no longer a legal identifier character. From 5e89c4550abe2f7ccbddb972fd8876573c38bd8a Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Kat=20March=C3=A1n?= Date: Tue, 12 Dec 2023 21:34:02 -0800 Subject: [PATCH 021/105] Update all examples to use most changes --- examples/Cargo.kdl | 4 +- examples/ci.kdl | 32 ++--- examples/kdl-schema.kdl | 310 ++++++++++++++++++++-------------------- examples/nuget.kdl | 154 ++++++++++---------- examples/website.kdl | 22 +-- 5 files changed, 261 insertions(+), 261 deletions(-) diff --git a/examples/Cargo.kdl b/examples/Cargo.kdl index 7b58dba..f3465b4 100644 --- a/examples/Cargo.kdl +++ b/examples/Cargo.kdl @@ -1,9 +1,9 @@ package { - name "kdl" + name kdl version "0.0.0" description "kat's document language" authors "Kat Marchán " - license-file "LICENSE.md" + license-file LICENSE.md edition "2018" } diff --git a/examples/ci.kdl b/examples/ci.kdl index d5443c7..3dccd83 100644 --- a/examples/ci.kdl +++ b/examples/ci.kdl @@ -1,46 +1,46 @@ // This example is a GitHub Action if it used KDL syntax. // See .github/workflows/ci.yml for the file this was based on. 
-name "CI" +name CI -on "push" "pull_request" +on push pull_request env { - RUSTFLAGS "-Dwarnings" + RUSTFLAGS -Dwarnings } jobs { fmt_and_docs "Check fmt & build docs" { - runs-on "ubuntu-latest" + runs-on ubuntu-latest steps { step uses="actions/checkout@v1" step "Install Rust" uses="actions-rs/toolchain@v1" { - profile "minimal" - toolchain "stable" - components "rustfmt" - override true + profile minimal + toolchain stable + components rustfmt + override #true } - step "rustfmt" run="cargo fmt --all -- --check" - step "docs" run="cargo doc --no-deps" + step rustfmt run="cargo fmt --all -- --check" + step docs run="cargo doc --no-deps" } } build_and_test "Build & Test" { runs-on "${{ matrix.os }}" strategy { matrix { - rust "1.46.0" "stable" - os "ubuntu-latest" "macOS-latest" "windows-latest" + rust "1.46.0" stable + os ubuntu-latest macOS-latest windows-latest } } steps { step uses="actions/checkout@v1" step "Install Rust" uses="actions-rs/toolchain@v1" { - profile "minimal" + profile minimal toolchain "${{ matrix.rust }}" - components "clippy" - override true + components clippy + override #true } - step "Clippy" run="cargo clippy --all -- -D warnings" + step Clippy run="cargo clippy --all -- -D warnings" step "Run tests" run="cargo test --all --verbose" } } diff --git a/examples/kdl-schema.kdl b/examples/kdl-schema.kdl index 76a1080..041c464 100644 --- a/examples/kdl-schema.kdl +++ b/examples/kdl-schema.kdl @@ -1,374 +1,374 @@ document { info { - title "KDL Schema" lang="en" - description "KDL Schema KDL schema in KDL" lang="en" + title "KDL Schema" lang=en + description "KDL Schema KDL schema in KDL" lang=en author "Kat Marchán" { - link "https://github.com/zkat" rel="self" + link "https://github.com/zkat" rel=self } contributor "Lars Willighagen" { - link "https://github.com/larsgw" rel="self" + link "https://github.com/larsgw" rel=self } - link "https://github.com/zkat/kdl" rel="documentation" - license "Creative Commons Attribution-ShareAlike 4.0 
International License" spdx="CC-BY-SA-4.0" { - link "https://creativecommons.org/licenses/by-sa/4.0/" lang="en" + link "https://github.com/zkat/kdl" rel=documentation + license "Creative Commons Attribution-ShareAlike 4.0 International License" spdx=CC-BY-SA-4.0 { + link "https://creativecommons.org/licenses/by-sa/4.0/" lang=en } published "2021-08-31" modified "2021-09-01" } - node "document" { + node document { min 1 max 1 - children id="node-children" { - node "node-names" id="node-names-node" description="Validations to apply specifically to arbitrary node names" { - children ref=r#"[id="validations"]"# + children id=node-children { + node node-names id=node-names-node description="Validations to apply specifically to arbitrary node names" { + children ref=#"[id="validations"]"# } - node "other-nodes-allowed" id="other-nodes-allowed-node" description="Whether to allow child nodes other than the ones explicitly listed. Defaults to 'false'." { + node other-nodes-allowed id=other-nodes-allowed-node description="Whether to allow child nodes other than the ones explicitly listed. Defaults to '#false'." { max 1 value { min 1 max 1 - type "boolean" + type boolean } } - node "tag-names" description="Validations to apply specifically to arbitrary type tag names" { - children ref=r#"[id="validations"]"# + node tag-names description="Validations to apply specifically to arbitrary type tag names" { + children ref=#"[id="validations"]"# } - node "other-tags-allowed" description="Whether to allow child node tags other than the ones explicitly listed. Defaults to 'false'." { + node other-tags-allowed description="Whether to allow child node tags other than the ones explicitly listed. Defaults to '#false'." { max 1 value { min 1 max 1 - type "boolean" + type boolean } } - node "info" description="A child node that describes the schema itself." { + node info description="A child node that describes the schema itself." 
{ children { - node "title" description="The title of the schema or the format it describes" { + node title description="The title of the schema or the format it describes" { value description="The title text" { - type "string" + type string min 1 max 1 } - prop "lang" id="info-lang" description="The language of the text" { - type "string" + prop lang id=info-lang description="The language of the text" { + type string } } - node "description" description="A description of the schema or the format it describes" { + node description description="A description of the schema or the format it describes" { value description="The description text" { - type "string" + type string min 1 max 1 } - prop ref=r#"[id="info-lang"]"# + prop ref=#"[id="info-lang"]"# } - node "author" description="Author of the schema" { - value id="info-person-name" description="Person name" { - type "string" + node author description="Author of the schema" { + value id=info-person-name description="Person name" { + type string min 1 max 1 } - prop "orcid" id="info-orcid" description="The ORCID of the person" { - type "string" - pattern r"\d{4}-\d{4}-\d{4}-\d{4}" + prop orcid id=info-orcid description="The ORCID of the person" { + type string + pattern #"\d{4}-\d{4}-\d{4}-\d{4}"# } children { - node ref=r#"[id="info-link"]"# + node ref=#"[id="info-link"]"# } } - node "contributor" description="Contributor to the schema" { - value ref=r#"[id="info-person-name"]"# - prop ref=r#"[id="info-orcid"]"# + node contributor description="Contributor to the schema" { + value ref=#"[id="info-person-name"]"# + prop ref=#"[id="info-orcid"]"# children { - node ref=r#"[id="info-link"]"# + node ref=#"[id="info-link"]"# } } - node "link" id="info-link" description="Links to itself, and to sources describing it" { + node link id=info-link description="Links to itself, and to sources describing it" { value description="A URL that the link points to" { - type "string" - format "url" "irl" + type string + format url irl 
min 1 max 1 } - prop "rel" description="The relation between the current entity and the URL" { - type "string" - enum "self" "documentation" + prop rel description="The relation between the current entity and the URL" { + type string + enum self documentation } - prop ref=r#"[id="info-lang"]"# + prop ref=#"[id="info-lang"]"# } - node "license" description="The license(s) that the schema is licensed under" { + node license description="The license(s) that the schema is licensed under" { value description="Name of the used license" { - type "string" + type string min 1 max 1 } - prop "spdx" description="An SPDX license identifier" { - type "string" + prop spdx description="An SPDX license identifier" { + type string } children { - node ref=r#"[id="info-link"]"# + node ref=#"[id="info-link"]"# } } - node "published" description="When the schema was published" { + node published description="When the schema was published" { value description="Publication date" { - type "string" - format "date" + type string + format date min 1 max 1 } - prop "time" id="info-time" description="A time to accompany the date" { - type "string" - format "time" + prop time id=info-time description="A time to accompany the date" { + type string + format time } } - node "modified" description="When the schema was last modified" { + node modified description="When the schema was last modified" { value description="Modification date" { - type "string" - format "date" + type string + format date min 1 max 1 } - prop ref=r#"[id="info-time"]"# + prop ref=#"[id="info-time"]"# } - node "version" description="The version number of this version of the schema" { + node version description="The version number of this version of the schema" { value description="Semver version number" { - type "string" - pattern r"^(0|[1-9]\d*)\.(0|[1-9]\d*)\.(0|[1-9]\d*)(?:-((?:0|[1-9]\d*|\d*[a-zA-Z-][0-9a-zA-Z-]*)(?:\.(?:0|[1-9]\d*|\d*[a-zA-Z-][0-9a-zA-Z-]*))*))?(?:\+([0-9a-zA-Z-]+(?:\.[0-9a-zA-Z-]+)*))?$" + type string 
+ pattern #"^(0|[1-9]\d*)\.(0|[1-9]\d*)\.(0|[1-9]\d*)(?:-((?:0|[1-9]\d*|\d*[a-zA-Z-][0-9a-zA-Z-]*)(?:\.(?:0|[1-9]\d*|\d*[a-zA-Z-][0-9a-zA-Z-]*))*))?(?:\+([0-9a-zA-Z-]+(?:\.[0-9a-zA-Z-]+)*))?$"# min 1 max 1 } } } } - node "tag" id="tag-node" description="A tag belonging to a child node of `document` or another node." { + node tag id=tag-node description="A tag belonging to a child node of `document` or another node." { value description="The name of the tag. If a tag name is not supplied, the node rules apply to _all_ nodes belonging to the parent." { - type "string" + type string max 1 } - prop "description" description="A description of this node's purpose." { - type "string" + prop description description="A description of this node's purpose." { + type string } - prop "id" description="A globally-unique ID for this node." { - type "string" + prop id description="A globally-unique ID for this node." { + type string } - prop "ref" description="A globally unique reference to another node." { - type "string" - format "kdl-query" + prop ref description="A globally unique reference to another node." { + type string + format kdl-query } children { - node ref=r#"[id="node-names-node"]"# - node ref=r#"[id="other-nodes-allowed-node"]"# - node ref=r#"[id="node-node"]"# + node ref=#"[id="node-names-node"]"# + node ref=#"[id="other-nodes-allowed-node"]"# + node ref=#"[id="node-node"]"# } } - node "node" id="node-node" description="A child node belonging either to `document` or to another `node`. Nodes may be anonymous." { + node node id=node-node description="A child node belonging either to `document` or to another `node`. Nodes may be anonymous." { value description="The name of the node. If a node name is not supplied, the node rules apply to _all_ nodes belonging to the parent." { - type "string" + type string max 1 } - prop "description" description="A description of this node's purpose." 
{ - type "string" + prop description description="A description of this node's purpose." { + type string } - prop "id" description="A globally-unique ID for this node." { - type "string" + prop id description="A globally-unique ID for this node." { + type string } - prop "ref" description="A globally unique reference to another node." { - type "string" - format "kdl-query" + prop ref description="A globally unique reference to another node." { + type string + format kdl-query } children { - node "prop-names" description="Validations to apply specifically to arbitrary property names" { - children ref=r#"[id="validations"]"# + node prop-names description="Validations to apply specifically to arbitrary property names" { + children ref=#"[id="validations"]"# } - node "other-props-allowed" description="Whether to allow properties other than the ones explicitly listed. Defaults to 'false'." { + node other-props-allowed description="Whether to allow properties other than the ones explicitly listed. Defaults to '#false'." { max 1 value { min 1 max 1 - type "boolean" + type boolean } } - node "min" description="minimum number of instances of this node in its parent's children." { + node min description="minimum number of instances of this node in its parent's children." { max 1 value { min 1 max 1 - type "number" + type number } } - node "max" description="maximum number of instances of this node in its parent's children." { + node max description="maximum number of instances of this node in its parent's children." { max 1 value { min 1 max 1 - type "number" + type number } } - node ref=r#"[id="value-tag-node"]"# - node "prop" id="prop-node" description="A node property key/value pair." { + node ref=#"[id="value-tag-node"]"# + node prop id="prop-node" description="A node property key/value pair." { value description="The property key." { - type "string" + type string } - prop "id" description="A globally-unique ID of this property." 
{ - type "string" + prop id description="A globally-unique ID of this property." { + type string } - prop "ref" description="A globally unique reference to another property node." { - type "string" - format "kdl-query" + prop ref description="A globally unique reference to another property node." { + type string + format kdl-query } - prop "description" description="A description of this property's purpose." { - type "string" + prop description description="A description of this property's purpose." { + type string } children description="Property-specific validations." { - node "required" description="Whether this property is required if its parent is present." { + node required description="Whether this property is required if its parent is present." { max 1 value { min 1 max 1 - type "boolean" + type boolean } } } - children id="validations" description="General value validations." { - node "tag" id="value-tag-node" description="The tags associated with this value" { + children id=validations description="General value validations." { + node tag id=value-tag-node description="The tags associated with this value" { max 1 - children ref=r#"[id="validations"]"# + children ref=#"[id="validations"]"# } - node "type" description="The type for this prop's value." { + node type description="The type for this prop's value." { max 1 value { min 1 - type "string" + type string } } - node "enum" description="An enumeration of possible values" { + node enum description="An enumeration of possible values" { max 1 value description="Enumeration choices" { min 1 } } - node "pattern" description="PCRE (Regex) pattern or patterns to test prop values against." { + node pattern description="PCRE (Regex) pattern or patterns to test prop values against." { value { min 1 - type "string" + type string } } - node "min-length" description="Minimum length of prop value, if it's a string." { + node min-length description="Minimum length of prop value, if it's a string." 
{ max 1 value { min 1 - type "number" + type number } } - node "max-length" description="Maximum length of prop value, if it's a string." { + node max-length description="Maximum length of prop value, if it's a string." { max 1 value { min 1 - type "number" + type number } } - node "format" description="Intended data format." { + node format description="Intended data format." { max 1 value { min 1 - type "string" + type string // https://json-schema.org/understanding-json-schema/reference/string.html#format - enum "date-time" "date" "time" "duration" "decimal" "currency" "country-2" "country-3" "country-subdivision" "email" "idn-email" "hostname" "idn-hostname" "ipv4" "ipv6" "url" "url-reference" "irl" "irl-reference" "url-template" "regex" "uuid" "kdl-query" "i8" "i16" "i32" "i64" "u8" "u16" "u32" "u64" "isize" "usize" "f32" "f64" "decimal64" "decimal128" + enum date-time date time duration decimal currency country-2 country-3 country-subdivision email idn-email hostname idn-hostname ipv4 ipv6 url url-reference irl irl-reference url-template regex uuid kdl-query i8 i16 i32 i64 u8 u16 u32 u64 isize usize f32 f64 decimal64 decimal128 } } - node "%" description="Only used for numeric values. Constrains them to be multiples of the given number(s)" { + node % description="Only used for numeric values. Constrains them to be multiples of the given number(s)" { max 1 value { min 1 - type "number" + type number } } - node ">" description="Only used for numeric values. Constrains them to be greater than the given number(s)" { + node > description="Only used for numeric values. Constrains them to be greater than the given number(s)" { max 1 value { min 1 max 1 - type "number" + type number } } - node ">=" description="Only used for numeric values. Constrains them to be greater than or equal to the given number(s)" { + node >= description="Only used for numeric values. 
Constrains them to be greater than or equal to the given number(s)" { max 1 value { min 1 max 1 - type "number" + type number } } - node "<" description="Only used for numeric values. Constrains them to be less than the given number(s)" { + node < description="Only used for numeric values. Constrains them to be less than the given number(s)" { max 1 value { min 1 max 1 - type "number" + type number } } - node "<=" description="Only used for numeric values. Constrains them to be less than or equal to the given number(s)" { + node <= description="Only used for numeric values. Constrains them to be less than or equal to the given number(s)" { max 1 value { min 1 max 1 - type "number" + type number } } } } - node "value" id="value-node" description="one or more direct node values" { - prop "id" description="A globally-unique ID of this value." { - type "string" + node value id=value-node description="one or more direct node values" { + prop id description="A globally-unique ID of this value." { + type string } - prop "ref" description="A globally unique reference to another value node." { - type "string" - format "kdl-query" + prop ref description="A globally unique reference to another value node." { + type string + format kdl-query } - prop "description" description="A description of this property's purpose." { - type "string" + prop description description="A description of this property's purpose." { + type string } - children ref=r#"[id="validations"]"# + children ref=#"[id="validations"]"# children description="Node value-specific validations" { - node "min" description="minimum number of values for this node." { + node min description="minimum number of values for this node." { max 1 value { min 1 max 1 - type "number" + type number } } - node "max" description="maximum number of values for this node." { + node max description="maximum number of values for this node." 
{ max 1 value { min 1 max 1 - type "number" + type number } } } } - node "children" id="children-node" { - prop "id" description="A globally-unique ID of this children node." { - type "string" + node children id=children-node { + prop id description="A globally-unique ID of this children node." { + type string } - prop "ref" description="A globally unique reference to another children node." { - type "string" - format "kdl-query" + prop ref description="A globally unique reference to another children node." { + type string + format kdl-query } - prop "description" description="A description of this these children's purpose." { - type "string" + prop description description="A description of this these children's purpose." { + type string } - children ref=r#"[id="node-children"]"# + children ref=#"[id="node-children"]"# } } } - node "definitions" description="Definitions to reference in parts of the top-level nodes" { + node definitions description="Definitions to reference in parts of the top-level nodes" { children { - node ref=r#"[id="node-node"]"# - node ref=r#"[id="value-node"]"# - node ref=r#"[id="prop-node"]"# - node ref=r#"[id="children-node"]"# - node ref=r#"[id="tag-node"]"# + node ref=#"[id="node-node"]"# + node ref=#"[id="value-node"]"# + node ref=#"[id="prop-node"]"# + node ref=#"[id="children-node"]"# + node ref=#"[id="tag-node"]"# } } } diff --git a/examples/nuget.kdl b/examples/nuget.kdl index 9ab4aa1..0319999 100644 --- a/examples/nuget.kdl +++ b/examples/nuget.kdl @@ -1,48 +1,48 @@ // Based on https://github.com/NuGet/NuGet.Client/blob/dev/src/NuGet.Clients/NuGet.CommandLine/NuGet.CommandLine.csproj Project { PropertyGroup { - IsCommandLinePackage true + IsCommandLinePackage #true } - Import Project=r"$([MSBuild]::GetDirectoryNameOfFileAbove($(MSBuildThisFileDirectory), 'README.md'))\build\common.props" - Import Project="Sdk.props" Sdk="Microsoft.NET.Sdk" - Import Project="ilmerge.props" + Import 
Project=#"$([MSBuild]::GetDirectoryNameOfFileAbove($(MSBuildThisFileDirectory), 'README.md'))\build\common.props"# + Import Project=Sdk.props Sdk=Microsoft.NET.Sdk + Import Project=ilmerge.props PropertyGroup { - RootNamespace "NuGet.CommandLine" - AssemblyName "NuGet" + RootNamespace NuGet.CommandLine + AssemblyName NuGet AssemblyTitle "NuGet Command Line" - PackageId "NuGet.CommandLine" + PackageId NuGet.CommandLine TargetFramework "$(NETFXTargetFramework)" - GenerateDocumentationFile false + GenerateDocumentationFile #false Description "NuGet Command Line Interface." - ApplicationManifest "app.manifest" - Shipping true - OutputType "Exe" - ComVisible false + ApplicationManifest app.manifest + Shipping #true + OutputType Exe + ComVisible #false // Pack properties - PackProject true - IncludeBuildOutput false + PackProject #true + IncludeBuildOutput #false TargetsForTfmSpecificContentInPackage "$(TargetsForTfmSpecificContentInPackage)" "CreateCommandlineNupkg" - SuppressDependenciesWhenPacking true - DevelopmentDependency true - PackageRequireLicenseAcceptance false - UsePublicApiAnalyzer false + SuppressDependenciesWhenPacking #true + DevelopmentDependency #true + PackageRequireLicenseAcceptance #false + UsePublicApiAnalyzer #false } - Target Name="CreateCommandlineNupkg" { + Target Name=CreateCommandlineNupkg { ItemGroup { - TfmSpecificPackageFile Include=r"$(ArtifactsDirectory)$(VsixOutputDirName)\NuGet.exe" { + TfmSpecificPackageFile Include=#"$(ArtifactsDirectory)$(VsixOutputDirName)\NuGet.exe"# { PackagePath "tools/" } - TfmSpecificPackageFile Include=r"$(ArtifactsDirectory)$(VsixOutputDirName)\NuGet.pdb" { + TfmSpecificPackageFile Include=#"$(ArtifactsDirectory)$(VsixOutputDirName)\NuGet.pdb"# { PackagePath "tools/" } } } ItemGroup Condition="$(DefineConstants.Contains(SIGNED_BUILD))" { - AssemblyAttribute Include="System.Runtime.CompilerServices.InternalsVisibleTo" { + AssemblyAttribute Include=System.Runtime.CompilerServices.InternalsVisibleTo { 
_Parameter1 "NuGet.CommandLine.FuncTest, PublicKey=002400000480000094000000060200000024000052534131000400000100010007d1fa57c4aed9f0a32e84aa0faefd0de9e8fd6aec8f87fb03766c834c99921eb23be79ad9d5dcc1dd9ad236132102900b723cf980957fc4e177108fc607774f29e8320e92ea05ece4e821c0a5efe8f1645c4c0c93c1ab99285d622caa652c1dfad63d745d6f2de5f17e5eaf0fc4963d261c8a12436518206dc093344d5ad293" } AssemblyAttribute Include="System.Runtime.CompilerServices.InternalsVisibleTo" { @@ -51,81 +51,81 @@ Project { } ItemGroup Condition="!$(DefineConstants.Contains(SIGNED_BUILD))" { - AssemblyAttribute Include="System.Runtime.CompilerServices.InternalsVisibleTo" { - _Parameter1 "NuGet.CommandLine.FuncTest" + AssemblyAttribute Include=System.Runtime.CompilerServices.InternalsVisibleTo { + _Parameter1 NuGet.CommandLine.FuncTest } - AssemblyAttribute Include="System.Runtime.CompilerServices.InternalsVisibleTo" { - _Parameter1 "NuGet.CommandLine.Test" + AssemblyAttribute Include=System.Runtime.CompilerServices.InternalsVisibleTo { + _Parameter1 NuGet.CommandLine.Test } } ItemGroup Condition="$(DefineConstants.Contains(SIGNED_BUILD))" { - AssemblyAttribute Include="System.Runtime.CompilerServices.InternalsVisibleTo" { + AssemblyAttribute Include=System.Runtime.CompilerServices.InternalsVisibleTo { _Parameter1 "NuGet.CommandLine.Test, PublicKey=002400000480000094000000060200000024000052534131000400000100010007d1fa57c4aed9f0a32e84aa0faefd0de9e8fd6aec8f87fb03766c834c99921eb23be79ad9d5dcc1dd9ad236132102900b723cf980957fc4e177108fc607774f29e8320e92ea05ece4e821c0a5efe8f1645c4c0c93c1ab99285d622caa652c1dfad63d745d6f2de5f17e5eaf0fc4963d261c8a12436518206dc093344d5ad293" } } ItemGroup Condition="!$(DefineConstants.Contains(SIGNED_BUILD))" { - AssemblyAttribute Include="System.Runtime.CompilerServices.InternalsVisibleTo" { - _Parameter1 "NuGet.CommandLine.Test" + AssemblyAttribute Include=System.Runtime.CompilerServices.InternalsVisibleTo { + _Parameter1 NuGet.CommandLine.Test } } ItemGroup { - Reference 
Include="Microsoft.Build.Utilities.v4.0" - Reference Include="Microsoft.CSharp" - Reference Include="System" - Reference Include="System.ComponentModel.Composition" - Reference Include="System.ComponentModel.Composition.Registration" - Reference Include="System.ComponentModel.DataAnnotations" - Reference Include="System.IO.Compression" - Reference Include="System.Net.Http" - Reference Include="System.Xml" - Reference Include="System.Xml.Linq" - Reference Include="NuGet.Core" { - HintPath r"$(SolutionPackagesFolder)nuget.core\2.14.0-rtm-832\lib\net40-Client\NuGet.Core.dll" - Aliases "CoreV2" + Reference Include=Microsoft.Build.Utilities.v4.0 + Reference Include=Microsoft.CSharp + Reference Include=System + Reference Include=System.ComponentModel.Composition + Reference Include=System.ComponentModel.Composition.Registration + Reference Include=System.ComponentModel.DataAnnotations + Reference Include=System.IO.Compression + Reference Include=System.Net.Http + Reference Include=System.Xml + Reference Include=System.Xml.Linq + Reference Include=NuGet.Core" { + HintPath #"$(SolutionPackagesFolder)nuget.core\2.14.0-rtm-832\lib\net40-Client\NuGet.Core.dll"# + Aliases CoreV2 } } ItemGroup { - PackageReference Include="Microsoft.VisualStudio.Setup.Configuration.Interop" - ProjectReference Include=r"$(NuGetCoreSrcDirectory)NuGet.PackageManagement\NuGet.PackageManagement.csproj" - ProjectReference Include=r"$(NuGetCoreSrcDirectory)NuGet.Build.Tasks\NuGet.Build.Tasks.csproj" + PackageReference Include=Microsoft.VisualStudio.Setup.Configuration.Interop + ProjectReference Include=#"$(NuGetCoreSrcDirectory)NuGet.PackageManagement\NuGet.PackageManagement.csproj"# + ProjectReference Include=#"$(NuGetCoreSrcDirectory)NuGet.Build.Tasks\NuGet.Build.Tasks.csproj"# } ItemGroup { - EmbeddedResource Update="NuGetCommand.resx" { - Generator "ResXFileCodeGenerator" - LastGenOutput "NuGetCommand.Designer.cs" + EmbeddedResource Update=NuGetCommand.resx { + Generator ResXFileCodeGenerator + 
LastGenOutput NuGetCommand.Designer.cs } - Compile Update="NuGetCommand.Designer.cs" { - DesignTime true - AutoGen true - DependentUpon "NuGetCommand.resx" + Compile Update=NuGetCommand.Designer.cs { + DesignTime #true + AutoGen #true + DependentUpon NuGetCommand.resx } - EmbeddedResource Update="NuGetResources.resx" { + EmbeddedResource Update=NuGetResources.resx { // Strings are shared by other projects, use public strings. - Generator "PublicResXFileCodeGenerator" - LastGenOutput "NuGetResources.Designer.cs" + Generator PublicResXFileCodeGenerator + LastGenOutput NuGetResources.Designer.cs } - Compile Update="NuGetResources.Designer.cs" { - DesignTime true - AutoGen true - DependentUpon "NuGetResources.resx" + Compile Update=NuGetResources.Designer.cs { + DesignTime #true + AutoGen #true + DependentUpon NuGetResources.resx } } ItemGroup { - EmbeddedResource Include=r"$(NuGetCoreSrcDirectory)NuGet.Build.Tasks\NuGet.targets" { - Link "NuGet.targets" - SubType "Designer" + EmbeddedResource Include=#"$(NuGetCoreSrcDirectory)NuGet.Build.Tasks\NuGet.targets"# { + Link NuGet.targets + SubType Designer } } // Since we are moving some code and strings from NuGet.CommandLine to NuGet.Commands, we opted to go through normal localization process (build .resources.dll) and then add them to the ILMerged nuget.exe // This will also be called from CI build, after assemblies are localized, since our test infra takes nuget.exe before Localization - Target Name="ILMergeNuGetExe" \ - AfterTargets="Build" \ + Target Name=ILMergeNuGetExe \ + AfterTargets=Build \ Condition="'$(BuildingInsideVisualStudio)' != 'true' and '$(SkipILMergeOfNuGetExe)' != 'true'" \ { PropertyGroup { @@ -133,9 +133,9 @@ Project { ExpectedLocalizedArtifactCount 0 Condition="'$(ExpectedLocalizedArtifactCount)' == ''" } ItemGroup { - BuildArtifacts Include=r"$(OutputPath)\*.dll" Exclude="@(MergeExclude)" + BuildArtifacts Include=#"$(OutputPath)\*.dll"# Exclude="@(MergeExclude)" // NuGet.exe needs all 
NuGet.Commands.resources.dll merged in - LocalizedArtifacts Include=r"$(ArtifactsDirectory)\NuGet.Commands\**\$(NETFXTargetFramework)\**\*.resources.dll" + LocalizedArtifacts Include=#"$(ArtifactsDirectory)\NuGet.Commands\**\$(NETFXTargetFramework)\**\*.resources.dll"# } Error Text="Build dependencies are inconsistent with mergeinclude specified in ilmerge.props" \ Condition="'@(BuildArtifacts->Count())' != '@(MergeInclude->Count())'" @@ -143,36 +143,36 @@ Project { Condition="'@(LocalizedArtifacts->Count())' != '$(ExpectedLocalizedArtifactCount)'" PropertyGroup { PathToBuiltNuGetExe "$(OutputPath)NuGet.exe" - IlmergeCommand r"$(ILMergeExePath) /lib:$(OutputPath) /out:$(ArtifactsDirectory)$(VsixOutputDirName)\NuGet.exe @(MergeAllowDup -> '/allowdup:%(Identity)', ' ') /log:$(OutputPath)IlMergeLog.txt" + IlmergeCommand #"$(ILMergeExePath) /lib:$(OutputPath) /out:$(ArtifactsDirectory)$(VsixOutputDirName)\NuGet.exe @(MergeAllowDup -> '/allowdup:%(Identity)', ' ') /log:$(OutputPath)IlMergeLog.txt"# IlmergeCommand Condition="Exists($(MS_PFX_PATH))" "$(IlmergeCommand) /delaysign /keyfile:$(MS_PFX_PATH)" // LocalizedArtifacts need fullpath, since there will be duplicate file names IlmergeCommand "$(IlmergeCommand) $(PathToBuiltNuGetExe) @(BuildArtifacts->'%(filename)%(extension)', ' ') @(LocalizedArtifacts->'%(fullpath)', ' ')" } MakeDir Directories="$(ArtifactsDirectory)$(VsixOutputDirName)" - Exec Command="$(IlmergeCommand)" ContinueOnError="false" + Exec Command="$(IlmergeCommand)" ContinueOnError=#false } Import Project="$(BuildCommonDirectory)common.targets" Import Project="$(BuildCommonDirectory)embedinterop.targets" // Do nothing. This basically strips away the framework assemblies from the resulting nuspec. 
- Target Name="_GetFrameworkAssemblyReferences" DependsOnTargets="ResolveReferences" + Target Name=_GetFrameworkAssemblyReferences DependsOnTargets=ResolveReferences - Target Name="GetSigningInputs" Returns="@(DllsToSign)" { + Target Name=GetSigningInputs Returns="@(DllsToSign)" { ItemGroup { - DllsToSign Include=r"$(ArtifactsDirectory)$(VsixOutputDirName)\NuGet.exe" { - StrongName "MsSharedLib72" - Authenticode "Microsoft400" + DllsToSign Include=#"$(ArtifactsDirectory)$(VsixOutputDirName)\NuGet.exe"# { + StrongName MsSharedLib72 + Authenticode Microsoft400 } } } - Target Name="GetSymbolsToIndex" Returns="@(SymbolsToIndex)" { + Target Name=GetSymbolsToIndex Returns="@(SymbolsToIndex)" { ItemGroup { - SymbolsToIndex Include=r"$(ArtifactsDirectory)$(VsixOutputDirName)\NuGet.exe" - SymbolsToIndex Include=r"$(ArtifactsDirectory)$(VsixOutputDirName)\NuGet.pdb" + SymbolsToIndex Include=#"$(ArtifactsDirectory)$(VsixOutputDirName)\NuGet.exe"# + SymbolsToIndex Include=#"$(ArtifactsDirectory)$(VsixOutputDirName)\NuGet.pdb"# } } - Import Project="Sdk.targets" Sdk="Microsoft.NET.Sdk" + Import Project=Sdk.targets Sdk=Microsoft.NET.Sdk } diff --git a/examples/website.kdl b/examples/website.kdl index a1df90a..b8faafe 100644 --- a/examples/website.kdl +++ b/examples/website.kdl @@ -1,20 +1,20 @@ -!doctype "html" -html lang="en" { +!doctype html +html lang=en { head { - meta charset="utf-8" - meta name="viewport" content="width=device-width, initial-scale=1.0" + meta charset=utf-8 + meta name=viewport content="width=device-width, initial-scale=1.0" meta \ - name="description" \ + name=description \ content="kdl is a document language, mostly based on SDLang, with xml-like semantics that looks like you're invoking a bunch of CLI commands!" 
title "kdl - Kat's Document Language" - link rel="stylesheet" href="/styles/global.css" + link rel=stylesheet href="/styles/global.css" } body { main { header class="py-10 bg-gray-300" { h1 class="text-4xl text-center" "kdl - Kat's Document Language" } - section class="kdl-section" id="description" { + section class=kdl-section id=description { p { - "kdl is a document language, mostly based on " a href="https://sdlang.org" "SDLang" @@ -22,7 +22,7 @@ html lang="en" { } p "It's meant to be used both as a serialization format and a configuration language, and is relatively light on syntax compared to XML." } - section class="kdl-section" id="design-and-discussion" { + section class=kdl-section id=design-and-discussion { h2 "Design and Discussion" p { - "kdl is still extremely new, and discussion about the format should happen over on the " @@ -32,11 +32,11 @@ html lang="en" { - " page in the Github repo. Feel free to jump in and give us your 2 cents!" } } - section class="kdl-section" id="design-principles" { + section class=kdl-section id=design-principles { h2 "Design Principles" ol { - li "Maintainability" - li "Flexibility" + li Maintainability + li Flexibility li "Cognitive simplicity and Learnability" li "Ease of de/serialization" li "Ease of implementation" From fada1fc1dd6243b55b78d5d33496f36069691ba1 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Kat=20March=C3=A1n?= Date: Tue, 12 Dec 2023 21:37:13 -0800 Subject: [PATCH 022/105] Update KQL text, too --- QUERY-SPEC.md | 18 +++++++++--------- 1 file changed, 9 insertions(+), 9 deletions(-) diff --git a/QUERY-SPEC.md b/QUERY-SPEC.md index bf918e7..56a2449 100644 --- a/QUERY-SPEC.md +++ b/QUERY-SPEC.md @@ -44,8 +44,8 @@ Attribute matchers support certain binary operators: * `[val() = 1]`: Selects any element whose first value is 1. * `[prop(name) = 1]`: Selects any element with a property `name` whose value is 1. * `[name = 1]`: Equivalent to the above. 
-* `[name() = "hi"]`: Selects any element whose _node name_ is "hi". Equivalent to just `hi`, but more useful when using string operators. -* `[tag() = "hi"]`: Selects any element whose tag is "hi". Equivalent to just `(hi)`, but more useful when using string operators. +* `[name() = hi]`: Selects any element whose _node name_ is "hi". Equivalent to just `hi`, but more useful when using string operators. +* `[tag() = hi]`: Selects any element whose tag is "hi". Equivalent to just `(hi)`, but more useful when using string operators. * `[val() != 1]`: Selects any element whose first value exists, and is not 1. The following operators work with any `val()` or `prop()` values. @@ -60,9 +60,9 @@ never coerced to 1, and there is no "universal" ordering across all types.): The following operators work only with string `val()`, `prop()`, `tag()`, or `name()` values. If the value is not a string, the matcher will always fail: -* `[val() ^= "foo"]`: Selects any element whose first value starts with "foo". -* `[val() $= "foo"]`: Selects any element whose first value ends with "foo". -* `[val() *= "foo"]`: Selects any element whose first value contains "foo". +* `[val() ^= foo]`: Selects any element whose first value starts with "foo". +* `[val() $= foo]`: Selects any element whose first value ends with "foo". +* `[val() *= foo]`: Selects any element whose first value contains "foo". The following operators work only with `val()` or `prop()` values. 
If the value is not one of those, the matcher will always fail: @@ -75,13 +75,13 @@ Given this document: ```kdl package { - name "foo" + name foo version "1.0.0" - dependencies platform="windows" { + dependencies platform=windows { winapi "1.0.0" path="./crates/my-winapi-fork" } dependencies { - miette "2.0.0" dev=true integrity=(sri)"sha512-deadbeef" + miette "2.0.0" dev=#true integrity=(sri)sha512-deadbeef } } ``` @@ -113,7 +113,7 @@ selector-operator := ">>" | ">" | "++" | "+" filter := matcher+ matcher := "top()"| "()" | identifier | type | accessor-matcher accessor-matcher := "[" (comparison | accessor)? "]" -comparison := accessor q-ws* matcher-operator q-ws* (type | string | number | keyword) +comparison := accessor q-ws* matcher-operator q-ws* (type | identifier | string | number | keyword) accessor := "val(" number ")" | "prop(" identifier ")" | "name()" | "tag()" | "values()" | "props()" | identifier matcher-operator := "=" | "!=" | ">" | "<" | ">=" | "<=" | "^=" | "$=" | "*=" From 63feef70fe615841ff5299aa89afb9145ac26afa Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Kat=20March=C3=A1n?= Date: Tue, 12 Dec 2023 21:39:21 -0800 Subject: [PATCH 023/105] Update schema spec --- SCHEMA-SPEC.md | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/SCHEMA-SPEC.md b/SCHEMA-SPEC.md index 084f002..56233b1 100644 --- a/SCHEMA-SPEC.md +++ b/SCHEMA-SPEC.md @@ -34,10 +34,10 @@ None. * [`node`](#node-node) - zero or more toplevel nodes for the KDL document this schema describes. * [`definitions`](#definitions-node) (optional): Definitions of nodes, values, props, and children block to reference in the toplevel nodes. * `node-names` (optional): [Validations](#validation-nodes) to apply to the _names_ of child nodes. -* `other-nodes-allowed` (optional): Whether to allow nodes other than the ones explicitly listed here. Defaults to `false`. +* `other-nodes-allowed` (optional): Whether to allow nodes other than the ones explicitly listed here. 
Defaults to `#false`. * [`tag`](#tag-node) - zero or more toplevel tags for nodes in the KDL document that this schema describes. * `tag-names` (optional): [Validations](#validation-nodes) to apply to the _names_ of tags of child nodes. -* `other-tags-allowed` (optional): Whether to allow node tags other than the ones explicitly listed here. Defaults to `false`. +* `other-tags-allowed` (optional): Whether to allow node tags other than the ones explicitly listed here. Defaults to `#false`. ### `info` node @@ -113,7 +113,7 @@ Links to the schema itself, and to sources about the schema. #### Properties -* `rel`: what the link is for (`"self"` or `"documentation"`) +* `rel`: what the link is for (`self` or `documentation`) * `lang` (optional): An IETF BCP 47 language tag ### `license` node From 31fd7bd00a9bc5fb3aa6cb9d3a22c2b1da72d4fb Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Kat=20March=C3=A1n?= Date: Tue, 12 Dec 2023 21:43:35 -0800 Subject: [PATCH 024/105] Update JiK and XiK too --- JSON-IN-KDL.md | 10 +++++----- XML-IN-KDL.md | 10 +++++----- 2 files changed, 10 insertions(+), 10 deletions(-) diff --git a/JSON-IN-KDL.md b/JSON-IN-KDL.md index 5340cce..7ccf76b 100644 --- a/JSON-IN-KDL.md +++ b/JSON-IN-KDL.md @@ -13,7 +13,7 @@ JSON-in-KDL (JiK from now on) is a kdl microsyntax consisting of three types of ---- -Literal nodes are used to represent a JSON literal, which luckily KDL's literal syntax is a superset of. They contain a single value, the literal they're representing. For example, to represent the JSON literal `true`, you'd write `- true` in JiK. +Literal nodes are used to represent a JSON literal, which luckily KDL's literal syntax is a superset of. They contain a single value, the literal they're representing. For example, to represent the JSON literal `true`, you'd write `- #true` in JiK. (In many cases this isn't necessary, and KDL literals can be directly used instead. 
Literal nodes are necessary only for a top-level literal, or to intersperse literals with arrays or objects inside an array or object node.) @@ -26,7 +26,7 @@ This means that simple arrays of literals can be written compactly and simply; a ```kdl array { - 1 - array true false + array #true #false - 3 } ``` @@ -35,7 +35,7 @@ The two methods of writing children can be mixed, pulling the prefix of the arra ```kdl array 1 { - array true false + array #true #false - 3 } ``` @@ -44,14 +44,14 @@ array 1 { Object nodes are used to represent a JSON object. They can contain zero or more named properties, followed by zero or more child nodes; these are taken as the key/value pairs of the object, in order of appearance. -If the value of a key/value pair is a literal, it can be encoded as a named property on the object. For example, the JSON object `{"foo": 1, "bar": true}` could be written in JiK as `object foo=1 bar=true`. +If the value of a key/value pair is a literal, it can be encoded as a named property on the object. For example, the JSON object `{"foo": 1, "bar": true}` could be written in JiK as `object foo=1 bar=#true`. Alternately, key/value pairs can be encoded as child nodes, using a type annotation on the node name to encode the key, and the node itself as the value. The preceding example could instead have been written as: ```kdl object { (foo)- 1 - (bar)- true + (bar)- #true } ``` diff --git a/XML-IN-KDL.md b/XML-IN-KDL.md index 8cb64fa..32ce487 100644 --- a/XML-IN-KDL.md +++ b/XML-IN-KDL.md @@ -25,7 +25,7 @@ XML elements and KDL nodes have a direct correspondence. In XiK, an XML element * making the attributes into KDL properties * making the child nodes as KDL child nodes -For example, the XML `` is encoded into XiK as `element foo="bar" { child baz="qux" }`. +For example, the XML `` is encoded into XiK as `element foo=bar { child baz=quux }`. XML namespaces are encoded the same as XML: the node name simply contains a `:` character. 
Note that KDL identifier syntax allows `:` directly in an ident, so a name like `xml:space` or `xlink:href` is a valid node or property name. @@ -35,9 +35,9 @@ Raw text contents of an element can be encoded in two possible ways. If the element contains *only* text, it should be encoded as a final string unnamed argument. For example, the XML `here's a link` can be encoded as `a href="http://example.com" "here's a link"`. -If the element contains mixed text and element children, the text can be encoded as a KDL node with the name `-` with a single string unnamed argument. For example, the XML `some bold text` can be encoded as `span { - "some "; b "bold"; - " text" }`. +If the element contains mixed text and element children, the text can be encoded as a KDL node with the name `-` with a single string unnamed argument. For example, the XML `some bold text` can be encoded as `span { - "some "; b bold; - " text" }`. -An element that contains only text *is allowed to* encode it as `-` children. For example, `foo` *may* be encoded as `span { - "foo" }` instead of `span "foo"`. However, an element cannot mix the "final string attribute" with child nodes; `span "foo" { b "bar" }` is an **invalid** encoding of `foobar`. (It must be encoded as `span { - "foo"; b "bar" }`.) +An element that contains only text *is allowed to* encode it as `-` children. For example, `foo` *may* be encoded as `span { - foo }` instead of `span foo`. However, an element cannot mix the "final string attribute" with child nodes; `span foo { b bar }` is an **invalid** encoding of `foobar`. (It must be encoded as `span { - foo; b bar }`.) CDATA sections are not preserved in this encoding, as they are merely a source convenience so you don't have to escape a bunch of characters. They are encoded as normal textual contents would be. @@ -53,13 +53,13 @@ Processing instructions and XML declarations (nodes that look like ` The contents of a PI are technically completely unstructured. 
However, in practice most PIs' contents look like start-tag attributes. If this is the case, they should be encoded as properties on the node, with string values. For example, `` is encoded as `?xml version="1.0"`. -If the contents of a PI do *not* look like attributes, then instead the entire contents (from the end of the whitespace following the PI name, to the closing `?>` characters) are encoded as a single unnamed string value. For example, the preceding XML declaration *could* be alternately encoded as `?xml r#"version="1.0""#` (but shouldn't be). +If the contents of a PI do *not* look like attributes, then instead the entire contents (from the end of the whitespace following the PI name, to the closing `?>` characters) are encoded as a single unnamed string value. For example, the preceding XML declaration *could* be alternately encoded as `?xml #"version="1.0""#` (but shouldn't be). (Note that XML declarations are not needed when writing XiK directly; the version is always 1.0, and the encoding is always UTF-8 since it's KDL.) ---- -Doctypes (nodes that look like ``) are encoded similarly to unstructured Processing Instructions. They have a node name of `!doctype`, and the entire contents of the node, from the end of the whitespace following the "DOCTYPE" to the closing `>`, are encoded as a single unnamed string value. For example, the HTML doctype `` is encoded as `!doctype "html"`, while the XHTML 1 Strict doctype would be encoded as `!doctype r#"html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd""#` +Doctypes (nodes that look like ``) are encoded similarly to unstructured Processing Instructions. They have a node name of `!doctype`, and the entire contents of the node, from the end of the whitespace following the "DOCTYPE" to the closing `>`, are encoded as a single unnamed string value. 
For example, the HTML doctype `` is encoded as `!doctype html`, while the XHTML 1 Strict doctype would be encoded as `!doctype #"html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd""#` ---- From b42b6c80f0c9cb8c2c9dd380b3543b22f505ee80 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Kat=20March=C3=A1n?= Date: Tue, 12 Dec 2023 21:57:57 -0800 Subject: [PATCH 025/105] Clarify that multiline comments are allowed after line continuations, per grammar Fixes: https://github.com/kdl-org/kdl/issues/322 --- SPEC.md | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/SPEC.md b/SPEC.md index 3457aaa..facb0b1 100644 --- a/SPEC.md +++ b/SPEC.md @@ -142,8 +142,9 @@ The following characters cannot be used anywhere in a bare Line continuations allow [Nodes](#node) to be spread across multiple lines. A line continuation is a `\` character followed by zero or more whitespace -characters and an optional single-line comment. It must be terminated by a -[Newline](#newline) (including the Newline that is part of single-line comments). +items (including multiline comments) and an optional single-line comment. It +must be terminated by a [Newline](#newline) (including the Newline that is +part of single-line comments). Following a line continuation, processing of a Node can continue as usual. From 5a7b339ed44f9464af04483547bbad0aa328ccf6 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Kat=20March=C3=A1n?= Date: Tue, 12 Dec 2023 22:10:26 -0800 Subject: [PATCH 026/105] Constrain code points to unicode scalar values Fixes: https://github.com/kdl-org/kdl/issues/207 --- CHANGELOG.md | 4 ++++ SPEC.md | 20 +++++++++++--------- 2 files changed, 15 insertions(+), 9 deletions(-) diff --git a/CHANGELOG.md b/CHANGELOG.md index 98e1561..927a048 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -39,6 +39,10 @@ * A statement in the spec prose that said "It is reasonable for an implementation to ignore null values altogether when deserializing". 
This is no longer encouraged or desired. +* Code points have been constrained to [Unicode Scalar + Values](https://unicode.org/glossary/#unicode_scalar_value) only, including + values used in string escapes (`\u{}`). All KDL documents and string values + should be valid UTF-8 now, as was intended. ### KQL diff --git a/SPEC.md b/SPEC.md index facb0b1..c0f665f 100644 --- a/SPEC.md +++ b/SPEC.md @@ -92,13 +92,15 @@ foo 1 key="val" 3 { ### Identifier -A bare Identifier is composed of any Unicode codepoint other than [non-initial -characters](#non-initial-characters), followed by any number of Unicode code -points other than [non-identifier characters](#non-identifier-characters), so -long as this doesn't produce something confusable for a [Number](#number). For -example, both a [Number](#number) and an Identifier can start with `-`, but -when an Identifier starts with `-` the second character cannot be a digit. -This is precicely specified in the [Full Grammar](#full-grammar) below. +A bare Identifier is composed of any [Unicode Scalar +Value](https://unicode.org/glossary/#unicode_scalar_value) other than +[non-initial characters](#non-initial-characters), followed by any number of +Unicode Scalar Values other than [non-identifier +characters](#non-identifier-characters), so long as this doesn't produce +something confusable for a [Number](#number). For example, both a +[Number](#number) and an Identifier can start with `-`, but when an Identifier +starts with `-` the second character cannot be a digit. This is precicely +specified in the [Full Grammar](#full-grammar) below. 
When Identifiers are used as the values in [Arguments](#argument) and [Properties](#property), they are treated as strings, just like they are with @@ -342,7 +344,7 @@ interpreted as described in the following table: | Quotation Mark (Double Quote) | `\"` | `U+0022` | | Backspace | `\b` | `U+0008` | | Form Feed | `\f` | `U+000C` | -| Unicode Escape | `\u{(1-6 hex chars)}` | Code point described by hex characters, up to `10FFFF` | +| Unicode Escape | `\u{(1-6 hex chars)}` | Code point described by hex characters, as long as it represents a [Unicode Scalar Value](https://unicode.org/glossary/#unicode_scalar_value) | | Whitespace Escape | See below | N/A | ##### Escaped Whitespace @@ -504,7 +506,7 @@ They may be represented in Strings (but not Raw Strings) using `\u{}`. * Any codepoint with hexadecimal value `0x20` or below (various control characters). * `0x7F` (the Delete control character). -* Any codepoint with hexadecimal value higher than `0x10FFFF`. +* Any codepoint that is not a [Unicode Scalar Value](https://unicode.org/glossary/#unicode_scalar_value). * `0x2066-2069` and `0x202A-202E`, the [unicode "direction control" characters](https://www.w3.org/International/questions/qa-bidi-unicode-controls) ## Full Grammar From c8488db13eeabe015af46830b0494226db8946f7 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Kat=20March=C3=A1n?= Date: Tue, 12 Dec 2023 22:20:52 -0800 Subject: [PATCH 027/105] Make last semicolon optional for inline nodes Fixes: https://github.com/kdl-org/kdl/issues/341 --- CHANGELOG.md | 3 +++ SPEC.md | 6 ++++-- 2 files changed, 7 insertions(+), 2 deletions(-) diff --git a/CHANGELOG.md b/CHANGELOG.md index 927a048..29904e6 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -43,6 +43,9 @@ Values](https://unicode.org/glossary/#unicode_scalar_value) only, including values used in string escapes (`\u{}`). All KDL documents and string values should be valid UTF-8 now, as was intended. 
+* The last node in a child block no longer needs to be terminated with `;`, + even if the closing `}` is on the same line, so this is now a legal node: + `node {foo;bar;baz}` ### KQL diff --git a/SPEC.md b/SPEC.md index c0f665f..5067133 100644 --- a/SPEC.md +++ b/SPEC.md @@ -527,9 +527,11 @@ node-space := plain-node-space+ ('/-' plain-node-space* (node-prop-or-arg | node required-node-space := node-space* plain-node-space+ optional-node-space := node-space* -node := type? identifier (required-node-space node-prop-or-arg)* (required-node-space node-children)? optional-node-space node-terminator +base-node := type? identifier (required-node-space node-prop-or-arg)* (required-node-space node-children)? +node := base-node optional-node-space node-terminator +final-node := base-node optional-node-space node-terminator? node-prop-or-arg := prop | value -node-children := '{' nodes '}' +node-children := '{' nodes final-node? '}' node-terminator := single-line-comment | newline | ';' | eof identifier := string | bare-identifier From 13799de32b4a44bf802e681d25f608526c18526d Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Kat=20March=C3=A1n?= Date: Tue, 12 Dec 2023 22:28:52 -0800 Subject: [PATCH 028/105] Allow whitespace in more places Fixes: https://github.com/kdl-org/kdl/issues/355 --- CHANGELOG.md | 7 +++++++ SPEC.md | 8 ++++---- 2 files changed, 11 insertions(+), 4 deletions(-) diff --git a/CHANGELOG.md b/CHANGELOG.md index 29904e6..cfbf263 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -46,6 +46,13 @@ * The last node in a child block no longer needs to be terminated with `;`, even if the closing `}` is on the same line, so this is now a legal node: `node {foo;bar;baz}` +* More places allow whitespace (node-spaces, specifically) now. 
With great + power comes great responsibility: + * Inside `(foo)` annotations (so, `( foo )` would be legal (`( f oo )` would + not be, since it has two identifiers)) + * Between annotations and the thing they're annotating (`(blah) node (thing) + 1 y= (who) 2`) + * Around `=` for props (`x = 1`) ### KQL diff --git a/SPEC.md b/SPEC.md index 5067133..88332ac 100644 --- a/SPEC.md +++ b/SPEC.md @@ -527,7 +527,7 @@ node-space := plain-node-space+ ('/-' plain-node-space* (node-prop-or-arg | node required-node-space := node-space* plain-node-space+ optional-node-space := node-space* -base-node := type? identifier (required-node-space node-prop-or-arg)* (required-node-space node-children)? +base-node := type? optional-node-space identifier (required-node-space node-prop-or-arg)* (required-node-space node-children)? node := base-node optional-node-space node-terminator final-node := base-node optional-node-space node-terminator? node-prop-or-arg := prop | value @@ -541,9 +541,9 @@ numberish-ident := sign ((identifier-char - digit) identifier-char*)? identifier-char := unicode - line-space - [\\/(){};\[\]="#] - disallowed-literal-code-points keyword := '#' (boolean | 'null') -prop := identifier '=' value -value := type? (identifier | string | number | keyword) -type := '(' identifier ')' +prop := identifier optional-node-space '=' optional-node-space value +value := type? 
optional-node-space (identifier | string | number | keyword) +type := '(' optional-node-space identifier optional-node-space ')' string := raw-string | escaped-string escaped-string := '"' string-character* '"' From 49402ccb7b9ac8b0b17a7f293e86710c64b6d419 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Kat=20March=C3=A1n?= Date: Tue, 12 Dec 2023 22:51:52 -0800 Subject: [PATCH 029/105] allow BOM only in the first unicode scalar in a document --- CHANGELOG.md | 2 ++ SPEC.md | 4 +++- 2 files changed, 5 insertions(+), 1 deletion(-) diff --git a/CHANGELOG.md b/CHANGELOG.md index cfbf263..a6eee12 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -53,6 +53,8 @@ * Between annotations and the thing they're annotating (`(blah) node (thing) 1 y= (who) 2`) * Around `=` for props (`x = 1`) +* The BOM is now only allowed as the first character in a document. It was + previously treated as generic whitespace. ### KQL diff --git a/SPEC.md b/SPEC.md index 88332ac..3b971fb 100644 --- a/SPEC.md +++ b/SPEC.md @@ -516,6 +516,8 @@ authoritative if something seems to disagree with the text above. The [grammar language syntax](#grammar-language) is defined below. ``` +document := bom? 
nodes + nodes := (line-space* node)* line-space* plain-line-space := newline | ws | single-line-comment @@ -572,7 +574,7 @@ escline := '\\' ws* (single-line-comment | newline | eof) newline := See Table (All line-break white_space) -ws := bom | unicode-space | multi-line-comment +ws := unicode-space | multi-line-comment bom := '\u{FEFF}' From fc1b59436abfe740490586e4f29bb16e7a42982b Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Kat=20March=C3=A1n?= Date: Tue, 12 Dec 2023 23:17:04 -0800 Subject: [PATCH 030/105] add support for dedented multi-line strings and raw strings --- CHANGELOG.md | 4 +++ SPEC.md | 80 ++++++++++++++++++++++++++++++++++++++++++++----- examples/ci.kdl | 5 ++++ 3 files changed, 82 insertions(+), 7 deletions(-) diff --git a/CHANGELOG.md b/CHANGELOG.md index a6eee12..07f7256 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -55,6 +55,10 @@ * Around `=` for props (`x = 1`) * The BOM is now only allowed as the first character in a document. It was previously treated as generic whitespace. +* Multi-line strings are now automatically dedented, according to the + least-indented line in the body. Multiline strings and raw strings now must + have a newline immediately following their opening `"`, and a final newline + preceding the closing `"`. ### KQL diff --git a/SPEC.md b/SPEC.md index 3b971fb..518b236 100644 --- a/SPEC.md +++ b/SPEC.md @@ -319,9 +319,7 @@ node prop=(regex).* Strings in KDL represent textual [Values](#value). They are delimited by `"` on either side of any number of literal string characters except unescaped -`"` and `\`. This includes literal [Newline](#newline) characters, which means a -String Value can encompass multiple lines without behaving like a Newline for -[Node](#node) parsing purposes. +`"` and `\`. Strings _MUST_ be represented as UTF-8 values. @@ -329,6 +327,30 @@ Strings _MUST NOT_ include the code points for [disallowed literal code points](#disallowed-literal-code-points) directly. 
If needed, they can be specified with their corresponding `\u{}` escape. +#### Multi-line Strings + +Strings may span multiple lines with literal Newlines, in which case the +resulting String is "dedented" according to the line with the fewest number of +Whitespace characters preceding the first non-Whitespace character. That is, +the number of Whitespace characters in the least-indented line in the String +body is subtracted from the Whitespace of all other lines. + +Multi-line strings _MUST_ have a single [Newline](#newline) immediately +following their opening `"`, after which they may have any number of newlines. +Finally, there must be a Newline, followed by any number of Whitespace, before +the closing `"`. + +The first Newline, the last Newline, along with Whitespace following the last +Newline, are not included in the value of the String. The first and last +Newline can be the same character (that is, empty multi-line strings are +legal). + +Furthermore, any lines in the string body that only contain literal whitespace +are stripped to only contain the single Newline character. + +Strings with literal Newlines that do not immediately start with a Newline and +whose final `"` is not preceeded by whitespace and a Newline are illegal. + #### Escapes In addition to literal code points, a number of "escapes" are supported. @@ -366,8 +388,10 @@ For example, these strings are all semantically identical: "Hello\nWorld" -"Hello -World" +" + Hello + World +" ``` ##### Invalid escapes @@ -398,11 +422,49 @@ code-points](#disallowed-literal-code-points) as code points in their body. Unlike with Strings, these cannot simply be escaped, and are thus unrepresentable when using Raw Strings. +#### Multi-line Raw Strings + +Raw Strings may span multiple lines with literal newlines, in which case the +resulting string is "dedented" according to the line with the fewest number of +Whitespace characters preceding its first non-Whitespace character. 
That is, +the number of Whitespace characters in the least-indented line in the Raw +String body is subtracted from the Whitespace of all other lines. + +Multi-line strings _MUST_ have a single [Newline](#newline) immediately +following their opening `#"`, after which they may have any number of newlines. +Finally, there must be a Newline, followed by any number of Whitespace, before +the closing `"#`. + +The first Newline, the last Newline, along with Whitespace following the last +Newline, are not included in the value of the Raw String. The first and last +Newline can be the same character (that is, empty multi-line strings are +legal). + +Furthermore, any lines in the Raw String body that only contain literal +whitespace are stripped to only contain the single Newline character. + +Raw Strings with literal Newlines that do not immediately start with a Newline +and whose final `"#` is not preceeded by whitespace and a Newline are illegal. + #### Example ```kdl just-escapes #"\n will be literal"# quotes-and-escapes ##"hello\n\r\asd"#world"## + +multi-line #" + foo + This is the base indentation + bar + "# +``` + +The last example's string value will be: + +``` + foo +This is the base indentation + bar ``` ### Number @@ -548,13 +610,17 @@ value := type? 
optional-node-space (identifier | string | number | keyword) type := '(' optional-node-space identifier optional-node-space ')' string := raw-string | escaped-string -escaped-string := '"' string-character* '"' +escaped-string := '"' (single-line-string-body | newline multi-line-string-body newline ws*) '"' +single-line-string-body := (string-character - newline)* +multi-line-string-body := string-character* string-character := '\' escape | [^\\"] - disallowed-literal-code-points escape := ["\\bfnrt] | 'u{' hex-digit{1, 6} '}' | (unicode-space | newline)+ hex-digit := [0-9a-fA-F] raw-string := '#' raw-string-quotes '#' | '#' raw-string '#' -raw-string-quotes := '"' (unicode - disallowed-literal-code-points) '"' +raw-string-quotes := '"' (single-line-raw-string-body | newline multi-line-raw-string-body newline ws*) '"' +single-line-raw-string-body := (unicode - newline - disallowed-literal-code-points)* +multi-line-raw-string-body := (unicode - disallowed-literal-code-points)* number := decimal | hex | octal | binary diff --git a/examples/ci.kdl b/examples/ci.kdl index 3dccd83..aff2863 100644 --- a/examples/ci.kdl +++ b/examples/ci.kdl @@ -42,6 +42,11 @@ jobs { } step Clippy run="cargo clippy --all -- -D warnings" step "Run tests" run="cargo test --all --verbose" + step "Other Stuff" run=" + echo foo + echo bar + echo baz + " } } } From 8de7df6eaa7ec7121465b26478e033c2043098d8 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Kat=20March=C3=A1n?= Date: Tue, 12 Dec 2023 23:49:04 -0800 Subject: [PATCH 031/105] formatting --- SPEC.md | 9 +++++---- 1 file changed, 5 insertions(+), 4 deletions(-) diff --git a/SPEC.md b/SPEC.md index 12d7a2d..d7d8b0d 100644 --- a/SPEC.md +++ b/SPEC.md @@ -93,9 +93,10 @@ foo 1 key="val" 3 { ### Identifier An Identifier is either a [Bare Identifier](#bare-identifier), which is an -unquoted string like `node` or `item`, a [String](#string), or a [Raw String](#raw-string). 
-There's no semantic difference between the kinds of identifier; this simply allows -for the use of quotes to have unusual identifiers that are inexpressible as bare identifiers. +unquoted string like `node` or `item`, a [String](#string), or a [Raw +String](#raw-string). There's no semantic difference between the kinds of +identifier; this simply allows for the use of quotes to have unusual +identifiers that are inexpressible as bare identifiers. ### Bare Identifier @@ -220,7 +221,7 @@ parent { child1; child2; } ### Value -A value is either: an [Identifier](#identifier), a [String](#string), a +A value is either: an [Identifier](#identifier), a [String](#string), a [Number](#number), a [Boolean](#boolean), or [Null](#null). Values _MUST_ be either [Arguments](#argument) or values of From a0d5030e3b44915ac95b7cb1f89e7e125fc65592 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Kat=20March=C3=A1n?= Date: Tue, 12 Dec 2023 23:49:26 -0800 Subject: [PATCH 032/105] Release 2.0 draft 1 --- SPEC.md | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/SPEC.md b/SPEC.md index d7d8b0d..e51bd61 100644 --- a/SPEC.md +++ b/SPEC.md @@ -3,7 +3,8 @@ This is the semi-formal specification for KDL, including the intended data model and the grammar. -This document describes KDL version `1.0.0`. It was released on September 11, 2021. +This document describes KDL version `2.0.0-draft.1`. It was released on +2023-12-12. ## Introduction From 54df7f0cabd96abef58d071c3c4a7f76eb8b7391 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Kat=20March=C3=A1n?= Date: Wed, 13 Dec 2023 00:19:37 -0800 Subject: [PATCH 033/105] Update README --- README.md | 69 ++++++++++++++++++++++++++++++------------------------- 1 file changed, 38 insertions(+), 31 deletions(-) diff --git a/README.md b/README.md index 624313e..84a32d3 100644 --- a/README.md +++ b/README.md @@ -7,18 +7,18 @@ XML. 
It looks like this: ```kdl package { - name "my-pkg" + name my-pkg version "1.2.3" dependencies { // Nodes can have standalone values as well as // key/value pairs. - lodash "^3.2.1" optional=true alias="underscore" + lodash "^3.2.1" optional=#true alias=underscore } scripts { // "Raw" and multi-line strings are supported. - build r#" + build #" echo "foo" node -c "console.log('hello, world!');" echo "foo" > some-file.txt @@ -33,8 +33,8 @@ package { // "Slashdash" comments operate at the node level, // with just `/-`. /-this-is-commented { - this "entire" "node" { - "is" "gone" + this entire node { + is gone } } } @@ -51,7 +51,7 @@ Language](SCHEMA-SPEC.md) loosely based on JSON Schema. The language is based on [SDLang](https://sdlang.org), with a number of modifications and clarifications on its syntax and behavior. -The current version of the KDL spec is `1.0.0`. +The current version of the KDL spec is `2.0.0-draft.1`. [Play with it in your browser!](https://kdl-play.danini.dev/) @@ -116,7 +116,7 @@ bookmarks 12 15 188 1234 Nodes can have properties. ```kdl -author "Alex Monad" email="alex@example.com" active=true +author "Alex Monad" email=alex@example.com active=#true ``` And they can have nested child nodes, too! @@ -141,31 +141,38 @@ node1; node2; node3; KDL supports 4 data types: -* Strings: `"hello world"` +* Strings: `"hello world"` or just `foo` * Numbers: `123.45` -* Booleans: `true` and `false` -* Null: `null` +* Booleans: `#true` and `#false` +* Null: `#null` #### Strings -It supports two different formats for string input: escaped and raw. + +It supports three different formats for string input: identifiers, quoted, and raw. ```kdl -node "this\nhas\tescapes" -other r"C:\Users\zkat\" +node1 this-is-a-string +node2 "this\nhas\tescapes" +node3 #"C:\Users\zkat\raw\string"# ``` -Both types of string can be multiline as-is, without a different syntax: + +Both types of quoted string can be multiline as-is, without a different +syntax. 
Additionally, these multi-line strings will be "dedented" according to +the indentation of the least-indented line: ```kdl -string "my -multiline -value" +string " + my + multiline + value +" ``` -And for raw strings, you can add any number of # after the r and the last " to -disambiguate literal " characters: +Raw strings, you can add any number of `#`s before and after the opening and +closing `#` to disambiguate literal `#"` sequences: ```kdl -other-raw r#"hello"world"# +other-raw ##"hello"#world"## ``` #### Numbers @@ -209,7 +216,7 @@ comments can be nested. C style multiline */ -tag /*foo=true*/ bar=false +tag /*foo=#true*/ bar=#false /*/* hello @@ -221,13 +228,13 @@ comment out individual nodes, arguments, or children: ```kdl // This entire node and its children are all commented out. -/-mynode "foo" key=1 { +/-mynode foo key=1 { a b c } -mynode /-"commented" "not commented" /-key="value" /-{ +mynode /-commented "not commented" /-key=value /-{ a b } @@ -242,8 +249,8 @@ specific meanings. ```kdl numbers (u8)10 (i32)20 myfloat=(f32)1.5 { - strings (uuid)"123e4567-e89b-12d3-a456-426614174000" (date)"2021-02-03" filter=(regex)r"$\d+" - (author)person name="Alex" + strings (uuid)123e4567-e89b-12d3-a456-426614174000 (date)"2021-02-03" filter=(regex)#"$\d+"# + (author)person name=Alex } ``` @@ -256,21 +263,21 @@ title \ // Files must be utf8 encoded! -smile "😁" +smile 😁 // Instead of anonymous nodes, nodes and properties can be wrapped // in "" for arbitrary node names. -"!@#$@$%Q#$%~@!40" "1.2.3" "!!!!!"=true +"!@#$@$%\\/()[]Q#$%~@!40" "1.2.3" "#null"=#true -// The following is a legal bare identifier: -foo123~!@#$%^&*.:'|?+ "weeee" +// Identifiers are very flexible. The following is a legal bare identifier: +<@foo123~!$%^&*.:'|?+> // And you can also use unicode! -ノード お名前="☜(゚ヮ゚☜)" +ノード お名前=☜(゚ヮ゚☜) // kdl specifically allows properties and values to be // interspersed with each other, much like CLI commands. 
-foo bar=true "baz" quux=false 1 2 3 +foo bar=#true baz quux=#false 1 2 3 ``` ## Design Principles From 817a7dc0abee95de98f19810bf0f0a9f9cf0080c Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Kat=20March=C3=A1n?= Date: Thu, 14 Dec 2023 19:12:24 -0800 Subject: [PATCH 034/105] fixes from review --- README.md | 2 +- SPEC.md | 22 +++++++++++++++------- 2 files changed, 16 insertions(+), 8 deletions(-) diff --git a/README.md b/README.md index 84a32d3..cbb8f03 100644 --- a/README.md +++ b/README.md @@ -249,7 +249,7 @@ specific meanings. ```kdl numbers (u8)10 (i32)20 myfloat=(f32)1.5 { - strings (uuid)123e4567-e89b-12d3-a456-426614174000 (date)"2021-02-03" filter=(regex)#"$\d+"# + strings (uuid)"123e4567-e89b-12d3-a456-426614174000" (date)"2021-02-03" filter=(regex)#"$\d+"# (author)person name=Alex } ``` diff --git a/SPEC.md b/SPEC.md index e51bd61..c460448 100644 --- a/SPEC.md +++ b/SPEC.md @@ -94,10 +94,9 @@ foo 1 key="val" 3 { ### Identifier An Identifier is either a [Bare Identifier](#bare-identifier), which is an -unquoted string like `node` or `item`, a [String](#string), or a [Raw -String](#raw-string). There's no semantic difference between the kinds of -identifier; this simply allows for the use of quotes to have unusual -identifiers that are inexpressible as bare identifiers. +unquoted string like `node` or `item`, a [String](#string), or a [Raw String](#raw-string). +There's no semantic difference between the kinds of identifier; this simply allows +for the use of quotes to have unusual identifiers that are inexpressible as bare identifiers. ### Bare Identifier @@ -335,7 +334,7 @@ specified with their corresponding `\u{}` escape. Strings may span multiple lines with literal Newlines, in which case the resulting String is "dedented" according to the line with the fewest number of Whitespace characters preceding the first non-Whitespace character. 
That is, -the number of Whitespace characters in the least-indented line in the String +the number of literal Whitespace characters in the least-indented line in the String body is subtracted from the Whitespace of all other lines. Multi-line strings _MUST_ have a single [Newline](#newline) immediately @@ -393,8 +392,8 @@ and all of that whitespace are discarded. For example, `"Hello World"` and `"Hello \ World"` are semantically identical. See [whitespace](#whitespace) and [newlines](#newlines) for how whitespace is defined. -Note that only literal whitespace is escaped; *escaped* whitespace is retained. -For example, these strings are all semantically identical: +Note that only literal whitespace is escaped; whitespace escapes (`\n` and +such) are retained. For example, these strings are all semantically identical: ```kdl "Hello\ \nWorld" @@ -437,8 +436,17 @@ unrepresentable when using Raw Strings. ```kdl just-escapes #"\n will be literal"# +``` + +The string contains the literal characters `\n will be literal`. + +```kdl quotes-and-escapes ##"hello\n\r\asd"#world"## +``` + +The string contains the literal characters `hello\n\r\asd"#world` +```kdl multi-line #" foo This is the base indentation From 9f061537c9c116e5511d451ba0b5fa68de036b26 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Kat=20March=C3=A1n?= Date: Thu, 14 Dec 2023 19:17:41 -0800 Subject: [PATCH 035/105] Add explicit attribution for logo --- README.md | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/README.md b/README.md index cbb8f03..f990669 100644 --- a/README.md +++ b/README.md @@ -404,3 +404,7 @@ microsyntax for losslessly encoding XML](XML-IN-KDL.md). This license applies to the text and assets _in this repository_. Implementations of this specification are not "derivative works", and thus are not bound by the restrictions of CC-BY-SA. 
+ +The KDL logo design and files were generously contributed by Timothy Merritt +([@timmybytes](https://github.com/timmybytes)), and are also available under +the same license. From 56f399bf71f1cdc1c87611dab2dff6a4d604464b Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Kat=20March=C3=A1n?= Date: Thu, 14 Dec 2023 19:25:12 -0800 Subject: [PATCH 036/105] Add \s to the list of escapes --- CHANGELOG.md | 2 ++ SPEC.md | 1 + 2 files changed, 3 insertions(+) diff --git a/CHANGELOG.md b/CHANGELOG.md index 07f7256..8ef579f 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -5,6 +5,8 @@ ### Grammar * Solidus/Forward slash (`/`) is no longer an escaped character. +* Space (`U+0020`) can now be written into quoted strings with the `\s` + escape. * Single line comments (`//`) can now be immediately followed by a newline. * All literal whitespace following a `\` in a string is now discarded. * Vertical tabs (`U+000B`) are now considered to be whitespace. diff --git a/SPEC.md b/SPEC.md index c460448..dc8055e 100644 --- a/SPEC.md +++ b/SPEC.md @@ -381,6 +381,7 @@ interpreted as described in the following table: | Quotation Mark (Double Quote) | `\"` | `U+0022` | | Backspace | `\b` | `U+0008` | | Form Feed | `\f` | `U+000C` | +| Space | `\s` | `U+0020` | | Unicode Escape | `\u{(1-6 hex chars)}` | Code point described by hex characters, as long as it represents a [Unicode Scalar Value](https://unicode.org/glossary/#unicode_scalar_value) | | Whitespace Escape | See below | N/A | From b51859edf36237068ecc0d8ac7e409b4a0d344f3 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Kat=20March=C3=A1n?= Date: Sat, 16 Dec 2023 14:44:32 -0800 Subject: [PATCH 037/105] update tests Fixes: https://github.com/kdl-org/kdl/issues/359 --- CHANGELOG.md | 3 --- tests/test_cases/expected_kdl/all_escapes.kdl | 2 +- tests/test_cases/expected_kdl/all_node_fields.kdl | 2 +- tests/test_cases/expected_kdl/arg_and_prop_same_name.kdl | 2 +- .../{input/bare_arg.kdl => expected_kdl/arg_bare.kdl} | 0 
tests/test_cases/expected_kdl/arg_false_type.kdl | 2 +- tests/test_cases/expected_kdl/arg_null_type.kdl | 2 +- tests/test_cases/expected_kdl/arg_raw_string_type.kdl | 2 +- tests/test_cases/expected_kdl/arg_string_type.kdl | 2 +- tests/test_cases/expected_kdl/arg_true_type.kdl | 2 +- tests/test_cases/expected_kdl/arg_type.kdl | 2 +- tests/test_cases/expected_kdl/bare_emoji.kdl | 2 +- tests/test_cases/expected_kdl/blank_prop_type.kdl | 2 +- tests/test_cases/expected_kdl/block_comment.kdl | 2 +- tests/test_cases/expected_kdl/block_comment_after_node.kdl | 2 +- tests/test_cases/expected_kdl/bom_initial.kdl | 1 + tests/test_cases/expected_kdl/boolean_arg.kdl | 2 +- tests/test_cases/expected_kdl/boolean_prop.kdl | 2 +- tests/test_cases/expected_kdl/chevrons_in_bare_id.kdl | 1 + tests/test_cases/expected_kdl/comma_in_bare_id.kdl | 1 + tests/test_cases/expected_kdl/comment_and_newline.kdl | 2 ++ tests/test_cases/expected_kdl/commented_arg.kdl | 2 +- tests/test_cases/expected_kdl/commented_child.kdl | 2 +- tests/test_cases/expected_kdl/commented_prop.kdl | 2 +- tests/test_cases/expected_kdl/dash_dash.kdl | 1 + tests/test_cases/expected_kdl/emoji.kdl | 2 +- tests/test_cases/expected_kdl/empty_quoted_node_id.kdl | 2 +- tests/test_cases/expected_kdl/escline_line_comment.kdl | 2 +- tests/test_cases/expected_kdl/multiline_comment.kdl | 2 +- tests/test_cases/expected_kdl/multiline_nodes.kdl | 2 +- tests/test_cases/expected_kdl/multiline_raw_string.kdl | 1 + tests/test_cases/expected_kdl/multiline_string.kdl | 2 +- tests/test_cases/expected_kdl/nested_block_comment.kdl | 2 +- tests/test_cases/expected_kdl/nested_comments.kdl | 2 +- .../expected_kdl/nested_multiline_block_comment.kdl | 2 +- tests/test_cases/expected_kdl/newlines_in_block_comment.kdl | 2 +- tests/test_cases/expected_kdl/node_false.kdl | 2 +- tests/test_cases/expected_kdl/node_true.kdl | 2 +- tests/test_cases/expected_kdl/null_prop.kdl | 2 +- tests/test_cases/expected_kdl/optional_child_semicolon.kdl | 5 +++++ 
tests/test_cases/expected_kdl/parse_all_arg_types.kdl | 2 +- tests/test_cases/expected_kdl/prop_false_type.kdl | 2 +- tests/test_cases/expected_kdl/prop_identifier_type.kdl | 2 ++ tests/test_cases/expected_kdl/prop_null_type.kdl | 2 +- tests/test_cases/expected_kdl/prop_raw_string_type.kdl | 2 +- tests/test_cases/expected_kdl/prop_string_type.kdl | 2 +- tests/test_cases/expected_kdl/prop_true_type.kdl | 2 +- tests/test_cases/expected_kdl/prop_type.kdl | 2 +- .../test_cases/expected_kdl/question_mark_before_number.kdl | 1 + tests/test_cases/expected_kdl/quoted_prop_name.kdl | 2 +- tests/test_cases/expected_kdl/quoted_prop_type.kdl | 2 +- tests/test_cases/expected_kdl/r_node.kdl | 2 +- tests/test_cases/expected_kdl/raw_arg_type.kdl | 2 +- tests/test_cases/expected_kdl/raw_prop_type.kdl | 2 +- tests/test_cases/expected_kdl/raw_string_arg.kdl | 5 ++--- tests/test_cases/expected_kdl/raw_string_prop.kdl | 5 ++--- tests/test_cases/expected_kdl/repeated_arg.kdl | 2 +- tests/test_cases/expected_kdl/same_args.kdl | 1 - tests/test_cases/expected_kdl/single_arg.kdl | 2 +- tests/test_cases/expected_kdl/single_prop.kdl | 2 +- .../expected_kdl/slashdash_arg_after_newline_esc.kdl | 2 +- tests/test_cases/expected_kdl/slashdash_prop.kdl | 2 +- tests/test_cases/expected_kdl/slashdash_repeated_prop.kdl | 2 +- tests/test_cases/expected_kdl/space_after_arg_type.kdl | 1 + tests/test_cases/expected_kdl/space_after_node_type.kdl | 1 + tests/test_cases/expected_kdl/space_after_prop_type.kdl | 1 + tests/test_cases/expected_kdl/space_around_prop_marker.kdl | 1 + tests/test_cases/expected_kdl/space_in_arg_type.kdl | 1 + tests/test_cases/expected_kdl/space_in_node_type.kdl | 1 + tests/test_cases/expected_kdl/space_in_prop_type.kdl | 1 + tests/test_cases/expected_kdl/string_arg.kdl | 2 +- .../expected_kdl/string_escaped_literal_whitespace.kdl | 1 + tests/test_cases/expected_kdl/string_prop.kdl | 2 +- tests/test_cases/expected_kdl/underscore_before_number.kdl | 1 + 
.../expected_kdl/unusual_bare_id_chars_in_quoted_id.kdl | 2 +- tests/test_cases/expected_kdl/unusual_chars_in_bare_id.kdl | 2 +- tests/test_cases/expected_kdl/vertical_tab_whitespace.kdl | 1 + tests/test_cases/input/all_escapes.kdl | 2 +- tests/test_cases/input/all_node_fields.kdl | 6 +++--- tests/test_cases/input/arg_and_prop_same_name.kdl | 2 +- tests/test_cases/input/arg_bare.kdl | 1 + tests/test_cases/input/arg_false_type.kdl | 2 +- tests/test_cases/input/arg_null_type.kdl | 2 +- tests/test_cases/input/arg_raw_string_type.kdl | 2 +- tests/test_cases/input/arg_string_type.kdl | 2 +- tests/test_cases/input/arg_true_type.kdl | 2 +- tests/test_cases/input/arg_type.kdl | 2 +- tests/test_cases/input/backslash_in_bare_id.kdl | 1 - tests/test_cases/input/bare_emoji.kdl | 2 +- tests/test_cases/input/blank_prop_type.kdl | 2 +- tests/test_cases/input/block_comment.kdl | 2 +- tests/test_cases/input/block_comment_after_node.kdl | 2 +- tests/test_cases/input/bom_initial.kdl | 1 + tests/test_cases/input/bom_later.kdl | 1 + tests/test_cases/input/boolean_arg.kdl | 2 +- tests/test_cases/input/boolean_prop.kdl | 2 +- tests/test_cases/input/brackets_in_bare_id.kdl | 2 +- tests/test_cases/input/chevrons_in_bare_id.kdl | 2 +- tests/test_cases/input/comma_in_bare_id.kdl | 2 +- tests/test_cases/input/comment_and_newline.kdl | 2 ++ tests/test_cases/input/commented_arg.kdl | 2 +- tests/test_cases/input/commented_child.kdl | 4 ++-- tests/test_cases/input/commented_prop.kdl | 2 +- tests/test_cases/input/crlf_between_nodes.kdl | 4 ++-- tests/test_cases/input/emoji.kdl | 2 +- tests/test_cases/input/empty_prop_type.kdl | 2 +- tests/test_cases/input/empty_quoted_node_id.kdl | 2 +- tests/test_cases/input/empty_quoted_prop_key.kdl | 2 +- tests/test_cases/input/eof_after_escape.kdl | 1 + tests/test_cases/input/err_backslash_in_bare_id.kdl | 1 + tests/test_cases/input/escline.kdl | 2 +- tests/test_cases/input/escline_comment_node.kdl | 3 --- tests/test_cases/input/escline_line_comment.kdl | 5 
++--- tests/test_cases/input/hash_in_id.kdl | 1 + tests/test_cases/input/just_space_in_prop_type.kdl | 2 +- tests/test_cases/input/multiline_comment.kdl | 2 +- tests/test_cases/input/multiline_nodes.kdl | 4 ++-- tests/test_cases/input/multiline_raw_string.kdl | 5 +++++ tests/test_cases/input/multiline_string.kdl | 5 +++-- tests/test_cases/input/nested_block_comment.kdl | 2 +- tests/test_cases/input/nested_comments.kdl | 2 +- tests/test_cases/input/nested_multiline_block_comment.kdl | 3 +-- tests/test_cases/input/newlines_in_block_comment.kdl | 2 +- tests/test_cases/input/node_false.kdl | 2 +- tests/test_cases/input/node_true.kdl | 2 +- tests/test_cases/input/null_arg.kdl | 2 +- tests/test_cases/input/null_prop.kdl | 2 +- tests/test_cases/input/only_line_comment_crlf.kdl | 2 +- tests/test_cases/input/optional_child_semicolon.kdl | 1 + tests/test_cases/input/parens_in_bare_id.kdl | 2 +- tests/test_cases/input/parse_all_arg_types.kdl | 2 +- tests/test_cases/input/prop_false_type.kdl | 2 +- tests/test_cases/input/prop_identifier_type.kdl | 2 ++ tests/test_cases/input/prop_null_type.kdl | 2 +- tests/test_cases/input/prop_raw_string_type.kdl | 2 +- tests/test_cases/input/prop_true_type.kdl | 2 +- tests/test_cases/input/prop_type.kdl | 2 +- tests/test_cases/input/question_mark_at_start_of_int.kdl | 1 - tests/test_cases/input/quote_in_bare_id.kdl | 2 +- tests/test_cases/input/quoted_prop_name.kdl | 2 +- tests/test_cases/input/quoted_prop_type.kdl | 2 +- tests/test_cases/input/raw_arg_type.kdl | 2 +- tests/test_cases/input/raw_node_name.kdl | 2 +- tests/test_cases/input/raw_prop_type.kdl | 2 +- tests/test_cases/input/raw_string_arg.kdl | 5 ++--- tests/test_cases/input/raw_string_backslash.kdl | 2 +- tests/test_cases/input/raw_string_hash_no_esc.kdl | 2 +- tests/test_cases/input/raw_string_just_backslash.kdl | 2 +- tests/test_cases/input/raw_string_just_quote.kdl | 2 +- tests/test_cases/input/raw_string_multiple_hash.kdl | 2 +- tests/test_cases/input/raw_string_newline.kdl | 
4 ++-- tests/test_cases/input/raw_string_prop.kdl | 5 ++--- tests/test_cases/input/raw_string_quote.kdl | 2 +- tests/test_cases/input/repeated_arg.kdl | 2 +- tests/test_cases/input/same_args.kdl | 1 - tests/test_cases/input/single_arg.kdl | 2 +- tests/test_cases/input/single_prop.kdl | 2 +- tests/test_cases/input/slash_in_bare_id.kdl | 2 +- tests/test_cases/input/slashdash_arg_after_newline_esc.kdl | 2 +- tests/test_cases/input/slashdash_arg_before_newline_esc.kdl | 2 +- tests/test_cases/input/slashdash_full_node.kdl | 4 ++-- tests/test_cases/input/slashdash_prop.kdl | 2 +- tests/test_cases/input/slashdash_raw_prop_key.kdl | 2 +- tests/test_cases/input/slashdash_repeated_prop.kdl | 2 +- tests/test_cases/input/space_after_prop_type.kdl | 2 +- tests/test_cases/input/space_around_prop_marker.kdl | 1 + tests/test_cases/input/space_in_arg_type.kdl | 2 +- tests/test_cases/input/space_in_prop_type.kdl | 2 +- tests/test_cases/input/square_bracket_in_bare_id.kdl | 2 +- .../test_cases/input/string_escaped_literal_whitespace.kdl | 2 ++ tests/test_cases/input/trailing_crlf.kdl | 2 +- tests/test_cases/input/unbalanced_raw_hashes.kdl | 2 +- tests/test_cases/input/underscore_at_start_of_int.kdl | 1 - tests/test_cases/input/unicode_delete.kdl | 2 ++ tests/test_cases/input/unicode_fsi.kdl | 2 ++ tests/test_cases/input/unicode_lre.kdl | 2 ++ tests/test_cases/input/unicode_lri.kdl | 2 ++ tests/test_cases/input/unicode_lrm.kdl | 2 ++ tests/test_cases/input/unicode_lro.kdl | 2 ++ tests/test_cases/input/unicode_pdf.kdl | 2 ++ tests/test_cases/input/unicode_pdi.kdl | 2 ++ tests/test_cases/input/unicode_rle.kdl | 2 ++ tests/test_cases/input/unicode_rli.kdl | 2 ++ tests/test_cases/input/unicode_rlm.kdl | 2 ++ tests/test_cases/input/unicode_rlo.kdl | 2 ++ tests/test_cases/input/unicode_scalar_high.kdl | 2 ++ tests/test_cases/input/unicode_scalar_low.kdl | 2 ++ tests/test_cases/input/unicode_under_0x20.kdl | 2 ++ .../test_cases/input/unusual_bare_id_chars_in_quoted_id.kdl | 2 +- 
tests/test_cases/input/unusual_chars_in_bare_id.kdl | 2 +- tests/test_cases/input/vertical_tab_whitespace.kdl | 1 + 191 files changed, 225 insertions(+), 166 deletions(-) rename tests/test_cases/{input/bare_arg.kdl => expected_kdl/arg_bare.kdl} (100%) create mode 100644 tests/test_cases/expected_kdl/bom_initial.kdl create mode 100644 tests/test_cases/expected_kdl/chevrons_in_bare_id.kdl create mode 100644 tests/test_cases/expected_kdl/comma_in_bare_id.kdl create mode 100644 tests/test_cases/expected_kdl/comment_and_newline.kdl create mode 100644 tests/test_cases/expected_kdl/dash_dash.kdl create mode 100644 tests/test_cases/expected_kdl/multiline_raw_string.kdl create mode 100644 tests/test_cases/expected_kdl/optional_child_semicolon.kdl create mode 100644 tests/test_cases/expected_kdl/prop_identifier_type.kdl create mode 100644 tests/test_cases/expected_kdl/question_mark_before_number.kdl delete mode 100644 tests/test_cases/expected_kdl/same_args.kdl create mode 100644 tests/test_cases/expected_kdl/space_after_arg_type.kdl create mode 100644 tests/test_cases/expected_kdl/space_after_node_type.kdl create mode 100644 tests/test_cases/expected_kdl/space_after_prop_type.kdl create mode 100644 tests/test_cases/expected_kdl/space_around_prop_marker.kdl create mode 100644 tests/test_cases/expected_kdl/space_in_arg_type.kdl create mode 100644 tests/test_cases/expected_kdl/space_in_node_type.kdl create mode 100644 tests/test_cases/expected_kdl/space_in_prop_type.kdl create mode 100644 tests/test_cases/expected_kdl/string_escaped_literal_whitespace.kdl create mode 100644 tests/test_cases/expected_kdl/underscore_before_number.kdl create mode 100644 tests/test_cases/expected_kdl/vertical_tab_whitespace.kdl create mode 100644 tests/test_cases/input/arg_bare.kdl delete mode 100644 tests/test_cases/input/backslash_in_bare_id.kdl create mode 100644 tests/test_cases/input/bom_initial.kdl create mode 100644 tests/test_cases/input/bom_later.kdl create mode 100644 
tests/test_cases/input/comment_and_newline.kdl create mode 100644 tests/test_cases/input/eof_after_escape.kdl create mode 100644 tests/test_cases/input/err_backslash_in_bare_id.kdl delete mode 100644 tests/test_cases/input/escline_comment_node.kdl create mode 100644 tests/test_cases/input/hash_in_id.kdl create mode 100644 tests/test_cases/input/multiline_raw_string.kdl create mode 100644 tests/test_cases/input/optional_child_semicolon.kdl create mode 100644 tests/test_cases/input/prop_identifier_type.kdl delete mode 100644 tests/test_cases/input/question_mark_at_start_of_int.kdl delete mode 100644 tests/test_cases/input/same_args.kdl create mode 100644 tests/test_cases/input/space_around_prop_marker.kdl create mode 100644 tests/test_cases/input/string_escaped_literal_whitespace.kdl delete mode 100644 tests/test_cases/input/underscore_at_start_of_int.kdl create mode 100644 tests/test_cases/input/unicode_delete.kdl create mode 100644 tests/test_cases/input/unicode_fsi.kdl create mode 100644 tests/test_cases/input/unicode_lre.kdl create mode 100644 tests/test_cases/input/unicode_lri.kdl create mode 100644 tests/test_cases/input/unicode_lrm.kdl create mode 100644 tests/test_cases/input/unicode_lro.kdl create mode 100644 tests/test_cases/input/unicode_pdf.kdl create mode 100644 tests/test_cases/input/unicode_pdi.kdl create mode 100644 tests/test_cases/input/unicode_rle.kdl create mode 100644 tests/test_cases/input/unicode_rli.kdl create mode 100644 tests/test_cases/input/unicode_rlm.kdl create mode 100644 tests/test_cases/input/unicode_rlo.kdl create mode 100644 tests/test_cases/input/unicode_scalar_high.kdl create mode 100644 tests/test_cases/input/unicode_scalar_low.kdl create mode 100644 tests/test_cases/input/unicode_under_0x20.kdl create mode 100644 tests/test_cases/input/vertical_tab_whitespace.kdl diff --git a/CHANGELOG.md b/CHANGELOG.md index 8ef579f..211e461 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -10,9 +10,6 @@ * Single line comments (`//`) can now be 
immediately followed by a newline. * All literal whitespace following a `\` in a string is now discarded. * Vertical tabs (`U+000B`) are now considered to be whitespace. -* Identifiers can't start with `r#`, so they're easy to distinguish from raw - strings. (They already similarly can't start with a digit, or a sign+digit, - so they're easy to distinguish from numbers.) * The grammar syntax itself has been described, and some confusing definitions in the grammar have been fixed accordingly (mostly related to escaped characters). diff --git a/tests/test_cases/expected_kdl/all_escapes.kdl b/tests/test_cases/expected_kdl/all_escapes.kdl index 024cda2..5c49748 100644 --- a/tests/test_cases/expected_kdl/all_escapes.kdl +++ b/tests/test_cases/expected_kdl/all_escapes.kdl @@ -1 +1 @@ -node "\"\\\b\f\n\r\t" +node "\"\\\b\f\n\r\t\s" diff --git a/tests/test_cases/expected_kdl/all_node_fields.kdl b/tests/test_cases/expected_kdl/all_node_fields.kdl index fc8a9e4..9f4ceb5 100644 --- a/tests/test_cases/expected_kdl/all_node_fields.kdl +++ b/tests/test_cases/expected_kdl/all_node_fields.kdl @@ -1,3 +1,3 @@ -node "arg" prop="val" { +node arg prop=val { inner_node } diff --git a/tests/test_cases/expected_kdl/arg_and_prop_same_name.kdl b/tests/test_cases/expected_kdl/arg_and_prop_same_name.kdl index 27d9739..ee5ace5 100644 --- a/tests/test_cases/expected_kdl/arg_and_prop_same_name.kdl +++ b/tests/test_cases/expected_kdl/arg_and_prop_same_name.kdl @@ -1 +1 @@ -node "arg" arg="val" +node arg arg=val diff --git a/tests/test_cases/input/bare_arg.kdl b/tests/test_cases/expected_kdl/arg_bare.kdl similarity index 100% rename from tests/test_cases/input/bare_arg.kdl rename to tests/test_cases/expected_kdl/arg_bare.kdl diff --git a/tests/test_cases/expected_kdl/arg_false_type.kdl b/tests/test_cases/expected_kdl/arg_false_type.kdl index 895945d..92003d9 100644 --- a/tests/test_cases/expected_kdl/arg_false_type.kdl +++ b/tests/test_cases/expected_kdl/arg_false_type.kdl @@ -1 +1 @@ -node 
(type)false +node (type)#false diff --git a/tests/test_cases/expected_kdl/arg_null_type.kdl b/tests/test_cases/expected_kdl/arg_null_type.kdl index 476c5cd..cd66101 100644 --- a/tests/test_cases/expected_kdl/arg_null_type.kdl +++ b/tests/test_cases/expected_kdl/arg_null_type.kdl @@ -1 +1 @@ -node (type)null +node (type)#null diff --git a/tests/test_cases/expected_kdl/arg_raw_string_type.kdl b/tests/test_cases/expected_kdl/arg_raw_string_type.kdl index 2808d53..a4859b6 100644 --- a/tests/test_cases/expected_kdl/arg_raw_string_type.kdl +++ b/tests/test_cases/expected_kdl/arg_raw_string_type.kdl @@ -1 +1 @@ -node (type)"str" +node (type)str diff --git a/tests/test_cases/expected_kdl/arg_string_type.kdl b/tests/test_cases/expected_kdl/arg_string_type.kdl index 2808d53..a4859b6 100644 --- a/tests/test_cases/expected_kdl/arg_string_type.kdl +++ b/tests/test_cases/expected_kdl/arg_string_type.kdl @@ -1 +1 @@ -node (type)"str" +node (type)str diff --git a/tests/test_cases/expected_kdl/arg_true_type.kdl b/tests/test_cases/expected_kdl/arg_true_type.kdl index 6d1f9bc..20243a3 100644 --- a/tests/test_cases/expected_kdl/arg_true_type.kdl +++ b/tests/test_cases/expected_kdl/arg_true_type.kdl @@ -1 +1 @@ -node (type)true +node (type)#true diff --git a/tests/test_cases/expected_kdl/arg_type.kdl b/tests/test_cases/expected_kdl/arg_type.kdl index a0b84cf..79a093d 100644 --- a/tests/test_cases/expected_kdl/arg_type.kdl +++ b/tests/test_cases/expected_kdl/arg_type.kdl @@ -1 +1 @@ -node (type)"arg" +node (type)arg diff --git a/tests/test_cases/expected_kdl/bare_emoji.kdl b/tests/test_cases/expected_kdl/bare_emoji.kdl index 60707c8..c67d0b9 100644 --- a/tests/test_cases/expected_kdl/bare_emoji.kdl +++ b/tests/test_cases/expected_kdl/bare_emoji.kdl @@ -1 +1 @@ -😁 "happy!" +😁 happy! 
diff --git a/tests/test_cases/expected_kdl/blank_prop_type.kdl b/tests/test_cases/expected_kdl/blank_prop_type.kdl index c7b0e31..e00c6d2 100644 --- a/tests/test_cases/expected_kdl/blank_prop_type.kdl +++ b/tests/test_cases/expected_kdl/blank_prop_type.kdl @@ -1 +1 @@ -node key=("")true +node key=("")#true diff --git a/tests/test_cases/expected_kdl/block_comment.kdl b/tests/test_cases/expected_kdl/block_comment.kdl index b3a0426..1b3db2c 100644 --- a/tests/test_cases/expected_kdl/block_comment.kdl +++ b/tests/test_cases/expected_kdl/block_comment.kdl @@ -1 +1 @@ -node "arg" +node arg diff --git a/tests/test_cases/expected_kdl/block_comment_after_node.kdl b/tests/test_cases/expected_kdl/block_comment_after_node.kdl index b3a0426..1b3db2c 100644 --- a/tests/test_cases/expected_kdl/block_comment_after_node.kdl +++ b/tests/test_cases/expected_kdl/block_comment_after_node.kdl @@ -1 +1 @@ -node "arg" +node arg diff --git a/tests/test_cases/expected_kdl/bom_initial.kdl b/tests/test_cases/expected_kdl/bom_initial.kdl new file mode 100644 index 0000000..1b3db2c --- /dev/null +++ b/tests/test_cases/expected_kdl/bom_initial.kdl @@ -0,0 +1 @@ +node arg diff --git a/tests/test_cases/expected_kdl/boolean_arg.kdl b/tests/test_cases/expected_kdl/boolean_arg.kdl index 9c7928e..e0cdf1a 100644 --- a/tests/test_cases/expected_kdl/boolean_arg.kdl +++ b/tests/test_cases/expected_kdl/boolean_arg.kdl @@ -1 +1 @@ -node false true +node #false #true diff --git a/tests/test_cases/expected_kdl/boolean_prop.kdl b/tests/test_cases/expected_kdl/boolean_prop.kdl index 712b60b..f89da9b 100644 --- a/tests/test_cases/expected_kdl/boolean_prop.kdl +++ b/tests/test_cases/expected_kdl/boolean_prop.kdl @@ -1 +1 @@ -node prop1=true prop2=false +node prop1=#true prop2=#false diff --git a/tests/test_cases/expected_kdl/chevrons_in_bare_id.kdl b/tests/test_cases/expected_kdl/chevrons_in_bare_id.kdl new file mode 100644 index 0000000..58b2436 --- /dev/null +++ 
b/tests/test_cases/expected_kdl/chevrons_in_bare_id.kdl @@ -0,0 +1 @@ +foo123foo weeee diff --git a/tests/test_cases/expected_kdl/comma_in_bare_id.kdl b/tests/test_cases/expected_kdl/comma_in_bare_id.kdl new file mode 100644 index 0000000..86c78fd --- /dev/null +++ b/tests/test_cases/expected_kdl/comma_in_bare_id.kdl @@ -0,0 +1 @@ +foo123,bar weeee diff --git a/tests/test_cases/expected_kdl/comment_and_newline.kdl b/tests/test_cases/expected_kdl/comment_and_newline.kdl new file mode 100644 index 0000000..1c5b5f3 --- /dev/null +++ b/tests/test_cases/expected_kdl/comment_and_newline.kdl @@ -0,0 +1,2 @@ +node1 +node2 diff --git a/tests/test_cases/expected_kdl/commented_arg.kdl b/tests/test_cases/expected_kdl/commented_arg.kdl index 226fd56..2e98005 100644 --- a/tests/test_cases/expected_kdl/commented_arg.kdl +++ b/tests/test_cases/expected_kdl/commented_arg.kdl @@ -1 +1 @@ -node "arg2" +node arg2 diff --git a/tests/test_cases/expected_kdl/commented_child.kdl b/tests/test_cases/expected_kdl/commented_child.kdl index b3a0426..1b3db2c 100644 --- a/tests/test_cases/expected_kdl/commented_child.kdl +++ b/tests/test_cases/expected_kdl/commented_child.kdl @@ -1 +1 @@ -node "arg" +node arg diff --git a/tests/test_cases/expected_kdl/commented_prop.kdl b/tests/test_cases/expected_kdl/commented_prop.kdl index b3a0426..1b3db2c 100644 --- a/tests/test_cases/expected_kdl/commented_prop.kdl +++ b/tests/test_cases/expected_kdl/commented_prop.kdl @@ -1 +1 @@ -node "arg" +node arg diff --git a/tests/test_cases/expected_kdl/dash_dash.kdl b/tests/test_cases/expected_kdl/dash_dash.kdl new file mode 100644 index 0000000..759ddc5 --- /dev/null +++ b/tests/test_cases/expected_kdl/dash_dash.kdl @@ -0,0 +1 @@ +node -- \ No newline at end of file diff --git a/tests/test_cases/expected_kdl/emoji.kdl b/tests/test_cases/expected_kdl/emoji.kdl index 3ed56e2..88df78a 100644 --- a/tests/test_cases/expected_kdl/emoji.kdl +++ b/tests/test_cases/expected_kdl/emoji.kdl @@ -1 +1 @@ -node "😀" +node 😀 diff 
--git a/tests/test_cases/expected_kdl/empty_quoted_node_id.kdl b/tests/test_cases/expected_kdl/empty_quoted_node_id.kdl index ebfa893..94694bc 100644 --- a/tests/test_cases/expected_kdl/empty_quoted_node_id.kdl +++ b/tests/test_cases/expected_kdl/empty_quoted_node_id.kdl @@ -1 +1 @@ -"" "arg" +"" arg diff --git a/tests/test_cases/expected_kdl/escline_line_comment.kdl b/tests/test_cases/expected_kdl/escline_line_comment.kdl index 8a5dc33..4d38bee 100644 --- a/tests/test_cases/expected_kdl/escline_line_comment.kdl +++ b/tests/test_cases/expected_kdl/escline_line_comment.kdl @@ -1 +1 @@ -node "arg" "arg2\n" +node arg arg2 diff --git a/tests/test_cases/expected_kdl/multiline_comment.kdl b/tests/test_cases/expected_kdl/multiline_comment.kdl index b3a0426..1b3db2c 100644 --- a/tests/test_cases/expected_kdl/multiline_comment.kdl +++ b/tests/test_cases/expected_kdl/multiline_comment.kdl @@ -1 +1 @@ -node "arg" +node arg diff --git a/tests/test_cases/expected_kdl/multiline_nodes.kdl b/tests/test_cases/expected_kdl/multiline_nodes.kdl index bec6d05..7c27fb0 100644 --- a/tests/test_cases/expected_kdl/multiline_nodes.kdl +++ b/tests/test_cases/expected_kdl/multiline_nodes.kdl @@ -1 +1 @@ -node "arg1" "arg2" +node arg1 arg2 diff --git a/tests/test_cases/expected_kdl/multiline_raw_string.kdl b/tests/test_cases/expected_kdl/multiline_raw_string.kdl new file mode 100644 index 0000000..2bafe90 --- /dev/null +++ b/tests/test_cases/expected_kdl/multiline_raw_string.kdl @@ -0,0 +1 @@ +node "\nhey\neveryone\nhow goes?\n" diff --git a/tests/test_cases/expected_kdl/multiline_string.kdl b/tests/test_cases/expected_kdl/multiline_string.kdl index 021493e..2bafe90 100644 --- a/tests/test_cases/expected_kdl/multiline_string.kdl +++ b/tests/test_cases/expected_kdl/multiline_string.kdl @@ -1 +1 @@ -node " hey\neveryone\nhow goes?\n" +node "\nhey\neveryone\nhow goes?\n" diff --git a/tests/test_cases/expected_kdl/nested_block_comment.kdl b/tests/test_cases/expected_kdl/nested_block_comment.kdl 
index b3a0426..1b3db2c 100644 --- a/tests/test_cases/expected_kdl/nested_block_comment.kdl +++ b/tests/test_cases/expected_kdl/nested_block_comment.kdl @@ -1 +1 @@ -node "arg" +node arg diff --git a/tests/test_cases/expected_kdl/nested_comments.kdl b/tests/test_cases/expected_kdl/nested_comments.kdl index b3a0426..1b3db2c 100644 --- a/tests/test_cases/expected_kdl/nested_comments.kdl +++ b/tests/test_cases/expected_kdl/nested_comments.kdl @@ -1 +1 @@ -node "arg" +node arg diff --git a/tests/test_cases/expected_kdl/nested_multiline_block_comment.kdl b/tests/test_cases/expected_kdl/nested_multiline_block_comment.kdl index b3a0426..1b3db2c 100644 --- a/tests/test_cases/expected_kdl/nested_multiline_block_comment.kdl +++ b/tests/test_cases/expected_kdl/nested_multiline_block_comment.kdl @@ -1 +1 @@ -node "arg" +node arg diff --git a/tests/test_cases/expected_kdl/newlines_in_block_comment.kdl b/tests/test_cases/expected_kdl/newlines_in_block_comment.kdl index b3a0426..1b3db2c 100644 --- a/tests/test_cases/expected_kdl/newlines_in_block_comment.kdl +++ b/tests/test_cases/expected_kdl/newlines_in_block_comment.kdl @@ -1 +1 @@ -node "arg" +node arg diff --git a/tests/test_cases/expected_kdl/node_false.kdl b/tests/test_cases/expected_kdl/node_false.kdl index ef60c44..3bab782 100644 --- a/tests/test_cases/expected_kdl/node_false.kdl +++ b/tests/test_cases/expected_kdl/node_false.kdl @@ -1 +1 @@ -node false +node #false diff --git a/tests/test_cases/expected_kdl/node_true.kdl b/tests/test_cases/expected_kdl/node_true.kdl index 4b02a06..de00dcd 100644 --- a/tests/test_cases/expected_kdl/node_true.kdl +++ b/tests/test_cases/expected_kdl/node_true.kdl @@ -1 +1 @@ -node true +node #true diff --git a/tests/test_cases/expected_kdl/null_prop.kdl b/tests/test_cases/expected_kdl/null_prop.kdl index 85ef005..c463e98 100644 --- a/tests/test_cases/expected_kdl/null_prop.kdl +++ b/tests/test_cases/expected_kdl/null_prop.kdl @@ -1 +1 @@ -node prop=null +node prop=#null diff --git 
a/tests/test_cases/expected_kdl/optional_child_semicolon.kdl b/tests/test_cases/expected_kdl/optional_child_semicolon.kdl new file mode 100644 index 0000000..25eaa7d --- /dev/null +++ b/tests/test_cases/expected_kdl/optional_child_semicolon.kdl @@ -0,0 +1,5 @@ +node { + foo + bar + baz +} diff --git a/tests/test_cases/expected_kdl/parse_all_arg_types.kdl b/tests/test_cases/expected_kdl/parse_all_arg_types.kdl index 2e8552c..773df95 100644 --- a/tests/test_cases/expected_kdl/parse_all_arg_types.kdl +++ b/tests/test_cases/expected_kdl/parse_all_arg_types.kdl @@ -1 +1 @@ -node 1 1.0 1.0E+10 1.0E-10 1 7 2 "arg" "arg\\\\" true false null +node 1 1.0 1.0E+10 1.0E-10 1 7 2 arg arg "arg\\" #true #false #null diff --git a/tests/test_cases/expected_kdl/prop_false_type.kdl b/tests/test_cases/expected_kdl/prop_false_type.kdl index 3377323..eb544ef 100644 --- a/tests/test_cases/expected_kdl/prop_false_type.kdl +++ b/tests/test_cases/expected_kdl/prop_false_type.kdl @@ -1 +1 @@ -node key=(type)false +node key=(type)#false diff --git a/tests/test_cases/expected_kdl/prop_identifier_type.kdl b/tests/test_cases/expected_kdl/prop_identifier_type.kdl new file mode 100644 index 0000000..bf1b9a7 --- /dev/null +++ b/tests/test_cases/expected_kdl/prop_identifier_type.kdl @@ -0,0 +1,2 @@ +node key=(type)str + diff --git a/tests/test_cases/expected_kdl/prop_null_type.kdl b/tests/test_cases/expected_kdl/prop_null_type.kdl index bafaddc..1c25b6f 100644 --- a/tests/test_cases/expected_kdl/prop_null_type.kdl +++ b/tests/test_cases/expected_kdl/prop_null_type.kdl @@ -1 +1 @@ -node key=(type)null +node key=(type)#null diff --git a/tests/test_cases/expected_kdl/prop_raw_string_type.kdl b/tests/test_cases/expected_kdl/prop_raw_string_type.kdl index 50e2d2c..7df052b 100644 --- a/tests/test_cases/expected_kdl/prop_raw_string_type.kdl +++ b/tests/test_cases/expected_kdl/prop_raw_string_type.kdl @@ -1 +1 @@ -node key=(type)"str" +node key=(type)str diff --git 
a/tests/test_cases/expected_kdl/prop_string_type.kdl b/tests/test_cases/expected_kdl/prop_string_type.kdl index 50e2d2c..7df052b 100644 --- a/tests/test_cases/expected_kdl/prop_string_type.kdl +++ b/tests/test_cases/expected_kdl/prop_string_type.kdl @@ -1 +1 @@ -node key=(type)"str" +node key=(type)str diff --git a/tests/test_cases/expected_kdl/prop_true_type.kdl b/tests/test_cases/expected_kdl/prop_true_type.kdl index c4eebb6..01404b8 100644 --- a/tests/test_cases/expected_kdl/prop_true_type.kdl +++ b/tests/test_cases/expected_kdl/prop_true_type.kdl @@ -1 +1 @@ -node key=(type)true +node key=(type)#true diff --git a/tests/test_cases/expected_kdl/prop_type.kdl b/tests/test_cases/expected_kdl/prop_type.kdl index c4eebb6..01404b8 100644 --- a/tests/test_cases/expected_kdl/prop_type.kdl +++ b/tests/test_cases/expected_kdl/prop_type.kdl @@ -1 +1 @@ -node key=(type)true +node key=(type)#true diff --git a/tests/test_cases/expected_kdl/question_mark_before_number.kdl b/tests/test_cases/expected_kdl/question_mark_before_number.kdl new file mode 100644 index 0000000..532ef22 --- /dev/null +++ b/tests/test_cases/expected_kdl/question_mark_before_number.kdl @@ -0,0 +1 @@ +node ?15 \ No newline at end of file diff --git a/tests/test_cases/expected_kdl/quoted_prop_name.kdl b/tests/test_cases/expected_kdl/quoted_prop_name.kdl index 170a05a..8ee5e08 100644 --- a/tests/test_cases/expected_kdl/quoted_prop_name.kdl +++ b/tests/test_cases/expected_kdl/quoted_prop_name.kdl @@ -1 +1 @@ -node "0prop"="val" +node "0prop"=val diff --git a/tests/test_cases/expected_kdl/quoted_prop_type.kdl b/tests/test_cases/expected_kdl/quoted_prop_type.kdl index 0e2b920..beca5f2 100644 --- a/tests/test_cases/expected_kdl/quoted_prop_type.kdl +++ b/tests/test_cases/expected_kdl/quoted_prop_type.kdl @@ -1 +1 @@ -node key=("type/")true +node key=("type/")#true diff --git a/tests/test_cases/expected_kdl/r_node.kdl b/tests/test_cases/expected_kdl/r_node.kdl index 4a98807..282cc04 100644 --- 
a/tests/test_cases/expected_kdl/r_node.kdl +++ b/tests/test_cases/expected_kdl/r_node.kdl @@ -1 +1 @@ -r "arg" +r arg diff --git a/tests/test_cases/expected_kdl/raw_arg_type.kdl b/tests/test_cases/expected_kdl/raw_arg_type.kdl index 6d1f9bc..20243a3 100644 --- a/tests/test_cases/expected_kdl/raw_arg_type.kdl +++ b/tests/test_cases/expected_kdl/raw_arg_type.kdl @@ -1 +1 @@ -node (type)true +node (type)#true diff --git a/tests/test_cases/expected_kdl/raw_prop_type.kdl b/tests/test_cases/expected_kdl/raw_prop_type.kdl index c4eebb6..01404b8 100644 --- a/tests/test_cases/expected_kdl/raw_prop_type.kdl +++ b/tests/test_cases/expected_kdl/raw_prop_type.kdl @@ -1 +1 @@ -node key=(type)true +node key=(type)#true diff --git a/tests/test_cases/expected_kdl/raw_string_arg.kdl b/tests/test_cases/expected_kdl/raw_string_arg.kdl index a909993..24f8d65 100644 --- a/tests/test_cases/expected_kdl/raw_string_arg.kdl +++ b/tests/test_cases/expected_kdl/raw_string_arg.kdl @@ -1,3 +1,2 @@ -node_1 "arg\\n" -node_2 "\"arg\\n\"and stuff" -node_3 "#\"arg\\n\"#and stuff" +node_1 "\"arg\\n\"and #stuff" +node_2 "#\"arg\\n\"#and #stuff" diff --git a/tests/test_cases/expected_kdl/raw_string_prop.kdl b/tests/test_cases/expected_kdl/raw_string_prop.kdl index 0762d88..6a1b5ee 100644 --- a/tests/test_cases/expected_kdl/raw_string_prop.kdl +++ b/tests/test_cases/expected_kdl/raw_string_prop.kdl @@ -1,3 +1,2 @@ -node_1 prop="arg\\n" -node_2 prop="\"arg\"\\n" -node_3 prop="#\"arg\"#\\n" +node_1 prop="\"arg#\"\\n" +node_2 prop="#\"arg#\"#\\n" diff --git a/tests/test_cases/expected_kdl/repeated_arg.kdl b/tests/test_cases/expected_kdl/repeated_arg.kdl index 849fee0..6525757 100644 --- a/tests/test_cases/expected_kdl/repeated_arg.kdl +++ b/tests/test_cases/expected_kdl/repeated_arg.kdl @@ -1 +1 @@ -node "arg" "arg" +node arg arg diff --git a/tests/test_cases/expected_kdl/same_args.kdl b/tests/test_cases/expected_kdl/same_args.kdl deleted file mode 100644 index 6b8ae13..0000000 --- 
a/tests/test_cases/expected_kdl/same_args.kdl +++ /dev/null @@ -1 +0,0 @@ -node "whee" "whee" diff --git a/tests/test_cases/expected_kdl/single_arg.kdl b/tests/test_cases/expected_kdl/single_arg.kdl index b3a0426..1b3db2c 100644 --- a/tests/test_cases/expected_kdl/single_arg.kdl +++ b/tests/test_cases/expected_kdl/single_arg.kdl @@ -1 +1 @@ -node "arg" +node arg diff --git a/tests/test_cases/expected_kdl/single_prop.kdl b/tests/test_cases/expected_kdl/single_prop.kdl index a0d0062..282aa3b 100644 --- a/tests/test_cases/expected_kdl/single_prop.kdl +++ b/tests/test_cases/expected_kdl/single_prop.kdl @@ -1 +1 @@ -node prop="val" +node prop=val diff --git a/tests/test_cases/expected_kdl/slashdash_arg_after_newline_esc.kdl b/tests/test_cases/expected_kdl/slashdash_arg_after_newline_esc.kdl index 226fd56..2e98005 100644 --- a/tests/test_cases/expected_kdl/slashdash_arg_after_newline_esc.kdl +++ b/tests/test_cases/expected_kdl/slashdash_arg_after_newline_esc.kdl @@ -1 +1 @@ -node "arg2" +node arg2 diff --git a/tests/test_cases/expected_kdl/slashdash_prop.kdl b/tests/test_cases/expected_kdl/slashdash_prop.kdl index b3a0426..1b3db2c 100644 --- a/tests/test_cases/expected_kdl/slashdash_prop.kdl +++ b/tests/test_cases/expected_kdl/slashdash_prop.kdl @@ -1 +1 @@ -node "arg" +node arg diff --git a/tests/test_cases/expected_kdl/slashdash_repeated_prop.kdl b/tests/test_cases/expected_kdl/slashdash_repeated_prop.kdl index 82c6972..dce25a7 100644 --- a/tests/test_cases/expected_kdl/slashdash_repeated_prop.kdl +++ b/tests/test_cases/expected_kdl/slashdash_repeated_prop.kdl @@ -1 +1 @@ -node arg="correct" +node arg=correct diff --git a/tests/test_cases/expected_kdl/space_after_arg_type.kdl b/tests/test_cases/expected_kdl/space_after_arg_type.kdl new file mode 100644 index 0000000..51dcb98 --- /dev/null +++ b/tests/test_cases/expected_kdl/space_after_arg_type.kdl @@ -0,0 +1 @@ +node (type)10 diff --git a/tests/test_cases/expected_kdl/space_after_node_type.kdl 
b/tests/test_cases/expected_kdl/space_after_node_type.kdl new file mode 100644 index 0000000..c790643 --- /dev/null +++ b/tests/test_cases/expected_kdl/space_after_node_type.kdl @@ -0,0 +1 @@ +(type)node diff --git a/tests/test_cases/expected_kdl/space_after_prop_type.kdl b/tests/test_cases/expected_kdl/space_after_prop_type.kdl new file mode 100644 index 0000000..eb544ef --- /dev/null +++ b/tests/test_cases/expected_kdl/space_after_prop_type.kdl @@ -0,0 +1 @@ +node key=(type)#false diff --git a/tests/test_cases/expected_kdl/space_around_prop_marker.kdl b/tests/test_cases/expected_kdl/space_around_prop_marker.kdl new file mode 100644 index 0000000..30a026f --- /dev/null +++ b/tests/test_cases/expected_kdl/space_around_prop_marker.kdl @@ -0,0 +1 @@ +node foo=bar diff --git a/tests/test_cases/expected_kdl/space_in_arg_type.kdl b/tests/test_cases/expected_kdl/space_in_arg_type.kdl new file mode 100644 index 0000000..92003d9 --- /dev/null +++ b/tests/test_cases/expected_kdl/space_in_arg_type.kdl @@ -0,0 +1 @@ +node (type)#false diff --git a/tests/test_cases/expected_kdl/space_in_node_type.kdl b/tests/test_cases/expected_kdl/space_in_node_type.kdl new file mode 100644 index 0000000..c790643 --- /dev/null +++ b/tests/test_cases/expected_kdl/space_in_node_type.kdl @@ -0,0 +1 @@ +(type)node diff --git a/tests/test_cases/expected_kdl/space_in_prop_type.kdl b/tests/test_cases/expected_kdl/space_in_prop_type.kdl new file mode 100644 index 0000000..eb544ef --- /dev/null +++ b/tests/test_cases/expected_kdl/space_in_prop_type.kdl @@ -0,0 +1 @@ +node key=(type)#false diff --git a/tests/test_cases/expected_kdl/string_arg.kdl b/tests/test_cases/expected_kdl/string_arg.kdl index b3a0426..1b3db2c 100644 --- a/tests/test_cases/expected_kdl/string_arg.kdl +++ b/tests/test_cases/expected_kdl/string_arg.kdl @@ -1 +1 @@ -node "arg" +node arg diff --git a/tests/test_cases/expected_kdl/string_escaped_literal_whitespace.kdl 
b/tests/test_cases/expected_kdl/string_escaped_literal_whitespace.kdl new file mode 100644 index 0000000..3169ad9 --- /dev/null +++ b/tests/test_cases/expected_kdl/string_escaped_literal_whitespace.kdl @@ -0,0 +1 @@ +node "Hello World Stuff" diff --git a/tests/test_cases/expected_kdl/string_prop.kdl b/tests/test_cases/expected_kdl/string_prop.kdl index a0d0062..282aa3b 100644 --- a/tests/test_cases/expected_kdl/string_prop.kdl +++ b/tests/test_cases/expected_kdl/string_prop.kdl @@ -1 +1 @@ -node prop="val" +node prop=val diff --git a/tests/test_cases/expected_kdl/underscore_before_number.kdl b/tests/test_cases/expected_kdl/underscore_before_number.kdl new file mode 100644 index 0000000..788656b --- /dev/null +++ b/tests/test_cases/expected_kdl/underscore_before_number.kdl @@ -0,0 +1 @@ +node _15 diff --git a/tests/test_cases/expected_kdl/unusual_bare_id_chars_in_quoted_id.kdl b/tests/test_cases/expected_kdl/unusual_bare_id_chars_in_quoted_id.kdl index d2dcd19..317e824 100644 --- a/tests/test_cases/expected_kdl/unusual_bare_id_chars_in_quoted_id.kdl +++ b/tests/test_cases/expected_kdl/unusual_bare_id_chars_in_quoted_id.kdl @@ -1 +1 @@ -foo123~!@#$%^&*.:'|?+ "weeee" +foo123~!@#$%^&*.:'|?+<>, weeee diff --git a/tests/test_cases/expected_kdl/unusual_chars_in_bare_id.kdl b/tests/test_cases/expected_kdl/unusual_chars_in_bare_id.kdl index d2dcd19..317e824 100644 --- a/tests/test_cases/expected_kdl/unusual_chars_in_bare_id.kdl +++ b/tests/test_cases/expected_kdl/unusual_chars_in_bare_id.kdl @@ -1 +1 @@ -foo123~!@#$%^&*.:'|?+ "weeee" +foo123~!@#$%^&*.:'|?+<>, weeee diff --git a/tests/test_cases/expected_kdl/vertical_tab_whitespace.kdl b/tests/test_cases/expected_kdl/vertical_tab_whitespace.kdl new file mode 100644 index 0000000..1b3db2c --- /dev/null +++ b/tests/test_cases/expected_kdl/vertical_tab_whitespace.kdl @@ -0,0 +1 @@ +node arg diff --git a/tests/test_cases/input/all_escapes.kdl b/tests/test_cases/input/all_escapes.kdl index 024cda2..5c49748 100644 --- 
a/tests/test_cases/input/all_escapes.kdl +++ b/tests/test_cases/input/all_escapes.kdl @@ -1 +1 @@ -node "\"\\\b\f\n\r\t" +node "\"\\\b\f\n\r\t\s" diff --git a/tests/test_cases/input/all_node_fields.kdl b/tests/test_cases/input/all_node_fields.kdl index 719a8d1..9f4ceb5 100644 --- a/tests/test_cases/input/all_node_fields.kdl +++ b/tests/test_cases/input/all_node_fields.kdl @@ -1,3 +1,3 @@ -node "arg" prop="val" { - inner_node -} \ No newline at end of file +node arg prop=val { + inner_node +} diff --git a/tests/test_cases/input/arg_and_prop_same_name.kdl b/tests/test_cases/input/arg_and_prop_same_name.kdl index b830f56..ee5ace5 100644 --- a/tests/test_cases/input/arg_and_prop_same_name.kdl +++ b/tests/test_cases/input/arg_and_prop_same_name.kdl @@ -1 +1 @@ -node "arg" arg="val" \ No newline at end of file +node arg arg=val diff --git a/tests/test_cases/input/arg_bare.kdl b/tests/test_cases/input/arg_bare.kdl new file mode 100644 index 0000000..ec2a21f --- /dev/null +++ b/tests/test_cases/input/arg_bare.kdl @@ -0,0 +1 @@ +node a \ No newline at end of file diff --git a/tests/test_cases/input/arg_false_type.kdl b/tests/test_cases/input/arg_false_type.kdl index 895945d..92003d9 100644 --- a/tests/test_cases/input/arg_false_type.kdl +++ b/tests/test_cases/input/arg_false_type.kdl @@ -1 +1 @@ -node (type)false +node (type)#false diff --git a/tests/test_cases/input/arg_null_type.kdl b/tests/test_cases/input/arg_null_type.kdl index 476c5cd..cd66101 100644 --- a/tests/test_cases/input/arg_null_type.kdl +++ b/tests/test_cases/input/arg_null_type.kdl @@ -1 +1 @@ -node (type)null +node (type)#null diff --git a/tests/test_cases/input/arg_raw_string_type.kdl b/tests/test_cases/input/arg_raw_string_type.kdl index 2808d53..c722312 100644 --- a/tests/test_cases/input/arg_raw_string_type.kdl +++ b/tests/test_cases/input/arg_raw_string_type.kdl @@ -1 +1 @@ -node (type)"str" +node (type)#"str"# diff --git a/tests/test_cases/input/arg_string_type.kdl 
b/tests/test_cases/input/arg_string_type.kdl index 1a141b2..2808d53 100644 --- a/tests/test_cases/input/arg_string_type.kdl +++ b/tests/test_cases/input/arg_string_type.kdl @@ -1 +1 @@ -node (type)"str" \ No newline at end of file +node (type)"str" diff --git a/tests/test_cases/input/arg_true_type.kdl b/tests/test_cases/input/arg_true_type.kdl index 6d1f9bc..20243a3 100644 --- a/tests/test_cases/input/arg_true_type.kdl +++ b/tests/test_cases/input/arg_true_type.kdl @@ -1 +1 @@ -node (type)true +node (type)#true diff --git a/tests/test_cases/input/arg_type.kdl b/tests/test_cases/input/arg_type.kdl index a0b84cf..79a093d 100644 --- a/tests/test_cases/input/arg_type.kdl +++ b/tests/test_cases/input/arg_type.kdl @@ -1 +1 @@ -node (type)"arg" +node (type)arg diff --git a/tests/test_cases/input/backslash_in_bare_id.kdl b/tests/test_cases/input/backslash_in_bare_id.kdl deleted file mode 100644 index 5615277..0000000 --- a/tests/test_cases/input/backslash_in_bare_id.kdl +++ /dev/null @@ -1 +0,0 @@ -foo123\bar "weeee" diff --git a/tests/test_cases/input/bare_emoji.kdl b/tests/test_cases/input/bare_emoji.kdl index 60707c8..c67d0b9 100644 --- a/tests/test_cases/input/bare_emoji.kdl +++ b/tests/test_cases/input/bare_emoji.kdl @@ -1 +1 @@ -😁 "happy!" +😁 happy! 
diff --git a/tests/test_cases/input/blank_prop_type.kdl b/tests/test_cases/input/blank_prop_type.kdl index 898f90d..e00c6d2 100644 --- a/tests/test_cases/input/blank_prop_type.kdl +++ b/tests/test_cases/input/blank_prop_type.kdl @@ -1 +1 @@ -node key=("")true \ No newline at end of file +node key=("")#true diff --git a/tests/test_cases/input/block_comment.kdl b/tests/test_cases/input/block_comment.kdl index e6eddb9..f6c39ac 100644 --- a/tests/test_cases/input/block_comment.kdl +++ b/tests/test_cases/input/block_comment.kdl @@ -1 +1 @@ -node /* comment */ "arg" \ No newline at end of file +node /* comment */ arg diff --git a/tests/test_cases/input/block_comment_after_node.kdl b/tests/test_cases/input/block_comment_after_node.kdl index e7777ed..071ff21 100644 --- a/tests/test_cases/input/block_comment_after_node.kdl +++ b/tests/test_cases/input/block_comment_after_node.kdl @@ -1 +1 @@ -node /* hey */ "arg" +node /* hey */ arg diff --git a/tests/test_cases/input/bom_initial.kdl b/tests/test_cases/input/bom_initial.kdl new file mode 100644 index 0000000..e52e8bf --- /dev/null +++ b/tests/test_cases/input/bom_initial.kdl @@ -0,0 +1 @@ +node arg diff --git a/tests/test_cases/input/bom_later.kdl b/tests/test_cases/input/bom_later.kdl new file mode 100644 index 0000000..6aeff8d --- /dev/null +++ b/tests/test_cases/input/bom_later.kdl @@ -0,0 +1 @@ +node arg diff --git a/tests/test_cases/input/boolean_arg.kdl b/tests/test_cases/input/boolean_arg.kdl index f099893..e0cdf1a 100644 --- a/tests/test_cases/input/boolean_arg.kdl +++ b/tests/test_cases/input/boolean_arg.kdl @@ -1 +1 @@ -node false true \ No newline at end of file +node #false #true diff --git a/tests/test_cases/input/boolean_prop.kdl b/tests/test_cases/input/boolean_prop.kdl index 61e3111..f89da9b 100644 --- a/tests/test_cases/input/boolean_prop.kdl +++ b/tests/test_cases/input/boolean_prop.kdl @@ -1 +1 @@ -node prop1=true prop2=false \ No newline at end of file +node prop1=#true prop2=#false diff --git 
a/tests/test_cases/input/brackets_in_bare_id.kdl b/tests/test_cases/input/brackets_in_bare_id.kdl index b0d39c5..ebb78d2 100644 --- a/tests/test_cases/input/brackets_in_bare_id.kdl +++ b/tests/test_cases/input/brackets_in_bare_id.kdl @@ -1 +1 @@ -foo123{bar}foo "weeee" +foo123{bar}foo weeee diff --git a/tests/test_cases/input/chevrons_in_bare_id.kdl b/tests/test_cases/input/chevrons_in_bare_id.kdl index 4b6610e..58b2436 100644 --- a/tests/test_cases/input/chevrons_in_bare_id.kdl +++ b/tests/test_cases/input/chevrons_in_bare_id.kdl @@ -1 +1 @@ -foo123foo "weeee" +foo123foo weeee diff --git a/tests/test_cases/input/comma_in_bare_id.kdl b/tests/test_cases/input/comma_in_bare_id.kdl index 656df91..86c78fd 100644 --- a/tests/test_cases/input/comma_in_bare_id.kdl +++ b/tests/test_cases/input/comma_in_bare_id.kdl @@ -1 +1 @@ -foo123,bar "weeee" +foo123,bar weeee diff --git a/tests/test_cases/input/comment_and_newline.kdl b/tests/test_cases/input/comment_and_newline.kdl new file mode 100644 index 0000000..d1bb77f --- /dev/null +++ b/tests/test_cases/input/comment_and_newline.kdl @@ -0,0 +1,2 @@ +node1 // +node2 diff --git a/tests/test_cases/input/commented_arg.kdl b/tests/test_cases/input/commented_arg.kdl index e389cd2..0e6157f 100644 --- a/tests/test_cases/input/commented_arg.kdl +++ b/tests/test_cases/input/commented_arg.kdl @@ -1 +1 @@ -node /- "arg1" "arg2" \ No newline at end of file +node /- arg1 arg2 diff --git a/tests/test_cases/input/commented_child.kdl b/tests/test_cases/input/commented_child.kdl index e13c479..8e873f7 100644 --- a/tests/test_cases/input/commented_child.kdl +++ b/tests/test_cases/input/commented_child.kdl @@ -1,3 +1,3 @@ -node "arg" /- { +node arg /- { inner_node -} \ No newline at end of file +} diff --git a/tests/test_cases/input/commented_prop.kdl b/tests/test_cases/input/commented_prop.kdl index acedc83..046fd9d 100644 --- a/tests/test_cases/input/commented_prop.kdl +++ b/tests/test_cases/input/commented_prop.kdl @@ -1 +1 @@ -node /- 
prop="val" "arg" \ No newline at end of file +node /- prop=val arg diff --git a/tests/test_cases/input/crlf_between_nodes.kdl b/tests/test_cases/input/crlf_between_nodes.kdl index 4d9cb21..148f7bc 100644 --- a/tests/test_cases/input/crlf_between_nodes.kdl +++ b/tests/test_cases/input/crlf_between_nodes.kdl @@ -1,2 +1,2 @@ -node1 -node2 \ No newline at end of file +node1 +node2 diff --git a/tests/test_cases/input/emoji.kdl b/tests/test_cases/input/emoji.kdl index 3ed56e2..88df78a 100644 --- a/tests/test_cases/input/emoji.kdl +++ b/tests/test_cases/input/emoji.kdl @@ -1 +1 @@ -node "😀" +node 😀 diff --git a/tests/test_cases/input/empty_prop_type.kdl b/tests/test_cases/input/empty_prop_type.kdl index 0515094..233480b 100644 --- a/tests/test_cases/input/empty_prop_type.kdl +++ b/tests/test_cases/input/empty_prop_type.kdl @@ -1 +1 @@ -node key=()false +node key=()#false diff --git a/tests/test_cases/input/empty_quoted_node_id.kdl b/tests/test_cases/input/empty_quoted_node_id.kdl index 2aeb594..94694bc 100644 --- a/tests/test_cases/input/empty_quoted_node_id.kdl +++ b/tests/test_cases/input/empty_quoted_node_id.kdl @@ -1 +1 @@ -"" "arg" \ No newline at end of file +"" arg diff --git a/tests/test_cases/input/empty_quoted_prop_key.kdl b/tests/test_cases/input/empty_quoted_prop_key.kdl index e6e1310..e541793 100644 --- a/tests/test_cases/input/empty_quoted_prop_key.kdl +++ b/tests/test_cases/input/empty_quoted_prop_key.kdl @@ -1 +1 @@ -node ""="empty" +node ""=empty diff --git a/tests/test_cases/input/eof_after_escape.kdl b/tests/test_cases/input/eof_after_escape.kdl new file mode 100644 index 0000000..eed8d72 --- /dev/null +++ b/tests/test_cases/input/eof_after_escape.kdl @@ -0,0 +1 @@ +node \ diff --git a/tests/test_cases/input/err_backslash_in_bare_id.kdl b/tests/test_cases/input/err_backslash_in_bare_id.kdl new file mode 100644 index 0000000..2ea1a4b --- /dev/null +++ b/tests/test_cases/input/err_backslash_in_bare_id.kdl @@ -0,0 +1 @@ +foo123\bar weeee diff --git 
a/tests/test_cases/input/escline.kdl b/tests/test_cases/input/escline.kdl index 9010e07..bcd1a1a 100644 --- a/tests/test_cases/input/escline.kdl +++ b/tests/test_cases/input/escline.kdl @@ -1,2 +1,2 @@ node \ - "arg" \ No newline at end of file + arg diff --git a/tests/test_cases/input/escline_comment_node.kdl b/tests/test_cases/input/escline_comment_node.kdl deleted file mode 100644 index 030c245..0000000 --- a/tests/test_cases/input/escline_comment_node.kdl +++ /dev/null @@ -1,3 +0,0 @@ -node1 - \// hey - node2 \ No newline at end of file diff --git a/tests/test_cases/input/escline_line_comment.kdl b/tests/test_cases/input/escline_line_comment.kdl index 31f19fd..dc81b72 100644 --- a/tests/test_cases/input/escline_line_comment.kdl +++ b/tests/test_cases/input/escline_line_comment.kdl @@ -1,4 +1,3 @@ node \ // comment - "arg" \// comment - "arg2 -" \ No newline at end of file + arg \// comment + arg2 diff --git a/tests/test_cases/input/hash_in_id.kdl b/tests/test_cases/input/hash_in_id.kdl new file mode 100644 index 0000000..e1119be --- /dev/null +++ b/tests/test_cases/input/hash_in_id.kdl @@ -0,0 +1 @@ +foo#bar weee diff --git a/tests/test_cases/input/just_space_in_prop_type.kdl b/tests/test_cases/input/just_space_in_prop_type.kdl index a00603c..e42645f 100644 --- a/tests/test_cases/input/just_space_in_prop_type.kdl +++ b/tests/test_cases/input/just_space_in_prop_type.kdl @@ -1 +1 @@ -node key=()0x10 +node key=( )0x10 diff --git a/tests/test_cases/input/multiline_comment.kdl b/tests/test_cases/input/multiline_comment.kdl index 26485bc..5fbb80b 100644 --- a/tests/test_cases/input/multiline_comment.kdl +++ b/tests/test_cases/input/multiline_comment.kdl @@ -1,4 +1,4 @@ node /* some comments -*/ "arg" \ No newline at end of file +*/ arg diff --git a/tests/test_cases/input/multiline_nodes.kdl b/tests/test_cases/input/multiline_nodes.kdl index 3dc907e..eae83d1 100644 --- a/tests/test_cases/input/multiline_nodes.kdl +++ b/tests/test_cases/input/multiline_nodes.kdl @@ 
-1,3 +1,3 @@ node \ - "arg1" \// comment - "arg2" \ No newline at end of file + arg1 \// comment + arg2 diff --git a/tests/test_cases/input/multiline_raw_string.kdl b/tests/test_cases/input/multiline_raw_string.kdl new file mode 100644 index 0000000..eaa212e --- /dev/null +++ b/tests/test_cases/input/multiline_raw_string.kdl @@ -0,0 +1,5 @@ +node #" +hey +everyone +how goes? +"# diff --git a/tests/test_cases/input/multiline_string.kdl b/tests/test_cases/input/multiline_string.kdl index 603cddd..e3a6cc1 100644 --- a/tests/test_cases/input/multiline_string.kdl +++ b/tests/test_cases/input/multiline_string.kdl @@ -1,4 +1,5 @@ -node " hey +node " +hey everyone how goes? -" \ No newline at end of file +" diff --git a/tests/test_cases/input/nested_block_comment.kdl b/tests/test_cases/input/nested_block_comment.kdl index d7f765c..d9966a9 100644 --- a/tests/test_cases/input/nested_block_comment.kdl +++ b/tests/test_cases/input/nested_block_comment.kdl @@ -1 +1 @@ -node /* hi /* there */ everyone */ "arg" \ No newline at end of file +node /* hi /* there */ everyone */ arg diff --git a/tests/test_cases/input/nested_comments.kdl b/tests/test_cases/input/nested_comments.kdl index 8b3aad6..7541c39 100644 --- a/tests/test_cases/input/nested_comments.kdl +++ b/tests/test_cases/input/nested_comments.kdl @@ -1 +1 @@ -node /*/* nested */*/ "arg" \ No newline at end of file +node /*/* nested */*/ arg diff --git a/tests/test_cases/input/nested_multiline_block_comment.kdl b/tests/test_cases/input/nested_multiline_block_comment.kdl index 9d8e0ca..f1087e1 100644 --- a/tests/test_cases/input/nested_multiline_block_comment.kdl +++ b/tests/test_cases/input/nested_multiline_block_comment.kdl @@ -3,5 +3,4 @@ hey /* how's */ it going - */ "arg" - \ No newline at end of file + */ arg diff --git a/tests/test_cases/input/newlines_in_block_comment.kdl b/tests/test_cases/input/newlines_in_block_comment.kdl index a5cd2b1..690461b 100644 --- a/tests/test_cases/input/newlines_in_block_comment.kdl +++ 
b/tests/test_cases/input/newlines_in_block_comment.kdl @@ -1,3 +1,3 @@ node /* hey so I was thinking -about newts */ "arg" \ No newline at end of file +about newts */ arg diff --git a/tests/test_cases/input/node_false.kdl b/tests/test_cases/input/node_false.kdl index ef60c44..3bab782 100644 --- a/tests/test_cases/input/node_false.kdl +++ b/tests/test_cases/input/node_false.kdl @@ -1 +1 @@ -node false +node #false diff --git a/tests/test_cases/input/node_true.kdl b/tests/test_cases/input/node_true.kdl index 4b02a06..de00dcd 100644 --- a/tests/test_cases/input/node_true.kdl +++ b/tests/test_cases/input/node_true.kdl @@ -1 +1 @@ -node true +node #true diff --git a/tests/test_cases/input/null_arg.kdl b/tests/test_cases/input/null_arg.kdl index a5ce001..bed8dbf 100644 --- a/tests/test_cases/input/null_arg.kdl +++ b/tests/test_cases/input/null_arg.kdl @@ -1 +1 @@ -node null \ No newline at end of file +node #null diff --git a/tests/test_cases/input/null_prop.kdl b/tests/test_cases/input/null_prop.kdl index 847256f..c463e98 100644 --- a/tests/test_cases/input/null_prop.kdl +++ b/tests/test_cases/input/null_prop.kdl @@ -1 +1 @@ -node prop=null \ No newline at end of file +node prop=#null diff --git a/tests/test_cases/input/only_line_comment_crlf.kdl b/tests/test_cases/input/only_line_comment_crlf.kdl index fef83a9..b1653b8 100644 --- a/tests/test_cases/input/only_line_comment_crlf.kdl +++ b/tests/test_cases/input/only_line_comment_crlf.kdl @@ -1 +1 @@ -// comment +// comment diff --git a/tests/test_cases/input/optional_child_semicolon.kdl b/tests/test_cases/input/optional_child_semicolon.kdl new file mode 100644 index 0000000..5381491 --- /dev/null +++ b/tests/test_cases/input/optional_child_semicolon.kdl @@ -0,0 +1 @@ +node {foo;bar;baz} diff --git a/tests/test_cases/input/parens_in_bare_id.kdl b/tests/test_cases/input/parens_in_bare_id.kdl index 92459d8..ff9b439 100644 --- a/tests/test_cases/input/parens_in_bare_id.kdl +++ b/tests/test_cases/input/parens_in_bare_id.kdl 
@@ -1 +1 @@ -foo123(bar)foo "weeee" +foo123(bar)foo weeee diff --git a/tests/test_cases/input/parse_all_arg_types.kdl b/tests/test_cases/input/parse_all_arg_types.kdl index 30b9072..92dffb1 100644 --- a/tests/test_cases/input/parse_all_arg_types.kdl +++ b/tests/test_cases/input/parse_all_arg_types.kdl @@ -1 +1 @@ -node 1 1.0 1.0e10 1.0e-10 0x01 0o07 0b10 "arg" r"arg\\" true false null \ No newline at end of file +node 1 1.0 1.0e10 1.0e-10 0x01 0o07 0b10 arg "arg" #"arg\"# #true #false #null diff --git a/tests/test_cases/input/prop_false_type.kdl b/tests/test_cases/input/prop_false_type.kdl index 3377323..eb544ef 100644 --- a/tests/test_cases/input/prop_false_type.kdl +++ b/tests/test_cases/input/prop_false_type.kdl @@ -1 +1 @@ -node key=(type)false +node key=(type)#false diff --git a/tests/test_cases/input/prop_identifier_type.kdl b/tests/test_cases/input/prop_identifier_type.kdl new file mode 100644 index 0000000..bf1b9a7 --- /dev/null +++ b/tests/test_cases/input/prop_identifier_type.kdl @@ -0,0 +1,2 @@ +node key=(type)str + diff --git a/tests/test_cases/input/prop_null_type.kdl b/tests/test_cases/input/prop_null_type.kdl index bafaddc..1c25b6f 100644 --- a/tests/test_cases/input/prop_null_type.kdl +++ b/tests/test_cases/input/prop_null_type.kdl @@ -1 +1 @@ -node key=(type)null +node key=(type)#null diff --git a/tests/test_cases/input/prop_raw_string_type.kdl b/tests/test_cases/input/prop_raw_string_type.kdl index a038cfa..6822ab3 100644 --- a/tests/test_cases/input/prop_raw_string_type.kdl +++ b/tests/test_cases/input/prop_raw_string_type.kdl @@ -1 +1 @@ -node key=(type)r"str" +node key=(type)#"str"# diff --git a/tests/test_cases/input/prop_true_type.kdl b/tests/test_cases/input/prop_true_type.kdl index c4eebb6..01404b8 100644 --- a/tests/test_cases/input/prop_true_type.kdl +++ b/tests/test_cases/input/prop_true_type.kdl @@ -1 +1 @@ -node key=(type)true +node key=(type)#true diff --git a/tests/test_cases/input/prop_type.kdl b/tests/test_cases/input/prop_type.kdl 
index d69294f..01404b8 100644 --- a/tests/test_cases/input/prop_type.kdl +++ b/tests/test_cases/input/prop_type.kdl @@ -1 +1 @@ -node key=(type)true \ No newline at end of file +node key=(type)#true diff --git a/tests/test_cases/input/question_mark_at_start_of_int.kdl b/tests/test_cases/input/question_mark_at_start_of_int.kdl deleted file mode 100644 index ba82916..0000000 --- a/tests/test_cases/input/question_mark_at_start_of_int.kdl +++ /dev/null @@ -1 +0,0 @@ -node ?10 \ No newline at end of file diff --git a/tests/test_cases/input/quote_in_bare_id.kdl b/tests/test_cases/input/quote_in_bare_id.kdl index 405f763..0d8a664 100644 --- a/tests/test_cases/input/quote_in_bare_id.kdl +++ b/tests/test_cases/input/quote_in_bare_id.kdl @@ -1 +1 @@ -foo123"bar "weeee" +foo123"bar weeee diff --git a/tests/test_cases/input/quoted_prop_name.kdl b/tests/test_cases/input/quoted_prop_name.kdl index 73ec6dd..8ee5e08 100644 --- a/tests/test_cases/input/quoted_prop_name.kdl +++ b/tests/test_cases/input/quoted_prop_name.kdl @@ -1 +1 @@ -node "0prop"="val" \ No newline at end of file +node "0prop"=val diff --git a/tests/test_cases/input/quoted_prop_type.kdl b/tests/test_cases/input/quoted_prop_type.kdl index 0e2b920..beca5f2 100644 --- a/tests/test_cases/input/quoted_prop_type.kdl +++ b/tests/test_cases/input/quoted_prop_type.kdl @@ -1 +1 @@ -node key=("type/")true +node key=("type/")#true diff --git a/tests/test_cases/input/raw_arg_type.kdl b/tests/test_cases/input/raw_arg_type.kdl index c5739b1..20243a3 100644 --- a/tests/test_cases/input/raw_arg_type.kdl +++ b/tests/test_cases/input/raw_arg_type.kdl @@ -1 +1 @@ -node (type)true \ No newline at end of file +node (type)#true diff --git a/tests/test_cases/input/raw_node_name.kdl b/tests/test_cases/input/raw_node_name.kdl index 0d38371..f2705c7 100644 --- a/tests/test_cases/input/raw_node_name.kdl +++ b/tests/test_cases/input/raw_node_name.kdl @@ -1 +1 @@ -r"\node" \ No newline at end of file +#"\node"# diff --git 
a/tests/test_cases/input/raw_prop_type.kdl b/tests/test_cases/input/raw_prop_type.kdl index d69294f..01404b8 100644 --- a/tests/test_cases/input/raw_prop_type.kdl +++ b/tests/test_cases/input/raw_prop_type.kdl @@ -1 +1 @@ -node key=(type)true \ No newline at end of file +node key=(type)#true diff --git a/tests/test_cases/input/raw_string_arg.kdl b/tests/test_cases/input/raw_string_arg.kdl index 6b7581f..cf4a86c 100644 --- a/tests/test_cases/input/raw_string_arg.kdl +++ b/tests/test_cases/input/raw_string_arg.kdl @@ -1,3 +1,2 @@ -node_1 r"arg\n" -node_2 r#""arg\n"and stuff"# -node_3 r##"#"arg\n"#and stuff"## \ No newline at end of file +node_1 r#""arg\n"and #stuff"# +node_2 r##"#"arg\n"#and #stuff"## diff --git a/tests/test_cases/input/raw_string_backslash.kdl b/tests/test_cases/input/raw_string_backslash.kdl index 0f7ca45..0405248 100644 --- a/tests/test_cases/input/raw_string_backslash.kdl +++ b/tests/test_cases/input/raw_string_backslash.kdl @@ -1 +1 @@ -node r"\n" +node #"\n"# diff --git a/tests/test_cases/input/raw_string_hash_no_esc.kdl b/tests/test_cases/input/raw_string_hash_no_esc.kdl index c8fa3c4..ce24c79 100644 --- a/tests/test_cases/input/raw_string_hash_no_esc.kdl +++ b/tests/test_cases/input/raw_string_hash_no_esc.kdl @@ -1 +1 @@ -node r"#" +node #"#"# diff --git a/tests/test_cases/input/raw_string_just_backslash.kdl b/tests/test_cases/input/raw_string_just_backslash.kdl index 9aefa73..f4e1cac 100644 --- a/tests/test_cases/input/raw_string_just_backslash.kdl +++ b/tests/test_cases/input/raw_string_just_backslash.kdl @@ -1 +1 @@ -node r"\" +node #"\"# diff --git a/tests/test_cases/input/raw_string_just_quote.kdl b/tests/test_cases/input/raw_string_just_quote.kdl index b8333ca..e81bf12 100644 --- a/tests/test_cases/input/raw_string_just_quote.kdl +++ b/tests/test_cases/input/raw_string_just_quote.kdl @@ -1 +1 @@ -node r#"""# +node #"""# diff --git a/tests/test_cases/input/raw_string_multiple_hash.kdl b/tests/test_cases/input/raw_string_multiple_hash.kdl 
index e6d054c..6317f36 100644 --- a/tests/test_cases/input/raw_string_multiple_hash.kdl +++ b/tests/test_cases/input/raw_string_multiple_hash.kdl @@ -1 +1 @@ -node r###""#"##"### +node ###""#"##"### diff --git a/tests/test_cases/input/raw_string_newline.kdl b/tests/test_cases/input/raw_string_newline.kdl index ef39d3c..0cc85c0 100644 --- a/tests/test_cases/input/raw_string_newline.kdl +++ b/tests/test_cases/input/raw_string_newline.kdl @@ -1,4 +1,4 @@ -node r" +node #" hello world -" +"# diff --git a/tests/test_cases/input/raw_string_prop.kdl b/tests/test_cases/input/raw_string_prop.kdl index a6c352a..cc59232 100644 --- a/tests/test_cases/input/raw_string_prop.kdl +++ b/tests/test_cases/input/raw_string_prop.kdl @@ -1,3 +1,2 @@ -node_1 prop=r"arg\n" -node_2 prop=r#""arg"\n"# -node_3 prop=r##"#"arg"#\n"## \ No newline at end of file +node_1 prop=#""arg#"\n"# +node_2 prop=##"#"arg#"#\n"## diff --git a/tests/test_cases/input/raw_string_quote.kdl b/tests/test_cases/input/raw_string_quote.kdl index cd7419c..004b62f 100644 --- a/tests/test_cases/input/raw_string_quote.kdl +++ b/tests/test_cases/input/raw_string_quote.kdl @@ -1 +1 @@ -node r#"a"b"# \ No newline at end of file +node #"a"b"# diff --git a/tests/test_cases/input/repeated_arg.kdl b/tests/test_cases/input/repeated_arg.kdl index beab120..6525757 100644 --- a/tests/test_cases/input/repeated_arg.kdl +++ b/tests/test_cases/input/repeated_arg.kdl @@ -1 +1 @@ -node "arg" "arg" \ No newline at end of file +node arg arg diff --git a/tests/test_cases/input/same_args.kdl b/tests/test_cases/input/same_args.kdl deleted file mode 100644 index c412de8..0000000 --- a/tests/test_cases/input/same_args.kdl +++ /dev/null @@ -1 +0,0 @@ -node "whee" "whee" \ No newline at end of file diff --git a/tests/test_cases/input/single_arg.kdl b/tests/test_cases/input/single_arg.kdl index e5161d1..1b3db2c 100644 --- a/tests/test_cases/input/single_arg.kdl +++ b/tests/test_cases/input/single_arg.kdl @@ -1 +1 @@ -node "arg" \ No newline at end 
of file +node arg diff --git a/tests/test_cases/input/single_prop.kdl b/tests/test_cases/input/single_prop.kdl index 4c29c14..282aa3b 100644 --- a/tests/test_cases/input/single_prop.kdl +++ b/tests/test_cases/input/single_prop.kdl @@ -1 +1 @@ -node prop="val" \ No newline at end of file +node prop=val diff --git a/tests/test_cases/input/slash_in_bare_id.kdl b/tests/test_cases/input/slash_in_bare_id.kdl index 1139c88..d26d325 100644 --- a/tests/test_cases/input/slash_in_bare_id.kdl +++ b/tests/test_cases/input/slash_in_bare_id.kdl @@ -1 +1 @@ -foo123/bar "weeee" +foo123/bar weeee diff --git a/tests/test_cases/input/slashdash_arg_after_newline_esc.kdl b/tests/test_cases/input/slashdash_arg_after_newline_esc.kdl index 059b3e1..5a4a9fd 100644 --- a/tests/test_cases/input/slashdash_arg_after_newline_esc.kdl +++ b/tests/test_cases/input/slashdash_arg_after_newline_esc.kdl @@ -1,2 +1,2 @@ node \ - /- "arg" "arg2" + /- arg arg2 diff --git a/tests/test_cases/input/slashdash_arg_before_newline_esc.kdl b/tests/test_cases/input/slashdash_arg_before_newline_esc.kdl index f58e4a7..70206aa 100644 --- a/tests/test_cases/input/slashdash_arg_before_newline_esc.kdl +++ b/tests/test_cases/input/slashdash_arg_before_newline_esc.kdl @@ -1,2 +1,2 @@ node /- \ - "arg" + arg diff --git a/tests/test_cases/input/slashdash_full_node.kdl b/tests/test_cases/input/slashdash_full_node.kdl index de2eb2a..f52f18b 100644 --- a/tests/test_cases/input/slashdash_full_node.kdl +++ b/tests/test_cases/input/slashdash_full_node.kdl @@ -1,2 +1,2 @@ -/- node 1.0 "a" b="b -" \ No newline at end of file +/- node 1.0 "a" b=" +b" diff --git a/tests/test_cases/input/slashdash_prop.kdl b/tests/test_cases/input/slashdash_prop.kdl index 3d7b806..2b81f5f 100644 --- a/tests/test_cases/input/slashdash_prop.kdl +++ b/tests/test_cases/input/slashdash_prop.kdl @@ -1 +1 @@ -node /- key="value" "arg" +node /- key=value arg diff --git a/tests/test_cases/input/slashdash_raw_prop_key.kdl 
b/tests/test_cases/input/slashdash_raw_prop_key.kdl index c9ad5ad..9b0978b 100644 --- a/tests/test_cases/input/slashdash_raw_prop_key.kdl +++ b/tests/test_cases/input/slashdash_raw_prop_key.kdl @@ -1 +1 @@ -node /- key="value" +node /- key=value diff --git a/tests/test_cases/input/slashdash_repeated_prop.kdl b/tests/test_cases/input/slashdash_repeated_prop.kdl index b427175..c94411a 100644 --- a/tests/test_cases/input/slashdash_repeated_prop.kdl +++ b/tests/test_cases/input/slashdash_repeated_prop.kdl @@ -1 +1 @@ -node arg="correct" /- arg="wrong" +node arg=correct /- arg=wrong diff --git a/tests/test_cases/input/space_after_prop_type.kdl b/tests/test_cases/input/space_after_prop_type.kdl index a891dfd..023a75c 100644 --- a/tests/test_cases/input/space_after_prop_type.kdl +++ b/tests/test_cases/input/space_after_prop_type.kdl @@ -1 +1 @@ -node key=(type) false +node key=(type) #false diff --git a/tests/test_cases/input/space_around_prop_marker.kdl b/tests/test_cases/input/space_around_prop_marker.kdl new file mode 100644 index 0000000..52150d8 --- /dev/null +++ b/tests/test_cases/input/space_around_prop_marker.kdl @@ -0,0 +1 @@ +node foo = bar diff --git a/tests/test_cases/input/space_in_arg_type.kdl b/tests/test_cases/input/space_in_arg_type.kdl index 2f9ca24..e2fb065 100644 --- a/tests/test_cases/input/space_in_arg_type.kdl +++ b/tests/test_cases/input/space_in_arg_type.kdl @@ -1 +1 @@ -node (type )false +node (type )#false diff --git a/tests/test_cases/input/space_in_prop_type.kdl b/tests/test_cases/input/space_in_prop_type.kdl index 4e9c750..0a18c97 100644 --- a/tests/test_cases/input/space_in_prop_type.kdl +++ b/tests/test_cases/input/space_in_prop_type.kdl @@ -1 +1 @@ -node key=(type )false +node key=(type )#false diff --git a/tests/test_cases/input/square_bracket_in_bare_id.kdl b/tests/test_cases/input/square_bracket_in_bare_id.kdl index 2dd54e9..62f34e2 100644 --- a/tests/test_cases/input/square_bracket_in_bare_id.kdl +++ 
b/tests/test_cases/input/square_bracket_in_bare_id.kdl @@ -1 +1 @@ -foo123[bar]foo "weeee" +foo123[bar]foo weeee diff --git a/tests/test_cases/input/string_escaped_literal_whitespace.kdl b/tests/test_cases/input/string_escaped_literal_whitespace.kdl new file mode 100644 index 0000000..1f12126 --- /dev/null +++ b/tests/test_cases/input/string_escaped_literal_whitespace.kdl @@ -0,0 +1,2 @@ +node "Hello \ +World \ Stuff" diff --git a/tests/test_cases/input/trailing_crlf.kdl b/tests/test_cases/input/trailing_crlf.kdl index 64f5a0a..aff78f7 100644 --- a/tests/test_cases/input/trailing_crlf.kdl +++ b/tests/test_cases/input/trailing_crlf.kdl @@ -1 +1 @@ -node +node diff --git a/tests/test_cases/input/unbalanced_raw_hashes.kdl b/tests/test_cases/input/unbalanced_raw_hashes.kdl index 7deb72f..d0213f2 100644 --- a/tests/test_cases/input/unbalanced_raw_hashes.kdl +++ b/tests/test_cases/input/unbalanced_raw_hashes.kdl @@ -1 +1 @@ -node r##"foo"# +node ##"foo"# diff --git a/tests/test_cases/input/underscore_at_start_of_int.kdl b/tests/test_cases/input/underscore_at_start_of_int.kdl deleted file mode 100644 index b854b60..0000000 --- a/tests/test_cases/input/underscore_at_start_of_int.kdl +++ /dev/null @@ -1 +0,0 @@ -node _15 \ No newline at end of file diff --git a/tests/test_cases/input/unicode_delete.kdl b/tests/test_cases/input/unicode_delete.kdl new file mode 100644 index 0000000..3fb52ed --- /dev/null +++ b/tests/test_cases/input/unicode_delete.kdl @@ -0,0 +1,2 @@ +// 0x007F (Delete) +node1 arg diff --git a/tests/test_cases/input/unicode_fsi.kdl b/tests/test_cases/input/unicode_fsi.kdl new file mode 100644 index 0000000..7aece14 --- /dev/null +++ b/tests/test_cases/input/unicode_fsi.kdl @@ -0,0 +1,2 @@ +// 0x2068 +node1 ⁨arg diff --git a/tests/test_cases/input/unicode_lre.kdl b/tests/test_cases/input/unicode_lre.kdl new file mode 100644 index 0000000..33342ae --- /dev/null +++ b/tests/test_cases/input/unicode_lre.kdl @@ -0,0 +1,2 @@ +// 0x202A +node1 ‪arg diff --git 
a/tests/test_cases/input/unicode_lri.kdl b/tests/test_cases/input/unicode_lri.kdl new file mode 100644 index 0000000..adec826 --- /dev/null +++ b/tests/test_cases/input/unicode_lri.kdl @@ -0,0 +1,2 @@ +// 0x2066 +node1⁦arg diff --git a/tests/test_cases/input/unicode_lrm.kdl b/tests/test_cases/input/unicode_lrm.kdl new file mode 100644 index 0000000..ff37cad --- /dev/null +++ b/tests/test_cases/input/unicode_lrm.kdl @@ -0,0 +1,2 @@ +// 0x200E +node ‎arg diff --git a/tests/test_cases/input/unicode_lro.kdl b/tests/test_cases/input/unicode_lro.kdl new file mode 100644 index 0000000..b084ded --- /dev/null +++ b/tests/test_cases/input/unicode_lro.kdl @@ -0,0 +1,2 @@ +// 0x202D +node ‭arg diff --git a/tests/test_cases/input/unicode_pdf.kdl b/tests/test_cases/input/unicode_pdf.kdl new file mode 100644 index 0000000..9b94fad --- /dev/null +++ b/tests/test_cases/input/unicode_pdf.kdl @@ -0,0 +1,2 @@ +// 0x202C +node ‬arg diff --git a/tests/test_cases/input/unicode_pdi.kdl b/tests/test_cases/input/unicode_pdi.kdl new file mode 100644 index 0000000..d92d2d7 --- /dev/null +++ b/tests/test_cases/input/unicode_pdi.kdl @@ -0,0 +1,2 @@ +// 0x2069 +node ⁩arg diff --git a/tests/test_cases/input/unicode_rle.kdl b/tests/test_cases/input/unicode_rle.kdl new file mode 100644 index 0000000..3b46610 --- /dev/null +++ b/tests/test_cases/input/unicode_rle.kdl @@ -0,0 +1,2 @@ +// 0x202B +node1 ‫arg diff --git a/tests/test_cases/input/unicode_rli.kdl b/tests/test_cases/input/unicode_rli.kdl new file mode 100644 index 0000000..92902ed --- /dev/null +++ b/tests/test_cases/input/unicode_rli.kdl @@ -0,0 +1,2 @@ +// 0x2067 +node1 ⁧arg diff --git a/tests/test_cases/input/unicode_rlm.kdl b/tests/test_cases/input/unicode_rlm.kdl new file mode 100644 index 0000000..bfa63c8 --- /dev/null +++ b/tests/test_cases/input/unicode_rlm.kdl @@ -0,0 +1,2 @@ +// 0x200F +node ‏arg diff --git a/tests/test_cases/input/unicode_rlo.kdl b/tests/test_cases/input/unicode_rlo.kdl new file mode 100644 index 0000000..98c848b 
--- /dev/null +++ b/tests/test_cases/input/unicode_rlo.kdl @@ -0,0 +1,2 @@ +// 0x202E +node ‮arg diff --git a/tests/test_cases/input/unicode_scalar_high.kdl b/tests/test_cases/input/unicode_scalar_high.kdl new file mode 100644 index 0000000..fb1abb4 --- /dev/null +++ b/tests/test_cases/input/unicode_scalar_high.kdl @@ -0,0 +1,2 @@ +// 0xDFFF (last code point before 0xE000) +node �arg diff --git a/tests/test_cases/input/unicode_scalar_low.kdl b/tests/test_cases/input/unicode_scalar_low.kdl new file mode 100644 index 0000000..010d0b1 --- /dev/null +++ b/tests/test_cases/input/unicode_scalar_low.kdl @@ -0,0 +1,2 @@ +// 0xD800 (first code point after 0xD7FF) +node �arg diff --git a/tests/test_cases/input/unicode_under_0x20.kdl b/tests/test_cases/input/unicode_under_0x20.kdl new file mode 100644 index 0000000..967a87a --- /dev/null +++ b/tests/test_cases/input/unicode_under_0x20.kdl @@ -0,0 +1,2 @@ +// 0x0019 +node1 arg diff --git a/tests/test_cases/input/unusual_bare_id_chars_in_quoted_id.kdl b/tests/test_cases/input/unusual_bare_id_chars_in_quoted_id.kdl index e37de20..9281f70 100644 --- a/tests/test_cases/input/unusual_bare_id_chars_in_quoted_id.kdl +++ b/tests/test_cases/input/unusual_bare_id_chars_in_quoted_id.kdl @@ -1 +1 @@ -"foo123~!@#$%^&*.:'|?+" "weeee" \ No newline at end of file +"foo123~!@#$%^&*.:'|?+<>," weeee diff --git a/tests/test_cases/input/unusual_chars_in_bare_id.kdl b/tests/test_cases/input/unusual_chars_in_bare_id.kdl index d2dcd19..317e824 100644 --- a/tests/test_cases/input/unusual_chars_in_bare_id.kdl +++ b/tests/test_cases/input/unusual_chars_in_bare_id.kdl @@ -1 +1 @@ -foo123~!@#$%^&*.:'|?+ "weeee" +foo123~!@#$%^&*.:'|?+<>, weeee diff --git a/tests/test_cases/input/vertical_tab_whitespace.kdl b/tests/test_cases/input/vertical_tab_whitespace.kdl new file mode 100644 index 0000000..507d3a0 --- /dev/null +++ b/tests/test_cases/input/vertical_tab_whitespace.kdl @@ -0,0 +1 @@ +node arg From 50d378f1db1f8d8a3b87872ecd017d54557f7e31 Mon Sep 17 
00:00:00 2001 From: =?UTF-8?q?Kat=20March=C3=A1n?= Date: Sat, 16 Dec 2023 15:50:40 -0800 Subject: [PATCH 038/105] update readme a bit --- README.md | 52 +++++++++++++++++++++++++++++++++++++--------------- 1 file changed, 37 insertions(+), 15 deletions(-) diff --git a/README.md b/README.md index f990669..83913c7 100644 --- a/README.md +++ b/README.md @@ -1,9 +1,9 @@ # The KDL Document Language -KDL is a small, pleasing document language with xml-like semantics that looks -like you're invoking a bunch of CLI commands! It's meant to be used both as a -serialization format and a configuration language, much like JSON, YAML, or -XML. It looks like this: +KDL is a small, pleasant document language with XML-like node semantics that +looks like you're invoking a bunch of CLI commands! It's meant to be used both +as a serialization format and a configuration language, much like JSON, YAML, +or XML. It looks like this: ```kdl package { @@ -17,7 +17,7 @@ package { } scripts { - // "Raw" and multi-line strings are supported. + // "Raw" and dedented multi-line strings are supported. build #" echo "foo" node -c "console.log('hello, world!');" @@ -100,7 +100,7 @@ entirety, but in the future, may be required to in order to be included here. ### Basics -A KDL node is a node name, followed by zero or more "arguments", and +A KDL node is a node name string, followed by zero or more "arguments", and children. ```kdl @@ -113,7 +113,7 @@ You can also have multiple values in a single node! bookmarks 12 15 188 1234 ``` -Nodes can have properties. +Nodes can have properties, with string keys. 
```kdl author "Alex Monad" email=alex@example.com active=#true @@ -141,7 +141,7 @@ node1; node2; node3; KDL supports 4 data types: -* Strings: `"hello world"` or just `foo` +* Strings: `unquoted`, `"hello world"`, or `#"hello world"#` * Numbers: `123.45` * Booleans: `#true` and `#false` * Null: `#null` @@ -156,9 +156,18 @@ node2 "this\nhas\tescapes" node3 #"C:\Users\zkat\raw\string"# ``` +You don't have to quote strings unless they contain whitespace, or if any the +following apply: + * The string contains `[]{}()\/#=";`. + * The string contains whitespace. + * The string is one of `true`, `false`, or `null`. + * The strings starts with a digit, or `+`/`-` and a digit. + +In essence, if it can get confused for other KDL syntax, it needs quotes. + Both types of quoted string can be multiline as-is, without a different syntax. Additionally, these multi-line strings will be "dedented" according to -the indentation of the least-indented line: +the common indentation that all lines share: ```kdl string " @@ -168,8 +177,21 @@ string " " ``` -Raw strings, you can add any number of `#`s before and after the opening and -closing `#` to disambiguate literal `#"` sequences: +Raw strings, which do not support `\` escapes and can be used when you want +certain kinds of strings to look nicer without having to escape a lot: + +```kdl +exec #" + echo "foo" + echo "bar" + cd C:\path\to\dir +"# + +regex #"\d{3} "[^/"]+""# +``` + +You can add any number of `#`s before and after the opening and +closing `#` to disambiguate literal closing `#"` sequences: ```kdl other-raw ##"hello"#world"## @@ -177,7 +199,7 @@ other-raw ##"hello"#world"## #### Numbers -There's 4 ways to represent numbers in KDL. KDL does not prescribe any +There are 4 ways to represent numbers in KDL. 
KDL does not prescribe any representation for these numbers, and it's entirely up to individual implementations whether to represent all numbers with a single type, or to have different representations for different forms. @@ -265,9 +287,9 @@ title \ // Files must be utf8 encoded! smile 😁 -// Instead of anonymous nodes, nodes and properties can be wrapped -// in "" for arbitrary node names. -"!@#$@$%\\/()[]Q#$%~@!40" "1.2.3" "#null"=#true +// Node names and property keys are just strings, so you can write them like +// quoted or raw strings, too! +"illegal{}[]/\\=#;identifier" #"1.2.3"# "#false"=#true // Identifiers are very flexible. The following is a legal bare identifier: <@foo123~!$%^&*.:'|?+> From 90cd0b1bb90d99ab71a396844b2fc9c2d3fa3ab7 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Kat=20March=C3=A1n?= Date: Sat, 16 Dec 2023 16:09:13 -0800 Subject: [PATCH 039/105] make unicodey equals signs valid property assignment characters --- CHANGELOG.md | 4 ++++ README.md | 8 +++++--- SPEC.md | 20 ++++++++++++++++--- .../expected_kdl/unicode_equals_signs.kdl | 1 + .../test_cases/input/unicode_equals_signs.kdl | 4 ++++ 5 files changed, 31 insertions(+), 6 deletions(-) create mode 100644 tests/test_cases/expected_kdl/unicode_equals_signs.kdl create mode 100644 tests/test_cases/input/unicode_equals_signs.kdl diff --git a/CHANGELOG.md b/CHANGELOG.md index 211e461..1eb927b 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -58,6 +58,10 @@ least-indented line in the body. Multiline strings and raw strings now must have a newline immediately following their opening `"`, and a final newline preceding the closing `"`. +* SMALL EQUALS SIGN (`U+FE66`), FULLWIDTH EQUALS SIGN (`U+FF1D`), and HEAVY + EQUALS SIGN (`U+1F7F0`) are now treated the same as `=` and can be used for + properties (e.g. `お名前=☜(゚ヮ゚☜)`). They are also no longer valid in bare + identifiers. 
### KQL diff --git a/README.md b/README.md index 83913c7..f7e18b3 100644 --- a/README.md +++ b/README.md @@ -158,10 +158,12 @@ node3 #"C:\Users\zkat\raw\string"# You don't have to quote strings unless they contain whitespace, or if any the following apply: - * The string contains `[]{}()\/#=";`. + * The string contains `[]{}()\/#";`. * The string contains whitespace. * The string is one of `true`, `false`, or `null`. * The strings starts with a digit, or `+`/`-` and a digit. + * The string contains an equals sign (including unicode equals signs `﹦`, + `=`, and `🟰`). In essence, if it can get confused for other KDL syntax, it needs quotes. @@ -294,8 +296,8 @@ smile 😁 // Identifiers are very flexible. The following is a legal bare identifier: <@foo123~!$%^&*.:'|?+> -// And you can also use unicode! -ノード お名前=☜(゚ヮ゚☜) +// And you can also use unicode, even for the equals sign! +ノード お名前=☜(゚ヮ゚☜) // kdl specifically allows properties and values to be // interspersed with each other, much like CLI commands. diff --git a/SPEC.md b/SPEC.md index dc8055e..6ae5856 100644 --- a/SPEC.md +++ b/SPEC.md @@ -137,7 +137,8 @@ negative number. The following characters cannot be used anywhere in a [Bare Identifier](#identifier): -* Any of `(){}[]/\="#;` +* Any of `(){}[]/\"#;` +* Any [Equals Sign](#equals-sign) * Any [Whitespace](#whitespace) or [Newline](#newline). * Any [disallowed literal code points](#disallowed-literal-code-points) in KDL documents. @@ -163,7 +164,8 @@ my-node 1 2 \ // comments are ok after \ ### Property A Property is a key/value pair attached to a [Node](#node). A Property is -composed of an [Identifier](#identifier), followed immediately by a `=`, and then a [Value](#value). +composed of an [Identifier](#identifier), followed immediately by an [equals +sign](#equals-sign), and then a [Value](#value). Properties should be interpreted left-to-right, with rightmost properties with identical names overriding earlier properties. 
That is: @@ -181,6 +183,17 @@ still be spec-compliant. Properties _MAY_ be prefixed with `/-` to "comment out" the entire token and make it act as plain whitespace, even if it spreads across multiple lines. +#### Equals Sign + +Any of the following characters may be used as equals signs in properties: + +| Name | Character | Code Point | +|----|-----|----| +| EQUALS SIGN | `=` | `U+003D` | +| SMALL EQUALS SIGN | `﹦` | `U+FE66` | +| FULLWIDTH EQUALS SIGN | `=` | `U+FF1D` | +| HEAVY EQUALS SIGN | `🟰` | `U+1F7F0` | + ### Argument An Argument is a bare [Value](#value) attached to a [Node](#node), with no @@ -600,9 +613,10 @@ numberish-ident := sign ((identifier-char - digit) identifier-char*)? identifier-char := unicode - line-space - [\\/(){};\[\]="#] - disallowed-literal-code-points keyword := '#' (boolean | 'null') -prop := identifier optional-node-space '=' optional-node-space value +prop := identifier optional-node-space equals-sign optional-node-space value value := type? optional-node-space (identifier | string | number | keyword) type := '(' optional-node-space identifier optional-node-space ')' +equals-sign := See Table (Equals Sign) string := raw-string | escaped-string escaped-string := '"' (single-line-string-body | newline multi-line-string-body newline ws*) '"' diff --git a/tests/test_cases/expected_kdl/unicode_equals_signs.kdl b/tests/test_cases/expected_kdl/unicode_equals_signs.kdl new file mode 100644 index 0000000..4ab6443 --- /dev/null +++ b/tests/test_cases/expected_kdl/unicode_equals_signs.kdl @@ -0,0 +1 @@ +node p1=val1 p2=val2 p3=val3 diff --git a/tests/test_cases/input/unicode_equals_signs.kdl b/tests/test_cases/input/unicode_equals_signs.kdl new file mode 100644 index 0000000..37d8e02 --- /dev/null +++ b/tests/test_cases/input/unicode_equals_signs.kdl @@ -0,0 +1,4 @@ +node \ + p1﹦val1 \ // U+FE66 + p2=val2 \ // U+FF1D + p3🟰val3 // U+1F7F0 From 0022536fc7a5e2acac822bd310bf7cd7359ac5a9 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Kat=20March=C3=A1n?= 
Date: Sat, 16 Dec 2023 16:09:57 -0800 Subject: [PATCH 040/105] small rewording --- README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/README.md b/README.md index f7e18b3..f369cc1 100644 --- a/README.md +++ b/README.md @@ -158,7 +158,7 @@ node3 #"C:\Users\zkat\raw\string"# You don't have to quote strings unless they contain whitespace, or if any the following apply: - * The string contains `[]{}()\/#";`. + * The string contains any of `[]{}()\/#";`. * The string contains whitespace. * The string is one of `true`, `false`, or `null`. * The strings starts with a digit, or `+`/`-` and a digit. From 39b9fac0d330854c91dda0f6483f3503ef027a12 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Kat=20March=C3=A1n?= Date: Sat, 16 Dec 2023 20:39:18 -0800 Subject: [PATCH 041/105] fix stray quote --- examples/nuget.kdl | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/examples/nuget.kdl b/examples/nuget.kdl index 0319999..033f8da 100644 --- a/examples/nuget.kdl +++ b/examples/nuget.kdl @@ -82,7 +82,7 @@ Project { Reference Include=System.Net.Http Reference Include=System.Xml Reference Include=System.Xml.Linq - Reference Include=NuGet.Core" { + Reference Include=NuGet.Core { HintPath #"$(SolutionPackagesFolder)nuget.core\2.14.0-rtm-832\lib\net40-Client\NuGet.Core.dll"# Aliases CoreV2 } From 055de4e1beb2db21c928e2b5801a781a3c154502 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Kat=20March=C3=A1n?= Date: Sat, 16 Dec 2023 21:44:25 -0800 Subject: [PATCH 042/105] better organization of how we talk about identifiers/strings and comments --- SPEC.md | 214 +++++++++++++++++++++++++++++++------------------------- 1 file changed, 119 insertions(+), 95 deletions(-) diff --git a/SPEC.md b/SPEC.md index 6ae5856..6fee66c 100644 --- a/SPEC.md +++ b/SPEC.md @@ -50,8 +50,8 @@ baz ### Node Being a node-oriented language means that the real core component of any KDL -document is the "node". Every node must have a name, which is an -[Identifier](#identifier). 
+document is the "node". Every node must have a name, which must be a +[String](#string). The name may be preceded by a [Type Annotation](#type-annotation) to further clarify its type, particularly in relation to its parent node. (For example, @@ -75,9 +75,9 @@ By contrast, Property order _SHOULD NOT_ matter to implementations. [Children](#children-block) should be used if an order-sensitive key/value data structure must be represented in KDL. -Nodes _MAY_ be prefixed with `/-` to "comment out" the entire node, including -its properties, arguments, and children, and make it act as plain whitespace, -even if it spreads across multiple lines. +Nodes _MAY_ be prefixed with [Slashdash](#slashdash-comments) to "comment out" +the entire node, including its properties, arguments, and children, and make +it act as plain whitespace, even if it spreads across multiple lines. Finally, a node is terminated by either a [Newline](#newline), a semicolon (`;`) or the end of the file/stream (an `EOF`). @@ -85,64 +85,12 @@ or the end of the file/stream (an `EOF`). #### Example ```kdl -foo 1 key="val" 3 { +foo 1 key=val 3 { bar (role)baz 1 2 } ``` -### Identifier - -An Identifier is either a [Bare Identifier](#bare-identifier), which is an -unquoted string like `node` or `item`, a [String](#string), or a [Raw String](#raw-string). -There's no semantic difference between the kinds of identifier; this simply allows -for the use of quotes to have unusual identifiers that are inexpressible as bare identifiers. - -### Bare Identifier - -A Bare Identifier is composed of any [Unicode Scalar -Value](https://unicode.org/glossary/#unicode_scalar_value) other than -[non-initial characters](#non-initial-characters), followed by any number of -Unicode Scalar Values other than [non-identifier -characters](#non-identifier-characters), so long as this doesn't produce -something confusable for a [Number](#number). 
For example, both a -[Number](#number) and an Identifier can start with `-`, but when an Identifier -starts with `-` the second character cannot be a digit. This is precicely -specified in the [Full Grammar](#full-grammar) below. - -When Identifiers are used as the values in [Arguments](#argument) and -[Properties](#property), they are treated as strings, just like they are with -node names and property keys. - -Bare Identifiers are terminated by [Whitespace](#whitespace) or -[Newlines](#newline). - -The literal identifiers `true`, `false`, and `null` are illegal Bare Identifiers, -and _MUST_ be treated as a syntax error. - -### Non-initial characters - -The following characters cannot be the first character in a -[Bare Identifier](#identifier): - -* Any decimal digit (0-9) -* Any [non-identifier characters](#non-identifier-characters) - -Additionally, the `-` character can only be used as an initial character if -the second character is *not* a digit. This allows identifiers to look like -`--this`, and removes the ambiguity of having an identifier look like a -negative number. - -### Non-identifier characters - -The following characters cannot be used anywhere in a [Bare Identifier](#identifier): - -* Any of `(){}[]/\"#;` -* Any [Equals Sign](#equals-sign) -* Any [Whitespace](#whitespace) or [Newline](#newline). -* Any [disallowed literal code points](#disallowed-literal-code-points) in KDL - documents. - ### Line Continuation Line continuations allow [Nodes](#node) to be spread across multiple lines. @@ -164,7 +112,7 @@ my-node 1 2 \ // comments are ok after \ ### Property A Property is a key/value pair attached to a [Node](#node). A Property is -composed of an [Identifier](#identifier), followed immediately by an [equals +composed of a [String](#string), followed immediately by an [equals sign](#equals-sign), and then a [Value](#value). 
Properties should be interpreted left-to-right, with rightmost properties with @@ -234,11 +182,12 @@ parent { child1; child2; } ### Value -A value is either: an [Identifier](#identifier), a [String](#string), a -[Number](#number), a [Boolean](#boolean), or [Null](#null). +A value is either: a [String](#string), a [Number](#number), a +[Boolean](#boolean), or [Null](#null). Values _MUST_ be either [Arguments](#argument) or values of -[Properties](#property). +[Properties](#property). Only [String](#string) values may be used as +[Node](#node) names or [Property](#property) keys. Values (both as arguments and as properties) _MAY_ be prefixed by a single [Type Annotation](#type-annotation). @@ -251,7 +200,7 @@ or as a _context-specific elaboration_ of the more generic type the node name indicates. Type annotations are written as a set of `(` and `)` with a single -[Identifier](#identifier) in it. It may contain Whitespace after the `(` and before +[String](#string) in it. It may contain Whitespace after the `(` and before the `)`, and may be separated from its target by Whitespace. KDL does not specify any restrictions on what implementations might do with @@ -331,40 +280,64 @@ node prop=(regex).* ### String -Strings in KDL represent textual [Values](#value), or unusual identifiers. A -String is either a [Quoted String](#quoted-string) or a -[Raw String](#raw-string). Quoted Strings may include escaped characters, while -Raw Strings always contain only the literal characters that are present. +Strings in KDL represent textual UTF-8 [Values](#value). A String is either an +[Identifier String](#identifier-string), a [Quoted String](#quoted-string) or +a [Raw String](#raw-string). Quoted Strings may include escaped characters, +while Raw Strings always contain only the literal characters that are present. +Identifier Strings don't user delimiters. Strings _MUST_ be represented as UTF-8 values. 
-Strings _MUST NOT_ include the code points for [disallowed literal -code points](#disallowed-literal-code-points) directly. If needed, they can be -specified with their corresponding `\u{}` escape. +Strings _MUST NOT_ include the code points for [disallowed literal code +points](#disallowed-literal-code-points) directly. Quoted Strings may include +these code points as _values_ by representing them with their corresponding +`\u{...}` escape. -### Multi-line Strings +### Identifier String -Strings may span multiple lines with literal Newlines, in which case the -resulting String is "dedented" according to the line with the fewest number of -Whitespace characters preceding the first non-Whitespace character. That is, -the number of literal Whitespace characters in the least-indented line in the String -body is subtracted from the Whitespace of all other lines. +An Identifier String (sometimes referred to as just an "identifier") is +composed of any [Unicode Scalar +Value](https://unicode.org/glossary/#unicode_scalar_value) other than +[non-initial characters](#non-initial-characters), followed by any number of +Unicode Scalar Values other than [non-identifier +characters](#non-identifier-characters), so long as this doesn't produce +something confusable for a [Number](#number). For example, both a +[Number](#number) and an Identifier can start with `-`, but when an Identifier +starts with `-` the second character cannot be a digit. This is precicely +specified in the [Full Grammar](#full-grammar) below. -Multi-line strings _MUST_ have a single [Newline](#newline) immediately -following their opening `"`, after which they may have any number of newlines. -Finally, there must be a Newline, followed by any number of Whitespace, before -the closing `"`. +When Identifiers are used as the values in [Arguments](#argument) and +[Properties](#property), they are treated as strings, just like they are with +node names and property keys. 
-The first Newline, the last Newline, along with Whitespace following the last -Newline, are not included in the value of the String. The first and last -Newline can be the same character (that is, empty multi-line strings are -legal). +Identifier Strings are terminated by [Whitespace](#whitespace) or +[Newlines](#newline). -Furthermore, any lines in the string body that only contain literal whitespace -are stripped to only contain the single Newline character. +The literal identifiers `true`, `false`, and `null` are illegal Identifier +Strings, and _MUST_ be treated as a syntax error. -Strings with literal Newlines that do not immediately start with a Newline and -whose final `"` is not preceeded by whitespace and a Newline are illegal. +#### Non-initial characters + +The following characters cannot be the first character in an +[Identifier String](#identifier-string): + +* Any decimal digit (0-9) +* Any [non-identifier characters](#non-identifier-characters) + +Additionally, the `-` character can only be used as an initial character if +the second character is *not* a digit. This allows identifiers to look like +`--this`, and removes the ambiguity of having an identifier look like a +negative number. + +#### Non-identifier characters + +The following characters cannot be used anywhere in a [Identifier String](#identifier-string): + +* Any of `(){}[]/\"#;` +* Any [Equals Sign](#equals-sign) +* Any [Whitespace](#whitespace) or [Newline](#newline). +* Any [disallowed literal code points](#disallowed-literal-code-points) in KDL + documents. ### Quoted String @@ -377,7 +350,8 @@ purposes. Like Strings, Quoted Strings _MUST NOT_ include any of the [disallowed literal code-points](#disallowed-literal-code-points) as code points in their body. -Quoted Strings also follow the Multi-line rules specified in [String](#string). +Quoted Strings also follow the Multi-line rules specified in [Multi-line +String](#multi-line-strings). 
#### Escapes @@ -441,9 +415,9 @@ a `"` followed by a _matching_ number of `#` characters. This means that the string sequence `"` or `"#` and such must not match the closing `"` with the same or more `#` characters as the opening `#`, in the body of the string. -Like Strings, Raw Strings _MUST NOT_ include any of the [disallowed literal -code-points](#disallowed-literal-code-points) as code points in their body. -Unlike with Strings, these cannot simply be escaped, and are thus +Like other Strings, Raw Strings _MUST NOT_ include any of the [disallowed +literal code-points](#disallowed-literal-code-points) as code points in their +body. Unlike with Quoted Strings, these cannot simply be escaped, and are thus unrepresentable when using Raw Strings. #### Example @@ -476,6 +450,31 @@ This is the base indentation bar ``` +### Multi-line Strings + +Quoted and Raw Strings may span multiple lines with literal Newlines, in which +case the resulting String is "dedented" according to the line with the fewest +number of Whitespace characters preceding the first non-Whitespace character. +That is, the number of literal Whitespace characters in the least-indented +line in the String body is subtracted from the Whitespace of all other lines. + +Multi-line strings _MUST_ have a single [Newline](#newline) immediately +following their opening `"`, after which they may have any number of newlines. +Finally, there must be a Newline, followed by any number of Whitespace, before +the closing `"`. + +The first Newline, the last Newline, along with Whitespace following the last +Newline, are not included in the value of the String. The first and last +Newline can be the same character (that is, empty multi-line strings are +legal). + +Furthermore, any lines in the string body that only contain literal whitespace +are stripped to only contain the single Newline character. 
+ +Strings with literal Newlines that do not immediately start with a Newline and +whose final `"` is not preceeded by whitespace and a Newline are illegal. + + ### Number Numbers in KDL represent numerical [Values](#value). There is no logical distinction in KDL @@ -545,6 +544,11 @@ space](https://www.unicode.org/Public/UCD/latest/ucd/PropList.txt): | Medium Mathematical Space | `U+205F` | | Ideographic Space | `U+3000` | +#### Single-line comments + +Any text after `//`, until the next literal [Newline](#newline) is "commented +out", and is considered to be [Whitespace](#whitespace). + #### Multi-line comments In addition to single-line comments using `//`, comments can also be started @@ -552,6 +556,23 @@ with `/*` and ended with `*/`. These comments can span multiple lines. They are allowed in all positions where [Whitespace](#whitespace) is allowed and can be nested. +#### Slashdash comments + +Finally, a special kind of comment called a "slashdash", denoted by `/-`, can +be used to comment out entire _components_ of a KDL document logically, and +have those elements be treated as whitespace. + +Slashdash comments can be used before: + +* A [Node](#node) name (or its type annotation): the entire Node is + treated as Whitespace, including all props, args, and children. +* A node [Argument](#argument) (or its type annotation), in which case + the Argument value is treated as Whitespace. +* A [Property](#property) key, in which case the entire property, both + key and value, is treated as Whitespace. +* A [Children Block](#children-block), in which case the entire block, + including all children within, is treated as Whitespace. + ### Newline The following characters [should be treated as new @@ -574,10 +595,13 @@ Note that for the purpose of new lines, CRLF is considered _a single newline_. The following code points may not appear literally anywhere in the document. They may be represented in Strings (but not Raw Strings) using `\u{}`. 
-* Any codepoint with hexadecimal value `0x20` or below (various control characters). +* Any codepoint with hexadecimal value `0x20` or below (various control + characters). * `0x7F` (the Delete control character). -* Any codepoint that is not a [Unicode Scalar Value](https://unicode.org/glossary/#unicode_scalar_value). -* `0x2066-2069` and `0x202A-202E`, the [unicode "direction control" characters](https://www.w3.org/International/questions/qa-bidi-unicode-controls) +* Any codepoint that is not a [Unicode Scalar + Value](https://unicode.org/glossary/#unicode_scalar_value). +* `0x2066-2069` and `0x202A-202E`, the [unicode "direction control" + characters](https://www.w3.org/International/questions/qa-bidi-unicode-controls) ## Full Grammar From 511ab6b6ff25499cb79fee73d71af9ba0a402237 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Kat=20March=C3=A1n?= Date: Sun, 17 Dec 2023 10:01:42 -0800 Subject: [PATCH 043/105] missed a spot --- .../expected_kdl/unusual_bare_id_chars_in_quoted_id.kdl | 2 +- tests/test_cases/input/unusual_chars_in_bare_id.kdl | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/tests/test_cases/expected_kdl/unusual_bare_id_chars_in_quoted_id.kdl b/tests/test_cases/expected_kdl/unusual_bare_id_chars_in_quoted_id.kdl index 317e824..8321632 100644 --- a/tests/test_cases/expected_kdl/unusual_bare_id_chars_in_quoted_id.kdl +++ b/tests/test_cases/expected_kdl/unusual_bare_id_chars_in_quoted_id.kdl @@ -1 +1 @@ -foo123~!@#$%^&*.:'|?+<>, weeee +foo123~!@$%^&*.:'|?+<>, weeee diff --git a/tests/test_cases/input/unusual_chars_in_bare_id.kdl b/tests/test_cases/input/unusual_chars_in_bare_id.kdl index 317e824..8321632 100644 --- a/tests/test_cases/input/unusual_chars_in_bare_id.kdl +++ b/tests/test_cases/input/unusual_chars_in_bare_id.kdl @@ -1 +1 @@ -foo123~!@#$%^&*.:'|?+<>, weeee +foo123~!@$%^&*.:'|?+<>, weeee From d4333322d9b7ef84246d2b03b84e7d98e9e5f699 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Kat=20March=C3=A1n?= Date: Sun, 17 Dec 2023 13:24:00 
-0800 Subject: [PATCH 044/105] Add LRM/RLM to the direction control char list --- SPEC.md | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/SPEC.md b/SPEC.md index 6fee66c..70661eb 100644 --- a/SPEC.md +++ b/SPEC.md @@ -600,7 +600,8 @@ They may be represented in Strings (but not Raw Strings) using `\u{}`. * `0x7F` (the Delete control character). * Any codepoint that is not a [Unicode Scalar Value](https://unicode.org/glossary/#unicode_scalar_value). -* `0x2066-2069` and `0x202A-202E`, the [unicode "direction control" +* `0x2066-2069`, `0x202A-202E`, `0x200E`, and `0x200F`, the [unicode + "direction control" characters](https://www.w3.org/International/questions/qa-bidi-unicode-controls) ## Full Grammar From d53d99ff2e1519dab0bc9a636b863ed14288a05b Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Kat=20March=C3=A1n?= Date: Sun, 17 Dec 2023 13:37:37 -0800 Subject: [PATCH 045/105] test fixes --- tests/test_cases/expected_kdl/all_escapes.kdl | 2 +- tests/test_cases/expected_kdl/arg_bare.kdl | 2 +- tests/test_cases/expected_kdl/dash_dash.kdl | 2 +- tests/test_cases/expected_kdl/empty_quoted_prop_key.kdl | 2 +- tests/test_cases/expected_kdl/escline.kdl | 2 +- tests/test_cases/expected_kdl/null_arg.kdl | 2 +- tests/test_cases/expected_kdl/prop_identifier_type.kdl | 1 - tests/test_cases/expected_kdl/question_mark_before_number.kdl | 2 +- tests/test_cases/input/arg_bare.kdl | 2 +- tests/test_cases/input/dash_dash.kdl | 2 +- tests/test_cases/input/prop_identifier_type.kdl | 1 - tests/test_cases/input/question_mark_before_number.kdl | 2 +- tests/test_cases/input/raw_string_arg.kdl | 4 ++-- 13 files changed, 12 insertions(+), 14 deletions(-) diff --git a/tests/test_cases/expected_kdl/all_escapes.kdl b/tests/test_cases/expected_kdl/all_escapes.kdl index 5c49748..de0d0a0 100644 --- a/tests/test_cases/expected_kdl/all_escapes.kdl +++ b/tests/test_cases/expected_kdl/all_escapes.kdl @@ -1 +1 @@ -node "\"\\\b\f\n\r\t\s" +node "\"\\\b\f\n\r\t " diff --git 
a/tests/test_cases/expected_kdl/arg_bare.kdl b/tests/test_cases/expected_kdl/arg_bare.kdl index ec2a21f..2fa9785 100644 --- a/tests/test_cases/expected_kdl/arg_bare.kdl +++ b/tests/test_cases/expected_kdl/arg_bare.kdl @@ -1 +1 @@ -node a \ No newline at end of file +node a diff --git a/tests/test_cases/expected_kdl/dash_dash.kdl b/tests/test_cases/expected_kdl/dash_dash.kdl index 759ddc5..9f6111a 100644 --- a/tests/test_cases/expected_kdl/dash_dash.kdl +++ b/tests/test_cases/expected_kdl/dash_dash.kdl @@ -1 +1 @@ -node -- \ No newline at end of file +node -- diff --git a/tests/test_cases/expected_kdl/empty_quoted_prop_key.kdl b/tests/test_cases/expected_kdl/empty_quoted_prop_key.kdl index e6e1310..e541793 100644 --- a/tests/test_cases/expected_kdl/empty_quoted_prop_key.kdl +++ b/tests/test_cases/expected_kdl/empty_quoted_prop_key.kdl @@ -1 +1 @@ -node ""="empty" +node ""=empty diff --git a/tests/test_cases/expected_kdl/escline.kdl b/tests/test_cases/expected_kdl/escline.kdl index b3a0426..1b3db2c 100644 --- a/tests/test_cases/expected_kdl/escline.kdl +++ b/tests/test_cases/expected_kdl/escline.kdl @@ -1 +1 @@ -node "arg" +node arg diff --git a/tests/test_cases/expected_kdl/null_arg.kdl b/tests/test_cases/expected_kdl/null_arg.kdl index c0e6cb5..bed8dbf 100644 --- a/tests/test_cases/expected_kdl/null_arg.kdl +++ b/tests/test_cases/expected_kdl/null_arg.kdl @@ -1 +1 @@ -node null +node #null diff --git a/tests/test_cases/expected_kdl/prop_identifier_type.kdl b/tests/test_cases/expected_kdl/prop_identifier_type.kdl index bf1b9a7..7df052b 100644 --- a/tests/test_cases/expected_kdl/prop_identifier_type.kdl +++ b/tests/test_cases/expected_kdl/prop_identifier_type.kdl @@ -1,2 +1 @@ node key=(type)str - diff --git a/tests/test_cases/expected_kdl/question_mark_before_number.kdl b/tests/test_cases/expected_kdl/question_mark_before_number.kdl index 532ef22..7745a9e 100644 --- a/tests/test_cases/expected_kdl/question_mark_before_number.kdl +++ 
b/tests/test_cases/expected_kdl/question_mark_before_number.kdl @@ -1 +1 @@ -node ?15 \ No newline at end of file +node ?15 diff --git a/tests/test_cases/input/arg_bare.kdl b/tests/test_cases/input/arg_bare.kdl index ec2a21f..2fa9785 100644 --- a/tests/test_cases/input/arg_bare.kdl +++ b/tests/test_cases/input/arg_bare.kdl @@ -1 +1 @@ -node a \ No newline at end of file +node a diff --git a/tests/test_cases/input/dash_dash.kdl b/tests/test_cases/input/dash_dash.kdl index 759ddc5..9f6111a 100644 --- a/tests/test_cases/input/dash_dash.kdl +++ b/tests/test_cases/input/dash_dash.kdl @@ -1 +1 @@ -node -- \ No newline at end of file +node -- diff --git a/tests/test_cases/input/prop_identifier_type.kdl b/tests/test_cases/input/prop_identifier_type.kdl index bf1b9a7..7df052b 100644 --- a/tests/test_cases/input/prop_identifier_type.kdl +++ b/tests/test_cases/input/prop_identifier_type.kdl @@ -1,2 +1 @@ node key=(type)str - diff --git a/tests/test_cases/input/question_mark_before_number.kdl b/tests/test_cases/input/question_mark_before_number.kdl index 532ef22..7745a9e 100644 --- a/tests/test_cases/input/question_mark_before_number.kdl +++ b/tests/test_cases/input/question_mark_before_number.kdl @@ -1 +1 @@ -node ?15 \ No newline at end of file +node ?15 diff --git a/tests/test_cases/input/raw_string_arg.kdl b/tests/test_cases/input/raw_string_arg.kdl index cf4a86c..05cf37e 100644 --- a/tests/test_cases/input/raw_string_arg.kdl +++ b/tests/test_cases/input/raw_string_arg.kdl @@ -1,2 +1,2 @@ -node_1 r#""arg\n"and #stuff"# -node_2 r##"#"arg\n"#and #stuff"## +node_1 #""arg\n"and #stuff"# +node_2 ##"#"arg\n"#and #stuff"## From 057e8c894dffb9dfbc5596406d55f5c4d4768eca Mon Sep 17 00:00:00 2001 From: Tab Atkins-Bittner Date: Tue, 26 Dec 2023 13:04:35 -0800 Subject: [PATCH 046/105] Rewrite intro paragraph for strings to make their usage clearer. 
--- SPEC.md | 6 ++---- 1 file changed, 2 insertions(+), 4 deletions(-) diff --git a/SPEC.md b/SPEC.md index 70661eb..ab30920 100644 --- a/SPEC.md +++ b/SPEC.md @@ -281,10 +281,8 @@ node prop=(regex).* ### String Strings in KDL represent textual UTF-8 [Values](#value). A String is either an -[Identifier String](#identifier-string), a [Quoted String](#quoted-string) or -a [Raw String](#raw-string). Quoted Strings may include escaped characters, -while Raw Strings always contain only the literal characters that are present. -Identifier Strings don't user delimiters. +[Identifier String](#identifier-string) (like `foo`), a [Quoted String](#quoted-string) (like `"foo"`) or +a [Raw String](#raw-string) (like `#"foo"#`). Identifier Strings let you write short, "single-word" strings with a minimum of syntax; Quoted Strings let you write strings with whitespace (including newlines!) or escapes; Raw Strings let you write strings with whitespace *but without escapes*, allowing you to not worry about the string's content containing anything that might look like an escape. Strings _MUST_ be represented as UTF-8 values. From 419995ff19d764351747fadafbf0198b65b128ef Mon Sep 17 00:00:00 2001 From: Tab Atkins-Bittner Date: Tue, 26 Dec 2023 13:04:44 -0800 Subject: [PATCH 047/105] typos --- SPEC.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/SPEC.md b/SPEC.md index ab30920..937ccba 100644 --- a/SPEC.md +++ b/SPEC.md @@ -301,7 +301,7 @@ Unicode Scalar Values other than [non-identifier characters](#non-identifier-characters), so long as this doesn't produce something confusable for a [Number](#number). For example, both a [Number](#number) and an Identifier can start with `-`, but when an Identifier -starts with `-` the second character cannot be a digit. This is precicely +starts with `-` the second character cannot be a digit. This is precisely specified in the [Full Grammar](#full-grammar) below. 
When Identifiers are used as the values in [Arguments](#argument) and @@ -345,7 +345,7 @@ string characters except unescaped `"` and `\`. This includes literal multiple lines without behaving like a Newline for [Node](#node) parsing purposes. -Like Strings, Quoted Strings _MUST NOT_ include any of the [disallowed literal +Like Identifier Strings, Quoted Strings _MUST NOT_ include any of the [disallowed literal code-points](#disallowed-literal-code-points) as code points in their body. Quoted Strings also follow the Multi-line rules specified in [Multi-line From 6d359d2e4c37b8df481c35110bbb3b291857a191 Mon Sep 17 00:00:00 2001 From: Tab Atkins-Bittner Date: Tue, 26 Dec 2023 13:05:10 -0800 Subject: [PATCH 048/105] Remove now-irrelevant comment about idents acting like strings (they *are* strings now). --- SPEC.md | 4 ---- 1 file changed, 4 deletions(-) diff --git a/SPEC.md b/SPEC.md index 937ccba..4aadf4a 100644 --- a/SPEC.md +++ b/SPEC.md @@ -304,10 +304,6 @@ something confusable for a [Number](#number). For example, both a starts with `-` the second character cannot be a digit. This is precisely specified in the [Full Grammar](#full-grammar) below. -When Identifiers are used as the values in [Arguments](#argument) and -[Properties](#property), they are treated as strings, just like they are with -node names and property keys. - Identifier Strings are terminated by [Whitespace](#whitespace) or [Newlines](#newline). From b635470ab20f63cab2f9995b68b43b86e59cc57a Mon Sep 17 00:00:00 2001 From: Tab Atkins-Bittner Date: Tue, 26 Dec 2023 13:06:22 -0800 Subject: [PATCH 049/105] be more specific --- SPEC.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/SPEC.md b/SPEC.md index 4aadf4a..39e5801 100644 --- a/SPEC.md +++ b/SPEC.md @@ -349,7 +349,7 @@ String](#multi-line-strings). #### Escapes -In addition to literal code points, a number of "escapes" are supported. +In addition to literal code points, a number of "escapes" are supported in Quoted Strings. 
"Escapes" are the character `\` followed by another character, and are interpreted as described in the following table: From 491cc46f89df0261b494ccc89c2bc7b243467bb2 Mon Sep 17 00:00:00 2001 From: Tab Atkins-Bittner Date: Tue, 26 Dec 2023 13:16:55 -0800 Subject: [PATCH 050/105] Fix the disallowed low ASCIIs --- SPEC.md | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/SPEC.md b/SPEC.md index 39e5801..adef091 100644 --- a/SPEC.md +++ b/SPEC.md @@ -589,8 +589,9 @@ Note that for the purpose of new lines, CRLF is considered _a single newline_. The following code points may not appear literally anywhere in the document. They may be represented in Strings (but not Raw Strings) using `\u{}`. -* Any codepoint with hexadecimal value `0x20` or below (various control - characters). +* The codepoints `U+0000`-`U+0009`, + the codepoint `U+000B`, + or the codepoints `U+000E`-`U+001F` (various control characters). * `0x7F` (the Delete control character). * Any codepoint that is not a [Unicode Scalar Value](https://unicode.org/glossary/#unicode_scalar_value). From 6d091fd49329a6d03d25de9488f39173840cc1f8 Mon Sep 17 00:00:00 2001 From: Tab Atkins-Bittner Date: Tue, 26 Dec 2023 13:18:01 -0800 Subject: [PATCH 051/105] Use consistent codepoint spelling --- SPEC.md | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/SPEC.md b/SPEC.md index adef091..23fcfc3 100644 --- a/SPEC.md +++ b/SPEC.md @@ -589,13 +589,13 @@ Note that for the purpose of new lines, CRLF is considered _a single newline_. The following code points may not appear literally anywhere in the document. They may be represented in Strings (but not Raw Strings) using `\u{}`. -* The codepoints `U+0000`-`U+0009`, +* The codepoints `U+0000-0009`, the codepoint `U+000B`, - or the codepoints `U+000E`-`U+001F` (various control characters). -* `0x7F` (the Delete control character). + or the codepoints `U+000E-001F` (various control characters). +* `U+007F` (the Delete control character). 
* Any codepoint that is not a [Unicode Scalar Value](https://unicode.org/glossary/#unicode_scalar_value). -* `0x2066-2069`, `0x202A-202E`, `0x200E`, and `0x200F`, the [unicode +* `U+2066-2069`, `U+202A-202E`, `U+200E`, and `U+200F`, the [unicode "direction control" characters](https://www.w3.org/International/questions/qa-bidi-unicode-controls) From f02ba59c0c806972f8108481d723c0cd2c345b8b Mon Sep 17 00:00:00 2001 From: Tab Atkins-Bittner Date: Tue, 26 Dec 2023 14:19:45 -0800 Subject: [PATCH 052/105] Make multi-line ws prefix determined by the last line. --- SPEC.md | 97 ++++++++++++++----- .../expected_kdl/escaped_whitespace.kdl | 2 +- .../expected_kdl/raw_string_newline.kdl | 2 +- tests/test_cases/input/escaped_whitespace.kdl | 8 +- 4 files changed, 81 insertions(+), 28 deletions(-) diff --git a/SPEC.md b/SPEC.md index 23fcfc3..e6bd0f5 100644 --- a/SPEC.md +++ b/SPEC.md @@ -388,7 +388,7 @@ such) are retained. For example, these strings are all semantically identical: " Hello World -" + " ``` ##### Invalid escapes @@ -428,12 +428,42 @@ quotes-and-escapes ##"hello\n\r\asd"#world"## The string contains the literal characters `hello\n\r\asd"#world` + +### Multi-line Strings + +When a Quoted or Raw String spans multiple lines with literal, non-escaped Newlines, +it follows a special multi-line syntax +that automatically "dedents" the string, +allowing its value to be indented to a visually matching level if desired. + +A Multi-line string _MUST_ start with a [Newline](#newline) +immediately following its opening `"`. +Its final line, preceding the closing `"`, +_MUST_ contain only whitespace. +All in-between lines that contain non-whitespace characters +_MUST_ start with the exact same whitespace as the final line +(precisely matching codepoints, not merely counting characters). 
+ +The value of the Multi-line String omits the first and last Newline, +the Whitespace of the last line, +the matching Whitespace prefix on all intermediate lines, +and all Whitespace on intermediate Whitespace-only lines. +The first and last Newline can be the same character +(that is, empty multi-line strings are legal). + +Strings with literal Newlines that do not immediately start with a Newline and +whose final `"` is not preceeded by optional whitespace and a Newline are illegal. + +In other words, the final line specifies the whitespace prefix that will be removed from all other lines. + +#### Example + ```kdl -multi-line #" +multi-line " foo This is the base indentation - bar - "# + bar + " ``` The last example's string value will be: @@ -444,29 +474,52 @@ This is the base indentation bar ``` -### Multi-line Strings +Equivalent to `" foo\nThis is the base indentation\n bar"`. -Quoted and Raw Strings may span multiple lines with literal Newlines, in which -case the resulting String is "dedented" according to the line with the fewest -number of Whitespace characters preceding the first non-Whitespace character. -That is, the number of literal Whitespace characters in the least-indented -line in the String body is subtracted from the Whitespace of all other lines. +--------- -Multi-line strings _MUST_ have a single [Newline](#newline) immediately -following their opening `"`, after which they may have any number of newlines. -Finally, there must be a Newline, followed by any number of Whitespace, before -the closing `"`. +If the last line wasn't indented as far, +it won't dedent the rest of the lines as much: -The first Newline, the last Newline, along with Whitespace following the last -Newline, are not included in the value of the String. The first and last -Newline can be the same character (that is, empty multi-line strings are -legal). 
+```kdl +multi-line " + foo + This is no longer on the left edge + bar + " +``` -Furthermore, any lines in the string body that only contain literal whitespace -are stripped to only contain the single Newline character. +This example's string value will be: -Strings with literal Newlines that do not immediately start with a Newline and -whose final `"` is not preceeded by whitespace and a Newline are illegal. +``` + foo + This is no longer on the left edge + bar +``` + +Equivalent to `" foo\n This is no longer on the left edge\n bar"`. + +----------- + +Empty lines can contain any whitespace, or none at all, and will be reflected as empty in the value: + +```kdl +multi-line " + Indented a bit + + A second indented paragraph. + " +``` + +This example's string value will be: + +``` +Indented a bit. + +A second indented paragraph. +``` + +Equivalent to `"Indented a bit.\n\nA second indented paragraph."` ### Number diff --git a/tests/test_cases/expected_kdl/escaped_whitespace.kdl b/tests/test_cases/expected_kdl/escaped_whitespace.kdl index a97d10a..45dd408 100644 --- a/tests/test_cases/expected_kdl/escaped_whitespace.kdl +++ b/tests/test_cases/expected_kdl/escaped_whitespace.kdl @@ -1 +1 @@ -node "Hello\n\tWorld" "Hello\n\tWorld" "Hello\n\tWorld" "Hello\n\tWorld" "Hello\n\tWorld" "Hello\n\tWorld" +node "Hello\n\tWorld" "Hello\n\tWorld" "Hello\n\tWorld" "Hello\n\tWorld" "Hello\n\tWorld" diff --git a/tests/test_cases/expected_kdl/raw_string_newline.kdl b/tests/test_cases/expected_kdl/raw_string_newline.kdl index d738029..fd38cb0 100644 --- a/tests/test_cases/expected_kdl/raw_string_newline.kdl +++ b/tests/test_cases/expected_kdl/raw_string_newline.kdl @@ -1 +1 @@ -node "\nhello\nworld\n" +node "hello\nworld" diff --git a/tests/test_cases/input/escaped_whitespace.kdl b/tests/test_cases/input/escaped_whitespace.kdl index 1f2e67c..797784a 100644 --- a/tests/test_cases/input/escaped_whitespace.kdl +++ b/tests/test_cases/input/escaped_whitespace.kdl @@ -1,13 +1,13 @@ // All 
of these strings are the same node \ "Hello\n\tWorld" \ - "Hello - World" \ + " + Hello + World + " \ "Hello\n\ \tWorld" \ "Hello\n\ \tWorld" \ - "Hello -\ \tWorld" \ "Hello\n\t\ World" From 935d054d134e4d6346b0927bc5dd9c0472656221 Mon Sep 17 00:00:00 2001 From: Tab Atkins-Bittner Date: Tue, 26 Dec 2023 14:28:25 -0800 Subject: [PATCH 053/105] Fix more multiline tests --- tests/test_cases/expected_kdl/multiline_raw_string.kdl | 2 +- tests/test_cases/expected_kdl/multiline_string.kdl | 2 +- tests/test_cases/input/slashdash_full_node.kdl | 3 ++- 3 files changed, 4 insertions(+), 3 deletions(-) diff --git a/tests/test_cases/expected_kdl/multiline_raw_string.kdl b/tests/test_cases/expected_kdl/multiline_raw_string.kdl index 2bafe90..3c31c47 100644 --- a/tests/test_cases/expected_kdl/multiline_raw_string.kdl +++ b/tests/test_cases/expected_kdl/multiline_raw_string.kdl @@ -1 +1 @@ -node "\nhey\neveryone\nhow goes?\n" +node "hey\neveryone\nhow goes?" diff --git a/tests/test_cases/expected_kdl/multiline_string.kdl b/tests/test_cases/expected_kdl/multiline_string.kdl index 2bafe90..3c31c47 100644 --- a/tests/test_cases/expected_kdl/multiline_string.kdl +++ b/tests/test_cases/expected_kdl/multiline_string.kdl @@ -1 +1 @@ -node "\nhey\neveryone\nhow goes?\n" +node "hey\neveryone\nhow goes?" 
diff --git a/tests/test_cases/input/slashdash_full_node.kdl b/tests/test_cases/input/slashdash_full_node.kdl index f52f18b..4df7b55 100644 --- a/tests/test_cases/input/slashdash_full_node.kdl +++ b/tests/test_cases/input/slashdash_full_node.kdl @@ -1,2 +1,3 @@ /- node 1.0 "a" b=" -b" +b +" From 1294f9733d82a543c1a81b3dac37a915d3d0bf03 Mon Sep 17 00:00:00 2001 From: Tab Atkins-Bittner Date: Tue, 26 Dec 2023 14:31:49 -0800 Subject: [PATCH 054/105] Fix tests about # in an ident string --- tests/test_cases/expected_kdl/unusual_chars_in_bare_id.kdl | 2 +- tests/test_cases/input/unusual_bare_id_chars_in_quoted_id.kdl | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/tests/test_cases/expected_kdl/unusual_chars_in_bare_id.kdl b/tests/test_cases/expected_kdl/unusual_chars_in_bare_id.kdl index 317e824..8321632 100644 --- a/tests/test_cases/expected_kdl/unusual_chars_in_bare_id.kdl +++ b/tests/test_cases/expected_kdl/unusual_chars_in_bare_id.kdl @@ -1 +1 @@ -foo123~!@#$%^&*.:'|?+<>, weeee +foo123~!@$%^&*.:'|?+<>, weeee diff --git a/tests/test_cases/input/unusual_bare_id_chars_in_quoted_id.kdl b/tests/test_cases/input/unusual_bare_id_chars_in_quoted_id.kdl index 9281f70..d3262b8 100644 --- a/tests/test_cases/input/unusual_bare_id_chars_in_quoted_id.kdl +++ b/tests/test_cases/input/unusual_bare_id_chars_in_quoted_id.kdl @@ -1 +1 @@ -"foo123~!@#$%^&*.:'|?+<>," weeee +"foo123~!@$%^&*.:'|?+<>," weeee From 094a615f82121eb7fed3a17a9e6a4140f95e1cb8 Mon Sep 17 00:00:00 2001 From: Tab Atkins-Bittner Date: Tue, 26 Dec 2023 14:36:48 -0800 Subject: [PATCH 055/105] Tests are invalid (contained U+FFFD, not surrogates) and are in general untestable since you can't represent surrogates in UTF-8, which KDL must be encoded in. 
--- tests/test_cases/input/unicode_scalar_high.kdl | 2 -- tests/test_cases/input/unicode_scalar_low.kdl | 2 -- 2 files changed, 4 deletions(-) delete mode 100644 tests/test_cases/input/unicode_scalar_high.kdl delete mode 100644 tests/test_cases/input/unicode_scalar_low.kdl diff --git a/tests/test_cases/input/unicode_scalar_high.kdl b/tests/test_cases/input/unicode_scalar_high.kdl deleted file mode 100644 index fb1abb4..0000000 --- a/tests/test_cases/input/unicode_scalar_high.kdl +++ /dev/null @@ -1,2 +0,0 @@ -// 0xDFFF (last code point before 0xE000) -node �arg diff --git a/tests/test_cases/input/unicode_scalar_low.kdl b/tests/test_cases/input/unicode_scalar_low.kdl deleted file mode 100644 index 010d0b1..0000000 --- a/tests/test_cases/input/unicode_scalar_low.kdl +++ /dev/null @@ -1,2 +0,0 @@ -// 0xD800 (first code point after 0xD7FF) -node �arg From c273d249b6c77f86d4f7395eb2c93197e3725c4a Mon Sep 17 00:00:00 2001 From: Tab Atkins-Bittner Date: Tue, 26 Dec 2023 14:41:01 -0800 Subject: [PATCH 056/105] Dang it, forgot to save README when fixing multiline earlier. --- README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/README.md b/README.md index f369cc1..53b669c 100644 --- a/README.md +++ b/README.md @@ -22,7 +22,7 @@ package { echo "foo" node -c "console.log('hello, world!');" echo "foo" > some-file.txt - "# + "# } // `\` breaks up a single node across multiple lines. 
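Note on PATCH 052 through 056 (this note is not part of any patch): the dedent rule those commits introduce, where the whitespace of the final line is the prefix removed from every other line, might be sketched roughly as below. This is a minimal Python sketch under stated assumptions, not code from the KDL repository. The names `dedent_multiline` and `is_blank` are hypothetical. It assumes `body` is the text between the opening and closing quotes with newlines already normalized to `\n`, and it treats only space and tab as whitespace, whereas KDL recognizes more code points. It follows the PATCH 052 wording that whitespace-only lines become empty in the value.

```python
WHITESPACE = " \t"  # simplification: KDL treats more code points as whitespace


def is_blank(line: str) -> bool:
    """True if the line is empty or contains only whitespace."""
    return line.strip(WHITESPACE) == ""


def dedent_multiline(body: str) -> str:
    """Dedent the text found between the quotes of a multi-line string.

    Hypothetical helper: `body` excludes the quotes themselves and uses
    "\n" as its only newline (both are assumptions of this sketch).
    """
    # The string must begin with a newline right after the opening quote.
    if not body.startswith("\n"):
        raise ValueError("multi-line string must start with a newline")

    # Drop the first newline; the last element is the closing line.
    *content, closing = body[1:].split("\n")

    # The closing line may hold only whitespace; it defines the prefix.
    if not is_blank(closing):
        raise ValueError("closing line must contain only whitespace")

    out = []
    for line in content:
        if is_blank(line):
            # Whitespace-only lines are reflected as empty lines.
            out.append("")
        elif line.startswith(closing):
            # Exact code point match on the prefix, not a character count.
            out.append(line[len(closing):])
        else:
            raise ValueError("line does not start with the closing line's whitespace")
    return "\n".join(out)


if __name__ == "__main__":
    print(repr(dedent_multiline("\n    foo\n  bar\n  ")))  # '  foo\nbar'
```

Comparing prefixes with `startswith` rather than counting characters reflects the "precisely matching codepoints" wording: a tab-indented body line does not match a space-indented closing line even when the widths look the same.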
From de37e11a2971e514f81cff4bf28d6365a8039cf4 Mon Sep 17 00:00:00 2001 From: Tab Atkins-Bittner Date: Tue, 26 Dec 2023 14:58:49 -0800 Subject: [PATCH 057/105] Comments are now allowed in and around types (along with other types of ws) --- tests/test_cases/expected_kdl/comment_after_arg_type.kdl | 1 + tests/test_cases/expected_kdl/comment_after_node_type.kdl | 1 + tests/test_cases/expected_kdl/comment_after_prop_type.kdl | 1 + tests/test_cases/expected_kdl/comment_in_arg_type.kdl | 1 + tests/test_cases/expected_kdl/comment_in_node_type.kdl | 1 + tests/test_cases/expected_kdl/comment_in_prop_type.kdl | 1 + tests/test_cases/input/comment_after_arg_type.kdl | 2 +- tests/test_cases/input/comment_after_node_type.kdl | 2 +- tests/test_cases/input/comment_after_prop_type.kdl | 2 +- tests/test_cases/input/comment_in_arg_type.kdl | 2 +- tests/test_cases/input/comment_in_node_type.kdl | 2 +- tests/test_cases/input/comment_in_prop_type.kdl | 2 +- 12 files changed, 12 insertions(+), 6 deletions(-) create mode 100644 tests/test_cases/expected_kdl/comment_after_arg_type.kdl create mode 100644 tests/test_cases/expected_kdl/comment_after_node_type.kdl create mode 100644 tests/test_cases/expected_kdl/comment_after_prop_type.kdl create mode 100644 tests/test_cases/expected_kdl/comment_in_arg_type.kdl create mode 100644 tests/test_cases/expected_kdl/comment_in_node_type.kdl create mode 100644 tests/test_cases/expected_kdl/comment_in_prop_type.kdl diff --git a/tests/test_cases/expected_kdl/comment_after_arg_type.kdl b/tests/test_cases/expected_kdl/comment_after_arg_type.kdl new file mode 100644 index 0000000..51dcb98 --- /dev/null +++ b/tests/test_cases/expected_kdl/comment_after_arg_type.kdl @@ -0,0 +1 @@ +node (type)10 diff --git a/tests/test_cases/expected_kdl/comment_after_node_type.kdl b/tests/test_cases/expected_kdl/comment_after_node_type.kdl new file mode 100644 index 0000000..c790643 --- /dev/null +++ b/tests/test_cases/expected_kdl/comment_after_node_type.kdl @@ -0,0 +1 @@ 
+(type)node diff --git a/tests/test_cases/expected_kdl/comment_after_prop_type.kdl b/tests/test_cases/expected_kdl/comment_after_prop_type.kdl new file mode 100644 index 0000000..843551b --- /dev/null +++ b/tests/test_cases/expected_kdl/comment_after_prop_type.kdl @@ -0,0 +1 @@ +node key=(type)10 diff --git a/tests/test_cases/expected_kdl/comment_in_arg_type.kdl b/tests/test_cases/expected_kdl/comment_in_arg_type.kdl new file mode 100644 index 0000000..51dcb98 --- /dev/null +++ b/tests/test_cases/expected_kdl/comment_in_arg_type.kdl @@ -0,0 +1 @@ +node (type)10 diff --git a/tests/test_cases/expected_kdl/comment_in_node_type.kdl b/tests/test_cases/expected_kdl/comment_in_node_type.kdl new file mode 100644 index 0000000..c790643 --- /dev/null +++ b/tests/test_cases/expected_kdl/comment_in_node_type.kdl @@ -0,0 +1 @@ +(type)node diff --git a/tests/test_cases/expected_kdl/comment_in_prop_type.kdl b/tests/test_cases/expected_kdl/comment_in_prop_type.kdl new file mode 100644 index 0000000..843551b --- /dev/null +++ b/tests/test_cases/expected_kdl/comment_in_prop_type.kdl @@ -0,0 +1 @@ +node key=(type)10 diff --git a/tests/test_cases/input/comment_after_arg_type.kdl b/tests/test_cases/input/comment_after_arg_type.kdl index f88b7c1..d493f6e 100644 --- a/tests/test_cases/input/comment_after_arg_type.kdl +++ b/tests/test_cases/input/comment_after_arg_type.kdl @@ -1 +1 @@ -node (type)/*huh*/10 +node (type)/*hey*/10 diff --git a/tests/test_cases/input/comment_after_node_type.kdl b/tests/test_cases/input/comment_after_node_type.kdl index 55ab980..a5939b4 100644 --- a/tests/test_cases/input/comment_after_node_type.kdl +++ b/tests/test_cases/input/comment_after_node_type.kdl @@ -1 +1 @@ -(type)/*huh*/node +(type)/*hey*/node diff --git a/tests/test_cases/input/comment_after_prop_type.kdl b/tests/test_cases/input/comment_after_prop_type.kdl index c9b1858..6805673 100644 --- a/tests/test_cases/input/comment_after_prop_type.kdl +++ b/tests/test_cases/input/comment_after_prop_type.kdl 
@@ -1 +1 @@ -node key=(type)/*huh*/10 +node key=(type)/*hey*/10 diff --git a/tests/test_cases/input/comment_in_arg_type.kdl b/tests/test_cases/input/comment_in_arg_type.kdl index 39742ac..1166a43 100644 --- a/tests/test_cases/input/comment_in_arg_type.kdl +++ b/tests/test_cases/input/comment_in_arg_type.kdl @@ -1 +1 @@ -node (type/*huh*/)10 +node (type/*hey*/)10 diff --git a/tests/test_cases/input/comment_in_node_type.kdl b/tests/test_cases/input/comment_in_node_type.kdl index 8cda2e5..7cc9b26 100644 --- a/tests/test_cases/input/comment_in_node_type.kdl +++ b/tests/test_cases/input/comment_in_node_type.kdl @@ -1 +1 @@ -(type/*huh*/)node +(type/*hey*/)node diff --git a/tests/test_cases/input/comment_in_prop_type.kdl b/tests/test_cases/input/comment_in_prop_type.kdl index 10adb3b..0587da9 100644 --- a/tests/test_cases/input/comment_in_prop_type.kdl +++ b/tests/test_cases/input/comment_in_prop_type.kdl @@ -1 +1 @@ -node key=(type/*huh*/)10 +node key=(type/*hey*/)10 From 24cd2141d3a08a772b751055b9be2090236ca4da Mon Sep 17 00:00:00 2001 From: Tab Atkins-Bittner Date: Wed, 3 Jan 2024 17:08:49 -0800 Subject: [PATCH 058/105] Disallow idents like '.1' to avoid footguns --- SPEC.md | 33 ++++++++++++++++++++++----------- 1 file changed, 22 insertions(+), 11 deletions(-) diff --git a/SPEC.md b/SPEC.md index e6bd0f5..5a3a90d 100644 --- a/SPEC.md +++ b/SPEC.md @@ -298,18 +298,23 @@ composed of any [Unicode Scalar Value](https://unicode.org/glossary/#unicode_scalar_value) other than [non-initial characters](#non-initial-characters), followed by any number of Unicode Scalar Values other than [non-identifier -characters](#non-identifier-characters), so long as this doesn't produce -something confusable for a [Number](#number). For example, both a -[Number](#number) and an Identifier can start with `-`, but when an Identifier -starts with `-` the second character cannot be a digit. This is precisely -specified in the [Full Grammar](#full-grammar) below. 
+characters](#non-identifier-characters). + +A handful of patterns are disallowed, to avoid confusion with other values: + +* idents that appear to start with a [Number](#number) + (like `1.0v2` or `-1em`) + or the "almost a number" pattern of a decimal point without a leading digit + (like `.1`) +* idents that are the language keywords (`true`, `false`, and `null`) without their leading `#` + +Identifiers that match these patterns _MUST_ be treated as a syntax error; +such values can only be written as quoted or raw strings. +The precise details of the identifier syntax is specified in the [Full Grammar](#full-grammar) below. Identifier Strings are terminated by [Whitespace](#whitespace) or [Newlines](#newline). -The literal identifiers `true`, `false`, and `null` are illegal Identifier -Strings, and _MUST_ be treated as a syntax error. - #### Non-initial characters The following characters cannot be the first character in an @@ -540,6 +545,11 @@ There are four syntaxes for Numbers: Decimal, Hexadecimal, Octal, and Binary. * They may optionally include a decimal separator `.`, followed by more digits, which may again be separated by `_`. * They may optionally be followed by `E` or `e`, an optional `-` or `+`, and more digits, to represent an exponent value. +Note that, similar to JSON and some other languages, +numbers without an integer digit (such as `.1`) are illegal. +They must be written with at least one integer digit, like `0.1`. +(These patterns are also disallowed from [Identifier Strings](#identifier-string), to avoid confusion.) + ### Boolean A boolean [Value](#value) is either the symbol `#true` or `#false`. These @@ -680,9 +690,10 @@ node-children := '{' nodes final-node? 
'}' node-terminator := single-line-comment | newline | ';' | eof identifier := string | bare-identifier -bare-identifier := (unambiguous-ident - boolean - 'null') | numberish-ident -unambiguous-ident := (identifier-char - digit - sign) identifier-char* -numberish-ident := sign ((identifier-char - digit) identifier-char*)? +bare-identifier := (unambiguous-ident - boolean - 'null') | numberish-ident | dotted-ident +unambiguous-ident := (identifier-char - digit - sign - '.') identifier-char* +numberish-ident := sign ((identifier-char - digit - '.') identifier-char*)? +dotted-ident := '.' ((identifier-char - digit) identifier-char*)? identifier-char := unicode - line-space - [\\/(){};\[\]="#] - disallowed-literal-code-points keyword := '#' (boolean | 'null') From bc2b995bfe5138e3a6fc888f61f45fc726a9ff99 Mon Sep 17 00:00:00 2001 From: Tab Atkins-Bittner Date: Wed, 3 Jan 2024 17:14:23 -0800 Subject: [PATCH 059/105] Rename/rearrange the string productions to match the spec text better. --- SPEC.md | 22 +++++++++++----------- 1 file changed, 11 insertions(+), 11 deletions(-) diff --git a/SPEC.md b/SPEC.md index 5a3a90d..09cb10f 100644 --- a/SPEC.md +++ b/SPEC.md @@ -682,28 +682,28 @@ node-space := plain-node-space+ ('/-' plain-node-space* (node-prop-or-arg | node required-node-space := node-space* plain-node-space+ optional-node-space := node-space* -base-node := type? optional-node-space identifier (required-node-space node-prop-or-arg)* (required-node-space node-children)? +base-node := type? optional-node-space string (required-node-space node-prop-or-arg)* (required-node-space node-children)? node := base-node optional-node-space node-terminator final-node := base-node optional-node-space node-terminator? node-prop-or-arg := prop | value node-children := '{' nodes final-node? 
'}' node-terminator := single-line-comment | newline | ';' | eof -identifier := string | bare-identifier -bare-identifier := (unambiguous-ident - boolean - 'null') | numberish-ident | dotted-ident +keyword := '#' (boolean | 'null') +prop := string optional-node-space equals-sign optional-node-space value +value := type? optional-node-space (string | number | keyword) +type := '(' optional-node-space string optional-node-space ')' +equals-sign := See Table (Equals Sign) + +string := identifier-string | quoted-string | raw-string + +identifier-string := (unambiguous-ident - boolean - 'null') | numberish-ident | dotted-ident unambiguous-ident := (identifier-char - digit - sign - '.') identifier-char* numberish-ident := sign ((identifier-char - digit - '.') identifier-char*)? dotted-ident := '.' ((identifier-char - digit) identifier-char*)? identifier-char := unicode - line-space - [\\/(){};\[\]="#] - disallowed-literal-code-points -keyword := '#' (boolean | 'null') -prop := identifier optional-node-space equals-sign optional-node-space value -value := type? optional-node-space (identifier | string | number | keyword) -type := '(' optional-node-space identifier optional-node-space ')' -equals-sign := See Table (Equals Sign) - -string := raw-string | escaped-string -escaped-string := '"' (single-line-string-body | newline multi-line-string-body newline ws*) '"' +quoted-string := '"' (single-line-string-body | newline multi-line-string-body newline ws*) '"' single-line-string-body := (string-character - newline)* multi-line-string-body := string-character* string-character := '\' escape | [^\\"] - disallowed-literal-code-points From 1f28fb0e832e0ec7f7aa1f0fb7908290da436f50 Mon Sep 17 00:00:00 2001 From: Tab Atkins-Bittner Date: Wed, 3 Jan 2024 17:19:03 -0800 Subject: [PATCH 060/105] [editorial] Move keyword production to a better spot. Rephrase bool/keyword to include the # directly. Explicitly spell out the disallowed keywordish idents, and move where they appear. 
Rename numberish-ident to signed-ident (it's not numberish at all, is the point). --- SPEC.md | 11 ++++++----- 1 file changed, 6 insertions(+), 5 deletions(-) diff --git a/SPEC.md b/SPEC.md index 09cb10f..8bb74c2 100644 --- a/SPEC.md +++ b/SPEC.md @@ -689,7 +689,6 @@ node-prop-or-arg := prop | value node-children := '{' nodes final-node? '}' node-terminator := single-line-comment | newline | ';' | eof -keyword := '#' (boolean | 'null') prop := string optional-node-space equals-sign optional-node-space value value := type? optional-node-space (string | number | keyword) type := '(' optional-node-space string optional-node-space ')' @@ -697,9 +696,9 @@ equals-sign := See Table (Equals Sign) string := identifier-string | quoted-string | raw-string -identifier-string := (unambiguous-ident - boolean - 'null') | numberish-ident | dotted-ident -unambiguous-ident := (identifier-char - digit - sign - '.') identifier-char* -numberish-ident := sign ((identifier-char - digit - '.') identifier-char*)? +identifier-string := unambiguous-ident | signed-ident | dotted-ident +unambiguous-ident := ((identifier-char - digit - sign - '.') identifier-char*) - 'true' - 'false' - 'null' +signed-ident := sign ((identifier-char - digit - '.') identifier-char*)? dotted-ident := '.' ((identifier-char - digit) identifier-char*)? identifier-char := unicode - line-space - [\\/(){};\[\]="#] - disallowed-literal-code-points @@ -727,7 +726,9 @@ hex := sign? '0x' hex-digit (hex-digit | '_')* octal := sign? '0o' [0-7] [0-7_]* binary := sign? '0b' ('0' | '1') ('0' | '1' | '_')* -boolean := 'true' | 'false' +keyword := boolean | '#null' + +boolean := '#true' | '#false' escline := '\\' ws* (single-line-comment | newline | eof) From 1d6809ee469f761eec20d564005cf3c59e5e8da1 Mon Sep 17 00:00:00 2001 From: Tab Atkins-Bittner Date: Wed, 3 Jan 2024 17:25:34 -0800 Subject: [PATCH 061/105] Whoops, missed allowing '+.' 
--- SPEC.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/SPEC.md b/SPEC.md index 8bb74c2..21a5cab 100644 --- a/SPEC.md +++ b/SPEC.md @@ -699,7 +699,7 @@ string := identifier-string | quoted-string | raw-string identifier-string := unambiguous-ident | signed-ident | dotted-ident unambiguous-ident := ((identifier-char - digit - sign - '.') identifier-char*) - 'true' - 'false' - 'null' signed-ident := sign ((identifier-char - digit - '.') identifier-char*)? -dotted-ident := '.' ((identifier-char - digit) identifier-char*)? +dotted-ident := sign? '.' ((identifier-char - digit) identifier-char*)? identifier-char := unicode - line-space - [\\/(){};\[\]="#] - disallowed-literal-code-points quoted-string := '"' (single-line-string-body | newline multi-line-string-body newline ws*) '"' From af91cc63192c1f7de1b43faab6b7826d3c6b97c8 Mon Sep 17 00:00:00 2001 From: Tab Atkins-Bittner Date: Thu, 4 Jan 2024 11:12:10 -0800 Subject: [PATCH 062/105] Add tests for .1 and general 'ident ambiguous with a number' cases. 
--- tests/test_cases/expected_kdl/bare_ident_dot.kdl | 1 + tests/test_cases/expected_kdl/bare_ident_sign.kdl | 1 + tests/test_cases/expected_kdl/bare_ident_sign_dot.kdl | 1 + tests/test_cases/input/bare_ident_dot.kdl | 1 + tests/test_cases/input/bare_ident_numeric.kdl | 1 + tests/test_cases/input/bare_ident_numeric_dot.kdl | 1 + tests/test_cases/input/bare_ident_numeric_sign.kdl | 1 + tests/test_cases/input/bare_ident_sign.kdl | 1 + tests/test_cases/input/bare_ident_sign_dot.kdl | 1 + tests/test_cases/input/no_integer_digit.kdl | 1 + 10 files changed, 10 insertions(+) create mode 100644 tests/test_cases/expected_kdl/bare_ident_dot.kdl create mode 100644 tests/test_cases/expected_kdl/bare_ident_sign.kdl create mode 100644 tests/test_cases/expected_kdl/bare_ident_sign_dot.kdl create mode 100644 tests/test_cases/input/bare_ident_dot.kdl create mode 100644 tests/test_cases/input/bare_ident_numeric.kdl create mode 100644 tests/test_cases/input/bare_ident_numeric_dot.kdl create mode 100644 tests/test_cases/input/bare_ident_numeric_sign.kdl create mode 100644 tests/test_cases/input/bare_ident_sign.kdl create mode 100644 tests/test_cases/input/bare_ident_sign_dot.kdl create mode 100644 tests/test_cases/input/no_integer_digit.kdl diff --git a/tests/test_cases/expected_kdl/bare_ident_dot.kdl b/tests/test_cases/expected_kdl/bare_ident_dot.kdl new file mode 100644 index 0000000..5c32f67 --- /dev/null +++ b/tests/test_cases/expected_kdl/bare_ident_dot.kdl @@ -0,0 +1 @@ +node . 
\ No newline at end of file diff --git a/tests/test_cases/expected_kdl/bare_ident_sign.kdl b/tests/test_cases/expected_kdl/bare_ident_sign.kdl new file mode 100644 index 0000000..b609706 --- /dev/null +++ b/tests/test_cases/expected_kdl/bare_ident_sign.kdl @@ -0,0 +1 @@ +node + \ No newline at end of file diff --git a/tests/test_cases/expected_kdl/bare_ident_sign_dot.kdl b/tests/test_cases/expected_kdl/bare_ident_sign_dot.kdl new file mode 100644 index 0000000..d50adcf --- /dev/null +++ b/tests/test_cases/expected_kdl/bare_ident_sign_dot.kdl @@ -0,0 +1 @@ +node +. \ No newline at end of file diff --git a/tests/test_cases/input/bare_ident_dot.kdl b/tests/test_cases/input/bare_ident_dot.kdl new file mode 100644 index 0000000..5c32f67 --- /dev/null +++ b/tests/test_cases/input/bare_ident_dot.kdl @@ -0,0 +1 @@ +node . \ No newline at end of file diff --git a/tests/test_cases/input/bare_ident_numeric.kdl b/tests/test_cases/input/bare_ident_numeric.kdl new file mode 100644 index 0000000..053af21 --- /dev/null +++ b/tests/test_cases/input/bare_ident_numeric.kdl @@ -0,0 +1 @@ +node 0n \ No newline at end of file diff --git a/tests/test_cases/input/bare_ident_numeric_dot.kdl b/tests/test_cases/input/bare_ident_numeric_dot.kdl new file mode 100644 index 0000000..b97afcf --- /dev/null +++ b/tests/test_cases/input/bare_ident_numeric_dot.kdl @@ -0,0 +1 @@ +node .0n \ No newline at end of file diff --git a/tests/test_cases/input/bare_ident_numeric_sign.kdl b/tests/test_cases/input/bare_ident_numeric_sign.kdl new file mode 100644 index 0000000..6cadc35 --- /dev/null +++ b/tests/test_cases/input/bare_ident_numeric_sign.kdl @@ -0,0 +1 @@ +node +0n \ No newline at end of file diff --git a/tests/test_cases/input/bare_ident_sign.kdl b/tests/test_cases/input/bare_ident_sign.kdl new file mode 100644 index 0000000..b609706 --- /dev/null +++ b/tests/test_cases/input/bare_ident_sign.kdl @@ -0,0 +1 @@ +node + \ No newline at end of file diff --git 
a/tests/test_cases/input/bare_ident_sign_dot.kdl b/tests/test_cases/input/bare_ident_sign_dot.kdl new file mode 100644 index 0000000..d50adcf --- /dev/null +++ b/tests/test_cases/input/bare_ident_sign_dot.kdl @@ -0,0 +1 @@ +node +. \ No newline at end of file diff --git a/tests/test_cases/input/no_integer_digit.kdl b/tests/test_cases/input/no_integer_digit.kdl new file mode 100644 index 0000000..bac8026 --- /dev/null +++ b/tests/test_cases/input/no_integer_digit.kdl @@ -0,0 +1 @@ +node .1 \ No newline at end of file From 29495006bc9eec4c508d60cd3b7de1d1f2773768 Mon Sep 17 00:00:00 2001 From: Corey Powell Date: Sat, 6 Jan 2024 17:50:37 -0500 Subject: [PATCH 063/105] KDL V2 Test Fixes (#368) * Add newlines to some of the bare ident tests that were missing them * eof_after_escape.kdl is valid according to spec --- tests/test_cases/expected_kdl/bare_ident_dot.kdl | 2 +- tests/test_cases/expected_kdl/bare_ident_sign.kdl | 2 +- tests/test_cases/expected_kdl/bare_ident_sign_dot.kdl | 2 +- tests/test_cases/expected_kdl/eof_after_escape.kdl | 1 + 4 files changed, 4 insertions(+), 3 deletions(-) create mode 100644 tests/test_cases/expected_kdl/eof_after_escape.kdl diff --git a/tests/test_cases/expected_kdl/bare_ident_dot.kdl b/tests/test_cases/expected_kdl/bare_ident_dot.kdl index 5c32f67..4ea1fa6 100644 --- a/tests/test_cases/expected_kdl/bare_ident_dot.kdl +++ b/tests/test_cases/expected_kdl/bare_ident_dot.kdl @@ -1 +1 @@ -node . \ No newline at end of file +node . 
diff --git a/tests/test_cases/expected_kdl/bare_ident_sign.kdl b/tests/test_cases/expected_kdl/bare_ident_sign.kdl index b609706..34594d7 100644 --- a/tests/test_cases/expected_kdl/bare_ident_sign.kdl +++ b/tests/test_cases/expected_kdl/bare_ident_sign.kdl @@ -1 +1 @@ -node + \ No newline at end of file +node + diff --git a/tests/test_cases/expected_kdl/bare_ident_sign_dot.kdl b/tests/test_cases/expected_kdl/bare_ident_sign_dot.kdl index d50adcf..a37a5c3 100644 --- a/tests/test_cases/expected_kdl/bare_ident_sign_dot.kdl +++ b/tests/test_cases/expected_kdl/bare_ident_sign_dot.kdl @@ -1 +1 @@ -node +. \ No newline at end of file +node +. diff --git a/tests/test_cases/expected_kdl/eof_after_escape.kdl b/tests/test_cases/expected_kdl/eof_after_escape.kdl new file mode 100644 index 0000000..64f5a0a --- /dev/null +++ b/tests/test_cases/expected_kdl/eof_after_escape.kdl @@ -0,0 +1 @@ +node From c15b5c27983add3df69adca1e4b2ca02130d79b4 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Kat=20March=C3=A1n?= Date: Tue, 6 Feb 2024 13:54:19 -0800 Subject: [PATCH 064/105] make note of .1/+.1 illegality in the changelog --- CHANGELOG.md | 2 ++ 1 file changed, 2 insertions(+) diff --git a/CHANGELOG.md b/CHANGELOG.md index 1eb927b..5c20068 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -62,6 +62,8 @@ EQUALS SIGN (`U+1F7F0`) are now treated the same as `=` and can be used for properties (e.g. `お名前=☜(゚ヮ゚☜)`). They are also no longer valid in bare identifiers. +* `.1`, `+.1` etc are no longer valid identifiers, to prevent confusion and + conflicts with numbers. 
### KQL From 172c67b602583d9ddb803e20316e951383fb3daa Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Kat=20March=C3=A1n?= Date: Tue, 6 Feb 2024 13:56:50 -0800 Subject: [PATCH 065/105] Release 2.0.0 draft 2 --- README.md | 2 +- SPEC.md | 4 ++-- 2 files changed, 3 insertions(+), 3 deletions(-) diff --git a/README.md b/README.md index 53b669c..7ba002f 100644 --- a/README.md +++ b/README.md @@ -51,7 +51,7 @@ Language](SCHEMA-SPEC.md) loosely based on JSON Schema. The language is based on [SDLang](https://sdlang.org), with a number of modifications and clarifications on its syntax and behavior. -The current version of the KDL spec is `2.0.0-draft.1`. +The current version of the KDL spec is `2.0.0-draft.2`. [Play with it in your browser!](https://kdl-play.danini.dev/) diff --git a/SPEC.md b/SPEC.md index 21a5cab..dc1321c 100644 --- a/SPEC.md +++ b/SPEC.md @@ -3,8 +3,8 @@ This is the semi-formal specification for KDL, including the intended data model and the grammar. -This document describes KDL version `2.0.0-draft.1`. It was released on -2023-12-12. +This document describes KDL version `2.0.0-draft.2`. It was released on +2024-02-06. 
## Introduction From 522ce8591e79848253cf91d314bf99b1f5f76934 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Kat=20March=C3=A1n?= Date: Wed, 7 Feb 2024 11:20:56 -0800 Subject: [PATCH 066/105] clarify multi-line strings further --- CHANGELOG.md | 8 +-- SPEC.md | 65 +++++++++++++------ .../multiline_raw_string_indented.kdl | 2 + .../multiline_string_indented.kdl | 2 + .../input/multiline_raw_string_indented.kdl | 5 ++ ...ng_non_matching_prefix_character_error.kdl | 5 ++ ...string_non_matching_prefix_count_error.kdl | 5 ++ .../input/multiline_string_indented.kdl | 5 ++ ...ng_non_matching_prefix_character_error.kdl | 5 ++ ...string_non_matching_prefix_count_error.kdl | 5 ++ 10 files changed, 82 insertions(+), 25 deletions(-) create mode 100644 tests/test_cases/expected_kdl/multiline_raw_string_indented.kdl create mode 100644 tests/test_cases/expected_kdl/multiline_string_indented.kdl create mode 100644 tests/test_cases/input/multiline_raw_string_indented.kdl create mode 100644 tests/test_cases/input/multiline_raw_string_non_matching_prefix_character_error.kdl create mode 100644 tests/test_cases/input/multiline_raw_string_non_matching_prefix_count_error.kdl create mode 100644 tests/test_cases/input/multiline_string_indented.kdl create mode 100644 tests/test_cases/input/multiline_string_non_matching_prefix_character_error.kdl create mode 100644 tests/test_cases/input/multiline_string_non_matching_prefix_count_error.kdl diff --git a/CHANGELOG.md b/CHANGELOG.md index 5c20068..0d1e364 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -54,10 +54,10 @@ * Around `=` for props (`x = 1`) * The BOM is now only allowed as the first character in a document. It was previously treated as generic whitespace. -* Multi-line strings are now automatically dedented, according to the - least-indented line in the body. Multiline strings and raw strings now must - have a newline immediately following their opening `"`, and a final newline - preceding the closing `"`. 
+* Multi-line strings are now automatically dedented, according to the common + whitespace matching the whitespace prefix of the closing line. Multiline + strings and raw strings now must have a newline immediately following their + opening `"`, and a final newline plus whitespace preceding the closing `"`. * SMALL EQUALS SIGN (`U+FE66`), FULLWIDTH EQUALS SIGN (`U+FF1D`), and HEAVY EQUALS SIGN (`U+1F7F0`) are now treated the same as `=` and can be used for properties (e.g. `お名前=☜(゚ヮ゚☜)`). They are also no longer valid in bare diff --git a/SPEC.md b/SPEC.md index dc1321c..dacb03d 100644 --- a/SPEC.md +++ b/SPEC.md @@ -436,30 +436,31 @@ The string contains the literal characters `hello\n\r\asd"#world` ### Multi-line Strings -When a Quoted or Raw String spans multiple lines with literal, non-escaped Newlines, -it follows a special multi-line syntax -that automatically "dedents" the string, -allowing its value to be indented to a visually matching level if desired. - -A Multi-line string _MUST_ start with a [Newline](#newline) -immediately following its opening `"`. -Its final line, preceding the closing `"`, -_MUST_ contain only whitespace. -All in-between lines that contain non-whitespace characters -_MUST_ start with the exact same whitespace as the final line -(precisely matching codepoints, not merely counting characters). - -The value of the Multi-line String omits the first and last Newline, -the Whitespace of the last line, -the matching Whitespace prefix on all intermediate lines, -and all Whitespace on intermediate Whitespace-only lines. -The first and last Newline can be the same character -(that is, empty multi-line strings are legal). +When a Quoted or Raw String spans multiple lines with literal, non-escaped +Newlines, it follows a special multi-line syntax that automatically "dedents" +the string, allowing its value to be indented to a visually matching level if +desired. 
+ +A Multi-line string _MUST_ start with a [Newline](#newline) immediately +following its opening `"`. Its final line _MUST_ contain only whitespace, +followed by a single closing `"`. All in-between lines that contain +non-whitespace characters _MUST_ start with the exact same whitespace as the +final line (precisely matching codepoints, not merely counting characters). + +The value of the Multi-line String omits the first and last Newline, the +Whitespace of the last line, and the matching Whitespace prefix on all +intermediate lines. The first and last Newline can be the same character (that +is, empty multi-line strings are legal). Strings with literal Newlines that do not immediately start with a Newline and -whose final `"` is not preceeded by optional whitespace and a Newline are illegal. +whose final `"` is not preceded by optional whitespace and a Newline are +illegal. -In other words, the final line specifies the whitespace prefix that will be removed from all other lines. +In other words, the final line specifies the whitespace prefix that will be +removed from all other lines. + +It is a syntax error for any body lines of the multi-line string to not match +the whitespace prefix of the last line with the final quote. #### Example @@ -526,6 +527,28 @@ A second indented paragraph. Equivalent to `"Indented a bit.\n\nA second indented paragraph."` +----------- + +The following yield syntax errors: + +```kdl +multi-line " + closing quote with non-whitespace prefix" +``` + +```kdl +multi-line "stuff + " +``` + +```kdl +// Every line must share the exact same prefix as the closing line.
+multi-line "[\n] +[tab]a[\n] +[space][space]b[\n] +[space][tab][\n] +[tab]" +``` ### Number diff --git a/tests/test_cases/expected_kdl/multiline_raw_string_indented.kdl b/tests/test_cases/expected_kdl/multiline_raw_string_indented.kdl new file mode 100644 index 0000000..e7638d8 --- /dev/null +++ b/tests/test_cases/expected_kdl/multiline_raw_string_indented.kdl @@ -0,0 +1,2 @@ +node " hey\n everyone\n how goes?" + diff --git a/tests/test_cases/expected_kdl/multiline_string_indented.kdl b/tests/test_cases/expected_kdl/multiline_string_indented.kdl new file mode 100644 index 0000000..e7638d8 --- /dev/null +++ b/tests/test_cases/expected_kdl/multiline_string_indented.kdl @@ -0,0 +1,2 @@ +node " hey\n everyone\n how goes?" + diff --git a/tests/test_cases/input/multiline_raw_string_indented.kdl b/tests/test_cases/input/multiline_raw_string_indented.kdl new file mode 100644 index 0000000..67ef76d --- /dev/null +++ b/tests/test_cases/input/multiline_raw_string_indented.kdl @@ -0,0 +1,5 @@ +node #" + hey + everyone + how goes? + "# diff --git a/tests/test_cases/input/multiline_raw_string_non_matching_prefix_character_error.kdl b/tests/test_cases/input/multiline_raw_string_non_matching_prefix_character_error.kdl new file mode 100644 index 0000000..c5650e9 --- /dev/null +++ b/tests/test_cases/input/multiline_raw_string_non_matching_prefix_character_error.kdl @@ -0,0 +1,5 @@ +node #" + hey + everyone + how goes? + "# diff --git a/tests/test_cases/input/multiline_raw_string_non_matching_prefix_count_error.kdl b/tests/test_cases/input/multiline_raw_string_non_matching_prefix_count_error.kdl new file mode 100644 index 0000000..c0f4f56 --- /dev/null +++ b/tests/test_cases/input/multiline_raw_string_non_matching_prefix_count_error.kdl @@ -0,0 +1,5 @@ +node #" + hey + everyone + how goes? 
+ "# diff --git a/tests/test_cases/input/multiline_string_indented.kdl b/tests/test_cases/input/multiline_string_indented.kdl new file mode 100644 index 0000000..ce9ca16 --- /dev/null +++ b/tests/test_cases/input/multiline_string_indented.kdl @@ -0,0 +1,5 @@ +node " + hey + everyone + how goes? + " diff --git a/tests/test_cases/input/multiline_string_non_matching_prefix_character_error.kdl b/tests/test_cases/input/multiline_string_non_matching_prefix_character_error.kdl new file mode 100644 index 0000000..1c2ca85 --- /dev/null +++ b/tests/test_cases/input/multiline_string_non_matching_prefix_character_error.kdl @@ -0,0 +1,5 @@ +node " + hey + everyone + how goes? + " diff --git a/tests/test_cases/input/multiline_string_non_matching_prefix_count_error.kdl b/tests/test_cases/input/multiline_string_non_matching_prefix_count_error.kdl new file mode 100644 index 0000000..86a2867 --- /dev/null +++ b/tests/test_cases/input/multiline_string_non_matching_prefix_count_error.kdl @@ -0,0 +1,5 @@ +node " + hey + everyone + how goes? + " From 35ac19b85417e32dcf5c0e84acab59b43ab06eb5 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Kat=20March=C3=A1n?= Date: Wed, 7 Feb 2024 11:36:59 -0800 Subject: [PATCH 067/105] fix stray legacy bool in example --- SPEC.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/SPEC.md b/SPEC.md index dacb03d..c62b506 100644 --- a/SPEC.md +++ b/SPEC.md @@ -582,7 +582,7 @@ approximation thereof. 
#### Example ```kdl -my-node true value=#false +my-node #true value=#false ``` ### Null From 2d4bcd0b51077741c26b37c6607acc12bcdef32c Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Kat=20March=C3=A1n?= Date: Wed, 7 Feb 2024 11:38:03 -0800 Subject: [PATCH 068/105] Release 2.0.0 draft 3 --- CHANGELOG.md | 2 +- README.md | 2 +- SPEC.md | 4 ++-- 3 files changed, 4 insertions(+), 4 deletions(-) diff --git a/CHANGELOG.md b/CHANGELOG.md index 0d1e364..20f1da1 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -1,6 +1,6 @@ # KDL Changelog -## 2.0.0 (2022-08-28) +## 2.0.0 (2024-02-07) ### Grammar diff --git a/README.md b/README.md index 7ba002f..d961c77 100644 --- a/README.md +++ b/README.md @@ -51,7 +51,7 @@ Language](SCHEMA-SPEC.md) loosely based on JSON Schema. The language is based on [SDLang](https://sdlang.org), with a number of modifications and clarifications on its syntax and behavior. -The current version of the KDL spec is `2.0.0-draft.2`. +The current version of the KDL spec is `2.0.0-draft.3`. [Play with it in your browser!](https://kdl-play.danini.dev/) diff --git a/SPEC.md b/SPEC.md index c62b506..78da7b5 100644 --- a/SPEC.md +++ b/SPEC.md @@ -3,8 +3,8 @@ This is the semi-formal specification for KDL, including the intended data model and the grammar. -This document describes KDL version `2.0.0-draft.2`. It was released on -2024-02-06. +This document describes KDL version `2.0.0-draft.3`. It was released on +2024-02-07. 
## Introduction From f767472cab9e2dd3bfa48e9aa449c1c27e7b0361 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Kat=20March=C3=A1n?= Date: Wed, 7 Feb 2024 13:06:09 -0800 Subject: [PATCH 069/105] small readme improvements --- README.md | 18 +++++++++--------- 1 file changed, 9 insertions(+), 9 deletions(-) diff --git a/README.md b/README.md index d961c77..b9175e7 100644 --- a/README.md +++ b/README.md @@ -156,20 +156,20 @@ node2 "this\nhas\tescapes" node3 #"C:\Users\zkat\raw\string"# ``` -You don't have to quote strings unless they contain whitespace, or if any the -following apply: - * The string contains any of `[]{}()\/#";`. +You don't have to quote strings unless any of the following apply: * The string contains whitespace. + * The string contains any of `[]{}()\/#";`. * The string is one of `true`, `false`, or `null`. - * The strings starts with a digit, or `+`/`-` and a digit. + * The string starts with a digit, or `+`/`-`/`.`/`-.`/`+.` and a digit. * The string contains an equals sign (including unicode equals signs `﹦`, `＝`, and `🟰`). -In essence, if it can get confused for other KDL syntax, it needs quotes. +In essence, if it can get confused for other KDL or KQL syntax, it needs +quotes. Both types of quoted string can be multiline as-is, without a different -syntax. Additionally, these multi-line strings will be "dedented" according to -the common indentation that all lines share: +syntax.
Additionally, common indentation shared with the line containing the +closing quote will be stripped/dedented: ```kdl string " @@ -196,7 +196,7 @@ You can add any number of `#`s before and after the opening and closing `#` to disambiguate literal closing `#"` sequences: ```kdl -other-raw ##"hello"#world"## +other-raw ##"hello#"world"## ``` #### Numbers @@ -248,7 +248,7 @@ hello ``` On top of that, KDL supports `/-` "slashdash" comments, which can be used to -comment out individual nodes, arguments, or children: +comment out individual nodes, arguments, or child blocks: ```kdl // This entire node and its children are all commented out. From 40d8c83aca4fe12bf4912efb87503a04721442b7 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Kat=20March=C3=A1n?= Date: Wed, 7 Feb 2024 16:07:54 -0800 Subject: [PATCH 070/105] unicode character support clarifications --- SPEC.md | 15 ++++++++------- 1 file changed, 8 insertions(+), 7 deletions(-) diff --git a/SPEC.md b/SPEC.md index 78da7b5..32df285 100644 --- a/SPEC.md +++ b/SPEC.md @@ -675,13 +675,12 @@ Note that for the purpose of new lines, CRLF is considered _a single newline_. The following code points may not appear literally anywhere in the document. They may be represented in Strings (but not Raw Strings) using `\u{}`. -* The codepoints `U+0000-0009`, - the codepoint `U+000B`, - or the codepoints `U+000E-001F` (various control characters). +* The codepoints `U+0000-0008` or the codepoints `U+000E-001F` (various + control characters). * `U+007F` (the Delete control character). * Any codepoint that is not a [Unicode Scalar - Value](https://unicode.org/glossary/#unicode_scalar_value). -* `U+2066-2069`, `U+202A-202E`, `U+200E`, and `U+200F`, the [unicode + Value](https://unicode.org/glossary/#unicode_scalar_value) (`U+D800-DFFF`). 
+* `U+200E-200F`, `U+202A-202E`, and `U+2066-2069`, the [unicode "direction control" characters](https://www.w3.org/International/questions/qa-bidi-unicode-controls) @@ -723,7 +722,7 @@ identifier-string := unambiguous-ident | signed-ident | dotted-ident unambiguous-ident := ((identifier-char - digit - sign - '.') identifier-char*) - 'true' - 'false' - 'null' signed-ident := sign ((identifier-char - digit - '.') identifier-char*)? dotted-ident := sign? '.' ((identifier-char - digit) identifier-char*)? -identifier-char := unicode - line-space - [\\/(){};\[\]="#] - disallowed-literal-code-points +identifier-char := unicode - unicode-space - newline - [\\/(){};\[\]"#] - disallowed-literal-code-points - equals-sign quoted-string := '"' (single-line-string-body | newline multi-line-string-body newline ws*) '"' single-line-string-body := (string-character - newline)* @@ -763,7 +762,9 @@ bom := '\u{FEFF}' disallowed-literal-code-points := See Table (Disallowed Literal Code Points) -unicode-space := See Table (All White_Space unicode characters which are not `newline`) +equals-sign := See Table ([Equals Sign](#equals-sign)) + +unicode-space := See Table (All [White_Space](#whitespace) unicode characters which are not `newline`) single-line-comment := '//' ^newline* (newline | eof) multi-line-comment := '/*' commented-block From b1163e1f9110f6b89ec35ffc997a3080ff057553 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Kat=20March=C3=A1n?= Date: Thu, 8 Feb 2024 09:35:26 -0800 Subject: [PATCH 071/105] more small fixes --- CHANGELOG.md | 5 +++-- SPEC.md | 5 ++--- 2 files changed, 5 insertions(+), 5 deletions(-) diff --git a/CHANGELOG.md b/CHANGELOG.md index 20f1da1..d51357b 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -15,8 +15,9 @@ characters). * `,`, `<`, and `>` are now legal identifier characters. They were previously reserved for KQL but this is no longer necessary. 
-* Code points under `0x20`, code points above `0x10FFFF`, Delete control - character (`0x7F`), and the [unicode "direction control" +* Code points under `0x20` (except newline and whitespace code points), code + points above `0x10FFFF`, Delete control character (`0x7F`), and the [unicode + "direction control" characters](https://www.w3.org/International/questions/qa-bidi-unicode-controls) are now completely banned from appearing literally in KDL documents. They can now only be represented in regular strings, and there's no facilities to diff --git a/SPEC.md b/SPEC.md index 32df285..c598a2f 100644 --- a/SPEC.md +++ b/SPEC.md @@ -714,7 +714,8 @@ node-terminator := single-line-comment | newline | ';' | eof prop := string optional-node-space equals-sign optional-node-space value value := type? optional-node-space (string | number | keyword) type := '(' optional-node-space string optional-node-space ')' -equals-sign := See Table (Equals Sign) + +equals-sign := See Table ([Equals Sign](#equals-sign)) string := identifier-string | quoted-string | raw-string @@ -762,8 +763,6 @@ bom := '\u{FEFF}' disallowed-literal-code-points := See Table (Disallowed Literal Code Points) -equals-sign := See Table ([Equals Sign](#equals-sign)) - unicode-space := See Table (All [White_Space](#whitespace) unicode characters which are not `newline`) single-line-comment := '//' ^newline* (newline | eof) From f81fcfada59e6de099e89791b456ac48264165c8 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Kat=20March=C3=A1n?= Date: Thu, 8 Feb 2024 09:47:42 -0800 Subject: [PATCH 072/105] minor reword --- SPEC.md | 11 ++++++----- 1 file changed, 6 insertions(+), 5 deletions(-) diff --git a/SPEC.md b/SPEC.md index c598a2f..4940da1 100644 --- a/SPEC.md +++ b/SPEC.md @@ -342,12 +342,13 @@ The following characters cannot be used anywhere in a [Identifier String](#ident A Quoted String is delimited by `"` on either side of any number of literal string characters except unescaped `"` and `\`. 
This includes literal -[Newline](#newline) characters, which means a String Value can encompass -multiple lines without behaving like a Newline for [Node](#node) parsing -purposes. +[Newline](#newline) characters, which means a single String Value can span +multiple lines, following specific [Multi-line String](#multi-line-strings) +rules. -Like Identifier Strings, Quoted Strings _MUST NOT_ include any of the [disallowed literal -code-points](#disallowed-literal-code-points) as code points in their body. +Like Identifier Strings, Quoted Strings _MUST NOT_ include any of the +[disallowed literal code-points](#disallowed-literal-code-points) as code +points in their body. Quoted Strings also follow the Multi-line rules specified in [Multi-line String](#multi-line-strings). From f0f9589636fe667268fbfce3b58e37aaec63fce7 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Kat=20March=C3=A1n?= Date: Thu, 8 Feb 2024 11:14:54 -0800 Subject: [PATCH 073/105] example tweaks --- examples/Cargo.kdl | 2 +- examples/ci.kdl | 8 ++++---- examples/website.kdl | 4 ++-- 3 files changed, 7 insertions(+), 7 deletions(-) diff --git a/examples/Cargo.kdl b/examples/Cargo.kdl index f3465b4..caec020 100644 --- a/examples/Cargo.kdl +++ b/examples/Cargo.kdl @@ -1,7 +1,7 @@ package { name kdl version "0.0.0" - description "kat's document language" + description "The kdl document language" authors "Kat Marchán " license-file LICENSE.md edition "2018" diff --git a/examples/ci.kdl b/examples/ci.kdl index aff2863..00d49cd 100644 --- a/examples/ci.kdl +++ b/examples/ci.kdl @@ -19,8 +19,8 @@ jobs { components rustfmt override #true } - step rustfmt run="cargo fmt --all -- --check" - step docs run="cargo doc --no-deps" + step rustfmt { run cargo fmt --all -- --check } + step docs { run cargo doc --no-deps } } } build_and_test "Build & Test" { @@ -40,8 +40,8 @@ jobs { components clippy override #true } - step Clippy run="cargo clippy --all -- -D warnings" - step "Run tests" run="cargo test --all --verbose" + step 
Clippy { run cargo clippy --all -- -D warnings } + step "Run tests" { run cargo test --all --verbose } step "Other Stuff" run=" echo foo echo bar diff --git a/examples/website.kdl b/examples/website.kdl index b8faafe..d2c7dc5 100644 --- a/examples/website.kdl +++ b/examples/website.kdl @@ -6,13 +6,13 @@ html lang=en { meta \ name=description \ content="kdl is a document language, mostly based on SDLang, with xml-like semantics that looks like you're invoking a bunch of CLI commands!" - title "kdl - Kat's Document Language" + title "kdl - The KDL Document Language" link rel=stylesheet href="/styles/global.css" } body { main { header class="py-10 bg-gray-300" { - h1 class="text-4xl text-center" "kdl - Kat's Document Language" + h1 class="text-4xl text-center" "kdl - The KDL Document Language" } section class=kdl-section id=description { p { From 793a9d4ce7f64c6acdf54680f93b836b69d58ee8 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Kat=20March=C3=A1n?= Date: Thu, 8 Feb 2024 11:24:40 -0800 Subject: [PATCH 074/105] normalize literal newlines in multiline strings Fixes: https://github.com/kdl-org/kdl/issues/360 --- CHANGELOG.md | 2 ++ SPEC.md | 12 ++++++++++++ 2 files changed, 14 insertions(+) diff --git a/CHANGELOG.md b/CHANGELOG.md index d51357b..9f203cd 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -65,6 +65,8 @@ identifiers. * `.1`, `+.1` etc are no longer valid identifiers, to prevent confusion and conflicts with numbers. +* Multi-line strings' literal Newline sequences are now normalized to single + `LF`s. ### KQL diff --git a/SPEC.md b/SPEC.md index 4940da1..a850913 100644 --- a/SPEC.md +++ b/SPEC.md @@ -463,6 +463,18 @@ removed from all other lines. It is a syntax error for any body lines of the multi-line string to not match the whitespace prefix of the last line with the final quote. +#### Newline Normalization + +Literal Newline sequences in Multi-line Strings must be normalized to a single +`U+000A` (`LF`) during deserialization. 
This means, for example, that `CR LF` +becomes a single `LF` during parsing. + +This normalization does not apply to non-literal Newlines entered using escape +sequences. + +For clarity: this normalization is for individual sequences. That is, the +literal sequence `CRLF CRLF` becomes `LF LF`, not `LF`. + #### Example ```kdl From abae1f9a3908d133e563ecfc54581e88534372f4 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Kat=20March=C3=A1n?= Date: Thu, 8 Feb 2024 16:16:42 -0800 Subject: [PATCH 075/105] more fixes --- QUERY-SPEC.md | 10 +++++----- SPEC.md | 6 +++--- 2 files changed, 8 insertions(+), 8 deletions(-) diff --git a/QUERY-SPEC.md b/QUERY-SPEC.md index 5fcb4ee..d67f9b2 100644 --- a/QUERY-SPEC.md +++ b/QUERY-SPEC.md @@ -107,15 +107,15 @@ Then the following queries are valid: For rules that are not defined in this grammar, see [the KDL grammar](https://github.com/kdl-org/kdl/blob/main/SPEC.md#full-grammar). ``` +query-str := bom? query query := selector q-ws* "||" q-ws* query | selector selector := filter q-ws* selector-operator q-ws* selector | filter selector-operator := ">>" | ">" | "++" | "+" -filter := matcher+ -matcher := "top()"| "()" | identifier | type | accessor-matcher -accessor-matcher := "[" (comparison | accessor)? "]" +filter := ( "top(" q-ws* ")" | "(" q-ws* ")" | type ) string? accessor-matcher* +accessor-matcher := "[" q-ws* (comparison | accessor)? 
q-ws* "]" comparison := accessor q-ws* matcher-operator q-ws* (type | identifier | string | number | keyword) -accessor := "val(" number ")" | "prop(" identifier ")" | "name()" | "tag()" | "values()" | "props()" | identifier +accessor := "val(" q-ws* integer q-ws* ")" | "prop(" q-ws* identifier q-ws* ")" | "name(" q-ws* ")" | "tag(" q-ws* ")" | "values(" q-ws* ")" | "props(" q-ws* ")" | identifier matcher-operator := "=" | "!=" | ">" | "<" | ">=" | "<=" | "^=" | "$=" | "*=" -q-ws := bom | unicode-space +q-ws := unicode-space ``` diff --git a/SPEC.md b/SPEC.md index a850913..39a9f28 100644 --- a/SPEC.md +++ b/SPEC.md @@ -445,7 +445,7 @@ desired. A Multi-line string _MUST_ start with a [Newline](#newline) immediately following its opening `"`. Its final line _MUST_ contain only whitespace, followed by a single closing `"`. All in-between lines that contain -non-whitespace characters _MUST_ start with the exact same whitespace as the +non-newline characters _MUST_ start with the exact same whitespace as the final line (precisely matching codepoints, not merely counting characters). The value of the Multi-line String omits the first and last Newline, the @@ -738,7 +738,7 @@ signed-ident := sign ((identifier-char - digit - '.') identifier-char*)? dotted-ident := sign? '.' ((identifier-char - digit) identifier-char*)? 
identifier-char := unicode - unicode-space - newline - [\\/(){};\[\]"#] - disallowed-literal-code-points - equals-sign -quoted-string := '"' (single-line-string-body | newline multi-line-string-body newline ws*) '"' +quoted-string := '"' (single-line-string-body | newline multi-line-string-body newline unicode-space*) '"' single-line-string-body := (string-character - newline)* multi-line-string-body := string-character* string-character := '\' escape | [^\\"] - disallowed-literal-code-points @@ -746,7 +746,7 @@ escape := ["\\bfnrt] | 'u{' hex-digit{1, 6} '}' | (unicode-space | newline)+ hex-digit := [0-9a-fA-F] raw-string := '#' raw-string-quotes '#' | '#' raw-string '#' -raw-string-quotes := '"' (single-line-raw-string-body | newline multi-line-raw-string-body newline ws*) '"' +raw-string-quotes := '"' (single-line-raw-string-body | newline multi-line-raw-string-body newline unicode-space*) '"' single-line-raw-string-body := (unicode - newline - disallowed-literal-code-points)* multi-line-raw-string-body := (unicode - disallowed-literal-code-points)* From 7ab86588c08cf1965ebfa4e9e169c3108661c14e Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Kat=20March=C3=A1n?= Date: Sun, 11 Feb 2024 21:05:23 -0800 Subject: [PATCH 076/105] iterate a bit on KQL --- CHANGELOG.md | 3 +++ QUERY-SPEC.md | 16 ++++++++++------ 2 files changed, 13 insertions(+), 6 deletions(-) diff --git a/CHANGELOG.md b/CHANGELOG.md index 9f203cd..2404944 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -74,4 +74,7 @@ spaces for that purpose. * The "any sibling" selector is now `++` instead of `~`, for consistency with the new descendant selector. +* Some parsing logic around the grammar has changed. +* Multi- and single-line comments are now supported, as well as line + continuations with `\`. * Map operators have been removed entirely. 
diff --git a/QUERY-SPEC.md b/QUERY-SPEC.md index d67f9b2..114a75c 100644 --- a/QUERY-SPEC.md +++ b/QUERY-SPEC.md @@ -104,18 +104,22 @@ Then the following queries are valid: ## Full Grammar -For rules that are not defined in this grammar, see [the KDL grammar](https://github.com/kdl-org/kdl/blob/main/SPEC.md#full-grammar). +Rules that are not defined in this grammar are prefixed with `$`, see [the KDL +grammar](https://github.com/kdl-org/kdl/blob/main/SPEC.md#full-grammar) for +what they expand to. ``` -query-str := bom? query +query-str := $bom? query query := selector q-ws* "||" q-ws* query | selector selector := filter q-ws* selector-operator q-ws* selector | filter selector-operator := ">>" | ">" | "++" | "+" -filter := ( "top(" q-ws* ")" | "(" q-ws* ")" | type ) string? accessor-matcher* +filter := "top(" q-ws* ")" | matchers +matchers := type-matcher $string? accessor-matcher* | $string accessor-matcher* | accessor-matcher+ +type-matcher := "(" q-ws* ")" | $type accessor-matcher := "[" q-ws* (comparison | accessor)? 
q-ws* "]" -comparison := accessor q-ws* matcher-operator q-ws* (type | identifier | string | number | keyword) -accessor := "val(" q-ws* integer q-ws* ")" | "prop(" q-ws* identifier q-ws* ")" | "name(" q-ws* ")" | "tag(" q-ws* ")" | "values(" q-ws* ")" | "props(" q-ws* ")" | identifier +comparison := accessor q-ws* matcher-operator q-ws* ($type | $string | $number | $keyword) +accessor := "val(" q-ws* $integer q-ws* ")" | "prop(" q-ws* $string q-ws* ")" | "name(" q-ws* ")" | "tag(" q-ws* ")" | "values(" q-ws* ")" | "props(" q-ws* ")" | $string matcher-operator := "=" | "!=" | ">" | "<" | ">=" | "<=" | "^=" | "$=" | "*=" -q-ws := unicode-space +q-ws := $plain-node-space ``` From ec7880d4a59a6e2aed30204ee2f6498e113c7b88 Mon Sep 17 00:00:00 2001 From: wackbyte Date: Mon, 12 Feb 2024 13:53:38 -0500 Subject: [PATCH 077/105] Fix broken formatting in grammar language example (#375) --- SPEC.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/SPEC.md b/SPEC.md index 39a9f28..8acbec1 100644 --- a/SPEC.md +++ b/SPEC.md @@ -800,7 +800,7 @@ Specifically: * `[]` are used for regex-style character matches, where any character between the brackets will be a single match. `\` is used to escape `\`, `[`, and `]`. They also support character ranges (`0-9`), and negation (`^`) -* `-` is used for "except for" or "minus" whatever follows it. For example, `a - - `'x'` means "any `a`, except something that matches the literal `'x'`". +* `-` is used for "except for" or "minus" whatever follows it. For example, + `a - 'x'` means "any `a`, except something that matches the literal `'x'`". * The prefix `^` means "something that does not match" whatever follows it. For example, `^foo` means "must not match `foo`". 
From 921211782f036c066983070db0462d9924eed30a Mon Sep 17 00:00:00 2001 From: wackbyte Date: Mon, 12 Feb 2024 13:54:07 -0500 Subject: [PATCH 078/105] Remove extra indent in CI example (#376) --- examples/ci.kdl | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/examples/ci.kdl b/examples/ci.kdl index 00d49cd..d2fcf0e 100644 --- a/examples/ci.kdl +++ b/examples/ci.kdl @@ -45,7 +45,7 @@ jobs { step "Other Stuff" run=" echo foo echo bar - echo baz + echo baz " } } From 631ec14059f4832848f192eea4698ba3149c2c4b Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Kat=20March=C3=A1n?= Date: Mon, 12 Feb 2024 22:58:27 -0800 Subject: [PATCH 079/105] allow /- at the very beginning of a document --- SPEC.md | 2 +- tests/test_cases/expected_kdl/initial_slashdash.kdl | 1 + tests/test_cases/input/initial_slashdash.kdl | 2 ++ 3 files changed, 4 insertions(+), 1 deletion(-) create mode 100644 tests/test_cases/expected_kdl/initial_slashdash.kdl create mode 100644 tests/test_cases/input/initial_slashdash.kdl diff --git a/SPEC.md b/SPEC.md index 8acbec1..c589769 100644 --- a/SPEC.md +++ b/SPEC.md @@ -706,7 +706,7 @@ language syntax](#grammar-language) is defined below. ``` document := bom? nodes -nodes := (line-space* node)* line-space* +nodes := ('/-' plain-node-space* node)? 
(line-space* node)* line-space* plain-line-space := newline | ws | single-line-comment plain-node-space := ws* escline ws* | ws+ diff --git a/tests/test_cases/expected_kdl/initial_slashdash.kdl b/tests/test_cases/expected_kdl/initial_slashdash.kdl new file mode 100644 index 0000000..d74a990 --- /dev/null +++ b/tests/test_cases/expected_kdl/initial_slashdash.kdl @@ -0,0 +1 @@ +another-node diff --git a/tests/test_cases/input/initial_slashdash.kdl b/tests/test_cases/input/initial_slashdash.kdl new file mode 100644 index 0000000..aadeeb7 --- /dev/null +++ b/tests/test_cases/input/initial_slashdash.kdl @@ -0,0 +1,2 @@ +/-node here +another-node From fa816ca6df27a90a664c1661b6282a14c9f81b11 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Kat=20March=C3=A1n?= Date: Mon, 12 Feb 2024 23:17:39 -0800 Subject: [PATCH 080/105] add floats Fixes: https://github.com/kdl-org/kdl/issues/374 --- CHANGELOG.md | 5 ++++ SPEC.md | 27 ++++++++++++++----- .../expected_kdl/floating_point_keywords.kdl | 1 + ...t_keyword_identifier_strings_error.kdl.kdl | 1 + .../input/floating_point_keywords.kdl | 1 + 5 files changed, 28 insertions(+), 7 deletions(-) create mode 100644 tests/test_cases/expected_kdl/floating_point_keywords.kdl create mode 100644 tests/test_cases/input/floating_point_keyword_identifier_strings_error.kdl.kdl create mode 100644 tests/test_cases/input/floating_point_keywords.kdl diff --git a/CHANGELOG.md b/CHANGELOG.md index 2404944..88ce7d7 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -67,6 +67,11 @@ conflicts with numbers. * Multi-line strings' literal Newline sequences are now normalized to single `LF`s. +* `#inf`, `#-inf`, and `#nan` have been added in order to properly support + IEEE floats for implementations that choose to represent their decimals that + way. +* Correspondingly, the identifiers `inf`, `-inf`, and `nan` are now syntax + errors. 
### KQL diff --git a/SPEC.md b/SPEC.md index c589769..8bb96b8 100644 --- a/SPEC.md +++ b/SPEC.md @@ -302,11 +302,11 @@ characters](#non-identifier-characters). A handful of patterns are disallowed, to avoid confusion with other values: -* idents that appear to start with a [Number](#number) - (like `1.0v2` or `-1em`) - or the "almost a number" pattern of a decimal point without a leading digit - (like `.1`) -* idents that are the language keywords (`true`, `false`, and `null`) without their leading `#` +* idents that appear to start with a [Number](#number) (like `1.0v2` or + `-1em`) or the "almost a number" pattern of a decimal point without a + leading digit (like `.1`). +* idents that are the language keywords (`inf`, `-inf`, `nan`, `true`, + `false`, and `null`) without their leading `#`. Identifiers that match these patterns _MUST_ be treated as a syntax error; such values can only be written as quoted or raw strings. @@ -569,9 +569,9 @@ Numbers in KDL represent numerical [Values](#value). There is no logical distinc between real numbers, integers, and floating point numbers. It's up to individual implementations to determine how to represent KDL numbers. -There are four syntaxes for Numbers: Decimal, Hexadecimal, Octal, and Binary. +There are five syntaxes for Numbers: Keywords, Decimal, Hexadecimal, Octal, and Binary. -* All numbers may optionally start with one of `-` or `+`, which determine whether they'll be positive or negative. +* All non-[Keyword](#keyword-numbers) numbers may optionally start with one of `-` or `+`, which determine whether they'll be positive or negative. * Binary numbers start with `0b` and only allow `0` and `1` as digits, which may be separated by `_`. They represent numbers in radix 2. * Octal numbers start with `0o` and only allow digits between `0` and `7`, which may be separated by `_`. They represent numbers in radix 8.
* Hexadecimal numbers start with `0x` and allow digits between `0` and `9`, as well as letters `A` through `F`, in either lower or upper case, which may be separated by `_`. They represent numbers in radix 16. @@ -586,6 +586,19 @@ numbers without an integer digit (such as `.1`) are illegal. They must be written with at least one integer digit, like `0.1`. (These patterns are also disallowed from [Identifier Strings](#identifier-string), to avoid confusion.) +#### Keyword Numbers + +There are three special "keyword" numbers included in KDL to accommodate the +widespread use of [IEEE 754](https://en.wikipedia.org/wiki/IEEE_754) floats: + +* `#inf` - floating point positive infinity. +* `#-inf` - floating point negative infinity. +* `#nan` - floating point NaN/Not a Number. + +To go along with this and prevent foot guns, the bare [Identifier +Strings](#identifier-string) `inf`, `-inf`, and `nan` are considered illegal +identifiers and should yield a syntax error. + ### Boolean A boolean [Value](#value) is either the symbol `#true` or `#false`.
These diff --git a/tests/test_cases/expected_kdl/floating_point_keywords.kdl b/tests/test_cases/expected_kdl/floating_point_keywords.kdl new file mode 100644 index 0000000..973a259 --- /dev/null +++ b/tests/test_cases/expected_kdl/floating_point_keywords.kdl @@ -0,0 +1 @@ +floats #inf #-inf #nan diff --git a/tests/test_cases/input/floating_point_keyword_identifier_strings_error.kdl.kdl b/tests/test_cases/input/floating_point_keyword_identifier_strings_error.kdl.kdl new file mode 100644 index 0000000..e120167 --- /dev/null +++ b/tests/test_cases/input/floating_point_keyword_identifier_strings_error.kdl.kdl @@ -0,0 +1 @@ +floats inf -inf nan diff --git a/tests/test_cases/input/floating_point_keywords.kdl b/tests/test_cases/input/floating_point_keywords.kdl new file mode 100644 index 0000000..973a259 --- /dev/null +++ b/tests/test_cases/input/floating_point_keywords.kdl @@ -0,0 +1 @@ +floats #inf #-inf #nan From e773747b0bd22dd12e403945ceea969fc5db0b86 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Kat=20March=C3=A1n?= Date: Mon, 12 Feb 2024 23:20:55 -0800 Subject: [PATCH 081/105] Release 2.0 draft 4 --- README.md | 2 +- SPEC.md | 4 ++-- 2 files changed, 3 insertions(+), 3 deletions(-) diff --git a/README.md b/README.md index b9175e7..2c3fa05 100644 --- a/README.md +++ b/README.md @@ -51,7 +51,7 @@ Language](SCHEMA-SPEC.md) loosely based on JSON Schema. The language is based on [SDLang](https://sdlang.org), with a number of modifications and clarifications on its syntax and behavior. -The current version of the KDL spec is `2.0.0-draft.3`. +The current version of the KDL spec is `2.0.0-draft.4`. [Play with it in your browser!](https://kdl-play.danini.dev/) diff --git a/SPEC.md b/SPEC.md index 8bb96b8..627e473 100644 --- a/SPEC.md +++ b/SPEC.md @@ -3,8 +3,8 @@ This is the semi-formal specification for KDL, including the intended data model and the grammar. -This document describes KDL version `2.0.0-draft.3`. It was released on -2024-02-07. 
+This document describes KDL version `2.0.0-draft.4`. It was released on +2024-02-12. ## Introduction From 2710c90ff5329ee311f34be016a72db96244dce2 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Kat=20March=C3=A1n?= Date: Tue, 13 Feb 2024 00:15:03 -0800 Subject: [PATCH 082/105] facepalm: forgot the full grammar change for float keywords --- SPEC.md | 8 +++++--- 1 file changed, 5 insertions(+), 3 deletions(-) diff --git a/SPEC.md b/SPEC.md index 627e473..dd39ef8 100644 --- a/SPEC.md +++ b/SPEC.md @@ -746,7 +746,7 @@ equals-sign := See Table ([Equals Sign](#equals-sign)) string := identifier-string | quoted-string | raw-string identifier-string := unambiguous-ident | signed-ident | dotted-ident -unambiguous-ident := ((identifier-char - digit - sign - '.') identifier-char*) - 'true' - 'false' - 'null' +unambiguous-ident := ((identifier-char - digit - sign - '.') identifier-char*) - 'true' - 'false' - 'null' - 'inf' - '-inf' - 'nan' signed-ident := sign ((identifier-char - digit - '.') identifier-char*)? dotted-ident := sign? '.' ((identifier-char - digit) identifier-char*)? identifier-char := unicode - unicode-space - newline - [\\/(){};\[\]"#] - disallowed-literal-code-points - equals-sign @@ -763,7 +763,7 @@ raw-string-quotes := '"' (single-line-raw-string-body | newline multi-line-raw-s single-line-raw-string-body := (unicode - newline - disallowed-literal-code-points)* multi-line-raw-string-body := (unicode - disallowed-literal-code-points)* -number := hex | octal | binary | decimal +number := keyword-number | hex | octal | binary | decimal decimal := sign? integer ('.' integer)? exponent? exponent := ('e' | 'E') sign? integer @@ -775,7 +775,9 @@ hex := sign? '0x' hex-digit (hex-digit | '_')* octal := sign? '0o' [0-7] [0-7_]* binary := sign? 
'0b' ('0' | '1') ('0' | '1' | '_')* -keyword := boolean | '#null' +keyword := keyword-number | boolean | '#null' + +keyword-number := '#inf' | '#-inf' | '#nan' boolean := '#true' | '#false' From 2fcf6d42d32368328a0f286a5cd93e3688128b09 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Kat=20March=C3=A1n?= Date: Thu, 15 Feb 2024 12:03:18 -0800 Subject: [PATCH 083/105] Update tests/test_cases/expected_kdl/multiline_string_indented.kdl Co-authored-by: Dani Smith --- tests/test_cases/expected_kdl/multiline_string_indented.kdl | 1 - 1 file changed, 1 deletion(-) diff --git a/tests/test_cases/expected_kdl/multiline_string_indented.kdl b/tests/test_cases/expected_kdl/multiline_string_indented.kdl index e7638d8..f693b84 100644 --- a/tests/test_cases/expected_kdl/multiline_string_indented.kdl +++ b/tests/test_cases/expected_kdl/multiline_string_indented.kdl @@ -1,2 +1 @@ node " hey\n everyone\n how goes?" - From dadcfdf2ae33ab5e8baf10c7d6529b804078ae5a Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Kat=20March=C3=A1n?= Date: Thu, 15 Feb 2024 12:03:25 -0800 Subject: [PATCH 084/105] Update tests/test_cases/expected_kdl/multiline_raw_string_indented.kdl Co-authored-by: Dani Smith --- tests/test_cases/expected_kdl/multiline_raw_string_indented.kdl | 1 - 1 file changed, 1 deletion(-) diff --git a/tests/test_cases/expected_kdl/multiline_raw_string_indented.kdl b/tests/test_cases/expected_kdl/multiline_raw_string_indented.kdl index e7638d8..f693b84 100644 --- a/tests/test_cases/expected_kdl/multiline_raw_string_indented.kdl +++ b/tests/test_cases/expected_kdl/multiline_raw_string_indented.kdl @@ -1,2 +1 @@ node " hey\n everyone\n how goes?" 
- From 9132a96e56201d4cf72a98ea40892710bca80bdc Mon Sep 17 00:00:00 2001 From: Bram Gotink Date: Sun, 18 Feb 2024 22:15:58 +0100 Subject: [PATCH 085/105] Quote identifiers that contain an equals sign (#381) --- examples/kdl-schema.kdl | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/examples/kdl-schema.kdl b/examples/kdl-schema.kdl index 041c464..e3c986d 100644 --- a/examples/kdl-schema.kdl +++ b/examples/kdl-schema.kdl @@ -290,7 +290,7 @@ document { type number } } - node >= description="Only used for numeric values. Constrains them to be greater than or equal to the given number(s)" { + node ">=" description="Only used for numeric values. Constrains them to be greater than or equal to the given number(s)" { max 1 value { min 1 @@ -306,7 +306,7 @@ document { type number } } - node <= description="Only used for numeric values. Constrains them to be less than or equal to the given number(s)" { + node "<=" description="Only used for numeric values. Constrains them to be less than or equal to the given number(s)" { max 1 value { min 1 From 9e7b958f0c35b61f7b4f3f5d022eb24cdf75bf45 Mon Sep 17 00:00:00 2001 From: Bram Gotink Date: Sun, 18 Feb 2024 22:18:50 +0100 Subject: [PATCH 086/105] Ensure spec allows slashdash right after node separator (#382) --- SPEC.md | 4 ++-- tests/test_cases/input/commented_node.kdl | 1 + 2 files changed, 3 insertions(+), 2 deletions(-) diff --git a/SPEC.md b/SPEC.md index dd39ef8..163b024 100644 --- a/SPEC.md +++ b/SPEC.md @@ -719,12 +719,12 @@ language syntax](#grammar-language) is defined below. ``` document := bom? nodes -nodes := ('/-' plain-node-space* node)? (line-space* node)* line-space* +nodes := (line-space* node)* line-space* plain-line-space := newline | ws | single-line-comment plain-node-space := ws* escline ws* | ws+ -line-space := plain-line-space+ ('/-' plain-node-space* node)? 
+line-space := plain-line-space+ | '/-' plain-node-space* node node-space := plain-node-space+ ('/-' plain-node-space* (node-prop-or-arg | node-children))? required-node-space := node-space* plain-node-space+ diff --git a/tests/test_cases/input/commented_node.kdl b/tests/test_cases/input/commented_node.kdl index c9e5d12..1460d67 100644 --- a/tests/test_cases/input/commented_node.kdl +++ b/tests/test_cases/input/commented_node.kdl @@ -1,2 +1,3 @@ /- node_1 node_2 +/- node_3 From b294e9cb5ad4d1fd69ad764f395bfc4e8a3243e6 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Kat=20March=C3=A1n?= Date: Tue, 5 Mar 2024 12:45:30 -0800 Subject: [PATCH 087/105] Update README.md Co-authored-by: Bannerets --- README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/README.md b/README.md index 2c3fa05..1f4d609 100644 --- a/README.md +++ b/README.md @@ -176,7 +176,7 @@ string " my multiline value -" + " ``` Raw strings, which do not support `\` escapes and can be used when you want From 2de2ddc708278d8dd72f960984045e1f2dcb4e77 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Kat=20March=C3=A1n?= Date: Tue, 5 Mar 2024 12:45:48 -0800 Subject: [PATCH 088/105] Update README.md Co-authored-by: Bannerets --- README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/README.md b/README.md index 1f4d609..ed3d9ab 100644 --- a/README.md +++ b/README.md @@ -187,7 +187,7 @@ exec #" echo "foo" echo "bar" cd C:\path\to\dir -"# + "# regex #"\d{3} "[^/"]+""# ``` From aeb41cc7d721c66653b8e58af8b2c31cdece7ce3 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Kat=20March=C3=A1n?= Date: Tue, 5 Mar 2024 12:46:04 -0800 Subject: [PATCH 089/105] Update examples/ci.kdl Co-authored-by: Bannerets --- examples/ci.kdl | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/examples/ci.kdl b/examples/ci.kdl index d2fcf0e..1e000aa 100644 --- a/examples/ci.kdl +++ b/examples/ci.kdl @@ -46,7 +46,7 @@ jobs { echo foo echo bar echo baz - " + " } } } From d0b30c3f35fe406912d60c43dfb92b872f3c9e60 
Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Kat=20March=C3=A1n?= Date: Tue, 5 Mar 2024 12:47:07 -0800 Subject: [PATCH 090/105] Update SPEC.md Co-authored-by: Bannerets --- SPEC.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/SPEC.md b/SPEC.md index 163b024..8e38b2b 100644 --- a/SPEC.md +++ b/SPEC.md @@ -378,7 +378,7 @@ In addition to escaping individual characters, `\` can also escape whitespace. When a `\` is followed by one or more literal whitespace characters, the `\` and all of that whitespace are discarded. For example, `"Hello World"` and `"Hello \ World"` are semantically identical. See [whitespace](#whitespace) -and [newlines](#newlines) for how whitespace is defined. +and [newlines](#newline) for how whitespace is defined. Note that only literal whitespace is escaped; whitespace escapes (`\n` and such) are retained. For example, these strings are all semantically identical: From 281de7e97732bc83f2773f5e03c09e6f5f28c0ce Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Kat=20March=C3=A1n?= Date: Mon, 1 Apr 2024 14:26:44 -0700 Subject: [PATCH 091/105] review fixes --- SPEC.md | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/SPEC.md b/SPEC.md index 8e38b2b..13ce90a 100644 --- a/SPEC.md +++ b/SPEC.md @@ -709,6 +709,8 @@ They may be represented in Strings (but not Raw Strings) using `\u{}`. * `U+200E-200F`, `U+202A-202E`, and `U+2066-2069`, the [unicode "direction control" characters](https://www.w3.org/International/questions/qa-bidi-unicode-controls) +* `U+FEFF`, aka Zero-width Non-breaking Space (ZWNBSP)/Byte Order Mark (BOM), + except as the first code point in a document. 
## Full Grammar @@ -755,7 +757,7 @@ quoted-string := '"' (single-line-string-body | newline multi-line-string-body n single-line-string-body := (string-character - newline)* multi-line-string-body := string-character* string-character := '\' escape | [^\\"] - disallowed-literal-code-points -escape := ["\\bfnrt] | 'u{' hex-digit{1, 6} '}' | (unicode-space | newline)+ +escape := ["\\bfnrts] | 'u{' hex-digit{1, 6} '}' | (unicode-space | newline)+ hex-digit := [0-9a-fA-F] raw-string := '#' raw-string-quotes '#' | '#' raw-string '#' From d064bc9026d31cbc3cd6cdbe78efe37a05fb443a Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Kat=20March=C3=A1n?= Date: Mon, 1 Apr 2024 14:37:52 -0700 Subject: [PATCH 092/105] clarify multi-line strings and escapes interaction --- SPEC.md | 27 ++++++++++++++++++++++++++- 1 file changed, 26 insertions(+), 1 deletion(-) diff --git a/SPEC.md b/SPEC.md index 13ce90a..880e34d 100644 --- a/SPEC.md +++ b/SPEC.md @@ -304,7 +304,7 @@ A handful of patterns are disallowed, to avoid confusion with other values: * idents that appear to start with a [Number](#number) (like `1.0v2` or `-1em`) or the "almost a number" pattern of a decimal point without a - leading digit (like `.1`)/ + leading digit (like `.1`). * idents that are the language keywords (`inf`, `-inf`, `nan`, `true`, `false`, and `null`) without their leading `#`. @@ -397,6 +397,31 @@ such) are retained. For example, these strings are all semantically identical: " ``` +Escapes MUST be processed _after_ [Multi-line String](#multi-line-strings) +processing. That is, the following strings are illegal: + +```kdl +// Indentation checks are processed before whitespace escapes. + " + foo\ +bar + " + +// Essentially trying to escape `foo\nbar\`, which is an error due to missing +// escape character. 
+ " + foo + bar\ + " +``` + +But the following is legal, since it doesn't use Multi-line String rules: + +```kdl + "foo\ + bar" +``` + ##### Invalid escapes Except as described in the escapes table, above, `\` *MUST NOT* precede any From fa9d30388c2dab1309eabb3e491a14735634cb37 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Kat=20March=C3=A1n?= Date: Thu, 15 Feb 2024 00:17:41 -0800 Subject: [PATCH 093/105] remove duplication of keyword-number --- SPEC.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/SPEC.md b/SPEC.md index 880e34d..942766b 100644 --- a/SPEC.md +++ b/SPEC.md @@ -802,7 +802,7 @@ hex := sign? '0x' hex-digit (hex-digit | '_')* octal := sign? '0o' [0-7] [0-7_]* binary := sign? '0b' ('0' | '1') ('0' | '1' | '_')* -keyword := keyword-number | boolean | '#null' +keyword := boolean | '#null' keyword-number := '#inf' | '#-inf' | '#nan' From bea0f67685718d19fa27c9b1b8e1176b41d399de Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Kat=20March=C3=A1n?= Date: Mon, 1 Apr 2024 16:53:33 -0700 Subject: [PATCH 094/105] turn it around: escapes should be resolved _before_ dedenting --- SPEC.md | 61 ++++++++++++++++++++++++++++++++++----------------------- 1 file changed, 36 insertions(+), 25 deletions(-) diff --git a/SPEC.md b/SPEC.md index 942766b..f9ce154 100644 --- a/SPEC.md +++ b/SPEC.md @@ -397,31 +397,6 @@ such) are retained. For example, these strings are all semantically identical: " ``` -Escapes MUST be processed _after_ [Multi-line String](#multi-line-strings) -processing. That is, the following strings are illegal: - -```kdl -// Indentation checks are processed before whitespace escapes. - " - foo\ -bar - " - -// Essentially trying to escape `foo\nbar\`, which is an error due to missing -// escape character. 
- " - foo - bar\ - " -``` - -But the following is legal, since it doesn't use Multi-line String rules: - -```kdl - "foo\ - bar" -``` - ##### Invalid escapes Except as described in the escapes table, above, `\` *MUST NOT* precede any @@ -500,6 +475,42 @@ sequences. For clarity: this normalization is for individual sequences. That is, the literal sequence `CRLF CRLF` becomes `LF LF`, not `LF`. +#### Interaction with Whitespace Escapes + +Multi-line strings support the same mechanism for escaping whitespace. When +Processing a Multi-line String, implementations MUST resolve all whitespace +escapes _before_ dedenting the string. + +For example, the following is legal: + +```kdl + " + foo \ +bar + baz + " + // becomes: + "foo bar\nbaz" +``` + +But the following is not, because the whitespace escape would consume the +indentation prior to dedenting: + +```kdl + " + foo + bar\ + " + + // equivalent to writing: + + " + foo + bar" + + // which is illegal. +``` + #### Example ```kdl From c9134e3c162325e40d8b6b833fdddc0dcfc94518 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Kat=20March=C3=A1n?= Date: Tue, 2 Apr 2024 00:58:25 -0700 Subject: [PATCH 095/105] change escape resolution order again --- SPEC.md | 25 +++++++------------------ 1 file changed, 7 insertions(+), 18 deletions(-) diff --git a/SPEC.md b/SPEC.md index f9ce154..203c4f7 100644 --- a/SPEC.md +++ b/SPEC.md @@ -478,37 +478,26 @@ literal sequence `CRLF CRLF` becomes `LF LF`, not `LF`. #### Interaction with Whitespace Escapes Multi-line strings support the same mechanism for escaping whitespace. When -Processing a Multi-line String, implementations MUST resolve all whitespace -escapes _before_ dedenting the string. +processing a Multi-line String, implementations MUST resolve all whitespace +escapes _after_ dedenting the string. Furthermore, a whitespace escape that +attempts to escape the final line's newline and/or whitespace prefix is +invalid, since this technically means it's trying to escape "nothing". 
-For example, the following is legal: +For example, the following example are both illegal: ```kdl + // All multi-line strings must have the right dedent. " foo \ bar baz " - // becomes: - "foo bar\nbaz" -``` - -But the following is not, because the whitespace escape would consume the -indentation prior to dedenting: -```kdl + // Equivalent to trying to write a string containing `foo\nbar\`. " foo bar\ " - - // equivalent to writing: - - " - foo - bar" - - // which is illegal. ``` #### Example From fa204cec62abef085e33af65f849994846ae68a6 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Kat=20March=C3=A1n?= Date: Tue, 2 Apr 2024 21:36:25 -0700 Subject: [PATCH 096/105] unicode was not defined in grammar --- SPEC.md | 2 ++ 1 file changed, 2 insertions(+) diff --git a/SPEC.md b/SPEC.md index 203c4f7..e6673d6 100644 --- a/SPEC.md +++ b/SPEC.md @@ -818,6 +818,8 @@ bom := '\u{FEFF}' disallowed-literal-code-points := See Table (Disallowed Literal Code Points) +unicode := Any Unicode Scalar Value + unicode-space := See Table (All [White_Space](#whitespace) unicode characters which are not `newline`) single-line-comment := '//' ^newline* (newline | eof) From 6a77436e09b3a0e3e2fea57c13e33a5bb9f2e765 Mon Sep 17 00:00:00 2001 From: Romain Delamare Date: Wed, 17 Apr 2024 21:41:50 +0200 Subject: [PATCH 097/105] kql: only allow top() at start of selector (#388) --- QUERY-SPEC.md | 8 +++++++- 1 file changed, 7 insertions(+), 1 deletion(-) diff --git a/QUERY-SPEC.md b/QUERY-SPEC.md index 114a75c..70e4906 100644 --- a/QUERY-SPEC.md +++ b/QUERY-SPEC.md @@ -30,6 +30,11 @@ properties, node names, etc). With the exception of `top()` and `()`, they are a used inside a `[]` selector. Some matchers are unary, but most of them involve binary operators. +The `top()` matcher can only be used as the first matcher of a selector. This means +that it cannot be the right operand of the `>`, `>>`, `+`, or `++` operators. As `||` +combines selectors, the `top()` can appear just after it. 
For instance, + `a > b || top() > b` is valid, but `a > top()` is not. + * `top()`: Returns all toplevel children of the current document. * `top() > []`: Equivalent to `top()` on its own. * `(foo)`: Selects any element whose type annotation is `foo`. @@ -111,7 +116,8 @@ what they expand to. ``` query-str := $bom? query query := selector q-ws* "||" q-ws* query | selector -selector := filter q-ws* selector-operator q-ws* selector | filter +selector := filter q-ws* selector-operator q-ws* selector-subsequent | filter +selector-subsequent := matchers q-ws* selector-operator q-ws* selector-subsequent | matchers selector-operator := ">>" | ">" | "++" | "+" filter := "top(" q-ws* ")" | matchers matchers := type-matcher $string? accessor-matcher* | $string accessor-matcher* | accessor-matcher+ From bcfb3321c48c48d6d9e644bb6691e26eb16d5774 Mon Sep 17 00:00:00 2001 From: Thomas Jollans Date: Thu, 13 Jun 2024 20:55:44 +0200 Subject: [PATCH 098/105] Tweak rules for escaped whitespace in multi-line strings (#392) These rules are a bit more liberal than what was described previously, but I think they're clearer and more consistent: * This way, strings have the (I think intuitive) property that, when you 'blindly' remove the whitespace escapes, the meaning is unchanged. * If you take any valid single-line string and add a newline character and some indentation both at the start and the end, the string will still be valid (and unchanged) - previously, this was not necessarily the case if there were whitespace escapes. --- SPEC.md | 63 ++++++++++++++++++++++++++++++++------------------------- 1 file changed, 36 insertions(+), 27 deletions(-) diff --git a/SPEC.md b/SPEC.md index e6673d6..c6ca536 100644 --- a/SPEC.md +++ b/SPEC.md @@ -475,31 +475,6 @@ sequences. For clarity: this normalization is for individual sequences. That is, the literal sequence `CRLF CRLF` becomes `LF LF`, not `LF`. 
-#### Interaction with Whitespace Escapes - -Multi-line strings support the same mechanism for escaping whitespace. When -processing a Multi-line String, implementations MUST resolve all whitespace -escapes _after_ dedenting the string. Furthermore, a whitespace escape that -attempts to escape the final line's newline and/or whitespace prefix is -invalid, since this technically means it's trying to escape "nothing". - -For example, the following example are both illegal: - -```kdl - // All multi-line strings must have the right dedent. - " - foo \ -bar - baz - " - - // Equivalent to trying to write a string containing `foo\nbar\`. - " - foo - bar\ - " -``` - #### Example ```kdl @@ -510,7 +485,7 @@ multi-line " " ``` -The last example's string value will be: +This example's string value will be: ``` foo @@ -518,7 +493,8 @@ This is the base indentation bar ``` -Equivalent to `" foo\nThis is the base indentation\n bar"`. +which is equivalent to `" foo\nThis is the base indentation\n bar"` +when written as a single-line string. --------- @@ -588,6 +564,39 @@ multi-line "[\n] [tab]" ``` +#### Interaction with Whitespace Escapes + +Multi-line strings support the same mechanism for escaping whitespace. When +processing a Multi-line String, implementations MUST dedent the string _after_ +resolving all whitespace escapes, but _before_ resolving other backslash escapes. +Furthermore, a whitespace escape that attempts to escape the final line's newline +and/or whitespace prefix is invalid since the multi-line string has to still be +valid with the escaped whitespace removed. + +For example, the following example is illegal: + +```kdl + // Equivalent to trying to write a string containing `foo\nbar\`. + " + foo + bar\ + " +``` + +while the following example is allowed +```kdl + " + foo \ +bar + baz + \ " + // this is equivalent to + " + foo bar + baz + " +``` + ### Number Numbers in KDL represent numerical [Values](#value). 
There is no logical distinction in KDL From 1e924bcc7f6ebe6b5ef3d16255f1a869f62f2b50 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Kat=20March=C3=A1n?= Date: Thu, 3 Oct 2024 20:53:01 -0700 Subject: [PATCH 099/105] clarifications around multiline prefixes --- SPEC.md | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/SPEC.md b/SPEC.md index c6ca536..b21630b 100644 --- a/SPEC.md +++ b/SPEC.md @@ -445,8 +445,9 @@ desired. A Multi-line string _MUST_ start with a [Newline](#newline) immediately following its opening `"`. Its final line _MUST_ contain only whitespace, followed by a single closing `"`. All in-between lines that contain -non-newline characters _MUST_ start with the exact same whitespace as the -final line (precisely matching codepoints, not merely counting characters). +non-newline characters _MUST_ start with _at least_ the exact same whitespace +as the final line (precisely matching codepoints, not merely counting characters). +They may contain additional whitespace following this prefix. The value of the Multi-line String omits the first and last Newline, the Whitespace of the last line, and the matching Whitespace prefix on all From 93c4400a96e737c8ab001ca1bc8b9e6ad7da8624 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Kat=20March=C3=A1n?= Date: Wed, 27 Nov 2024 01:01:35 -0800 Subject: [PATCH 100/105] clarify that numbers don't need to be IEEE 754 floats --- SPEC.md | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/SPEC.md b/SPEC.md index b21630b..0886d9e 100644 --- a/SPEC.md +++ b/SPEC.md @@ -634,6 +634,10 @@ To go along with this and prevent foot guns, the bare [Identifier Strings](#identifier-string) `inf`, `-inf`, and `nan` are considered illegal identifiers and should yield a syntax error. +The existence of these keywords does not imply that any numbers be represented +as IEEE 754 floats. These are simply for clarity and convenience for any +implementation that chooses to represent their numbers in this way. 
+ ### Boolean A boolean [Value](#value) is either the symbol `#true` or `#false`. These From fa3050ccc01959be0be2eafdd52b62dc97d6c64c Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Kat=20March=C3=A1n?= Date: Wed, 27 Nov 2024 23:59:09 -0800 Subject: [PATCH 101/105] add 128-bit ints --- CHANGELOG.md | 1 + SPEC.md | 2 ++ 2 files changed, 3 insertions(+) diff --git a/CHANGELOG.md b/CHANGELOG.md index 88ce7d7..fdf4140 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -72,6 +72,7 @@ way. * Correspondingly, the identifiers `inf`, `-inf`, and `nan` are now syntax errors. +* `u128` and `i128` have been added as well-known number type annotations. ### KQL diff --git a/SPEC.md b/SPEC.md index 0886d9e..e09582a 100644 --- a/SPEC.md +++ b/SPEC.md @@ -218,6 +218,7 @@ Signed integers of various sizes (the number is the bit size): * `i16` * `i32` * `i64` +* `i128` Unsigned integers of various sizes (the number is the bit size): @@ -225,6 +226,7 @@ Unsigned integers of various sizes (the number is the bit size): * `u16` * `u32` * `u64` +* `u128` Platform-dependent integer types, both signed and unsigned: From 1588b1f5fd9d7807cf25f6d06a58770ee09f1af3 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Kat=20March=C3=A1n?= Date: Thu, 28 Nov 2024 22:39:19 -0800 Subject: [PATCH 102/105] get rid of syntactically significant unicode equals signs (#400) Fixes: #399 --- CHANGELOG.md | 4 ---- README.md | 17 +++++++------ SPEC.md | 24 ++++--------------- .../expected_kdl/unicode_equals_signs.kdl | 1 - .../test_cases/expected_kdl/unicode_silly.kdl | 1 + .../test_cases/input/unicode_equals_signs.kdl | 4 ---- tests/test_cases/input/unicode_silly.kd | 1 + 7 files changed, 15 insertions(+), 37 deletions(-) delete mode 100644 tests/test_cases/expected_kdl/unicode_equals_signs.kdl create mode 100644 tests/test_cases/expected_kdl/unicode_silly.kdl delete mode 100644 tests/test_cases/input/unicode_equals_signs.kdl create mode 100644 tests/test_cases/input/unicode_silly.kd diff --git a/CHANGELOG.md b/CHANGELOG.md index 
fdf4140..abd18b9 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -59,10 +59,6 @@ whitespace matching the whitespace prefix of the closing line. Multiline strings and raw strings now must have a newline immediately following their opening `"`, and a final newline plus whitespace preceding the closing `"`. -* SMALL EQUALS SIGN (`U+FE66`), FULLWIDTH EQUALS SIGN (`U+FF1D`), and HEAVY - EQUALS SIGN (`U+1F7F0`) are now treated the same as `=` and can be used for - properties (e.g. `お名前=☜(゚ヮ゚☜)`). They are also no longer valid in bare - identifiers. * `.1`, `+.1` etc are no longer valid identifiers, to prevent confusion and conflicts with numbers. * Multi-line strings' literal Newline sequences are now normalized to single diff --git a/README.md b/README.md index ed3d9ab..a390762 100644 --- a/README.md +++ b/README.md @@ -158,11 +158,10 @@ node3 #"C:\Users\zkat\raw\string"# You don't have to quote strings unless any the following apply: * The string contains whitespace. - * The string contains any of `[]{}()\/#";`. - * The string is one of `true`, `false`, or `null`. + * The string contains any of `[]{}()\/#";=`. + * The string is one of `true`, `false`, `null`, `inf`, `-inf`, or `nan`. * The strings starts with a digit, or `+`/`-`/`.`/`-.`,`+.` and a digit. - * The string contains an equals sign (including unicode equals signs `﹦`, - `=`, and `🟰`). + (aka "looks like a number") In essence, if it can get confused for other KDL or KQL syntax, it needs quotes. @@ -296,8 +295,8 @@ smile 😁 // Identifiers are very flexible. The following is a legal bare identifier: <@foo123~!$%^&*.:'|?+> -// And you can also use unicode, even for the equals sign! -ノード お名前=☜(゚ヮ゚☜) +// And you can also use unicode! +ノード お名前=ฅ^•ﻌ•^ฅ // kdl specifically allows properties and values to be // interspersed with each other, much like CLI commands. @@ -335,9 +334,9 @@ SDLang, but that had some design choices I disagreed with. #### Ok, then, why not SDLang? 
-SDLang is designed for use cases that are not interesting to me, but are very -relevant to the D-lang community. KDL is very similar in many ways, but is -different in the following ways: +SDLang is an excellent base, but I wanted some details ironed out, and some +things removed that only really made sense for SDLang's current use-cases, including +some restrictions about data representation. KDL is very similar in many ways, except: * The grammar and expected semantics are [well-defined and specified](SPEC.md). * There is only one "number" type. KDL does not prescribe representations. diff --git a/SPEC.md b/SPEC.md index e09582a..c812c4a 100644 --- a/SPEC.md +++ b/SPEC.md @@ -112,8 +112,8 @@ my-node 1 2 \ // comments are ok after \ ### Property A Property is a key/value pair attached to a [Node](#node). A Property is -composed of a [String](#string), followed immediately by an [equals -sign](#equals-sign), and then a [Value](#value). +composed of a [String](#string), followed immediately by an equals sign (`=`, `U+003D`), +and then a [Value](#value). Properties should be interpreted left-to-right, with rightmost properties with identical names overriding earlier properties. That is: @@ -131,17 +131,6 @@ still be spec-compliant. Properties _MAY_ be prefixed with `/-` to "comment out" the entire token and make it act as plain whitespace, even if it spreads across multiple lines. -#### Equals Sign - -Any of the following characters may be used as equals signs in properties: - -| Name | Character | Code Point | -|----|-----|----| -| EQUALS SIGN | `=` | `U+003D` | -| SMALL EQUALS SIGN | `﹦` | `U+FE66` | -| FULLWIDTH EQUALS SIGN | `=` | `U+FF1D` | -| HEAVY EQUALS SIGN | `🟰` | `U+1F7F0` | - ### Argument An Argument is a bare [Value](#value) attached to a [Node](#node), with no @@ -334,8 +323,7 @@ negative number. 
The following characters cannot be used anywhere in a [Identifier String](#identifier-string): -* Any of `(){}[]/\"#;` -* Any [Equals Sign](#equals-sign) +* Any of `(){}[]/\"#;=` * Any [Whitespace](#whitespace) or [Newline](#newline). * Any [disallowed literal code points](#disallowed-literal-code-points) in KDL documents. @@ -780,19 +768,17 @@ node-prop-or-arg := prop | value node-children := '{' nodes final-node? '}' node-terminator := single-line-comment | newline | ';' | eof -prop := string optional-node-space equals-sign optional-node-space value +prop := string optional-node-space '=' optional-node-space value value := type? optional-node-space (string | number | keyword) type := '(' optional-node-space string optional-node-space ')' -equals-sign := See Table ([Equals Sign](#equals-sign)) - string := identifier-string | quoted-string | raw-string identifier-string := unambiguous-ident | signed-ident | dotted-ident unambiguous-ident := ((identifier-char - digit - sign - '.') identifier-char*) - 'true' - 'false' - 'null' - 'inf' - '-inf' - 'nan' signed-ident := sign ((identifier-char - digit - '.') identifier-char*)? dotted-ident := sign? '.' ((identifier-char - digit) identifier-char*)? 
-identifier-char := unicode - unicode-space - newline - [\\/(){};\[\]"#] - disallowed-literal-code-points - equals-sign +identifier-char := unicode - unicode-space - newline - [\\/(){};\[\]"#=] - disallowed-literal-code-points quoted-string := '"' (single-line-string-body | newline multi-line-string-body newline unicode-space*) '"' single-line-string-body := (string-character - newline)* diff --git a/tests/test_cases/expected_kdl/unicode_equals_signs.kdl b/tests/test_cases/expected_kdl/unicode_equals_signs.kdl deleted file mode 100644 index 4ab6443..0000000 --- a/tests/test_cases/expected_kdl/unicode_equals_signs.kdl +++ /dev/null @@ -1 +0,0 @@ -node p1=val1 p2=val2 p3=val3 diff --git a/tests/test_cases/expected_kdl/unicode_silly.kdl b/tests/test_cases/expected_kdl/unicode_silly.kdl new file mode 100644 index 0000000..5fa566d --- /dev/null +++ b/tests/test_cases/expected_kdl/unicode_silly.kdl @@ -0,0 +1 @@ +ノード お名前=ฅ^•ﻌ•^ฅ diff --git a/tests/test_cases/input/unicode_equals_signs.kdl b/tests/test_cases/input/unicode_equals_signs.kdl deleted file mode 100644 index 37d8e02..0000000 --- a/tests/test_cases/input/unicode_equals_signs.kdl +++ /dev/null @@ -1,4 +0,0 @@ -node \ - p1﹦val1 \ // U+FE66 - p2=val2 \ // U+FF1D - p3🟰val3 // U+1F7F0 diff --git a/tests/test_cases/input/unicode_silly.kd b/tests/test_cases/input/unicode_silly.kd new file mode 100644 index 0000000..5fa566d --- /dev/null +++ b/tests/test_cases/input/unicode_silly.kd @@ -0,0 +1 @@ +ノード お名前=ฅ^•ﻌ•^ฅ From 90e22bc7892001443fdafa06caedb98953b55a61 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Kat=20March=C3=A1n?= Date: Thu, 28 Nov 2024 22:53:42 -0800 Subject: [PATCH 103/105] [v2] more predictable slashdash (#407) Fixes: https://github.com/kdl-org/kdl/issues/401 --- CHANGELOG.md | 3 +- SPEC.md | 112 +++++++++++------- .../slashdash_multi_line_comment_entry.kdl | 1 + .../slashdash_multi_line_comment_inline.kdl | 1 + .../slashdash_multiple_child_blocks.kdl | 3 + .../slashdash_newline_before_children.kdl | 1 + 
.../slashdash_newline_before_entry.kdl | 1 + .../slashdash_newline_before_node.kdl | 0 .../slashdash_single_line_comment_entry.kdl | 1 + .../slashdash_single_line_comment_node.kdl | 1 + ...slashdash_child_block_before_entry_err.kdl | 5 + .../slashdash_multi_line_comment_entry.kdl | 6 + .../slashdash_multi_line_comment_inline.kdl | 1 + .../input/slashdash_multiple_child_blocks.kdl | 10 ++ .../slashdash_newline_before_children.kdl | 4 + .../input/slashdash_newline_before_entry.kdl | 2 + .../input/slashdash_newline_before_node.kdl | 2 + .../slashdash_single_line_comment_entry.kdl | 2 + .../slashdash_single_line_comment_node.kdl | 3 + 19 files changed, 112 insertions(+), 47 deletions(-) create mode 100644 tests/test_cases/expected_kdl/slashdash_multi_line_comment_entry.kdl create mode 100644 tests/test_cases/expected_kdl/slashdash_multi_line_comment_inline.kdl create mode 100644 tests/test_cases/expected_kdl/slashdash_multiple_child_blocks.kdl create mode 100644 tests/test_cases/expected_kdl/slashdash_newline_before_children.kdl create mode 100644 tests/test_cases/expected_kdl/slashdash_newline_before_entry.kdl create mode 100644 tests/test_cases/expected_kdl/slashdash_newline_before_node.kdl create mode 100644 tests/test_cases/expected_kdl/slashdash_single_line_comment_entry.kdl create mode 100644 tests/test_cases/expected_kdl/slashdash_single_line_comment_node.kdl create mode 100644 tests/test_cases/input/slashdash_child_block_before_entry_err.kdl create mode 100644 tests/test_cases/input/slashdash_multi_line_comment_entry.kdl create mode 100644 tests/test_cases/input/slashdash_multi_line_comment_inline.kdl create mode 100644 tests/test_cases/input/slashdash_multiple_child_blocks.kdl create mode 100644 tests/test_cases/input/slashdash_newline_before_children.kdl create mode 100644 tests/test_cases/input/slashdash_newline_before_entry.kdl create mode 100644 tests/test_cases/input/slashdash_newline_before_node.kdl create mode 100644 
tests/test_cases/input/slashdash_single_line_comment_entry.kdl create mode 100644 tests/test_cases/input/slashdash_single_line_comment_node.kdl diff --git a/CHANGELOG.md b/CHANGELOG.md index abd18b9..4927376 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -36,7 +36,7 @@ * Bare identifiers can now be used as values in Arguments and Properties, and are interpreted as string values. * The spec prose now more explicitly states that strings and raw strings can be used as type annotations. -* A statement in the spec prose that said "It is reasonable for an +* Removed a statement in the spec prose that said "It is reasonable for an implementation to ignore null values altogether when deserializing". This is no longer encouraged or desired. * Code points have been constrained to [Unicode Scalar @@ -69,6 +69,7 @@ * Correspondingly, the identifiers `inf`, `-inf`, and `nan` are now syntax errors. * `u128` and `i128` have been added as well-known number type annotations. +* Slashdash (`/-`) -compatible locations adjusted to be more clear and intuitive. ### KQL diff --git a/SPEC.md b/SPEC.md index c812c4a..c3f749a 100644 --- a/SPEC.md +++ b/SPEC.md @@ -272,8 +272,17 @@ node prop=(regex).* ### String Strings in KDL represent textual UTF-8 [Values](#value). A String is either an -[Identifier String](#identifier-string) (like `foo`), a [Quoted String](#quoted-string) (like `"foo"`) or -a [Raw String](#raw-string) (like `#"foo"#`). Identifier Strings let you write short, "single-word" strings with a minimum of syntax; Quoted Strings let you write strings with whitespace (including newlines!) or escapes; Raw Strings let you write strings with whitespace *but without escapes*, allowing you to not worry about the string's content containing anything that might look like an escape. 
+[Identifier String](#identifier-string) (like `foo`), a [Quoted +String](#quoted-string) (like `"foo"`) or a [Raw String](#raw-string) (like +`#"foo"#`): + +* Identifier Strings let you write short, "single-word" strings with a + minimum of syntax +* Quoted Strings let you write strings with whitespace + (including newlines!) or escapes +* Raw Strings let you write strings with whitespace *but without escapes*, + allowing you to not worry about the string's content containing anything that + might look like an escape. Strings _MUST_ be represented as UTF-8 values. @@ -299,9 +308,9 @@ A handful of patterns are disallowed, to avoid confusion with other values: * idents that are the language keywords (`inf`, `-inf`, `nan`, `true`, `false`, and `null`) without their leading `#`. -Identifiers that match these patterns _MUST_ be treated as a syntax error; -such values can only be written as quoted or raw strings. -The precise details of the identifier syntax is specified in the [Full Grammar](#full-grammar) below. +Identifiers that match these patterns _MUST_ be treated as a syntax error; such +values can only be written as quoted or raw strings. The precise details of the +identifier syntax are specified in the [Full Grammar](#full-grammar) below. Identifier Strings are terminated by [Whitespace](#whitespace) or [Newlines](#newline). @@ -695,22 +704,26 @@ can be nested. Finally, a special kind of comment called a "slashdash", denoted by `/-`, can be used to comment out entire _components_ of a KDL document logically, and -have those elements be treated as whitespace. - -Slashdash comments can be used before: - -* A [Node](#node) name (or its type annotation): the entire Node is - treated as Whitespace, including all props, args, and children. -* A node [Argument](#argument) (or its type annotation), in which case - the Argument value is treated as Whitespace.
-* A [Property](#property) key, in which case the entire property, both - key and value, is treated as Whitespace. -* A [Children Block](#children-block), in which case the entire block, - including all children within, is treated as Whitespace. +have those elements not be included as part of the parsed document data. + +Slashdash comments can be used before the following, including before their type +annotations, if present: + +* A [Node](#node): the entire Node is treated as Whitespace, including all + props, args, and children. +* An [Argument](#argument): the Argument value is treated as Whitespace. +* A [Property](#property) key: the entire property, including both key and value, + is treated as Whitespace. A slashdash of just the property value is not allowed. +* A [Children Block](#children-block): the entire block, including all + children within, is treated as Whitespace. Only other children blocks, whether + slashdashed or not, may follow a slashdashed children block. + +A slashdash may be followed by any amount of whitespace, including newlines and +comments, before the element that it comments out. ### Newline -The following characters [should be treated as new +The following character sequences [should be treated as new lines](https://www.unicode.org/versions/Unicode13.0.0/ch05.pdf): | Acronym | Name | Code Pt | @@ -750,35 +763,36 @@ language syntax](#grammar-language) is defined below. ``` document := bom? nodes +// Nodes nodes := (line-space* node)* line-space* -plain-line-space := newline | ws | single-line-comment -plain-node-space := ws* escline ws* | ws+ - -line-space := plain-line-space+ | '/-' plain-node-space* node -node-space := plain-node-space+ ('/-' plain-node-space* (node-prop-or-arg | node-children))? +base-node := slashdash? type? node-space* string + (node-space+ slashdash? node-prop-or-arg)* + // slashdashed node-children must always be after props and args. + (node-space+ slashdash node-children)* + (node-space+ node-children)?
+ (node-space+ slashdash node-children)* +node := base-node node-space* node-terminator +final-node := base-node node-space* node-terminator? -required-node-space := node-space* plain-node-space+ -optional-node-space := node-space* - -base-node := type? optional-node-space string (required-node-space node-prop-or-arg)* (required-node-space node-children)? -node := base-node optional-node-space node-terminator -final-node := base-node optional-node-space node-terminator? +// Entries node-prop-or-arg := prop | value node-children := '{' nodes final-node? '}' node-terminator := single-line-comment | newline | ';' | eof -prop := string optional-node-space '=' optional-node-space value -value := type? optional-node-space (string | number | keyword) -type := '(' optional-node-space string optional-node-space ')' +prop := string node-space* '=' node-space* value +value := type? node-space* (string | number | keyword) +type := '(' node-space* string node-space* ')' +// Strings string := identifier-string | quoted-string | raw-string identifier-string := unambiguous-ident | signed-ident | dotted-ident -unambiguous-ident := ((identifier-char - digit - sign - '.') identifier-char*) - 'true' - 'false' - 'null' - 'inf' - '-inf' - 'nan' +unambiguous-ident := ((identifier-char - digit - sign - '.') identifier-char*) - disallowed-keyword-strings signed-ident := sign ((identifier-char - digit - '.') identifier-char*)? dotted-ident := sign? '.' ((identifier-char - digit) identifier-char*)? 
-identifier-char := unicode - unicode-space - newline - [\\/(){};\[\]"#=] - disallowed-literal-code-points +identifier-char := unicode - unicode-space - newline - [\\/(){};\[\]"#=] - disallowed-literal-code-points - equals-sign +disallowed-keyword-identifiers := 'true' - 'false' - 'null' - 'inf' - '-inf' - 'nan' quoted-string := '"' (single-line-string-body | newline multi-line-string-body newline unicode-space*) '"' single-line-string-body := (string-character - newline)* @@ -792,6 +806,7 @@ raw-string-quotes := '"' (single-line-raw-string-body | newline multi-line-raw-s single-line-raw-string-body := (unicode - newline - disallowed-literal-code-points)* multi-line-raw-string-body := (unicode - disallowed-literal-code-points)* +// Numbers number := keyword-number | hex | octal | binary | decimal decimal := sign? integer ('.' integer)? exponent? @@ -804,29 +819,31 @@ hex := sign? '0x' hex-digit (hex-digit | '_')* octal := sign? '0o' [0-7] [0-7_]* binary := sign? '0b' ('0' | '1') ('0' | '1' | '_')* +// Keywords and booleans. 
keyword := boolean | '#null' - keyword-number := '#inf' | '#-inf' | '#nan' - boolean := '#true' | '#false' -escline := '\\' ws* (single-line-comment | newline | eof) - -newline := See Table (All line-break white_space) - -ws := unicode-space | multi-line-comment - +// Specific code points bom := '\u{FEFF}' - disallowed-literal-code-points := See Table (Disallowed Literal Code Points) - unicode := Any Unicode Scalar Value +unicode-space := See Table (All White_Space unicode characters which are not `newline`) -unicode-space := See Table (All [White_Space](#whitespace) unicode characters which are not `newline`) - +// Comments single-line-comment := '//' ^newline* (newline | eof) multi-line-comment := '/*' commented-block commented-block := '*/' | (multi-line-comment | '*' | '/' | [^*/]+) commented-block +slashdash := '/-' line-space* + +// Whitespace +ws := unicode-space | multi-line-comment +escline := '\\' ws* (single-line-comment | newline | eof) +newline := See Table (All Newline White_Space) +// Whitespace where newlines are allowed. +line-space := newline | ws | single-line-comment +// Whitespace within nodes, where newline-ish things must be esclined. +node-space := ws* escline ws* | ws+ ``` ### Grammar language @@ -850,3 +867,6 @@ Specifically: `a - 'x'` means "any `a`, except something that matches the literal `'x'`". * The prefix `^` means "something that does not match" whatever follows it. For example, `^foo` means "must not match `foo`". +* A single definition may be split over multiple lines. Newlines are treated as + spaces. +* `//` at the beginning of a line is used for comments. 
\ No newline at end of file diff --git a/tests/test_cases/expected_kdl/slashdash_multi_line_comment_entry.kdl b/tests/test_cases/expected_kdl/slashdash_multi_line_comment_entry.kdl new file mode 100644 index 0000000..0c7db5c --- /dev/null +++ b/tests/test_cases/expected_kdl/slashdash_multi_line_comment_entry.kdl @@ -0,0 +1 @@ +node 1 3 diff --git a/tests/test_cases/expected_kdl/slashdash_multi_line_comment_inline.kdl b/tests/test_cases/expected_kdl/slashdash_multi_line_comment_inline.kdl new file mode 100644 index 0000000..0c7db5c --- /dev/null +++ b/tests/test_cases/expected_kdl/slashdash_multi_line_comment_inline.kdl @@ -0,0 +1 @@ +node 1 3 diff --git a/tests/test_cases/expected_kdl/slashdash_multiple_child_blocks.kdl b/tests/test_cases/expected_kdl/slashdash_multiple_child_blocks.kdl new file mode 100644 index 0000000..6ff16cc --- /dev/null +++ b/tests/test_cases/expected_kdl/slashdash_multiple_child_blocks.kdl @@ -0,0 +1,3 @@ +node foo { + three +} diff --git a/tests/test_cases/expected_kdl/slashdash_newline_before_children.kdl b/tests/test_cases/expected_kdl/slashdash_newline_before_children.kdl new file mode 100644 index 0000000..3b77f56 --- /dev/null +++ b/tests/test_cases/expected_kdl/slashdash_newline_before_children.kdl @@ -0,0 +1 @@ +node 1 2 diff --git a/tests/test_cases/expected_kdl/slashdash_newline_before_entry.kdl b/tests/test_cases/expected_kdl/slashdash_newline_before_entry.kdl new file mode 100644 index 0000000..0c7db5c --- /dev/null +++ b/tests/test_cases/expected_kdl/slashdash_newline_before_entry.kdl @@ -0,0 +1 @@ +node 1 3 diff --git a/tests/test_cases/expected_kdl/slashdash_newline_before_node.kdl b/tests/test_cases/expected_kdl/slashdash_newline_before_node.kdl new file mode 100644 index 0000000..e69de29 diff --git a/tests/test_cases/expected_kdl/slashdash_single_line_comment_entry.kdl b/tests/test_cases/expected_kdl/slashdash_single_line_comment_entry.kdl new file mode 100644 index 0000000..0c7db5c --- /dev/null +++ 
b/tests/test_cases/expected_kdl/slashdash_single_line_comment_entry.kdl @@ -0,0 +1 @@ +node 1 3 diff --git a/tests/test_cases/expected_kdl/slashdash_single_line_comment_node.kdl b/tests/test_cases/expected_kdl/slashdash_single_line_comment_node.kdl new file mode 100644 index 0000000..6810417 --- /dev/null +++ b/tests/test_cases/expected_kdl/slashdash_single_line_comment_node.kdl @@ -0,0 +1 @@ +node2 diff --git a/tests/test_cases/input/slashdash_child_block_before_entry_err.kdl b/tests/test_cases/input/slashdash_child_block_before_entry_err.kdl new file mode 100644 index 0000000..b9edfc3 --- /dev/null +++ b/tests/test_cases/input/slashdash_child_block_before_entry_err.kdl @@ -0,0 +1,5 @@ +node /-{ + child +} foo { + bar +} diff --git a/tests/test_cases/input/slashdash_multi_line_comment_entry.kdl b/tests/test_cases/input/slashdash_multi_line_comment_entry.kdl new file mode 100644 index 0000000..97a41e7 --- /dev/null +++ b/tests/test_cases/input/slashdash_multi_line_comment_entry.kdl @@ -0,0 +1,6 @@ +node 1 /- /* +multi +line +comment +here +*/ 2 3 diff --git a/tests/test_cases/input/slashdash_multi_line_comment_inline.kdl b/tests/test_cases/input/slashdash_multi_line_comment_inline.kdl new file mode 100644 index 0000000..1fd93ce --- /dev/null +++ b/tests/test_cases/input/slashdash_multi_line_comment_inline.kdl @@ -0,0 +1 @@ +node 1 /-/*two*/2 3 diff --git a/tests/test_cases/input/slashdash_multiple_child_blocks.kdl b/tests/test_cases/input/slashdash_multiple_child_blocks.kdl new file mode 100644 index 0000000..2f85ce1 --- /dev/null +++ b/tests/test_cases/input/slashdash_multiple_child_blocks.kdl @@ -0,0 +1,10 @@ +node foo /-{ + one +} \ +/-{ + two +} { + three +} /-{ + four +} diff --git a/tests/test_cases/input/slashdash_newline_before_children.kdl b/tests/test_cases/input/slashdash_newline_before_children.kdl new file mode 100644 index 0000000..deefb7f --- /dev/null +++ b/tests/test_cases/input/slashdash_newline_before_children.kdl @@ -0,0 +1,4 @@ +node 1 2 /- +{ 
+ child +} diff --git a/tests/test_cases/input/slashdash_newline_before_entry.kdl b/tests/test_cases/input/slashdash_newline_before_entry.kdl new file mode 100644 index 0000000..f6de9f9 --- /dev/null +++ b/tests/test_cases/input/slashdash_newline_before_entry.kdl @@ -0,0 +1,2 @@ +node 1 /- +2 3 diff --git a/tests/test_cases/input/slashdash_newline_before_node.kdl b/tests/test_cases/input/slashdash_newline_before_node.kdl new file mode 100644 index 0000000..545464f --- /dev/null +++ b/tests/test_cases/input/slashdash_newline_before_node.kdl @@ -0,0 +1,2 @@ +/- +node 1 2 3 diff --git a/tests/test_cases/input/slashdash_single_line_comment_entry.kdl b/tests/test_cases/input/slashdash_single_line_comment_entry.kdl new file mode 100644 index 0000000..2f807fc --- /dev/null +++ b/tests/test_cases/input/slashdash_single_line_comment_entry.kdl @@ -0,0 +1,2 @@ +node 1 /- // stuff +2 3 diff --git a/tests/test_cases/input/slashdash_single_line_comment_node.kdl b/tests/test_cases/input/slashdash_single_line_comment_node.kdl new file mode 100644 index 0000000..a378a18 --- /dev/null +++ b/tests/test_cases/input/slashdash_single_line_comment_node.kdl @@ -0,0 +1,3 @@ +/- // this is a comment +node1 +node2 From 76a1de517b23d95d462ed5d69f5242388268c0f9 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Kat=20March=C3=A1n?= Date: Thu, 28 Nov 2024 22:55:52 -0800 Subject: [PATCH 104/105] Release 2.0.0 draft 5 --- CHANGELOG.md | 14 +++++++++++++- README.md | 2 +- SPEC.md | 4 ++-- 3 files changed, 16 insertions(+), 4 deletions(-) diff --git a/CHANGELOG.md b/CHANGELOG.md index 4927376..5bd6a73 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -1,6 +1,18 @@ # KDL Changelog -## 2.0.0 (2024-02-07) +## 2.0.0-draft.5 (2024-11-28) + +* Equals signs other than `=` are no longer supported in properties. +* 128-bit integer type annotations have been added to the list of "well-known" + type annotations. +* Multiline string escape rules have been tweaked significantly. 
+* `\s` is now a valid escape within a string, representing a space character. +* Slashdash (`/-`)-compatible locations and related grammar adjusted to be more + clear and intuitive. This includes some changes relating to whitespace, + including comments and newlines, which are breaking changes. +* Various updates to test suite to reflect changes. + +## 2.0.0 (Unreleased) ### Grammar diff --git a/README.md b/README.md index a390762..415a91f 100644 --- a/README.md +++ b/README.md @@ -51,7 +51,7 @@ Language](SCHEMA-SPEC.md) loosely based on JSON Schema. The language is based on [SDLang](https://sdlang.org), with a number of modifications and clarifications on its syntax and behavior. -The current version of the KDL spec is `2.0.0-draft.4`. +The current version of the KDL spec is `2.0.0-draft.5`. [Play with it in your browser!](https://kdl-play.danini.dev/) diff --git a/SPEC.md b/SPEC.md index c3f749a..1cc7ea0 100644 --- a/SPEC.md +++ b/SPEC.md @@ -3,8 +3,8 @@ This is the semi-formal specification for KDL, including the intended data model and the grammar. -This document describes KDL version `2.0.0-draft.4`. It was released on -2024-02-12. +This document describes KDL version `2.0.0-draft.5`. It was released on +2024-11-28. ## Introduction From 8aa4c15758d5ada3426e7f4fc52dcc6c474f257e Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Kat=20March=C3=A1n?= Date: Fri, 29 Nov 2024 00:01:48 -0800 Subject: [PATCH 105/105] prep readme for merging to main --- README.md | 46 +++++++++++++++++++++++++++++++--------------- 1 file changed, 31 insertions(+), 15 deletions(-) diff --git a/README.md b/README.md index 415a91f..5bd9944 100644 --- a/README.md +++ b/README.md @@ -1,5 +1,12 @@ # The KDL Document Language +> [!WARNING] +> The main branch of this repository shows the latest v2.0.0 draft, which is a +> work in progress and not considered the "mainline" KDL yet. 
Most KDL +> implementations in the wild are based on the [v1.0.0 +> spec](https://github.com/kdl-org/kdl/tree/1.0.0) instead, so you may want to +> refer to that if you're using KDL today. + KDL is a small, pleasant document language with XML-like node semantics that looks like you're invoking a bunch of CLI commands! It's meant to be used both as a serialization format and a configuration language, much like JSON, YAML, @@ -44,22 +51,23 @@ There's a living [specification](SPEC.md), as well as various [implementations](#implementations). You can also check out the [FAQ](#faq) to answer all your burning questions! +The current version of the KDL spec is `2.0.0-draft.5`. + In addition to a spec for KDL itself, there are also standard specs for [a KDL Query Language](QUERY-SPEC.md) based on CSS selectors, and [a KDL Schema Language](SCHEMA-SPEC.md) loosely based on JSON Schema. -The language is based on [SDLang](https://sdlang.org), with a number of -modifications and clarifications on its syntax and behavior. - -The current version of the KDL spec is `2.0.0-draft.5`. +The language is based on [SDLang](https://sdlang.org), with a [number of +modifications and clarifications on its syntax and behavior](#why-not-sdlang). [Play with it in your browser!](https://kdl-play.danini.dev/) ## Design and Discussion -KDL is still extremely new, and discussion about the format should happen over -on the [discussions page](https://github.com/kdl-org/kdl/discussions). Feel -free to jump in and give us your 2 cents! +KDL 2.0 design is still in progress. Discussions and questions about the format +should happen over on the [discussions +page](https://github.com/kdl-org/kdl/discussions). Feel free to jump in and give +us your 2 cents! ## Implementations @@ -261,6 +269,8 @@ mynode /-commented "not commented" /-key=value /-{ a b } +// The above is equivalent to: +mynode "not commented" ``` ### Type Annotations @@ -332,6 +342,7 @@ Same as "cuddle". 
Because nothing out there felt quite right. The closest one I found was SDLang, but that had some design choices I disagreed with. + #### Ok, then, why not SDLang? SDLang is an excellent base, but I wanted some details ironed out, and some @@ -339,18 +350,23 @@ things removed that only really made sense for SDLang's current use-cases, inclu some restrictions about data representation. KDL is very similar in many ways, except: * The grammar and expected semantics are [well-defined and specified](SPEC.md). -* There is only one "number" type. KDL does not prescribe representations. +* There is only one "number" type. KDL does not prescribe representations, but + does have keywords for NaN, infinity, and negative infinity if decimal numbers + are intended to be represented as IEEE754 floats. * Slashdash (`/-`) comments are great and useful! -* I am not interested in having first-class date types, and SDLang's are very - non-standard. +* Quoteless "identifier" strings are supported. (e.g. `node foo=bar`, vs `node foo="bar"`) +* KDL does not have first-class date or binary data types. Instead, it + supports arbitrary type annotations for any custom data type you might need: + `(date)"2021-02-03"`, `(binary)"deadbeefbadc0ffee"`. * Values and properties can be interspersed with each other, rather than one having to follow the other. -* KDL does not have a first-class binary data type. Just use strings with base64. -* All strings in KDL are multi-line, and raw strings are written with - Rust-style syntax (`r"foo"`), instead of backticks. -* KDL identifiers can use UTF-8 and are much more lax about symbols than SDLang. +* All strings in KDL are multi-line, and multi-line strings are automatically dedented to match their closing quote's indentation level. +* Raw strings are written with `#` (`#"foo\bar"#`), instead of backticks. +* KDL identifiers can use UTF-8 and are more lax about symbols than SDLang. * KDL does not support "anonymous" nodes.
-* Instead, KDL supports arbitrary identifiers for node names and attribute +* Namespaces are not supported, but `:` is a legal identifier character, and applications + can choose to implement namespaces as they see fit. +* KDL supports arbitrary identifiers for node names and attribute names, meaning you can use arbitrary strings for those: `"123" "value"=1` is a valid node, for example. This makes it easier to use KDL for representing arbitrary key/value pairs.
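
The SDLang differences listed in this README hunk can be illustrated with a small KDL document. This is only a sketch against the v2 draft syntax described in the patches above and is not part of any commit. The node names (`event`, `path`, `mixed`, `app:setting`) and all values are hypothetical, chosen purely for illustration:

```kdl
// Quoteless identifier strings: both nodes carry the same string value.
node foo=bar
node foo="bar"

// Type annotations stand in for first-class date and binary types.
event when=(date)"2021-02-03" payload=(binary)"deadbeefbadc0ffee"

// Raw strings use `#` delimiters, so the backslash below is literal.
path #"foo\bar"#

// Arguments and properties may be interspersed.
mixed 1 key=value 2

// Quoted strings are accepted as node names and property keys.
"123" "value"=1

// `:` is an ordinary identifier character; any namespace meaning is
// left to the application.
app:setting enabled=#true
```

How an application interprets the `(date)` and `(binary)` annotations, or a `:` in a node name, is up to that application; a parser is only expected to pass them through.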