From 608e599153102421fd195f5d343b083476ecf5bd Mon Sep 17 00:00:00 2001
From: Pierre-Antoine Champin
Date: Wed, 8 May 2024 19:29:31 +0200
Subject: [PATCH] add publication snapshot for REC

---
 publication-snapshots/REC/Overview.html       | 6218 +++++++++++++++++
 publication-snapshots/REC/ca-overview.svg     |    1 +
 .../REC/dataset-bn-graph.svg                  |    1 +
 publication-snapshots/REC/double-circle.svg   |    1 +
 publication-snapshots/REC/duplicate-paths.svg |    1 +
 publication-snapshots/REC/shared-hashes.svg   |    1 +
 publication-snapshots/REC/unique-hashes.svg   |    1 +
 7 files changed, 6224 insertions(+)
 create mode 100644 publication-snapshots/REC/Overview.html
 create mode 100644 publication-snapshots/REC/ca-overview.svg
 create mode 100644 publication-snapshots/REC/dataset-bn-graph.svg
 create mode 100644 publication-snapshots/REC/double-circle.svg
 create mode 100644 publication-snapshots/REC/duplicate-paths.svg
 create mode 100644 publication-snapshots/REC/shared-hashes.svg
 create mode 100644 publication-snapshots/REC/unique-hashes.svg

diff --git a/publication-snapshots/REC/Overview.html b/publication-snapshots/REC/Overview.html
new file mode 100644
index 0000000..6ad423f
--- /dev/null
+++ b/publication-snapshots/REC/Overview.html
@@ -0,0 +1,6218 @@
+RDF Dataset Canonicalization
+

+

RDF Dataset Canonicalization

A Standard RDF Dataset Canonicalization Algorithm

+

W3C Recommendation

+
+ More details about this document +
+
This version:
+ https://www.w3.org/TR/2024/REC-rdf-canon-20240521/ +
+
Latest published version:
+ https://www.w3.org/TR/rdf-canon/ +
+
Latest editor's draft:
https://w3c.github.io/rdf-canon/spec/
+
History:
+ https://www.w3.org/standards/history/rdf-canon/ +
+ Commit history +
+
Test suite:
https://w3c.github.io/rdf-canon/tests/
+
Implementation report:
+ https://w3c.github.io/rdf-canon/reports/ +
+ + + +
Editors:
+ Dave Longley (Digital Bazaar) +
+ Gregg Kellogg +
+ Dan Yamamoto +
+
+ Former editor: +
+ Manu Sporny (Digital Bazaar) (CG Report) +
+
Author:
+ Dave Longley (Digital Bazaar) +
+
Feedback:
+ GitHub w3c/rdf-canon + (pull requests, + new issue, + open issues) +
public-rch-wg@w3.org with subject line [rdf-canon] … message topic … (archives)
+
Errata:
Errata exists.
+ +
+
+

+ See also + + translations. +

+ + +
+
+

Abstract

+

RDF [RDF11-CONCEPTS] describes a graph-based data model for making claims + about the world and provides the foundation for reasoning upon that graph + of information. At times, it becomes necessary to compare the differences + between sets of graphs, digitally sign them, or generate short identifiers + for graphs via hashing algorithms. This document outlines an algorithm for + normalizing RDF datasets such that these operations can be + performed.

+
+ +

Status of This Document

This section describes the status of this + document at the time of its publication. A list of current W3C + publications and the latest revision of this technical report can be found + in the W3C technical reports index at + https://www.w3.org/TR/.

+

This document describes the RDFC-1.0 algorithm for canonicalizing + RDF datasets, which was the input from the + W3C Credentials Community Group + published as [CCG-RDC-FINAL].

+ +

At the time of publication, [RDF11-CONCEPTS] is the most recent recommendation defining RDF datasets and [N-QUADS]; however, work on an updated specification is ongoing within the W3C RDF-star Working Group. Some dependencies from relevant updated specifications are provided normatively in this specification with the expectation that a future update to this specification will replace those with normative references to updated RDF specifications.

+

+ This document was published by the RDF Dataset Canonicalization and Hash Working Group as + a Recommendation using the + Recommendation track. +

+ W3C recommends the wide deployment of this specification as a standard for + the Web. +

+ A W3C Recommendation is a specification that, after extensive + consensus-building, is endorsed by + W3C and its Members, and + has commitments from Working Group members to + royalty-free licensing + for implementations. + Future updates to this Recommendation may incorporate + new features. +

+ + This document was produced by a group + operating under the + W3C Patent + Policy. + + + W3C maintains a + public list of any patent disclosures + made in connection with the deliverables of + the group; that page also includes + instructions for disclosing a patent. An individual who has actual + knowledge of a patent which the individual believes contains + Essential Claim(s) + must disclose the information in accordance with + section 6 of the W3C Patent Policy. + +

+ This document is governed by the + 03 November 2023 W3C Process Document. +

+ +

1. Introduction

This section is non-normative.

+ + +

When data scientists discuss canonicalization, they do so in the context of achieving a particular set of goals. Since the same information may sometimes be expressed in a variety of different ways, it often becomes necessary to transform each of these different ways into a single, standard representation. With a standard representation, the differences between two different sets of data can be easily determined, a cryptographically strong hash identifier can be generated for a particular set of data, and a particular set of data may be digitally signed for later verification.

+ +

In particular, this specification is about normalizing + RDF datasets, which are collections of graphs. Since + a directed graph can express the same information in more than one + way, it requires canonicalization to achieve the aforementioned goals + and any others that may arise via serendipity.

+ +

Most RDF datasets can be canonicalized fairly quickly, in terms of algorithmic time complexity. However, those that contain nodes that do not have globally unique identifiers pose a greater challenge. Normalizing these datasets presents the graph isomorphism problem, a problem that is believed to be difficult to solve quickly in the worst case. Fortunately, existing real-world data is rarely, if ever, modeled in a way that manifests as the worst case, and new data can be modeled to avoid it. In fact, software systems that detect a problematic dataset (see 7.1 Dataset Poisoning) can choose to assume it is an attempted denial-of-service attack, rather than a real input, and abort.

+ +

This document outlines an algorithm for generating a canonical + serialization of an RDF dataset given an RDF dataset as input. + The algorithm is called the + RDF Canonicalization algorithm version 1.0 or + RDFC-1.0.

+ +
Note
+

RDF 1.1 Concepts and Abstract Syntax [RDF11-CONCEPTS] lacks clarity on the representation of + language-tagged strings, + where language tags of the form xx-YY + are treated as being case insensitive. Implementations might represent language tags + using all lower case in the form xx-yy, + retain the original representation xx-YY, + or use [BCP47] formatting conventions, + leading to different canonical forms, and therefore, different hashed values.

+
    +
  • The Canonicalization algorithm is based on the RDF 1.1 definition, + in the sense that the language tag xx-YY + is case insensitive, which might lead to different canonicalizations if the user is not aware of this problem.
  • +
  • User communities ought to agree to use lower case + language tags, + while being aware that some implementations might normalize language tags, + affecting hash values.
  • +
  • Future evolution of RDF might regulate this issue, which RDF environments might have to adapt to, + and this might lead to an update of RDFC-1.0.
  • +
+ +
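The lower-casing convention suggested in the note above could be applied to literals before canonicalization. The following is a minimal sketch only, assuming literals are held as simple (lexical form, language tag) pairs; the `Literal` type and the `normalize_language_tag` helper are hypothetical names used for illustration and are not defined by RDFC-1.0.

```python
from typing import NamedTuple, Optional


class Literal(NamedTuple):
    """Hypothetical, simplified language-tagged string (not an RDFC-1.0 type)."""
    lexical_form: str
    language: Optional[str] = None


def normalize_language_tag(literal: Literal) -> Literal:
    """Return a copy of the literal with its language tag lower-cased."""
    if literal.language is None:
        return literal
    return literal._replace(language=literal.language.lower())


# Two spellings of the same tag become identical after normalization.
a = normalize_language_tag(Literal("colour", "en-GB"))
b = normalize_language_tag(Literal("colour", "en-gb"))
print(a == b)
```

Applying such a step consistently on every producer is what keeps the canonical form, and therefore any hash computed over it, stable across implementations.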
+
Note

See B. URDNA2015 + for a comparison with the version of the algorithm published + in RDF Dataset Canonicalization [CCG-RDC-FINAL].

+ +

1.1 Uses of Dataset Canonicalization

+ +

There are different use cases where graph or dataset canonicalization is important:

+
    +
  • Determining if one serialization is isomorphic to another.
  • +
  • Digital signing of graphs (datasets) independent of serialization or format.
  • +
  • Comparing two graphs (datasets) to find differences.
  • +
  • Communicating change sets when remotely updating an RDF source.
  • +
+

A canonicalization algorithm is necessary, but not necessarily sufficient, to handle many of these use cases. The use of blank nodes in RDF graphs and datasets has a long history and creates inevitable complexities. Blank nodes are used for different purposes:

+
    +
  • when a well-known identifier for a node is not available, or the author of a document chooses not to unambiguously name that node,
  • +
  • when a node is used to stitch together parts of a graph and the nodes themselves are not interesting (e.g., RDF Collections in [RDF11-MT]),
  • +
  • when someone is trying to create an intentionally difficult graph topology.
  • +
+

Furthermore, + RDF semantics dictate that deserializing an RDF document + results in the creation of unique blank nodes, + unless it can be determined that on each occasion, + the blank node identifies the same resource. + This is due to the fact that blank node identifiers + are an aspect of a concrete RDF syntax + and are not intended to be persistent or portable. + Within the abstract RDF model, + blank nodes do not have identifiers + (although some + RDF store + implementations may use stable identifiers and may choose to make them portable). + See Blank Nodes + in [RDF11-CONCEPTS] for more information.

+ +

RDF does have a provision for allowing blank nodes + to be published in an externally identifiable way through the use of + Skolem IRIs, + which allow a given RDF store to replace the use of blank nodes + in a concrete syntax with IRIs, + which then serve to repeatably identify that blank node within that particular RDF store; + however, this is not generally useful for talking about the + same graph in different RDF stores, + or other concrete representations. + In any case, a stable blank node identifier defined for one + RDF store or serialization is arbitrary, + and typically not relatable to the context within which it is used.

+ +

This specification defines an algorithm for creating stable + blank node identifiers repeatably for different serializations + possibly using individualized blank node identifiers + of the same RDF graph (dataset) by grounding each blank node + through the nodes to which it is connected. + As a result, a graph signature can be obtained by hashing a canonical serialization + of the resulting canonicalized dataset, + allowing for the isomorphism and digital signing use cases. + This specification does not define such a graph signature.

+ +

As blank node identifiers can be stable even with other changes to a graph (dataset), in some cases it is possible to compute the difference between two graphs (datasets), for example if changes are made only to ground triples, or if new blank nodes are introduced which do not create an automorphic confusion with other existing blank nodes. If any information changes that would alter the generated blank node identifier, a resulting diff might indicate a greater set of changes than actually exists. Additionally, if the starting dataset is an N-Quads document, it may be possible to correlate the original blank node identifiers used within that N-Quads document with those issued in the canonicalized dataset.

+ +
Note

Although alternative hash algorithms might be used + with this specification, + applications ought to carefully weigh the advantages + and disadvantages of using an alternative hash function. + This is the case, in particular, for any representation of the canonical n-quads form + or issued identifiers map + that does not identify the associated hash algorithm. Any use case + that requires reproduction of the same output is expected to + unequivocally express or communicate the internal + hash algorithm that was used when generating + the canonical n-quads form. +

+
+ +

1.2 How to Read this Document

+ + +

This document is a detailed specification for an RDF dataset + canonicalization algorithm. The document is primarily intended for the + following audiences:

+ +
    +
  • Software developers that want to implement an RDF dataset + canonicalization algorithm.
  • +
  • Masochists.
  • +
+ +

To understand the basics in this specification you must be familiar with + basic RDF concepts [RDF11-CONCEPTS]. A working knowledge of + graph theory and + graph isomorphism + is also recommended.

+
+ +

1.3 Typographical conventions

This section is non-normative.

+ +

The following typographic conventions are used in this specification:

+ +
+
markup
+ Markup (elements, attributes, properties), + machine processable values (string, characters, media types), + property names, + and file names are in red-orange monospace font.
+
variable
+ A variable in pseudo-code or in an algorithm description is italicized.
+
definition
+ A definition of a term, to be used elsewhere in this or other specifications, + is italicized and in bold.
+
definition reference
+ A reference to a definition in this document + is underlined and is also an active link to the definition itself.
+
markup definition reference
+ References to a definition in this document, + when the reference itself is also a markup, is underlined, + in a red-orange monospace font, and is also an active link to the definition itself.
+
external definition reference
+ A reference to a definition in another document + is underlined and italicized, and is also an active link to the definition itself.
+
markup external definition reference
+ A reference to a definition in another document, + when the reference itself is also a markup, + is underlined and italicized in a red-orange monospace font, + and is also an active link to the definition itself.
+
hyperlink
+ A hyperlink is underlined and in blue.
+
[reference]
+ A document reference (normative or informative) is enclosed in square brackets + and links to the references section.
+
Explanation
+ An expandable area to find a more detailed, non-normative explanation of a + particular algorithmic step. +
+ Explanation +

This area would provide more information about the step involved.

+
+
+
Logging
+ An expandable area to find suggestions for implementations to log + information about processing, + which may be useful in comparing with other implementations, + or with logs provided with each test case. +
+ Logging +

For example, the following output snippet might + describe the operation of an implementation using the [YAML] format.

+
ca:
+  ca2:
+    bn_to_quads:
+      e0:
+        - _:e0 <http://example.com/#p1> _:e1 .
+      e1:
+        ...
+  ca3:
+  - identifier: e0
+    h1dq:
+      nquads:
+        - _:a <http://example.com/#p1> _:z .
+
+
+
+ +
Note

Notes are in light green boxes with a green left border and with a "Note" header in green. + Notes are always informative.

+ +
+
+ Example 1 +
Examples are in light khaki boxes, with khaki left border,
+and with a numbered "Example" header in khaki.
+Examples are always informative. The content of the example is in monospace font and may be 
+syntax colored.
+
+Examples may have tabbed navigation buttons
+to show the results of transforming an example into other representations.
+
+Code examples are generally given in a Turtle or TriG format for brevity,
+where each line represents a single triple or quad.
+Additionally, examples have the following implied directives:
+
+BASE <http://example.com/>
+PREFIX : <#>
+
+Following the Turtle/TriG syntax rules, blank nodes always appear in the 
+`_:xyz` format.
+
+
+
+
+ +

2. Conformance

As well as sections marked as non-normative, all authoring guidelines, diagrams, examples, and notes in this specification are non-normative. Everything else in this specification is normative.

+ The key words MUST, MUST NOT, and SHOULD in this document + are to be interpreted as described in + BCP 14 + [RFC2119] [RFC8174] + when, and only when, they appear in all capitals, as shown here. +

+

A conforming processor is a system which can generate + the canonical n-quads form of an input dataset + consistent with the algorithms defined in this specification.

+ +

The algorithms in this specification are normative, + because to consistently reproduce the same canonical identifiers, + implementations MUST strictly conform to the steps outlined in these algorithms.

+ +
Note

Implementers can partially check their level of conformance with + this specification by successfully passing the test cases of the + RDF Dataset Canonicalization test suite. + Note, however, that passing all the tests in the test + suite does not imply complete conformance to this specification. It only implies + that the implementation conforms to the aspects tested by the test suite.

+
+ +

3. Terminology

+ + +

3.1 Terms defined by this specification

+ +
canonical n-quads form
+
+ The canonicalized representation of a quad is defined in A. A Canonical form of N-Quads. + A quad in canonical n-quads form represents a graph name, if present, in the same manner as + a subject, and each quad is terminated with a single LF (line feed, code point U+000A). +
+
canonicalization function
+
A canonicalization function maps RDF datasets + into isomorphic datasets [RDF11-CONCEPTS]. + Two datasets produce the same canonical result if and only if they are isomorphic. + The RDFC-1.0 algorithm implements a canonicalization function. + Some datasets may be constructed to prevent this algorithm from + terminating in a reasonable amount of time (see 7.1 Dataset Poisoning), + in which case the algorithm can be considered to be + a partial canonicalization function. + +
canonicalized dataset
+
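A canonicalized dataset is the combination of the following: the input dataset; the input blank node identifier map, which maps blank nodes in the input dataset to blank node identifiers; and the issued identifiers map from the canonical issuer, which maps those blank node identifiers to canonical identifiers. A concrete serialization of a canonicalized dataset MUST label all blank nodes using the canonical blank node identifiers.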
+
gossip path
+
A particular enumeration of every incident mention emanating + from a blank node. This recursively includes transitively related + mentions until any named node or blank node already labeled by + a particular identifier issuer is reached. Gossip paths are + encoded and operated on in the RDFC-1.0 algorithm as strings. (See + 4.8 Hash N-Degree Quads for more information + on the construction of gossip paths.) +
+
hash
+
The lowercase, hexadecimal representation of a message digest.
+
hash algorithm
+
The default hash algorithm used by RDFC-1.0, namely, SHA-256 [FIPS-180-4]. +

Implementations MUST support a parameter to define the hash algorithm, + MUST support SHA-256 and SHA-384 [FIPS-180-4], + and SHOULD support the ability to specify other hash algorithms. + Using a different hash algorithm will generally result in different output than + using the default.

+ +
Note

There is no expectation that the default hash algorithm + will also be used by any application creating a hash digest of the + canonical N-Quads result.

+
+
identifier issuer
+
An identifier issuer is used to issue new blank node identifiers. It + maintains a + blank node identifier issuer state.
+
input blank node identifier map
+
Records any blank node identifiers already assigned to the + input dataset. + If the input dataset is provided as an N-Quads document, + the map relates blank nodes in the abstract input dataset + to the blank node identifiers used within the N-Quads document, + otherwise, identifiers are assigned arbitrarily for + each blank node in the input dataset not previously identified. +
Note
Implementations or environments might deal with blank + node identifiers more directly; for example, some implementations might + retain blank node identifiers in the parsed or abstract dataset. Implementations + are expected to reuse these to enable usable mappings between input blank node + identifiers and output blank node identifiers outside of the algorithm.
+
+
input dataset
+
The abstract RDF dataset that is provided as input to + the algorithm.
+
mention
+
+ A node is mentioned in a quad + if it is a component of that quad, + as a subject, predicate, object, or graph name.
+
mention set
+
The set of all quads in a dataset + that mention a node n is called the mention set of n, + denoted Qn.
+
quad
+
A tuple composed of subject, predicate, object, and graph name. + This is a generalization of an RDF triple along with a graph name. +
+
+
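The quad, mention, and mention set definitions above can be illustrated with a small data structure. This is a sketch under stated assumptions: the `Quad` type and the `mention_set` function are hypothetical names, and terms are represented as N-Quads-style strings (IRIs in angle brackets, blank nodes prefixed with `_:`).

```python
from typing import NamedTuple, Optional, Set


class Quad(NamedTuple):
    """Hypothetical quad; terms are N-Quads-style strings."""
    subject: str
    predicate: str
    object: str
    graph_name: Optional[str] = None  # None stands for the default graph


def mention_set(dataset: Set[Quad], node: str) -> Set[Quad]:
    """Return every quad that mentions `node` in any of its four positions."""
    return {quad for quad in dataset if node in quad}


dataset = {
    Quad("<http://example.com/#p>", "<http://example.com/#q>", "_:e0"),
    Quad("_:e0", "<http://example.com/#s>", "<http://example.com/#u>"),
    Quad("<http://example.com/#p>", "<http://example.com/#r>", "_:e1"),
}

# _:e0 appears once as an object and once as a subject.
print(len(mention_set(dataset, "_:e0")))
```

Treating the dataset as a set of tuples mirrors the statement that, for the purposes of this specification, an RDF dataset is considered to be a set of quads.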
+ +

3.2 Terms defined by cited specifications

+ +
blank node
+
A blank node + as specified by [RDF11-CONCEPTS]. In short, it is a node in a graph that is + neither an IRI, nor a + literal.
+
blank node identifier
+
A blank node identifier + as specified by [RDF11-CONCEPTS]. In short, it is a string that begins + with _: that is used as an identifier for a + blank node. Blank node identifiers + are typically implementation-specific local identifiers; this document + specifies an algorithm for deterministically specifying them.
+
+ Concrete syntaxes, like [Turtle] or [N-Quads], prepend blank node identifiers with the _: string + to differentiate them from other nodes in the graph. This affects the + canonicalization algorithm, which is based on calculating a hash over the representations of quads in this format. +
+
default graph
+
The default graph + as specified by [RDF11-CONCEPTS].
+
graph name
+
A graph name + as specified by [RDF11-CONCEPTS].
+
IRI
+
An IRI (Internationalized Resource Identifier) is a string that conforms to the syntax + defined in [RFC3987].
+
object
+
An object + as specified by [RDF11-CONCEPTS].
+
predicate
+
A predicate + as specified by [RDF11-CONCEPTS].
+
RDF dataset
+
A dataset as specified by [RDF11-CONCEPTS]. For the purposes of this specification, an RDF dataset is considered to be a set of quads.
+
RDF graph
+
An RDF graph + as specified by [RDF11-CONCEPTS].
+
RDF triple
+
A triple + as specified by [RDF11-CONCEPTS].
+
string
+ A string is a sequence of zero or more Unicode characters.
+
subject
+
A subject + as specified by [RDF11-CONCEPTS].
+
true and false
+ Values that are used to express one of two possible boolean states.
+
Unicode code point order
+
This refers to determining the order of two Unicode strings (A and B), + using Unicode Codepoint Collation, + as defined in [XPATH-FUNCTIONS], + which defines a + total ordering + of strings comparing code points. + Note that for UTF-8 encoded strings, comparing the byte sequences gives the same result as code point order. +
+
+
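The remark that comparing UTF-8 byte sequences gives the same result as code point order can be checked directly. The sketch below relies only on Python's built-in string comparison, which compares by code point; the sample strings are arbitrary choices made for illustration. A UTF-16 code unit ordering is included for contrast, since it can differ for characters outside the Basic Multilingual Plane.

```python
# U+10000 lies above the Basic Multilingual Plane; U+FFFF lies inside it.
strings = ["\U00010000", "\uffff", "a", "B"]

by_code_point = sorted(strings)  # Python compares str values by code point
by_utf8_bytes = sorted(strings, key=lambda s: s.encode("utf-8"))
by_utf16_units = sorted(strings, key=lambda s: s.encode("utf-16-be"))

# UTF-8 byte order agrees with code point order.
print(by_code_point == by_utf8_bytes)

# UTF-16 code unit order places U+10000 (surrogates D800 DC00) before U+FFFF.
print(by_code_point == by_utf16_units)
```

Implementations in languages whose native string comparison works on UTF-16 code units might therefore need an explicit code point or UTF-8 byte comparison to sort as described here.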
+ +

4. Canonicalization

+ + +

Canonicalization is the process of transforming an + input dataset to its serialized canonical form. + That is, any two input datasets that contain the same information, + regardless of their arrangement, + will be transformed into the same serialized canonical form. + The problem requires directed + graphs to be deterministically ordered into sets of nodes and edges. This + is easy to do when all of the nodes have globally-unique identifiers, but + can be difficult to do when some of the nodes do not. Any nodes without + globally-unique identifiers must be issued deterministic identifiers.

+ +
Note

+ This specification defines a canonicalized dataset to include stable identifiers for blank nodes, + practical uses of which will always generate a canonical serialization of such a dataset.

+ +

In time, there may be more than one canonicalization algorithm and, + therefore, for identification purposes, this algorithm is named the + "RDF Canonicalization algorithm version 1.0" + (RDFC-1.0).

+ +

Figure 1 provides an overview of RDFC-1.0, + with steps 1 through 7 corresponding to the various steps described in + 4.4.3 Algorithm.

+ +
+ + +

+ The image represents an overview of the RDFC-1.0 algorithm. + The Input Document is deserialized into the Input Dataset + and Input Blank Node Identifier Map. + Canonicalization steps 1-6 are executed resulting in + the Canonicalized Dataset including the + Input Blank Node Identifier Map and Issued Identifiers Map. + Step 7 of the Canonicalization algorithm creates + the canonical n-quads form of the Canonicalized Dataset.

+
+
Figure 1 An illustrated overview of the RDFC-1.0 algorithm.
+ Image available in + + SVG + .
+
+ +

4.1 Overview

This section is non-normative.

+ + +

To determine a canonical labeling, RDFC-1.0 considers the + information connected to each blank node. + Nodes with unique first degree information can immediately be issued a canonical identifier + via the Issue Identifier algorithm. + When a node has non-unique first degree information, + it is necessary to determine all information that is transitively connected + to it throughout the entire dataset. + 4.6 Hash First Degree Quads defines a + node’s first degree information via its first degree hash.

+ +

Hashes are computed from the information of each blank node. + These hashes encode the mentions incident to each blank node. + The hash of a string s, is the lower-case, + hexadecimal representation of the result of passing s + through a cryptographic hash function. + By default, RDFC-1.0 uses the SHA-256 hash algorithm [FIPS-180-4].
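The hash of a string, as described above, can be sketched with Python's standard `hashlib` module. The function name `rdfc_hash` is hypothetical, and encoding the string as UTF-8 before hashing is an assumption made for this sketch.

```python
import hashlib


def rdfc_hash(data: str, algorithm: str = "sha256") -> str:
    """Lower-case hexadecimal digest of the UTF-8 encoding of `data`.

    `algorithm` is any name accepted by hashlib.new(), e.g. "sha256"
    (the default here) or "sha384".
    """
    return hashlib.new(algorithm, data.encode("utf-8")).hexdigest()


digest = rdfc_hash("_:a <http://example.com/#p1> _:z .\n")
print(len(digest))  # a SHA-256 digest is 64 hexadecimal characters
```

Exposing the algorithm as a parameter reflects the requirement that implementations support a parameter to define the hash algorithm; a different algorithm will generally produce different output.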

+ +
Note

The "degree" terminology is used within this specification as a colloquial way of describing the eccentricity or radius of any two nodes within a dataset. This concept is also related to "degrees of separation", as in, for example, "six degrees of separation". Nodes with unique first degree information can be considered nodes with a radius of one.

+
+ +

4.2 Canonicalization State

+ + +

When performing the steps required by the canonicalization algorithm, + it is helpful to track state in a data structure called the + canonicalization state. The information contained in the + canonicalization state is described below.

+ +
+
blank node to quads map
+
A map that relates a blank node identifier to the quads in which it appears in the input dataset.
+
hash to blank nodes map
+
A map that relates a hash to a + list of + blank node identifiers.
+
canonical issuer
+
An identifier issuer, initialized with the + prefix c14n (short for canonicalization), for issuing canonical + blank node identifiers. +
Note
+ Because every blank node is labeled using this single identifier scheme, an RDF dataset composed of two different RDF graphs will be issued different identifiers than those issued for the graphs taken independently. This may happen anyway, due to automorphisms, or overlapping statements, but an identifier based on the resulting hash along with an issue sequence number specific to that hash would stand a better chance of surviving such minor changes, and allow the resulting information to be useful for RDF Diff.
+
+
+
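The blank node to quads map described above might be populated as in the following sketch. The `Quad` alias and the `build_blank_node_to_quads` function are hypothetical; terms are assumed to be N-Quads-style strings, so blank nodes start with `_:` while IRIs start with `<` and literals with `"`. The sample quads are taken from the logging example for unique hashes.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# (subject, predicate, object, graph name); "" stands for the default graph.
Quad = Tuple[str, str, str, str]


def build_blank_node_to_quads(quads: List[Quad]) -> Dict[str, List[Quad]]:
    """Relate each blank node identifier to the quads in which it appears."""
    blank_node_to_quads: Dict[str, List[Quad]] = defaultdict(list)
    for quad in quads:
        # dict.fromkeys de-duplicates while keeping order, so a quad that
        # mentions the same blank node twice is recorded only once for it.
        for term in dict.fromkeys(quad):
            if term.startswith("_:"):
                blank_node_to_quads[term[2:]].append(quad)
    return dict(blank_node_to_quads)


quads = [
    ("<http://example.com/#p>", "<http://example.com/#q>", "_:e0", ""),
    ("_:e0", "<http://example.com/#s>", "<http://example.com/#u>", ""),
    ("<http://example.com/#p>", "<http://example.com/#r>", "_:e1", ""),
    ("_:e1", "<http://example.com/#t>", "<http://example.com/#u>", ""),
]
state = build_blank_node_to_quads(quads)
print(sorted(state))
```

The hash to blank nodes map can be kept in the same style, as a dictionary from hash strings to lists of blank node identifiers.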
+ +

4.3 Blank Node Identifier Issuer State

+ + +

The canonicalization algorithm issues identifiers to blank nodes. + The Issue Identifier algorithm uses an + identifier issuer to accomplish this task. + The information an identifier issuer needs to keep track of is described + below.

+ +
+
identifier prefix
+
The identifier prefix is a string that is used at the beginning of a blank node identifier. It should be initialized to a string that is specified by the canonicalization algorithm. When generating a new blank node identifier, the prefix is concatenated with an identifier counter. For example, c14n is a proper initial value for the identifier prefix that would produce blank node identifiers like c14n1.
+
identifier counter
+
A counter that is appended to the identifier prefix to create a blank node identifier. It is initialized to 0.
+
issued identifiers map
+
An ordered map that relates blank node identifiers to issued identifiers, + to prevent issuance of more than one new identifier per existing identifier, + and to allow blank nodes to + be assigned identifiers some time after issuance.
+
+
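The three pieces of issuer state described above could be held in a small class such as the one sketched here. The class name `IdentifierIssuer` and its `issue` method are hypothetical; the method shows one plausible way the state might be used and is not a restatement of the normative Issue Identifier algorithm.

```python
from typing import Dict


class IdentifierIssuer:
    """Hypothetical holder for the blank node identifier issuer state."""

    def __init__(self, prefix: str = "c14n") -> None:
        self.prefix = prefix              # identifier prefix
        self.counter = 0                  # identifier counter, starts at 0
        self.issued: Dict[str, str] = {}  # issued identifiers map (insertion-ordered)

    def issue(self, existing_identifier: str) -> str:
        """Return the identifier issued for `existing_identifier`, creating one if needed."""
        if existing_identifier in self.issued:
            # Never issue more than one new identifier per existing identifier.
            return self.issued[existing_identifier]
        issued_identifier = f"{self.prefix}{self.counter}"
        self.issued[existing_identifier] = issued_identifier
        self.counter += 1
        return issued_identifier


issuer = IdentifierIssuer()
first = issuer.issue("e0")
second = issuer.issue("e1")
repeat = issuer.issue("e0")
print(first, second, repeat)
```

A plain Python dictionary preserves insertion order, which is why it can stand in for the ordered map in this sketch.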
+ +

4.4 Canonicalization Algorithm

+ + +

The canonicalization algorithm converts an input dataset + into a canonicalized dataset or raises an error if + the input dataset is determined to be overly complex. + This algorithm will assign + deterministic identifiers to any blank nodes in the + input dataset.

+ +

4.4.1 Overview

This section is non-normative.

+ + +

RDFC-1.0 canonically labels an RDF dataset + by assigning each blank node a canonical identifier. + In RDFC-1.0, an RDF dataset D + is represented as a set of quads of the form < s, p, o, g > + where the graph component g is empty if and only if the + triple < s, p, o > is in the default graph. + It is expected that, for two RDF datasets, + RDFC-1.0 returns the same canonically labeled list of quads + if and only if the two datasets are isomorphic (i.e., the same modulo blank node identifiers). +

+ +

RDFC-1.0 consists of several sub-algorithms. + These sub-algorithms are introduced in the following sub-sections. + First, we give a high level summary of RDFC-1.0.

+ +
    +
  1. Initialization. Initialize the state needed for the rest of the algorithm using 4.2 Canonicalization State. Also initialize the canonicalized dataset using the input dataset (which remains immutable) and the input blank node identifier map (retaining blank node identifiers from the input if possible, otherwise assigning them arbitrarily); the issued identifiers map from the canonical issuer is added upon completion of the algorithm.
  2. +
  3. Compute first degree hashes. + Compute the first degree hash for each blank node in the dataset using 4.6 Hash First Degree Quads.
  4. +
  5. Canonically label unique nodes. + Assign canonical identifiers via 4.5 Issue Identifier Algorithm, + in Unicode code point order, to each blank node whose first degree hash is unique.
  6. +
  7. Compute N-degree hashes for non-unique nodes. + For each repeated first degree hash (proceeding in Unicode code point order), + compute the N-degree hash via 4.8 Hash N-Degree Quads + of every unlabeled blank node that corresponds to the given repeated hash.
  8. +
  9. Canonically label remaining nodes. + In Unicode code point order of the N-degree hashes, + issue canonical identifiers to each corresponding blank node using + 4.5 Issue Identifier Algorithm. + If more than one node produces the same N-degree hash, + the order in which these nodes receive a canonical identifier does not matter.
  10. +
  11. Finish. + Return the serialized canonical form of the canonicalized dataset. + Alternatively, return the canonicalized dataset containing + the input blank node identifier map and issued identifiers map.
  12. +
+
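The second and third steps of the summary above (grouping blank nodes by first degree hash, then labeling the nodes whose hash is unique) can be sketched as follows. The function `label_unique_nodes` is a hypothetical name, the `first_degree_hash` callable is supplied by the caller as a stand-in for 4.6 Hash First Degree Quads, and the two-character "hashes" in the usage lines are fabricated placeholders rather than real digests.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List


def label_unique_nodes(
    blank_nodes: Iterable[str],
    first_degree_hash: Callable[[str], str],
    prefix: str = "c14n",
) -> Dict[str, str]:
    """Issue canonical identifiers to blank nodes whose first degree hash is unique."""
    hash_to_blank_nodes: Dict[str, List[str]] = defaultdict(list)
    for node in blank_nodes:
        hash_to_blank_nodes[first_degree_hash(node)].append(node)

    issued: Dict[str, str] = {}
    # sorted() on Python strings orders by code point.
    for digest in sorted(hash_to_blank_nodes):
        nodes = hash_to_blank_nodes[digest]
        if len(nodes) == 1:
            issued[nodes[0]] = f"{prefix}{len(issued)}"
    return issued


# Placeholder hashes: e2 and e3 collide, so they are left for the N-degree step.
fake_hashes = {"e0": "bb", "e1": "aa", "e2": "cc", "e3": "cc"}
labels = label_unique_nodes(fake_hashes, fake_hashes.__getitem__)
print(labels)
```

Nodes that share a first degree hash remain unlabeled here; in the full algorithm they are handled by the N-degree hashing steps.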
+ +

4.4.2 Examples

This section is non-normative.

+ + + + + +
+ +

4.4.3 Algorithm

+ + +

The following algorithm will run with a minimal number of iterations in each step + for typical input datasets. + In some extreme cases, the algorithm can behave poorly, particularly in Step 5. + Implementations MUST defend against potential denial-of-service attacks + by raising suitable exceptions and terminating early. + See 7.1 Dataset Poisoning for further information.

+ +
Note

Implementations can consider placing limits on the number of + calls to 4.8 Hash N-Degree Quads based on the number + of blank nodes in the hash to blank nodes map. + For most typical datasets, more than a couple + of iterations on 4.8 Hash N-Degree Quads per blank node would be unusual.
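One way to place such a limit is a simple call budget that is charged each time the expensive sub-algorithm is entered. This is a sketch only: the `CallBudget` class, the `RuntimeError` it raises, and the default of four calls per blank node are all assumptions chosen for illustration; the specification does not prescribe a particular limit or error type.

```python
class CallBudget:
    """Hypothetical guard limiting calls to an expensive sub-algorithm."""

    def __init__(self, blank_node_count: int, calls_per_node: int = 4) -> None:
        # The factor is an arbitrary illustrative choice, not a normative value.
        self.remaining = blank_node_count * calls_per_node

    def spend(self) -> None:
        """Record one call; raise once the budget is exhausted."""
        if self.remaining <= 0:
            raise RuntimeError("dataset too complex: call budget exhausted")
        self.remaining -= 1


budget = CallBudget(blank_node_count=2, calls_per_node=1)
budget.spend()
budget.spend()
try:
    budget.spend()  # third call exceeds the budget of two
    aborted = False
except RuntimeError:
    aborted = True
print(aborted)
```

An implementation would call `spend()` at the start of each invocation of 4.8 Hash N-Degree Quads and let the exception terminate canonicalization early.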

+ +
    +
  1. Create the canonicalization state. + If the input dataset is an N-Quads document, + parse that document into a dataset in the canonicalized dataset, + retaining any blank node identifiers used within that document + in the input blank node identifier map; + otherwise arbitrary identifiers are assigned for each + blank node. +
    + Explanation +

    This has the effect of initializing the + blank node to quads map, + and the hash to blank nodes map, + as well as instantiating a new canonical issuer.

    +

    After this algorithm completes, + the input blank node identifier map state + and canonical issuer may be used to + correlate blank nodes used in the + input dataset with both their original identifiers, + and associated canonical identifiers.

    +
    +
  2. +
  3. For every quad Q in input dataset: +
      +
    1. For each blank node that is a component of Q, + add a reference to Q from the + map entry for the + blank node identifier identifier + in the blank node to quads map, + creating a new entry if necessary, + using the identifier for the blank node found in the + input blank node identifier map. +
      + Explanation +

      This establishes the blank node to quads map, + relating each blank node with the set of quads + of which it is a component, + via the map for each blank node in the input dataset to its assigned identifier.

      +
      Note

      + Literal components of + quads are not subject to any normalization. + As noted in + Section 3.3 + of [RDF11-CONCEPTS], + literal term equality + is based on the + lexical form, + rather than the literal value, + so two literals "01"^^xsd:integer and "1"^^xsd:integer are treated as distinct resources. +

      +
      +
    2. +
    +
    + Logging +

    Log the state of the blank node to quads map:

    +
    # Blank node to quads map for unique hashes example
    +ca:
    +  log point: Entering the canonicalization function (4.4.3).
    +  ca.2:
    +    log point: Extract quads for each bnode (4.4.3 (2)).
    +    Bnode to quads:
    +      e0:
    +        - <http://example.com/#p> <http://example.com/#q> _:e0 .
    +        - _:e0 <http://example.com/#s> <http://example.com/#u> .
    +      e1:
    +        - <http://example.com/#p> <http://example.com/#r> _:e1 .
    +        - _:e1 <http://example.com/#t> <http://example.com/#u> .
    +  ...
    +
    +
  4. +
  5. For each key n + in the blank node to quads map: +
    + Explanation +

    This step creates a hash for every blank node in the input document. + Some blank nodes will lead to a unique hash, + while other blank nodes may share a common hash.

    +
    +
      +
    1. Create a hash, hf(n), + for n according to the + Hash First Degree Quads algorithm.
    2. +
    3. Append n to the value associated to hf(n) in + hash to blank nodes map, + creating a new entry if necessary.
    4. +
    +
    + Logging +

    Log the results from the Hash First Degree Quads algorithm.

    +
    # First degree hashes for unique hashes example
    +ca:
    +  ...
    +  ca.3:
    +    log point: Calculated first degree hashes (4.4.3 (3)).
    +    with:
    +      - identifier: e0
    +        h1dq:
    +          log point: Hash First Degree Quads function (4.6.3).
    +          nquads:
    +            - <http://example.com/#p> <http://example.com/#q> _:a .
    +            - _:a <http://example.com/#s> <http://example.com/#u> .
    +          hash: 21d1dd5ba21f3dee9d76c0c00c260fa6f5d5d65315099e553026f4828d0dc77a
    +      - identifier: e1
    +        h1dq:
    +          log point: Hash First Degree Quads function (4.6.3).
    +          nquads:
    +            - <http://example.com/#p> <http://example.com/#r> _:a .
    +            - _:a <http://example.com/#t> <http://example.com/#u> .
    +          hash: 6fa0b9bdb376852b5743ff39ca4cbf7ea14d34966b2828478fbf222e7c764473
    +  ...
    +
    +
  6. +
  7. For each hash to identifier list + map entry in + hash to blank nodes map, code point ordered by hash: +
    + Explanation +

    This step establishes the canonical identifier for blank nodes having + a unique hash, which are recorded in the canonical issuer.

    +
    +
      +
    1. If identifier list has more than one entry, + continue to the next mapping.
    2. +
    3. Use the + Issue Identifier algorithm, + passing canonical issuer and the + single blank node identifier, identifier in + identifier list to issue a + canonical replacement identifier for identifier.
    4. +
    5. Remove the map entry for hash from the + hash to blank nodes map.
    6. +
    +
    + Logging +

    Log the assigned canonical identifiers.

    +
    # Assigned canonical identifiers for shared hashes example
    +ca:
    +  ...
    +  ca.4:
    +    log point: Create canonical replacements for hashes mapping to a single node (4.4.3 (4)).
    +    with:
    +      - identifier: e2
    +        hash: 15973d39de079913dac841ac4fa8c4781c0febfba5e83e5c6e250869587f8659
    +        canonical label: c14n0
    +      - identifier: e3
    +        hash: 7e790a99273eed1dc57e43205d37ce232252c85b26ca4a6ff74ff3b5aea7bccd
    +        canonical label: c14n1
    +  ...
    +
    +
  8. +
  9. For each hash to identifier list + map entry in + hash to blank nodes map, code point ordered by + hash: +
    + Explanation +

    This step establishes the canonical identifier for blank nodes having + a shared hash. + This is done by creating unique blank node identifiers for all + blank nodes traversed by the Hash N-Degree Quads algorithm, + running through each blank node without a canonical identifier in the order + of the hashes established in the previous step.

    +
    +
    + Logging +

    Log hash and identifier list for this iteration.

    +
    # Hash and Identifier List for each iteration of step 5 using shared hashes example
    +ca:
    +  ...
    +  ca.5:
    +    log point: Calculate hashes for identifiers with shared hashes (4.4.3 (5)).
    +    with:
    +      - hash: 3b26142829b8887d011d779079a243bd61ab53c3990d550320a17b59ade6ba36
    +        identifier list: [ "e0", "e1"]
    +    ...
    +  ...
    +
    +
      +
    1. Create hash path list where each item will be a result + of running the + Hash N-Degree Quads algorithm. +
      + Explanation +

      This list will be populated in step 5.2, and will establish an order for those blank nodes + sharing a common first-degree hash.

      +
      +
    2. +
    3. For each blank node identifier + n in identifier list: +
        +
      1. If a canonical identifier has already been issued for + n, continue to the next + blank node identifier.
      2. +
      3. Create temporary issuer, an + identifier issuer initialized with the prefix + b.
      4. +
      5. Use the + Issue Identifier algorithm, + passing temporary issuer and n, to + issue a new temporary blank node identifier bn + to n.
      6. +
      7. Run the + Hash N-Degree Quads algorithm, + passing the canonicalization state, + n for identifier, and + temporary issuer, + appending the + result to the hash path list. +
        + Logging +

        Include logs for each call to Hash N-Degree Quads algorithm.

        +
        # Logs from calls to Hash N-Degree Quads algorithm for shared hashes example
        +ca:
        +  ...
        +  ca.5:
        +    log point: Calculate hashes for identifiers with shared hashes (4.4.3 (5)).
        +    with:
        +      - hash: 3b26142829b8887d011d779079a243bd61ab53c3990d550320a17b59ade6ba36
        +        identifier list: [ "e0", "e1"]
        +        ca.5.2:
        +          log point: Calculate hashes for identifiers with shared hashes (4.4.3 (5.2)).
        +          with:
        +            - identifier: e0
        +              hndq:
        +                log point: Hash N-Degree Quads function (4.8.3).
        +                identifier: e0
        +                issuer: {e0: b0}
        +                ...
        +            ...
        +        ...
        +  ...
        +
        +
      8. +
      +
    4. +
    5. For each result in the hash path list, + code point ordered by the hash in result: +
      + Explanation +

      The previous step created temporary identifiers for the blank nodes sharing a common first degree hash, which are now used to generate their canonical identifiers.

      +
      +
        +
      1. For each blank node identifier, + existing identifier, that was issued a temporary + identifier by identifier issuer in result, + issue a canonical identifier, + in the same order, + using the Issue Identifier algorithm, + passing canonical issuer and existing identifier. +
        + Explanation +

        In Step 5.2, + hash path list was created with an ordered + set of results. + Each result contained a temporary issuer + which recorded temporary identifiers associated with + a particular blank node identifier in + identifier list. + This step processes each returned temporary issuer, + in order, and allocates canonical identifiers + to the temporary identifier mappings contained + within each temporary issuer, + creating a full order on the remaining blank nodes + with unissued canonical identifiers. +

        +
        +
      2. +
      +
      + Logging +

      Log newly issued canonical identifiers.

      +
      # Newly issued canonical identifiers from step 5.3 for shared hashes example
      +ca:
      +  ...
      +  ca.5:
      +    log point: Calculate hashes for identifiers with shared hashes (4.4.3 (5)).
      +    with:
      +      - hash: 3b26142829b8887d011d779079a243bd61ab53c3990d550320a17b59ade6ba36
      +        identifier list: [ "e0", "e1"]
      +        ...
      +        ca.5.3:
      +          log point: Canonical identifiers for temporary identifiers (4.4.3 (5.3)).
      +          issuer:
      +              - blank node: e1
      +                canonical identifier: c14n2
      +              - blank node: e0
      +                canonical identifier: c14n3
      +  ...
      +
      +
    6. +
    +
  10. +
  11. Add the issued identifiers map + from the canonical issuer to the + canonicalized dataset. +
    + Explanation +

    This step adds the issued identifiers map from the canonical issuer to the canonicalized dataset. The keys in the issued identifiers map are map entries in the input blank node identifier map.

    +
    +
    + Logging +

    Log the state of the canonical issuer at the completion of the algorithm.

    +
    # Canonical issuer state after step 6 for shared hashes example
    +ca:
    +  ...
    +  ca.6:
    +    log point: Issued identifiers map (4.4.3 (6)).
    +    issued identifiers map: {e2: c14n0, e3: c14n1, e1: c14n2, e0: c14n3}
    +
    +
  12. +
  13. Return the serialized canonical form + of the canonicalized dataset. + Upon request, alternatively (or additionally) return the + canonicalized dataset itself, which includes the + input blank node identifier map, and + issued identifiers map from the canonical issuer. +
    Note

    Technically speaking, one implementation might return a canonicalized dataset that maps particular blank nodes to different identifiers than another implementation. However, this only occurs when there are isomorphisms in the dataset such that a canonically serialized expression of the dataset would appear the same from either implementation.

    +
    + Explanation +

    The serialized canonical form is an N-Quads + document where the blank node identifiers are taken + from the canonical identifiers associated with each blank node.

    +

    The canonicalized dataset is composed of the original + input dataset, the input blank node identifier map, + containing identifiers for each blank node in the input dataset, + and the canonical issuer, + containing an issued identifiers map + mapping the identifiers in the input blank node identifier map + to their canonical identifiers. +

    +
    +
  14. +
+
+
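The parts of the algorithm above that correspond to the log points `ca.2`, `ca.3`, and `ca.4` (building the blank node to quads map, grouping blank nodes by first degree hash, and labeling nodes whose hash is unique) can be sketched in Python. This is an illustration under stated assumptions, not a full implementation: the name `label_unique_nodes`, the tuple-of-strings quad representation, and the `hash_first_degree` callback are hypothetical, the canonical issuer is reduced to a counter with the prefix `c14n`, and the handling of shared hashes via Hash N-Degree Quads is omitted.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List, Tuple

# Hypothetical representation: terms already written in N-Quads syntax.
Quad = Tuple[str, ...]


def label_unique_nodes(
    quads: Iterable[Quad],
    hash_first_degree: Callable[[str, List[Quad]], str],
) -> Tuple[Dict[str, str], Dict[str, List[str]]]:
    # ca.2: relate each blank node identifier to the quads that mention it.
    bnode_to_quads: Dict[str, List[Quad]] = defaultdict(list)
    for quad in quads:
        for term in dict.fromkeys(quad):  # each distinct term once, in order
            if term.startswith("_:"):
                bnode_to_quads[term[2:]].append(quad)

    # ca.3: group blank node identifiers by their first degree hash.
    hash_to_bnodes: Dict[str, List[str]] = defaultdict(list)
    for n, mentions in bnode_to_quads.items():
        hash_to_bnodes[hash_first_degree(n, mentions)].append(n)

    # ca.4: in code point order of the hashes, label the unique ones.
    issued: Dict[str, str] = {}
    for h in sorted(hash_to_bnodes):
        identifiers = hash_to_bnodes[h]
        if len(identifiers) > 1:
            continue  # shared hash: left for Hash N-Degree Quads
        issued[identifiers[0]] = f"c14n{len(issued)}"
        del hash_to_bnodes[h]
    return issued, dict(hash_to_bnodes)
```

The second return value holds the entries that still share a hash. Those are the only blank nodes for which the more expensive Hash N-Degree Quads algorithm needs to run.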
+ +

4.5 Issue Identifier Algorithm

+ + +

This algorithm issues a new blank node identifier for + a given existing blank node identifier. It also updates + state information that tracks the order in which new + blank node identifiers were issued. The order of issuance is + important for canonically labeling blank nodes that are isomorphic + to others in the dataset.

+ +

4.5.1 Overview

+ + +

The algorithm maintains an issued identifiers map to relate an existing blank node identifier from the input dataset to a new blank node identifier using a given identifier prefix (c14n) with new identifiers issued by appending an incrementing number. For example, when called for a blank node identifier such as e3, it might result in an issued identifier of c14n1.

+
+ +

4.5.2 Algorithm

+ + +

The algorithm takes an identifier issuer I and an + existing identifier as inputs. The output is a new + issued identifier. The steps of the algorithm are:

+ +
    +
  1. If there is a map entry for existing identifier in issued identifiers map of I, return it.
  2. Generate issued identifier by concatenating identifier prefix with the string value of identifier counter.
  3. Add an entry mapping existing identifier to issued identifier to the issued identifiers map of I.
  4. Increment identifier counter.
  5. Return issued identifier.
+
+
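The steps above can be sketched in Python as a small class. The class and attribute names are assumptions made for this illustration. Starting the counter at 0 is also an assumption, chosen because the logged examples begin at `c14n0`. Python's `dict` preserves insertion order, so the issued identifiers map also records the order of issuance.

```python
from typing import Dict


class IdentifierIssuer:
    """Issues prefixed identifiers and remembers the order of issuance."""

    def __init__(self, prefix: str = "c14n") -> None:
        self.prefix = prefix
        self.counter = 0  # assumed starting value
        self.issued: Dict[str, str] = {}  # existing identifier -> issued

    def issue(self, existing: str) -> str:
        # Return the previously issued identifier, if there is one.
        if existing in self.issued:
            return self.issued[existing]
        # Otherwise build a new identifier from the prefix and the counter.
        issued_identifier = f"{self.prefix}{self.counter}"
        self.issued[existing] = issued_identifier
        self.counter += 1
        return issued_identifier
```

A temporary issuer, as used when blank nodes share a hash, would be created the same way with a different prefix, for example `IdentifierIssuer("b")`.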
+ +

4.6 Hash First Degree Quads

+ + +

This algorithm calculates a hash for a given blank node + across the quads in a dataset in which that blank node + is a component. + If the hash uniquely identifies that blank node, + no further examination is necessary. + Otherwise, a hash will be created for the blank node using + the algorithm in 4.8 Hash N-Degree Quads + invoked via 4.4 Canonicalization Algorithm.

+ +

4.6.1 Overview

This section is non-normative.

+ + +

To determine whether the first degree information of a node n is unique, + a hash is assigned to its mention set, + Qn. + The first degree hash of a blank node n, + denoted hf(n), + is the hash that results from 4.6 Hash First Degree Quads + when passing n. + Nodes with unique first degree hashes have unique first degree information.

+ +

For consistency, blank node identifiers used in Qn are replaced with placeholders in a canonical n-quads serialization of that quad. Every blank node component is replaced with either a or z, depending on whether that component is n or not.

+ +

The resulting serialized quads are then code point ordered, + concatenated, and hashed. + This hash is the first degree hash of n, hf(n).

+
+ +

4.6.2 Examples

This section is non-normative.

+ + + + + +
+ +

4.6.3 Algorithm

+ + +

This algorithm takes the canonicalization state and a + reference blank node identifier as inputs.

+ +
    +
  1. Initialize nquads to an empty list. + It will be used to store quads in canonical n-quads form.
  2. +
  3. Get the list of quads quads + from the map entry for + reference blank node identifier in the + blank node to quads map.
  4. +
  5. For each quad quad in quads: +
      +
    1. Serialize the quad in canonical n-quads form with the + following special rule: +
        +
      1. If any component in quad is a blank node, then serialize it using a special identifier as follows:
          +
        1. If the blank node's existing + blank node identifier matches the + reference blank node identifier then use the + blank node identifier a, + otherwise, use the blank node identifier + z.
        2. +
        +
      2. +
      +
    2. +
    +
  6. +
  7. Sort nquads in Unicode code point order.
  8. +
  9. Return the hash that results from passing the sorted + and concatenated nquads through the + hash algorithm. +
    + Logging +

    Log the inputs and result of running this algorithm.

    +
    # Inputs and hash result for the Hash First Degree Quads algorithm for unique hashes example
    +h1dq:
    +  log point: Hash First Degree Quads function (4.6.3).
    +  nquads:
    +    - <http://example.com/#p> <http://example.com/#q> _:a .
    +    - _:a <http://example.com/#s> <http://example.com/#u> .
    +  hash: 21d1dd5ba21f3dee9d76c0c00c260fa6f5d5d65315099e553026f4828d0dc77a
    +
    +
  10. +
+
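A minimal Python sketch of the algorithm above follows. The function name, the tuple-of-strings quad representation (terms already in N-Quads syntax), and the plain `dict` standing in for the blank node to quads map are assumptions for illustration. The sketch also assumes that the hash algorithm is SHA-256 with lowercase hexadecimal output, which is consistent with the 64-character hashes in the logs but is not stated in this section.

```python
import hashlib
from typing import Dict, List, Tuple

# Hypothetical representation: terms already written in N-Quads syntax.
Quad = Tuple[str, ...]


def hash_first_degree_quads(
    bnode_to_quads: Dict[str, List[Quad]], reference: str
) -> str:
    nquads = []
    for quad in bnode_to_quads[reference]:
        terms = []
        for term in quad:
            if term.startswith("_:"):
                # The reference node becomes _:a, any other blank node _:z.
                term = "_:a" if term[2:] == reference else "_:z"
            terms.append(term)
        nquads.append(" ".join(terms) + " .\n")
    nquads.sort()  # Python orders str values by code point
    joined = "".join(nquads)
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()  # assumed hash
```

Because every blank node identifier is replaced by a placeholder before hashing, the result does not depend on the identifiers used in the input: renaming `e0` throughout the dataset leaves its first degree hash unchanged.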
+
+ + + +

4.8 Hash N-Degree Quads

+ + +

This algorithm calculates a hash for a given blank node across the quads in a dataset in which that blank node is a component, for cases in which the first degree hash does not uniquely identify that blank node. This is done by expanding the search from quads directly referencing that blank node (the mention set), to those quads which contain nodes which are also components of quads in the mention set, called the gossip path. This process proceeds in ever greater degrees of indirection until a unique hash is obtained.

+ +

4.8.1 Overview

This section is non-normative.

+ + +

Usually, when trying to determine if two nodes in a graph are + equivalent, you simply compare their identifiers. However, what if the + nodes don't have identifiers? Then you must determine if the two nodes + have equivalent connections to equivalent nodes all throughout the + whole graph. This is called the graph isomorphism problem. This + algorithm approaches this problem by considering how one might draw + a graph on paper. You can test to see if two nodes are equivalent + by drawing the graph twice. The first time you draw the graph the + first node is drawn in the center of the page. If you can draw the + graph a second time such that it looks just like the first, except + the second node is in the center of the page, then the nodes are + equivalent. This algorithm essentially defines a deterministic way to + draw a graph where, if you begin with a particular node, the graph + will always be drawn the same way. If two graphs are drawn the same way + with two different nodes, then the nodes are equivalent. A + hash is used to indicate a particular way that the graph + has been drawn and can be used to compare nodes.

+ +

When two blank nodes have the same first degree hash, + extra steps must be taken to detect global, + or N-degree, distinctions. + All information that is in any way connected to the blank node n + through other blank nodes, even transitively, must be considered.

+ +

To consider all transitive information, + the algorithm traverses and encodes all possible paths of incident + mentions emanating from n, called gossip paths, + that reach every unlabeled blank node connected to n. + Each unlabeled blank node is assigned a temporary identifier + in the order in which it is reached in the + gossip path being explored. + The mentions that are traversed to reach + connected blank nodes are encoded in these paths via related hashes. + This provides a deterministic way to order all paths coming from n that + reach all blank nodes connected to n without relying on input blank + node identifiers.

+ +

This algorithm works in concert with the main canonicalization algorithm + to produce a unique, deterministic identifier for a particular blank + node. This hash incorporates all of the information that + is connected to the blank node as well as how it is connected. It does + this by creating deterministic paths that emanate out from the blank + node through any other adjacent blank nodes.

+ +

Ultimately, the algorithm selects the shortest gossip path + (based on its encoding as a string), distributing canonical + identifiers to the unlabeled blank nodes in the order in which they + appear in this path. + The hash of this encoded shortest path, + called the N-degree hash of n, + distinguishes n from other blank nodes in the dataset.

+ +

For clarity, we consider a gossip path encoded via the string s + to be shortest provided that:

+ +
    +
  1. The length of s is less than or equal to the length of any other gossip path string s′.
  2. If s and s′ have the same length (as strings), then s is code point ordered less than or equal to s′.
+ +

For example, abc is shorter than bbc, + whereas abcd is longer than bcd.
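This ordering can be expressed compactly in Python, shown here as an illustrative sketch with hypothetical helper names `path_key` and `shortest`. Python compares `str` values by code point, so sorting on the pair (length, string) gives exactly the two conditions listed above.

```python
from typing import Iterable, Tuple


def path_key(path: str) -> Tuple[int, str]:
    """Sort key: length first, then code point order."""
    return (len(path), path)


def shortest(paths: Iterable[str]) -> str:
    """Return the shortest encoded gossip path among the candidates."""
    return min(paths, key=path_key)
```

With these helpers, `shortest(["bbc", "abc"])` selects `abc`, and `shortest(["abcd", "bcd"])` selects `bcd` because length is compared before code point order.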

+ +

The following provides a high level outline for how the N-degree hash of n + is computed along the shortest gossip path. + Note that the full algorithm considers all gossip paths, + ultimately returning the hash of the shortest encoded path.

+ +
    +
  1. Compute related hashes. Compute the related hash set Hn for n, i.e., all first degree mentions between n and another blank node. Note that this includes both unlabeled blank nodes and those already issued a canonical identifier (labeled blank nodes).
  2. Explore mentions. Given the related hash x in Hn, record x in the data to hash Dn. Determine whether each blank node reachable via the mention with related hash x has already received an identifier.
    1. Record the identifiers of labeled nodes. If a blank node already has an identifier, record its identifier in Dn once for every mention with related hash x. Skip to the next related hash in Hn and repeat step 2.
    2. Distribute and record temporary identifiers to unlabeled nodes. For each unlabeled blank node, assign it a temporary identifier according to the order in which it is reached in the gossip path, recording its given identifier in Dn (including repetitions). Add each unlabeled node to the recursion list Rn(x) in this same order (omitting repetitions).
    3. Recurse on newly labeled nodes. For each ni in Rn(x):
      1. Record its identifier in Dn.
      2. Append < r(i) > to Dn where r(i) is the hash that results from returning to step 1, replacing n with ni.
  3. Compute the N-degree hash of n. Hash Dn to return the N-degree hash of n, namely hN(n). Return the updated issuer In that has now distributed temporary identifiers to all unlabeled blank nodes connected to n.
+ +

As described above in step 2.3, + HN recurses on each unlabeled blank node + when it is first reached along the gossip path being explored. + This recursion can be visualized as moving along the path from n + to the blank node ni that is receiving a temporary identifier. + If, when recursing on ni, + another unlabeled blank node nj is discovered, + the algorithm again recurses. + Such a recursion traces out the gossip path from n + to nj via ni.

+ +

The recursive hash r(i) is the hash returned from the completed recursion on the node ni when computing hN(n). Just as hN(n) is the hash of Dn, we denote the data to hash in the recursion on ni as Di. So, r(i) = h(Di). For each related hash x ∈ Hn, Rn(x) is called the recursion list on which the algorithm recurses.

+
+ +

4.8.2 Examples

This section is non-normative.

+ + + +
+ +

4.8.3 Algorithm

+ + +

The inputs to this algorithm are the canonicalization state, + the identifier for the blank node to + recursively hash quads for, and path identifier issuer which is + an identifier issuer that issues temporary + blank node identifiers. The output from this algorithm + will be a hash and the identifier issuer used + to help generate it.

+
+ Logging +

Log the inputs to the algorithm.

+
# Inputs for the Hash N-Degree Quads algorithm for double circle example
+hndq:
+  log point: Hash N-Degree Quads function (4.8.3).
+  identifier: e0
+  issuer: {e0: b0}
+  ...
+
+ +
    +
  1. Create a new map Hn + for relating hashes to related blank nodes.
  2. +
  3. Get a reference, quads, to the list of quads + from the map entry + for identifier + in the blank node to quads map. +
    + Explanation +

    quads is the mention set of identifier.

    +
    +
    + Logging +

    Log the quads from the mention set of identifier.

    +
    # Inputs for the Hash N-Degree Quads algorithm for double circle example
    +hndq:
    +  identifier: e0
    +  log point: Hash N-Degree Quads function (4.8.3).
    +  issuer: {e0: b0}
    +  hndq.2:
    +    log point: Quads for identifier (4.8.3 (2)).
    +    quads:
    +    - _:e0 <http://example.org/vocab#next> _:e1 .
    +    - _:e0 <http://example.org/vocab#prev> _:e1 .
    +    - _:e1 <http://example.org/vocab#next> _:e0 .
    +    - _:e1 <http://example.org/vocab#prev> _:e0 .
    +  ...
    +
    +
  4. +
  5. For each quad in quads: +
    + Explanation +

    This loop calculates the related hash Hn + for other blank nodes within the mention set of identifier.

    +
    +
      +
    1. For each component in quad, where component + is the subject, object, or + graph name, and it is a + blank node that is not identified by + identifier: +
        +
      1. Set hash to the result of the + Hash Related Blank Node algorithm, + passing the blank node identifier for + component as related, quad, + issuer, and + position as either s, o, or + g based on whether component is a + subject, object, + graph name, respectively.
      2. +
      3. Add a mapping of hash to the + blank node identifier for component + to Hn, adding an entry + as necessary.
      4. +
      +
    2. +
    +
    + Logging +

    Include the logs for each iteration of the + Hash Related Blank Node algorithm + and the resulting Hn.

    +
    # Step 3 of Hash N-Degree Quads using double circle example
    +hndq:
    +  identifier: e0
    +  log point: Hash N-Degree Quads function (4.8.3).
    +  issuer: {e0: b0}
    +  ...
    +hndq.3:
    +  log point: Hash N-Degree Quads function (4.8.3 (3)).
    +  with:
    +    - quad: _:e0 <http://example.org/vocab#next> _:e1 .
    +      hndq.3.1:
    +        log point: Hash related bnode component (4.8.3 (3.1))
    +        with:
    +          - position: o
    +            related: e1
    +            h1dq:
    +              log point: Hash First Degree Quads function (4.6.3).
    +              nquads:
    +                - _:z <http://example.org/vocab#next> _:a .
    +                - _:z <http://example.org/vocab#prev> _:a .
    +                - _:a <http://example.org/vocab#next> _:z .
    +                - _:a <http://example.org/vocab#prev> _:z .
    +              hash: 60dc8fc7b5481014b6ea38efb05455676d1e93e19b99119ab294941dacc16b3b
    +            input: "o<http://example.org/vocab#next>60dc8fc7b5481014b6ea38efb05455676d1e93e19b99119ab294941dacc16b3b"
    +            hash: 20bb08971220a5382a9a06ba2977c5fb859e63192e0b2015a378af89e453f25e
    +    - quad: _:e0 <http://example.org/vocab#prev> _:e1 .
    +      ...
    +  Hash to bnodes:
    +      20bb08971220a5382a9a06ba2977c5fb859e63192e0b2015a378af89e453f25e:
    +        - e1
    +      1e4e55ba02b8b0b527c32e2343fbcfee2e2bd9c1972c67cc01f85fabde7bc42d:
    +        - e1
    +      56d0774755aaf8d9cf4da8af3728e5589f94e5cd7d9aee86f0c5a7bc1d71c7ca:
    +        - e1
    +      2a5dd448b9467a08479008a5350829441868b7f913343cd500fe8619e047cff4:
    +        - e1
    +...
    +
    +
  6. +
  7. Create an empty string, data to hash.
  8. +
  9. For each related hash to blank node list mapping in + Hn, code point ordered + by related hash: +
    + Explanation +

    This loop explores the gossip paths for the related blank nodes that share a common related hash, finding the shortest such path (chosen path). This determines how canonical identifiers for otherwise commonly hashed blank nodes are chosen.

    +

    + Each path is represented by the concatenation of the + identifiers for each related blank node + — either the issued identifier, + or a temporary identifier created using a copy of issuer. + Those for which temporary identifiers were issued are later + recursed over using this algorithm. +

    +
    +
    + Logging +

    Log the value of related hash + and state of data to hash.

    +
    # Log related hash and data to hash in each iteration of step 5 for double circle example.
    +hndq:
    +  log point: Hash N-Degree Quads function (4.8.3).
    +  identifier: e0
    +  issuer: {e0: b0}
    +  ...
    +  hndq.5:
    +    log point: Hash N-Degree Quads function (4.8.3 (5)), entering loop.
    +    with:
    +    - related_hash: 1e4e55ba02b8b0b527c32e2343fbcfee2e2bd9c1972c67cc01f85fabde7bc42d
    +      data_to_hash: ""
    +      ...
    +
    +
      +
    1. Append the related hash to the data to hash.
    2. +
    3. Create a string chosen path.
    4. +
    5. Create an unset chosen issuer variable.
    6. +
    7. For each permutation p of blank node list: +
      + Logging +

      Log each permutation p.

      +
      # Log each permutation of step 5.4 using double circle example.
      +hndq:
      +  log point: Hash N-Degree Quads function (4.8.3).
      +  identifier: e0
      +  issuer: {e0: b0}
      +  ...
      +  hndq.5:
      +    log point: Hash N-Degree Quads function (4.8.3 (5)), entering loop.
      +    with:
      +    - related_hash: 1e4e55ba02b8b0b527c32e2343fbcfee2e2bd9c1972c67cc01f85fabde7bc42d
      +      data_to_hash: ""
      +      hndq.5.4:
      +        log point: Hash N-Degree Quads function (4.8.3 (5.4)), entering loop.
      +        with:
      +        - perm: [ "e1"]
      +          ...
      +
      +
        +
      1. Create a copy of issuer, issuer copy.
      2. +
      3. Create a string path.
      4. +
      5. Create a recursion list, to store + blank node identifiers that must be + recursively processed by this algorithm.
      6. +
      7. For each related in p: +
          +
        1. If a canonical identifier has been issued for + related by canonical issuer, append the string _:, followed by + the canonical identifier for related, to path. +
          Explanation +

          A canonical identifier may have been generated before calling this algorithm, + if it was issued from an earlier call to Hash First Degree Quads algorithm. + There is no reason to recurse and apply the algorithm to any related blank node that has already been assigned a canonical identifier. + Furthermore, using the canonical identifier also further distinguishes it from any temporary identifier, allowing for even greater efficiency in finding the chosen path.

          +
          +
        2. +
        3. Otherwise: +
            +
          1. If issuer copy has not issued + an identifier for related, append + related to recursion list. +
            + Explanation +

            Temporarily labeled nodes have identifiers recorded + in issuer copy, + which is later used to recursively call this algorithm, + so that eventually all nodes are given canonical identifiers.

            +
            +
          2. +
          3. Use the + Issue Identifier algorithm, + passing issuer copy and the related, and + append the string _:, followed by the result, to path.
          4. +
          +
        4. +
        5. If chosen path is not empty and the length + of path is greater than or equal to the length + of chosen path and path is + greater than chosen path when + considering code point order, + then skip to the next + permutation p. +
          + Explanation +

          If path is already at least as long as the prospective chosen path and is ordered after it, it can no longer become the chosen path, so we can terminate this iteration early.

          +
          +
        6. +
        +
        + Explanation +

        path is used to generate a hash at a later step; in this respect, it is similar to + the Hash First Degree Quads algorithm which + uses the serialization of quads in nquads for hashing. For the sake of consistency, the + nquad representation of blank node identifiers is used in these steps, hence the + usage of the _: string.

        +
        +
        + Logging +

        Log related and path.

        +
        # Log related and path of step 5.4.4 using double circle example.
        +hndq:
        +  log point: Hash N-Degree Quads function (4.8.3).
        +  identifier: e0
        +  issuer: {e0: b0}
        +  ...
        +  hndq.5:
        +    log point: Hash N-Degree Quads function (4.8.3 (5)), entering loop.
        +    with:
        +    - related_hash: 1e4e55ba02b8b0b527c32e2343fbcfee2e2bd9c1972c67cc01f85fabde7bc42d
        +      data_to_hash: ""
        +      hndq.5.4:
        +        log point: Hash N-Degree Quads function (4.8.3 (5.4)), entering loop.
        +        with:
        +        - perm: [ "e1"]
        +          hndq.5.4.4:
        +            log point: Hash N-Degree Quads function (4.8.3 (5.4.4)), entering loop.
        +            with:
        +              - related: e1
        +                path: ""
        +          ...
        +
        +
      8. +
      9. For each related in recursion list: +
        + Explanation +

        The prospective path is extended with + the hash resulting from recursively calling this algorithm + on each related blank node issued a temporary identifier.

        +
        +
        + Logging +

        Log recursion list and path.

        +
        # Log recursion list and path of step 5.4.5 using double circle example.
        +hndq:
        +  log point: Hash N-Degree Quads function (4.8.3).
        +  identifier: e0
        +  issuer: {e0: b0}
        +  ...
        +  hndq.5:
        +    log point: Hash N-Degree Quads function (4.8.3 (5)), entering loop.
        +    with:
        +    - related_hash: 1e4e55ba02b8b0b527c32e2343fbcfee2e2bd9c1972c67cc01f85fabde7bc42d
        +      data_to_hash: ""
        +      hndq.5.4:
        +        log point: Hash N-Degree Quads function (4.8.3 (5.4)), entering loop.
        +        with:
        +        - perm: [ "e1"]
        +          ...
        +          hndq.5.4.5:
        +            log point: Hash N-Degree Quads function (4.8.3 (5.4.5)), before possible recursion.
        +            recursion list: [ "e1"]
        +            path: "_:b1"
        +          ...
        +
        +
          +
        1. Set result to the result of recursively executing + the Hash N-Degree Quads algorithm, + passing the canonicalization state, + related for identifier, and + issuer copy for path identifier issuer. +
          + Logging +

          Log related and + include logs for each recursive call to Hash N-Degree Quads algorithm.

          +
          # Log related and path of step 5.4.5.1 using double circle example.
          +hndq:
          +  log point: Hash N-Degree Quads function (4.8.3).
          +  identifier: e0
          +  issuer: {e0: b0}
          +  ...
          +  hndq.5:
          +    log point: Hash N-Degree Quads function (4.8.3 (5)), entering loop.
          +    with:
          +    - related_hash: 1e4e55ba02b8b0b527c32e2343fbcfee2e2bd9c1972c67cc01f85fabde7bc42d
          +      data_to_hash: ""
          +      hndq.5.4:
          +        log point: Hash N-Degree Quads function (4.8.3 (5.4)), entering loop.
          +        with:
          +        - perm: [ "e1"]
          +          ...
          +          hndq.5.4.5:
          +            log point: Hash N-Degree Quads function (4.8.3 (5.4.5)), before possible recursion.
          +            recursion list: [ "e1"]
          +            path: "_:b1"
          +            with:
          +              - related: e1
          +                hndq:
          +                  ...
          +
          +
        2. +
        3. Use the + Issue Identifier algorithm, + passing issuer copy and related; append the string _:, followed by + the result, to path.
        4. +
        5. Append <, the hash in + result, and > to path.
        6. +
        7. Set issuer copy to the + identifier issuer in result.
        8. +
        9. If chosen path is not empty and the length of path is greater than or equal to the length of chosen path and path is greater than chosen path when considering code point order, then skip to the next permutation p.
          + Explanation +

          If path is already at least as long as the prospective chosen path and sorts after it in code point order, it can no longer become the chosen path, so this iteration can be terminated early.

          +
          +
        10. +
        +
      10. +
      11. If chosen path is empty or path is + less than chosen path when considering code point order, + set chosen path to path and chosen issuer + to issuer copy. +
      12. +
      +
    8. +
    9. Append chosen path to data to hash. +
      + Logging +

      Log chosen path and data to hash.

      +
      # Log chosen path and data to hash of step 5.5 using double circle example.
      +hndq:
      +  log point: Hash N-Degree Quads function (4.8.3).
      +  identifier: e0
      +  issuer: {e0: b0}
      +  ...
      +  hndq.5:
      +    log point: Hash N-Degree Quads function (4.8.3 (5)), entering loop.
      +    with:
      +    - related_hash: 1e4e55ba02b8b0b527c32e2343fbcfee2e2bd9c1972c67cc01f85fabde7bc42d
      +      data_to_hash: ""
      +      ...
      +      hndq.5.5:
      +        log point: Hash N-Degree Quads function (4.8.3 (5.5)). End of current loop with Hn hashes.
      +        chosen path: "_:b1_:b1<1ae899f76e760eb7caf6656437aaef845b50887aff7baeb3531add85ec02ed35>"
      +        data to hash: "1e4e55ba02b8b0b527c32e2343fbcfee2e2bd9c1972c67cc01f85fabde7bc42d_:b1_:b1<1ae899f76e760eb7caf6656437aaef845b50887aff7baeb3531add85ec02ed35>"
      +      ...
      +
      +
    10. +
    11. Replace issuer, by reference, with chosen issuer.
    12. +
    +
  10. +
  11. Return issuer and the hash that results from + passing data to hash through the + hash algorithm. +
    + Logging +

    Log issuer and results from passing data to hash + through the hash algorithm.

    +
    # Log issuer and resulting hash of step 6 using double circle example.
    +hndq:
    +  log point: Hash N-Degree Quads function (4.8.3).
    +  identifier: e0
    +  issuer: {e0: b0}
    +  ...
    +  hndq.6:
    +    log point: Leaving Hash N-Degree Quads function (4.8.3).
    +    hash: e332b4b59e1c4794ee72a4df0f63723326ffb6d6a5c0d0cb4d2dd8d8d5ebf5a4
    +    issuer: {e0: b0, e1: b1}
    +
    +
  12. +
+
+
+
+ +
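The early-termination check and the selection of chosen path in the steps above can be sketched as follows. This is a minimal illustration rather than the normative algorithm: it assumes that each permutation has already been reduced to a list of path fragments (the `_:` identifiers and `<hash>` strings that would be appended one by one), and the names `should_skip` and `choose_path` are hypothetical. Python compares strings by code point, which is the ordering these steps rely on.

```python
def should_skip(path: str, chosen_path: str) -> bool:
    """Early-termination test: path can no longer beat chosen_path."""
    return (
        chosen_path != ""
        and len(path) >= len(chosen_path)
        and path > chosen_path  # code point order
    )


def choose_path(fragments_per_permutation):
    """Return the smallest path over all permutations (hypothetical helper).

    Each element of fragments_per_permutation is the list of strings that
    would be appended to path while processing one permutation.
    """
    chosen_path = ""
    for fragments in fragments_per_permutation:
        path = ""
        skipped = False
        for fragment in fragments:
            path += fragment
            if should_skip(path, chosen_path):
                skipped = True  # skip to the next permutation
                break
        if skipped:
            continue
        if chosen_path == "" or path < chosen_path:
            chosen_path = path
    return chosen_path


# Two permutations of the same two temporary identifiers.
best = choose_path([["_:b1", "_:b0"], ["_:b0", "_:b1"]])
```

In a full implementation the issuer copy would be tracked alongside path, so that chosen issuer can be set together with chosen path.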

5. Serialization

+ + +

This section describes the process of creating a serialized [N-Quads] representation + of a canonicalized dataset.

+ +

The serialized canonical form of a canonicalized dataset + is an N-Quads document [N-QUADS] + created by representing each quad from the canonicalized dataset + in canonical n-quads form, + sorting them into code point order, + and concatenating them. (Note that each canonical N-Quads statement ends with a new line, + so no additional separators are needed in the concatenation.) + The resulting document has a media type of application/n-quads, + as described in C. N-Quads Internet Media Type, File Extension and Macintosh File Type + of [N-QUADS].

+ +

When serializing quads in canonical n-quads form, + components which are blank nodes MUST be serialized using the + canonical label associated with each blank node + from the issued identifiers map component of the + canonicalized dataset.
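The serialization described above can be sketched as follows. This is an illustrative sketch under stated assumptions, not a complete serializer: quads are modelled as tuples of terms that are already written in canonical N-Quads syntax, the function name `serialize_canonical` is hypothetical, and the labels in `issued` are made up for the example rather than computed by the canonicalization algorithm.

```python
def serialize_canonical(quads, issued):
    """Relabel blank nodes, then sort and concatenate the N-Quads lines.

    quads:  iterable of 3- or 4-tuples of N-Quads terms (assumed already
            escaped), e.g. ("_:e0", "<http://example.org/p1>", "_:e1")
    issued: mapping from input blank node identifier (without "_:") to
            canonical label, standing in for the issued identifiers map
    """
    lines = []
    for quad in quads:
        terms = [
            "_:" + issued[term[2:]] if term.startswith("_:") else term
            for term in quad
        ]
        # Each statement ends with " ." and a single line feed.
        lines.append(" ".join(terms) + " .\n")
    # Python sorts str values by code point.
    return "".join(sorted(lines))


example_quads = [
    ("_:e1", "<http://example.org/p2>", '"Foo"'),
    ("_:e0", "<http://example.org/p1>", "_:e1"),
]
example_issued = {"e0": "c14n0", "e1": "c14n1"}  # hypothetical labels
document = serialize_canonical(example_quads, example_issued)
```

Because every line already ends with a line feed, plain concatenation of the sorted lines yields the final document.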

+ + +
+ +

6. Privacy Considerations

This section is non-normative.

+ + +

The nature of the canonicalization algorithm inherently correlates its output, i.e., the canonical labels and the sorted order of quads, with the input dataset. This could pose issues, particularly when dealing with datasets containing personal information. For example, even if certain information is removed from the canonicalized dataset for some privacy-respecting reason, there remains the possibility that a third party could infer the omitted data by analyzing the canonicalized dataset. If it is necessary to decouple the canonicalization algorithm's input and output, suitable post-processing should be applied to the output of the canonicalization. This specification has been designed to help make additional processing easier, but other specifications that build on top of this one are responsible for providing any specific details. See Selective Disclosure in Verifiable Credential Data Integrity 1.0 [VC-DATA-INTEGRITY] for more details about such post-processing methods.

+
+ +

7. Security Considerations

This section is non-normative.

+ + +

7.1 Dataset Poisoning

This section is non-normative.

+ + +

The canonicalization algorithm examines every difference in the + information connected to blank nodes in order to ensure that each will + properly receive its own canonical identifier. This process can be + exploited by attackers to construct datasets which are known to take + large amounts of computing time to canonicalize, but that do not express + useful information or express it using unnecessary complexity. + Implementers of the algorithm are expected to add mitigations that will, + by default, abort canonicalizing problematic inputs. +

+

Suggested mitigations include, but are not limited to:

+
    +
  • providing a configurable timeout with a default value applicable to + an implementation's common use
  • +
  • providing a configurable limit on the number of iterations of steps + performed in the algorithm, particularly recursive steps + and permutations of long lists
  • +
+ +
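The two suggested mitigations can be combined in a small work budget that the algorithm consults at each loop iteration or recursive call. The following is a minimal sketch, not part of this specification: the class names, the default limits, and the injectable `clock` parameter are all assumptions chosen for illustration.

```python
import time


class CanonicalizationAborted(Exception):
    """Raised when an input exceeds the configured work budget."""


class WorkBudget:
    """Configurable step limit and timeout (hypothetical helper)."""

    def __init__(self, max_steps=100_000, timeout_seconds=5.0,
                 clock=time.monotonic):
        self.max_steps = max_steps
        self.clock = clock
        self.deadline = clock() + timeout_seconds
        self.steps = 0

    def tick(self):
        """Call once per loop iteration, permutation, or recursive call."""
        self.steps += 1
        if self.steps > self.max_steps:
            raise CanonicalizationAborted("iteration limit exceeded")
        if self.clock() > self.deadline:
            raise CanonicalizationAborted("timeout exceeded")


# Usage: create one budget per canonicalization run and call tick()
# inside the permutation loop and before each recursive call.
budget = WorkBudget(max_steps=3, timeout_seconds=60.0)
for _ in range(3):
    budget.tick()
```

Suitable defaults depend on the implementation's common use; the values above are placeholders.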

Additionally, software that uses implementations of the algorithm can + employ best-practice schema validation to reject data that does not meet + application requirements, thereby preventing useless poison datasets from + being processed. However, such mitigations are application specific and + not directly applicable to implementers of the canonicalization algorithm + itself. +

+
+ +

7.2 Insecure Hash Algorithms

This section is non-normative.

+ + +

It is possible that the default hash algorithm used by RDFC-1.0 might become + insecure at some point in the future. To mitigate this, this algorithm + and implementations of it can be parameterized to use a different + hash function, without the need to make any changes to the + canonicalization algorithm itself. + However, using a different hash algorithm will generally lead to different results; + applications making use of this specification should carefully weigh the advantages + and disadvantages of using an alternative hash function. +
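One way to keep the hash function a parameter is to pass it into the algorithm as a plain function. The sketch below assumes SHA-256 as the default and uses Python's `hashlib`; the factory name `make_hasher` is hypothetical, and the choice of hexadecimal output follows the hashes shown in the logs of this document.

```python
import hashlib


def make_hasher(algorithm: str = "sha256"):
    """Return a function mapping a string to a lowercase hex digest."""

    def hash_hex(data: str) -> str:
        return hashlib.new(algorithm, data.encode("utf-8")).hexdigest()

    return hash_hex


default_hash = make_hasher()             # SHA-256
alternative_hash = make_hasher("sha384")  # yields different canonical labels
```

Two parties comparing canonical forms need to agree on the hash function in advance, since the outputs produced with different functions are generally not interchangeable.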

+ +
Note

The possible implications of the default hash algorithm becoming insecure are mitigated by the fact that no internal hash values are revealed, and the canonicalization algorithm is designed to cope with first-degree hash collisions.

+
+ +
+ +

8. Use Cases

This section is non-normative.

+ +

The use cases that have driven the development of the RDF Dataset Canonicalization algorithm are described in a separate document [RCH-EXPLAINER], which also includes further background and explanations for the design decisions taken.

+
+ +

9. Examples

This section is non-normative.

+ + +

9.1 Duplicate Paths

+ + +

This example illustrates a more complicated case in which the same paths through blank nodes are duplicated in a graph, but use different blank node identifiers.

+ +
+ + +

+ The image represents the graph described in + + the following code block + .

+
+
Figure 7 An illustration of a graph with duplicated paths.
+ Image available in + + SVG + .
+
+ +
_:e0 :p1 _:e1 .
+_:e1 :p2 "Foo" .
+_:e2 :p1 _:e3 .
+_:e3 :p2 "Foo" .
+ +
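A short sketch can show why this graph cannot be resolved by first-degree hashes alone. It follows the approach of the Hash First Degree Quads algorithm (the reference blank node is written as `_:a`, any other blank node as `_:z`), and assumes SHA-256 and the hypothetical base IRI `http://example.com/#` for the `:p1` and `:p2` predicates.

```python
import hashlib

# The four quads of the example, with an assumed base IRI.
QUADS = [
    ("_:e0", "<http://example.com/#p1>", "_:e1"),
    ("_:e1", "<http://example.com/#p2>", '"Foo"'),
    ("_:e2", "<http://example.com/#p1>", "_:e3"),
    ("_:e3", "<http://example.com/#p2>", '"Foo"'),
]


def first_degree_hash(quads, reference):
    """Hash the quads that mention reference, ignoring other labels."""
    lines = []
    for quad in quads:
        if reference not in quad:
            continue
        terms = [
            ("_:a" if term == reference else "_:z")
            if term.startswith("_:") else term
            for term in quad
        ]
        lines.append(" ".join(terms) + " .\n")
    lines.sort()  # code point order
    return hashlib.sha256("".join(lines).encode("utf-8")).hexdigest()


hashes = {node: first_degree_hash(QUADS, node)
          for node in ("_:e0", "_:e1", "_:e2", "_:e3")}
```

Under these assumptions, `_:e0` and `_:e2` share one hash and `_:e1` and `_:e3` share another, so further processing is needed before canonical identifiers can be issued.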

The following is a summary of the more detailed execution log + found here.

+ + +
+ +

9.2 Double Circle

+ + +

This example illustrates another complicated case, in which two nodes are doubly connected in opposite directions.

+ +
+ + +

+ The image represents the graph described in + + the following code block + .

+
+
Figure 8 An illustration of a graph with back and forth links to nodes.
+ Image available in + + SVG + .
+
+ +
_:e0 :next _:e1 .
+_:e0 :prev _:e1 .
+_:e1 :next _:e0 .
+_:e1 :prev _:e0 .
+ +

The example is not explored in detail, but the execution log found here shows examples of more complicated pathways through the algorithm.

+ +
+ +

9.3 Dataset with Blank Node Named Graph

+ + +

This example illustrates a dataset in which one graph is named using a blank node that is also the object of a triple in the default graph.

+ +
+ + +

+ The image represents the dataset described in + + the following code block + .

+
+
Figure 9 An illustration of a dataset containing a graph named with a blank node.
+ Image available in + + SVG + .
+
+ +
_:e0 :p1 _:e1 .
+_:e1 :p2 "Foo" .
+_:e1 :p3 _:g0 .
+_:e0 :p1 _:e1 _:g0 .
+_:e1 :p2 "Bar" _:g0 .
+ +
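The role of `_:g0` in this dataset can be made concrete by building a map from each blank node identifier to the quads that mention it, counting the graph name position like any other component. This is a minimal sketch: the function name is hypothetical, and the base IRI `http://example.com/#` for the predicates is an assumption.

```python
from collections import defaultdict

# The five quads of the example; a fourth element is the graph name.
DATASET = [
    ("_:e0", "<http://example.com/#p1>", "_:e1"),
    ("_:e1", "<http://example.com/#p2>", '"Foo"'),
    ("_:e1", "<http://example.com/#p3>", "_:g0"),
    ("_:e0", "<http://example.com/#p1>", "_:e1", "_:g0"),
    ("_:e1", "<http://example.com/#p2>", '"Bar"', "_:g0"),
]


def blank_node_to_quads(quads):
    """Map each blank node identifier to the quads in which it occurs."""
    mapping = defaultdict(list)
    for quad in quads:
        seen = set()
        for term in quad:
            if term.startswith("_:") and term not in seen:
                seen.add(term)  # reference each quad once per blank node
                mapping[term[2:]].append(quad)
    return dict(mapping)


bn_map = blank_node_to_quads(DATASET)
counts = {identifier: len(quads) for identifier, quads in bn_map.items()}
```

Here `g0` is reached both through the object position of a default graph triple and through the graph name of the two quads in the named graph.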

The following is a summary of the more detailed execution log + found here.

+ + +
+
+ +

A. A Canonical form of N-Quads

+ + +

This section defines a canonical form of N-Quads which has + a completely specified layout. + The grammar for the language remains unchanged.

+ +

Canonical N-Quads updates and extends + Canonical N-Triples in [N-TRIPLES] + to include graphLabel.

+ +

While the N-Quads syntax [N-QUADS] allows choices for the representation and layout of RDF data, + the canonical form of N-Quads provides a unique syntactic representation of any quad. + Each code point + can be represented by only one of + UCHAR, + ECHAR, + or unencoded character, + where the relevant production allows for a choice in representation. + Each quad is represented entirely on a single line with specified white space.
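The single-representation rule for code points inside a literal can be sketched as an escaping function. The assignment of code points to ECHAR and UCHAR below is an assumption based on the layout constraints of this appendix and should be checked against them; the names `ECHAR_ESCAPES`, `needs_uchar`, and `escape_literal` are hypothetical.

```python
# Code points assumed to be written with ECHAR.
ECHAR_ESCAPES = {
    "\b": "\\b", "\t": "\\t", "\n": "\\n", "\f": "\\f", "\r": "\\r",
    '"': '\\"', "\\": "\\\\",
}


def needs_uchar(cp: int) -> bool:
    """Code points assumed to be written as a lowercase \\u plus 4 HEX."""
    return (
        cp <= 0x07
        or cp == 0x0B
        or 0x0E <= cp <= 0x1F
        or cp == 0x7F
        # Code points outside the XML 1.1 Char production.
        or 0xD800 <= cp <= 0xDFFF
        or cp in (0xFFFE, 0xFFFF)
    )


def escape_literal(value: str) -> str:
    """Escape the lexical form placed between the quotation marks."""
    out = []
    for ch in value:
        if ch in ECHAR_ESCAPES:
            out.append(ECHAR_ESCAPES[ch])
        elif needs_uchar(ord(ch)):
            out.append("\\u%04X" % ord(ch))  # uppercase hex digits
        else:
            out.append(ch)  # everything else stays unencoded
    return "".join(out)


escaped = escape_literal('a"b\n')
```

Each code point takes exactly one branch, which is what gives every quad a unique syntactic representation.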

+ +

Canonical N-Quads has the following additional constraints on layout:

+ +
+ +

B. URDNA2015

This section is non-normative.

+ + +

RDF Dataset Canonicalization [CCG-RDC-FINAL] describes the "Universal RDF Dataset Normalization Algorithm 2015" (URDNA2015), which is essentially the same algorithm as RDFC-1.0; implementations of URDNA2015 should generally be compatible with this specification. The one minor change is in the canonical n-quads form, where some control characters were previously represented without escaping. The canonical form defined in A. A Canonical form of N-Quads clarifies the representation of simple literals and the characters within STRING_LITERAL_QUOTE that are encoded using ECHAR.

+
+ +

C. URGNA2012

This section is non-normative.

+ +

A previous version of this algorithm has light deployment. For purposes of identification, + the algorithm is called the + "Universal RDF Graph Canonicalization Algorithm 2012" + (URGNA2012), + and differs from the stated algorithm in the following ways:

+ +
+ +

D. Index

D.1 Terms defined by this specification

+ + +

D.2 Terms defined by reference

+ +
    +
  • + [BCP47] defines the following: +
      +
    • + formatting conventions +
    • +
    +
  • + [INFRA] defines the following: +
      +
    • + boolean type +
    • + entry (for map) +
    • + key (for map) +
    • + list +
    • + map +
    • + string +
    • +
    +
  • + [N-QUADS] defines the following: +
      +
    • + C. N-Quads Internet Media Type, File Extension and Macintosh File Type +
    • + ECHAR +
    • + EOL +
    • + graphLabel +
    • + HEX +
    • + literal +
    • + STRING_LITERAL_QUOTE +
    • + UCHAR +
    • +
    +
  • + [N-TRIPLES] defines the following: +
      +
    • + Canonical N-Triples +
    • +
    +
  • + [RDF11-CONCEPTS] defines the following: +
      +
    • + blank node identifiers +
    • + blank nodes +
    • + Blank Nodes +
    • + default graph +
    • + graph name +
    • + IRIs +
    • + isomorphic datasets +
    • + language tags +
    • + language-tagged strings +
    • + lexical form +
    • + literal +
    • + literal term equality +
    • + literal value +
    • + object type +
    • + predicate +
    • + RDF datasets +
    • + RDF graph +
    • + RDF source +
    • + RDF triple +
    • + Section 3.3 +
    • + simple literals +
    • + Skolem IRIs +
    • + subject +
    • +
    +
  • + [RDF11-MT] defines the following: +
      +
    • + RDF Collections +
    • +
    +
  • + [VC-DATA-INTEGRITY] defines the following: +
      +
    • + Selective Disclosure +
    • +
    +
  • + [XML11] defines the following: +
      +
    • + Char +
    • +
    +
  • + [XPATH-FUNCTIONS] defines the following: +
      +
    • + Unicode code point order +
    • +
    +
  • +
+
+ +

E. Changes since the First Public Working Draft of 24 November 2022

This section is non-normative.

+ + +
+ +

F. Changes since the Candidate Recommendation Snapshot of 31 October 2023

This section is non-normative.

+ + +
+ +

G. Acknowledgements

This section is non-normative.

+ + +

The editors would like to thank Jeremy Carroll for his work on the graph canonicalization problem, Andy Seaborne and Gavin Carothers for providing valuable feedback and testing input for the algorithm defined in this specification, Sir Tim Berners-Lee for his thoughts on graph canonicalization over the years, Jesús Arias Fisteus for his work on a similar algorithm, and Aidan Hogan, whose publication [Hogan-Canonical-RDF] provided an important contemporary analysis of the canonicalization problem and served as an independent justification of the development of RDFC-1.0.

+ +

The editors would also like to thank + the chairs of the Working Group, Phil Archer and Markus Sabadello, + and specific members of the Working Group whose active contributions + were critical in completing this work: + Pierre-Antoine Champin, + Ivan Herman, + David Lehn, + Kazue Sako, + Manu Sporny, and + Ted Thibodeau Jr. +

+ +

Members of the RDF Dataset Canonicalization and Hash Working Group included Ahamed Azeem, Ahmad Alobaid, Andy Seaborne, Benjamin Goering, Benjamin Young, Brent Zundel, Damien Graux, Dan Brickley, Dan Yamamoto, Daniel Pape, Dave Longley, David Lehn, Duy-Tung Luu, Gregg Kellogg, Ivan Herman, Jean-Yves Rossi, Jennifer Meier, Jesse Wright, Kazue Sako, Leonard Rosenthol, Mahmoud Alkhraishi, Manu Sporny, Markus Sabadello, Michael Prorock, Phil Archer, Pierre-Antoine Champin, Sebastian Crane, Ted Thibodeau Jr, Timothée Haudebourg, and Tobias Kuhn.

+ +

This specification is based on work done in the + W3C Credentials Community Group + published as [CCG-RDC-FINAL]. + Contributors to the Community Group Final Report include: + Blake Regalia, + Dave Longley, + David Lehn, + David Lozano Jarque, + Gregg Kellogg, + Manu Sporny, + Markus Sabadello, + Matt Collier, and + Sebastian Schmittner. +

+ +

+ Portions of the work on this specification have been funded by the European + Union's StandICT.eu 2023 program under sub-grantee contract numbers No. 08/12 + and 09/25. The + content of this specification does not necessarily reflect the position or the + policy of the European Union and no official endorsement should be inferred. +

+ +

+ Portions of the work on this specification have also been funded by the U.S. + Department of Homeland Security's Silicon Valley Innovation Program under contracts + 70RSAT21T00000020 and 70RSAT23T00000006. The content of this specification does + not necessarily reflect the position or the policy of the U.S. Government and no + official endorsement should be inferred. +

+ +

+ The Working Group acknowledges that the success of this specification is + dependent on a long history of work performed over multiple decades in both + academia and industry. We thank the individuals who iterated on the science + which led to the completion of this specification. A partial list of these + papers is found below, to the best of the Working Group's recollection. + Omission from this list, whether intentional or unintentional, is not meant + to imply that such an unlisted paper was not similarly important to the + development of this work. +

+ + +
+ + + +

H. References

H.1 Normative references

+ +
[FIPS-180-4]
+ FIPS PUB 180-4: Secure Hash Standard (SHS). U.S. Department of Commerce/National Institute of Standards and Technology. August 2015. National Standard. URL: https://nvlpubs.nist.gov/nistpubs/FIPS/NIST.FIPS.180-4.pdf +
[INFRA]
+ Infra Standard. Anne van Kesteren; Domenic Denicola. WHATWG. Living Standard. URL: https://infra.spec.whatwg.org/ +
[N-Quads]
+ RDF 1.1 N-Quads. Gavin Carothers. W3C. 25 February 2014. W3C Recommendation. URL: https://www.w3.org/TR/n-quads/ +
[N-TRIPLES]
+ RDF 1.1 N-Triples. Gavin Carothers; Andy Seaborne. W3C. 25 February 2014. W3C Recommendation. URL: https://www.w3.org/TR/n-triples/ +
[RDF11-CONCEPTS]
+ RDF 1.1 Concepts and Abstract Syntax. Richard Cyganiak; David Wood; Markus Lanthaler. W3C. 25 February 2014. W3C Recommendation. URL: https://www.w3.org/TR/rdf11-concepts/ +
[RFC2119]
+ Key words for use in RFCs to Indicate Requirement Levels. S. Bradner. IETF. March 1997. Best Current Practice. URL: https://www.rfc-editor.org/rfc/rfc2119 +
[RFC3987]
+ Internationalized Resource Identifiers (IRIs). M. Duerst; M. Suignard. IETF. January 2005. Proposed Standard. URL: https://www.rfc-editor.org/rfc/rfc3987 +
[RFC8174]
+ Ambiguity of Uppercase vs Lowercase in RFC 2119 Key Words. B. Leiba. IETF. May 2017. Best Current Practice. URL: https://www.rfc-editor.org/rfc/rfc8174 +
[Turtle]
+ RDF 1.1 Turtle. Eric Prud'hommeaux; Gavin Carothers. W3C. 25 February 2014. W3C Recommendation. URL: https://www.w3.org/TR/turtle/ +
[UNICODE]
+ The Unicode Standard. Unicode Consortium. URL: https://www.unicode.org/versions/latest/ +
[XML11]
+ Extensible Markup Language (XML) 1.1 (Second Edition). Tim Bray; Jean Paoli; Michael Sperberg-McQueen; Eve Maler; François Yergeau; John Cowan et al. W3C. 16 August 2006. W3C Recommendation. URL: https://www.w3.org/TR/xml11/ +
[XPATH-FUNCTIONS]
+ XQuery 1.0 and XPath 2.0 Functions and Operators (Second Edition). Ashok Malhotra; Jim Melton; Norman Walsh; Michael Kay. W3C. 14 December 2010. W3C Recommendation. URL: https://www.w3.org/TR/xpath-functions/ +
+

H.2 Informative references

+ +
[BCP47]
+ Tags for Identifying Languages. A. Phillips, Ed.; M. Davis, Ed.. IETF. September 2009. Best Current Practice. URL: https://www.rfc-editor.org/rfc/rfc5646 +
[CCG-RDC-FINAL]
+ RDF Dataset Canonicalization. Dave Longley. W3C. October 9, 2022. CG-FINAL. URL: https://www.w3.org/community/reports/credentials/CG-FINAL-rdf-dataset-canonicalization-20221009/ +
[DesignIssues-Diff]
+ Delta: an ontology for the distribution of differences between RDF graphs. Tim Berners-Lee. W3C. September 25, 2015. unofficial. URL: https://www.w3.org/DesignIssues/Diff +
[eswc2014Kasten]
+ A Framework for Iterative Signing of Graph Data on the Web. Andreas Kasten; Ansgar Scherp; Peter Schauß. ESWC 2014. 2014. unofficial. URL: https://doi.org/10.1007/978-3-319-07443-6_11 +
[Hogan-Canonical-RDF]
+ Canonical Forms for Isomorphic and Equivalent RDF Graphs: Algorithms for Leaning and Labelling Blank Nodes. Aidan Hogan. ACM. November 2017. ACM Trans. Web 11, 4, Article 22. URL: https://aidanhogan.com/docs/rdf-canonicalisation.pdf +
[HPL-2003-142]
+ Signing RDF Graphs. Jeremy J. Carroll. HP Laboratories Bristol. July 23, 2003. unofficial. URL: https://web.archive.org/web/20230129125726/https://www.hpl.hp.com/techreports/2003/HPL-2003-142.pdf +
[RCH-EXPLAINER]
+ RDF Dataset Canonicalization and Hash Working Group — Explainer and Use Cases. Phil Archer. W3C. 19 October 2023. W3C Working Group Note. URL: https://www.w3.org/TR/rch-explainer/ +
[RDF11-MT]
+ RDF 1.1 Semantics. Patrick Hayes; Peter Patel-Schneider. W3C. 25 February 2014. W3C Recommendation. URL: https://www.w3.org/TR/rdf11-mt/ +
[VC-DATA-INTEGRITY]
+ Verifiable Credential Data Integrity 1.0. Manu Sporny; Dave Longley; Greg Bernstein; Dmitri Zagidulin; Sebastian Crane. W3C. 28 April 2024. W3C Candidate Recommendation. URL: https://www.w3.org/TR/vc-data-integrity/ +
[YAML]
+ YAML Ain’t Markup Language (YAML™) Version 1.2. Oren Ben-Kiki; Clark Evans; Ingy döt Net. 1 October 2009. URL: http://yaml.org/spec/1.2/spec.html +
+
\ No newline at end of file diff --git a/publication-snapshots/REC/ca-overview.svg b/publication-snapshots/REC/ca-overview.svg new file mode 100644 index 0000000..985668d --- /dev/null +++ b/publication-snapshots/REC/ca-overview.svg @@ -0,0 +1 @@ + \ No newline at end of file diff --git a/publication-snapshots/REC/dataset-bn-graph.svg b/publication-snapshots/REC/dataset-bn-graph.svg new file mode 100644 index 0000000..550431e --- /dev/null +++ b/publication-snapshots/REC/dataset-bn-graph.svg @@ -0,0 +1 @@ + \ No newline at end of file diff --git a/publication-snapshots/REC/double-circle.svg b/publication-snapshots/REC/double-circle.svg new file mode 100644 index 0000000..b296910 --- /dev/null +++ b/publication-snapshots/REC/double-circle.svg @@ -0,0 +1 @@ + \ No newline at end of file diff --git a/publication-snapshots/REC/duplicate-paths.svg b/publication-snapshots/REC/duplicate-paths.svg new file mode 100644 index 0000000..c227df8 --- /dev/null +++ b/publication-snapshots/REC/duplicate-paths.svg @@ -0,0 +1 @@ + \ No newline at end of file diff --git a/publication-snapshots/REC/shared-hashes.svg b/publication-snapshots/REC/shared-hashes.svg new file mode 100644 index 0000000..7995383 --- /dev/null +++ b/publication-snapshots/REC/shared-hashes.svg @@ -0,0 +1 @@ + \ No newline at end of file diff --git a/publication-snapshots/REC/unique-hashes.svg b/publication-snapshots/REC/unique-hashes.svg new file mode 100644 index 0000000..1a6f32f --- /dev/null +++ b/publication-snapshots/REC/unique-hashes.svg @@ -0,0 +1 @@ + \ No newline at end of file