RDF-EXT documentation

What is rdf-ext

rdf-ext is a JavaScript library for working with RDF. RDF is a graph data model defined by the W3C and is widely implemented. If you are new to RDF, have a look at our Linked Data Training. This documentation assumes some basic knowledge of RDF and its data model.

What is the RDFJS specification

For many years there were multiple RDF libraries and interfaces available in the JavaScript world. In 2013 the RDF JavaScript Libraries Community Group was initiated. One of the results of this group is the RDFJS specification, a low-level interface for working with RDF and Linked Data on ECMAScript platforms like Web browsers and Node.js.

All libraries described in this document are based on the RDFJS interface and read/write data structures defined in this specification.

What are the main packages & what do they do

The RDFJS interface is a low-level interface specification. Next to the core module rdf-ext, we provide several other module categories that use the same interface:

  • Parsers & serializers: RDF is a data model and, unlike other data models, it is not bound to a particular serialization. There are many different formats available: some are plain-text based, others are JSON or XML based, and even binary formats are available. In general, one can convert each serialization to another one without any loss of data. Parser and serializer modules implement the specification of each format and transform it from/to an RDFJS interface structure. We suggest always using the provided parsers and serializers and not doing this in your own code.

  • Stores: Stores provide a way to persist RDFJS interface structures. This can be a simple in-memory store or a persistent back-end in an RDF graph database (for example a SPARQL endpoint). Each store implements the same abstract interface. If you start with an in-memory store, you can switch to a more persistent layer later by simply choosing another store implementation. No other code changes are necessary.

In many cases developers want to work with simple interfaces to reduce code complexity and focus on solving problems using RDF. For that reason we introduce modules that are built on top of rdf-ext:

  • Dataset: Dataset is a (work in progress) specification of a high-level interface on top of the RDFJS interface specification. It provides additional functions that facilitate interacting with RDF data. Unless you have a good reason to use something else, this is the library you want to start working with.

  • Helpers: Helpers provide abstractions for common tasks in the RDF programming world, exposed as simple interfaces. While you could implement the functionality that helpers provide on your own, they save you quite a few lines of code for particular tasks.

Basics

Note that we only explain our libraries in this document. If you don't understand a particular RDF concept, please follow the links provided within the text to learn more about it.

A Triple is made up of subject, predicate and object.

A Quad is a Triple with an additional context, where the context represents the graph concept in RDF 1.1. We generally use quads in the example code, but often without explicitly declaring the context. In that case the triple is simply added to the so-called "default graph".

Create a triple/quad

const rdf = require('rdf-ext')

let subject = rdf.namedNode('http://example.org/subject') 
let predicate = rdf.namedNode('http://example.org/predicate')
let object = rdf.literal('object')

let quad = rdf.quad(subject, predicate, object)

// log the triples to console with toString()
// note that this is N-Triples/N-Quads serialization in rdf-ext
console.log(quad.toString())

foo@bar:~$ node ./create.js
<http://example.org/subject> <http://example.org/predicate> "object" .

Code for this example

In this example we start with the rdf object from the rdf-ext package. We then create a subject, a predicate and an object. In RDF, the predicate must be an IRI and the subject must be an IRI or a blank node. The object can be an IRI, a blank node or a literal, as in our example.

We then create the quad. Here we omit declaring the context for the quad explicitly, therefore the triple is simply added to the so-called "default graph".

In the last line we use quad.toString() to log the triple/quad to the console. In rdf-ext, a triple is always serialized in an N-Triples-like syntax, with one triple per line and a dot at the end of the line. A quad is serialized in N-Quads syntax. Assuming the example code is stored in a file named create.js, it produces the output shown above.

If you want to create a blank node instead of a named node, simply use the blankNode function:

let bnode = rdf.blankNode()
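
To make the output format described above more concrete, here is a minimal sketch of how a quad maps to an N-Triples or N-Quads line. It does not use rdf-ext: the terms are plain objects carrying the termType and value fields from the RDFJS data model, and termToString and quadToString are hypothetical helper names chosen for this illustration. Language tags, datatypes and complete string escaping are left out.

```javascript
// Plain-object terms modelled on the RDFJS data model (termType + value).
// These helpers are illustrative only; rdf-ext ships its own implementation.
const namedNode = (value) => ({ termType: 'NamedNode', value })
const blankNode = (value) => ({ termType: 'BlankNode', value })
const literal = (value) => ({ termType: 'Literal', value })
const defaultGraph = () => ({ termType: 'DefaultGraph', value: '' })

function termToString (term) {
  switch (term.termType) {
    case 'NamedNode':
      return `<${term.value}>`
    case 'BlankNode':
      return `_:${term.value}`
    case 'Literal':
      // simplified: only backslashes and double quotes are escaped
      return `"${term.value.replace(/\\/g, '\\\\').replace(/"/g, '\\"')}"`
    default:
      throw new Error(`unsupported term type: ${term.termType}`)
  }
}

function quadToString ({ subject, predicate, object, graph = defaultGraph() }) {
  const terms = [subject, predicate, object]
  // the graph is only written when the quad is not in the default graph
  if (graph.termType !== 'DefaultGraph') {
    terms.push(graph)
  }
  return terms.map(termToString).join(' ') + ' .'
}

const triple = {
  subject: namedNode('http://example.org/subject'),
  predicate: namedNode('http://example.org/predicate'),
  object: literal('object')
}
// the same statement, placed in a named graph
const quad = { ...triple, graph: namedNode('http://example.org/graph') }

console.log(quadToString(triple))
console.log(quadToString(quad))
```

The first line printed has three terms (N-Triples style), the second has the graph IRI as a fourth term (N-Quads style).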

Work with triples

Most of the time we want to work with more than one triple, so we need some kind of container to which we can add multiple triples. While one could do that manually with arrays, a more suitable structure is the dataset. It provides several useful functions, for example a function for matching triples. A dataset also takes care of detecting duplicates, so you don't have to.
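
The two features mentioned above, duplicate detection and matching, can be illustrated with a deliberately tiny container. TinyDataset is a hypothetical class written for this sketch and is not how rdf-ext implements its dataset. Terms are reduced to plain strings to keep it short.

```javascript
// Illustrative only: a minimal quad container with duplicate detection and
// pattern matching. Terms are plain strings here, not RDFJS term objects.
class TinyDataset {
  constructor () {
    // keyed by a canonical string, so the same quad is stored only once
    this.quads = new Map()
  }

  get size () {
    return this.quads.size
  }

  add (quad) {
    const key = JSON.stringify([quad.subject, quad.predicate, quad.object, quad.graph || ''])
    this.quads.set(key, quad)
    return this
  }

  // a missing (undefined or null) argument acts as a wildcard
  match (subject, predicate, object, graph) {
    const result = new TinyDataset()
    for (const quad of this.quads.values()) {
      if (subject != null && quad.subject !== subject) continue
      if (predicate != null && quad.predicate !== predicate) continue
      if (object != null && quad.object !== object) continue
      if (graph != null && (quad.graph || '') !== graph) continue
      result.add(quad)
    }
    return result
  }
}

const sheldon = 'http://example.org/sheldon'
const dataset = new TinyDataset()

dataset.add({ subject: sheldon, predicate: 'http://schema.org/givenName', object: 'Sheldon' })
// the same quad again: it is stored only once
dataset.add({ subject: sheldon, predicate: 'http://schema.org/givenName', object: 'Sheldon' })
dataset.add({ subject: sheldon, predicate: 'http://schema.org/familyName', object: 'Cooper' })

console.log(dataset.size) // 2
console.log(dataset.match(null, 'http://schema.org/familyName').size) // 1
```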

Let us describe a person, specifically Sheldon Cooper from the TV series The Big Bang Theory. We maintain triples about him in a specific GitHub repository, see here. We will mainly use schema.org/Person as vocabulary in this example.

const rdf = require('rdf-ext')

// create a new dataset using the rdf-ext factory
let dataset = rdf.dataset()
let bnode = rdf.blankNode()

dataset.add(rdf.quad(rdf.namedNode('http://example.org/sheldon'), rdf.namedNode('http://schema.org/givenName'), rdf.literal('Sheldon')))
dataset.add(rdf.quad(rdf.namedNode('http://example.org/sheldon'), rdf.namedNode('http://schema.org/familyName'), rdf.literal('Cooper')))
dataset.add(rdf.quad(rdf.namedNode('http://example.org/sheldon'), rdf.namedNode('http://schema.org/address'), bnode))
dataset.add(rdf.quad(bnode, rdf.namedNode('http://schema.org/addressCountry'), rdf.literal('US')))
dataset.add(rdf.quad(bnode, rdf.namedNode('http://schema.org/addressLocality'), rdf.literal('Pasadena')))
dataset.add(rdf.quad(bnode, rdf.namedNode('http://schema.org/addressRegion'), rdf.literal('CA')))
dataset.add(rdf.quad(bnode, rdf.namedNode('http://schema.org/postalCode'), rdf.literal('91104')))
dataset.add(rdf.quad(bnode, rdf.namedNode('http://schema.org/streetAddress'), rdf.literal('2311 North Los Robles Avenue, Apartment 4A')))

// log the triples to console with toString()
// note that this is N-Triples serialization in rdf-ext
console.log(dataset.toString())

foo@bar:~$ node ./dataset-add-triples.js
<http://example.org/sheldon> <http://schema.org/givenName> "Sheldon" .
<http://example.org/sheldon> <http://schema.org/familyName> "Cooper" .
<http://example.org/sheldon> <http://schema.org/address> _:b1 .
_:b1 <http://schema.org/addressCountry> "US" .
_:b1 <http://schema.org/addressLocality> "Pasadena" .
_:b1 <http://schema.org/addressRegion> "CA" .
_:b1 <http://schema.org/postalCode> "91104" .
_:b1 <http://schema.org/streetAddress> "2311 North Los Robles Avenue, Apartment 4A" .

Code for this example

While this was fun, this is not what we normally do. It is far more common to load triples from an RDF serialization and then work on them.

Parse triples from a file

By installing the npm package tbbt-ld you will get several characters of The Big Bang Theory described in Turtle format. By using the parser package rdf-parser-n3 we can parse it into a dataset structure. Note that the N3 parser (used by rdf-parser-n3) can parse Turtle, TriG, N-Triples, N-Quads, and Notation3 (N3).

const fs = require('fs')
const rdf = require('rdf-ext')
const N3Parser = require('rdf-parser-n3')

// create N3 parser instance
let parser = new N3Parser({factory: rdf})

// Read a Turtle file and stream it to the parser
let quadStream = parser.import(fs.createReadStream('./node_modules/tbbt-ld/data/person/sheldon-cooper.ttl'))

// create a new dataset and import the quad stream into it (reverse pipe) with Promise API
rdf.dataset().import(quadStream).then((dataset) => {
  // loop over all quads and write them to the console
  dataset.forEach((quad) => {
    console.log(quad.toString())
  })
})

Code for this example

Additional parsers available:

Serialize triples

We already saw the simplest form of serialization: in rdf-ext, the .toString() method generates N-Triples or N-Quads. This is also the default behavior for dataset structures. N-Triples is a great format for exchanging triples and is also suitable for large data dumps. It compresses well with standard tools such as gzip or bzip2. For large datasets, the recommended way to serialize to N-Triples is to use the dedicated serializer. Let us revisit the dataset example from above and integrate a proper serializer:

const rdf = require('rdf-ext')
const Readable = require('stream').Readable
const SerializerNtriples = require('@rdfjs/serializer-ntriples')

// create a new dataset using the rdf-ext factory
let dataset = rdf.dataset()
let bnode = rdf.blankNode()

dataset.add(rdf.quad(rdf.namedNode('http://example.org/sheldon'), rdf.namedNode('http://schema.org/givenName'), rdf.literal('Sheldon')))
dataset.add(rdf.quad(rdf.namedNode('http://example.org/sheldon'), rdf.namedNode('http://schema.org/familyName'), rdf.literal('Cooper')))
dataset.add(rdf.quad(rdf.namedNode('http://example.org/sheldon'), rdf.namedNode('http://schema.org/address'), bnode))
dataset.add(rdf.quad(bnode, rdf.namedNode('http://schema.org/addressCountry'), rdf.literal('US')))
dataset.add(rdf.quad(bnode, rdf.namedNode('http://schema.org/addressLocality'), rdf.literal('Pasadena')))
dataset.add(rdf.quad(bnode, rdf.namedNode('http://schema.org/addressRegion'), rdf.literal('CA')))
dataset.add(rdf.quad(bnode, rdf.namedNode('http://schema.org/postalCode'), rdf.literal('91104')))
dataset.add(rdf.quad(bnode, rdf.namedNode('http://schema.org/streetAddress'), rdf.literal('2311 North Los Robles Avenue, Apartment 4A')))

// create the serializer
const serializerNtriples = new SerializerNtriples()
const input = dataset.toStream()
const output = serializerNtriples.import(input)

output.on('data', ntriples => {
  console.log(ntriples.toString())
})

Code for this example

You can also stream the output directly to a file. Because this is stream-based, files can be as large as your use case requires.

Note that with console.log() you will get an additional, empty line. This will not happen if you write the output to a file directly.
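
Writing a text stream to a file could look roughly like the sketch below. It uses only Node.js core modules, so createTextStream is a stand-in for the serializer output and not part of rdf-ext. The helper names and the target file name are assumptions made for this illustration.

```javascript
const fs = require('fs')
const os = require('os')
const path = require('path')
const { Readable, pipeline } = require('stream')

// Stand-in for the serializer output: a readable stream that emits
// N-Triples text, one line per chunk (an assumption for this sketch).
function createTextStream (lines) {
  return Readable.from(lines.map((line) => line + '\n'))
}

// Pipe a text stream into a file; resolves once everything has been written.
function writeToFile (textStream, filename) {
  return new Promise((resolve, reject) => {
    pipeline(textStream, fs.createWriteStream(filename), (err) => {
      if (err) {
        reject(err)
      } else {
        resolve(filename)
      }
    })
  })
}

const lines = [
  '<http://example.org/sheldon> <http://schema.org/givenName> "Sheldon" .',
  '<http://example.org/sheldon> <http://schema.org/familyName> "Cooper" .'
]

// hypothetical target file in the system's temporary directory
const target = path.join(os.tmpdir(), 'sheldon.nt')

writeToFile(createTextStream(lines), target).then((filename) => {
  console.log(`wrote ${filename}`)
})
```

Assuming the stream returned by the serializer is a regular Node.js readable stream, it would take the place of createTextStream(lines) here. Each chunk already ends with a line break, so the file contains no additional empty lines.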

Additional serializers available:

At the time of writing there is no RDF/XML serializer available.

Using Sink and Source interfaces

In our examples we passed data from one object to another, often using either the .import() or the .match() function. These functions are defined in the RDFJS specification, more specifically in the Sink and Source interfaces. Parsers and serializers typically implement the Sink interface and thus provide an .import() function. You pass either a text stream (parser) or a quad stream (serializer) to this function, and the stream it returns emits either quads (parser) or serialized text (serializer).
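
The shape of a Sink can be sketched with Node.js core streams alone. LineSerializer below is a hypothetical class, not one of the rdf-ext serializers, and its quads are simplified to objects that already hold N-Triples formatted terms. The point is only the calling convention: a quad stream goes in and a text stream comes out.

```javascript
const { Readable, Transform } = require('stream')

// Illustrative Sink: import() takes a stream of quads and returns a stream
// that emits serialized text. Quads are simplified to objects holding
// ready-made N-Triples terms (an assumption to keep the sketch short).
class LineSerializer {
  import (quadStream) {
    const output = new Transform({
      writableObjectMode: true, // quads in
      readableObjectMode: false, // text out
      transform (quad, encoding, callback) {
        callback(null, `${quad.subject} ${quad.predicate} ${quad.object} .\n`)
      }
    })
    // forward input errors so consumers only need to watch the output stream
    quadStream.on('error', (err) => output.destroy(err))
    return quadStream.pipe(output)
  }
}

const quads = Readable.from([
  { subject: '<http://example.org/sheldon>', predicate: '<http://schema.org/givenName>', object: '"Sheldon"' },
  { subject: '<http://example.org/sheldon>', predicate: '<http://schema.org/familyName>', object: '"Cooper"' }
])

const output = new LineSerializer().import(quads)

output.on('data', (chunk) => {
  // chunks already end with a line break, so write them without adding one
  process.stdout.write(chunk.toString())
})
```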

The Store interface implements both Source and Sink. If you want to persist data from another structure, you import it into the Store instance by using the .import() function. When you want to pass data from a store to another structure, you can use the .match() function to get some or all quads back.

Note that there is one exception: Dataset behaves differently and does not implement Source and Sink. Although it does provide both a .match() and .import() method, they behave differently by design. Please consult the Dataset chapter about why.

In case you do need Sink and Source interfaces on a Dataset structure you need to use rdf-store-dataset instead, which provides a Source/Sink compatible wrapper around it.

Using a Store

At the beginning of this document, we introduced the concept of a Store. This interface can be used to persist data. While you develop your code this might be a simple in-memory store, but sooner or later you will want to write the data somewhere, for example into an existing triplestore using SPARQL.

We start with a simple in-memory store based on the Dataset interface introduced above. However, we work with the store API so we can later replace this in-memory back-end with a persistent storage layer with only minor code changes.
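
As a rough illustration of working against the store API, the sketch below implements the two calls discussed in the previous chapter on a plain array. MemoryStore is a hypothetical class built only on Node.js core modules; it is not rdf-store-dataset, and its quads are plain objects with string terms instead of RDFJS terms.

```javascript
const { EventEmitter } = require('events')
const { Readable } = require('stream')

// Illustrative in-memory store exposing import() (Sink) and match() (Source).
class MemoryStore {
  constructor () {
    this.quads = []
  }

  // consume a quad stream; the returned emitter fires 'end' when it is done
  import (quadStream) {
    const events = new EventEmitter()
    quadStream.on('data', (quad) => this.quads.push(quad))
    quadStream.on('end', () => events.emit('end'))
    quadStream.on('error', (err) => events.emit('error', err))
    return events
  }

  // return a stream of all quads matching the pattern; null is a wildcard
  match (subject, predicate, object) {
    const matches = this.quads.filter((quad) =>
      (subject == null || quad.subject === subject) &&
      (predicate == null || quad.predicate === predicate) &&
      (object == null || quad.object === object))
    return Readable.from(matches)
  }
}

const store = new MemoryStore()
const input = Readable.from([
  { subject: 'http://example.org/sheldon', predicate: 'http://schema.org/givenName', object: 'Sheldon' },
  { subject: 'http://example.org/sheldon', predicate: 'http://schema.org/familyName', object: 'Cooper' }
])

store.import(input).on('end', () => {
  // read back only the given name once the import has finished
  store.match(null, 'http://schema.org/givenName').on('data', (quad) => {
    console.log(quad.object)
  })
})
```

Code that only calls import() and match() does not depend on where the quads end up, which is what makes it possible to swap the back-end later.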

What are the other packages, what do they do

  1. What is this package, when should it be used?
  2. With what other packages does it play well? What are the related packages I should check out?
  3. What are the main objects or classes, methods and their signature
  4. Examples
  1. What is this package, when should it be used?
  2. With what other packages does it play well? What are the related packages I should check out?
  3. What are the main objects or classes, methods and their signature
  4. Examples

After that: Same thing but for the rest of the packages.

API Reference

rdf-ext

Dataset

Dataset has a match() and an import() method, but their return values differ slightly from those of the Source and Sink interfaces. import() accepts a Stream but returns a Promise, which fits better with the rest of the Dataset interface, and match() returns a new dataset. If a Source and Sink interface or an in-memory store is required, rdf-store-dataset can be used.

Spec

Dataset provides a synchronous interface to interact with quads in JavaScript. As mentioned above, it does provide both a .match() and .import() method but they behave differently and do not implement the Source & Sink interface.
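
The difference in return values can be sketched as follows. PromiseDataset is a hypothetical class using only Node.js core modules and plain-object quads; it mimics the behavior described here (import() resolves a Promise, match() returns a new container synchronously) and is not the rdf-ext implementation.

```javascript
const { Readable } = require('stream')

// Illustrative only: import() consumes a quad stream but returns a Promise,
// and match() synchronously returns a new container instead of a stream.
class PromiseDataset {
  constructor (quads = []) {
    this.quads = quads
  }

  import (quadStream) {
    return new Promise((resolve, reject) => {
      quadStream.on('data', (quad) => this.quads.push(quad))
      quadStream.on('end', () => resolve(this))
      quadStream.on('error', reject)
    })
  }

  // null or undefined acts as a wildcard; only the subject is matched here
  match (subject) {
    return new PromiseDataset(
      this.quads.filter((quad) => subject == null || quad.subject === subject))
  }
}

const quadStream = Readable.from([
  { subject: 'http://example.org/sheldon', predicate: 'http://schema.org/givenName', object: 'Sheldon' },
  { subject: 'http://example.org/leonard', predicate: 'http://schema.org/givenName', object: 'Leonard' }
])

new PromiseDataset().import(quadStream).then((dataset) => {
  // match() needs no callback or stream handling
  const sheldon = dataset.match('http://example.org/sheldon')
  console.log(sheldon.quads.length) // 1
})
```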

Store
