RDF object mapping #186

Open
frensjan opened this issue May 2, 2023 · 4 comments
Comments

frensjan commented May 2, 2023

I'd like to add yet another option to the (already long) list of suggestions to make consuming data from RDF data(bases) easier.

Why?

As already indicated elsewhere, capturing data in RDF and making it accessible through SPARQL has its merits. However, integration with other ecosystems is sometimes challenging because of the technology mismatch. Most of the time a 'simple' JSON interface is requested, which results in hand-written query-to-object mapping.

Previous work

A lot of related work is provided in: #39, #48, #126, #127, #128

The proposed solution also bears resemblance to JSON construction functions in SQL databases, e.g. jsonb_build_array, json_object_agg and others in PostgreSQL.

Making RDF data accessible through GraphQL is another example. It has the benefit of riding its hype cycle. However, the mapping is not completely straightforward, especially when more complex graph patterns are required, aggregations are involved, etc.

The proposed solution also draws from the ability to express (perhaps implicitly) correlated subqueries as described in the proposal for LATERAL. As also linked in other suggestions for SPARQL 1.2, Stardog supports an array aggregation function which relates to this.

Proposed solution

The meat of this proposal is to construct trees of objects (akin to object trees from GraphQL queries) based on solutions from SPARQL queries.

The solution would (probably) require a new query (sub) type, e.g. indicated by SELECT OBJECT, CONSTRUCT OBJECT, CONSTRUCT FRAMED, JSON or something else. For the examples below I took the liberty to introduce the 'SELECT OBJECT' keyword.

Scalars, lists and objects/maps

The abstract data model for results could consist of elements like scalars, lists and objects / maps.

Scalars could be selected by:

  • an expression, including
    • a literal
    • referencing a variable from the underlying query solution
  • a (lateral) subquery which must yield exactly one solution with one variable

Examples of the syntax:

select object ?o
where { ?s ?p ?o }

select object 42
where { ?s ?p ?o }

select object ( ?o = true )
where { ?s ?p ?o }

select object
  ( select ?title where { ?t :title ?title } )
where {
  ?t a :Thread
}

Lists could be selected by:

  • a sequence of scalar expressions (literals, variables, etc)
  • a (lateral) subquery that has one variable but yields any number of solutions (one for each element in the list)

Examples of the syntax:

select object [ ?s, ?o ]
where { ?s ?p ?o }

select object [ ?s, 42 ]
where { ?s ?p ?o }

select object [
  strends( str(?s), "foo" ),
  ?p = :label,
  ?o * 3
]
where { ?s ?p ?o }

select object
  [ select ?title where { ?t :title ?title } ]
where {
  ?t a :Thread
}

Objects / maps could be selected by scalar and list expressions keyed by strings. Examples of the syntax:

select object {
   "@id": ?s,
   "value": ?o,
   "foo": [
      ?o > 42,
      ?o * 3
    ],
   "bar": strends( str(?s), "foo" ),
   "qux": [
      select ?title where { ?s :title ?title }
    ]
}
where { ?s ?p ?o }

Partitioning / grouping

I must note that I haven't fully figured out what the semantics of the (implicit) partitioning / aggregation should be. I'm not that enthusiastic about solutions that depend on sorting the solutions before grouping / partitioning / aggregation.

A possible route here could be to define all variables outside of a list as the composite key for the aggregation of the contents of the list. For example, given the query:

select ?person ?name where {
  ?person a :Person ;
  :name ?name
}

with solutions:

person  name
:p1     John Abrams
:p2     Tim Brown
:p2     Touchdown Timmy
:p3     William Clark

the same data could be queried with

select object [{
    "@id": ?person,
    "names": [ ?name ]
}]
where {
  ?person a :Person ;
  :name ?name
}

with the result:

[
  {
    "@id": ":p1",
    "names": [ "John Abrams" ]
  },
  {
    "@id": ":p2",
    "names": [ "Tim Brown", "Touchdown Timmy" ]
  },
  {
    "@id": ":p3",
    "names": [ "William Clark" ]
  }
]
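
The grouping rule sketched above can be illustrated in a few lines of Python. This is only a sketch under assumptions: solutions are modelled as plain dicts, and the helper name group_by_key is made up for this illustration and is not part of the proposal. It groups by hashing the key variables, so it does not depend on the solutions being sorted.

```python
def group_by_key(solutions, key_vars, list_var):
    """Collect the values of list_var per distinct combination of key_vars.

    Hypothetical helper: groups are kept in order of first appearance,
    no sorting of the solutions is needed.
    """
    groups = {}
    for row in solutions:
        key = tuple(row[v] for v in key_vars)
        groups.setdefault(key, []).append(row[list_var])
    return groups


# The four solutions from the table above.
solutions = [
    {"person": ":p1", "name": "John Abrams"},
    {"person": ":p2", "name": "Tim Brown"},
    {"person": ":p2", "name": "Touchdown Timmy"},
    {"person": ":p3", "name": "William Clark"},
]

# ?person occurs outside the list, so it is the key; ?name fills the list.
result = [
    {"@id": person, "names": names}
    for (person,), names in group_by_key(solutions, ["person"], "name").items()
]
```

Here result has the same shape as the JSON shown above: one object per distinct ?person, each with its list of names.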

As indicated, this could also work for composite keys, such as with the following query:

select object [{
  "a": ?a,
  "b": ?b,
  "transactions": [{
    "value": ?value,
    "timestamp": ?timestamp
  }]
}]
where {
  ?t a :Transaction ;
     :from ?a ;
     :to ?b ;
     :value ?value ;
     :timestamp ?timestamp .
}

to generate a result such as:

[
  {
    "a": ":p1",
    "b": ":p2",
    "transactions": [
      {"value": 400, "timestamp": "2022-09-07T08:37:41Z"},
      {"value": 800, "timestamp": "2023-05-02T20:17:03Z"}
    ]
  },
  {
    "a": ":p1",
    "b": ":p3",
    "transactions": [
      {"value": 400, "timestamp": "2022-09-08T09:36:44Z"}
    ]
  }
]
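
One way to read the composite-key rule for nested templates is the recursive sketch below. Everything here is an assumption made for illustration: the template is modelled as Python dicts and lists, variables are strings starting with "?", a list template holds exactly one element template, and the names is_var, direct_vars and instantiate are hypothetical. It is one possible interpretation of the semantics, not a definition of them.

```python
def is_var(template):
    return isinstance(template, str) and template.startswith("?")


def direct_vars(template):
    """Variables of a template that are NOT inside a nested list."""
    if is_var(template):
        return [template[1:]]
    if isinstance(template, dict):
        return [v for value in template.values() for v in direct_vars(value)]
    return []  # nested lists and constants contribute no key variables


def instantiate(template, rows):
    """Build a value from rows that agree on the template's direct variables."""
    if is_var(template):
        return rows[0][template[1:]]
    if isinstance(template, dict):
        return {key: instantiate(value, rows) for key, value in template.items()}
    if isinstance(template, list):
        (inner,) = template  # assumption: exactly one element template
        key_vars = direct_vars(inner)
        groups = {}  # composite key -> rows, in order of first appearance
        for row in rows:
            groups.setdefault(tuple(row[v] for v in key_vars), []).append(row)
        return [instantiate(inner, group) for group in groups.values()]
    return template  # constants are copied as-is


template = [{
    "a": "?a",
    "b": "?b",
    "transactions": [{"value": "?value", "timestamp": "?timestamp"}],
}]

# Deliberately unsorted: the two :p1 -> :p2 rows are not adjacent.
rows = [
    {"a": ":p1", "b": ":p2", "value": 400, "timestamp": "2022-09-07T08:37:41Z"},
    {"a": ":p1", "b": ":p3", "value": 400, "timestamp": "2022-09-08T09:36:44Z"},
    {"a": ":p1", "b": ":p2", "value": 800, "timestamp": "2023-05-02T20:17:03Z"},
]

result = instantiate(template, rows)
```

Under these assumptions the outer list is keyed on (?a, ?b) and the inner list on (?value, ?timestamp). A side effect of this reading is that identical elements within one list are collapsed, which may or may not be desirable.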

Note that in this case it's a bit muddy as here also the key in the object is

Further examples

The following query is an example of this idea:

select object {
  "@id": ?thread,
  "title": ?title,
  "last-posts": [{
    "@id": ?post,
    "title": ?postTitle,
    "author": {
      "@id": ?postAuthor
    },
    "timestamp": ?postedOn
  }],
  "top-profiles": [{
    "@id": ?topAuthor
  }],
  "post-count": ?postCount
}
where
{
    ?thread a :Thread ;
    :title ?title .

    lateral {
        select (count(*) as ?postCount)
        where { ?post :postedIn ?thread }
    }

    lateral {
        select ?post
               (?title as ?postTitle)
               (?author as ?postAuthor)
               ?postedOn
        {
            ?post :postedIn ?thread ;
                  :title ?title ;
                  :author ?author ;
                  :postedOn ?postedOn .
        }
        order by desc(?postedOn)
        limit 5
    }

    lateral {
        select (?author as ?topAuthor)
        {
            ?post :postedIn ?thread ;
                  :author ?author .
        }
        group by ?author
        order by desc(count(*))
        limit 3
    }
}

The intent of this query is to retrieve the threads in a forum with their title and total number of posts, together with information on the five latest posts in each thread on the one hand and the top three authors posting in the thread on the other.

Also, I can imagine a syntax where nested subqueries are part of the projection, as supported by some SQL databases such as MySQL.

An example of the latter could be something like:

select object {
  "@id": ?post,
  "title": ( select ?title where { ?post :title ?title } ),

  "thread": (
    select object / construct framed / json / ... {
      "@id": ?thread,
      "title": (select ?title where { ?thread :title ?title })
    }
    where { ?post :postedIn ?thread }
  ),
  
  "author": (
    select object / construct framed / json / ... {
      "@id": ?author,
      "name": (select ?name where { ?author :name ?name }),
      "postCount": (
        select (count(*) as ?count) where { ?author :created [a :Post] }
      ) 
    }
    where { ?post :author ?author }
  )
}

where {
    ?post a :Post ;
          :postedOn ?postedOn .
}

order by desc(?postedOn)
limit 3

The intent of this query is to select the last three posts and, for each, retrieve information on the thread it was posted in as well as the author that created it.

Considerations for backward compatibility

None directly with regards to SPARQL.

Output serialisation

It should be considered whether the output of such a query would always imply a JSON serialisation (as e.g. with the JSON syntax from Apache Jena) or whether this is a more generic object / tree structure which could also map to other serialisation formats.

JSON-compatible formats

Mapping to newline-delimited JSON, CBOR, YAML or other 'JSON-compatible' formats is probably easily supported through content negotiation.
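
As a rough illustration of why this is cheap, the sketch below picks a serialisation for an already-built object tree based on a requested media type. The function name serialize is hypothetical, and "application/x-ndjson" is used here as an assumed (commonly seen, but not officially registered) media type for newline-delimited JSON. CBOR and YAML are left out because they need third-party libraries.

```python
import json


def serialize(result, media_type="application/json"):
    """Serialise an object tree for the requested media type (sketch only)."""
    if media_type == "application/json":
        return json.dumps(result)
    if media_type == "application/x-ndjson":  # assumed media type
        # One JSON document per line; a non-list result becomes a single line.
        items = result if isinstance(result, list) else [result]
        return "".join(json.dumps(item) + "\n" for item in items)
    raise ValueError(f"unsupported media type: {media_type}")


people = [{"@id": ":p1"}, {"@id": ":p2"}]
as_json = serialize(people)
as_ndjson = serialize(people, "application/x-ndjson")
```

The tree itself stays the same in both cases; only the last step differs.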

JSON-LD

Compatibility with JSON-LD could be considered, but it was not my intent to address this. JSON-LD compatible output could definitely be produced by something like the proposed solution.

XML

Mapping to XML is perhaps a bit more difficult in the face of root tags / objects and XML namespaces.
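
To make the difficulty a bit more concrete, here is a minimal sketch of one possible XML mapping. All element names (result, object, list, value) and the key attribute are invented for this illustration. Keys such as "@id" are not valid XML element names, so the sketch keeps them in an attribute, and a root tag has to be chosen that has no counterpart in the object tree.

```python
import xml.etree.ElementTree as ET


def to_xml(value, key=None):
    """Map a JSON-like value to generic XML elements (hypothetical mapping)."""
    if isinstance(value, dict):
        elem = ET.Element("object")
        for child_key, child in value.items():
            elem.append(to_xml(child, key=child_key))
    elif isinstance(value, list):
        elem = ET.Element("list")
        for item in value:
            elem.append(to_xml(item))
    else:
        elem = ET.Element("value")
        elem.text = str(value)  # note: datatype information is lost here
    if key is not None:
        elem.set("key", key)  # object keys go into an attribute
    return elem


root = ET.Element("result")  # the root tag is an arbitrary choice
root.append(to_xml({"@id": ":p1", "names": ["John Abrams"]}))
xml_text = ET.tostring(root, encoding="unicode")
```

Namespaces are not addressed at all in this sketch, which is part of what makes a general XML mapping harder than the JSON-compatible formats.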

frensjan commented May 2, 2023

@VladimirAlexiev
Contributor

@frensjan Thanks for this comprehensive treatment! I'll discuss with some of my colleagues.

I'll just comment on the output format:

  • Figuring out some RDF expression of the output seems useful.
  • We had a desire to enable JSON-LD interpretation of JSON results from GraphQL queries coming out of the Ontotext Platform, having in mind that the data is fetched from RDF. But we still haven't done it, because there are a lot of devils in the details. E.g., what if you use a GraphQL alias, or select a computed field: what RDF prop would you bind to those JSON keys?
  • If we can figure this out, then we should also have a CONSTRUCT form.

@frensjan
Author

Thanks @VladimirAlexiev. I think the main 'contribution' of the example is to a) flexibly perform sub-queries and b) generate 'objects' for easy consumption and not be restricted by the tabular / tuple form of SELECT or the RDF form of CONSTRUCT.

I like GraphQL a lot for its simplicity and flexibility to express (simple) queries. But for the RDF use-cases I'm working on, it's not powerful enough without resorting to a complex schema with a lot of 'custom' fields to express computations.

I think the proposed line of object construction could provide a powerful foundation for a GraphQL layer. But it pushes a large chunk of the implementation into the SPARQL layer. In my view, it does so in a way that gives the interface a lot more expressive power.

As for GraphQL and JSON-LD: the proposed solution can be used to generate JSON-LD, just not 'automatically'. For instance, the blog post example can be queried with something like:

select object {
  "@context": "https://schema.org",
  "@type": "BlogPosting",
  "@id": ?post,
  "headline": ?headline,
  "alternativeHeadline": ?alternativeHeadline,
  "image": ?image,
  "award": ?award,
  ...
  "author": {
    "@type": "Person",
    "name": ?authorName
  }
}
where {
  ?post a :BlogPosting ;
    :headline ?headline ;
    :alternativeHeadline ?alternativeHeadline ;
    :image ?image ;
    :award ?award ;
    ...
    :author [
      a :Person ;
      :name ?authorName
    ] .
}

Perhaps if the verbosity needs to come down, some abbreviation can be introduced. E.g. referencing statement objects (values) through path expressions:

select object {
  "@context": "https://schema.org",
  "@type": "BlogPosting",
  "@id": ?post,
  :headline,
  :alternativeHeadline,
  :image,
  :award,
  ...
  "author": {
    "@type": "Person",
    "name": :author/:name
  }
}
where {
  ?post a :BlogPosting
}

Although it's a bit implicit what the focus node is in this case I'd say.

The GraphQL with JSON-LD combination is attractive, but GraphQL is also a little bit restrictive. SELECT queries have a lot of flexibility in terms of what to pull from the database, but the tabular form is restrictive. CONSTRUCT queries allow for more flexibility in form, but being tied to the RDF model is a tad restrictive and very limiting in terms of integration with other technologies.

As for JSON-LD: I'm not a user, so it's probably a lack of knowledge on my part. I can see it being very convenient to generate from other data environments. Mapping objects in JavaScript, Java, Python, etc. to JSON-LD is probably pretty easy and useful. Consuming it as RDF is easy if there's a parser, but I don't see much benefit in parsing a JSON-LD input over parsing, say, a Turtle input. JSON-LD output is, to me, also not that useful unless you guarantee some sort of shape: if a client can generate code that matches the expected structure, this really helps a developer, e.g. as with OpenAPI / JSON Schema or GraphQL. But I don't see that with just JSON-LD.

@Aklakan

Aklakan commented Oct 11, 2024

We have created a GraphQL based approach that generates SPARQL queries and a corresponding mapping of the result set to JSON.

Our design decision was to make the approach field-centric: you start with the GraphQL field and define it with a SPARQL graph pattern (using @pattern) or a bind expression (using @bind). Each field has a set of source variables and a set of target variables. The advantage (and maybe disadvantage?) of the field-centric approach is that there are no separate clauses for the WHERE part and the projection part.
The GraphQL-queries are self-contained - there is no additional server configuration necessary. So this approach can be seen as a GraphQL-based query form of SPARQL.

The following examples should give an impression of how to use this approach. Note that after executing the GraphQL query of the demos, you can click on the Sparql button to navigate to the SPARQL endpoint and view the underlying SPARQL query there. It makes use of the LATERAL keyword and relies on the order of bindings being preserved (not supported by every SPARQL endpoint).

Documentation and demos can be found here.
The system itself is open source and can be tried out with the latest release (v2.0.0-rc1) of our RDF Processing Toolkit.
We are currently working on allowing use of these annotations on the GraphQL schema such that queries can omit all the directives (@pattern, @bind, ...) if the server is preconfigured with the mappings.

Feedback is very welcome.

Example: Movies

The Movie Browser Demo creates JSON from Wikidata movie data and renders it with plain JavaScript. For the sake of example, the source code of the demo is completely framework free. You can e.g. enter Die Hard and then click on the View JSON in Endpoint button or this link.

query movies @debug
  @prefix(map: {
    rdfs: "http://www.w3.org/2000/01/rdf-schema#",
    xsd: "http://www.w3.org/2001/XMLSchema#",
    schema: "http://schema.org/",
    wd: "http://www.wikidata.org/entity/"
    wdt: "http://www.wikidata.org/prop/direct/"
  })
{
  Movies(limit: 10) @pattern(of: "SELECT ?s { ?s wdt:P31 wd:Q11424 . FILTER (exists { ?s rdfs:label ?l . FILTER(langMatches(lang(?l), 'en')) FILTER(CONTAINS(LCASE(STR(?l)), LCASE(''))) }) }") {
    id          @bind(of: "?s")
    label       @one @pattern(of: "?s rdfs:label ?l. FILTER(LANG(?l) = 'en')")
    genres           @pattern(of: "SELECT DISTINCT ?s (STR(?l) AS ?x) { ?s wdt:P136/rdfs:label ?l . FILTER(langMatches(lang(?l), 'en')) }")
  }
}

{
  "data": {
    "Movies": [
      {
        "id": "http://www.wikidata.org/entity/Q1000094",
        "label": "You\u0027re Dead",
        "genres": [
          "comedy film",
          "thriller film"
        ]
      }
    ]
  }
}

Example: Nested subjects, predicates and objects

Demo

query moviesSPO @debug
  @prefix(map: {
    wd: "http://www.wikidata.org/entity/"
    wdt: "http://www.wikidata.org/prop/direct/"
  })
{
  subjects(limit: 10) @pattern(of: "?s1 wdt:P31 wd:Q11424", to: "s1") @index(by: "?s1", oneIf: "true") {
    predicates @pattern(of: "?s2 ?p2 ?o2", from: "s2", to: ["s2", "p2"]) @index(by: "?p2", oneIf: "true") {
      objects @pattern(of: "?s3 ?p3 ?o3", from: ["s3", "p3"], to: "o3")
    }
  }
}

{
  "data": {
    "http://www.wikidata.org/entity/Q1000094": {
      "http://schema.org/description": {
        "objects": [
          "1999 film directed by Andy Hurst"
        ]
      },
      "http://www.w3.org/2000/01/rdf-schema#label": {
        "objects": [
          "You\u0027re Dead"
        ]
      },
      "http://www.wikidata.org/prop/direct/P136": {
        "objects": [
          "http://www.wikidata.org/entity/Q157443",
          "http://www.wikidata.org/entity/Q2484376"
        ]
      }
    }
  }
}

Example: Custom GeoJSON Mapping

A demo for creating custom JSON structures, such as GeoJSON.
