-
Notifications
You must be signed in to change notification settings - Fork 19
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Pathfinding/More Expressive Path Queries #191
Comments
GraphDB's implementation is described at https://graphdb.ontotext.com/documentation/10.3/graph-path-search.html.
Eg the "Kevin Bacon" query looks like this:
Results are returned as RDF-star triples:
@mgberg We'll appreciate feedback on whether you like our approach, and ideas for potential improvement. |
@VladimirAlexiev That approach is fine given the current SPARQL 1.1 specification, but that's just an Ontotext specific feature that takes advantage of federation to inject functionality that SPARQL doesn't have right now. Other vendors and systems also use federation in similar ways to add similar capabilities. However, one of the reasons SPARQL is so powerful is that all triple stores implement the same query language specification (at a minimum), which is one of the reasons why it is so easy to query across disparate datasets (which may be in different triple store implementations) using SPARQL. The point of this feature request is explicitly to make pathfinding queries a part of the SPARQL 1.2 specification (or another future version), which would likely involve adding new keywords for simplicity and expressiveness. This would make SPARQL more useful and powerful for everyone, everywhere, on all SPARQL processors compliant with that new version, all the time. |
Can you point to and briefly describe the features in other implementations you know? |
Linking to #42 which was the first mention of PATH IIRC |
Not completely sure if I get the syntax but it looks like AllegroGraph provides at least some PATH like search as well https://franz.com/agraph/support/documentation/current/magic-properties.html#Paths There are other graph algorithms here: https://franz.com/agraph/support/documentation/current/magic-properties.html#magic-prop-list |
There was Cray Urika/Cray Graph Engine, at least before HP bought Cray. From what I could find it looks like this is discontinued by now. Interestingly, they had quite a lot of graph functions in it and it was integrated in SPARQL as well, even though it looks like it was quite heavily extended:
|
IIRC @rvesse worked for Cray, maybe he has some experience to share? |
A quick comment about AllegroGraph's implementation- I've used it, it works. The part that I don't like (as far as an extension to the SPARQL spec is concerned) is that you first have to define a generator, which is the triple pattern used to determine the next set of "neighbors" in the path, as an RDF resource, store it in the triple store, and reference it when using a pathfinding magic property. In other words, to do path queries, you either need to have write access to the database to define whatever generator you need, or hope that one matching your query has already been defined. One of the reasons I used the Stardog path query syntax as an example is because you provide your predicate or pattern in the query (the Admittedly the syntax was a bit odd to me when I first saw it, but once I learned how it worked it made a lot of sense and I was very excited by the capability it provided. |
Yeah it was a giant syntactic kludge to work within the confines of a SPARQL parser. The only path like algorithms we had were simply for calculating the length of the path between two nodes in the graph, we didn't do anything with returning intermediate nodes. We never actually supported SPARQL 1.1 property paths fully (see CGE Property Path Support) we effectively just flattened simple paths, i.e. fixed length paths, into their equivalent BGPs, and we optionally provided emulation of more complex paths by rewriting them into fixed length expansions of the path, i.e. exploring a path up to N steps where N was configurable. None of our customers ever wanted/needed full property path support. This issue probably has some relation to #6 because you're effectively wanting to call a "function" that will return multiple values for each input row where that variable length list of values denotes the discovered path |
+1 to connections to #6. |
As an additional note- this kind of capability would enable a lot of use cases that SPARQL doesn't handle very well involving "recursive" graph traversal operations that are often solved by magic properties or postprocessing the output of a CONSTRUCT query, such as:
|
Why?
There are several reasons a built-in pathfinding query capability would be useful. For example:
Previous work
There are several vendors/systems that implement pathfindng/more expressive path queries, and they do so in a number of different ways. For example, some extend the SPARQL query syntax, while others take advantage of SPARQL's federation capabilities to create a special service URI that performs the pathfinding operations.
Stardog has (in my opinion) a fantastic implementation of this but with (at least) one flaw. Their implementation can do both pathfinding and more expressive path queries, and is expressed in a very intuitive and SPARQL-friendly way (again, in my opinion). Paths can be provided by either a single predicate, a property path, or a SPARQL query expression. Start and/or end nodes can either be bound or unbound. The documentation can be found here: https://docs.stardog.com/query-stardog/path-queries
The one flaw that sticks out to me is that the only length metric they support is the number of "hops" in the path. If the path is specified by a SPARQL pattern, it would be nice to be able to project out a third variable specifying the value to use for the "length" of that connection. This would enable it to optionally behave more like pathfinding function in a property graph where the length metric is an edge property. I mentioned it in a post in the Stardog Community here: https://community.stardog.com/t/specify-path-length-in-path-queries/4583
Below is an example copied from the linked documentation above- see the documentation for more examples.
There is an implicit relationship between actors based on the movies they appeared together. We can use a basic graph pattern with multiple triple patterns in the path query to extract this information:
PATHS START ?x = :Kevin_Bacon END ?y = :Robert_Redford VIA { ?movie a :Film ; :starring ?x , ?y }
Note that this query filters the intermediate nodes; only the connections via movies are desired, not TV series or other types of works.
This query might return:
If the filter on movies is irrelevant, then a more concise version can be used:
PATHS START ?x = :Kevin_Bacon END ?y = :Robert_Redford VIA ^:starring/:starring
Considerations for backward compatibility
The primary issue with backward compatibility I can think of is that it would require extending the SPARQL result format, and likely intermediate query engine internal representations, as this requires storing and returning lists and/or objects of results, or subtables of results, as a first class citizen. If that was supported, then this would be easier. I made an issue about this a while back (#151), but there are other similar feature requests too.
The text was updated successfully, but these errors were encountered: