Certain nesting pattern causes oscillation in "sorted" Turtle state #49
@ajnelson-nist thanks for that, but I was not able to reconstruct your problem.
Then I serialised this outcome again, but I got exactly the same file.
My apologies, I forgot to give my whole command. It's these lines of the accompanying Makefile:

.check-%.ttl: \
  %.ttl \
  $(RDF_TOOLKIT_JAR)
	java -jar $(RDF_TOOLKIT_JAR) \
	  --inline-blank-nodes \
	  --source $< \
	  --source-format turtle \
	  --target _$@ \
	  --target-format turtle
	mv _$@ $@

The relevant detail I should have included: I am using the
@ajnelson-nist I have it now - it is odd, we will look into it shortly.
This was a tough one, but I hope I got to the bottom of the issue you reported. In a nutshell, it is a non-feature, which cannot be fixed without major changes in the serialisation algorithm or perhaps cannot be fixed simpliciter. A longer explanation is that the serialiser sorts an RDF graph by sorting all its triples, and triples are sorted first with respect to their subjects, then with respect to their predicates, then with respect to their objects.
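The subject/predicate/object ordering described above can be sketched in a few lines. This is only an illustration, not RDF Toolkit's code: it assumes every term is a plain string, and `Triple` and `sort_triples` are hypothetical names. Blank nodes have no stable identifier to compare, which is why the real serialiser has to fall back on their contents.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """A triple whose terms are plain strings (an assumption for this sketch)."""
    subject: str
    predicate: str
    object: str


def sort_triples(triples):
    # Named tuples compare field by field, so this orders by subject first,
    # then by predicate, then by object.
    return sorted(triples)


triples = [
    Triple("ex:b", "ex:p", "ex:z"),
    Triple("ex:a", "ex:q", "ex:y"),
    Triple("ex:a", "ex:p", "ex:y"),
    Triple("ex:a", "ex:p", "ex:x"),
]

for t in sort_triples(triples):
    print(t.subject, t.predicate, t.object)
```

With IRIs this ordering is deterministic; the trouble starts only when the compared terms are blank nodes.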
It is 2(ii) that causes the odd behaviour you reported. Now the full explanation is rather convoluted. First take these bnodes:
B. specified in lines 34-37:
Suppose that you need to order/sort them - how can you do this? A similar sorting occurs for bnodes in lines 21-24 and 38-41. Then the serialiser gets to sort bnodes:
D. specified in lines 33-42:
Both bnodes are lists, so the serialiser sorts them on the basis of their members. The first member of D, i.e., bnode B, is sorted before the first member of C, i.e., bnode A, therefore D gets sorted before C. Etc. Finally, the serialiser tries to sort the "top" bnodes:
F. specified in lines 29-45:
They are not lists, but they occur as subjects in three respective triples:
Then the second run of the serialiser performs a similar operation, but this time BFO_0000176 is swapped with BFO_0000186. I suspect that the only way out is to use a radically different sorting algorithm, e.g., the one discussed in 10.1109/Blockchain55522.2022.00069, but even then we may encounter some odd artefacts.
Sorry to be necromancing, but this issue just got fixed in turtle-formatter which suffered from it as well. The approach was to listen to the parse process and build a list of blank nodes in the order they are encountered by the parser. This allows you to reproduce the ordering of blank nodes when needed. turtle-formatter is based on Jena, so you cannot use the fix applied there, but I think all you have to do is add an
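A minimal sketch of the parse-order idea, under stated assumptions: the parser is modelled as a plain sequence of string triples, blank nodes are recognised by a `_:` prefix, and `BlankNodeOrder` and `record_order` are hypothetical names. It does not use the Jena or RDF4J APIs, and it is not the code from turtle-formatter.

```python
class BlankNodeOrder:
    """Remembers the position at which each blank node label was first seen."""

    def __init__(self):
        self._index = {}

    def observe(self, label):
        # setdefault keeps the existing position if the label was seen before.
        self._index.setdefault(label, len(self._index))

    def key(self, label):
        # Sort key: position of first appearance in the parsed input.
        return self._index[label]


def record_order(parsed_triples):
    """Walk triples in parse order and record every blank node encountered."""
    order = BlankNodeOrder()
    for subject, _predicate, obj in parsed_triples:
        for term in (subject, obj):
            if term.startswith("_:"):
                order.observe(term)
    return order


# Hypothetical parser output, listed in the order the parser emitted it.
parsed = [
    ("_:n7", "sh:path", "obo:BFO_0000176"),
    ("_:n2", "sh:path", "obo:BFO_0000186"),
    ("_:n7", "sh:not", "_:n5"),
]

order = record_order(parsed)
print(sorted(["_:n2", "_:n5", "_:n7"], key=order.key))
```

Because the key depends only on where a blank node first appeared in the input, two blank nodes that are otherwise indistinguishable keep their relative order, so a second run over the output has nothing to swap.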
@fkleedorfer, thanks for your insight. We will look into this in due time, which may not happen soon :(.
Hello,
I do some work with an open source community that has data maintenance processes that include using RDF Toolkit to normalize Turtle text.
Unfortunately, a few times we have encountered issues with nested blank nodes triggering some kind of incorrect sorting behavior. Most recently, this came up in a SHACL shape for a class that constrains values for two properties in a matching way. The constraint is that the properties' values need to be a class-member of neither class X nor class Y. Implementing this logic in SHACL ends up causing a 4-deep to 5-deep reference path of blank nodes. (Alternative SHACL spellings are available using IRI-identified nodes to sidestep this issue with RDF Toolkit, but I do not think that would be an appropriate influence of a normalization tool on data management.)
Something about this structure (probably the blank node subgraphs being isomorphic except for one individual reference in the top blank node, and having an rdf:List deeper down) confuses RDF Toolkit's sorting, and causes a reliable oscillation between "sorted" states.

Here is a minimally reproducing example Turtle graph, excerpted from this file:
The problem observed is that running the file in this state through RDF Toolkit, with --source-format and --target-format both Turtle, causes the two lines with sh:path in them (values obo:BFO_0000176 and obo:BFO_0000186) to swap places. Running that state through RDF Toolkit returns the original input.

I encountered this issue today using Java 18, rdf-toolkit.jar v1.14.2. We've previously encountered this in Java Temurin v11, on the first version (or something close to there in time) of rdf-toolkit.jar that had dropped support for Java 8.
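One way to check a normalizer for this kind of oscillation is to apply it repeatedly and look for a repeated state. The sketch below is an assumption-laden illustration: `find_cycle` is a hypothetical helper, and `normalize` stands for any text-to-text function (for example a wrapper that runs the normalizer on a file and returns the output). The two toy normalizers are stand-ins, not RDF Toolkit.

```python
def find_cycle(normalize, text, max_runs=10):
    """Apply normalize repeatedly and return the length of the cycle reached.

    A result of 1 means the output settled on a fixed point; 2 means it
    alternates between two states. Returns None if no state repeats
    within max_runs applications.
    """
    seen = {text: 0}  # state -> number of runs after which it appeared
    current = text
    for run in range(1, max_runs + 1):
        current = normalize(current)
        if current in seen:
            return run - seen[current]
        seen[current] = run
    return None


def stable(text):
    # Toy normalizer that sorts lines; applying it twice changes nothing.
    return "\n".join(sorted(text.splitlines()))


def swapping(text):
    # Toy normalizer that reverses lines; it alternates between two states.
    return "\n".join(reversed(text.splitlines()))


print(find_cycle(stable, "b\na"))
print(find_cycle(swapping, "b\na"))
```

A check like this could run in continuous integration next to the normalization step, failing the build whenever the reported cycle length is greater than 1.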