As far as I can tell, your question touches on three topics:
N-Triples literal characters that have to be escaped. The N-Triples specification states that "An N-Triples document is a Unicode [UNICODE] character string encoded in UTF-8" and that a literal is (skipping the outer quotes for readability) ([^#x22#x5C#xA#xD] | ECHAR | UCHAR)*. This means that any character other than the four excluded by [^#x22#x5C#xA#xD] (double quote, backslash, line feed and carriage return) may appear in an N-Triples literal as-is; only those four must be escaped. The UCHAR part of the definition also allows characters to be written in escaped form, but that is only an option, and for both performance and readability reasons it is just not worth writing characters in the unicode-escaped form when serializing an N-Triples literal.
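To make the point concrete, here is a minimal sketch in Python (not hardf's PHP code, and not its API) of a serializer that escapes only the four characters the grammar forbids inside a quoted literal, using their ECHAR forms, and passes every other character through unchanged. The helper name `escape_literal` is hypothetical, and the resulting line still has to be written out encoded as UTF-8.

```python
import re

# The four characters excluded by [^#x22#x5C#xA#xD] in a quoted literal,
# mapped to their ECHAR escape sequences.
_ECHAR_FOR = {
    '"': '\\"',
    '\\': '\\\\',
    '\n': '\\n',
    '\r': '\\r',
}
_MUST_ESCAPE = re.compile(r'["\\\n\r]')


def escape_literal(value: str) -> str:
    """Quote `value` as an N-Triples literal (hypothetical helper).

    Only the four forbidden characters are escaped; every other code point,
    including control characters such as U+0000 and non-ASCII text, is kept
    as-is and relies on the document being encoded as UTF-8.
    """
    body = _MUST_ESCAPE.sub(lambda m: _ECHAR_FOR[m.group(0)], value)
    return '"' + body + '"'


if __name__ == "__main__":
    triple = ('<http://example.org/s> <http://example.org/p> '
              + escape_literal('say "hi"\ncafé') + ' .\n')
    # Encode the finished line as UTF-8 before writing it to a file.
    print(triple.encode("utf-8"))
```

Non-ASCII text and control characters cost nothing extra here, which is the performance and readability argument made above.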
Parsing N-Triples literal characters. The parser must handle every variant described by the specification, including the unicode-escape syntax. But here we are talking about the writer, and as far as I know the hardf reader does support unicode escapes.
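On the reading side, supporting the unicode-escape syntax means decoding UCHAR (`\uXXXX` and `\UXXXXXXXX`) alongside the ECHAR escapes. A minimal Python sketch, assuming the outer quotes have already been stripped. The helper name is hypothetical, and this is not how the hardf reader is implemented.

```python
import re

# ECHAR escapes defined by the N-Triples grammar: \t \b \n \r \f \" \' \\
_ECHAR = {
    't': '\t', 'b': '\b', 'n': '\n', 'r': '\r', 'f': '\f',
    '"': '"', "'": "'", '\\': '\\',
}
# A single left-to-right pass: UCHAR with 4 or 8 hex digits, or one ECHAR.
_ESCAPE = re.compile(r'\\(?:u([0-9A-Fa-f]{4})|U([0-9A-Fa-f]{8})|([tbnrf"\'\\]))')


def unescape_literal_body(body: str) -> str:
    """Decode ECHAR and UCHAR sequences in a literal body (hypothetical helper)."""
    def replace(m: re.Match) -> str:
        hex_digits = m.group(1) or m.group(2)
        if hex_digits is not None:
            # chr() raises ValueError for values above 0x10FFFF.
            return chr(int(hex_digits, 16))
        return _ECHAR[m.group(3)]

    return _ESCAPE.sub(replace, body)
```

Because the pattern consumes an escaped backslash as one unit, an input such as `\\u0041` decodes to a literal backslash followed by `u0041`, not to `A`.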
The SPARQL syntax is not fully in line with either N-Triples or Turtle (see your other question here).
We are looking for possible replacements for easyrdf, and hardf was the first one we came across. However, after deep testing we found that, although it is way faster than easyrdf (now maintained under sweetrdf), not all edge cases are covered.
What concerns us is the N-Triples format. The TriGWriter escapes only a limited set of characters: those matching its escape pattern, which are then replaced using its replacements array.
However, this leaves a huge list of characters that can make it misbehave.
According to https://www.w3.org/TR/rdf-testcases/#ntrip_strings, many other characters need escaping.
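The linked RDF Test Cases section describes the older, ASCII-only N-Triples, where every character outside printable ASCII must be escaped. RDF 1.1 N-Triples is UTF-8 and drops that requirement. A minimal Python sketch of the ASCII-only scheme, following the escape table in that section as we read it. The helper name is hypothetical, and this is not hardf code.

```python
def escape_ascii_ntriples(value: str) -> str:
    """Escape a string following the RDF Test Cases N-Triples string table.

    The output is pure US-ASCII. Printable ASCII is kept; tab, LF, CR, double
    quote and backslash get short escapes; every other character gets a
    4-digit or 8-digit uppercase hex escape.
    """
    short = {0x09: '\\t', 0x0A: '\\n', 0x0D: '\\r', 0x22: '\\"', 0x5C: '\\\\'}
    out = []
    for ch in value:
        cp = ord(ch)
        if cp in short:
            out.append(short[cp])
        elif 0x20 <= cp <= 0x7E:
            out.append(ch)  # printable ASCII is written unchanged
        elif cp <= 0xFFFF:
            out.append('\\u%04X' % cp)  # remaining #x0-#x1F, and #x7F-#xFFFF
        else:
            out.append('\\U%08X' % cp)  # #x10000-#x10FFFF
    return ''.join(out)


if __name__ == "__main__":
    # Every one of the first 256 code points comes out as plain ASCII.
    for cp in range(256):
        assert escape_ascii_ntriples(chr(cp)).isascii()
```

The output is also valid RDF 1.1 N-Triples, because the hex forms are UCHAR and the short forms are ECHAR. A compliant reader therefore decodes it back to the same string. The trade-off is larger and less readable output.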
I created a small script that tests just the first 255 characters, and the results do not look good. The script is below.
I was able to support all of the first 127 characters by altering the escape pattern and the replacements array (mainly adding \0, which is the end of stream if I am not mistaken).
However, a few characters past 128, the inserted string becomes "0" or the insert fails. With easyrdf, inserting large blobs of text takes way, WAY longer, but you know, performance that costs integrity is not really performance.
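One possible cause is worth ruling out, though this is only an assumption because the test script is not shown. Code points U+0080 to U+00FF take two bytes in UTF-8. A test that builds each character as a single raw byte therefore produces input for values 128 to 255 that is not valid UTF-8 at all. PHP's `chr()`, for example, returns a one-byte string, whereas `mb_chr()` encodes a code point. A small Python sketch of the difference (helper names are hypothetical):

```python
def utf8_bytes_for(cp: int) -> bytes:
    """Proper UTF-8 encoding of a single code point."""
    return chr(cp).encode("utf-8")


def is_valid_utf8(data: bytes) -> bool:
    """True if `data` decodes as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


if __name__ == "__main__":
    for cp in (0x7F, 0x80, 0xE9, 0xFF):
        raw = bytes([cp])             # one byte per "character", Latin-1 style
        encoded = utf8_bytes_for(cp)  # what a UTF-8 N-Triples file must contain
        print(hex(cp), raw, is_valid_utf8(raw), encoded, is_valid_utf8(encoded))
```

Below 128 the two forms coincide, which would match a test that passes for the first 127 characters and breaks just past 128.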
Our use case is a CMS where users have a WYSIWYG editor into which they can paste anything, so a careless copy/paste can introduce one of these characters. But in case it is an intended character, we want to avoid removing it.
My question is: are we misusing this library? Are there known (or unknown) limits to it? Or is there an intended philosophy of not treating non-printable/special characters as part of a supported string?