
Standardizing on XSD 1.1 datatypes #93

Open
ewg118 opened this issue May 28, 2019 · 10 comments
Labels
help wanted Extra attention is needed

Comments

@ewg118

ewg118 commented May 28, 2019

Why?

As someone who works with BC dates in much of our material, I see a significant disconnect between the ISO standards for date encoding and XSD 1.0, specifically in individual software implementations of SPARQL.

For example:

ISO 8601 dictates that AD 1 is "0001" and 1 BC is "0000"; 2 BC is "-0001". The xsd:gYear datatype in XSD 1.0, however, does not allow "0000": 1 BC is "-0001". I have been using Apache Fuseki for the last 6-7 years, and it has remained compliant with XSD 1.0. If you attempt to post RDF with an xsd:gYear value of "0000", an error results. I do not know what other SPARQL endpoints do.
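The difference is a fixed shift of one for BC years. A minimal Python sketch of both directions, assuming plain integer years with no timezone (the function names are illustrative, not taken from any library):

```python
def xsd10_to_iso(year: int) -> int:
    """Map an XSD 1.0 gYear value to ISO 8601 / XSD 1.1 numbering.

    XSD 1.0 has no year zero: -1 is 1 BC, -2 is 2 BC, and so on.
    ISO 8601 and XSD 1.1 use 0 for 1 BC, -1 for 2 BC, and so on.
    """
    if year == 0:
        raise ValueError("year 0000 is not a valid XSD 1.0 year")
    # Only BC years shift; AD years are numbered identically.
    return year + 1 if year < 0 else year


def iso_to_xsd10(year: int) -> int:
    """Inverse of xsd10_to_iso: ISO 8601 / XSD 1.1 year -> XSD 1.0 year."""
    return year - 1 if year <= 0 else year


print(xsd10_to_iso(-300))  # 300 BC in XSD 1.0 numbering
print(iso_to_xsd10(0))     # 1 BC in ISO 8601 numbering
```

This prints -299 and then -1: the same two years, each written in the other scheme.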

Adding to the confusion, XSD 1.1 adheres more closely to ISO 8601, so "0000"^^xsd:gYear is 1 BC. If you are not sure which version of XSD the endpoint you are extracting data from follows, you are left wondering whether the value you are getting is 300 or 299 BC. Then, if you want to visualize these dates in JavaScript (which follows ISO 8601), you have to add 1 to negative years (assuming XSD 1.0) so that the visualization corresponds to the actual year.
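To make the 300 vs. 299 BC ambiguity concrete, here is a hypothetical helper (not from any SPARQL engine) that labels the same timezone-free lexical value under each XSD version:

```python
def describe_year(lexical: str, xsd_version: str) -> str:
    """Label an xsd:gYear lexical year (no timezone) as 'N AD' or 'N BC'.

    xsd_version is "1.0" (no year zero) or "1.1" (ISO 8601 numbering).
    """
    year = int(lexical)
    if xsd_version == "1.0":
        if year == 0:
            raise ValueError("0000 is not a valid XSD 1.0 gYear")
        return f"{year} AD" if year > 0 else f"{-year} BC"
    if xsd_version == "1.1":
        # ISO 8601 numbering: 0 is 1 BC, -1 is 2 BC, ...
        return f"{year} AD" if year > 0 else f"{1 - year} BC"
    raise ValueError(f"unknown XSD version: {xsd_version}")


for version in ("1.0", "1.1"):
    print(version, describe_year("-0299", version))
```

The same literal "-0299" is 299 BC under XSD 1.0 and 300 BC under XSD 1.1, which is exactly why a consumer needs to know which version the data was written for.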

See discussions d3/d3-time-format#28 and sosol/sosol#114

Previous work

I am not sure to what extent XSD 1.1 compliance has been discussed within the SPARQL community. This issue primarily affects BC dates, so it applies to only a small segment of our community.

Proposed solution

Making matters more complicated is the fact that the W3C XSD group elected to use the same namespace for XSD 1.1 (https://www.w3.org/TR/xmlschema11-1/#xsd-nss).

However, they did declare a full list of URIs for various schema versions (https://www.w3.org/TR/xmlschema11-1/#nonnormative-language-ids). In this list is a URI for XML Schema Definition Language 1.1: http://www.w3.org/XML/XMLSchema/v1.1

Perhaps the solution is to assume the default namespace of http://www.w3.org/XML/XMLSchema is 1.0 and 1.1 should be explicitly declared with the above URI. If the versions of XSD dates can be differentiated by namespace, it would be possible to perform some basic math upon ingestion so that all dates are 1.1 compliant when conducting SPARQL queries.
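A minimal sketch of the "basic math upon ingestion" idea. It assumes the loader somehow learns which XSD version a value was written for; the `schema_version_uri` parameter is hypothetical, since the proposal does not yet say how that would be conveyed:

```python
from typing import Optional

# Language identifier for XSD 1.1 from the W3C list cited above.
XSD_V11 = "http://www.w3.org/XML/XMLSchema/v1.1"


def normalize_gyear(lexical: str, schema_version_uri: Optional[str] = None) -> str:
    """Rewrite a timezone-free xsd:gYear lexical value into XSD 1.1 numbering.

    Following the proposal, a value is treated as XSD 1.0 unless it is
    explicitly declared as XSD 1.1 (schema_version_uri is hypothetical).
    """
    year = int(lexical)
    if schema_version_uri != XSD_V11:
        # Assumed XSD 1.0: there is no year 0, and BC years are
        # numbered one lower than in XSD 1.1.
        if year == 0:
            raise ValueError("0000 is not a valid XSD 1.0 gYear")
        if year < 0:
            year += 1
    sign = "-" if year < 0 else ""
    return f"{sign}{abs(year):04d}"


print(normalize_gyear("-0300"))           # undeclared, so treated as XSD 1.0
print(normalize_gyear("-0300", XSD_V11))  # declared XSD 1.1, left as-is
```

After this step every stored value follows XSD 1.1, so queries never need to know where a value came from.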

But the SPARQL 1.2 community does need to decide whether or not to fully support XSD 1.1 datatypes, so that software implementations can apply these recommendations consistently across all platforms.

Considerations for backward compatibility

Without a separate namespace to differentiate between datatype versions, per-project/endpoint documentation seems like the only way to ensure backward compatibility.

@ewg118
Author

ewg118 commented May 28, 2019

Somewhat related to #32 and #55

@JervenBolleman
Collaborator

This is a very good question, and I think it was answered in part in the datatypes section of RDF 1.1, which says RDF 1.1 uses the XSD 1.1 datatypes.

That could mean we just need to tell everyone implementing RDF 1.1 to update their functions to match RDF 1.1. @afs, can you remember whether this was discussed at the time in either WG?

JervenBolleman added the "help wanted" (Extra attention is needed) label on May 28, 2019
@VladimirAlexiev
Contributor

Live another day, learn another lesson (and they are mostly painful ;-)

It would be easy for a SPARQL endpoint to report this in the SPARQL Service Description (SD); we just need to agree on a feature URL for it. But SD is not widely implemented; e.g. GraphDB doesn't support it.

@JervenBolleman
Collaborator

I don't think a feature flag is sufficient, mostly because from RDF 1.0 to 1.1 the meaning of the literal "0000"^^xsd:gYear changed. That means this community, dealing with historical data, has one more problem :( i.e. "-0002" in XSD 1.0 == "-0001" in XSD 1.1. This is something wider than SPARQL endpoints, and I would rather have us all upgrade to the XSD 1.1 semantics, including in our functions, than have this weird half/half situation.

@ericprud
Member

That retroactively changes the meaning of "0000"^^xsd:gYear. I think it's worth a little outreach on [email protected] to see who's going to get (more) screwed by this, but I basically support this. This decision was made unconsciously (as best I recall) by the SPARQL 1.1 WG ~7 years ago and now we just have to find the best way to clean up the unintended consequences.

@lisp
Contributor

lisp commented May 29, 2019

@ewg118 : would it be possible for you to construct a set of discriminating tests?
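One possible shape for such tests, sketched in Python: a small table of lexical values whose meaning differs between XSD 1.0 and 1.1, plus a classifier. The `interpret` callable stands in for whatever a real harness would wrap around an engine (for example, loading the literal and reading back the year); that wrapper is an assumption and is not shown.

```python
from typing import Callable, Optional

# (lexical xsd:gYear, expected ISO 8601 year under XSD 1.0,
#  expected ISO 8601 year under XSD 1.1); 0 == 1 BC, None == must be rejected.
CASES = [
    ("0001", 1, 1),
    ("0000", None, 0),
    ("-0001", 0, -1),
    ("-0002", -1, -2),
    ("-0300", -299, -300),
]

Interpreter = Callable[[str], Optional[int]]


def reference_xsd10(lexical: str) -> Optional[int]:
    """Reference XSD 1.0 reading, converted to ISO 8601 numbering."""
    year = int(lexical)
    if year == 0:
        return None  # 0000 is not a legal XSD 1.0 year
    return year + 1 if year < 0 else year


def reference_xsd11(lexical: str) -> Optional[int]:
    """Reference XSD 1.1 reading: already ISO 8601 numbering."""
    return int(lexical)


def classify(interpret: Interpreter) -> str:
    """Report which XSD version's gYear semantics an implementation follows."""
    if all(interpret(lex) == v10 for lex, v10, _ in CASES):
        return "1.0"
    if all(interpret(lex) == v11 for lex, _, v11 in CASES):
        return "1.1"
    return "neither"


print(classify(reference_xsd10), classify(reference_xsd11))
```

Every case except "0001" discriminates, so a single failing row is enough to tell the two behaviours apart.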

@afs
Collaborator

afs commented May 29, 2019

@afs can you remember if this was discussed at the time in either WG?

xsd:gYear is not a supported datatype so the question is a bit "N/A".

I agree with @ericprud that some outreach would be a good step, though we should first collect all the changes for "dependency upgrades" in one place on the wiki and review them together.

Would someone like to offer to make that wiki page happen?

It's not an optional-flag issue; it's about the data, not the engine. Feed the same data into two engines and you get different results.

Jena treats "0000"^^xsd:gYear as illegal, using a parser in the Jena codebase (the code is extracted from Apache Xerces), but there is a related issue: calculations use javax.xml.datatype.XMLGregorianCalendar, which follows XSD 1.0.
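For comparison, the lexical difference can be approximated with regular expressions based on the gYear definitions in the two specs. This is a sketch, not Jena's or Xerces' code: XSD 1.0 forbids the year 0000, while XSD 1.1 accepts it.

```python
import re

# Optional timezone: Z, or +hh:mm / -hh:mm with offsets up to 14:00.
_TZ = r"(?:Z|[+-](?:(?:0[0-9]|1[0-3]):[0-5][0-9]|14:00))?"

_PATTERNS = {
    # XSD 1.0: four or more digits, no extra leading zeros, and never 0000.
    "1.0": re.compile(r"-?(?:[1-9][0-9]{3,}|0(?!000)[0-9]{3})" + _TZ),
    # XSD 1.1: same shape, but 0000 (1 BC) is allowed.
    "1.1": re.compile(r"-?(?:[1-9][0-9]{3,}|0[0-9]{3})" + _TZ),
}


def is_valid_gyear(lexical: str, version: str) -> bool:
    """Check an xsd:gYear lexical form against the given XSD version."""
    return _PATTERNS[version].fullmatch(lexical) is not None


for value in ("0000", "-0001", "2019Z"):
    print(value, is_valid_gyear(value, "1.0"), is_valid_gyear(value, "1.1"))
```

Only "0000" splits the two versions lexically. The other shifted values parse under both, which is why the problem shows up in calculations rather than only at parse time.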

Maybe other language ecosystems are more up-to-date.

Before defining a whole new set of datatypes and associated functions, we need to think through the implementation impact on engines at all levels of engineering resources. If that means writing the arithmetic for much of F&O, that's not a small thing.

(Jena has all of XSD and F&O that apply. This issue would be a point fix ... @ewg118 - send in a patch!)

@JervenBolleman
Collaborator

@afs for Java you can move to the java.time packages, which are nicer ;)
This applies to the other date/time strings as well, so it is not just xsd:gYear.
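Since xsd:date, xsd:dateTime, xsd:gYearMonth and xsd:gYear all start with the same year field, one hedged way to migrate stored XSD 1.0 values is to rewrite just that leading field. A sketch, assuming well-formed input; the function name is illustrative, and it does not re-validate the rest of the value:

```python
import re

# Leading (optionally negative) year field shared by the date/time types.
_YEAR = re.compile(r"^(-?)([0-9]{4,})")


def xsd10_to_xsd11_lexical(lexical: str) -> str:
    """Rewrite the year field of an XSD 1.0 date/time lexical value into
    XSD 1.1 (ISO 8601) numbering, leaving the rest of the string untouched."""
    m = _YEAR.match(lexical)
    if m is None:
        raise ValueError(f"no leading year in {lexical!r}")
    sign, digits = m.groups()
    year = int(digits)
    if year == 0:
        raise ValueError("year 0000 is not valid in XSD 1.0")
    if sign:
        year = -year + 1  # 1 BC: -1 -> 0, 2 BC: -2 -> -1, ...
    out_sign = "-" if year < 0 else ""
    return f"{out_sign}{abs(year):04d}" + lexical[m.end():]


print(xsd10_to_xsd11_lexical("-0044-03-15"))
print(xsd10_to_xsd11_lexical("-0001-01-01T00:00:00Z"))
```

This prints "-0043-03-15" and "0000-01-01T00:00:00Z". AD values pass through unchanged.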

@afs
Collaborator

afs commented May 30, 2019

java.time is nicer but it isn't so XML/F&O centric.

@afs
Collaborator

afs commented Jun 1, 2019

Is this the set of datatypes we are discussing:
The ones to include are datatypes under xs:anyAtomicType which are not XML-related.

Primitive datatypes
https://www.w3.org/TR/xmlschema11-2/#built-in-primitive-datatypes
without xs:QName and xs:NOTATION

and the derived types of decimal, duration:
https://www.w3.org/TR/xmlschema11-2/#ordinary-built-ins
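The candidate set described above can be written out explicitly. This sketch lists the XSD 1.1 primitives minus xs:QName and xs:NOTATION, plus the built-ins derived from decimal and duration. By construction it leaves out other derived types such as xs:dateTimeStamp and the string-derived types, including xs:normalizedString.

```python
# XSD 1.1 primitive datatypes (Part 2, built-in primitive datatypes).
PRIMITIVES = {
    "string", "boolean", "decimal", "float", "double", "duration",
    "dateTime", "time", "date", "gYearMonth", "gYear", "gMonthDay",
    "gDay", "gMonth", "hexBinary", "base64Binary", "anyURI",
    "QName", "NOTATION",
}

# Ordinary built-ins derived (directly or indirectly) from xs:decimal.
DERIVED_FROM_DECIMAL = {
    "integer", "nonPositiveInteger", "negativeInteger", "long", "int",
    "short", "byte", "nonNegativeInteger", "unsignedLong", "unsignedInt",
    "unsignedShort", "unsignedByte", "positiveInteger",
}

# Ordinary built-ins derived from xs:duration.
DERIVED_FROM_DURATION = {"yearMonthDuration", "dayTimeDuration"}

# The XML-related primitives are excluded, as proposed.
CANDIDATES = (
    (PRIMITIVES - {"QName", "NOTATION"})
    | DERIVED_FROM_DECIMAL
    | DERIVED_FROM_DURATION
)

for name in sorted(CANDIDATES):
    print(f"xsd:{name}")
```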

I'm unclear about xs:normalizedString: it seems to be a base for XML-related derived types. Is it useful otherwise?

Maybe mention https://www.w3.org/TR/xsd-precisionDecimal/


6 participants