Generalize prefix over HTTP and HTTPS URIs #156

ddeboer · 2021-11-30T13:44:01Z

Why?

Vocabularies are transitioning from HTTP to HTTPS URIs, for example Schema.org and Creative Commons. Because the scheme (http or https) is, unhappily, part of the URI, this change has implications for SPARQL queries. The problem will become more widespread as more vocabularies switch from HTTP to HTTPS.

When SPARQL queries run against resources for which it can't be predicted whether they use HTTP or HTTPS URIs, client applications need workarounds or normalization. For example:

perform the same query in a UNION, once with the namespace <http://schema.org/> and once with <https://schema.org/> (each bound to its own prefix, since PREFIX declarations apply to the whole query);
or detect which URI scheme is used and adjust the query accordingly.
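A minimal sketch of the first workaround, assuming the client builds the query string itself. Instead of a UNION, the hypothetical helpers `scheme_variants` and `type_query` list both scheme variants of a class IRI in a VALUES clause, which has the same effect for a single triple pattern:

```python
# Hypothetical client-side helpers: build a query that matches a class
# under both the http: and the https: form of its IRI.

def scheme_variants(iri: str) -> list:
    """Return [http variant, https variant] of an http(s) IRI."""
    for scheme in ("http://", "https://"):
        if iri.startswith(scheme):
            rest = iri[len(scheme):]
            return ["http://" + rest, "https://" + rest]
    raise ValueError(f"not an http(s) IRI: {iri}")


def type_query(class_iri: str) -> str:
    """SELECT query returning ?s typed with either variant of class_iri."""
    values = " ".join(f"<{v}>" for v in scheme_variants(class_iri))
    return (
        "SELECT ?s WHERE {\n"
        f"  VALUES ?type {{ {values} }}\n"
        "  ?s a ?type\n"
        "}"
    )


print(type_query("https://schema.org/Dataset"))
```

This still pushes the burden onto every client, which is exactly what the proposal below tries to avoid.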

Previous work

None that I know of.

Proposed solution

Solve the problem generically at the SPARQL level, so that client-side workarounds are no longer necessary. For example, SPARQL could accept prefixes without a URI scheme that then match both HTTP and HTTPS URIs:

PREFIX schema: <schema.org/>

SELECT * WHERE { ?s a schema:Dataset }
# Returns both <https://example.com/resource> a <http://schema.org/Dataset> 
# as well as <https://example.com/resource> a <https://schema.org/Dataset>
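In SPARQL today, `<schema.org/>` is a relative IRI resolved against the base IRI, so this matching behaviour would be new. A minimal sketch of the intended rule, with a hypothetical `matches()` function and an in-memory list of (subject, type) pairs standing in for the dataset. Restricting matches to http: and https: is an assumption of this sketch, not something the proposal specifies:

```python
# Hypothetical matching rule for a scheme-less prefix such as <schema.org/>:
# a prefixed name matches an IRI when the IRI equals the expanded name under
# the http: or the https: scheme. Other schemes (ftp:, urn:, ...) do not match.

SCHEMES = ("http://", "https://")


def matches(namespace: str, local: str, iri: str) -> bool:
    """True if iri is scheme + namespace + local for an allowed scheme."""
    return any(iri == scheme + namespace + local for scheme in SCHEMES)


# (subject, type) pairs standing in for the dataset
data = [
    ("https://example.com/resource", "http://schema.org/Dataset"),
    ("https://example.com/resource", "https://schema.org/Dataset"),
    ("https://example.com/other", "ftp://schema.org/Dataset"),
]

# Emulates: PREFIX schema: <schema.org/>  SELECT * WHERE { ?s a schema:Dataset }
hits = [(s, t) for s, t in data if matches("schema.org/", "Dataset", t)]
for s, t in hits:
    print(s, "a", t)
```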

Considerations for backward compatibility


JervenBolleman · 2021-11-30T15:23:35Z

I think this is a very nice usability enhancement. I wonder whether it should be a generic adaptation, applying not just to the prefix declaration but to all IRI equality testing, or be treated as a special kind of entailment.

namedgraph · 2021-11-30T15:32:11Z

Isn't this a bad practice by schema.org? If so, why should it be normalized?

dbooth-boston · 2021-11-30T16:15:05Z

I agree that this could be viewed as a special kind of entailment, but I am very skeptical about using the PREFIX syntax for specifying it.

It would break alignment with Turtle.
Would this proposal treat <http://example.com/foo> as equivalent to <ftp://example.com/foo>? What about <urn:example.com/foo>?
If schema:Dataset were used in an INSERT statement, what URI would be inserted?

If this kind of entailment is desired, I think it would be cleaner to treat it explicitly as a form of entailment, using existing mechanisms for specifying entailment regimes.

ddeboer · 2021-11-30T16:39:43Z

Isn't this a bad practice by schema.org? If so, why should it be normalized?

The problem is not specific to Schema.org; it is relevant to any vocabulary that wants to migrate its URIs from HTTP to HTTPS.

I agree that this could be viewed as special kind of entailment, but I am very skeptical about using the PREFIX syntax for specifying it.

I agree with the downsides that you mention. Most important to me is solving this issue at the SPARQL level, not which particular SPARQL solution is picked. So, setting the PREFIX approach aside for now, what would a solution based on entailment look like? Would that solution be:

concise enough to be usable (something like <https://schema.org/Article> owl:sameAs <http://schema.org/Article>, which would have to be repeated for every Schema.org class and property);
generic enough, given that not all query engines support entailment (e.g. Comunica)?
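To make the conciseness concern concrete, here is a hypothetical generator for the owl:sameAs triples (in N-Triples syntax) that such an entailment approach would need, one per term. The three terms are only examples; a real setup would have to cover the whole vocabulary:

```python
# Hypothetical helper: one owl:sameAs triple (N-Triples) per term, linking
# the https: IRI of each Schema.org term to its http: IRI.

OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"


def same_as_triples(terms):
    """Yield an N-Triples line per term name."""
    for term in terms:
        yield (f"<https://schema.org/{term}> <{OWL_SAME_AS}> "
               f"<http://schema.org/{term}> .")


for line in same_as_triples(["Article", "Dataset", "name"]):
    print(line)
```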

dbooth-boston · 2021-11-30T16:59:09Z

I have always viewed this problem as part of the usual need to normalize one's data as part of the data intake or ETL process. In other words, normalize those URIs to http: or https: before they are stored into your SPARQL server.

The normalization could also be done within a SPARQL server, using URI pattern matching and rewriting, etc., and storing the normalized result to a separate graph, but the SPARQL code that's needed to do that is a bit messy. URI munging is not SPARQL's strong suit.
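A minimal sketch of normalization during data intake, outside SPARQL. It assumes that triples arrive as tuples of IRI strings (no literals or blank nodes) and that http: is chosen as the canonical scheme for the listed namespaces; `normalize` and `normalize_triples` are hypothetical names:

```python
# Sketch of intake-time normalization. Assumptions (hypothetical): every
# term is an IRI string, and http:// is the canonical scheme for the
# namespaces listed in NAMESPACES.

CANONICAL = "http://"
NAMESPACES = ("schema.org/",)


def normalize(iri: str) -> str:
    """Rewrite http(s)://<namespace>... to the canonical scheme."""
    for ns in NAMESPACES:
        for scheme in ("http://", "https://"):
            if iri.startswith(scheme + ns):
                return CANONICAL + iri[len(scheme):]
    return iri  # IRIs outside the listed namespaces are left untouched


def normalize_triples(triples):
    """Normalize every position of each (s, p, o) triple."""
    return [tuple(normalize(term) for term in triple) for triple in triples]


RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
incoming = [("https://example.com/r", RDF_TYPE, "https://schema.org/Dataset")]
print(normalize_triples(incoming))
```

Limiting the rewrite to an explicit namespace list keeps unrelated https: IRIs, such as the subject here, unchanged.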

JervenBolleman · 2021-11-30T17:03:36Z

For me this is a usability issue. It's easy to forget which ontologies and datasets use https and which use http. With federated queries it is even harder.

mielvds · 2021-12-01T08:57:26Z

Ideally, this would be fixed at data intake, and ideally we would use entailment. However, as a query client you have no guarantees about the dataset or the entailment regime. Also, entailment is a rather complex way to solve such a common issue, and it can yield results that surprise the client ("how did these URIs get in here? They are nowhere in my query."). You can of course have both. So I agree with @JervenBolleman: this is about improving usability for whoever is writing the query.

I wonder whether you could have something like a UNION PREFIX, similar to GraphQL's union types: UNION PREFIX s: <https://schema.org/> | <http://schema.org/>

rubensworks · 2021-12-01T10:05:27Z

I agree this is a usability issue that should be solved somehow, but I'm not a big fan of solutions that are based on modifying the query syntax (for the reasons listed by @dbooth-boston).

If I understand correctly, the suggested PREFIX extensions would only be able to cope with prefixed URLs defined in the query, but not within the dataset.
E.g., the following query would not produce the expected result if ?type in endpoint 1 is https://schema.org/Dataset and in endpoint 2 http://schema.org/Dataset:

PREFIX schema: <schema.org/>
SELECT * WHERE {
  SERVICE <urn:endpoint1> { ?s a ?type }
  SERVICE <urn:endpoint2> { ?s a ?type }
}
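A sketch of why this query cannot be fixed by a prefix, using hypothetical bindings for the two SERVICE blocks. With plain IRI equality the bindings do not join; comparing IRIs with the scheme stripped (roughly what a scheme-insensitive entailment regime would do) yields the expected match:

```python
# Hypothetical bindings returned by the two SERVICE blocks in the query above.
endpoint1 = [{"s": "https://example.com/r", "type": "https://schema.org/Dataset"}]
endpoint2 = [{"s": "https://example.com/r", "type": "http://schema.org/Dataset"}]


def strip_scheme(iri: str) -> str:
    """Drop a leading http:// or https:// so both forms compare equal."""
    for scheme in ("http://", "https://"):
        if iri.startswith(scheme):
            return iri[len(scheme):]
    return iri


def join(left, right, key=lambda iri: iri):
    """Join two binding lists on ?s and ?type, comparing IRIs via key()."""
    return [
        (a, b)
        for a in left
        for b in right
        if key(a["s"]) == key(b["s"]) and key(a["type"]) == key(b["type"])
    ]


print(len(join(endpoint1, endpoint2)))                # 0: plain IRI equality
print(len(join(endpoint1, endpoint2, strip_scheme)))  # 1: scheme-insensitive
```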

I think introducing a dedicated (and lightweight) entailment regime might be acceptable for this. Especially since the implementation of this feature will require entailment in any case.

afs · 2021-12-02T11:06:19Z

I agree that handling it at data ingestion, or as an implementation feature, is a better route. (The relative URI syntax is already legal!)

I'm also not keen on addressing migration issues as a permanent feature of the language.

What would be good is a "practice and experience" note.

VladimirAlexiev · 2022-02-17T08:02:52Z

@JervenBolleman makes a very important point in an earlier comment on #156: this is only one aspect of IRI equality testing. Sadly, the same IRI written with and without percent-encoding is neither equal nor equivalent:

select (?iri1=?iri2 as ?equal) (sameTerm(?iri1,?iri2) as ?same) {
    values (?iri1 ?iri2) {(<urn:foo%2Dbar> <urn:foo-bar>)}
}
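RDF compares IRIs as plain strings, which is why the two IRIs above are treated as different. RFC 3986 (section 6.2.2.2) instead describes percent-encoding normalization, where encoded octets that correspond to unreserved characters are decoded. A minimal sketch of that rule; the function name `normalize_pct` is hypothetical:

```python
import re
import string

# RFC 3986 unreserved characters: ALPHA / DIGIT / "-" / "." / "_" / "~"
UNRESERVED = set(string.ascii_letters + string.digits + "-._~")


def normalize_pct(iri: str) -> str:
    """Decode %XX escapes of unreserved characters; uppercase the rest."""
    def repl(m):
        ch = chr(int(m.group(1), 16))
        return ch if ch in UNRESERVED else "%" + m.group(1).upper()
    return re.sub(r"%([0-9A-Fa-f]{2})", repl, iri)


a, b = "urn:foo%2Dbar", "urn:foo-bar"
print(a == b)                                # False: different strings
print(normalize_pct(a) == normalize_pct(b))  # True after normalization
```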

Most modern websites redirect http to https, for any resource. I think this is the good behavior.

I think that schema.org gives a mixed signal by promoting https variants of their semantic terms.
They have two versions of their ontology, but only an https version of their context.

But no matter this mixed signal, thousands of website admins will use https in their data, and thousands more will use http.
So the problem @ddeboer raised is legitimate and important.

VladimirAlexiev · 2022-05-06T13:34:28Z

#158: IANA rebukes coap*

CoAP registers different URI schemes for accessing CoAP resources via different protocols. This approach runs counter to the WWW principle that a URI identifies a resource and that multiple URIs for identifying the same resource should be avoided

Curiously, it issues no such rebuke for http: vs https:.

JervenBolleman added the enhancement (New feature or request) label Nov 30, 2021

JervenBolleman added the entailment (SPARQL entailment regimes) label Nov 30, 2021

coret mentioned this issue Apr 26, 2022 in netwerk-digitaal-erfgoed/geonames-harvester#1, "Geonames ontology URI" (closed)

rubensworks mentioned this issue Jun 9, 2022 in SolidLabResearch/Challenges#42, "Schema.org http/https alignment" (open)
