Use labels instead of URIs #159

Open
mielvds opened this issue Apr 7, 2022 · 7 comments

Comments

@mielvds

mielvds commented Apr 7, 2022

Disclaimer: This is probably a sketchy idea and it only serves the purpose of a better UX (but I think SPARQL could use some of that). I don't really have a good solution in mind and I also admit that there might not be anything we can properly do. But if that gets a syntax discussion going, I'm happy.

Why?

Because writing queries using opaque URIs is hard, but more importantly: not popular. Wikidata queries are the perfect example:

    SELECT ?country (MAX(?population) AS ?maxPopulation) WHERE {
      ?city wdt:P31/wdt:P279* wd:Q515 .
      ?city wdt:P1082 ?population .
      ?city wdt:P17 ?country .
    }
    GROUP BY ?country

Previous work

Previous work would include basically any other query language: SQL, MongoQL or even Cypher where you can just use labels (yes yes, they don't have globally unique identifiers and all that).

Proposed solution

LABEL rdfs:label@en
SELECT * {
?city [country] ?country .
}

would translate to

SELECT * {
?city ?p ?country.
?p rdfs:label "country".
}
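
The translation above can be read as a purely textual rewrite, which is what keeps it "syntactic sugar". Below is a minimal Python sketch of such a preprocessor. The function name `desugar_labels` and the generated variable names (`?label0`, ...) are made up for illustration, and attaching the `@en` tag to the generated literal is an assumption about what `LABEL rdfs:label@en` is meant to do. The regex only treats `[word]` as a label, so ordinary blank-node brackets are left alone, but it would also rewrite `[word]` inside string literals.

```python
import re

# Hypothetical directive: "LABEL <predicate>[@<lang>]" on a line of its own.
LABEL_DIRECTIVE = re.compile(r"^LABEL[ \t]+(\S+?)(?:@([A-Za-z-]+))?[ \t]*$", re.MULTILINE)
# A label token is a single word in square brackets, e.g. [country].
LABEL_TOKEN = re.compile(r"\[(\w+)\]")


def desugar_labels(query: str) -> str:
    """Rewrite [label] tokens into plain SPARQL 1.1 triple patterns."""
    directive = LABEL_DIRECTIVE.search(query)
    if directive is None:
        return query  # nothing to desugar
    label_predicate, lang = directive.group(1), directive.group(2)
    # Drop the directive line; the rest is the query proper.
    body = (query[:directive.start()] + query[directive.end():]).lstrip("\n")

    variables = {}  # label text -> generated variable

    def to_variable(match):
        label = match.group(1)
        if label not in variables:
            variables[label] = f"?label{len(variables)}"
        return variables[label]

    body = LABEL_TOKEN.sub(to_variable, body)
    if not variables:
        return body

    tag = f"@{lang}" if lang else ""
    extra = "".join(
        f'  {var} {label_predicate} "{label}"{tag} .\n'
        for label, var in variables.items()
    )
    # Append the label patterns just before the last closing brace.
    close = body.rindex("}")
    return body[:close] + extra + body[close:]
```

Because the label is matched at query time, every resource carrying that label still matches, so the ambiguity of "country" is not resolved by the rewrite itself.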

Considerations for backward compatibility

I would stick to syntactic sugar for 1.1.

@VladimirAlexiev
Contributor

VladimirAlexiev commented Apr 7, 2022

@mielvds I see your point but the proposal is half cooked:

  • there may be many things labeled "country"
  • brackets are used in CURIEs for a similar purpose, but brackets in SPARQL are blank nodes, so you can't use them

I'd rather use locally defined names, something like

ALIAS country wdt:P17
ALIAS population wdt:P1082
SELECT * {
  ?city [country] ?country; 
     [population] ?population
}

But in WD, each Pnnnn is represented with a coordinated bunch of props in 6 namespaces. Eg to get population at point in time:

SELECT * {
  ?city p:P1082 [ps:P1082 ?population; pq:P585 ?time]
}

So with the "alias" approach I'd have to go like this:

ALIAS population_direct wdt:P1082
ALIAS population_stmt p:P1082
ALIAS population_main ps:P1082
ALIAS pointInTime_qualifier pq:P585
SELECT * {
  ?city [population_stmt] [
    [population_main] ?population;
    [pointInTime_qualifier] ?time
  ]
}

It doesn't seem better to me.
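
For comparison with the label lookup, the alias idea is a static substitution: each `[name]` is replaced by the term declared for it, so no extra triple patterns are generated. A rough Python sketch follows, assuming the hypothetical `ALIAS name term` line syntax used above; `expand_aliases` is an illustrative name, not an existing tool.

```python
import re

# Hypothetical declaration syntax: "ALIAS <name> <term>" on a line of its own.
ALIAS_LINE = re.compile(r"^ALIAS[ \t]+(\w+)[ \t]+(\S+)[ \t]*$", re.MULTILINE)
# An alias use is a single word in square brackets, e.g. [population_stmt].
ALIAS_USE = re.compile(r"\[(\w+)\]")


def expand_aliases(query: str) -> str:
    """Replace every [alias] with the term declared in an ALIAS line."""
    aliases = dict(ALIAS_LINE.findall(query))
    # Remove the declarations; what is left is the query proper.
    body = ALIAS_LINE.sub("", query).lstrip("\n")

    def substitute(match):
        name = match.group(1)
        if name not in aliases:
            raise KeyError(f"undeclared alias: {name}")
        return aliases[name]

    return ALIAS_USE.sub(substitute, body)
```

Applied to the last query above, the sketch yields the same pattern as the hand-written `p:P1082 [ps:P1082 ...; pq:P585 ...]` version. The outer `[ ... ]` is untouched because it does not consist of a single word, which also shows how close the alias syntax sits to blank-node syntax.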


BTW writing WD queries is surprisingly non-painful because

  • there's autocompletion "wdt:country" -> wdt:P17 and "wd:Bulgaria" -> "wd:Q219". Since the ranking is very good, it works very well
  • there's readout on hover of both Pnnn and Qnnn
  • if you're writing shapes, there's a SHEX editor that displays dynamic comments as a readout, eg
  wdt:P17 wd:Q219 # country: Bulgaria

@mielvds
Author

mielvds commented Apr 7, 2022

@VladimirAlexiev thanks for your thoughts!

> @mielvds I see your point but the proposal is half cooked:

like I said: sketchy ;)

> * there may be many things labeled "country"

absolutely

> * brackets are used in CURIEs for a similar purpose, but brackets in SPARQL are blank nodes, so you can't use them

It was just for illustration, but correct, it would have to be something else

> I'd rather use locally defined names, something like
>
> ALIAS country wdt:P17
> ALIAS population wdt:P1082
> SELECT * {
>   ?city [country] ?country;
>      [population] ?population
> }
>
> But in WD, each Pnnnn is represented with a coordinated bunch of props in 6 namespaces. Eg to get population at point in time:
>
> SELECT * {
>   ?city p:P1082 [ps:P1082 ?population; pq:P585 ?time]
> }
>
> So with the "alias" approach I'd have to go like this:
>
> ALIAS population_direct wdt:P1082
> ALIAS population_stmt p:P1082
> ALIAS population_main ps:P1082
> ALIAS pointInTime_qualifier pq:P585
> SELECT * {
>   ?city [population_stmt] [
>     [population_main] ?population;
>     [pointInTime_qualifier] ?time
>   ]
> }
>
> It doesn't seem better to me.

Yeah that wouldn't help much. But this does trigger the idea of being able to publish a public alias config, very much like the JSON-LD config.

http://example.org/aliases

{
  "population_direct": "wdt:P1082",
  "population_stmt": "p:P1082",
  "population_main": "ps:P1082",
  "pointInTime_qualifier": "pq:P585"
}

CONTEXT <http://example.org/aliases>
SELECT * {
  ?city [population_stmt] [
    [population_main] ?population;
    [pointInTime_qualifier] ?time
  ]
}

But I'm probably taking it too far :)

> BTW writing WD queries is surprisingly non-painful because
>
> * there's autocompletion "wdt:country" -> wdt:P17 and "wd:Bulgaria" -> "wd:Q219". Since the ranking is very good, it works very well
>
> * there's readout on hover of both Pnnn and Qnnn

Sure that helps, but it's only available in the WD editor.

> * if you're writing shapes, there's a SHEX editor that displays dynamic comments as a readout, eg
>   wdt:P17 wd:Q219 # country: Bulgaria

Now that principle could perhaps be more of a first-class citizen in SPARQL

@jmkeil

jmkeil commented Apr 7, 2022

Instead of extending the syntax for aliases, one can just use prefixes for that.

PREFIX name: <http://www.w3.org/2000/01/rdf-schema#label>

SELECT ?resource
WHERE { ?resource name: "the label". }

Not saying that this should be considered good practice. But there are cases where this can ease reading/writing:

PREFIX key: <http://example.org/parameterName>
PREFIX value: <http://example.org/parameterValue>

SELECT ?resource
WHERE { ?resource <http://example.org/parameters> [key: "a"; value: 1], [key: "b"; value: 2] . }

@afs
Collaborator

afs commented Apr 8, 2022

My understanding is that Wikidata purposely decouples the appearance of the URI from its natural-language form, because that form depends on the user. (Is this style found elsewhere?)

The mapping to abstract URIs happens at the point of writing the query (UI etc.), because it is context- and user-sensitive.

It takes a number of components, not just SPARQL to have an end-user application.

@jmkeil

jmkeil commented Apr 8, 2022

> Is this style found elsewhere?

Yes. The OBO Foundry uses this style, too. Apart from being harder to read and write without UI support, it has several advantages for the re-use and maintenance of large multilingual datasets (see 1c in [1]).

@VladimirAlexiev
Contributor

To elaborate on @jmkeil :

  • pretty much all life science and bio ontologies, including plants etc
  • most ontologies extending BFO
  • the RDA library data ontology

@ericprud
Member

ShEx.js has a backtick extension which uses a LABEL directive to specify how to resolve backticked labels, e.g.

PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
PREFIX ex1: <http://ex1.example/>
PREFIX ex2: <http://ex2.example/>

LABEL [ rdfs:label skos:label ]
<S> {
  ex1:`protein name` LITERAL;
  ex2:`protein type` [ `signaling` `regulatory` `transport` ];
  `protein width` `ucum microns`
}

The LABEL directive specifies an ordered list of predicates which identify the node to substitute into the schema in place of the backticked label. Try it by clicking on the protein record button in this manifest and selecting some passing data.

ShExC has "LABEL [rdfs:label]" and later on "`transport`" (i'm in escaping hell here).
The metadata graph has: ex1:Signaling rdfs:label "signaling" so of course the resulting term is ex1:Signaling.

Prefixing a backtick (e.g. "ex1:`protein name`") restricts to those terms which include that prefix's namespace URL.
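
The resolution rule described above (an ordered list of label predicates, optionally narrowed by a prefix's namespace) could look roughly like the Python sketch below. This is not the ShEx.js implementation: the function name, the triple representation, and the choice to raise on ambiguous labels are all assumptions, and the `http://ex1.example/protType` entry in the sample metadata is invented to show the namespace restriction at work.

```python
from typing import Optional, Sequence, Tuple

Triple = Tuple[str, str, str]  # (subject IRI, predicate IRI, label literal)

RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"


def resolve_label(
    label: str,
    label_predicates: Sequence[str],
    metadata: Sequence[Triple],
    namespace: Optional[str] = None,
) -> str:
    """Return the IRI whose label matches, trying predicates in order."""
    for predicate in label_predicates:  # earlier predicates take priority
        candidates = sorted({
            subject
            for subject, pred, obj in metadata
            if pred == predicate
            and obj == label
            and (namespace is None or subject.startswith(namespace))
        })
        if len(candidates) == 1:
            return candidates[0]
        if len(candidates) > 1:
            # Assumption: ambiguity is an error rather than "pick the first".
            raise ValueError(f"ambiguous label {label!r}: {candidates}")
    raise LookupError(f"no term labeled {label!r}")


# Sample metadata graph; the ex1 protType entry is hypothetical.
METADATA = [
    ("http://ex1.example/Signaling", RDFS_LABEL, "signaling"),
    ("http://ex2.example/protType", RDFS_LABEL, "protein type"),
    ("http://ex1.example/protType", RDFS_LABEL, "protein type"),
]

signaling = resolve_label("signaling", [RDFS_LABEL], METADATA)
prot_type = resolve_label(
    "protein type", [RDFS_LABEL], METADATA, namespace="http://ex2.example/"
)
```

Without the namespace argument, "protein type" matches two terms in this sample data, which is the same "many things labeled country" problem raised earlier in the thread.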

Because ShEx is defined in terms of a JSON structure (ShExJ), this isn't really part of the ShEx language, more of a parser trick. (If you comment out (#) or delete the Query Map and click validate, you'll see "predicate": "http://ex2.example/protType", which means the backtick information is lost.) This is probably not an issue for SPARQL but would require consideration for something like SPIN.

This feature hasn't seen much use in wikidata, probably because the community that's zealous about numeric entity identifiers is the same community that's maintaining the schemas. That said, it's easy to imagine other folks who work with wikidata, OBO, SNOMED, etc wanting to favor {read,type}ability over internationalization.
