SPARQL CONSTRUCT support #42
Comments
Yes! We (me and @pmaria) spotted this a few ago. In this example, we should have used |
CONSTRUCT cannot have SPARQL CSV Results, fix that. Contributes to #42
Fixed the |
I updated the title to better reflect the important issue here: SPARQL CONSTRUCT support. Any suggestions on how to add that would be highly appreciated! CC: @pmaria |
Do we really need to accept CONSTRUCT queries? Are we accepting ASK and DESCRIBE as well? CONSTRUCT already returns RDF, so we would need to define a reference formulation for RDF. Do we want to open that box? Where is the use case or necessity? |
If we allow SPARQL descriptions, we cannot restrict the type of queries. Whatever the SPARQL descriptions' recommendation allows is a potential entry. |
Why is that? Could we not define the details of a reference formulation where we stipulate that e.g. only SELECT and ASK queries are allowed? |
The reference formulation refers to how we access the data available in a logical source. How the data in the logical source was retrieved is beyond the scope of the reference formulation. In this case, all we say is that the data of the logical source is retrieved from a SPARQL endpoint, which is described via a SPARQL description. If we do not want all the data from the SPARQL endpoint, we may define a query, but in that case the query can be any SPARQL query. An implementation may state that it supports only SELECT and ASK queries, but that is beyond RML. The reference formulation would only tell you how to process the data after the SELECT or ASK query; it would not indicate what the results of the query are, or the format of the data in the logical source. |
I respectfully disagree. Any reference formulation should define:
If, for SPARQL, we can do this all in 1 reference formulation, that's great. But, currently it is quite unclear how that would work. |
I respectfully agree with everything you say, but none of it concerns the SPARQL query itself; it all concerns what comes after the SPARQL query. Whether or not you have a SPARQL query, what you list should be defined in a reference formulation; we do not disagree on this. But how we fetch the data from a data source is independent of the reference formulation. Whether or not one used a SPARQL query to retrieve a set of RDF triples is independent of how one refers to these triples. If one used a SELECT query to retrieve some CSV results, then the reference formulation refers to these CSV results and not to the SPARQL query. |
This is where we are disagreeing. How we are fetching data from a source (the |
How an iteration is computed is indeed part of the reference formulation, but that is different from how the data is retrieved (the SPARQL query in this case). What an iteration returns (independently of which reference formulation we use) is a set of key-value pairs that RML can then use to create the RDF triples of each iteration. |
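To make the "set of key-value pairs" view concrete, here is a minimal Python sketch of one possible reading: each row of a SPARQL CSV result is one iteration, keyed by variable name. The sample data and the `iterations` helper are hypothetical and not taken from the RML specification.

```python
import csv
import io

# Hypothetical SPARQL CSV results, as a SELECT over ?s ?p ?o might return them.
SPARQL_CSV = (
    "s,p,o\n"
    "http://example.org/alice,http://xmlns.com/foaf/0.1/name,Alice\n"
    "http://example.org/bob,http://xmlns.com/foaf/0.1/name,Bob\n"
)


def iterations(csv_text):
    """Yield one dict per result row, mapping variable name to value."""
    reader = csv.DictReader(io.StringIO(csv_text))
    for row in reader:
        yield dict(row)


rows = list(iterations(SPARQL_CSV))
for row in rows:
    # A reference such as "s" or "o" is resolved against a single iteration.
    print(row["s"], row["o"])
```

Note that the sketch never looks at the query that produced the CSV; it only works on the result, which is the distinction this comment draws.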
I must still disagree with this statement. Take these examples:

    rml:referenceFormulation rml:XPath;
    rml:iterator "/xpath/iterator/expression";

    rml:referenceFormulation rml:JSONPath;
    rml:iterator "$.jsonpath.expression";

    rml:iterator "SELECT * FROM student;";
    rml:referenceFormulation rml:SQL2008Query;

    rml:iterator "SELECT ?s ?p ?o WHERE { ?s ?p ?o . } LIMIT 100";
    rml:referenceFormulation formats:SPARQL_Results_CSV;

All these iterator expressions specify what data is to be considered part of the iteration. This includes how we fetch the data. There are also logical sources where an iterator is not necessary, since there is a natural way to form a logical iteration on those sources, as with CSV. But again, how we iterate is determined by the rules of the reference formulation. This discussion is an example of why we need more clarity on the definition of the reference formulations. The same question can be asked for SPARQL CONSTRUCT queries, to bring it back to this issue.
I do not believe that we can say that an iteration always returns key-value pairs. This depends very much on the reference formulation and on how reference expressions are to be evaluated against the logical iterations in that reference formulation. I would say an iterator returns logical iterations, where each iteration is a sub-part of the source on which reference expressions can be evaluated to return values from the source data. How this works exactly should also be defined in the reference formulation. For example, in the … The point is that every reference formulation needs to define these specific aspects. |
EDIT: Pano already provided a more detailed answer.
At the moment it looks a bit mixed to me. Consider the following example from the IO spec:
To me, here the iterator is stating how the data is actually fetched. Maybe there is something I misunderstood? |
This use if |
Ah, I think you are referring to |
OK, I missed this. Is it too late to disagree with this use of the iterator? The iterator was meant to indicate how we "traverse" the data, how we iterate over the data we have; it is a pattern that repeats in the data. What pattern can you have if the iterator is a query? |
The way an iterator is currently used is as follows:
This was discussed during a W3C CG meeting and was accepted.
|
You could argue that the iterator was always a query. For example, an XPath expression is basically a query on a document. The same goes for JSONPath. So the distinction between iterator and query was always questionable, as is argued in #28. It is the reference formulation that determines whether the result of the expression/query is iterable, and how it should be iterated. |
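A small Python sketch can illustrate the "iterator as query" reading for XML. It assumes the standard library's `xml.etree.ElementTree`, which supports only a limited XPath subset; the sample document and the `iterate` helper are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical source document.
XML = """<students>
  <student><id>1</id><name>Alice</name></student>
  <student><id>2</id><name>Bob</name></student>
</students>"""


def iterate(xml_text, iterator, references):
    """Run the iterator as a query, then evaluate each reference
    relative to every node the query returned."""
    root = ET.fromstring(xml_text)
    for node in root.findall(iterator):
        yield {ref: node.findtext(ref) for ref in references}


records = list(iterate(XML, "./student", ["id", "name"]))
print(records)
```

Here the iterator expression both selects the data and defines the repeating pattern, and each selected node is the sub-part of the source against which references are evaluated.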
Which brings us back to the original issue. It is unclear to me how an "iteration" would be performed for … Maybe we could argue that for Turtle and other formats we do not really have a reference formulation, that is, a query language that we can use to access file elements. But I am unsure. |
This is not 100% correct. An iterator in the case of tables in R2RML is not a query. It just happens that the query language is used as the reference formulation in some cases, and that is why the iterator may be a query. However, an SQL query used as an iterator returns a table, which is not really an iteration pattern. Or rather, it is, but it has only one iteration, the complete table, because this table does not repeat within a table.
This is correct. I think when we first proposed RDF as input, we considered the iteration pattern to be every triple. That was never mentioned or specified anywhere. If we accept RDF as potential input, then we need a reference formulation to refer to the RDF triples/quads, and that reference formulation would also give us the iteration pattern. |
It may not have been intended that way, but I believe the JSONPath and XPath reference formulations were the only defined reference formulations for which the iterator was actually specified in the mappings in earlier versions of RML. Another way to look at it: we could replace |
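As a rough Python sketch of the "every triple is an iteration" idea, the snippet below treats each line of a simple N-Triples document as one iteration. The regex handles only one-triple-per-line input with no whitespace in subject or predicate; it is not a full N-Triples parser, and the key names `subject`, `predicate`, `object` are assumptions, not part of any RML reference formulation.

```python
import re

# Simplified pattern: subject, predicate, then everything up to the final dot.
TRIPLE = re.compile(r"^(\S+)\s+(\S+)\s+(.+?)\s*\.\s*$")

# Hypothetical input data.
NT = """<http://example.org/alice> <http://xmlns.com/foaf/0.1/name> "Alice" .
<http://example.org/alice> <http://xmlns.com/foaf/0.1/knows> <http://example.org/bob> .
"""


def triple_iterations(ntriples):
    """Yield one iteration per triple, skipping blank and comment lines."""
    for line in ntriples.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        match = TRIPLE.match(line)
        if match is None:
            raise ValueError(f"not a simple N-Triples line: {line!r}")
        s, p, o = match.groups()
        yield {"subject": s, "predicate": p, "object": o}


items = list(triple_iterations(NT))
for item in items:
    print(item["subject"], item["predicate"], item["object"])
```

Whether per-triple iteration is the right pattern for RDF input is exactly the open question in this thread; the sketch only shows what that choice would look like.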
Consider the example in the specification:
Since the iterator uses a CONSTRUCT, the reference formulation format cannot be SPARQL_Results_CSV. Suggest either using a SPARQL SELECT form, or changing the reference formulation format(?).
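The mismatch reported here can be checked mechanically. The Python sketch below detects the SPARQL query form and tests whether it can produce CSV results, on the assumption that the CSV results format serialises only SELECT variable bindings. `query_form` and `compatible_with_csv_results` are hypothetical helpers, and the prologue handling is deliberately naive: it skips only `PREFIX`/`BASE` declarations and whole-line `#` comments.

```python
import re

# Kind of result produced by each SPARQL 1.1 query form.
RESULT_KIND = {
    "SELECT": "bindings",
    "ASK": "boolean",
    "CONSTRUCT": "graph",
    "DESCRIBE": "graph",
}

# Leading whitespace plus any number of PREFIX / BASE declarations.
_PROLOGUE = re.compile(
    r"^\s*(?:(?:PREFIX\s+[^\s:]*:\s*<[^>]*>|BASE\s*<[^>]*>)\s*)*",
    re.IGNORECASE,
)
_FORM = re.compile(r"(SELECT|ASK|CONSTRUCT|DESCRIBE)\b", re.IGNORECASE)


def query_form(query):
    """Return the query form keyword in upper case."""
    lines = [ln for ln in query.splitlines() if not ln.lstrip().startswith("#")]
    body = _PROLOGUE.sub("", "\n".join(lines), count=1)
    match = _FORM.match(body)
    if match is None:
        raise ValueError("unrecognised SPARQL query form")
    return match.group(1).upper()


def compatible_with_csv_results(query):
    """Assumption: only variable bindings (SELECT) fit the CSV results format."""
    return RESULT_KIND[query_form(query)] == "bindings"


print(compatible_with_csv_results("CONSTRUCT { ?s ?p ?o } WHERE { ?s ?p ?o . } LIMIT 100"))
print(compatible_with_csv_results("SELECT ?s ?p ?o WHERE { ?s ?p ?o . } LIMIT 100"))
```

A processor could run such a check when loading a mapping and reject a CONSTRUCT iterator paired with SPARQL_Results_CSV, which is the inconsistency this issue points out.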