granularity of target attachment #66

Open
VladimirAlexiev opened this issue Mar 27, 2024 · 2 comments
Labels
documentation (Improvements or additions to documentation), working-group

Comments

@VladimirAlexiev

VladimirAlexiev commented Mar 27, 2024

Are there real use cases where different triples need to be directed to different targets?
https://kg-construct.github.io/rml-resources/portal/requirements/requirements-io.html doesn't describe such.
What is the value of saving some triples of a subject to a file, and other triples to a SPARQL endpoint?

https://kg-construct.github.io/rml-io/spec/docs/#multiple-targets describes endless combinations of targets attached to different levels, but are they useful?

Furthermore, while a quad can go into multiple targets, the entirety of a quad must go to one target: you cannot store its components <g,s,p,o> and o=<value,lang,datatype> in different targets.

If it's a valid case to specify target per languageMap, is it not also valid to specify it per datatypeMap?

So I question the accuracy of statements like this: https://kg-construct.github.io/rml-io/spec/docs/#language-and-graph-map

All triples containing the language tag en are exported to TargetDump1 and all triples within the named graph ex:Characters are exported to TargetDump2.

Consider the map:

  rml:predicateObjectMap [ a rml:PredicateObjectMap;
    rml:graphMap [ a rml:GraphMap;
      rml:logicalTarget <#TargetDump2>;
      rml:constant ex:Characters;
    ];
    rml:predicateMap [ a rml:PredicateMap;
      rml:constant foaf:name;
    ];
    rml:objectMap [ a rml:ObjectMap;
      rml:reference "name/text()";
      rml:languageMap [
        rml:logicalTarget <#TargetDump1>;
        rml:constant "en";
      ];
    ];
  ];

It makes quads with g=ex:Characters, p=foaf:name, lang=@en.
The graphMap and languageMap set these quad components; they don't test them.
So all these quads go to both TargetDump1 and TargetDump2. The quoted sentence is confusing because it implies that different sets of triples go to different targets.

Then wouldn't it be better to put the targets at the predicateObjectMap level to make this more clear?

  rml:predicateObjectMap [ a rml:PredicateObjectMap;
    rml:logicalTarget <#TargetDump1>, <#TargetDump2>;
    rml:graphMap [ a rml:GraphMap; rml:constant ex:Characters; ];
    rml:predicateMap [ a rml:PredicateMap; rml:constant foaf:name; ];
    rml:objectMap [ a rml:ObjectMap;
      rml:reference "name/text()";
      rml:languageMap [rml:constant "en"; ];
    ];
  ];

Last but not least, it should be possible to set the target at the level of TriplesMap to cater for the most common case.

In summary, I propose to set targets at the TriplesMap and predicateObjectMap levels,
but not at subjectMap, predicateMap, objectMap, graphMap, languageMap.
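
To make the proposal concrete, here is a minimal sketch of a target set once at the TriplesMap level. It is an illustration only: attaching rml:logicalTarget directly to a TriplesMap is what is being proposed here, not something RML-IO is known to allow as written. The prefix IRIs, the name <#CharactersMap> and the subject template are assumptions, and the logical source and the target definitions are omitted.

```turtle
@prefix rml:  <http://w3id.org/rml/> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@base <http://example.org/mapping> .

# Hypothetical placement: one target list for the whole TriplesMap,
# so every quad generated below would go to both dumps.
<#CharactersMap> a rml:TriplesMap;
  rml:logicalTarget <#TargetDump1>, <#TargetDump2>;
  # rml:logicalSource omitted for brevity
  rml:subjectMap [ a rml:SubjectMap;
    rml:template "http://example.org/character/{@id}";  # assumed template
  ];
  rml:predicateObjectMap [ a rml:PredicateObjectMap;
    rml:predicateMap [ a rml:PredicateMap; rml:constant foaf:name; ];
    rml:objectMap [ a rml:ObjectMap;
      rml:reference "name/text()";
      rml:languageMap [ rml:constant "en"; ];
    ];
  ] .
```

With this placement the term maps carry no targets at all, which is the common case the proposal is meant to simplify.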

@DylanVanAssche
Collaborator

Thanks for the interesting issue! This is the feedback we really want to see :)

Are there real use cases where different triples need to be directed to different targets?

Yes, you can then make materialized views of the RDF graph, depending on the purposes you want to use it for.
Examples: separating by language, storing sensitive data in a separate target with stricter access requirements, etc.
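
A minimal sketch of what "separate by language" could look like with targets on language maps, in the style of the RML-IO examples. The references name_en and name_nl, the target names <#TargetEnglish> and <#TargetDutch>, the prefix IRIs and the subject template are all hypothetical. The sketch assumes the source already keeps each language in its own field, and it omits the logical source and the target definitions.

```turtle
@prefix rml:  <http://w3id.org/rml/> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@base <http://example.org/mapping> .

<#NamesMap> a rml:TriplesMap;
  # rml:logicalSource omitted for brevity
  rml:subjectMap [ a rml:SubjectMap;
    rml:template "http://example.org/character/{@id}";  # assumed template
  ];
  # English names: tagged @en, exported to the (hypothetical) English target
  rml:predicateObjectMap [ a rml:PredicateObjectMap;
    rml:predicateMap [ a rml:PredicateMap; rml:constant foaf:name; ];
    rml:objectMap [ a rml:ObjectMap;
      rml:reference "name_en/text()";
      rml:languageMap [
        rml:logicalTarget <#TargetEnglish>;
        rml:constant "en";
      ];
    ];
  ];
  # Dutch names: tagged @nl, exported to the (hypothetical) Dutch target
  rml:predicateObjectMap [ a rml:PredicateObjectMap;
    rml:predicateMap [ a rml:PredicateMap; rml:constant foaf:name; ];
    rml:objectMap [ a rml:ObjectMap;
      rml:reference "name_nl/text()";
      rml:languageMap [
        rml:logicalTarget <#TargetDutch>;
        rml:constant "nl";
      ];
    ];
  ] .
```

Each language map here still only sets a constant tag; the split works because each object map reads a different field.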

https://kg-construct.github.io/rml-resources/portal/requirements/requirements-io.html doesn't describe such.

We should make that clearer 👍

What is the value of saving some triples of a subject to a file, and other triples to a SPARQL endpoint?

One common use case is keeping backups: your SPARQL endpoint can be queried live, while the file is a backup of the current version in case your infrastructure goes down. The file also lets you exchange your RDF graph as a dump with other parties, without putting pressure on your SPARQL endpoint to dump everything each time they need a new version.

https://kg-construct.github.io/rml-io/spec/docs/#multiple-targets describes endless combinations of targets attached to different levels, but are they useful?

We could shorten the spec there by combining some examples, but IMO a spec should list all possible combinations so that developers have no doubt about what should happen in each one.

Furthermore, while a quad can go into multiple targets, the entirety of a quad must go to one target: you cannot store its components <g,s,p,o> and o=<value,lang,datatype> in different targets.

Yes, you always store full RDF quads or triples; otherwise you cannot query them later or parse them with existing tools & libraries.

If it's a valid case to specify target per languageMap, is it not also valid to specify it per datatypeMap?

A datatype map is also allowed. This map did not exist before, but it now does in Core, so we should adjust RML-IO to explicitly allow it as well.

the quoted sentence is confusing since it implies that different sets of triples go to different targets.

Okay, good point. We need to improve that sentence, since it is confusing.

Then wouldn't it be better to put the targets at the predicateObjectMap level to make this more clear?

Then we only have granularity at the graph level: you can only store at the same level as a named graph.

Last but not least, it should be possible to set the target at the level of TriplesMap to cater for the most common case.

Why should we have this? Adding it in the Subject Map does exactly this.

DylanVanAssche added the documentation label Mar 27, 2024
@VladimirAlexiev
Author

Examples: separate by language

A languageMap sets a language (constant or from source data), and it cannot separate by language unless you use some complex conditionals. Is it worth the trouble?

Or consider this real case of graphMap: our CrunchBase RDFization saves the triples from each row of each table into a separate graph to enable SPARQL Update scenarios. That's 12M graphs that come from 18 tables that are RDFized to 15 nodes (Orgs are fed from 3 tables, Persons from 2 tables).
What is a reasonable split into files, and how would you describe it? Target URLs are always constant; they cannot come from the data.

(not TriplesMap) Adding it in the Subject Map does exactly this.

  • subjectMap and predicateObjectMap are at the same level, so why would a target set in the former influence the target of the latter?
  • If I set Target1 in subjectMap and Target2 in predicateObjectMap, does Target2 override Target1? Or are triples written to both targets?
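
The configuration behind the second question could be sketched as follows. This only illustrates the setup and does not settle what an engine should do with it. rml:logicalTarget on the predicateObjectMap follows the placement proposed earlier in this issue and is hypothetical; the prefix IRIs, <#CharactersMap> and the subject template are assumptions, and the logical source and target definitions are omitted.

```turtle
@prefix rml:  <http://w3id.org/rml/> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@base <http://example.org/mapping> .

<#CharactersMap> a rml:TriplesMap;
  # rml:logicalSource omitted for brevity
  rml:subjectMap [ a rml:SubjectMap;
    rml:logicalTarget <#Target1>;          # target on the subject map
    rml:template "http://example.org/character/{@id}";  # assumed template
  ];
  rml:predicateObjectMap [ a rml:PredicateObjectMap;
    rml:logicalTarget <#Target2>;          # hypothetical: target on the predicateObjectMap
    rml:predicateMap [ a rml:PredicateMap; rml:constant foaf:name; ];
    rml:objectMap [ a rml:ObjectMap; rml:reference "name/text()"; ];
  ] .
```

The open question is whether the foaf:name triple generated here is written to <#Target2> only (override) or to both <#Target1> and <#Target2> (union).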
