
Deviations from the charter

Ivan Herman edited this page Apr 17, 2015 · 3 revisions

For reference: charter of the CSVW Working Group.

Mapping to XML

The charter calls for mappings of CSV data onto JSON, RDF, and XML. The Working Group has produced the CSV to JSON Mapping (CSV2JSON) and the CSV to RDF Mapping (CSV2RDF) documents, but no XML mapping has been defined. The Working Group repeatedly invited experts from the XML community to join the work, without success. Lacking the right people in the group (which suggested a lack of interest in the area), the Working Group abandoned this work.

Relationship to XPath expressions for CSV

The charter of the CSVW Working Group includes the following:

The XForms 2.0 draft specification includes the definition of XPath expressions for CSV; the Working Group will consider whether this fulfills the requirements for XML or whether a separate specification is indeed necessary.

Abandoning the CSV to XML mapping made this issue moot. (Moreover, work on XForms 2.0 appears to have stopped; the draft has not been updated since 2012.)

Relationship to R2RML and RDF Direct mapping

The charter of the CSVW Working Group includes the following:

The output of the mapping mechanism for RDF MUST be consistent with either the RDF Direct Mapping or R2RML so that if a table from a relational database is exported as CSV and then mapped it produces semantically identical data.

This requirement has been fulfilled for the RDF Direct Mapping, with slight deviations from the original requirement; see below. R2RML was also considered (see below), but that line of work was abandoned.

Relationship to the RDF Direct Mapping

For single CSV files, the approach taken by the CSV to RDF Mapping (CSV2RDF) document is very close to the Direct Mapping; it follows essentially the same structure. However, there is a subtle difference: whilst the Direct Mapping relies on information it can extract from the relational schema of the table in the enclosing database, such information is, by default, not available to CSV2RDF. The relationship between the Direct Mapping and CSV2RDF can therefore be stated more precisely: for each CSV table, it is possible to provide minimal metadata such that the output of the CSV2RDF conversion is semantically equivalent to the output the RDF Direct Mapping would produce if that table were part of a database. See the separate page for an example.

(Note that it would probably be possible to generate such metadata automatically from a relational schema, but defining such a mapping is beyond the Working Group's charter.)
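To make the target of that minimal metadata more concrete, the following Python sketch lists the triples the Direct Mapping would produce for a single table with a primary key, i.e. the output the CSV2RDF metadata has to reproduce. It is a minimal sketch under stated assumptions, not an implementation of CSV2RDF or of the full Direct Mapping: the function name `direct_mapping`, the base IRI, and the `People` table are hypothetical; all values are plain strings (no SQL datatypes); empty cells are treated as NULL; tables without a primary key (which the Direct Mapping maps to blank nodes) and foreign-key reference triples are not handled; and `urllib.parse.quote` only approximates the IRI encoding defined in the Direct Mapping specification.

```python
import csv
import io
from urllib.parse import quote

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def enc(s):
    # Assumption: simplified percent-encoding; the Direct Mapping spec
    # defines its own IRI-safe encoding, which this only approximates.
    return quote(s, safe="")


def direct_mapping(base, table, csv_text, primary_key):
    """Return (subject, predicate, object) triples in the style of the
    RDF Direct Mapping for one table that has a (non-empty) primary key.
    Objects are plain strings; datatypes are ignored in this sketch."""
    table_iri = base + enc(table)
    triples = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Row IRI: <base><table>/<col>=<value>;<col>=<value>...
        key = ";".join(f"{enc(c)}={enc(row[c])}" for c in primary_key)
        subject = f"{table_iri}/{key}"
        # Each row is typed with the table IRI.
        triples.append((subject, RDF_TYPE, table_iri))
        for column, value in row.items():
            if value == "":  # assumption: empty cell stands for SQL NULL, so no triple
                continue
            # Literal property IRI: <base><table>#<column>
            triples.append((subject, f"{table_iri}#{enc(column)}", value))
    return triples


if __name__ == "__main__":
    data = "ID,fname\n7,Bob\n8,Sue\n"
    for t in direct_mapping("http://example.com/base/", "People", data, ["ID"]):
        print(t)
```

For the hypothetical `People` table above, each row gets an IRI such as `http://example.com/base/People/ID=7`, a type triple pointing to `http://example.com/base/People`, and properties such as `http://example.com/base/People#fname`. By default, CSV2RDF describes each row with a blank node and emits no such class triple, which is why some metadata (for instance an `aboutUrl` template mirroring the row IRI) is needed before the two outputs can match.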

The situation is more complex when a collection of interrelated tables in the database (corresponding to multiple, interrelated CSV files) is used. In simple cases, where foreign keys in the database refer directly to unique keys in another table, the statement above still holds; see the separate example for such a case. However, when the correspondence among database tables uses, for example, candidate keys, the Direct Mapping results cannot be reproduced in CSV2RDF (see a problematic example for such a situation). The underlying problem is that the RDF Direct Mapping can easily access several tables in parallel within the database. The situation with CSV files is different: each CSV table is typically a separate, potentially very large file, so a CSV processor cannot be expected to handle several CSV tables in parallel.

Relationship to R2RML

The group considered several different avenues for the RDF and JSON mappings, including experiments with MMLab's RML Processor, which was primarily aimed at extending R2RML to CSV files. After a discussion at the October 2014 face-to-face meeting, the group decided to abandon this approach. The main reasons were:

  • R2RML is, in fact, a complex language, which may be at odds with the expected audience for the output of this group: publishers of CSV files whose knowledge of RDF, and of programming languages in general, may be limited.
  • The R2RML language itself would have had to be changed: some features (such as references to SQL) would have had to be removed, and others (to handle various CSV variants) added. That work would have been too complex to complete within the group's chartered duration.
  • R2RML is intimately bound to RDF, whereas the group's charter also included conversion to JSON. The conversion work would therefore have been duplicated, placing an extra burden on the group's resources and timeline. The final approach reduced the differences between RDF and JSON to a minimum, making the overall structure of the recommendations more streamlined.

Note that the Metadata Vocabulary specification includes a "hook" called transformations where implementations may add references to further transformation engines (either on the raw data or on the data converted to RDF or JSON by standard means). A specialized R2RML processor could be used with the CSV data through that hook.