title | date | authors | tags
---|---|---|---
A Short Case Study Involving Table Schema Frictionless Specs at the European Union | 2021-06-22 | |
The Frictionless specifications help simplify data validation for applications in production at the European Union. More specifically, Costas Simatos introduced the Frictionless Data community to the Interoperability Test Bed (ITB), an online platform that can be used to test systems against technical specifications; curious minds will find a recording of his presentation on the subject on YouTube. Among the tools it offers is a CSV validator that relies on the Table Schema specifications. Those specifications fill a gap that RFC 4180 does not address: they provide a structured way of defining the content of individual fields in terms of data types, formats and constraints. This was reported as a clear benefit of the Frictionless specifications back in 2020, when a beta version of the CSV validator was launched.
Photo by Clark Van Der Beken on Unsplash
Frictionless specifications are flexible while still allowing users to define unambiguously the expected content of a given field. They were therefore officially adopted to realise the validator for the Kohesio pilot phase of 2014-2020, Kohesio being the "Project Information Portal for Cohesion Policy". The Table Schema specifications made it easy and convenient for the Interoperability Test Bed to establish constraints and describe the data to be validated concisely: starting from an initial set of CSV syntax rules, written and mostly non-technical definitions were converted to their Frictionless equivalent. Using simple JSON objects, Frictionless specifications allowed the ITB to enforce data validation in multiple ways, as can be observed from the schema used for the CSV validator. The following list calls attention to the core aspects of the Table Schema standard that were taken advantage of:
- Dates can be defined with string formatting (e.g. `%d/%m/%Y` stands for day/month/year);
- Constraints can indicate whether a column can contain empty values or not;
- Constraints can also specify a valid range of values (e.g. `"minimum": 0.0` and `"maximum": 100.0`);
- Constraints can specify an enumeration of valid values to choose from (e.g. `"enum" : ["2014-2020", "2021-2027"]`);
- Constraints can be specified in custom ways, such as with regular expressions for powerful string matching capabilities;
- Data types can be enforced for any column;
- Columns can be forced to adopt a specific name and a description can be provided for each one of them.
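To make the items above more concrete, here is a minimal sketch in Python (standard library only) of a Table Schema style descriptor and a small checker that applies it to CSV text. The field names (`project_id`, `start_date`, `cofinancing_rate`, `period`) and the helpers `check_cell` and `validate_csv` are hypothetical and chosen for illustration; they are not taken from the ITB schema, the ITB validator, or the Frictionless libraries, and the checker covers only the handful of constraints listed above.

```python
import csv
import io
import re
from datetime import datetime

# Hypothetical descriptor in the style of Table Schema (illustrative field names).
SCHEMA = {
    "fields": [
        {
            "name": "project_id",
            "type": "string",
            "description": "Unique project identifier",
            "constraints": {"required": True, "pattern": "[A-Z]{2}[0-9]{4}"},
        },
        {
            "name": "start_date",
            "type": "date",
            "format": "%d/%m/%Y",  # day/month/year
            "constraints": {"required": True},
        },
        {
            "name": "cofinancing_rate",
            "type": "number",
            "constraints": {"minimum": 0.0, "maximum": 100.0},
        },
        {
            "name": "period",
            "type": "string",
            "constraints": {"enum": ["2014-2020", "2021-2027"]},
        },
    ]
}


def check_cell(field, raw):
    """Return a list of error messages for a single raw CSV cell."""
    name = field["name"]
    constraints = field.get("constraints", {})
    if raw == "":
        # Empty cells are only an error when the field is required.
        return [f"{name}: value is required"] if constraints.get("required") else []
    errors = []
    if field["type"] == "number":
        try:
            number = float(raw)
        except ValueError:
            return [f"{name}: {raw!r} is not a number"]
        if "minimum" in constraints and number < constraints["minimum"]:
            errors.append(f"{name}: {number} is below the minimum")
        if "maximum" in constraints and number > constraints["maximum"]:
            errors.append(f"{name}: {number} is above the maximum")
    elif field["type"] == "date":
        try:
            datetime.strptime(raw, field["format"])
        except ValueError:
            return [f"{name}: {raw!r} does not match {field['format']}"]
    if "enum" in constraints and raw not in constraints["enum"]:
        errors.append(f"{name}: {raw!r} is not an allowed value")
    if "pattern" in constraints and not re.fullmatch(constraints["pattern"], raw):
        errors.append(f"{name}: {raw!r} does not match the pattern")
    return errors


def validate_csv(text, schema=SCHEMA):
    """Check the header names, then every cell, against the schema."""
    reader = csv.reader(io.StringIO(text))
    header = next(reader)
    expected = [field["name"] for field in schema["fields"]]
    if header != expected:
        return [f"header: expected {expected}, got {header}"]
    errors = []
    for row_number, row in enumerate(reader, start=2):
        for field, raw in zip(schema["fields"], row):
            errors += [f"row {row_number}, {msg}" for msg in check_cell(field, raw)]
    return errors


if __name__ == "__main__":
    sample = (
        "project_id,start_date,cofinancing_rate,period\n"
        "EU0001,22/06/2021,85.0,2014-2020\n"
        "eu1,2021-06-22,120,2030-2040\n"
    )
    for message in validate_csv(sample):
        print(message)
```

The descriptor is plain data, so the same dictionary could be serialised to JSON and handed to any tool that understands Table Schema; the checker here only illustrates how each constraint translates into a concrete test on a cell.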
Because these specifications can be expressed as portable text files, they became part of a multitude of tools that provide greater convenience to users, and the validation process has been documented extensively. JSON code snippets from the documentation highlight the fact that this format conveys all the necessary information in a readable manner and lets users extend the original specifications as needed. In this particular instance, the CSV validator can be used as a Docker image, as part of a command-line application, inside a web application and even as a SOAP API.
In a way, the Frictionless specifications were the missing piece of the puzzle that enabled the ITB to rely on a well-documented set of standards for their data validation needs.