diff --git a/docs/antora/modules/ROOT/nav.adoc b/docs/antora/modules/ROOT/nav.adoc
index 9fbecb3688..7b429788c3 100644
--- a/docs/antora/modules/ROOT/nav.adoc
+++ b/docs/antora/modules/ROOT/nav.adoc
@@ -1,8 +1,3 @@
-* xref:repository-structure.adoc[Repository structure]
-* xref:mapping-suite-structure.adoc[Mapping suite anatomy]
-* xref:code-list-resources.adoc[Code list mappings]
 * xref:methodology.adoc[Methodology]
 * xref:cli-toolchain.adoc[Toolchain]
-* xref:preparing-test-data.adoc[Data samples]
-* xref:mapping-priorities.adoc[Mapping priorities]
-* xref:versioning.adoc[Versioning]
+* xref:mapping-priorities.adoc[Mapping priorities]
\ No newline at end of file
diff --git a/docs/antora/modules/ROOT/pages/code-list-resources.adoc b/docs/antora/modules/ROOT/pages/code-list-resources.adoc
deleted file mode 100644
index 24a9282c00..0000000000
--- a/docs/antora/modules/ROOT/pages/code-list-resources.adoc
+++ /dev/null
@@ -1,37 +0,0 @@
-=== Resources for Code List Mappings
-
-The table below provides a list of resources that are used to map the various code lists used in the XML files to URIs in the RDF representation.
-
-In case of mapping Standard Form `F03` the *JSON* and *CSV* format files can be found at the following location:
-https://github.com/OP-TED/ted-rdf-mapping/tree/main/mappings/package_F03/transformation/resources[https://github.com/OP-TED/ted-rdf-mapping/tree/main/mappings/package_F03/transformation/resources] +
-The *specific URIs* are directly used in the
-https://github.com/OP-TED/ted-rdf-mapping/tree/main/mappings/package_F03/transformation/mappings[technical mapping files], and they can also be found in the
-https://github.com/OP-TED/ted-rdf-mapping/blob/main/mappings/package_F03/transformation/conceptual_mappings.xlsx[conceptual mapping file].
-
-*Imortant note:* Please ensure you adapt the above paths to the resources to match the tag and mapping suite package that you wish to check. For example, for the `tx1.3.beta` tag and for form `F20`, the links mentioned above will be:
-
-* https://github.com/OP-TED/ted-rdf-mapping/tree/tx1.3.beta/mappings/package_F20/transformation/resources
-* https://github.com/OP-TED/ted-rdf-mapping/tree/tx1.3.beta/mappings/package_F20/transformation/mappings
-* https://github.com/OP-TED/ted-rdf-mapping/blob/tx1.3.beta/mappings/package_F20/transformation/conceptual_mappings.xlsx
-
-[cols="30%,20%,~"]
-|===
-|*Code List Resource*|*Resource Type*|*Reasoning*
-
-|at-voc:country|JSON format|Used a SPARQL query to get the values from the specific EU Voc
-|at-voc:nuts|JSON format|Used a SPARQL query to get the values from the specific EU Voc
-|at-voc:currency|JSON format|Used a SPARQL query to get the values from the specific EU Voc
-|at-voc:cpv|JSON format|Used a SPARQL query to get the values from the specific EU Voc
-|at-voc:contract-nature|JSON format|Used a SPARQL query to get the values from the specific EU Voc
-|at-voc:legal-basis|JSON format|Used a SPARQL query to get the values from the specific EU Voc
-|at-voc:cpvsuppl|JSON format|Used a SPARQL query to get the values from the specific EU Voc
-|at-voc:main-activity|CSV format|Used this format because the XML element from XSD schema is different than the code from the specific EU Voc
-|at-voc:buyer-legal-type|CSV format|Used this format because the XML element from XSD schema is different than the code from the specific EU Voc
-|at-voc:award-criterion-type|URI|Used only when we want to map to a specific value from the EU voc
-|at-voc:procurement-procedure-type|URI|Used only when we want to map to a specific value from the EU voc
-|at-voc:winner-selection-status|URI|Used only when we want to map to a specific value from the EU voc
-|at-voc:non-award-justification|URI|Used only when we want to map to a specific value from the EU voc
-|at-voc:economic-operator-size|URI|Used only when we want to map to a specific value from the EU voc
-|at-voc:direct-award-justification|URI|Used only when we want to map to a specific value from the EU voc
-|===
-
diff --git a/docs/antora/modules/ROOT/pages/mapping-suite-structure.adoc b/docs/antora/modules/ROOT/pages/mapping-suite-structure.adoc
deleted file mode 100644
index 4218f098b8..0000000000
--- a/docs/antora/modules/ROOT/pages/mapping-suite-structure.adoc
+++ /dev/null
@@ -1,132 +0,0 @@
-= Mapping suite package structure
-
-In this section we describe the structure of a “mapping suite package” in GitHub. Such a package contains everything that is needed for the development and testing of a given “mapping suite” that is applicable to a certain set of notices. After the package is finalised, it can be used by a process to apply it to a large number of notices stored in a database, and would transform those notices into RDF data.
-
-A package is represented by a well-defined folder structure containing certain files. This folder structure is repeated for every developed mapping. Initial organisation of these packages is per Form number, but it may evolve.
-
-The structure of the package changes through the different phases of the mapping development process. Below we describe how such a package looks in three phases of the mapping development.
-
-
-=== Mapping suite package description for Semantic Engineers
-
-In the first, initial, phase, when the Semantic Engineers start working on a new mapping suite, they will have to set up a package folder structure similar to the one described below, and will work on (or with) the files contained there.
-
-*Assumption:* Regarding the naming and organisation of the various mapping suites, *one package per form number* is assumed to be THE way to organise these packages.
-
-*Challenge:* Are there better ways to deal with certain sections (sub-sections) that repeat across multiple forms? Consider Section I, for example, which in case of forms F03, F06, F25 contains “almost” the same information, therefore only one mapping should be written for it and RE-used in “final” form-mapping-packages. The problem is also discussed in a dedicated section below.
-
-The structure of an example mapping package folder structure is presented below:
-
------
-/package_Fxx
-    /transformation
-        conceptual_mappings.xlsx
-        /mappings
-            *.rml.ttl
-        /resources
-            *.json, *.xml, *.csv
-    /test_data
-        *.xml
------
-
-* `/package_Fxx` root folder of the mapping suite
-
-* `/transformation/conceptual_mappings.xlsx` manually created (from the Google Sheet template described xref:methodology.adoc#_conceptual-mapping-structure[here])
-
-* `/transformation/resources` additional resources possibly needed by the transformation rules; +
-The content of this folder should be automatically generated by the mapping package processor, based on the "Resources" sheet of the `conceptual_mappings.xlsx`, from the "source of truth" `ted-rdf-conversion-pipeline/ted-sws/resources`.
-
-* `/transformation/mappings/*.rml.ttl` the relevant RML transformation rules, organized in module files, which are copied from the "source" mappings folder, according to the information specified in the "RML Modules" sheet of the `conceptual_mappings.xlsx`. **IMPORTANT!!!** In these rules the source XML is always referring to `data/source.xml`, which corresponds to the `../../data/source.xml` file that will be copied (and renamed) from the `test_data` folder at the time of the execution of the mapping.
-
-* `/test_data` manually and carefully selected test data possibly grouped in suborders, e.g. `/test_data/batch-D1/*.xml`
-
-* `technical_mappings.yarrrml.yaml` (optional) manually created, and used in earlier days of the mapping development, but currently not used
-
-=== Mapping suite package description for the Software Engineers
-
-A package provided by the semantic engineers (SE) is enriched with additional artefacts that are generated automatically using the xref:cli-toolchain.adoc[package expanding tools] which take as input the artefacts provided by the SE. Here are some examples of these additional artefacts that are being generated:
-
-* *Metadata* describing the parameters for selecting the notices that the mappings can be applied to, various version information, etc.
-* *SPARQL queries* that can be used to validate and/or test the generated outputs
-* *SHACL shapes* that can be used to validate and the structure of the generated outputs
-* New ones may be added at the time of writing this document
-
-After the package processing/expansion, the structure of the example mapping package presented in the previous subsection would look like this:
-
------
-/package_Fxx
-    metadata.json
-    /transformation
-        conceptual_mappings.xlsx
-        /mappings
-            *.rml.ttl
-        /resources
-            *.json, *.xml, *.csv
-    /data
-        source.xml
-    /output
-        *.rdf
-    /validation
-        /sparql
-            /cm_assertions
-                *.rq
-        /shacl # this is a constant, when we know what the SHACL is (currently unknown)
-            *.shacl.ttl # data shape file(s)
-    /test_data # manually and carefully selected test data
-        *.xml
-
------
-
-* `metadata.json` automatically generated from Metadata sheet of `conceptual_mapping.xlsx`
-
-* `/data` # this is a placeholder created at runtime to process the inputs. It serves only when the mapping suite is being tested, or executed by some script.
-
-* `source.xml` this file is generated during runtime by copying a given test data file
-
-* `/output` this is a placeholder created at runtime to store outputs. It serves only when the mapping suite is being tested, or executed by some script.
-
-* `/validation/sparql/cm_assertions` SPARQL queries automatically generated from the conceptual mapping
-
-=== Mapping suite package description for the Semantic Engineers after the expansion
-
-After the “execution” of a mapping, the mapping package will be further enriched, and will contain additional files, as a result of running the mapping suite on the included test data.
-
------
-/package_Fxx
-    metadata.json
-    /transformation
-        conceptual_mappings.xlsx
-        /mappings
-            *.rml.ttl
-        /resources
-            *.json, *.xml, *.csv
-    /data
-        source.xml
-    /output
-        /
-            .ttl
-        /test_suite_report
-            *.ttl, *.html, *.json # e.g. sparql_cm_assertions.html, shacl_epo.html, xml_coverage.html
-        /
-            ...
-        /
-            ...
-    /validation
-        /sparql
-            /cm_assertions
-                *.rq
-        /shacl
-            /epo
-                ePO_shacl_shapes.rdf
-                shacl_result_query.rq
-    /test_data
-        .xml
-        .xml
-        .xml
-        *.xml
------
-
-* `/output/` for each example file we create a folder that will contain all the generated artefacts for that sample file
-* `/output/test_suite_report` validation reports summarising all individual reports
-* `/output//.ttl` the output of the transformation
-*
\ No newline at end of file
diff --git a/docs/antora/modules/ROOT/pages/repository-structure.adoc b/docs/antora/modules/ROOT/pages/repository-structure.adoc
deleted file mode 100644
index 2a9856e2ac..0000000000
--- a/docs/antora/modules/ROOT/pages/repository-structure.adoc
+++ /dev/null
@@ -1,34 +0,0 @@
-= Repository structure
-
-Transformation rules and other artefacts for the https://github.com/OP-TED/ted-rdf-conversion-pipeline[TED Semantic Web Services (TED-SWS)] system are organised in https://github.com/OP-TED/ted-rdf-mapping[this repository].
-
-The repository is organised as presented below. Next we describe the important folders and their purpose.
-
------
-/docs
-    /antora
-/mappings
-    /package_F03
-    /package_F06
-    /package_F25
-    ...
-/src
-    /mappings
-        *.rml.ttl
-/test_data
-    /sampling_2014_2022
-    /sampling_2021
-    /sampling_manual
-Makefile
-requirements.txt
------
-
-`/docs` folder contains this documentation. It is written in https://asciidoc.org/[AsciiDoc format] and compiled with https://antora.org/[Antora system].
-
-`/mappings` folder contains xref:mapping-suite-structure.adoc[mapping suite packages] organised based on the standard forms numbers. Their name is formed based on the form number (e.g. `F03`, `F06`) prefixed with `package_` for readability. When the eForms will be mapped, then the corresponding appropriate organisation will be chosen.
-
-`/src/mappings` folder represents the "single source of truth" for the mapping rules across various mapping suite packages. This is necessary because of the xref:methodology.adoc#_technical-mapping-modularisation[modularisation and reuse method] adopted in this project. The basic idea is that the mapping rules are organised in modules and all are stored in the source folder. Each mapping suite provides in the xref:methodology.adoc#_conceptual-mapping-structure[conceptual mapping workbook] the list of modules that be used to compose the complete set of transformation rules of the mapped form number.
-
-`/test_data` folder contains sample https://ted.europa.eu/TED/browse/browseByMap.do[TED notices] generated by different selection methods. Some manually selected notices are available in the `/sampling_manual` subfolder. The automatically generated notice samples that are in the `/sampling_2021` subfolder are described xref:preparing-test-data.adoc[here]. In the `sampling_2014_2022` subfolder there are samples, generated from all available notices in the 2014-2022 period that cover the various changes in the XML Schema over the years. More automatically generated samples will follow.
-
-
diff --git a/docs/antora/modules/ROOT/pages/versioning.adoc b/docs/antora/modules/ROOT/pages/versioning.adoc
deleted file mode 100644
index 0a07cac95f..0000000000
--- a/docs/antora/modules/ROOT/pages/versioning.adoc
+++ /dev/null
@@ -1,73 +0,0 @@
-= Mapping Suite Versioning Rules
-
-This section presents the versioning rules for mapping suites which considers the mapping suite structure, the conceptual and technical mappings (which impact the structure of the output), and the metadata. These rules play a crucial role in maintaining compatibility and ensuring smooth transitions between different mapping suite versions, especially when considering the potential impact on SPARQL queries on the transformation output.
-
-=== Semantic versioning in a nutshell
-
-“Dependency hell” plagues software management and impacts models, architecture and documentation. As a project expands, the complexity of changes and dependencies increase, complicating the release of new work packages. Version lock and version promiscuity impede progress, making it difficult to move projects forward safely and efficiently.
-
-Semantic Versioning offers a solution by providing a set of rules for assigning and incrementing version numbers. It clearly communicates changes in artefacts through version number increments and change notes, using the *X.Y.Z (Major.Minor.Patch)* format:
-
-* Bug fixes increment the patch version,
-* Backwards-compatible changes increment the minor version, and
-* Backwards-incompatible changes increment the major version.
-
-This approach provides numerous benefits:
-
-* Precise artefact version identification
-* Traceable artefact evolution for governance
-* Minimised client-side impact from artefact changes
-* Prevention of accidental semantic-level compatibility breaks
-* Effortless detection of version incompatibility
-* Clear differentiation of impact and compatibility levels for changes
-* Transparent artefact evolution timeline
-* Manageable artefact version governance (e.g., approval processes, quality gates, parallel versions, branches)
-
-
-=== Backwards compatibility
-
-A new version of the mapping suite is considered to be backwards compatible if it can be read directly by the same software that was able to read the previous version without requiring any modifications in its code.
-
-=== Major version increment
-
-* The mapping suite structure or metadata structure changes.
-
-_Implications:_
-
-* Applications must be aware of major releases and should not use them unless specifically designed to support them. Otherwise, a mapping suite with a major change cannot be read by an existing application (e.g. TED-SWS pipeline, Toolchain, etc.)
-* An example of a change that would break backwards compatibility is renaming, moving, or removing a file in the mapping suite. Another example would be changing the format of the contents of a file in the mapping suite, e.g., switching from RML to YARRML. A more concrete example would be altering the structure of objects inside the metadata file, renaming or removing some properties, or altering the column structure in the conceptual mapping file.
-
-=== Minor version increment
-
-* When the output of the mapping execution produces different results for the same input.
-** Mapping to a new version of the ontology.
-** Changes in mapping rules that would impact SPARQL queries on output data.
-** Deletion of a mapping rule that would impact SPARQL queries on output data.
-* The structure of the mapping suite is extended to accommodate new software features
-
-_Implications:_
-
-* If the output of the mappings impacts a SPARQL query, it is considered a minor change in the mapping suite, even if the impact on the query is major.
-* At first glance, these types of changes may appear quite substantial and potentially incompatible with previous versions. However, it's important to note the definition of backward compatibility we discussed earlier in this section. Backward compatibility is considered broken only when developers are required to make modifications to enable an application to read the new version while still being able to interpret the previous version. Fortunately, in the rules mentioned above, none of the changes require any adjustments within the application itself. Hence, these modifications can be seamlessly adopted without any impact on the application's functionality.
-* The RDF metadata in the output should indicate the ontology version it is compliant with, and the mapping suite version used to generate them, specifying only the major and minor versions (without the patch).
-
-=== Patch version increment
-
-* Variation in the source structure of the mapping that does not affect the output data
-** Mapping to a new XSD version of the source XML schema.
-** Mapping to a new version of the eForms SDK (XSD + JSON).
-* Adding new mapping rules to make the mapping suite more complete.
-* Editorial changes in mapping rules including comments, notes, and remarks.
-
-=== Release labelling
-
-* Pre-release (unstable) versions should be labelled with the suffix "-beta.#" (where # stands for a number e.g. 1,2,3).
-* Release candidate (stable) versions should be labelled with the suffix “-rc.#” (where # stands for a number e.g. 1,2,3). Release candidate versions are issued to allow stakeholders to test and provide final remarks.
-
-_Implications:_
-
-* This helps track unstable, in-development and release candidate versions, but does not impact precedence.
-
-=== Conclussion
-
-By adhering to the versioning rules outlined in this section, developers and maintainers can effectively manage versioning for mapping suites, ensuring compatibility and smooth transitions between different versions. These rules provide clear guidelines for when to increment major, minor and patch versions, considering the potential impact on SPARQL queries on the transformation output. Following these rules will help maintain consistency and compatibility across various systems that rely on the mapping suites.