Merge branch 'feature/tx1.4' into release/tx1.4
csnyulas committed Feb 19, 2023
2 parents 3c0d4fb + 7862fa3 commit 7a47513
Showing 10,410 changed files with 1,100,091 additions and 3,169,803 deletions.
The diff you're trying to view is too large. We only load the first 3000 changed files.
2 changes: 1 addition & 1 deletion .github/workflows/tests.yml
@@ -22,7 +22,7 @@ jobs:
- name: Install dependencies
run: |
python -m pip install --upgrade setuptools pip wheel
TED_SWS_BRANCH=main make install-custom
make install
- name: Run Tests
run: make test

13 changes: 8 additions & 5 deletions Makefile
@@ -26,11 +26,6 @@ install:
@ pip install --upgrade pip
@ pip install --upgrade --force-reinstall -r requirements.txt

install-custom:
@ echo -e "$(BUILD_PRINT)Installing the requirements$(END_BUILD_PRINT)"
@ pip install --upgrade pip
@ pip install --upgrade --force-reinstall git+https://github.com/OP-TED/ted-rdf-conversion-pipeline@$(or $(TED_SWS_BRANCH),main)#egg=ted-sws

dev-dotenv-file: rml-mapper-path-add-dotenv-file saxon-path-add-dotenv-file dev-secrets-dotenv-file

local-dotenv-file: rml-mapper-path-add-dotenv-file saxon-path-add-dotenv-file local-secrets-dotenv-file
@@ -82,7 +77,15 @@ test:
@ mapping_suite_validator package_F03
@ mapping_suite_validator package_F03_test
@ mapping_suite_validator package_F06
@ mapping_suite_validator package_F13
@ mapping_suite_validator package_F20
@ mapping_suite_validator package_F21
@ mapping_suite_validator package_F22
@ mapping_suite_validator package_F23
@ mapping_suite_validator package_F25


clear-xml-resolver-cache:
@ echo -e "$(BUILD_PRINT)Clear XML resolver cache!$(END_BUILD_PRINT)"
@ rm -r ~/.xmlresolver.org

61 changes: 36 additions & 25 deletions docs/antora/modules/ROOT/pages/cli-toolchain.adoc
@@ -2,65 +2,76 @@

The mapping artefacts, including the various mapping suite packages that are being developed by the process described in this document, are stored and maintained in the https://github.com/OP-TED/ted-rdf-mapping[OP-TED/ted-rdf-mapping] GitHub repository.

To assist the semantic engineers in the development of mapping suites a toolchain has been developed. There are a number of command line tools (CLIs) available in the https://github.com/OP-TED/ted-rdf-conversion-pipeline[OP-TED/ted-rdf-conversion-pipeline] GitHub repository that can be run on these mapping suites to process them in various ways. In order to run these CLIs the ted-sws project needs to be installed in the rdf-mapping environment. This can be done by following the *installation instructions* provided https://github.com/OP-TED/ted-rdf-conversion-pipeline#installation[here].
To assist the semantic engineers in the development of mapping suites, a toolchain has been developed. There are a number of command line tools (CLIs) available in the https://github.com/OP-TED/ted-rdf-conversion-pipeline[OP-TED/ted-rdf-conversion-pipeline] GitHub repository that can be run on these mapping suites to process them in various ways. In order to run these CLIs, the ted-sws project needs to be installed in the rdf-mapping environment. This can be done by following the *installation instructions* provided https://github.com/OP-TED/ted-rdf-conversion-pipeline#installation--usage[here]. The documentation for the usage of these CLI tools can be found https://docs.ted.europa.eu/rdf-conversion/mapping_suite_cli_toolchain.html[here]; below we provide more details about the most relevant ones.
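As a minimal sketch, grounded in the `install` target of this repository's Makefile (shown elsewhere in this commit) and assuming you work inside a dedicated Python environment, the set-up amounts to installing the pinned requirements, which include the ted-sws package:

[source,shell]
----
# clone the mappings repository and enter it
git clone https://github.com/OP-TED/ted-rdf-mapping.git
cd ted-rdf-mapping

# install the pinned dependencies (including ted-sws) into the active
# Python environment -- this is what `make install` does
pip install --upgrade pip
pip install --upgrade --force-reinstall -r requirements.txt
----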

After the installation, the following CLIs will be available (the names of the tools to be executed on the command line are specified in parentheses).

=== Resources Injector (resources_injector)
This CLI injects the requested resources listed on the "Resources" spreadsheet of the Conceptual Mappings into the mapping suite. Each form has a resources list that represents the controlled values needed in the mapping process.

Consult the authority tables used in the EPO available from the https://op.europa.eu/en/web/eu-vocabularies/authority-tables[EU Vocabularies].

For more detailed documentation on its usage please check out the https://docs.ted.europa.eu/rdf-conversion/mapping_suite_cli_toolchain.html#_cmd_resources_injector[dedicated section in the Mapping Suite CLI Toolchain] documentation.

=== RML Modules Injector (rml_modules_injector)
This CLI injects the technical mapping modules from the `src/mappings` folder (see xref:methodology.adoc#_technical-mapping-modularisation-chapter[modules chapter]) into the mapping suite. Each form has a list of modules that are needed in order to run the mapping_runner.
The module names are listed on the "RML_Modules" spreadsheet of the Conceptual Mappings.

For more detailed documentation on its usage please check out the https://docs.ted.europa.eu/rdf-conversion/mapping_suite_cli_toolchain.html#_cmd_rml_modules_injector[dedicated section in the Mapping Suite CLI Toolchain] documentation.


=== SPARQL Test Generator (sparql_generator)

This CLI generates a set of SPARQL queries from the conceptual mapping that will be executed by the `sparql_runner` CLI (described in the next section). Each generated query will be used to test whether the related conceptual mapping correctly generates RDF data.

For more detailed documentation on its usage please check out the https://github.com/OP-TED/ted-rdf-conversion-pipeline#cmd-sparql_generator[readme file].
For more detailed documentation on its usage please check out the https://docs.ted.europa.eu/rdf-conversion/mapping_suite_cli_toolchain.html#_cmd_sparql_generator[dedicated section in the Mapping Suite CLI Toolchain] documentation.
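A hedged usage sketch follows; the positional `package_F03` argument mirrors the convention used by `mapping_suite_validator` in this repository's Makefile, but the actual options of `sparql_generator` are not documented on this page and should be checked with its built-in help:

[source,shell]
----
# list the supported options first
sparql_generator --help

# hypothetical invocation: generate the SPARQL assertion queries for the
# F03 mapping suite from its conceptual mapping
sparql_generator package_F03
----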

=== SPARQL Queries Runner (sparql_runner)
This CLI executes all the sparql queries generated by sparql_generator into each RDF result file. The results of this execution are one report per RDF file that contains the queries associated to indicators (True, False or Error).
This CLI executes all the SPARQL queries generated by sparql_generator against each RDF result file. The result of this execution is one report per RDF file, containing the queries together with their associated indicators (Valid, Invalid, Unverifiable, Warning or Error).

Each indicator gives semantic engineers and reviewers different information, as shown below:

* True: the mapping corresponding to the sparql query does work.
* *Valid*: The XPATH to which the query is associated was found in the XML notice and the SPARQL query returned True. This is the ideal case.

* False: the mapping didn't generate RDF results that would match the query. This can happen either when an expected element is missing on the XML notice, or if there is a problem with the mapping rules.
* *Invalid*: The XPATH to which the query is associated was found in the XML notice and the SPARQL query returned False. This can occur for various reasons. It could be a problem in the mapping rule (which needs to be fixed in the technical mapping) or in the generated SPARQL query (which could be fixed by updating the conceptual mapping), but it could also be a "false alarm": the validation and reporting tool is not always capable of generating the more advanced SPARQL query that captures the special cases in which a query should be executed (e.g. when there is an XPATH condition), so the query is sometimes executed even in situations where it is not applicable.

* Error: the SPARQL query is incorrect. That means that the conceptual mapping has to be reviewed.
* *Unverifiable*: The XPATH to which the query is associated was NOT found in the XML notice and the SPARQL query returned False. This is not a problem per se; it is an expected False result. In such a situation the given SPARQL query cannot help us validate the correctness of the mapping on this input data, as it is not applicable to it.

For more detailed take a look at the following example https://github.com/OP-TED/ted-rdf-mapping/blob/main/mappings/package_F03/output/002705-2021/test_suite_report/sparql_cm_assertions.html[here].
* *Warning*: The XPATH to which the query is associated was NOT found in the XML notice but the SPARQL query returned True. This might be due to an error in the mapping or the query, but in most cases it is due to the SPARQL query being "incomplete", i.e. too localised: it does not capture the full context in which a certain graph pattern should be matched, and so it matches valid property paths in the output that were created from other XPATHs with, for example, a similar ending.

=== Resources Injector (resources_injector)
This CLI injects the requested resources listed on the "Resources" spreadsheet of the Conceptual Mappings into the MappingSuite. Each Form has resources list that represent the controlled value that are needed in the mapping process.

Consult the authority tables used in the EPO available from the https://op.europa.eu/en/web/eu-vocabularies/authority-tables[EU Vocabularies].

For more detailed documentation on its usage please check out the https://github.com/OP-TED/ted-rdf-conversion-pipeline#cmd-resources_injector[readme file].

=== RML Modules Injector (rml_modules_injector)
This CLI injects the technical mappings modules from the `src/mappings` folder (see xref:methodology.adoc#_technical-mapping-modularisation-chapter[modules chapter]) into the mapping suite. Each form has a module list that is needed in order to run the mapping_runner.
The modules name are listed on the "RML_Modules" spreadsheet of the Conceptual Mappings
* *Error*: The SPARQL query is incorrect. That means that the conceptual mapping has to be reviewed.

For more detailed documentation on its usage please check out the https://github.com/OP-TED/ted-rdf-conversion-pipeline#cmd-rml_modules_injector[readme file].
For more detailed documentation on its usage please check out the https://docs.ted.europa.eu/rdf-conversion/mapping_suite_cli_toolchain.html#_cmd_sparql_runner[dedicated section in the Mapping Suite CLI Toolchain] documentation.
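An illustrative sketch; the invocation below is hypothetical (check `sparql_runner --help` for the real options), while the report path is the one referenced earlier on this page for notice 002705-2021 of `package_F03`:

[source,shell]
----
# hypothetical invocation: run all generated queries against the RDF output
# of the F03 mapping suite
sparql_runner package_F03

# the per-notice reports end up in the suite's output folder, e.g.
ls mappings/package_F03/output/002705-2021/test_suite_report/sparql_cm_assertions.html
----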

=== Mapping Test Runner (mapping_runner)
This CLI applies the mapping to a certain test notice file, to a batch of notice files (organised in a folder), or to all available test notices, and generates output files representing the corresponding RDF graph for each notice (see RDF output examples https://github.com/OP-TED/ted-rdf-mapping/tree/main/mappings/package_F03/output[here]).

For more detailed documentation on its usage please check out the https://github.com/OP-TED/ted-rdf-conversion-pipeline#cmd-mapping_runner[readme file].
For more detailed documentation on its usage please check out the https://docs.ted.europa.eu/rdf-conversion/mapping_suite_cli_toolchain.html#_cmd_mapping_runner[dedicated section in the Mapping Suite CLI Toolchain] documentation.
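A hedged invocation sketch (the exact options for selecting a single notice or a folder of notices should be checked with `mapping_runner --help`); the output location matches the RDF output examples linked above:

[source,shell]
----
# hypothetical invocation: run the F03 mapping suite on its test notices
mapping_runner package_F03

# the generated RDF graphs land under the suite's output folder,
# one sub-folder per notice
ls mappings/package_F03/output/
----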

=== Metadata generator (metadata_generator)
This tool extracts the relevant metadata from the conceptual mapping file (by default: `conceptual_mappings.xlsx`) and stores it in a JSON file (by default: `metadata.json`). This metadata file will be used by various other processes (both CLIs and DAGs), mainly to inform them about the applicability of this mapping to various notices.

For more detailed documentation on its usage please check out the https://github.com/OP-TED/ted-rdf-conversion-pipeline#cmd-metadata_generator[readme file].
For more detailed documentation on its usage please check out the https://docs.ted.europa.eu/rdf-conversion/mapping_suite_cli_toolchain.html#_cmd_metadata_generator[dedicated section in the Mapping Suite CLI Toolchain] documentation.
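A hedged sketch using the default file names stated above; both the invocation and the location of the generated file are assumptions to be verified with `metadata_generator --help`:

[source,shell]
----
# hypothetical invocation: extract the metadata of the F03 mapping suite
# from its conceptual_mappings.xlsx
metadata_generator package_F03

# inspect the generated metadata (assumed location inside the suite folder)
cat mappings/package_F03/metadata.json
----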

=== YARRRML to RML Converter (yarrrml2rml_converter)
This command line tool allows the conversion of a mapping expressed in the more user-friendly YARRRML syntax to RML. This is a very useful tool, especially at initial phases of the mapping development, or for newcomers, as it is easier and faster to write YARRRML rules than RML rules.
This command line tool allows the conversion of a mapping expressed in the more user-friendly YARRRML syntax to RML. This is a very useful tool, especially in the initial phases of the mapping development, or for newcomers, as it is easier and faster to write YARRRML rules than RML rules. This tool is no longer used in the current development process because, for technical reasons, we develop the mappings directly in RML rather than in YARRRML.

For more detailed documentation on its usage please check out the https://github.com/OP-TED/ted-rdf-conversion-pipeline#cmd-yarrrml2rml_converter[readme file].
For more detailed documentation on its usage please check out the https://docs.ted.europa.eu/rdf-conversion/mapping_suite_cli_toolchain.html#_cmd_yarrrml2rml_converter[dedicated section in the Mapping Suite CLI Toolchain] documentation.

=== XPATH Coverage Runner (xpath_coverage_runner)
Generates reports describing XPATH coverage of the notices.

For more detailed documentation on its usage please check out the https://docs.ted.europa.eu/rdf-conversion/mapping_suite_cli_toolchain.html#_cmd_xpath_coverage_runner[dedicated section in the Mapping Suite CLI Toolchain] documentation.

=== SHACL Validation Runner (shacl_runner)
Generates SHACL Validation Reports for RDF files.
For more detailed documentation on its usage please check out the https://github.com/OP-TED/ted-rdf-conversion-pipeline#cmd-shacl_runner[readme file].

For more detailed documentation on its usage please check out the https://docs.ted.europa.eu/rdf-conversion/mapping_suite_cli_toolchain.html#_cmd_shacl_runner[dedicated section in the Mapping Suite CLI Toolchain] documentation.

=== Mapping Suite Processor (mapping_suite_processor)
This CLI runs all the necessary CLIs mentioned above in a logical order, to fully process a mapping suite, starting with the generation of metadata and up to running the mapping on all the (specified) tests data, and generating all the possible associated validation artefacts.
This CLI runs all the necessary CLIs mentioned above in a logical order to fully process a mapping suite, starting with the generation of metadata, up to running the mapping on all the (specified) test data and generating all the associated validation artefacts. It can be run for a certain package, for a (set of) notice(s), or for selected groups of commands.

For more detailed documentation on its usage please check out the https://github.com/OP-TED/ted-rdf-conversion-pipeline#cmd-mapping_suite_processor[readme file].
For more detailed documentation on its usage please check out the https://docs.ted.europa.eu/rdf-conversion/mapping_suite_cli_toolchain.html#_cmd_mapping_suite_processor[dedicated section in the Mapping Suite CLI Toolchain] documentation.
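A hedged end-to-end sketch; the positional package argument follows the repository's naming convention, and the options for restricting the run to specific notices or command groups should be taken from the tool's help:

[source,shell]
----
# list the available options and command groups
mapping_suite_processor --help

# hypothetical invocation: fully process the F03 mapping suite -- metadata
# generation, resource and RML-module injection, mapping run, and the
# SPARQL/SHACL validation and coverage reports
mapping_suite_processor package_F03
----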

=== Other relevant tools and libraries
Other relevant tools used in the mapping process that are worth mentioning are:
14 changes: 7 additions & 7 deletions docs/antora/modules/ROOT/pages/code-list-resources.adoc
@@ -1,19 +1,20 @@
=== Code List Resources for Mappings
=== Resources for Code List Mappings

The table below provides a list of resources that are used to map the various code lists used in the XML files to URIs in the RDF representation.

The *JSON* and *CSV* format files can be found at the following location:
In the case of mapping Standard Form `F03`, the *JSON* and *CSV* format files can be found at the following location:
https://github.com/OP-TED/ted-rdf-mapping/tree/main/mappings/package_F03/transformation/resources[https://github.com/OP-TED/ted-rdf-mapping/tree/main/mappings/package_F03/transformation/resources] +
The *specific URIs* are directly used in the
https://github.com/OP-TED/ted-rdf-mapping/tree/main/mappings/package_F03/transformation/mappings[technical mapping files], and they can also be found in the
https://github.com/OP-TED/ted-rdf-mapping/blob/main/mappings/package_F03/transformation/conceptual_mappings.xlsx[conceptual mapping file].

*Imortant note:* Please ensure you change the path of the link to the tag that you are checking. For example, for the `tx1.1.delta` tag, the links mentioned above will be:
*Important note:* Please ensure you adapt the above paths to the resources to match the tag and mapping suite package that you wish to check. For example, for the `tx1.3.beta` tag and for form `F20`, the links mentioned above will be:

* https://github.com/OP-TED/ted-rdf-mapping/tree/tx1.1.delta/mappings/package_F03/transformation/resources[https://github.com/OP-TED/ted-rdf-mapping/tree/tx1.1.delta/mappings/package_F03/transformation/resources]
* https://github.com/OP-TED/ted-rdf-mapping/blob/tx1.1.delta/mappings/package_F03/transformation/conceptual_mappings.xlsx[https://github.com/OP-TED/ted-rdf-mapping/blob/tx1.1.delta/mappings/package_F03/transformation/conceptual_mappings.xlsx]
* https://github.com/OP-TED/ted-rdf-mapping/tree/tx1.1.delta/mappings/package_F03/transformation/resources[https://github.com/OP-TED/ted-rdf-mapping/tree/tx1.1.delta/mappings/package_F03/transformation/resources]
* https://github.com/OP-TED/ted-rdf-mapping/tree/tx1.3.beta/mappings/package_F20/transformation/resources
* https://github.com/OP-TED/ted-rdf-mapping/tree/tx1.3.beta/mappings/package_F20/transformation/mappings
* https://github.com/OP-TED/ted-rdf-mapping/blob/tx1.3.beta/mappings/package_F20/transformation/conceptual_mappings.xlsx
[cols="30%,20%,~"]
|===
|*Code List Resource*|*Resource Type*|*Reasoning*

@@ -22,7 +23,6 @@ https://github.com/OP-TED/ted-rdf-mapping/blob/main/mappings/package_F03/transfo
|at-voc:currency|JSON format|Used a SPARQL query to get the values from the specific EU Voc
|at-voc:cpv|JSON format|Used a SPARQL query to get the values from the specific EU Voc
|at-voc:contract-nature|JSON format|Used a SPARQL query to get the values from the specific EU Voc
|award_criterion_type.json|JSON format|Used a SPARQL query to get the values from the specific EU Voc
|at-voc:legal-basis|JSON format|Used a SPARQL query to get the values from the specific EU Voc
|at-voc:cpvsuppl|JSON format|Used a SPARQL query to get the values from the specific EU Voc
|at-voc:main-activity|CSV format|Used this format because the XML element from XSD schema is different than the code from the specific EU Voc
10 changes: 5 additions & 5 deletions docs/antora/modules/ROOT/pages/index.adoc
@@ -1,9 +1,9 @@
= TED-SWS artefacts documentation
= TED-RDF Mappings Documentation

The https://github.com/OP-TED/ted-rdf-mapping[TED-SWS artefacts] are mainly the transformation rules needed by the https://github.com/OP-TED/ted-rdf-conversion-pipeline[TED-SWS system]. These transformation rules, written in https://rml.io/specs/rml/[RML], are mapping the https://op.europa.eu/en/web/eu-vocabularies/e-procurement/tedschemas[XML structure] of TED https://simap.ted.europa.eu/web/simap/standard-forms-for-public-procurement[standard forms] notices to https://www.w3.org/RDF/[RDF], and are organised in xref:mapping-suite-structure.adoc[mapping suites].
The https://github.com/OP-TED/ted-rdf-mapping[TED-RDF Mappings] are mainly the transformation rules needed by the https://github.com/OP-TED/ted-rdf-conversion-pipeline[TED-RDF Conversion Pipeline] (both of which are part of the TED Semantic Web Services, aka TED-SWS system) to convert TED notices available in XML format to https://www.w3.org/RDF/[RDF]. These transformation rules, which are written in https://rml.io/specs/rml/[RML], map the https://op.europa.eu/en/web/eu-vocabularies/e-procurement/tedschemas[XML structure] of TED https://simap.ted.europa.eu/web/simap/standard-forms-for-public-procurement[standard forms] notices to RDF, and are organised in xref:mapping-suite-structure.adoc[mapping suites].

Also, xref:preparing-test-data.adoc[carefully selected sample data] are available for testing the mapping correctness and completeness.
Also, xref:preparing-test-data.adoc[carefully selected sample data] are available for testing the correctness and completeness of the mappings.

We foresee to map all standard forms and then eForms according to xref:mapping-priorities.adoc[this plan, prioritising some form number over other ones]. In order to be consistent in our approach, we also have established the mapping xref:methodology.adoc[methodology].
We foresee mapping all standard forms, and then all eForms, according to xref:mapping-priorities.adoc[this plan], prioritising some form numbers over others. In order to be consistent in our approach, we have also established a mapping xref:methodology.adoc[methodology].

The description of the code list mappings are provided xref:code-list-resources.adoc[here].
The mappings of the various code lists are described xref:code-list-resources.adoc[here].