Merge pull request #5 from OP-TED/feature/TED-1422

Feature/ted 1422
OP-TED · Sep 28, 2023 · 1d96730
2 parents 9b3f199 + b7c431f

Showing 31 changed files with 1,358 additions and 89 deletions.
diff --git a/docs/antora/antora.yml b/docs/antora/antora.yml
@@ -1,6 +1,6 @@
 name: ted-rdf-docs
 version: master
-title: TED-RDF Conversion Pipeline
+title: TED-SWS documentation
 start_page: ROOT:index.adoc
 asciidoc:
   attributes:

diff --git a/docs/antora/modules/ROOT/images/user_manual/jupyter_notebook/image3.png b/docs/antora/modules/ROOT/images/user_manual/jupyter_notebook/image3.png
diff --git a/docs/antora/modules/ROOT/images/user_manual/jupyter_notebook/image7.png b/docs/antora/modules/ROOT/images/user_manual/jupyter_notebook/image7.png
diff --git a/docs/antora/modules/ROOT/images/user_manual/jupyter_notebook/image8.png b/docs/antora/modules/ROOT/images/user_manual/jupyter_notebook/image8.png
diff --git a/docs/antora/modules/ROOT/images/user_manual/ms_excell/image10.png b/docs/antora/modules/ROOT/images/user_manual/ms_excell/image10.png
diff --git a/docs/antora/modules/ROOT/images/user_manual/ms_excell/image11.png b/docs/antora/modules/ROOT/images/user_manual/ms_excell/image11.png
diff --git a/docs/antora/modules/ROOT/images/user_manual/ms_excell/image12.png b/docs/antora/modules/ROOT/images/user_manual/ms_excell/image12.png
diff --git a/docs/antora/modules/ROOT/images/user_manual/ms_excell/image3.png b/docs/antora/modules/ROOT/images/user_manual/ms_excell/image3.png
diff --git a/docs/antora/modules/ROOT/images/user_manual/ms_excell/image8.png b/docs/antora/modules/ROOT/images/user_manual/ms_excell/image8.png
diff --git a/docs/antora/modules/ROOT/images/user_manual/ms_excell/image9.png b/docs/antora/modules/ROOT/images/user_manual/ms_excell/image9.png
diff --git a/docs/antora/modules/ROOT/images/user_manual/sparql_queries/image1.png b/docs/antora/modules/ROOT/images/user_manual/sparql_queries/image1.png
diff --git a/docs/antora/modules/ROOT/images/user_manual/sparql_queries/image2.png b/docs/antora/modules/ROOT/images/user_manual/sparql_queries/image2.png
diff --git a/docs/antora/modules/ROOT/nav.adoc b/docs/antora/modules/ROOT/nav.adoc
@@ -1,27 +1,19 @@
+[.separated]#**TED-SWS**#
+
 * xref:index.adoc[Home]
-    ** What is TED SWS
-    ** What is sample app?
-    ** What is mapping?
-    ** How to use TED SWS
-    ** What y’ll find in this documentation
-    ** How to contribute to TED SWS
 
-* xref:mapping_suite/index.adoc[Mapping Suites]
-    ** Getting started
-    ** Who are these docs written for
-    ** Glossary
-    ** Assumptions we make about the skills of the reader
-        *** Prerequisites
-    ** what the user can achieve through these pages
+
+    * xref:mapping_suite/index.adoc[Mapping Suite Docs]
     ** xref:mapping_suite/repository-structure.adoc[Repository structure]
     ** xref:mapping_suite/mapping-suite-structure.adoc[Mapping suite anatomy]
     ** xref:mapping_suite/code-list-resources.adoc[Code list mappings]
     ** xref:mapping_suite/preparing-test-data.adoc[Data samples]
     ** xref:mapping_suite/versioning.adoc[Versioning]
-    ** References
 
-* xref:sample_app/index.adoc[TED Data Sample application]
-    ** xref:sample_app/jupyter_notebook.adoc[Jupyter Notebook]
-    ** xref:sample_app/ms_excell.adoc[MS Excel]
+    * xref:sample_app/index.adoc[Sample application Docs]
+    ** xref:sample_app/jupyter_notebook_python.adoc[Python Jupyter Notebook]
+    ** xref:sample_app/jupyter_notebook_r.adoc[R Jupyter Notebook]
+    ** xref:sample_app/ms_excel.adoc[MS Excel]
+    ** xref:sample_app/sparql_queries.adoc[SPARQL Queries]
 
 
diff --git a/docs/antora/modules/ROOT/pages/index.adoc b/docs/antora/modules/ROOT/pages/index.adoc
@@ -1,22 +1,45 @@
-= TED-RDF Conversion Pipeline Documentation
+= TED-SWS End-User Documentation
 
-The TED-RDF Conversion Pipeline, is part of the TED Semantic Web Services (TED-SWS system) and provides tools an infrastructure to convert TED notices available in XML format into RDF. This conversion pipeline is designed to work with the https://docs.ted.europa.eu/rdf-mapping/index.html[TED-SWS Mapping Suites] - self containing packages with transformation rules and resources.
+TED Semantic Web Service (TED-SWS) is a pipeline system that continuously
+converts the public procurement notices (in XML format) available on the
+TED Website into RDF format based on the eProcurement Ontology, and publishes
+them into the CELLAR repository, hence making them available to the public
+through CELLAR’s SPARQL endpoint.
 
-== What is TED SWS
+TED-SWS connects the TED infrastructure for the collection and publication
+of public procurement notices with the infrastructure of http://data.europa.eu/[data.europa.eu],
+in order to make public procurement data accessible and reusable as
+Linked Open Data (LOD) by users and stakeholders (see xref:motivation.adoc[the detailed motivation]).
 
+== Audience
 
-== What is sample app?
+This documentation is written for a wide audience with different interests in the TED-SWS project and different levels of expertise in the Semantic Web, EU eProcurement and software infrastructure. More specifically, this documentation can be of interest to:
 
+- *End-Users*, such as *Semantic Web Practitioners* or *Experts in the eProcurement Domain*, who are interested in understanding what the RDF representation of the eProcurement notices looks like and how this representation conforms to the eProcurement Ontology (ePO);
+- *Software Engineers* interested in integrating mapping suite packages into processing pipelines;
+- *Semantic Engineers* interested in understanding and writing mappings from XML to RDF, in particular in the EU eProcurement domain.
 
-== What is mapping?
+== Contents
 
+[.tile-container]
+--
 
-== How to use TED SWS
+[.tile]
+.Mapping Suites
+****
+The TED-RDF Mappings are the transformation rules that the TED-RDF Conversion Pipeline uses to convert TED notices from XML to RDF; both are part of the TED Semantic Web Service (TED-SWS).
 
+<<ted-rdf-docs:ROOT:mapping_suite/index.adoc#, Read the docs>>
+****
 
-== What y’ll find in this documentation
 
+[.tile]
+.Sample applications
+****
+The sample applications are a set of examples that show how to interact with TED RDF data (available in CELLAR) using tools such as Python, R or Excel.
 
-== How to contribute to TED SWS
-
+<<ted-rdf-docs:ROOT:sample_app/index.adoc#, Read the docs>>
+****
 
+--
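The index page states that TED-SWS publishes its RDF output through CELLAR's SPARQL endpoint. As a minimal sketch of what talking to that endpoint involves, the snippet below builds a standard SPARQL 1.1 Protocol GET request using only the Python standard library. The endpoint URL is an assumption (this commit does not state it), and `build_sparql_request` is a hypothetical helper name.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Assumption: CELLAR's public SPARQL endpoint; verify before relying on it.
CELLAR_SPARQL_ENDPOINT = "https://publications.europa.eu/webapi/rdf/sparql"


def build_sparql_request(query: str, endpoint: str = CELLAR_SPARQL_ENDPOINT) -> Request:
    """Build, but do not send, a SPARQL 1.1 Protocol query-via-GET request.

    The query is URL-encoded into the `query` parameter and the Accept
    header asks for the standard SPARQL JSON results format.
    """
    url = endpoint + "?" + urlencode({"query": query})
    return Request(url, headers={"Accept": "application/sparql-results+json"}, method="GET")


request = build_sparql_request("SELECT * WHERE { ?s ?p ?o } LIMIT 5")
print(request.full_url)

# Sending it requires network access, e.g.:
#   import json
#   from urllib.request import urlopen
#   with urlopen(request, timeout=60) as response:
#       results = json.load(response)
```

Keeping request construction separate from sending makes the query encoding easy to inspect before any network call is made.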
diff --git a/docs/antora/modules/ROOT/pages/mapping_suite/code-list-resources.adoc b/docs/antora/modules/ROOT/pages/mapping_suite/code-list-resources.adoc
@@ -1,4 +1,4 @@
-=== Resources for Code List Mappings
+== Resources for Code List Mappings
 
 The table below provides a list of resources that are used to map the various code lists used in the XML files to URIs in the RDF representation.
 

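To illustrate what a code-list mapping resource does, here is a hypothetical Python sketch that maps country codes found in TED XML notices to concepts in the EU Vocabularies "country" authority table. The table excerpt, the function name and the assumption that the XML carries ISO 3166-1 alpha-2 codes (translated to the authority table's alpha-3 identifiers) are illustrative only; the resources listed in the table on this page are authoritative.

```python
# Hypothetical excerpt of a code-list mapping resource (illustration only).
AUTHORITY_BASE = "http://publications.europa.eu/resource/authority/country/"

COUNTRY_CODE_MAP = {
    "BE": "BEL",
    "DE": "DEU",
    "FR": "FRA",
    "LU": "LUX",
}


def country_code_to_uri(xml_code: str) -> str:
    """Map a country code from a TED XML notice to an authority-table URI.

    Raises KeyError for codes missing from the mapping, so gaps in the
    code-list resource surface loudly instead of producing invalid URIs.
    """
    normalised = xml_code.strip().upper()
    return AUTHORITY_BASE + COUNTRY_CODE_MAP[normalised]


print(country_code_to_uri("be"))
```

Failing on unknown codes, rather than emitting a guessed URI, is a deliberate choice in this sketch: a missing entry is a gap in the mapping resource that should be fixed there.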
diff --git a/docs/antora/modules/ROOT/pages/mapping_suite/index.adoc b/docs/antora/modules/ROOT/pages/mapping_suite/index.adoc
@@ -1,16 +1,7 @@
-= What is a Mapping suite?
+= Mapping suite documentation
 
 A *mapping suite* is a set of "mappings" that defines how an XML document representing an e-Procurement Notice will be transformed to an equivalent RDF graph representation. These mappings are materialized in different forms, as it will be explained later, and a mapping suite will have all its relevant components organized in a package, which we refer to as a *mapping suite package*.
 
-== Who are these docs written for?
-
-This documentation is written for a wide audience, with different interests in the TED-SWS project, and different levels of expertise Semantic Web, EU e-Procurement and software infrastructure. More specifically this documentation can be of interest to:
-
-- *Semantic Engineers* interested in understanding and writing mappings from XML to RDF, in particular in the EU eProcurement domain;
-- *Software Engineers* interested in integrating mapping suite packages into processing pipelines;
-- *End-Users*, such as *Semantic Web Practitioners* or *Experts in eProcurement Domain*, who are interested in understanding how the RDF representation of the e-procurement notices look like, and how this representation conforms to the eProcurement Ontology (ePO).
-
-
 == Prerequisites
 
 To allow for a proper understanding of the Mapping Suite Documentation, the reader should have:
@@ -61,10 +52,8 @@ https://op.europa.eu/en/web/eu-vocabularies/e-procurement/tedschemas
 == Further readings
 Depending on the interest of the reader the following pages can be explored (in this logical order):
 
-** xref:mapping_suite/ted-sws-introduction.adoc[]
 ** xref:mapping_suite/repository-structure.adoc[GitHub Repository structure]
 ** xref:mapping_suite/mapping-suite-structure.adoc[Mapping suite anatomy]
 ** xref:mapping_suite/code-list-resources.adoc[Code list mappings]
 ** xref:mapping_suite/preparing-test-data.adoc[Data samples]
-** xref:mapping_suite/versioning.adoc[Versioning]
-** xref:mapping_suite/ [References]
+** xref:mapping_suite/versioning.adoc[Versioning]
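As a toy illustration of what a "mapping" means here, the sketch below turns a tiny, made-up XML notice into RDF triples (N-Triples syntax) using only the Python standard library. The XML element names, the base IRI `http://example.org/notice/` and the two rules are hypothetical; the real mapping suites express their rules declaratively and cover far more of the notice. Only the `epo:` namespace, `epo:ResultNotice` and `epo:hasPublicationDate` are taken from the example query in the sample application docs.

```python
import xml.etree.ElementTree as ET
from typing import List

EPO = "http://data.europa.eu/a4g/ontology#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
# Hypothetical base IRI for the notice resource (illustration only).
NOTICE_BASE = "http://example.org/notice/"

# Made-up XML fragment; real TED notices follow the TED XML schemas.
SAMPLE_XML = """<NOTICE ID="2023-123456">
  <PUBLICATION_DATE>20230921</PUBLICATION_DATE>
</NOTICE>"""


def notice_to_ntriples(xml_text: str) -> List[str]:
    """Apply two toy mapping rules to an XML notice and return N-Triples lines."""
    root = ET.fromstring(xml_text)
    subject = "<" + NOTICE_BASE + root.attrib["ID"] + ">"
    # Toy rule 1: the root NOTICE element is typed as epo:ResultNotice.
    triples = [f"{subject} <{RDF_TYPE}> <{EPO}ResultNotice> ."]
    # Toy rule 2: PUBLICATION_DATE text becomes epo:hasPublicationDate.
    date = root.findtext("PUBLICATION_DATE")
    if date:
        triples.append(f'{subject} <{EPO}hasPublicationDate> "{date.strip()}" .')
    return triples


for line in notice_to_ntriples(SAMPLE_XML):
    print(line)
```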
diff --git a/docs/antora/modules/ROOT/pages/mapping_suite/preparing-test-data.adoc b/docs/antora/modules/ROOT/pages/mapping_suite/preparing-test-data.adoc
@@ -1,8 +1,8 @@
-= Representative sample data selection
+== Representative sample data selection
 
 This section describes TED notice data samples and methods used to generate them. At first a sampling is performed on the notices from 2021, and then on a wider set.
 
-== Sample TED notices from 2021
+=== Sample TED notices from 2021
 
 This data sample (`test_data/sampling_2021`) contains carefully selected TED notices based on the following criteria: maximise representativeness, minimise the number of selected documents. The selected notices are guaranteed to cover all possible XPath configurations available in the data. The sampling was performed automatically using a custom algorithm available in the https://github.com/OP-TED/ted-rdf-conversion-pipeline[TED-RDF Conversion Pipeline] repository.
 

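The sampling criteria described for `test_data/sampling_2021` (cover every XPath configuration while selecting as few notices as possible) amount to a set-cover problem. The sketch below uses the standard greedy set-cover heuristic to show the idea; it is a swapped-in textbook technique, not necessarily the custom algorithm in the TED-RDF Conversion Pipeline repository, and the input format (notice id mapped to the set of XPaths it contains) is an assumption.

```python
from typing import Dict, List, Set


def greedy_xpath_cover(notice_xpaths: Dict[str, Set[str]]) -> List[str]:
    """Pick notices until every XPath seen in the corpus is covered.

    Greedy set cover: at each step choose the notice that covers the most
    still-uncovered XPaths.
    """
    uncovered: Set[str] = set().union(*notice_xpaths.values())
    selected: List[str] = []
    while uncovered:
        # max() keeps the first maximum, so iterating over sorted ids
        # breaks ties deterministically by notice id.
        best = max(
            sorted(notice_xpaths),
            key=lambda notice: len(notice_xpaths[notice] & uncovered),
        )
        selected.append(best)
        uncovered -= notice_xpaths[best]
    return selected


# Made-up corpus: notice id -> XPaths present in that notice.
notices = {
    "n1": {"/A", "/A/B"},
    "n2": {"/A", "/A/C", "/A/D"},
    "n3": {"/A/B"},
    "n4": {"/A/E"},
}
print(greedy_xpath_cover(notices))
```

Greedy selection does not always find the smallest possible sample (it is guaranteed to be within a logarithmic factor of it), but it always reaches full coverage and is cheap to run on large corpora.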
diff --git a/docs/antora/modules/ROOT/pages/mapping_suite/repository-structure.adoc b/docs/antora/modules/ROOT/pages/mapping_suite/repository-structure.adoc
@@ -1,4 +1,4 @@
-= Repository structure
+== Repository structure
 
 Transformation rules and other artefacts for the https://github.com/OP-TED/ted-rdf-conversion-pipeline[TED Semantic Web Services (TED-SWS)] system are organised in https://github.com/OP-TED/ted-rdf-mapping[this repository].
 

diff --git a/docs/antora/modules/ROOT/pages/motivation.adoc b/docs/antora/modules/ROOT/pages/motivation.adoc
@@ -0,0 +1,62 @@
+= TED-SWS motivation
+
+In its Strategic Plan for 2020-2024, the Publications Office has
+defined Specific Objective 1 on the "European public procurement space"
+as part of its general Objective 2 "A Europe fit for the digital age".
+
+In this context, the Publications Office has identified the need for reliable and
+complete data on public procurement in the EU as being essential
+for transparency and accountability of public spending. The ongoing
+investments of the Publications Office for the transition to eForms,
+and the continued development of the eProcurement Ontology are
+identified by the Strategic Plan as being central for
+improved data quality and enhanced automation of data processing
+and interoperability.
+
+Additionally, in the context of Specific Objective 2 on the
+"European data space", the Publications Office identifies the gap that
+still exists between the available wealth of open data, spread across
+multiple outlets, and the effort required to discover, access and reuse it.
+
+To bridge this gap, the Strategic Plan for 2020-2024 commits to
+generate and share new knowledge as linked open data, through
+an ecosystem of datasets, data models, ontologies and specialised services
+accessible through a single entry point (http://data.europa.eu/[data.europa.eu])
+following a "data-as-a-public-service" approach.
+
+Although TED notice data is already available to the general public
+through the search API provided by the TED website, the current offering
+has many limitations that impede access to and reuse of the data. One
+important impediment, for example, is the current format of the data.
+
+Historical TED data come in various XML formats that evolved together
+with the standard TED XML schema. The imminent introduction of eForms
+will also introduce further diversity in the XML data formats available
+through TED's search API. This makes it practically impossible for users
+to consume and process data spanning several years, as
+their information systems must be able to process several different
+flavours of the available XML schemas as well as to keep up with the
+schema's continuous evolution. Their search capabilities are therefore
+confined to a very limited set of metadata.
+
+The TED Semantic Web Service removes these barriers by providing one
+common format for accessing and reusing all TED data. Coupled with the
+eProcurement Ontology, the TED data will also have semantics attached to
+them, allowing users to link them directly with other datasets.
+Moreover, users will now be able to perform much more elaborate
+queries directly on the data source (through the SPARQL endpoint). This
+will reduce their need for data warehousing in order to perform complex
+queries.
+
+These developments, by lowering the barriers, will give rise to a vast
+number of new use-cases that will enable stakeholders and end-users to
+benefit from increased availability of analytics. The ability to perform
+complex queries on public procurement data will be equally open to large
+information systems as well as to simple desktop users with a copy of
+Excel and an internet connection.
+
+To summarize, the TED Semantic Web Service (TED-SWS) is a pipeline
+system that continuously converts the public procurement notices (in XML
+format) available on the TED Website into RDF format, publishes them
+into CELLAR and makes them available to the public through CELLAR’s
+SPARQL endpoint.
diff --git a/docs/antora/modules/ROOT/pages/sample_app/index.adoc b/docs/antora/modules/ROOT/pages/sample_app/index.adoc
@@ -0,0 +1,59 @@
+= Sample app documentation
+
+A sample application, often referred to as a demo or prototype, is a functional representation of a software program or system that demonstrates its basic features, functionalities, and capabilities. In the context of TED-SWS (TED Semantic Web Service), a sample application is a working example of how to access the data processed by the system.
+
+== Glossary
+
+* *RDF* stands for Resource Description Framework. RDF is a standardized data model used to represent information on the web. RDF plays a crucial role in xref:ROOT:index.adoc[TED-SWS] because it provides a standardized and structured format for representing the procurement data made available through the service. This allows for efficient querying, processing, and integration of the data into various applications and systems.
+
+* *SPARQL Query* refers to a query written in SPARQL, the standard query language used to retrieve and manipulate data stored in RDF format.
+
+* *Jupyter Notebook* is an interactive computing environment that allows users to create and share documents containing live code, equations, visualizations, and explanatory text. It's particularly useful for working with data and performing data analysis, making it a valuable tool for xref:ROOT:mapping_suite/index.adoc[accessing and processing] data from TED-SWS.
+
+* *Python* is a widely used programming language that can be employed to retrieve data and xref:ROOT:sample_app/jupyter_notebook_python.adoc[perform operations] on the RDF data provided by TED-SWS.
+
+* *R language* refers to a programming language and environment for statistical computing, xref:ROOT:sample_app/jupyter_notebook_r.adoc[data analysis], and graphical representation, which can be used to retrieve and perform operations on the RDF data provided by TED-SWS.
+
+* *MS Excel* refers to Microsoft Excel, a widely used spreadsheet program that serves as a versatile tool for xref:ROOT:sample_app/ms_excel.adoc[handling and analysing] data obtained from TED-SWS.
+
+* *Code Editor* refers to a software tool or environment where users can write, edit, and execute code. It allows users to easily create scripts or programs that retrieve data from TED-SWS and perform operations on the RDF data.
+
+* *Jupyter Notebook Kernel* refers to the computational engine that executes the code within a Jupyter Notebook. It determines which programming language is used to run the code in the notebook. For example, if you're working with TED-SWS in a Jupyter Notebook, you might choose to use a Python kernel, which means that you'll be writing and executing Python code.
+
+* *Business Questions* (BQ) refer to specific inquiries or information needs that pertain to business operations, procurement activities, or related aspects. These questions are typically posed by organizations, researchers, or individuals seeking to gain insights, make informed decisions, or conduct analyses based on the data provided by TED-SWS.
+
+== Prerequisites
+
+To use TED-SWS sample apps, you will need the following:
+
+Understanding of RDF and SPARQL:: Familiarity with RDF (Resource Description Framework) and SPARQL (SPARQL Protocol and RDF Query Language) is crucial. TED-SWS provides data in RDF format and utilizes SPARQL for querying.
+
+Access to a Programming Language:: You should have proficiency in a programming language capable of making HTTP requests and processing data. Common choices include Python or R.
+
+Knowledge of Semantic Web Technologies:: A basic understanding of Semantic Web concepts and technologies is beneficial. This includes knowledge of RDF triples, ontologies, and linked data principles.
+
+Development Environment:: Set up a development environment for your chosen programming language or, at a minimum, make sure that MS Excel is installed.
+
+Understanding of EU Procurement Data:: If your goal is to work with specific types of EU procurement data, such as contract notices or award notices, it's important to have a basic understanding of these concepts and the associated https://docs.ted.europa.eu/EPO/latest/index.html[ontology].
+
+== Using Jupyter Notebook
+
+* <<ted-rdf-docs:ROOT:sample_app/jupyter_notebook_python.adoc#, Jupyter Notebook - Python>>
+
+Example of using the Python language to access data.
+
+* <<ted-rdf-docs:ROOT:sample_app/jupyter_notebook_r.adoc#, Jupyter Notebook - R>>
+
+Example of using the R language to access data.
+
+== Using MS Excel
+
+* <<ted-rdf-docs:ROOT:sample_app/ms_excel.adoc#, MS Excel Workbook>>
+
+Example of accessing data in an MS Excel workbook.
+
+== SPARQL Query examples
+
+* <<ted-rdf-docs:ROOT:sample_app/sparql_queries.adoc#, SPARQL Query examples>>
+
+Examples of SPARQL queries for accessing data.
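The prerequisites mention processing data returned over HTTP. SPARQL endpoints typically return SELECT results in the W3C SPARQL 1.1 Query Results JSON format; the sketch below flattens such a document into one dictionary per row. The function name, the sample payload and its IRIs are made up for illustration; in Python, the resulting list can be passed to `pandas.DataFrame` to get a table.

```python
import json
from typing import Dict, List, Optional


def bindings_to_rows(results_json: str) -> List[Dict[str, Optional[str]]]:
    """Flatten a SPARQL 1.1 JSON results document into one dict per row.

    Variables left unbound in a row (e.g. those in an OPTIONAL pattern)
    are reported as None instead of being silently dropped.
    """
    doc = json.loads(results_json)
    variables = doc["head"]["vars"]
    return [
        {var: (binding[var]["value"] if var in binding else None) for var in variables}
        for binding in doc["results"]["bindings"]
    ]


# Made-up payload: the second row has no WinnerCountryCode binding.
SAMPLE_RESULTS = json.dumps({
    "head": {"vars": ["Winner", "WinnerCountryCode"]},
    "results": {"bindings": [
        {"Winner": {"type": "uri", "value": "http://example.org/org/1"},
         "WinnerCountryCode": {"type": "literal", "value": "BE"}},
        {"Winner": {"type": "uri", "value": "http://example.org/org/2"}},
    ]},
})

for row in bindings_to_rows(SAMPLE_RESULTS):
    print(row)
```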
diff --git a/docs/antora/modules/ROOT/pages/sample_app/jupyter_notebook.adoc b/docs/antora/modules/ROOT/pages/sample_app/jupyter_notebook.adoc
diff --git a/docs/antora/modules/ROOT/pages/sample_app/jupyter_notebook_python.adoc b/docs/antora/modules/ROOT/pages/sample_app/jupyter_notebook_python.adoc
@@ -0,0 +1,92 @@
+== Jupyter Notebook - Python
+
+This document shows an example of using a Jupyter Notebook with Python.
+Jupyter Notebook is an application for creating and sharing
+computational documents, and Python is the programming language in which
+the notebook's code is written. To run this scenario, you need to install
+the tools listed below and run the Python code, which sends a query to
+CELLAR and displays the results in tabular form.
+
+Example query:
+
+**Who are the contract winners for a given date?**
+
+[source,sparql]
+PREFIX epo: <http://data.europa.eu/a4g/ontology#>
+PREFIX org: <http://www.w3.org/ns/org#>
+PREFIX cccev: <http://data.europa.eu/m8g/>
+select distinct
+?Lot
+?Winner
+?WinnerCountryCode
+?LotAwardetAmountValue
+?LotAwardetValueCurrency
+where {
+    values ?NoticePublicationDate {
+       "20230921"
+    }
+    ?NoticeId a epo:ResultNotice;
+                   epo:hasPublicationDate ?NoticePublicationDate;
+                   epo:refersToLot ?Lot.
+    ?Lot a epo:Lot.
+    ?LotAwardOutcome epo:describesLot ?Lot;
+                   a epo:LotAwardOutcome;
+                   epo:comprisesTenderAwardOutcome ?TenderAwardOutcome.
+    ?TenderAwardOutcome a epo:TenderAwardOutcome;
+                          epo:indicatesAwardOfLotToWinner / epo:playedBy ?Winner.
+    ?Winner a org:Organization.
+    optional {
+        ?Winner cccev:registeredAddress / epo:hasCountryCode ?WinnerCountryCode.
+    }
+    ?LotAwardOutcome epo:hasAwardedValue ?LotAwardetValue.
+    ?LotAwardetValue a epo:MonetaryValue;
+                epo:hasAmountValue ?LotAwardetAmountValue;
+                epo:hasCurrency ?LotAwardetValueCurrency.
+}
+
+To run the sample application in Python, follow the steps below:
+
+[arabic]
+. https://github.com/OP-TED/ted-rdf-docs/blob/main/notebooks/query_cellar_python.ipynb[Download the Jupyter Notebook]
+
+
+[arabic, start=2]
+. Download and install Python 3.8
+[loweralpha]
+.. Windows 64-bit:
+https://www.python.org/ftp/python/3.8.10/python-3.8.10-amd64.exe[[.underline]#download#]
+
+.. Windows 32-bit (x86):
+https://www.python.org/ftp/python/3.8.10/python-3.8.10.exe[[.underline]#download#]
+
+. Open the Jupyter Notebook file in your code editor
+
+. In the code editor, select the Python interpreter that was installed in the previous step
+
+.Interpreter selection
+image::user_manual/jupyter_notebook/image1.png[image,width=817,height=204]
+
+
+[arabic, start=5]
+. Install dependencies
+
+* Open a command line (terminal) on your operating system and run the following command:
+
+[source,shell]
+pip3 install sparqlwrapper pandas Jinja2 matplotlib
+
+NOTE: After installation, restart the Jupyter Notebook kernel so that it picks up the new dependencies. This can be done by clicking the "Restart" button in your code editor.
+
+[arabic, start=6]
+. Run all Jupyter Notebook cells
+
+.Button that runs all cells
+image::user_manual/jupyter_notebook/image2.png[image,width=501,height=84]
+
+[arabic, start=7]
+. After all cells in the Jupyter Notebook have run successfully, the result table is displayed
+
+.Result table
+image::user_manual/jupyter_notebook/image3.png[image,width=987,height=420]
+
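The notebook's own code is the reference for this scenario. As a hedged sketch of the same idea, the functions below (names hypothetical) render a publication date in the quoted `YYYYMMDD` form used in the example query's `VALUES` clause, and run a SELECT query through SPARQLWrapper, returning a pandas DataFrame. The endpoint URL is an assumption, and the third-party imports sit inside `run_select` so the date helper works even before the dependencies are installed.

```python
import datetime

# Assumption: CELLAR's public SPARQL endpoint; compare with the notebook.
CELLAR_SPARQL_ENDPOINT = "https://publications.europa.eu/webapi/rdf/sparql"


def publication_date_literal(day: datetime.date) -> str:
    """Render a date as the quoted YYYYMMDD string used in the VALUES clause."""
    return '"' + day.strftime("%Y%m%d") + '"'


def run_select(query: str, endpoint: str = CELLAR_SPARQL_ENDPOINT):
    """Run a SPARQL SELECT query and return the bindings as a pandas DataFrame."""
    # Installed with: pip3 install sparqlwrapper pandas
    import pandas as pd
    from SPARQLWrapper import JSON, SPARQLWrapper

    sparql = SPARQLWrapper(endpoint)
    sparql.setQuery(query)
    sparql.setReturnFormat(JSON)
    results = sparql.query().convert()

    # Each binding maps a variable name to {"type": ..., "value": ...};
    # variables unbound in a row (e.g. OPTIONAL ones) end up as NaN.
    rows = [
        {name: term["value"] for name, term in binding.items()}
        for binding in results["results"]["bindings"]
    ]
    return pd.DataFrame(rows, columns=results["head"]["vars"])


# Usage (needs network access): paste the example query into QUERY, then
# QUERY = QUERY.replace('"20230921"', publication_date_literal(datetime.date(2023, 9, 21)))
# print(run_select(QUERY))
```

Passing the query's `head.vars` as `columns` keeps the table's column order identical to the `select` clause, even when an OPTIONAL variable is unbound in every row.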