ci: finalise release 2.10.0 (#706)
jobulcke authored Nov 12, 2024
2 parents 1d25428 + c123d93 commit daaece6
Showing 201 changed files with 4,841 additions and 861 deletions.
2 changes: 2 additions & 0 deletions .github/ldio.Dockerfile
@@ -27,11 +27,13 @@ COPY ./ldi-orchestrator/ldio-connectors/ldio-version-object-creator/target/ldio-
COPY ./ldi-orchestrator/ldio-connectors/ldio-geojson-to-wkt/target/ldio-geojson-to-wkt-jar-with-dependencies.jar ./lib/
COPY ./ldi-orchestrator/ldio-connectors/ldio-http-enricher/target/ldio-http-enricher-jar-with-dependencies.jar ./lib/
COPY ./ldi-orchestrator/ldio-connectors/ldio-change-detection-filter/target/ldio-change-detection-filter-jar-with-dependencies.jar ./lib/
COPY ./ldi-orchestrator/ldio-connectors/ldio-skolemisation-transformer/target/ldio-skolemisation-transformer-jar-with-dependencies.jar ./lib/

COPY ./ldi-orchestrator/ldio-connectors/ldio-console-out/target/ldio-console-out-jar-with-dependencies.jar ./lib/
COPY ./ldi-orchestrator/ldio-connectors/ldio-http-out/target/ldio-http-out-jar-with-dependencies.jar ./lib/
COPY ./ldi-orchestrator/ldio-connectors/ldio-noop-out/target/ldio-noop-out-jar-with-dependencies.jar ./lib/
COPY ./ldi-orchestrator/ldio-connectors/ldio-repository-sink/target/ldio-repository-sink-jar-with-dependencies.jar ./lib/
COPY ./ldi-orchestrator/ldio-connectors/ldio-http-sparql-out/target/ldio-http-sparql-out-jar-with-dependencies.jar ./lib/


RUN mkdir "state"
6 changes: 3 additions & 3 deletions .github/workflows/deploy-documentation.yml
@@ -6,7 +6,7 @@ name: Build Docs
on:
push:
branches:
- master
- main
- develop
workflow_dispatch:

@@ -50,14 +50,14 @@ jobs:
with:
node-version: 16
registry-url: https://npm.pkg.github.com/
- run: npm i -g @koumoul/gh-pages-multi
- run: npm i -g @yalz/gh-pages-multi
- run: |
git config --global user.email "[email protected]"
git config --global user.name "VSDS CI"
git config --global url.https://${{ env.PAT }}@github.com/.insteadOf https://github.com/
env:
PAT: ${{secrets.DEPLOY_DOCS_PAT}}
- run: |
gh-pages-multi deploy --title ${{env.title}} -t ${{steps.version.outputs.version}} -s docs/_site
gh-pages-multi deploy --title "${{env.title}}" -t ${{steps.version.outputs.version}} -s docs/_site
env:
NODE_AUTH_TOKEN: ${{secrets.GITHUB_TOKEN}}
10 changes: 0 additions & 10 deletions docs/_ldio/ldio-inputs/ldio-ldes-client.md
@@ -32,10 +32,6 @@ within a fragment.
When the fragment is marked as immutable, and no members can be added anymore, the LDES Client will stop keeping track
of members processed within that fragment.

Members within a fragment can be processed in order of time based on a timestamp. The path to this timestamp needs to be
configured.
If the path is missing, members will be processed in random order.

### Filtering

#### Exactly-once-filter
@@ -98,7 +94,6 @@ CPU ([source](https://www.sqlite.org/faq.html#q19)).
| _source-format_ | The 'Content-Type' that should be requested to the server | No | text/turtle | application/n-quads | Any type supported by [Apache Jena](https://jena.apache.org/documentation/io/rdf-input.html#determining-the-rdf-syntax) |
| _state_ | 'memory', 'sqlite' or 'postgres' to indicate how the state should be persisted | No | memory | sqlite | 'memory', 'sqlite' or 'postgres' |
| _keep-state_ | Indicates if the state should be persisted on shutdown (n/a for in memory states) | No | false | false | true or false |
| _timestamp-path_ | The property-path used to determine the timestamp on which the members will be ordered, and used for the `latest-state-filter` when enabled | No | N/A | http://www.w3.org/ns/prov#generatedAtTime | A property path |
| _enable-exactly-once_ | Indicates whether a member must be sent exactly once or at least once | No | true | true | true or false |

{: .note }
@@ -115,13 +110,8 @@ api
| Property | Description | Required | Default | Example | Supported values |
|:--------------------------------------|:--------------------------------------------------------------------------------------|:---------|:-------------------------------------|:-------------------------------------|:-----------------|
| _materialisation.enabled_ | Indicates if the client should return state-objects (true) or version-objects (false) | No | false | true | true or false |
| _materialisation.version-of-property_ | Property that points to the versionOfPath | No | http://purl.org/dc/terms/isVersionOf | http://purl.org/dc/terms/isVersionOf | Any valid property IRI |
| _materialisation.enable-latest-state_ | Indicates whether all state or only the latest state must be sent | No | true | false | true or false |

{: .note }
Don't forget to provide a timestamp-path in the general properties; this property is not required, but it is necessary
for this filter to work properly!

{% include ldio-core/http-requester.md %}

### SQLite properties
40 changes: 40 additions & 0 deletions docs/_ldio/ldio-outputs/ldio-http-sparql-out.md
@@ -0,0 +1,40 @@
---
layout: default
parent: LDIO Outputs
title: HTTP Sparql Out
---

# HTTP Sparql Out

***Ldio:HttpSparqlOut***

The HTTP SPARQL Out component can be used to write data to a SPARQL host, Virtuoso being the best-known example.

## Config

| Property | Description | Required | Default | Example | Supported values |
|:-----------------------------|:--------------------------------------------------------------------------------------------------------------------------------------------|:---------|:--------|:-----------------------------------------|:-----------------|
| _endpoint_                   | The URL of the SPARQL host                                                                                                                    | Yes      | N/A     | http://localhost:8890/sparql              | URL              |
| _graph_                      | The graph to which the data must be written                                                                                                  | No       | N/A     | http://example.graph.com                  | String           |
| _skolemisation.skolemDomain_ | If the skolem domain is set, skolemisation will be triggered before the triples are written to the SPARQL host                               | No       | N/A     | http://example.org                        | Any valid IRI    |
| _replacement.enabled_        | Whether the old nodes must be replaced by the new ones                                                                                       | No       | true    | false                                     | Boolean value    |
| _replacement.depth_          | How many levels of nested nodes the default delete query removes from the existing subject; ignored if `replacement.deleteFunction` is set   | No       | 10      | 15                                        | Integer          |
| _replacement.deleteFunction_ | If set, this delete query overrides the default delete query generated by the service                                                        | No       | N/A     | `DELETE { ?s ?p ?o } WHERE { ?s ?p ?o }`  | String           |
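
A minimal pipeline output sketch using these properties (values are illustrative; nested keys are shown here, assuming the orchestrator also accepts them as an alternative to the dotted form used in the table):

```yaml
outputs:
  - name: Ldio:HttpSparqlOut
    config:
      endpoint: http://localhost:8890/sparql
      graph: http://example.graph.com
      skolemisation:
        skolemDomain: http://example.org
      replacement:
        enabled: true
        depth: 10
```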

{% include ldio-core/http-requester.md %}

### Replacement

Replacement means that all old nodes of certain subjects are deleted before the new nodes with the same subject are
inserted. \
By default, the service constructs a delete query that deletes all nodes, including nested nodes up to the level
specified by the `replacement.depth` property. If the constructed delete query is not sufficient, or is too complex,
a custom delete query can be configured. This query overrides the default query created by the service, which also
means the `replacement.depth` property is ignored.
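
As a sketch, a custom delete function could be configured as follows (a hypothetical configuration reusing the query shown in the table above; with `replacement.deleteFunction` set, `replacement.depth` is ignored):

```yaml
outputs:
  - name: Ldio:HttpSparqlOut
    config:
      endpoint: http://localhost:8890/sparql
      replacement:
        enabled: true
        deleteFunction: |
          DELETE { ?s ?p ?o }
          WHERE  { ?s ?p ?o }
```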

### Skolemisation

Not all SPARQL hosts handle blank nodes well, so those nodes can first be skolemised. To skolemise nodes, a skolem
domain is required; it can be set with the `skolemisation.skolemDomain` property, which directly enables the service.
More information about skolemisation can be found on
the [skolemisation-transformer page](./../ldio-transformers/ldio-skolemisation-transformer).
6 changes: 5 additions & 1 deletion docs/_ldio/ldio-transformers/index.md
@@ -6,4 +6,8 @@ has_toc: true
nav_order: 5
---

# Linked Data Interactions Orchestrator Transformers
# Linked Data Interactions Orchestrator Transformers

The LDI Core module contains the components maintained by the VSDS team to support the onboarding of LDES onboarders.

Each component can be wrapped in the desired implementation framework (LDI-orchestrator, NiFi, ...).
75 changes: 75 additions & 0 deletions docs/_ldio/ldio-transformers/ldio-skolemisation-transformer.md
@@ -0,0 +1,75 @@
---
layout: default
parent: LDIO Transformers
title: Skolemisation Transformer
---

# LDIO Skolemisation Transformer

***Ldio:SkolemisationTransformer***

A transformer which skolemises the incoming model.

## What is Skolemisation

In the context of Linked Data, Skolemisation is a process used to handle blank nodes or anonymous nodes in RDF
(Resource Description Framework) graphs.
These nodes, which lack unique identifiers, are frequently employed in RDF/S knowledge bases to represent complex
attributes or resources with known properties but unknown identities.

Skolemisation in Linked Data involves the transformation of these blank nodes into Skolem Uniform Resource Identifiers
(URIs).
The process enhances clarity and makes it easier to reference these nodes in future datasets.

This process is particularly useful when dealing with substantial volumes of unstructured data distributed across
diverse sources.
By improving the accuracy and relevance of RDF summaries in relation to original datasets,
Skolemisation enhances the efficiency and effectiveness of subsequent queries against these summaries.

In summary, Skolemisation in Linked Data provides a way to handle the complexity introduced by blank nodes in RDF
graphs,
thereby enhancing the clarity, interoperability, and usability of the data.

### Example

Suppose we have the following RDF triples with a blank node represented as `_:bnode`:

```
_:bnode <http://purl.org/dc/terms/title> "The Lord of the Rings" .
_:bnode <http://purl.org/dc/terms/creator> "J.R.R. Tolkien" .
```

In this example, `_:bnode` is a blank node that represents a resource (a book in this case) with known properties (title and
creator) but an unknown identity.

Through Skolemisation, we can replace the blank node with a Skolem URI.
The Skolem URI is typically a URL that is unique to the blank node and is generated by the system handling the RDF data.
Here’s how it might look:

```
<http://example.com/.well-known/genid/123456> <http://purl.org/dc/terms/title> "The Lord of the Rings" .
<http://example.com/.well-known/genid/123456> <http://purl.org/dc/terms/creator> "J.R.R. Tolkien" .
```

In this Skolemised version, the blank node has been replaced with the Skolem
URI `http://example.com/.well-known/genid/123456`.
This URI is unique to the resource previously represented by the blank node, and can now be used to reference this
resource in other datasets.
This is a simple example, but it illustrates the basic process of Skolemisation in Linked Data.

## Config

| Property | Description | Required | Default | Example | Supported values |
|----------------|----------------------|----------|---------|--------------------|------------------|
| _skolemDomain_ | Skolemisation domain | Yes      | N/A     | http://example.com | Any valid URI    |

### Configuration

The YAML configuration of this example would be as follows:

```yaml
transformers:
- name: Ldio:SkolemisationTransformer
config:
skolemDomain: http://example.com
```
98 changes: 1 addition & 97 deletions docs/_ldio/pipeline-management/index.md
@@ -5,100 +5,4 @@ has_children: true
has_toc: true
nav_order: 1
---

# Management of Pipelines

Pipelines in LDIO can be created in YAML or JSON configuration (although all example configurations are made in YAML,
these can also be formatted in JSON).

A default pipeline looks as follows:

```yaml
name: my-first-pipeline
input:
name: fully-qualified name of LDI Input
config:
foo: bar
adapter:
name: fully-qualified name of LDI Adapter
config:
foo: bar
transformers:
- name: fully-qualified name of LDI Transformer
config:
foo: bar
outputs:
- name: fully-qualified name of LDI Output
config:
foo: bar
```
- Note that one orchestrator can have multiple pipelines
- Note that one pipeline can have multiple LDI Transformers and LDI Outputs
## Anatomy of a pipeline
Each pipeline is built up of the following components:
* [LDIO Input](ldi-inputs): A component that will receive data (not necessarily LD) to then feed the LDIO pipeline.
* [LDIO Adapter](ldi-adapters): To be used in conjunction with the LDIO Input, the LDIO Adapter will transform the
provided content into an internal Linked Data model and send it down the pipeline.
* [LDIO Transformer](ldi-transformers): A component that takes in a Linked Data model, transforms/modifies it and then
puts it back on the pipeline.
* [LDIO Output](ldi-outputs): A component that will take in Linked Data and will export it to external sources.
````mermaid
stateDiagram-v2
direction LR
LDI_Input --> LDI_Transformer : LD
LDI_Transformer --> LDI_Output : LD
state LDI_Input {
direction LR
[*] --> LDI_Adapter : Non LD
state LDI_Adapter {
direction LR
[*] --> adapt
adapt --> [*]
}
LDI_Adapter --> [*] : LD
}
state LDI_Transformer {
direction LR
[*] --> transform
transform --> [*]
}
state LDI_Output {
direction LR
[*] --> [*]
}
````
## Persistence of Pipelines
By default, all pipelines defined after startup (via management API) will be lost on restart.
To prevent this behaviour, add the `orchestrator.directory` property as follows:

```yaml
orchestrator:
directory: "{directory in application folder}"
```

If this directory does not exist, it will be created.

> **_NOTE:_** An application config can be defined by creating an application YAML file in the LDIO directory
(in docker, this correlates to `/ldio/application.yml`).


## Pausing & Resuming LDIO

Sometimes it might be preferred to pause an LDIO pipeline instead of deleting and recreating it.
The endpoints to manage pipelines can be found [here](pipeline-api.md)

The exact behaviour of a paused pipeline depends on its input component and can be found in the [documentation of these components](docs/_ldio/ldio-inputs/index.md).
However, it will always complete its current run through the pipeline and then cease sending any output.
# Pipeline Management
45 changes: 45 additions & 0 deletions docs/_ldio/pipeline-management/ldes-client-status.md
@@ -0,0 +1,45 @@
---
layout: default
parent: Pipeline Management
title: LDES Client Status
nav_order: 4
---

# LDES Client Status

Just like the LDIO pipelines have a status, so does the [`Ldio:LdesClient`](../ldio-inputs/ldio-ldes-client). The client
status can be fetched for a pipeline that has a running status and, of course, contains an LDES Client as its input
component.

## Overview Of The Status Flow

```mermaid
graph LR;
REPLICATING --> SYNCHRONISING;
REPLICATING --> COMPLETED;
SYNCHRONISING --> COMPLETED;
SYNCHRONISING --> ERROR;
REPLICATING --> ERROR;
```

The above diagram shows the flow between the different statuses of the client.

## REPLICATING

The startup status of the client. This status indicates that the LDES Client has not yet fetched all the available
fragments of a view (or views, if so configured).

## SYNCHRONISING

This status indicates that all the fragments of the configured view(s) have been fetched at least once, and there is at
least one fragment that does not have an immutable state yet.

## ERROR

This status indicates that an error has occurred somewhere while `REPLICATING` or `SYNCHRONISING`.

## COMPLETED

This status indicates that all the fragments of the configured view(s) have been fetched at least once and all of them
have an immutable state; in other words, the end of the LDES has been reached.