URL policy about MAS and BASE #98

Open
VladimirAlexiev opened this issue Oct 6, 2024 · 12 comments
Labels
instance Pertains to instance data urlpolicy Considerations about URL/namespace/folder/filename design/carving

Comments

@VladimirAlexiev
Collaborator

I think figuring out instance URLs, Models as named graphs, and how instances could be served, is an important topic.

Problem

Instances use relative URLs but don't specify a BASE, which leads to "random" URLs depending on the tool used.

BASE=MAS

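One way is to set BASE based on the MAS. E.g. (in Turtle):

@base <http://www.Statnett.no/IGM/Nordic44_CGM#>.
<#e2f56599-a78e-494f-8db3-c0b0bdab1d70> a cim:OperationalLimitSet.

This yields the instance URL <http://www.Statnett.no/IGM/Nordic44_CGM#e2f56599-a78e-494f-8db3-c0b0bdab1d70>
(the leading # in the relative reference matters: a bare relative reference would replace the last path segment of the base).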

But there are questions (leading to what you may call a URL Policy):

  • Is it reasonable to ever hope that http://www.Statnett.no will return RDF data? Maybe use data.statnett.no or similar
  • Why http not https? Your sysadmins may not be happy.
  • By using # we preclude a client from requesting a single resource (semantic URL). Now consider that the model may have 1M resources and 1B triples...
  • Why the leading underscore in local names? I've removed it.
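
To make the base-resolution and hash questions above concrete, here is a minimal Python sketch using only the standard library. It assumes RFC 3986 reference resolution as implemented by `urllib.parse.urljoin`, which is the rule Turtle and RDF/XML rely on for relative references. The slash-based base `https://data.statnett.no/psr/` is purely hypothetical and only illustrates the alternative.

```python
from urllib.parse import urljoin, urldefrag

UUID = "e2f56599-a78e-494f-8db3-c0b0bdab1d70"
HASH_BASE = "http://www.Statnett.no/IGM/Nordic44_CGM#"  # MAS from the example above
SLASH_BASE = "https://data.statnett.no/psr/"            # hypothetical, illustration only

# A fragment-only reference keeps the base path and replaces the fragment.
hash_url = urljoin(HASH_BASE, "#" + UUID)

# A bare relative reference replaces the last path segment of the base,
# so the "Nordic44_CGM" segment is lost.
bare_url = urljoin(HASH_BASE, UUID)

# With a base ending in "/", a bare relative reference is simply appended.
slash_url = urljoin(SLASH_BASE, UUID)

# An HTTP client never sends the fragment, so every hash URL of the model
# leads to a request for the same document.
requested, fragment = urldefrag(hash_url)

print(hash_url)
print(bare_url)
print(slash_url)
print(requested)
```

The last line is the practical cost of the hash form: a client that wants one resource still has to fetch whatever is served at the fragment-less URL.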

BASE=MAS=model

Maybe the model URN and the MAS should be the same? Don't they represent one and the same thing, namely that set of triples?
After consideration, that is not the case:

  • If I understand correctly, a MAS describes some resources (nodes), and a variety of models (sets of statements) on top of them.
    • Nodes have URLs, statements don't have URLs
  • A triple may live in several models.
    • This is very clear if you think of models as Named Graphs,
    • Can be illustrated by example: a triple in the shared part of a base model B and several differential models D1, D2...

Use URNs for Instances

Another way is to reformat instance URIs to be urn:uuid: just like the model URN.

  • But then we preclude semantic resolution, i.e. proper publication of models as Linked Data
@VladimirAlexiev VladimirAlexiev added instance Pertains to instance data urlpolicy Considerations about URL/namespace/folder/filename design/carving labels Oct 6, 2024
@Sveino
Owner

Sveino commented Oct 25, 2024

If we can replace rdf:ID="_f1769b90-9aeb-11e5-91da-b8763fd99c5f" with rdf:ID="urn:uuid:f1769b90-9aeb-11e5-91da-b8763fd99c5f", there is no issue for validation. Using rdf:about is a bit more of a challenge.

The point of using the MAS URL is just that we do not need a lookup table for creating the URL. ENTSO-E does have some power here, but defining a URL that returns RDF data will be very difficult. However, we (ENTSO-E) have required the domain:

  • The www.model4powersystem.eu domain is intended for referencing the power system models, or Individual Grid Model (IGM) and Common Grid Model (CGM) as stated in the network code.

We have also decided to use https rather than http. What is not clear is whether ENTSO-E will host this or whether it will be hosted by the individual entities. Most likely this would be a transition where some advanced TSOs host their own. We can then replace it with their URL, or we redirect.

@griddigit-ci
Collaborator

I guess this is another good discussion point.

  • We need to see where we need a base and where we do not. Currently instance data does not have a base, while RDFS and SHACL do.
  • I think we are also moving towards using rdf:about only

@VladimirAlexiev
Collaborator Author

I thought we already agreed to use BASE=MAS instead of urn:uuid

  • If you use urn:uuid, that can never be resolved
  • I implemented cim-trig.pl, which produces <http://www.Statnett.no/IGM/Nordic44_CGM#_e2f56599-a78e-494f-8db3-c0b0bdab1d70>
    (I left the underscore as is).
  • I agree the MAS should use https not http

@VladimirAlexiev
Collaborator Author

Answering considerations that @Sveino raised in #94 (comment):

rdf:ID vs rdf:about... to tell the receiver that you should already have the object

But there's no such semantics in RDF (in fact you can't create a node in RDF, you create only triples).
And you can't control a standard tool to emit one or the other (most use only rdf:about).

or we go for the way we would like it to be for JSON-LD

We'll use the same instance URNs in XML and JSON, right?
I recommend URLs that some day might resolve, whereas URNs will never resolve.

with a base that is http://model4powersystem.eu/Statnett

Ok, but you need to change the MAS in the specific files.
And let's use https not http.

does the base need to be the same for all the related instance files, or could/should it be a named graph

The base determines instance node URLs. The same PSR should have the same URL, regardless of how many models or profiles it appears in.
Named graphs group triples (so turn them into quads), for model management purposes: add/delete model, patch (through a DifferenceModel), validate.
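
A minimal sketch of that distinction, in plain Python rather than a real triple store: node URLs stay the same, while the fourth element of each quad records which model a triple belongs to. All identifiers below (the PSR URL and the two model URNs) are made up for illustration, and `add_model`, `delete_model` and `graphs_containing` are hypothetical helper names.

```python
# Hypothetical identifiers, for illustration only.
PSR = "https://example.org/Statnett/e2f56599-a78e-494f-8db3-c0b0bdab1d70"
BASE_MODEL = "urn:uuid:11111111-1111-4111-8111-111111111111"
DIFF_MODEL = "urn:uuid:22222222-2222-4222-8222-222222222222"

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
CIM_OLS = "cim:OperationalLimitSet"  # prefixed name kept short for readability

quads = set()  # each entry is (subject, predicate, object, graph)


def add_model(graph, triples):
    """Load a model: every triple becomes a quad tagged with the model's graph."""
    quads.update((s, p, o, graph) for (s, p, o) in triples)


def delete_model(graph):
    """Drop a model by removing only the quads that belong to its graph."""
    for quad in [q for q in quads if q[3] == graph]:
        quads.discard(quad)


def graphs_containing(triple):
    """Return the set of graphs (models) in which a triple appears."""
    return {g for (s, p, o, g) in quads if (s, p, o) == triple}


shared = (PSR, RDF_TYPE, CIM_OLS)
add_model(BASE_MODEL, [shared])
add_model(DIFF_MODEL, [shared])
before = graphs_containing(shared)  # the same triple lives in both models

delete_model(DIFF_MODEL)
after = graphs_containing(shared)   # the triple survives in the base model

print(sorted(before))
print(sorted(after))
```

Deleting one model does not touch the PSR's URL or its statements in other models, which is why the base (node identity) and the graph name (model identity) are separate concerns.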

(MAS is replaced with) <dcat:isVersionOf rdf:resource="https://energy.referencedata.eu/Model/Statnett-EQOP"/>

I dislike two things here:

  • PSRs are real-world things (eg a transmission line managed by Statnett).
    CIM records actual or hypothetical information about PSRs in models.
    The same PSR may appear in multiple models.
  • dcat:isVersionOf is inappropriate. It says "a dataset is a version of another".
    • But the MAS is not a dataset nor a model!
    • It just sets the URL base for PSRs.
    • In the very first model that uses a certain MAS, you still need to specify MAS: would you then say that it's a version of itself?

dcterms:references is referring to the instance dataset that is a named graph. This could be:
<dcterms:references rdf:resource="http://model4powersystem.eu/Statnett-EQCO/urn:uuid:99ae9f41-0a91-4d21-a483-7398c160da96"/>

I dislike two things here:

  • Your spec says that models use urn:uuid:. But the above is not a URN, it's an http URL. There's no point in using the words urn:uuid: in an http URL
  • That URN should be the URN of the model, not a property of the node. In other words:
<urn:uuid:99ae9f41-0a91-4d21-a483-7398c160da96> a md:FullModel;  ## right
  ## this URL is "wrong" and the pseudo self-reference is useless:
  dcterms:references <http://model4powersystem.eu/Statnett-EQCO/urn:uuid:99ae9f41-0a91-4d21-a483-7398c160da96>.

@Sveino
Owner

Sveino commented Oct 30, 2024

We are now starting to have very much the same dialog on multiple issues. I do not know how best to explain the transition of the CIM syntax update. Let me try with this diagram:

[image: diagram of the CIM syntax transition]

We'll use the same instance URNs in XML and JSON, right?
Not between CIM JSON-LD and CIM XML (2016), but between CIM JSON-LD and standard RDF/XML.

The current header information md:Model is not sufficient for the future. Rather than develop our own, we would like to reuse DCAT. We are therefore missing information in the current CIM XML based on IEC 61970-552:2016. This leads us back to the strategy decision in #116

The comment regarding urn:uuid was related to "_". In CIM JSON-LD we will most likely support both resolvable and non-resolvable identifiers. For non-resolvable we just use urn:uuid:; for resolvable we include the base URL.
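
As a rough illustration of that dual scheme, the sketch below maps a CIM XML rdf:ID to either form. `instance_iri` is a hypothetical helper, not part of any CIM tooling; it assumes the local ID is a UUID with an optional leading "_" (EIC-based identifiers are not handled), and the base URL is just one of the candidates mentioned in this thread.

```python
import uuid
from typing import Optional


def instance_iri(rdf_id: str, base: Optional[str] = None) -> str:
    """Map a CIM XML rdf:ID such as "_f1769b90-..." to an instance IRI.

    Without a base the result is a non-resolvable urn:uuid: URN;
    with a base it is a (potentially resolvable) slash URL.
    """
    # Drop the single technical "_" prefix, if present.
    local = rdf_id[1:] if rdf_id.startswith("_") else rdf_id
    # uuid.UUID raises ValueError for anything that is not a UUID and
    # normalises the text form to lowercase with hyphens.
    canonical = str(uuid.UUID(local))
    if base is None:
        return "urn:uuid:" + canonical
    return base.rstrip("/") + "/" + canonical


urn_form = instance_iri("_f1769b90-9aeb-11e5-91da-b8763fd99c5f")
url_form = instance_iri(
    "_f1769b90-9aeb-11e5-91da-b8763fd99c5f",
    base="https://energy.referencedata.eu/Statnett/",  # candidate base from this thread
)

print(urn_form)
print(url_form)
```

Both forms carry the same UUID, so a consumer that only needs the mRID can recover it from either one.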

The choice between https://model4powersystem.eu/Statnett/ , https://energy.referencedata.eu/Statnett/ or https://statnett.model4powersystem.eu/ depends a bit on how we can deal with redirects and access rights. I do not see that we need to make that decision now, and it should not necessarily be a topic here. However, for the JSON-LD instance data specification we need to understand how the combination of resolvable and non-resolvable identifiers will work. Remember that we might not be able to have all TSOs ready at the same time. Is this a possible approach:

{
  "@context": {
    "base": "https://energy.referencedata.eu/Model/Statnett-EQOP/",
    "dcterms": "http://purl.org/dc/terms/"
  },
  "@graph": [
    {
      "@id": "urn:uuid:99ae9f41-0a91-4d21-a483-7398c160da96",
      "dcterms:identifier": "99ae9f41-0a91-4d21-a483-7398c160da96",
      "dcterms:description": "Example description of a non-resolvable resource",
      "dcterms:resolvableURL": {
        "@id": "https://energy.referencedata.eu/Model/Statnett-EQOP/urn:uuid:99ae9f41-0a91-4d21-a483-7398c160da96"
      }
    }
  ]
}

For CIM XML conversion I am in principle OK if we use BASE=MAS or urn:uuid. But the current MAS is not particularly good....

I implemented cim-trig.pl, which produces http://www.Statnett.no/IGM/Nordic44_CGM#_e2f56599-a78e-494f-8db3-c0b0bdab1d70
(I left the underscore as is).
We have said that "_" is a technical addition. It should be replaced by urn:uuid or urn:eic if we decide to use that in the URL.

(MAS is replaced with) <dcat:isVersionOf rdf:resource="https://energy.referencedata.eu/Model/Statnett-EQOP"/>
This was a simplification that I regretted when I wrote it. Each instance file, including the DifferenceSet, is a version of the abstract reference defined in dcat:isVersionOf. This is explained here: https://www.w3.org/TR/vocab-dcat-3/#ex-version-chain-and-hierarchy

dcterms:references
I do not understand the comments on this. dcterms:references shall refer to the model/dataset that this dataset/graph depends on for doing full validation. It should not be self-referring.

@VladimirAlexiev
Collaborator Author

As discussed, I think it should be like this:

{
  "@context": {
    "@base": "https://energy.referencedata.eu/Statnett/",
    "dcterms": "http://purl.org/dc/terms/"
  },
  "@graph": [
    {
      "@id": "99ae9f41-0a91-4d21-a483-7398c160da96",
      "dcterms:identifier": "99ae9f41-0a91-4d21-a483-7398c160da96",
      "dcterms:description": "Example description of a resolvable resource"
    }
  ]
}
  • "@id" is a relative URL that is resolved against the base (the JSON-LD keyword is "@base"); "dcterms:identifier" is a plain string, not a URL
  • Instead of a custom prop resolvableURL, we make the model URI resolvable (a URL)
  • We use / not #, which is recommended for bigger collections
  • Removed Model and EQOP from the base because that is the prefix of node URLs not triple sets
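
For readers who want to check what the relative "@id" expands to, here is a small Python sketch. It is not a JSON-LD processor: it only reads the "@base" keyword from the context and applies RFC 3986 resolution via `urllib.parse.urljoin`, which is the rule JSON-LD uses for relative IRIs. Note that a plain "base" key in the context would just define an ordinary term and would not affect "@id" resolution.

```python
import json
from urllib.parse import urljoin

# Same shape as the JSON-LD example above, using the "@base" keyword.
doc = json.loads("""
{
  "@context": {
    "@base": "https://energy.referencedata.eu/Statnett/",
    "dcterms": "http://purl.org/dc/terms/"
  },
  "@graph": [
    {
      "@id": "99ae9f41-0a91-4d21-a483-7398c160da96",
      "dcterms:identifier": "99ae9f41-0a91-4d21-a483-7398c160da96"
    }
  ]
}
""")

base = doc["@context"]["@base"]

# Resolve every node's relative "@id" against the base.
resolved = [urljoin(base, node["@id"]) for node in doc["@graph"]]

print(resolved[0])
```

Because the base ends in a slash, the UUID is appended as a path segment, giving each resource its own requestable URL.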

@arne-bdt

arne-bdt commented Dec 6, 2024

I’d like to add to the discussion with some points regarding the use of urn:uuid and the challenges posed by current practices in CIM XML (IEC 61970-552).

Context and Issue with Namespaces

In CIM XML (IEC 61970-552), the specification does not define an xml:base, leaving implementations to address this gap individually. Over time, users have introduced various namespaces such as:

  • http://iec.ch/TC57/2013/CIM-schema-cim16#
  • http://iec.ch/TC57/CIM100#
  • xx://#
  • http://fullgrid.eu/CGMES/3.0#

While these solutions aim to address compatibility issues with RDF/XML readers like Apache Jena, they introduce a significant problem: the same mRID (UUID) ends up with different URI representations in different systems even when read from the same CIM XML file.

Impact on Interoperability and Queries

This lack of consistency results in different SPARQL query results for the same logical object across systems. For example, an ACLineSegment with a particular UUID might be referenced as:

  • http://iec.ch/TC57/2013/CIM-schema-cim16#_26cc8d71-3b7e-4cf8-8c93-8d9d557a4846 in one system,
  • http://iec.ch/TC57/CIM100#_26cc8d71-3b7e-4cf8-8c93-8d9d557a4846 in another,
  • or even http://fullgrid.eu/CGMES/3.0#_26cc8d71-3b7e-4cf8-8c93-8d9d557a4846 in yet another.

These discrepancies undermine the interoperability CIM XML is meant to facilitate, especially when transitioning between profiles or versions like CGMES 2.4.15 and CGMES 3.0.0.
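
The effect can be reproduced with a short Python sketch that groups the three URIs above by the UUID they end with. `mrid_of` is a hypothetical helper written for this illustration; it assumes the mRID is a UUID at the very end of the URI, optionally preceded by "#_" or similar.

```python
import re
from collections import defaultdict

# A UUID in canonical text form at the end of the string.
UUID_RE = re.compile(r"[0-9a-fA-F]{8}(?:-[0-9a-fA-F]{4}){3}-[0-9a-fA-F]{12}$")


def mrid_of(uri: str) -> str:
    """Extract the trailing UUID (the mRID) from an instance URI."""
    match = UUID_RE.search(uri)
    if match is None:
        raise ValueError(f"no trailing UUID in {uri!r}")
    return match.group(0).lower()


uris = [
    "http://iec.ch/TC57/2013/CIM-schema-cim16#_26cc8d71-3b7e-4cf8-8c93-8d9d557a4846",
    "http://iec.ch/TC57/CIM100#_26cc8d71-3b7e-4cf8-8c93-8d9d557a4846",
    "http://fullgrid.eu/CGMES/3.0#_26cc8d71-3b7e-4cf8-8c93-8d9d557a4846",
]

# Group the URIs by the mRID they carry.
by_mrid = defaultdict(set)
for uri in uris:
    by_mrid[mrid_of(uri)].add(uri)

for mrid, variants in by_mrid.items():
    print(mrid, "has", len(variants), "different URIs")
```

A SPARQL query that matches on the full URI sees three unrelated nodes here, even though all three denote the same ACLineSegment.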

The Case for urn:uuid

The CIM XML specification itself (IEC 61970-552 Ed.1 / 57/1262/CDV), describes a clear alternative:

“URN form: urn:namespace:specification where the namespace in CIM XML is uuid.”

--> has this become invalid?

Using urn:uuid ensures the same object, identified by a globally unique UUID, has a single, consistent representation regardless of the namespace or system. For example:

  • urn:uuid:26cc8d71-3b7e-4cf8-8c93-8d9d557a4846

This approach eliminates the variability caused by namespaces. Since UUIDs are already guaranteed to be unique, they do not need an additional namespace.

Challenges of URLs as Identifiers

While URLs could theoretically provide resolvable references, the lack of a public registry for CIM objects prevents this. The current reality is that different implementations reference the same UUID with differing namespaces, causing compatibility and migration headaches between versions like CGMES 2.4.15 and CGMES 3.0.0.

Conclusion

To maintain consistency and simplify format migrations, urn:uuid provides the most stable and interoperable solution. It avoids namespace variability, ensures consistent referencing, and preserves the utility of UUIDs as globally unique identifiers across systems.

I recommend reconsidering the use of urn:uuid as the primary identifier format in CIM XML to mitigate these persistent interoperability issues.

(For transparency: I took a little help from ChatGPT to structure my arguments and improve my spelling and wording)

@Sveino
Owner

Sveino commented Dec 9, 2024

@arne-bdt What you are raising is definitely a problem that has different layers! The general goal is that all elements in CIM (metadata, instance data and profile definitions) are resolvable. In addition, we can use urn:uuid for instance data that cannot be made resolvable. We are testing out in this repository how this can be done for the upcoming version of CIMJSON-LD (a new standard that might become IEC 61970-553) for instance data, and for the next edition of IEC 61970-501 (Ed2) for metadata (CIM metamodel). There is an updated version of IEC 61970-552:2016 (ED2). This version was intended to reflect how this was used in CGMES (IEC 61970-600-1/2), but ended up not being a good version, so IEC 61970-600-1/2 deviates from this standard. The plan is to publish IEC 61970-552 (ED3), which will be in line with CIMJSON-LD with regard to namespaces.

You can find more information on this in this repo, but this is work in progress.
You can find relevant sources on ENTSO-E website: https://www.entsoe.eu/data/cim/cim-for-grid-models-exchange/

RDF-Syntax User Guide v1.1.0: talks about the general discrepancy between CIM and the updated RDF Syntax standards.

Metadata and Document Header Data Exchange Specification v2.4.0: the direction of supporting DCAT as metadata for datasets.

Regional Coordination Process Data Exchange Specification (RCP DES) 2.3.1: in the version note you will find the new namespaces that are used in the CGMES Network Code (NC) extension. The plan is that CIM18 will follow the same pattern.

@VladimirAlexiev
Collaborator Author

@arne-bdt

Note: a search for label:urlpolicy here finds 13 issues.

does not define an xml:base

That is indeed a big problem. It's issue #87 and fixed in https://github.com/Sveino/Inst4CIM-KG/tree/develop/rdf-improved#fix-resource-urls

http://iec.ch/TC57/2013/CIM-schema-cim16#
http://iec.ch/TC57/CIM100#
xx://#

These are not suitable values for xml:base:

  • The first two are ontology namespaces, and it's not a good idea to mix ontology terms and individuals
  • The last one uses an invalid/unregistered URI scheme
  • All use a trailing hash, which is not a good idea for a large collection of resources if they would ever be resolvable, since a client doesn't send the part after the hash. I've now added "use slash not hash in instance URLs (and remove parasitic underscore)" #143.

especially when transitioning between profiles or versions like CGMES 2.4.15 and CGMES 3.0.0.

Indeed!
#123 discusses the problems caused by versioned ontology namespaces.
If one uses versioned instance namespaces, that's even worse.

The Case for urn:uuid... While URLs could theoretically provide resolvable references, the lack of a public registry for CIM objects prevents this.

urn:uuid can never be resolved.
You don't need a public registry to resolve URLs: the DNS (Domain Name System) does that.
All it takes is willingness and discipline by data owners to establish semantic resolution, and to keep their data current.

urn:uuid ... eliminates the variability caused by namespaces
The current reality is that different implementations reference the same UUID with differing namespaces

Yes, but there is a way to fix this problem without resorting to non-resolvable solutions.

The CIM/CGMES community is still largely "document-oriented": it thinks of CIM XML files.
But if it starts thinking about resources and model graphs, then it can use Linked Data principles to fetch data at a frequency and granularity decided by the data client, not the data owner.
By making resource and model graph URLs resolvable, clients can get on demand up-to-date info about things that are of interest to them.
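
As a sketch of what such on-demand access could look like from the client side, the Python snippet below builds (but does not send) an HTTP request with content negotiation. The resource URL reuses the example base from earlier in this thread and is an assumption: no such endpoint is known to exist, and `linked_data_request` is a hypothetical helper name.

```python
from urllib.request import Request


def linked_data_request(resource_url: str, media_type: str = "text/turtle") -> Request:
    """Build a GET request asking the server for an RDF serialization."""
    return Request(resource_url, headers={"Accept": media_type}, method="GET")


# Assumed URL for illustration; it is not known to resolve.
req = linked_data_request(
    "https://energy.referencedata.eu/Statnett/99ae9f41-0a91-4d21-a483-7398c160da96"
)

print(req.get_method(), req.full_url)
print("Accept:", req.get_header("Accept"))

# Sending it would be: urllib.request.urlopen(req)
# That call is left out on purpose, since the endpoint is hypothetical.
```

The client decides which resource to ask for and in which format; the data owner only has to keep the URL stable and the answer current.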

@arne-bdt

@Sveino
Thank you for the detailed explanation and for linking those documents—I wasn’t aware of the last two and they were very helpful!

@VladimirAlexiev
Thanks for picking up and explaining key aspects of this issue.

So, has it been decided that md:Model.modelingAuthoritySet should be used as xml:base when reading CIM/XML files? I can see the benefit of having a resolvable URL, but wouldn’t changing the md:Model.modelingAuthoritySet effectively reset all mRIDs for a TSO?

In our work, we’ve moved away from treating CIM/XML as files and instead treat it as RDF, primarily using Apache Jena. We ensure stable mRIDs by using data warehouses, which allows us to:

  • Maintain constant mRIDs for the same instances over time.
  • Extend CGMES data with custom attributes or references using mRIDs.
  • Join timeseries data to CIM instances via mRIDs.

If the full instance URI depends on md:Model.modelingAuthoritySet, changing the MAS could break a lot of these workflows—especially in applications where the full URI is used instead of just the UUID part.

Looking forward to hearing your thoughts on this!

@VladimirAlexiev
Collaborator Author

VladimirAlexiev commented Dec 17, 2024

Hi @arne-bdt !
I've read the goals of your OpenCGMES, very interesting. This repo (folder rdf-improved) tackles some of the same problems: attaching datatypes, conversion to trig and jsonld...

Formally speaking, the mRID is the pure UUID (or EIC) without the namespace.
But you are right that URI stability is just as important, since URIs are the resource keys in a semantic repo.
So an authority should never change its MAS. And the MAS should not depend on profile. If it is impractical or impossible to use a single MAS per authority (or at least per transfer file), that is an argument not to use URLs but only URNs.

@Sveino , @arne-bdt , @griddigit-ci , @tviegut
Can we elaborate arguments why a single and stable MAS per authority may be impractical or impossible?

  • Authorities may tend to use different MAS per profile (model kind) but models of a profile (eg DY) depend on resources from another profile (eg EQ).
  • Cross-TSO resources (eg a cross-border transmission line) will tend
  • more???

I also posted #144 that muses whether a global resolver is possible for electrical data.

@arne-bdt

Hi @VladimirAlexiev,

Thanks for your response. I see your point, but I’m concerned that relying on a fixed MAS as a stable base for all URIs may be problematic for several reasons:

  1. TSO changes over time:
    A TSO might change its company name and/or its domain and naming scheme over time. Examples like http://rte-france.fr/Planning/CGMES (seen in certain conformity assessment test configurations) were adopted in different ways. In the past, I have seen CGMES data where different departments ('Planning', 'Operations') used different MASs. Such choices could evolve: first by removing the department name (Planning) in favor of http://rte-france.fr/CGMES, later by shifting to a dedicated subdomain (http://cgmes.rte-france.fr). This evolution would frequently invalidate previously derived URIs.

  2. Different approaches by different TSOs:
    As more TSOs adopt resolvable URIs, we’ll likely see diverse approaches. Some might introduce categorization, using different paths for profiles (e.g., EQ/, GL/, AE/), or follow the example of https://energy.referencedata.eu/ with BaseVoltage/, PropertyReference/, PowerFlowSettings/ etc. These structures change over time as new profiles and/or categories emerge. Maintaining a single MAS as a stable base in each profile would limit the flexibility and categorization that evolving systems might require.

  3. How would boundaries and merged profiles work with BASE=MAS?:
    For example, in MicroGrid-Type1-Merged, the EQ file has a base http://elia.be/CGMES, and the Boundary file uses http://entsoe.eu/boundary. If we rely on MAS to form the complete URI, the same mRID 63893f24-5b4e-407c-9a1e-4ff71121f33c would resolve differently, depending on the chosen MAS. This inconsistency breaks the intended stability and cross-file data linkage. Similarly, for SV and DL profiles in assembled models, the recommended practice (per CGMES 2.4.15 [R.4.8.10.] and [R.4.8.11.]) is that these profiles do not even define MAS in the header.

  4. Long-term stability and external applications:
    Some operational solutions (e.g., the German redispatch solution) rely on stable URIs over long periods. They store RDF data and run SPARQL queries against stable URIs to correlate equipment, regions, generating units and loads across multiple revisions and datasets. Changing the xml:base or MAS-based URI pattern would effectively create new identifiers and break these existing references, causing operational issues.

Considering these points, introducing an optional property—like eu:IdentifiedObject.referenceURL—could offer the necessary flexibility. This would support categorization and allow TSO-specific or profile-specific URI structures without forcing a single stable MAS-based URI scheme for all instances. In short, relying solely on MAS as a stable base seems too restrictive and may lead to long-term maintenance and consistency problems as domains and schemes inevitably evolve.


4 participants