Mapping bidirectional UML relation to LinkML #146

Open
bartkl opened this issue Dec 20, 2024 · 5 comments

@bartkl

bartkl commented Dec 20, 2024

Hey all,

Now that we want to create normative reference implementations and a specification of how to map UML to LinkML, I want to finally resolve some of the conundrums that have been plaguing me, and I could definitely use your help.

Problem statement

Suppose I have a bidirectional relation R between classes C and D, with multiplicities on either side:

classDiagram
    C "i..j" -- "k..l" D : R

How would you represent this in LinkML?

Candidate solutions

Only two potential solutions come to mind, both of which are also described in the LinkML FAQ:

Relation class

One way is to acknowledge the relation as a first-class citizen and represent it as a class that is designated to represent a relation:

classes:
  C:
  D:
  R:
    represents_relationship: true
    attributes:
      predicate:
        enum_range:
          permissible_values:
            r:
        relational_role: PREDICATE
      subject:
        range: C
        minimum_cardinality: i  # i, j, k, l are the placeholder bounds from the diagram
        maximum_cardinality: j
        relational_role: SUBJECT
      object:
        range: D
        minimum_cardinality: k
        maximum_cardinality: l
        relational_role: OBJECT

Caution

I haven't tested this, and feel quite confident it won't work without some tweaking. This only emphasizes the point I'll make later on that this is just too complicated. That, or maybe I'm just dumb 😉.

Benefits

  • Relationships have a clear, singular identity
  • Information about the relation can be expressed on the relationship instance (for example whether it is bidirectional using SUBJECT, PREDICATE and OBJECT values for relational_role, and of the aggregate or composite type, etc.)
  • Potentially capable of representing n-ary relations, although these are rare and the CIM does not use them at all

Drawbacks

  • Complex and rather ugly representation that does not fit LinkML's tree shape well
  • Human readability and usability are impacted severely
  • Dubious whether many generators support this idiom

In conclusion: it's too complex and unreliable. Absolutely not an option in my opinion.

Two attributes

A solution that is much more natural to a tree-shaped metamodel such as LinkML is to represent the relationship as two attributes, one on each of the related classes:

classes:
  C:
    attributes:
      r_d:
        range: D
        minimum_cardinality: k
        maximum_cardinality: l
        inverse: D.r_c  # PROBLEM: In LinkML you cannot reference other attributes
  D:
    attributes:
      r_c:
        range: C
        minimum_cardinality: i
        maximum_cardinality: j

Benefits

  • Very intuitive and easy to follow
  • Generators are guaranteed to understand this and there will be fewer metamodel impedance mismatches

Drawbacks

  • Relationships do not have a singular identity
  • It is not possible to store relationship metadata (perhaps conventions could alleviate the pain)
  • Can represent binary relations only (this is not a huge problem)
  • Stating that the attributes are each other's inverse is essential for a valid representation and this is not possible (unless we use top-level slots instead of attributes as we'll see later)

Despite its drawbacks, this solution is probably the more practical one. Representing the relation as two attributes goes a long way with the CIM in particular, since - at least to my knowledge - it uses no relationship names, only role names, which makes this very smooth. Losing information about the relationship type is probably not too bad, although in the worst case it can be stored on both attributes for example.

However, the issue with the inverse statement is absolutely essential. Let me explain further, and later follow up with a possible solution.

Note

Stating that both attributes are each other's inverse is essential because otherwise their independence could result in invalid data.

For example, suppose the following simple UML model:

classDiagram
    Person "1" -- "1" Body : own

We could represent this as two attributes owns and isOwnedBy as follows (pseudostatements):

  • Each <Person> <owns> exactly one <Body>
  • Each <Body> <isOwnedBy> exactly one <Person>

Then, suppose in some dataset we have the information:

  • <Bart> <owns> <1234>
  • <1234> <isOwnedBy> <Todd>

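This is fine according to the two attributes model, but the original UML would not admit this (assuming the Unique Name Assumption): if owns and isOwnedBy are inverses, the first statement implies <1234> <isOwnedBy> <Bart>, which gives the body two distinct owners. That's because the attributes should not be independent.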
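The Bart/Todd example can be checked mechanically. The following is a minimal sketch, not a LinkML feature: it assumes the data is available as plain (subject, predicate, object) tuples, and the helper names `with_inverses` and `cardinality_violations` are hypothetical.

```python
from collections import defaultdict

# Hypothetical triples from the example above: (subject, predicate, object).
triples = [
    ("Bart", "owns", "1234"),
    ("1234", "isOwnedBy", "Todd"),
]


def with_inverses(triples, forward, backward):
    """Return the triples plus those implied by forward/backward being inverses."""
    closed = set(triples)
    for s, p, o in triples:
        if p == forward:
            closed.add((o, backward, s))
        elif p == backward:
            closed.add((o, forward, s))
    return closed


def cardinality_violations(triples, predicate, maximum=1):
    """Map each subject with more than `maximum` distinct values to those values."""
    values = defaultdict(set)
    for s, p, o in triples:
        if p == predicate:
            values[s].add(o)
    return {s: sorted(v) for s, v in values.items() if len(v) > maximum}


# Treated independently, the data passes the "at most one owner" check.
independent = cardinality_violations(triples, "isOwnedBy")

# Once the inverse is taken into account, body 1234 has two owners.
entailed = with_inverses(triples, "owns", "isOwnedBy")
dependent = cardinality_violations(entailed, "isOwnedBy")
```

Only the upper bound is checked here; a full check of "exactly one" would also need the set of all Person and Body instances to detect missing values.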

The issue with inverse and attributes

So, the inverse statements are essential, but why can't we express it? It's really only because we have used attributes to represent the relation ends, and in LinkML these cannot be referenced. This is because they are scoped to classes and are not part of the global namespace of model elements such as classes, (top-level) slots, types and enums. So there is literally no way to point to another attribute (see comment in the code above).

This brings us to the final solution candidate, where we fix this issue:

Two slots

So, what if we simply replace the attributes with slots? Let's try.

classes:
  C:
    slots:
    - C.r
  D:
    slots:
    - D.r

slots:
  C.r:
    domain: C  # Not essential, but more explicit.
    range: D
    minimum_cardinality: k
    maximum_cardinality: l
    inverse: D.r
  D.r:
    domain: D  # Not essential, but more explicit.
    range: C
    minimum_cardinality: i
    maximum_cardinality: j
    inverse: C.r  # It's safer to express this on both slots since it's possible no reasoner is used.
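The bounds k..l and i..j come straight from the UML multiplicities, so a generator could derive the slot fields from the multiplicity string. The sketch below shows one possible translation; the function name is hypothetical, and the choice to emit `minimum_cardinality` and `maximum_cardinality` only when they say more than `required` and `multivalued` is an assumption, not an agreed mapping rule.

```python
def multiplicity_to_slot_fields(multiplicity):
    """Translate a UML multiplicity such as "0..1", "1", "1..*" or "*" into slot fields."""
    lower, sep, upper = multiplicity.partition("..")
    if not sep:
        # A single token: "*" means 0..*, a number n means n..n.
        lower, upper = ("0", "*") if lower == "*" else (lower, lower)
    minimum = int(lower)
    maximum = None if upper == "*" else int(upper)

    fields = {
        "required": minimum >= 1,
        "multivalued": maximum is None or maximum > 1,
    }
    # Exact bounds only add information beyond required/multivalued in these cases.
    if minimum > 1:
        fields["minimum_cardinality"] = minimum
    if maximum is not None and maximum > 1:
        fields["maximum_cardinality"] = maximum
    return fields


zero_or_one = multiplicity_to_slot_fields("0..1")
bounded = multiplicity_to_slot_fields("2..5")
```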

Benefits

  • Same as the two attributes solution
  • Clear syntactical distinction between attributes and relations can be nice for humans

Drawbacks

  • Same as two attributes solution except the issue with inverse is fixed
  • Relations are actually scoped to classes, but are defined at the top level
    • This comes with the need for a convention for encoding the class owner name in the slot name (in the example a prefix <CLASS>. is used)

Note

Although this naming convention for slots seems hard to manage, it can actually follow the exact same convention used for the URIs we map onto model elements! For example: the relation ConductingEquipment from Terminal to ConductingEquipment would then get the name Terminal.ConductingEquipment, and the CIM URI cim:Terminal.ConductingEquipment, which aligns exactly!
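Because the convention is purely mechanical, both slot names, their URIs and the mutual inverse references can be generated from the two class names and role names. A minimal sketch, assuming the `cim` prefix and hypothetical helper names; the result is a plain dictionary shaped like a `slots:` section, not output from any LinkML tooling:

```python
def relation_slot_name(owner_class, role_name):
    """Slot name following the <CLASS>.<role> convention."""
    return f"{owner_class}.{role_name}"


def relation_slot_pair(class_a, role_a, class_b, role_b, prefix="cim"):
    """Build two mutually inverse slot definitions.

    role_a is the role name class_a uses to refer to class_b, and vice versa.
    """
    name_a = relation_slot_name(class_a, role_a)
    name_b = relation_slot_name(class_b, role_b)
    return {
        name_a: {
            "slot_uri": f"{prefix}:{name_a}",
            "domain": class_a,
            "range": class_b,
            "inverse": name_b,
        },
        name_b: {
            "slot_uri": f"{prefix}:{name_b}",
            "domain": class_b,
            "range": class_a,
            "inverse": name_a,
        },
    }


slots = relation_slot_pair(
    "Terminal", "ConductingEquipment", "ConductingEquipment", "Terminals"
)
```

Generating both ends in one call means the two `inverse` references cannot drift apart through a typing mistake.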

Actual CIM example of what this could look like

classes:
  ConductingEquipment:
    class_uri: cim:ConductingEquipment
    slots:
    - ConductingEquipment.Terminals

  Terminal:
    class_uri: cim:Terminal
    slots:
    - Terminal.ConductingEquipment
    attributes:
      phases:
        slot_uri: cim:Terminal.phases
        range: PhaseCode
        required: false
        multivalued: false

slots:
  ConductingEquipment.Terminals:
    slot_uri: cim:ConductingEquipment.Terminals
    range: Terminal
    required: false
    multivalued: true
    inverse: Terminal.ConductingEquipment
  Terminal.ConductingEquipment:
    slot_uri: cim:Terminal.ConductingEquipment
    range: ConductingEquipment
    required: false
    multivalued: false  # each Terminal belongs to one ConductingEquipment
    inverse: ConductingEquipment.Terminals
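With global slots and hand-written `inverse` references, a misspelled slot name or a wrong `range` is easy to miss by eye. The sketch below is a small consistency check over a `slots:` section loaded as a plain dictionary. It assumes the `<CLASS>.<role>` naming convention from this thread, and the function name `check_inverses` is hypothetical.

```python
def check_inverses(slots):
    """Return human-readable problems found in the inverse declarations."""
    problems = []
    for name, slot in slots.items():
        inverse_name = slot.get("inverse")
        if inverse_name is None:
            continue
        inverse = slots.get(inverse_name)
        if inverse is None:
            problems.append(f"{name}: inverse {inverse_name!r} is not a defined slot")
            continue
        if inverse.get("inverse") != name:
            problems.append(f"{name}: {inverse_name} does not declare {name} as its inverse")
        # By convention the owning class is the part before the first dot.
        owner = name.split(".", 1)[0]
        if inverse.get("range") != owner:
            problems.append(f"{name}: range of {inverse_name} should be {owner}")
    return problems


consistent = {
    "ConductingEquipment.Terminals": {
        "range": "Terminal",
        "inverse": "Terminal.ConductingEquipment",
    },
    "Terminal.ConductingEquipment": {
        "range": "ConductingEquipment",
        "inverse": "ConductingEquipment.Terminals",
    },
}

# Same schema with a misspelled inverse reference and a wrong range.
broken = {
    "ConductingEquipment.Terminals": {
        "range": "Terminal",
        "inverse": "Terminal.ConductingEquipment",
    },
    "Terminal.ConductingEquipment": {
        "range": "Terminal",
        "inverse": "ConductingEqupment.Terminals",
    },
}

ok_report = check_inverses(consistent)
broken_report = check_inverses(broken)
```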

Conclusion

Honestly, after having typed out all of this, I don't see how any option other than the final one, i.e. the two slots solution, could work. It is semantically correct and remains readable and comprehensible for both humans and generators. The only drawbacks are the lack of a single identity for the relation to put metadata on (which is hardly an issue), and the fact that what should actually be class-owned slots are now non-local, i.e. a global slot with a conventional prefix.

Anyway, I would love some input, so feel free to go ahead and tear this apart 🙂.

Related issues

@Sveino
Owner

Sveino commented Dec 20, 2024

Oy, it is big. I would need to read the detail in the New Year. I am working on a small test model that shall include all the relevant UML cases.
The first step, which I am working really hard on, is to avoid many-to-many associations. In general, this is the case in the Grid package (61970). My goal for CIM18 is to have this be the case for all classes needed for CDPSM (this includes Common, Asset and AssetInfo from Enterprise (61968)). The CIM rule is that in the profile you make one side singular. As a workaround, I would suggest that the vocabulary also include the same restriction.

bartkl changed the title from "Mapping UML relation to LinkML" to "Mapping bidirectional UML relation to LinkML" on Dec 20, 2024

@VladimirAlexiev
Collaborator

@bartkl from the RDF point of view things are pretty straightforward: you need a Relation node only if you need to put some info in it (qualifiers, validity period, provenance, confidence...).
So I agree that "two slots" is the right approach.

There are relevant things in this repo:

  • Express Multiplicity in OWL explains how we emit the standard characteristics owl:FunctionalProperty and owl:InverseFunctionalProperty when the forward or backward cardinality, respectively, has max=1
  • Reasoning discusses what reasoning is needed for CIM
  • Maybe: Inverse Reasoning says that
    • Inverse reasoning is not required (inverse triples are redundant anyway). CIM has an extra annotation cims:AssociationUsed "Yes" stating which direction is present. How could we capture this in LinkML?
    • But inverse reasoning is desirable for querying
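The rule about owl:FunctionalProperty and owl:InverseFunctionalProperty from the first bullet can be sketched as a small function. This is only an illustration of the stated rule, not the code used in the repository; the function name and the use of `None` for an unbounded maximum are assumptions.

```python
def owl_characteristics(forward_max, backward_max):
    """Return OWL property characteristics implied by the two maximum cardinalities.

    forward_max is the maximum number of objects per subject, backward_max the
    maximum number of subjects per object; None means unbounded.
    """
    characteristics = []
    if forward_max == 1:
        characteristics.append("owl:FunctionalProperty")
    if backward_max == 1:
        characteristics.append("owl:InverseFunctionalProperty")
    return characteristics


# A one-to-one relation, such as Person owns Body above, gets both characteristics.
one_to_one = owl_characteristics(1, 1)
# A many-to-one direction is functional only; its inverse is inverse-functional only.
many_to_one = owl_characteristics(1, None)
one_to_many = owl_characteristics(None, 1)
```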

@bartkl
Author

bartkl commented Dec 23, 2024

Thanks @VladimirAlexiev, also for pointing to those ideas you wrote down.

The more I think about it, the more it does indeed seem to be clear how to move forward.

However, in the case of bidirectional relations (so not just two inverse slots for convenience's sake, but actually a relation with no direction and a possibly different multiplicity on each end) you do have to state that both slots are each other's inverse, no? (See the example I give above for what could go wrong otherwise.)

I will probably read comments here in the new year.

Happy holidays and a happy new year,
Bart

@bartkl
Author

bartkl commented Dec 23, 2024

Btw, I'm writing up a beginning of this mapping in AsciiDoc. Will share as soon as something starts materializing.

@VladimirAlexiev
Collaborator

@bartkl
"a relation with no direction" means Symmetric, and symmetric relations are self-Inverse, thus are a subset of inverses.
Cheers!
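The distinction between a symmetric relation and a pair of inverse relations can be made concrete on sets of (subject, object) pairs. A minimal sketch with made-up data; the relation name `connected_to` is hypothetical and only the owns example comes from this thread:

```python
def inverse(pairs):
    """Swap subject and object in every pair."""
    return {(o, s) for s, o in pairs}


def is_symmetric(pairs):
    """A relation is symmetric exactly when it equals its own inverse."""
    return set(pairs) == inverse(pairs)


# Hypothetical symmetric relation: it is its own inverse.
connected_to = {("a", "b"), ("b", "a")}

# owns is not symmetric; its inverse is the separate relation isOwnedBy.
owns = {("Bart", "1234")}
is_owned_by = inverse(owns)
```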
