Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

remove confusing type Enum or String #215

Open
davidB opened this issue May 9, 2024 · 25 comments
Open

remove confusing type Enum or String #215

davidB opened this issue May 9, 2024 · 25 comments
Assignees
Milestone

Comments

@davidB
Copy link
Contributor

davidB commented May 9, 2024

With version 0.4, ticket's schema define type of some field (priority, resolution, ticketType) like:

"anyOf": [
  {
    "type": "string",
    "enum": [
      "low",
      "medium",
      "high"
    ]
  },
  {
    "type": "string"
  }
]

But this definition is "conflicting" as the second is a superset of the first. (if oneOf is used instead of anyOf a jsonschema linter will warm about this issue). I mean for any language implementation string will be used (or an automatic generator/serializer+deserializer will be confused).

What is the purpose/value of the enum if any value can be used? (eg if a system is uppercase like "LOW")

Suggestion: replace the anyOf by single"type": "string"

@sbrennan4 WDYT?

@afrittoli
Copy link
Contributor

@davidB we added the anyOf(string, enum) on purpose so that we could suggest some known values in the JSON schema (and not only in the docs), but still accept others. We are aware that validation can in fact only validate that values are a string but we use this as a way to document well known values.

The initial approach was to only have an enum with other as one of the values, which was limiting because it would not let users carry the value they needed in their event.
An alternative I considered was keeping other and adding another field, to be used only when other is used. With this option, a consumer does not know in advance which field contains the expected value, which is not ideal. Additionally this would be harder to validate.

@davidB
Copy link
Contributor Author

davidB commented May 19, 2024

If the purpose is to document, why not use the field description or examples (or a custom field x-...)?

https://json-schema.org/understanding-json-schema/reference/annotations

You should not only consider those schemas as a way to "validate" (in this case there is no validation), but also as definitions used to:

  • create parser/generator/serializer/deserializer & types into other programming languages (and how to do automatic (or semi-auto) translation)
  • generate documentation
  • ...

What it will look like in an API? A anyOf is often translated into an inheritance or an Union or a Enum. And parser will need a way to select the correct (sub)type. Eg for the value "low" in json is it a enum's member or a string? and same question for "LOW".

@xibz
Copy link
Contributor

xibz commented May 19, 2024

@davidB does this definition affect some tooling or something you are using? Im trying to understand what the exact problem is other than it can be more efficiently written.

And it isn't just documenting. It's specifically saying, this is a string, and here are the list of possible values. The latter point is very important. If we just say string, then that means anything, but we want to limit that.

If there is tooling that is being used that is having trouble with this definition, can we get some more information on that? That may help us figure out a way to approach the problem.

@davidB
Copy link
Contributor Author

davidB commented May 20, 2024

@xibz, Yes this definition affect tooling (currently I customize it to say String only). Try to answer the question:

  • How to you think this json schema should be translated into typed languages (try with your favorite)?
  • How a parser/deserializer should convert the json into your types? (how the parser select the right definition String or Enum (but the enum is a string))
  • What is the look of a generated documentation (from the type and from the json schema, eg when this json schema is imported into an OpenAPI documentation)?
  • is "low" (enum) same as "low" (string) in typed language and what about "LOW"?

With a tool like https://app.quicktype.io/, it becomes a simple String, but if I do word to word manual translation of this definition in rust it should be something like:

Enum Priority { // anyOf
  PriorityEnum(PriorityEnum),
  String(String),
}

Enum PriorityEnum {
  Low,
  Medium,
  High,
}

But on parsing, deserialization PriorityEnum will never be used, because it requires custom code to say if the string value in ... then use enum, else use string. (My issue is the need of this custom code for each type/field)

And for user it also complexity the usage.

json enum are for exhausted list.

@xibz
Copy link
Contributor

xibz commented May 20, 2024

After playing around some with the JSON schema, I believe anyOf should be oneOf. I noticed you did mention that oneOf would complain about the issue, but would still be fine. Would that work for you, @davidB?

@davidB
Copy link
Contributor Author

davidB commented May 20, 2024

oneOf is better IMHO in term of types and how to deserialize data.
oneOf is like XOR, a value can be A or B but not both. I guess that a json validator will failed with the current definition because the enum and the string are not mutually exclusive (eg with a "low"). oneOf use discriminator or "duck typing" to check that there is no overlap between possible type.

IMO, oneOf highlights the issue. Using the enum to show some possible value is not the right approach.

To be clear what will work for me (and for tool and reader) is something like:

    "priority": {
      "description": "An indicator of the importance of the ticket",
      "examples": [
          "low",
          "medium",
          "high"
      ],    
      "type": "string"
    }

@xibz
Copy link
Contributor

xibz commented May 20, 2024

I think anyOf is still valid in this case, but I could see why it is confusing. I think the more confusing portion is here

"anyOf": [
  {
    "type": "string",
    "enum": [
      "low",
      "medium",
      "high"
    ]
  },
  { 
    "type": "string" <- This is not needed
  }
]

I will go ahead and mark this as a bug, and we can fix it to look like,

"oneOf": [
  {
    "type": "string",
    "enum": [
      "low",
      "medium",
      "high"
    ]
  }
]

Would that work for you? I think this will solve all the issues you are seeing at least with the tool

@xibz xibz added the bug Something isn't working label May 20, 2024
@davidB
Copy link
Contributor Author

davidB commented May 20, 2024

in this case why using a oneOf? (there is only one possible type the enum)

@xibz
Copy link
Contributor

xibz commented May 20, 2024

@davidB Because a string'd enum is different than an normal enum, e.g. C where it is an integer, and we also want to explicitly say it is a string, and not anything else. It's mostly for clarity.

@davidB
Copy link
Contributor Author

davidB commented May 20, 2024

in json-schema enum are string.
What I mean is why

"oneOf": [
  {
    "type": "string",
    "enum": [
      "low",
      "medium",
      "high"
    ]
  }
]

and not

    "type": "string",
    "enum": [
      "low",
      "medium",
      "high"
    ]

Sorry I don't understand the value of oneOf

@xibz
Copy link
Contributor

xibz commented May 20, 2024

@davidB No, they don't have to be strings. We are explicitly saying that they are strings. This is important because anyone wanting to make a change to the spec, will know that it is explicitly as a string.

"oneOf": [
  {
    "enum": [
      1, <- this is valid
      "low",
      "medium",
      "high"
    ]
  }
]

@davidB
Copy link
Contributor Author

davidB commented May 20, 2024

My question is why using a "oneOf" with an array of 1 value (the enum)?

@xibz
Copy link
Contributor

xibz commented May 20, 2024

@davidB Oh, sorry, I misunderstood what you are asking. It's not an array in practice. That's just part of the spec.

Basically it is saying, we have some enum value, and it can be a, b, or c. The a, b, or c is just a list in the schema, but the list itself is not represented in code to be generated, etc.

I hope that makes sense?

@davidB
Copy link
Contributor Author

davidB commented May 20, 2024

oneOf and enum are unrelated.

@xibz
Copy link
Contributor

xibz commented May 20, 2024

Oh! I completely mistook what you were saying! Gotcha! hahah, now I see the confusion. Okay, yea, this is definitely not modeled correctly. Okay, I will have this fixed with

"enum": [
      "low",
      "medium",
      "high"
    ]

@davidB
Copy link
Contributor Author

davidB commented May 20, 2024

if a, b, c should not be represented in code, then enum is not the right json-schema medium. It's why I suggest examples.
Also note that using examples will also help to be in sync in the md files (and maybe to generate part of them) and tool like schemathesis,...

@xibz
Copy link
Contributor

xibz commented May 20, 2024

Okay, I think I understand what was trying to be modeled. So basically we want any string, but want to provide some defaults, like low, medium, high in the SDKs. I think this is primarily used to generate some defaults in the SDK.

Just providing string is fine, and having it documented that the string is low, medium, or high, but that isn't as useful as providing that in an SDK.

@xibz xibz removed the bug Something isn't working label May 20, 2024
@afrittoli
Copy link
Contributor

oneOf is better IMHO in term of types and how to deserialize data. oneOf is like XOR, a value can be A or B but not both. I guess that a json validator will failed with the current definition because the enum and the string are not mutually exclusive (eg with a "low"). oneOf use discriminator or "duck typing" to check that there is no overlap between possible type.

IMO, oneOf highlights the issue. Using the enum to show some possible value is not the right approach.

To be clear what will work for me (and for tool and reader) is something like:

    "priority": {
      "description": "An indicator of the importance of the ticket",
      "examples": [
          "low",
          "medium",
          "high"
      ],    
      "type": "string"
    }

Interesting - is examples here a custom field? Does jsonschema allow for that?
I would like the SDK to include several constants which correspond to the values in examples, so that SDK users can benefit from those or add their own value - would it be possible to achieve that with a custom schema field?

@davidB
Copy link
Contributor Author

davidB commented May 20, 2024

examples is part of json-schema draft 6, see https://json-schema.org/understanding-json-schema/reference/annotations

and used by some OpenAPI and contract-testing tool like schemathesis to validate API or generate samples or validate API

@xibz
Copy link
Contributor

xibz commented May 20, 2024

@davidB Examples are nice, but I think it's really important to have the values modeled in the SDKs. I believe examples, will not be used in code generation? So it may not be as useful

@davidB
Copy link
Contributor Author

davidB commented May 20, 2024

@xibz Can you provide an example (in a programming language) about how you think those values could be modeled in the SDKs?

@xibz
Copy link
Contributor

xibz commented May 20, 2024

@davidB

type Priority string

var Low Priority = "low"
var Medium Priority = "medium"
var High Priority = "high"

Something like this would be sufficient

Then if a user wanted a custom value they could do something like

fmt.Println(cdevents.Priority("my-custom-value"))
// or use a predefined value we've specified to help consistency
fmt.Println(cdevents.Low)

@afrittoli afrittoli added this to the v0.5 milestone May 20, 2024
@afrittoli
Copy link
Contributor

@xibz We discussed this during the working group today and came to this proposal, let us know what you think:

  • for v0.4.0 SDKs, we will hardcode string as the type for fields that use the anyOf
  • in v0.5 we will change the jsonschema to what @davidB proposed:
"priority": {
      "description": "An indicator of the importance of the ticket",
      "examples": [
          "low",
          "medium",
          "high"
      ],    
      "type": "string"
    }
  • in v0.5 SDKs can expose values from the examples field using some custom code in the generator, e.g.
type Priority string

var Low Priority = "low"
var Medium Priority = "medium"
var High Priority = "high"

FYI @e-backmark-ericsson @rjalander

@xibz
Copy link
Contributor

xibz commented May 20, 2024

@afrittoli I think that makes sense except for generating based on examples. Mostly that examples does not seem like the appropriate place to generate defaults from. However, if we want to change the semantics of examples then Im fine with it.

This isn't used for validation, but may help with explaining the effect and purpose of the schema to a reader

While we could generate from there, other enums from examples may cause issues if we do not plan to have them as generated. So it feels hacky.

Im personally fine with the current definition, as it is valid. However, if there is a more correct way to write it, I'd be for it. I don't know if examples is the correct way though. With that said, though, Im not too overly worried about it and I think if it unblocks @davidB, we can move forward with it in v0.5.

@afrittoli
Copy link
Contributor

Different modelling languages might allows us to express this better, so that's something we should look into.

@afrittoli afrittoli moved this from Todo to Backlog in CDEvents Releases Oct 21, 2024
@xibz xibz moved this from Backlog to Todo in CDEvents Releases Nov 11, 2024
@xibz xibz self-assigned this Nov 11, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: Todo
Status: No status
Development

No branches or pull requests

3 participants