Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove JSON customization for anyOf elements #264

Draft
wants to merge 12 commits into
base: rf-unionany
Choose a base branch
from

Conversation

candleindark
Copy link
Member

@candleindark candleindark commented Nov 5, 2024

This PR removes JSON customization for anyOf elements. The changes in this PR will allow the tests to pass for #257.

I think the JSON customization for the anyOf elements were put in for the specific needs of the web UI. The web UI may no longer need the customization at this point. (@satra Is the customization still needed).

Regardless of the need of the web UI, this customization generates schema element such as:

        "genotype": {
          "anyOf": [
            {
              "$ref": "#/$defs/GenotypeInfo"
            },
            {
              "type": "string"
            }
          ],
          "default": null,
          "description": "Genotype descriptor of participant or subject if available",
          "nskey": "dandi",
          "title": "Genotype",
          "type": "object"
        }

which is inconsistent since a string is not an object in JSON.

If such a customization is still needed, I think we can change the existing customization from

            if anyOf is not None:
                if len(anyOf) > 1 and any(["$ref" in val for val in anyOf]):
                    value["type"] = "object"

to

            if anyOf is not None:
                if len(anyOf) > 1 and all(["$ref" in val for val in anyOf]):
                    value["type"] = "object"

Note:
This PR contains the merge with the master branch so that the support of Python 3.8 is dropped.

pre-commit-ci bot and others added 11 commits October 8, 2024 00:24
updates:
- [github.com/pre-commit/pre-commit-hooks: v4.6.0 → v5.0.0](pre-commit/pre-commit-hooks@v4.6.0...v5.0.0)
- [github.com/psf/black: 24.8.0 → 24.10.0](psf/black@24.8.0...24.10.0)
before we went for -12 since needed 3.8. Let's see if would work for 3.9
Drop Python 3.8 support (remove typing_extensions from depends)
We already use DOI: prefix in few places seems, to chose it over direct
URL.
Add support for detection and addition of ome/ngff "standard" into assets summary
This eliminate error in validating an `anyOf` element containing
a `string` element for example
Copy link

codecov bot commented Nov 5, 2024

Codecov Report

Attention: Patch coverage is 90.90909% with 1 line in your changes missing coverage. Please review.

Please upload report for BASE (rf-unionany@b2205ac). Learn more about missing BASE report.

Files with missing lines Patch % Lines
dandischema/metadata.py 85.71% 1 Missing ⚠️
Additional details and impacted files
@@              Coverage Diff               @@
##             rf-unionany     #264   +/-   ##
==============================================
  Coverage               ?   91.78%           
==============================================
  Files                  ?       16           
  Lines                  ?     1740           
  Branches               ?        0           
==============================================
  Hits                   ?     1597           
  Misses                 ?      143           
  Partials               ?        0           
Flag Coverage Δ
unittests 91.78% <90.90%> (?)

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@satra
Copy link
Member

satra commented Nov 5, 2024

@candleindark - check with the meditor. that customization was for use with the vue schema editor. i would suggest checking against: https://koumoul-dev.github.io/vuetify-jsonschema-form/latest/ (you can paste the schema into their codepen examples and see if those sections where info is customized gives those options.

@candleindark
Copy link
Member Author

@candleindark - check with the meditor. that customization was for use with the vue schema editor. i would suggest checking against: https://koumoul-dev.github.io/vuetify-jsonschema-form/latest/ (you can paste the schema into their codepen examples and see if those sections where info is customized gives those options.

Thanks for the info. I assume the codepen you are referring to is the one at https://koumoul-dev.github.io/vuetify-jsonschema-form/latest/editor. I tried it with DANDI's released schemas at https://github.com/dandi/schema/tree/master/releases/0.6.8, but it didn't accept any of the schema without modifications. Even with modifications to eliminate some of the errors, I was not able get all the fields displayed in a form. The image below depicts the codepen with a modified version of the PublishedDandiset schema.

Screenshot 2024-11-06 at 10 02 19 AM

Do the aforementioned schemas at https://github.com/dandi/schema/tree/master/releases/0.6.8 undergo further transformation before they are used for defining the web UI?

@satra
Copy link
Member

satra commented Nov 6, 2024

the version of the ui that's used in the meditor is fixed, so upstream may have diverged.

@mvandenburgh - two questions for you:

  1. can the meditor be used in standalone mode to test schemas?
  2. do you know if updating the version results in any issues for the meditor?

having a standalone testable component would be good for @candleindark - could you please advise him on how to do so?

@yarikoptic
Copy link
Member

those customizations were added in a bulk commit 771e390 without clear reasoning. Thus it is unclear indeed how meditor UI would use it (if at all) e.g. if it is the case of anyOf between object and string, and what did it do before when there is an array with objects -- I would assume it would have validated every object individually and thus annotating type at the level higher containing that thing should not help anyhow.

@dandi/dandiarchive

  • how is metadata validation is happening in the meditor?
    • in particular whenever it is "type": "object"
  • how meditor UI elements created
    • if it has "type": "object" vs when it doesn't?
FTR: if we remove this customization - effects
diff -Naur ../schema/releases/0.6.8/asset.json /tmp/schema-remove-anyOf-1/0.6.8/asset.json
--- ../schema/releases/0.6.8/asset.json	2024-11-06 14:35:48.925363658 -0500
+++ /tmp/schema-remove-anyOf-1/0.6.8/asset.json	2024-11-06 14:38:32.554578757 -0500
@@ -1375,8 +1375,7 @@
           "default": null,
           "description": "A commonly used identifier forthe characteristic represented by the property. For example, a known prefix like DOI or a full URL.",
           "nskey": "schema",
-          "title": "Property ID",
-          "type": "object"
+          "title": "Property ID"
         }
       },
       "required": [
diff -Naur ../schema/releases/0.6.8/published-asset.json /tmp/schema-remove-anyOf-1/0.6.8/published-asset.json
--- ../schema/releases/0.6.8/published-asset.json	2024-11-06 14:35:48.925363658 -0500
+++ /tmp/schema-remove-anyOf-1/0.6.8/published-asset.json	2024-11-06 14:38:32.586578985 -0500
@@ -1375,8 +1375,7 @@
           "default": null,
           "description": "A commonly used identifier forthe characteristic represented by the property. For example, a known prefix like DOI or a full URL.",
           "nskey": "schema",
-          "title": "Property ID",
-          "type": "object"
+          "title": "Property ID"
         }
       },
       "required": [
@@ -2054,8 +2053,7 @@
       "description": "The URL should contain the provenance of the publishing process.",
       "nskey": "dandi",
       "readOnly": true,
-      "title": "Published by",
-      "type": "object"
+      "title": "Published by"
     },
     "datePublished": {
       "format": "date-time",
diff -Naur ../schema/releases/0.6.8/published-dandiset.json /tmp/schema-remove-anyOf-1/0.6.8/published-dandiset.json
--- ../schema/releases/0.6.8/published-dandiset.json	2024-11-06 14:35:48.925363658 -0500
+++ /tmp/schema-remove-anyOf-1/0.6.8/published-dandiset.json	2024-11-06 14:38:32.522578528 -0500
@@ -1318,8 +1318,7 @@
       "description": "The URL should contain the provenance of the publishing process.",
       "nskey": "dandi",
       "readOnly": true,
-      "title": "Published by",
-      "type": "object"
+      "title": "Published by"
     },
     "datePublished": {
       "format": "date-time",

We decided to explore the validity overall of specifying "type": "object" at the level above. I think it is invalid -- then jsonschema validator seems to validate value against any of the anyOf and if passes individual type validation, jumps to validate for the property where "type": "object" is defined. And e.g. in this

extracted minimal propertyId definition + support of a list of them
{
  "$defs": {
      "IdentifierType": {
          "description": "An enumeration of identifiers",
          "enum": [
            "dandi:doi",
            "dandi:orcid",
            "dandi:ror",
            "dandi:dandi",
            "dandi:rrid"
          ],
          "title": "IdentifierType",
          "type": "string"
      }
  },
  "properties": {
      "propertyID": {
          "anyOf": [
            {
              "$ref": "#/$defs/IdentifierType"
            },
            {
              "format": "uri",
              "minLength": 1,
              "type": "string"
            }
          ],
          "default": null,
          "description": "A commonly used identifier forthe characteristic represented by the property. For example, a known prefix like DOI or a full URL.",
          "nskey": "schema",
          "title": "Property ID",
          "type": "object"
        },
     "propertyIDs": {
        "items": {
            "$ref": "#/properties/propertyID"
          },
        "type": "array"
    }
    }
}

validation of this

{
    "propertyIDs": [
        "dandi:doi",
        "http://example.com",
        "illegalstringvalue"
    ]
}

leads to following errors:

❯ check-jsonschema --verbose --schemafile ../trash/conflicting-schema.json ../trash/conflicting-schema-examples.json
Schema validation errors were encountered.
  ../trash/conflicting-schema-examples.json::$.propertyIDs[0]: 'dandi:doi' is not of type 'object'
  ../trash/conflicting-schema-examples.json::$.propertyIDs[1]: 'http://example.com' is not of type 'object'
  ../trash/conflicting-schema-examples.json::$.propertyIDs[2]: 'illegalstringvalue' is not valid under any of the given schemas
  Underlying errors caused this.

  Best Match:
    $.propertyIDs[2]: 'illegalstringvalue' is not one of ['dandi:doi', 'dandi:orcid', 'dandi:ror', 'dandi:dandi', 'dandi:rrid']
  All Errors:
    $.propertyIDs[2]: 'illegalstringvalue' is not one of ['dandi:doi', 'dandi:orcid', 'dandi:ror', 'dandi:dandi', 'dandi:rrid']
    $.propertyIDs[2]: 'illegalstringvalue' is not a 'uri'
  ../trash/conflicting-schema-examples.json::$.propertyIDs[2]: 'illegalstringvalue' is not of type 'object'

thus failing all 3 cases, even when valid. Our luck is that nobody yet used any of those seems to me.

If we change schema and remove "type": "object" as this PR suggests (dirty, need to be cleanedup) then we get following output

❯ check-jsonschema --verbose --schemafile ../trash/conflicting-schema.json ../trash/conflicting-schema-examples.json
Schema validation errors were encountered.
  ../trash/conflicting-schema-examples.json::$.propertyIDs[2]: 'illegalstringvalue' is not valid under any of the given schemas
  Underlying errors caused this.

  Best Match:
    $.propertyIDs[2]: 'illegalstringvalue' is not one of ['dandi:doi', 'dandi:orcid', 'dandi:ror', 'dandi:dandi', 'dandi:rrid']
  All Errors:
    $.propertyIDs[2]: 'illegalstringvalue' is not one of ['dandi:doi', 'dandi:orcid', 'dandi:ror', 'dandi:dandi', 'dandi:rrid']
    $.propertyIDs[2]: 'illegalstringvalue' is not a 'uri'

so, as desired, failure only for the "illegalstringvalue".

Moreover, note also that underlying type was not even object -- the current code of

            if anyOf is not None:
                if len(anyOf) > 1 and all(["$ref" in val for val in anyOf]):
                    value["type"] = "object"

assumes that everything $refed would be object, but it is not really -- it has its own type, which is e.g. a "str" in case of our IdentifierType json serialization (enum in Python).

So I think this PR is on the correct path overall if we care about making jsonschema validation work. But will wait for @mvandenburgh and others to chime in.

Meanwhile @candleindark please extend the test with example (ideally adjusting already existing one) which would lead to jsonschema validation error, unless you remove that "type": "object" assignment.

@satra
Copy link
Member

satra commented Nov 6, 2024

@yarikoptic - this is very specific to the vuejs widget. the version we are using needed those changes to display them. there is no specific rationale except trial and error that led to that solution. (there was an issue that should have been largely handled with ajv integration - koumoul-dev/vuetify-jsonschema-form#278). however, i have vague memory that taking out that transform on the json schema resulted in the meditor breaking. technically, we shouldn't have to modify any part of the json schema from that generated by pydantic, but needed to for vjsf to work. a lot of it is historic, and we don't have an easy mechanism to check impact on meditor.

@candleindark
Copy link
Member Author

@satra Since this modification is specifically for implementing the web UI and can deviate the behavior of the JSON schema from the behavior of the corresponding Pydantic model, wouldn't it be better to remove such modifications from the JSON schema generation by the Pydantic model altogether? That is we generate a JSON schema for a Pydantic model without any special modification for the purpose of implementing the web UI, but when implementing the web UI, we take this JSON schema and modify it to meet the requirements of the web UI. Doing this will allow use to separate the needs of web UI implementation from the model. (Because this and other special modifications for implementing the web UI, the release schemas, https://github.com/dandi/schema/tree/master/releases/0.6.8, do not behave exactly as the corresponding Pydantic models that generated them.)

@satra
Copy link
Member

satra commented Nov 7, 2024

@candleindark - in general i agree. however, i think the meditor just pulls in the schema from the releases. so i would say something would need to still generate this modified schema. i'm fine with it being two stage process. so if you want to turn that into a utility function for now that would be fine.

@candleindark
Copy link
Member Author

@candleindark - in general i agree. however, i think the meditor just pulls in the schema from the releases. so i would say something would need to still generate this modified schema. i'm fine with it being two stage process. so if you want to turn that into a utility function for now that would be fine.

It seems to me that the UI related modifications should be done outside of the dandi-schema project, and the dandi-schema project should always publish the JSON schemas as close to the originating Pydantic model as possible at https://github.com/dandi/schema/tree/master/releases. Can we have the meditor or some other part of dandi-archive modify the published schemas at https://github.com/dandi/schema/tree/master/releases for UI purposes?

@satra
Copy link
Member

satra commented Nov 8, 2024

as i said, ideally yes done elsewhere, but even more ideally no changes. we should get @mvandenburgh input here though.

@mvandenburgh
Copy link
Member

I'm not quite sure I understand the motivation behind this PR yet, I need to read through some of the history here and maybe ask some follow up questions before I can provide any input. But just to unblock things, to answer your first two questions @satra -

  1. can the meditor be used in standalone mode to test schemas?

You can test VJSF's compatibility with the schema with this codepen. That is the closest thing to a standalone component we currently have. It's not perfect, since we do some customizations to VJSF in the dandi web app, but it is a decent way to test compatibility and it's caught schema-related issues for us before.

  1. do you know if updating the version results in any issues for the meditor?

What version are you referring to here? VJSF? If so - I'm not sure, but I can make a PR to bump it to the latest and see if it causes any issues in the deploy preview. We also have some E2E browser tests for the meditor.

One thing I will mention that we need to be careful of is that, as far as I know, everything discussed in this issue is still true dandi/dandi-archive#1975. So, we should make sure to coordinate between the dandi-archive and dandi-schema/dandi-cli before doing any releases of dandi-schema with these changes when we get to that point.

@yarikoptic
Copy link
Member

@mvandenburgh if would be of help, we would be happy to jump on a zoom call some time together to discuss it interactively. In a nutshell, my summary is: as I have demonstrated current jsonschema can never validate some items with "type": "object" which have anyOf some $ref pointing to non-"object" types. @satra says it was done for VJSF.

@mvandenburgh if you could point us to place or codediff to try a specific different version of dandischema (e.g. if we patch and make available from some temp URL) as in our web UI, then we could with @candleindark try it out with current and then patched one to see how UI is effected if at all.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants