-
Notifications
You must be signed in to change notification settings - Fork 29
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Pydantic model for deployment config #211
base: main
Are you sure you want to change the base?
Conversation
Pydantic has a rich system for validation, so I recommend we work inside that https://docs.pydantic.dev/latest/concepts/validators/#before-after-wrap-and-plain-validators. |
So... I don't think this is in the spirit of how pyglider is meant to work. We are not in the business of enforcing the metadata that gets put into to netcdf files. Indeed we want to be as flexible as possible to allow users to write metadata to suit whatever purpose they have. Perhaps I'm misunderstanding, but this pr seems to be enforcing the metadata standard we currently have an example of. I think glider can and should provide examples of metadata compliant yaml files, I don't think we want to be another place where compliance is checked and try to keep that compliance up to date with changes to other compliance checkers. |
I should have explained better in the original description. It was close to end of day when I made this so it was a bit rushed :) . Most of the models in this PR were populated by an LLM based on the example config. This is only to act as a starting point for discussion. I thought it would be good to put make this PR for the tests, config loading, and to act as a place to discuss the models.
I 100% agree. The motivation of this PR isn't to validate the metadata, but more-so to document and act as a single source of truth for defining the deployment config schema (and data types). So that users and developers can look at one file and see:
By having a single source of truth it:
This is to provide more intuitive error messages for users (e.g., getting a "Your YAML file is incorrect. Missing field ...") rather than getting a key error, value error, or type error at runtime. If (for example in the case of (5)), a date needed to be read in for data processing purposes, and that date is expected to be of form I'm not familiar with this project, so I need help to define the models/schema. Hopefully that makes sense. I don't think we'd need to use the full validation capabilities of pydantic (i.e., (5) - since I expect the only data used in actual data processing would be calibration numbers, with the rest mainly for metadata) but I do think its handy for clearly defining the schema. |
c6705bc
to
eaefbdf
Compare
Adding Pydantic model for ingesting and validating the deployment config.
This is a first draft so that maintainers can provide feedback and insight into how the fields are used. We can use Pydantic to provide custom validation.
Feel free to push to this branch. You have a better idea on what the configs should be. I highly recommend that we use the objects provided in the YAML specification as much as possible (e.g., for dates we can provide date objects rather than strings. See "Example 2.22 Timestamps" of the specification) to streamline ingestion.
@callumrollo @jklymak