Pydantic model for deployment config #211

Draft · wants to merge 2 commits into main
Conversation

VeckoTheGecko
Contributor

Adding a Pydantic model for ingesting and validating the deployment config.

This is a first draft so that maintainers can provide feedback and insight into how the fields are used. We can use Pydantic to provide custom validation.

Feel free to push to this branch; you have a better idea of what the configs should be. I highly recommend that we use the objects provided in the YAML specification as much as possible (e.g., for dates we can provide date objects rather than strings; see "Example 2.22 Timestamps" of the specification) to streamline ingestion.
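As a small illustration (not code from this PR; the key names are invented), PyYAML's safe loader already returns `datetime` objects for spec-compliant timestamps:

```python
# Sketch only: key names are hypothetical, not from pyglider's config.
import yaml

doc = """
deployment_start: 2024-12-01            # YAML timestamp -> datetime.date
deployment_end: 2024-12-20T14:30:00Z    # YAML timestamp -> datetime.datetime
"""

config = yaml.safe_load(doc)
print(type(config["deployment_start"]))  # <class 'datetime.date'>
print(type(config["deployment_end"]))    # <class 'datetime.datetime'>
```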

@callumrollo @jklymak

@VeckoTheGecko
Contributor Author

Pydantic has a rich system for validation, so I recommend we work within it: https://docs.pydantic.dev/latest/concepts/validators/#before-after-wrap-and-plain-validators.
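For example, a minimal sketch of a "before" validator in Pydantic v2, following the linked docs (the model and field names here are hypothetical, not from this PR):

```python
from pydantic import BaseModel, field_validator

class DeploymentConfig(BaseModel):
    glider_serial: str  # hypothetical field

    @field_validator("glider_serial", mode="before")
    @classmethod
    def coerce_serial(cls, v):
        # Runs before standard validation: accept a bare integer from the
        # YAML (e.g. `glider_serial: 70`) and normalise it to a string.
        return str(v)
```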

@jklymak
Member

jklymak commented Dec 12, 2024

So... I don't think this is in the spirit of how pyglider is meant to work. We are not in the business of enforcing the metadata that gets put into netcdf files. Indeed, we want to be as flexible as possible to allow users to write metadata to suit whatever purpose they have.

Perhaps I'm misunderstanding, but this PR seems to be enforcing the metadata standard we currently have an example of. I think pyglider can and should provide examples of metadata-compliant YAML files, but I don't think we want to be another place where compliance is checked, or to try to keep that compliance up to date with changes to other compliance checkers.

@VeckoTheGecko
Contributor Author

VeckoTheGecko commented Dec 13, 2024

I should have explained better in the original description. It was close to the end of the day when I made this, so it was a bit rushed :).

Most of the models in this PR were populated by an LLM based on the example config; they are only meant as a starting point for discussion. I thought it would be good to open this PR for the tests, the config loading, and as a place to discuss the models.

We are not in the business of enforcing the metadata that gets put into netcdf files. ... Indeed, we want to be as flexible as possible to allow users to write metadata to suit whatever purpose they have.

I 100% agree.

The motivation for this PR isn't to validate the metadata, but rather to document and act as a single source of truth for defining the deployment config schema (and data types), so that users and developers can look at one file and see the following (a sketch illustrating these follows the list):

  1. which parts of the schema are free form key: value pairs
  2. which parts have optional keys
  3. which parts have expected keys
  4. which parts have expected keys and types for those values, and
  5. which parts have expected keys, but also values that have to meet certain criteria (e.g., being an integer)
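To make (1)–(5) concrete, here is a hypothetical sketch; all field names are invented for illustration, not taken from the PR:

```python
from typing import Optional
from pydantic import BaseModel, Field

class Metadata(BaseModel, extra="allow"):
    # (1) free-form key: value pairs are accepted via extra="allow"
    platform_id: str                 # (3)/(4) expected key with an expected type
    comment: Optional[str] = None    # (2) optional key
    num_profiles: int = Field(ge=0)  # (5) value must meet extra criteria
```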

By having a single source of truth, we:

  • Separate the loading and type casting from the use of the data.
  • Make the schema explicit rather than implicit, which prevents breaking changes (e.g., a variable deep in the codebase suddenly becoming required because it is referenced directly).

The aim is to give users more intuitive error messages (e.g., "Your YAML file is incorrect. Missing field ...") rather than a key error, value error, or type error at runtime. If, for example in the case of (5), a date needed to be read in for data processing purposes and is expected in the form "dd/mm/yyyy", it would be good to error out at the point of config loading instead of at runtime if that date is improperly provided (e.g., as yyyy-mm-dd or "yyyy-mm-dd").
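As a sketch of the difference (the field name is hypothetical, and the exact error text is whatever Pydantic produces):

```python
import datetime
from pydantic import BaseModel, ValidationError

class Deployment(BaseModel):
    deployment_start: datetime.date  # hypothetical field

try:
    Deployment(deployment_start="01/12/2024")  # not ISO 8601
except ValidationError as err:
    # Fails at load time with a readable message naming the offending
    # field, instead of a KeyError/TypeError later in processing.
    print(err)
```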

I'm not familiar with this project, so I need help defining the models/schema.

Hopefully that makes sense. I don't think we'd need to use the full validation capabilities of Pydantic (i.e., (5)), since I expect the only data used in actual data processing would be calibration numbers, with the rest mainly metadata, but I do think it's handy for clearly defining the schema.
