Pydantic config #976

benmalef · 2024-11-22T10:15:21Z

Fixes #758

Proposed Changes

This is a first implementation of parameter configuration using Pydantic.
Please only review it; it is not ready for merging yet.
It passes the tests but may cause problems, because not all configuration parameters are defined yet.

Checklist

github-actions · 2024-11-22T10:15:34Z

MLCommons CLA bot All contributors have signed the MLCommons CLA ✍️ ✅

sarthakpati · 2024-11-22T14:59:02Z

GANDLF/models/imagenet_unet.py

@@ -253,7 +253,7 @@ def __init__(self, parameters) -> None:
        )

        # all BatchNorm should be replaced with InstanceNorm for DP experiments
-        if "differential_privacy" in parameters:
+        if parameters["differential_privacy"] is not None:


Perhaps this check can be simpler, like this?

Suggested change

-        if parameters["differential_privacy"] is not None:
+        if parameters.get("differential_privacy"):

Would that work?
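
One point worth checking before adopting the suggestion: the two conditions are not strictly equivalent. The following is a minimal sketch of the difference using plain dicts; the `noise_multiplier` key is a made-up placeholder and not a confirmed GaNDLF option.

```python
def uses_dp_strict(parameters: dict) -> bool:
    """Check from the diff: the key must exist (else KeyError) and must not be None."""
    return parameters["differential_privacy"] is not None


def uses_dp_truthy(parameters: dict) -> bool:
    """Suggested check: a missing key, None, False, and empty containers all count as off."""
    return bool(parameters.get("differential_privacy"))


cases = {
    "missing key": {},
    "explicit None": {"differential_privacy": None},
    "empty dict": {"differential_privacy": {}},
    # hypothetical option name, for illustration only
    "configured": {"differential_privacy": {"noise_multiplier": 1.0}},
}

for label, params in cases.items():
    try:
        strict = uses_dp_strict(params)
    except KeyError:
        strict = "KeyError"
    print(f"{label:14} strict={strict!s:9} truthy={uses_dp_truthy(params)}")
```

The checks agree for `None` and for a populated section. They differ when the key is missing (the strict form raises) and when the section is present but empty (the `.get` form treats it as disabled), so the simpler form is only safe if an empty section should mean "no differential privacy".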

sarthakpati · 2024-11-22T15:01:18Z

GANDLF/utils/pydantic_config.py

This file badly needs documentation and comments.

sarthakpati · 2024-11-22T16:45:28Z

Also, please take a look at the warnings from Codacy to see if any can be fixed: https://app.codacy.com/gh/mlcommons/GaNDLF/pull-requests/976/issues

benmalef · 2024-11-22T19:31:17Z

@sarthakpati, what do you think about the current approach? Right now, we only use the Parameters object for parsing, and then convert it to a dictionary.

In my view, it would be better to return a Parameters object with all the necessary parameters rather than a dictionary,
but that may require a lot of changes in the source code.

sarthakpati · 2024-11-22T21:20:05Z

> better to return a Parameters object with all the necessary parameters, and not a dictionary

Can you provide a brief pro/con reasoning as to why this would be the case? Especially since changing to a different data structure would require a significant number of changes throughout the code.

benmalef · 2024-11-23T15:58:42Z

> better to return a Parameters object with all the necessary parameters, and not a dictionary
>
> Can you provide a brief pro/con reasoning as to why this would be the case? Especially since changing to a different data structure would require a significant number of changes throughout the code.

I took a look at the source code, and it would require a lot of changes, so it is not a good idea. I thought it might offer a better coding experience, because Pydantic provides some extra features.
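
To make the trade-off concrete, here is a small sketch of what keeping the object would add compared with dumping to a dict. The field names `batch_size` and `learning_rate` are illustrative assumptions, not the actual contents of the Parameters class in this PR.

```python
from pydantic import BaseModel, ConfigDict, ValidationError


class Parameters(BaseModel):
    # validate_assignment re-checks a field every time it is reassigned
    model_config = ConfigDict(validate_assignment=True)

    batch_size: int = 1           # hypothetical field
    learning_rate: float = 1e-3   # hypothetical field


params = Parameters(batch_size=4)

# Object form: typed attribute access, and invalid assignments are rejected.
try:
    params.batch_size = "not a number"
    assignment_rejected = False
except ValidationError:
    assignment_rejected = True

# Dict form: drops into existing dict-based code, but nothing is checked after the dump.
as_dict = params.model_dump()
as_dict["batch_size"] = "not a number"  # silently accepted

print(assignment_rejected, params.batch_size, as_dict["batch_size"])
```

In short, the object keeps validation alive for the whole run, while the dict keeps the rest of the code base unchanged; the dict route validates only once, at load time.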

So, as regards the workflow, is it something like this?

szmazurek · 2024-11-23T19:15:46Z

Hey there!
So, @sarthakpati asked for a few words from my side here. Generally, I think it is a super nice idea to use Pydantic here, as it can automate a lot of checks. I think the approach is OK, and I agree with your TODO comments in the Parameters class, @benmalef: some of the fields should themselves be Pydantic models that define their own structure and incorporate ALL possible options that can be specified there (provided, of course, that such a list is finite). For example, for the loss function, we can define something like this:

from typing import Literal, Optional

from pydantic import BaseModel

class LossFunction(BaseModel):
    name: Literal["ce", "cel"]  # ... and so on for all the values available in the losses dict
    # allow only the reductions our losses impl supports, and provide a default value
    reduction: Optional[Literal["mean", "sum"]] = "mean"
    # placeholder field: list every allowed value in the Literal
    some_other_param_our_losses_use: Optional[Literal["some_value_we_allow", "some_default_value"]] = "some_default_value"

class Parameters(BaseModel):
    # ... other fields elided
    loss_function: LossFunction

Not sure if this will work if the loss is only a string without a name keyword, but you get the point.
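
On the string-only case: one possible way to accept both forms is a `mode="before"` model validator that wraps a bare string into a dict before field validation runs. This is only a sketch under the assumption that Pydantic v2 is used; the `Literal` lists just the two loss names quoted above, and `accept_bare_string` is a hypothetical helper name.

```python
from typing import Any, Literal, Optional

from pydantic import BaseModel, model_validator


class LossFunction(BaseModel):
    name: Literal["ce", "cel"]  # only the names quoted above; the real list is longer
    reduction: Optional[Literal["mean", "sum"]] = "mean"

    @model_validator(mode="before")
    @classmethod
    def accept_bare_string(cls, value: Any) -> Any:
        # "ce" becomes {"name": "ce"}; dicts pass through unchanged
        if isinstance(value, str):
            return {"name": value}
        return value


class Parameters(BaseModel):
    loss_function: LossFunction


from_string = Parameters.model_validate({"loss_function": "ce"})
from_dict = Parameters.model_validate(
    {"loss_function": {"name": "ce", "reduction": "sum"}}
)

print(from_string.model_dump())
print(from_dict.model_dump())
```

With this in place, both the short string form and the full dict form end up as the same nested structure after `model_dump()`.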
With the models in that form, the workflow can run as follows:

1. Open the YAML file and load it as a dict.
2. Pass the loaded dict to the Pydantic Parameters object.
3. Parameters automatically validates that the values passed are correct and inserts defaults.
4. params = pydantic_params_object.model_dump()

This way we have the parameters in dict form, validated and ready to use. Additionally, such a model hierarchy will naturally become a kind of documentation, where you can see all available params and values.

What do you guys think? If I was unclear on something, ping me and we can discuss it further. Cheers!
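
The four steps above can be sketched end to end as follows. This is a minimal illustration, not the code in this PR: the fields `batch_size` and `optimizer`, the helper `load_parameters`, and the inline YAML text are all assumptions, and PyYAML is assumed to be the YAML loader.

```python
from typing import Literal

import yaml  # PyYAML, assumed to be the YAML loader in use
from pydantic import BaseModel, ValidationError


class Parameters(BaseModel):
    batch_size: int = 1                          # hypothetical field
    optimizer: Literal["adam", "sgd"] = "adam"   # hypothetical field


CONFIG_TEXT = """
batch_size: 8
"""


def load_parameters(text: str) -> dict:
    raw = yaml.safe_load(text) or {}            # step 1: YAML -> dict
    validated = Parameters.model_validate(raw)  # steps 2-3: validate, insert defaults
    return validated.model_dump()               # step 4: back to a plain dict


params = load_parameters(CONFIG_TEXT)
print(params)

# A value outside the allowed set is rejected at load time.
try:
    load_parameters("optimizer: not_an_optimizer")
    rejected = False
except ValidationError as err:
    rejected = True
    print(err.error_count(), "validation error(s)")
```

Downstream code keeps receiving a plain dict, so nothing else has to change, while typos and unsupported values fail early with a readable error.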

sarthakpati · 2024-11-25T20:28:15Z

Thanks for the detailed explanation, @szmazurek! I updated your answer with some edits to make it clearer for myself. BTW, since you are working on the Lightning integration, this might be relevant, since there could be some overlap between that and this PR. Also, we should create 2 separate tags for them so that we have distinct points of reference for the users.

@benmalef: any thoughts on what Szymon has put forth?

benmalef · 2024-11-26T10:17:43Z

> Thanks for the detailed explanation, @szmazurek! I updated your answer with some edits to make it clearer for myself. BTW, since you are working on the Lightning integration, this might be relevant, since there could be some overlap between that and this PR. Also, we should create 2 separate tags for them so that we have distinct points of reference for the users.
>
> @benmalef: any thoughts on what Szymon has put forth?

Thanks a lot for the explanation; I agree with it.

My only concern is this:
the _parserConfig file loads the YAML file and also contains some validations and some code to adjust or initialize the parameters. So it might be better to pass the loaded dict to Parameters after that step. Here is my proposed workflow:

1. Open the YAML file and load it as a dict.
2. Use _parserConfig to manipulate the dict (initialize parameters as needed).
3. Pass the loaded dict to the Pydantic Parameters object.
4. Parameters automatically validates that the values passed are correct and inserts defaults. This is the second round of initialization.
5. params = pydantic_params_object.model_dump()

Any thoughts? @szmazurek @sarthakpati

sarthakpati · 2024-11-26T20:27:32Z

@benmalef: there seem to be 2 rounds of initialization in your proposed workflow (points 2 and 4). I think this is going to lead to confusion for both end users and developers, and we need to figure out a way to have all parameter initialization happen in a single place.
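
One possible way to get a single place of initialization is to move the normalization and derived defaults that a separate parser step would perform into validators on the model itself. The sketch below assumes Pydantic v2; the fields (`patch_size`, `batch_size`, `val_batch_size`) and the rules applied to them are illustrative assumptions, not a description of what _parserConfig actually does.

```python
from typing import List, Optional

from pydantic import BaseModel, field_validator, model_validator


class Parameters(BaseModel):
    patch_size: List[int]                  # hypothetical field
    batch_size: int = 1                    # plain default
    val_batch_size: Optional[int] = None   # hypothetical derived default

    @field_validator("patch_size", mode="before")
    @classmethod
    def expand_scalar_patch_size(cls, value):
        # Normalization that a separate parser step might otherwise perform:
        # a single int is expanded to one entry per dimension (3 assumed here).
        if isinstance(value, int):
            return [value, value, value]
        return value

    @model_validator(mode="after")
    def derive_val_batch_size(self):
        # A default that depends on another field lives next to the fields it uses.
        if self.val_batch_size is None:
            self.val_batch_size = self.batch_size
        return self


params = Parameters.model_validate({"patch_size": 64, "batch_size": 4}).model_dump()
print(params)
```

With this layout, the workflow shrinks to loading the YAML, validating once, and dumping, and every default or adjustment can be found by reading the model definitions.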

benmalef and others added 3 commits November 22, 2024 11:30

Add pyndantic config (#29)

8c4c6dd

Add pyndantic config (#30)

07d0a92

made some code changes

b949c41

sarthakpati reviewed Nov 22, 2024

benmalef added 3 commits November 22, 2024 19:22

fix codacy issues

fecfda8

fix codacy issues

deb2d19

fix codacy issues

2c28d45
