Adds support for static models #373

Merged
merged 49 commits into from
Dec 13, 2024

Conversation

dalonsoa
Collaborator

@dalonsoa dalonsoa commented Jan 23, 2024

Description

This PR sets up the models so that they can be frozen or static, i.e. they are set up once and then used unchanged, without updating, for the rest of the simulation. This is controlled by a static flag included in the configuration.

There are two options for static models:

  • A frozen model: it is updated once in the first iteration and those values are then reused for every future update. In this case, NONE of the variables set up during the update can be provided.
  • A pre-set model: it pulls data from a dataset that contains the relevant variables for the whole time range of the simulation. In this case, ALL of the data variables that the model is supposed to update - or provide during the setup process - need to be provided.

Documentation about using the flag has been added to the config.md file, but I'm not sure if there's a better place.

Most modifications to the tests consist of bypassing the static checks because, as the tests were designed, they were all failing. In all cases, only some of the variables required for init or for update were present in the data object, typically provided as a fixture, which made the checks that decide whether a model is static fail. The options are to leave the patches as they are, or to add all the missing variables. However, that might be a lot of work and I'm not entirely sure it is worth the effort.

Fixes # N/A

Type of change

  • New feature (non-breaking change which adds functionality)
  • Optimization (back-end change that speeds up the code)
  • Bug fix (non-breaking change which fixes an issue)

Key checklist

  • Make sure you've run the pre-commit checks: $ pre-commit run -a
  • All tests pass: $ poetry run pytest

Further checks

  • Code is commented, particularly in hard-to-understand areas
  • Tests added that prove fix is effective or that feature works

@dalonsoa dalonsoa marked this pull request as draft January 23, 2024 17:09
Collaborator

@jacobcook1995 jacobcook1995 left a comment

I got slightly lost in one bit of the pseudocode, but the general approach seems sensible and flexible.

"""
if self._static_data is None:
    self._update(time_index, **kwargs)
    self._build_static_data(time_index)
Collaborator

Not sure that I follow this bit, does the update happen and then new static data gets built for the next time step?

Collaborator Author

Exactly. This is run only once, the first time that we are required to run the update. Thereafter, the updated values are used for updating the data object again and again. We need to run update at least once in this case since, otherwise, we do not know what to update the data with.

If pre-set data is provided as one or more external files, then the static_data object is not None and this step is not needed.

Keeping aside changes in initialisation/validation, the only change to existing models would be to replace update by _update, since they are overriding only one of the two options for updating the model. The _update_static option is not model specific, and is therefore coded directly in the base model.
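
As an illustration of the dispatch described above, here is a minimal sketch, not the code in this PR. Only `_update`, `_update_static`, `_build_static_data` and `_static_data` are names taken from this thread. The plain dict standing in for the data object, the `static` constructor argument, the `vars_updated` tuple and the `CounterModel` subclass are all assumptions made for the example.

```python
from __future__ import annotations

from abc import ABC, abstractmethod
from typing import Any


class BaseModel(ABC):
    """Toy base model; a plain dict stands in for the real data object."""

    vars_updated: tuple[str, ...] = ()

    def __init__(self, data: dict[str, Any], static: bool = False) -> None:
        self.data = data
        self._static = static
        self._static_data: dict[str, Any] | None = None

    def update(self, time_index: int, **kwargs: Any) -> None:
        """Public entry point: picks the normal or the static update."""
        if self._static:
            self._update_static(time_index, **kwargs)
        else:
            self._update(time_index, **kwargs)

    @abstractmethod
    def _update(self, time_index: int, **kwargs: Any) -> None:
        """Model-specific update, implemented by each subclass."""

    def _update_static(self, time_index: int, **kwargs: Any) -> None:
        """Generic static update, shared by every model."""
        if self._static_data is None:
            # First call only: run the real update, then snapshot its outputs.
            self._update(time_index, **kwargs)
            self._build_static_data(time_index)
        # Every call: write the stored values back into the data object.
        self.data.update(self._static_data)

    def _build_static_data(self, time_index: int) -> None:
        self._static_data = {name: self.data[name] for name in self.vars_updated}


class CounterModel(BaseModel):
    """Hypothetical model that increments a counter on every real update."""

    vars_updated = ("count",)

    def _update(self, time_index: int, **kwargs: Any) -> None:
        self.data["count"] = self.data.get("count", 0) + 1


frozen = CounterModel(data={}, static=True)
for step in range(3):
    frozen.update(step)
print(frozen.data["count"])  # 1: the real update ran only on the first step
```

With `static=False` the same loop would leave `count` at 3, since `_update` runs on every step.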

Collaborator

Ahh right that makes sense now, cheers!

Collaborator

@vgro vgro left a comment

I am not so familiar with self._, but the approach looks generally good to me.

@davidorme
Collaborator

This makes a lot of sense to me - the flow of the logic for the "update once and then repeat" is very clean.

I wonder if the static_data could be use_static_data and the model class itself is then responsible for locating the update_vars in Data and checking all is well. That static data assembly could be outside of the BaseModel but it seems like a more natural home? One thing this should support is the data providing only a single time step, which then just gets re-used. Users could always just repeat the same values across all time steps, but it would be good to avoid that! It was behind my thinking on the original constant option in the Data configuration, which might be worth reviving - if a grid is loaded as a constant, use that as a constant, but it has to be explicitly flagged as constant.

There are no end of repeating patterns and sequences we might want to push in (La Niña, El Niño), but that seems like an evolution 😄

@dalonsoa
Collaborator Author

> I wonder if the static_data could be use_static_data and the model class itself is then responsible for locating the update_vars in Data and checking all is well. That static data assembly could be outside of the BaseModel but it seems like a more natural home?

I'm not entirely sure what you mean here. Do you mean having a method called use_static_data that pulls the static data itself from somewhere else - outside of the model class - and puts it in the relevant place in the Data object? Isn't that the intention of the _update_with_static_data method I sketched? Maybe I'm misunderstanding something.

> One thing this should support is the data providing only a single time step, which then just gets re-used. Users could always just repeat the same values across all time steps, but it would be good to avoid that! It was behind my thinking on the original constant option in the Data configuration, which might be worth reviving - if a grid is loaded as a constant, use that as a constant, but it has to be explicitly flagged as constant.

I guess that the model can hold a single copy of that single step data, but then in the Data object it will need to update timestep after timestep to create the time series (when relevant), so other models can use it. I think that models that use that data should not need to worry about it being static or not, but just take the appropriate time from the Data object and go on with it, right?

@davidorme
Collaborator

davidorme commented Jan 31, 2024

> I'm not entirely sure what you mean here.

Yeah - wasn't a model of clarity was it. So:

  • If users are providing frozen data (rather than looping the first update) then the most obvious current way to do that is for them to add those variables in the configuration. Those variable names normally wouldn't be included in the data configuration because they are calculated internally, but it is a simple existing route to getting the frozen data into the simulation.
  • If the Data instance then contains the variables, we don't need to pass in an xr.Dataset to the frozen model. We can replace:
    static_data: Optional[xr.Dataset] = None,
    with
    use_static_data: bool = False,
    and then the _build_static_data method is expecting to be able to find the right variables in self.Data. use_static_data is then simply an explicit switch between using the first update versus looking for all the required frozen data in the Data object.
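
The switch described above could look roughly like the following sketch. `use_static_data` and `_build_static_data` come from this thread; writing it as a free function, using a dict for the Data instance, and raising `ValueError` when variables are missing are assumptions made for the example.

```python
from __future__ import annotations

from typing import Any


def build_static_data(
    data: dict[str, Any],
    vars_updated: tuple[str, ...],
    use_static_data: bool = False,
) -> dict[str, Any] | None:
    """Return the frozen values found in data, or None if they must be computed.

    With use_static_data=False the caller is expected to run the first update
    and snapshot its outputs. With use_static_data=True every variable in
    vars_updated has to be in data already.
    """
    if not use_static_data:
        return None
    missing = [name for name in vars_updated if name not in data]
    if missing:
        raise ValueError(f"Static data requested but not found in data: {missing}")
    return {name: data[name] for name in vars_updated}


# Hypothetical variable names, for illustration only.
data = {"var_a": 1.0, "var_b": 2.0, "unrelated": 9.9}
snapshot = build_static_data(data, ("var_a", "var_b"), use_static_data=True)
```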

@davidorme
Collaborator

> I guess that the model can hold a single copy of that single step data, but then in the Data object it will need to update timestep after timestep to create the time series (when relevant), so other models can use it. I think that models that use that data should not need to worry about it being static or not, but just take the appropriate time from the Data object and go on with it, right?

Yup - agreed.

@dalonsoa
Collaborator Author

Next week, this one. If you have any update or comment on what was said above, please let me know.

Collaborator

@TaranRallings TaranRallings left a comment

Makes sense to me.

@dalonsoa
Collaborator Author

I've been trying to progress on this before going on leave but I've hit a blocker - precisely because of the brand new variables system.

For the case of the frozen model - updates once and then repeats for future iterations - things are clear as the inputs are the same as a normal model. I have not implemented it but I do not foresee any problem there: it will make a local copy of the relevant data (the vars_updated) after that first iteration and just keep using that down the line.

Now, for the pre-set option things have gotten more complicated after implementing the variables system. As discussed above, that preset data will need to be passed as part of the data object, but that would mean that the model has different vars required for update when it is static or when it is not: in the first case, the pre-set data will need to be present, but in the normal case it will not, in general (or at least, not necessarily). As these variables are set before the model instances are initialised, when collecting the models, there is no way to modify them at runtime based on the model input settings (the static flag).

There are a couple of options that come to my mind, none of them satisfactory:

  • Create static model subclasses with the right variables, eg PlantStaticModel, but that defeats the purpose of having these simple flags to switch between model types.
  • Postpone the validation step to after the models have been initialised, so the flags can be used to define one or other set of variables depending on the model being preset or not, but that means risking initialising models that are incompatible, in addition to not allowing the variable system to define the model execution order (future work).
  • Pass the pre-set data directly to the model, without using the data object. It will be a question of updating the from_config methods of the different models to load that data if the appropriate flags and paths to the relevant file are provided in the config. This will create a local static_data: Data object that can be used to update the main data object each iteration. This might be the least problematic option if we use the same tooling of the data system to load and validate the preset data.

It will be good to have your feedback on this, so I can work on it as soon as I'm back from leave.

@davidorme
Collaborator

Yeah - that's a pain. First thoughts:

  • I don't like the static subclasses or postponed validation options - admittedly a knee jerk response though.
  • The last option feels cleaner.

Having said that, I haven't got my head around how we work with avoiding running code in static mode models.

  • If we insist that a static model can either run in frozen mode or in preset, but you have to preset all variables required by the model, then I can see a route for the static data object for preset variables. If we have a static_data attribute for BaseModel, then the super.__init__ and super.update methods could include an automatic check if all the required variables for that method are present in static_data and then simply copy them across to data and set a flag that allows the subclass to shortcut the actual method code. Something like:
def __init__(self, ...):

    super().__init__(data=data, static_data=static_data, core_components=core_components, **kwargs)

    if self.entirely_static_init:
        return
  • However that can't work if a user doesn't provide all the required variables, because the actual model code needs to run to fill in the missing variables for the method. At that point, if we are to avoid a whole ton of unpleasant if var not in self.static_data statements, we would have to run the actual model and then overwrite the calculations from static_data at the end of the method.

  • If we're accepting that computational expense - and I'm not sure we should - then another way we could do this might be to have a "static" flag in the data setup. That could then be added as an attribute to the variable in Data and the code to set data could then simply refuse to set "static" variables. We've gone to the expense of calculating it, but the model isn't allowed to set it. This doesn't raise an error, just logs that it is refusing to overwrite a static variable. That removes the need for code at the end of the method overwriting with the preset data.
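
To make the "refuse to set static variables" idea concrete, here is a small sketch. It is not the real Data class: `StaticAwareData`, its `load` method and the logger name are hypothetical, and only the behaviour (log and skip, no error) follows the description above.

```python
import logging
from typing import Any

LOGGER = logging.getLogger("static_data_sketch")


class StaticAwareData:
    """Toy data store in which variables can be flagged as static at load time."""

    def __init__(self) -> None:
        self._values: dict[str, Any] = {}
        self._static: set[str] = set()

    def load(self, name: str, value: Any, static: bool = False) -> None:
        """Load a variable from the data setup, optionally flagging it static."""
        self._values[name] = value
        if static:
            self._static.add(name)

    def __setitem__(self, name: str, value: Any) -> None:
        if name in self._static:
            # No error: the model did the calculation, but it is not allowed
            # to overwrite the preset value.
            LOGGER.info("Refusing to overwrite static variable '%s'", name)
            return
        self._values[name] = value

    def __getitem__(self, name: str) -> Any:
        return self._values[name]


data = StaticAwareData()
data.load("preset_var", 20.0, static=True)
data["preset_var"] = 25.0   # ignored, logged at INFO level
data["computed_var"] = 3.0  # stored as usual
```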

This reminds me a little of #188!

Thinking about it - that "static" flag might allow us to avoid the extra static_data object.

  • A model might start and immediately run the super.__init__,
  • That runs a BaseModel._check_entirely_static method and that looks at Data noting if all the required variables are present and flagged as static.
  • If they are all there and static, we can set a flag to shortcut the whole method.
  • If only some are there and static, then we have to calculate them at each time step but refuse to actually set them and hold the static values.
  • And if any are there but not static - we throw an error? Attempt to provide an internally calculated variable.

@dalonsoa
Collaborator Author

I've been working on this and after a few trials and errors I think there are only 3 scenarios involving the static flag that make sense:

  1. The flag is provided and ALL the variables that would have been created/updated in init and/or update are provided as inputs: In this case, we just bypass init and/or update. This applies to init and update independently.
  2. The flag is provided and NOT ALL of those variables are provided, only some: We raise an error. Updating some and not others risks inconsistencies between the variables updated by the model. This applies to init and update independently.
  3. The flag is provided and NONE of those variables are provided: In this case we run the methods once and then bypass them in any future call. This only applies to update, since init is run only once, anyway. For those variables that are a time series, we might need to update the Data object with the previous value - which is always the same - but I'm not sure.
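
The three scenarios can be summarised, for the update step of a model with the static flag set, in a sketch like the one below. The function name, the enum and the use of `ValueError` are hypothetical; the decision logic is the one listed above.

```python
from __future__ import annotations

from enum import Enum


class StaticUpdate(Enum):
    BYPASS = "bypass"      # scenario 1: all outputs provided as inputs
    RUN_ONCE = "run_once"  # scenario 3: run update once, then bypass it


def resolve_static_update(vars_updated: set[str], data_vars: set[str]) -> StaticUpdate:
    """Decide how a static model handles update, given what is in the data."""
    provided = vars_updated & data_vars
    if not provided:
        return StaticUpdate.RUN_ONCE
    if provided == vars_updated:
        return StaticUpdate.BYPASS
    # Scenario 2: a partial set risks inconsistent variables, so fail early.
    missing = sorted(vars_updated - provided)
    raise ValueError(f"Static model needs all or none of its variables; missing: {missing}")


# Hypothetical variable names, for illustration only.
outputs = {"var_a", "var_b"}
print(resolve_static_update(outputs, {"var_a", "var_b", "other"}))
print(resolve_static_update(outputs, {"other"}))
```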

I will implement a prototype of how that bypassing can be done with minimal fuss.

@dalonsoa
Collaborator Author

dalonsoa commented Aug 21, 2024

Ok, so here you have another take at this. The code seems verbose because I tried to be thorough with the error messages, but actual lines changed are very few.

This implements two methods:

  • One that checks if the setup (the new init, see below) needs to be bypassed based on the presence of the static flags and the variables available in the data object.
  • Another that checks if the update method should be run once when the model is static and the relevant variables are not present in the data object.

For this to work, the following changes will need to be done to all the models:

  • Everything in init after the super().__init__(...) call should be moved to a new abstract method called _setup. I guess we can use setup if you don't have any use for that one anymore. This method should take as input everything that is not already passed to super().__init__(...), as that will be already available.
  • The subclasses' __init__ method should be removed. So no custom __init__ anymore, just custom _setup. If it is present, there will be an exception.
  • Subclasses will need to override _update and NOT update, as it is now. If they override update, there will be an error, as with __init__.
  • In the from_config method, the models will need to read a static flag from the configuration and pass it to the class constructor. If the flag is not present, the default is not being static.
  • Optionally, the input files will need to contain the data that will be used by the static model when it is operating in preset mode - i.e. all data provided externally.
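
A rough sketch of what these changes mean for a model class follows. How the PR actually raises the exception is not shown in this thread, so the use of `__init_subclass__` here is an assumption, as are `ToyModel`, the dict-based data and config, and the `toy_var` variable. The static bypass checks themselves are left out.

```python
from __future__ import annotations

from abc import ABC, abstractmethod
from typing import Any


class BaseModel(ABC):
    """Toy base model owning __init__ and update; subclasses fill in the hooks."""

    def __init_subclass__(cls, **kwargs: Any) -> None:
        super().__init_subclass__(**kwargs)
        # Assumed mechanism: reject subclasses that define the reserved methods.
        for reserved, hook in (("__init__", "_setup"), ("update", "_update")):
            if reserved in cls.__dict__:
                raise TypeError(f"{cls.__name__} must not override {reserved}; use {hook}.")

    def __init__(self, data: dict[str, Any], static: bool = False, **kwargs: Any) -> None:
        self.data = data
        self._static = static
        self._setup(**kwargs)  # the static bypass check is omitted in this sketch

    @abstractmethod
    def _setup(self, **kwargs: Any) -> None:
        """Everything that used to follow super().__init__(...) in a model."""

    @abstractmethod
    def _update(self, time_index: int, **kwargs: Any) -> None:
        """Model-specific update."""

    def update(self, time_index: int, **kwargs: Any) -> None:
        self._update(time_index, **kwargs)  # static dispatch omitted in this sketch

    @classmethod
    def from_config(cls, data: dict[str, Any], config: dict[str, Any]) -> BaseModel:
        # A missing flag means the model is not static.
        return cls(data=data, static=config.get("static", False))


class ToyModel(BaseModel):
    def _setup(self, **kwargs: Any) -> None:
        self.data["toy_var"] = 0

    def _update(self, time_index: int, **kwargs: Any) -> None:
        self.data["toy_var"] += 1


model = ToyModel.from_config(data={}, config={"static": True})
model.update(0)
```

Defining a subclass with its own `__init__` or `update` would fail at class creation time in this sketch, which surfaces the mistake before any simulation is run.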

OK, so give it some thought and we can move from here. The pending things/open questions are:

  • Do we need to update the data object even when the model is static, copying the previous entry in the case of time series?
  • Do we need to do anything to load the data files related to preset data?
  • I guess that we will need to update the schemas to allow for a static flag, right?

@jacobcook1995
Collaborator

> • Do we need to update the data object even when the model is static, copying the previous entry in the case of time series?

The data object currently stores the present state of the model variables as well as time series data for inputs, i.e. regional climate projections over time. The variables that represent the current state get output at every time step (I think we currently output all of them, but it could be a user defined subset). There's nothing within the data object that records the previous state of the model itself. I feel like this means the data object doesn't need to be updated for static models as the present state is by definition unchanging.

I guess we might want to change how the data object is used so that previous values are stored (for at least some variables) at some point in the future. But I feel like this would be a fairly big refactor which would also inevitably change some of the static model logic, so it probably isn't worth hedging against.

@dalonsoa
Collaborator Author

dalonsoa commented Sep 3, 2024

@davidorme , did you have a chance to look into the new version of this?

@dalonsoa dalonsoa mentioned this pull request Sep 3, 2024
9 tasks
@davidorme
Collaborator

@dalonsoa I haven't had a chance to review this new version yet. I'll try and have a look this week - I have a mix of school inset days and thesis vivas to juggle.

@dalonsoa
Collaborator Author

dalonsoa commented Sep 3, 2024

No rush. I was only mentioning it in reference to #541

@dalonsoa
Collaborator Author

dalonsoa commented Oct 24, 2024

OK, I got the errors:

For "soil":

Non-static model soil requires none of the variables in vars_populated_by_first_update or vars_updated to be present in the data object. Present variables: ['soil_c_pool_microbe', 'soil_c_pool_necromass', 'soil_c_pool_lmwc', 'soil_enzyme_maom', 'soil_enzyme_pom', 'soil_c_pool_pom', 'soil_c_pool_maom']

For "litter":

Non-static model litter requires none of the variables in vars_populated_by_first_update or vars_updated to be present in the data object. Present variables: ['c_p_ratio_below_structural', 'c_n_ratio_above_structural', 'litter_pool_above_structural', 'litter_pool_above_metabolic', 'c_p_ratio_above_structural', 'litter_pool_below_structural', 'lignin_below_structural', 'lignin_above_structural', 'c_p_ratio_below_metabolic', 'c_p_ratio_above_metabolic', 'c_n_ratio_below_metabolic', 'c_n_ratio_below_structural', 'litter_pool_below_metabolic', 'c_n_ratio_woody', 'c_p_ratio_woody', 'litter_pool_woody', 'c_n_ratio_above_metabolic', 'lignin_woody']

So, the solution would be to get rid of those from the data object in the example, but chances are that will break a lot of things.
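
For context, the check behind these messages can be pictured as in the sketch below. The message text, `vars_populated_by_first_update`, `vars_updated` and the `ConfigurationError` name appear in this thread; the function name and signature are hypothetical, and the local exception class is only a stand-in for the real one.

```python
from __future__ import annotations


class ConfigurationError(Exception):
    """Stand-in for virtual_ecosystem.core.exceptions.ConfigurationError."""


def check_non_static(
    model_name: str,
    data_vars: set[str],
    vars_populated_by_first_update: tuple[str, ...],
    vars_updated: tuple[str, ...],
) -> None:
    """Fail if a non-static model finds its own outputs already in the data."""
    produced = set(vars_populated_by_first_update) | set(vars_updated)
    present = sorted(produced & data_vars)
    if present:
        raise ConfigurationError(
            f"Non-static model {model_name} requires none of the variables in "
            "vars_populated_by_first_update or vars_updated to be present in "
            f"the data object. Present variables: {present}"
        )


# A model whose output is already in the data object trips the check.
try:
    check_non_static("soil", {"soil_c_pool_lmwc"}, (), ("soil_c_pool_lmwc",))
except ConfigurationError as err:
    print(err)
```

In this picture a variable that is both an initial condition in the example data and listed in `vars_updated` will always trigger the error for a non-static model.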

@davidorme
Collaborator

Or @jacobcook1995 is this simply that you've now implemented the calculation of things we had to feed in by hand before, and we just need to update the data? That seems quite likely - there's been a lot of movement on those models.

@dalonsoa dalonsoa changed the title Adds psudocode for static models Adds support for static models Oct 25, 2024
@jacobcook1995
Collaborator

This error seems to be related to the content of vars_updated (rather than vars_populated_by_first_update). They are all things that the soil and litter models track but no other model uses. I'm not sure if I have misunderstood vars_updated but I assumed it was about the output from the models rather than the inputs to the models?

@jacobcook1995
Collaborator

jacobcook1995 commented Oct 25, 2024

> This error seems to be related to the content of vars_updated (rather than vars_populated_by_first_update). They are all things that the soil and litter models track but no other model uses. I'm not sure if I have misunderstood vars_updated but I assumed it was about the output from the models rather than the inputs to the models?

Actually this isn't right, sorry: the animal model needs to know about the size of the litter pools, so there is a dependence there.

@jacobcook1995
Collaborator

All of those variables do need to stay in the example data though as they are required for model initialisation. Maybe it's that they are all in vars_required_for_init, vars_required_for_update and vars_updated which might not be the correct approach

Collaborator

@jacobcook1995 jacobcook1995 left a comment

In general looks good to me, just had a query about the error message that was causing problems above

virtual_ecosystem/core/base_model.py (outdated, resolved)
Collaborator

@alexdewar alexdewar left a comment

LGTM! I've put some small comments and suggestions.

docs/source/using_the_ve/configuration/config.md (outdated, resolved)
virtual_ecosystem/core/base_model.py (outdated, resolved)
virtual_ecosystem/core/base_model.py (outdated, resolved)
virtual_ecosystem/core/base_model.py (outdated, resolved)
virtual_ecosystem/core/base_model.py (outdated, resolved)
virtual_ecosystem/core/base_model.py (outdated, resolved)
virtual_ecosystem/core/base_model.py (resolved)
tests/core/test_base_model.py (resolved)
tests/models/abiotic/test_abiotic_model.py (resolved)
@dalonsoa
Collaborator Author

dalonsoa commented Nov 6, 2024

I'm planning to work on #581 from next week, but for that I really need this PR agreed and merged. Is there any barrier for that to happen?

Collaborator

@vgro vgro left a comment

Apologies for the delayed response, I was pretty much out of office for the last 6 weeks. I do not see any reason why you shouldn't merge this branch, in particular if it stops you from proceeding.

@dalonsoa
Collaborator Author

dalonsoa commented Dec 11, 2024

Yes, please. We really need to merge this - or ditch it altogether. I've already had to fix merge conflicts several times because - obviously - you keep working and adding stuff incompatible with the changes here, and today I've wasted 3h trying to incorporate the latest changes to the animal model without success because in the data object there's some data that should not be there.

@TaranRallings , I get this error:

virtual_ecosystem.core.exceptions.ConfigurationError: Non-static model animal requires none of the variables in vars_populated_by_first_update or vars_updated to be present in the data object. Present variables: total_animal_respiration

Either total_animal_respiration should not be in the dummy_animal_data object used by the prepared_animal_model_instance fixture, or total_animal_respiration should not be in either vars_populated_by_first_update or vars_updated. I cannot fix that myself.

So, keeping aside this being a barrier or not, once @TaranRallings has fixed the above issue, please merge or close this PR without merging, but do not leave it lingering around for longer.

@TaranRallings TaranRallings merged commit 1a92bef into develop Dec 13, 2024
13 checks passed
7 participants