Adds support for static models #373

dalonsoa · 2024-01-23T17:09:29Z

Description

This PR sets up the models so that they can be frozen or static, i.e. they are set up once and then used unchanged, without updating, for the rest of the simulation. This is controlled by a static flag included in the configuration.

There are two options for static models:

A frozen model: it is updated once in the first iteration and those values are then reused for any future update. In this case, NONE of the variables set up during the update can be provided.
A pre-set model: it pulls data from a dataset that contains the relevant variables for the whole time range of the simulation. In this case, ALL of the data variables that the model is supposed to update - or provide during the setup process - need to be provided.

Documentation about using the flag has been added to the config.md file, but I'm not sure if there's a better place.

Most modifications to the tests consist of bypassing the static checks because, as the tests were designed, they were all failing. In all cases, only some of the variables required for init or for update were present in the data object, typically provided as a fixture, which made the checks that decide on the staticity of the models fail. The options are to leave the patches as they are, or to add all the missing variables. However, that might be a lot of work and I'm not entirely sure it is worth the effort.

Fixes # N/A

Type of change

New feature (non-breaking change which adds functionality)
Optimization (back-end change that speeds up the code)
Bug fix (non-breaking change which fixes an issue)

Key checklist

Make sure you've run the pre-commit checks: $ pre-commit run -a
All tests pass: $ poetry run pytest

Further checks

Code is commented, particularly in hard-to-understand areas
Tests added that prove fix is effective or that feature works

jacobcook1995

I got slightly lost in one bit of the pseudocode, but the general approach seems sensible and flexible.

jacobcook1995 · 2024-01-24T13:56:54Z

virtual_rainforest/core/base_model.py

+        """
+        if self._static_data is None:
+            self._update(time_index, **kwargs)
+            self._build_static_data(time_index)


Not sure that I follow this bit, does the update happen and then new static data gets built for the next time step?

Exactly. This is run only once, the first time that we are required to run the update. Thereafter, the updated values are used to update the data object again and again. We need to run update at least once in this case since, otherwise, we do not know what to update the data with.

If pre-set data is provided as one or more external files, then the static_data object is not None and this step is not needed.

Leaving aside changes in initialisation/validation, the only change to existing models would be to replace update with _update, since they override only one of the two options for updating the model. The _update_static option is not model specific and is therefore coded directly in the base model.
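To make the dispatch described above concrete, here is a minimal sketch of the pattern. It is not the code in this PR: the toy `BaseModel`, the `vars_updated` attribute as used here, and `CounterModel` are illustrative assumptions, and the data object is reduced to a plain dict.

```python
from abc import ABC, abstractmethod
from typing import Optional


class BaseModel(ABC):
    """Toy stand-in for the base model; not the real class."""

    # Hypothetical attribute naming the variables written by _update.
    vars_updated: tuple = ()

    def __init__(self, data: dict, static: bool = False) -> None:
        self.data = data
        self._static = static
        self._static_data: Optional[dict] = None

    def update(self, time_index: int) -> None:
        """Dispatch to the model-specific or the static update."""
        if not self._static:
            self._update(time_index)
        elif self._static_data is None:
            # Frozen model, first call: run the real update once and
            # keep a copy of the values it produced.
            self._update(time_index)
            self._build_static_data()
        else:
            self._update_static()

    @abstractmethod
    def _update(self, time_index: int) -> None:
        """Model-specific update, overridden by each subclass."""

    def _build_static_data(self) -> None:
        self._static_data = {name: self.data[name] for name in self.vars_updated}

    def _update_static(self) -> None:
        # Identical for every model, so it lives in the base class.
        self.data.update(self._static_data)


class CounterModel(BaseModel):
    vars_updated = ("count",)

    def _update(self, time_index: int) -> None:
        self.data["count"] = self.data.get("count", 0) + 1


normal = CounterModel(data={})
frozen = CounterModel(data={}, static=True)
for step in range(3):
    normal.update(step)
    frozen.update(step)
```

In this sketch the non-static model keeps counting, while the frozen one runs `_update` only on the first call and afterwards just writes the stored values back.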

Ahh right that makes sense now, cheers!

vgro

I am not so familiar with self._, but the approach looks generally good to me.

davidorme · 2024-01-29T10:38:08Z

This makes a lot of sense to me - the flow of the logic for the "update once and then repeat" is very clean.

I wonder if the static_data could be use_static_data and the model class itself is then responsible for locating the update_vars in Data and checking all is well. That static data assembly could be outside of the BaseModel but it seems like a more natural home? One thing this should support is the data providing only a single time step, which then just gets re-used. Users could always just repeat the same values across all time steps, but it would be good to avoid that! It was behind my thinking on the original constant option in the Data configuration, which might be worth reviving - if a grid is loaded as a constant, use that as a constant, but it has to be explicitly flagged as constant.

There are no end of repeating patterns and sequences we might want to push in (La Niña, El Niño), but that seems like an evolution 😄

dalonsoa · 2024-01-31T10:38:46Z

> I wonder if the static_data could be use_static_data and the model class itself is then responsible for locating the update_vars in Data and checking all is well. That static data assembly could be outside of the BaseModel but it seems like a more natural home?

I'm not entirely sure what you mean here. Do you mean having a method called use_static_data that pulls the static data itself from somewhere else - outside of the model class - and puts it in the relevant place in the Data object? Isn't that the intention of the _update_with_static_data method I sketched? Maybe I'm misunderstanding something.

> One thing this should support is the data providing only a single time step, which then just gets re-used. Users could always just repeat the same values across all time steps, but it would be good to avoid that! It was behind my thinking on the original constant option in the Data configuration, which might be worth reviving - if a grid is loaded as a constant, use that as a constant, but it has to be explicitly flagged as constant.

I guess that the model can hold a single copy of that single step data, but then in the Data object it will need to update timestep after timestep to create the time series (when relevant), so other models can use it. I think that models that use that data should not need to worry about it being static or not, but just take the appropriate time from the Data object and go on with it, right?

davidorme · 2024-01-31T12:11:28Z

> I'm not entirely sure what you mean here.

Yeah - wasn't a model of clarity was it. So:

If users are providing frozen data (rather than looping the first update) then the most obvious current way to do that is for them to add those variables in the configuration. Those variable names normally wouldn't be included in the data configuration because they are calculated internally, but it is a simple existing route to getting the frozen data into the simulation.
If the Data instance then contains the variables, we don't need to pass in an xr.Dataset to the frozen model. We can replace:
```
static_data: Optional[xr.Dataset] = None,
```
with
```
use_static_data: bool = False,
```
and then the _build_static_data method is expecting to be able to find the right variables in self.Data. use_static_data is then simply an explicit switch between using the first update versus looking for all the required frozen data in the Data object.
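A minimal sketch of what that switch could look like, assuming the data object behaves like a mapping. The function name, its signature and the `ValueError` are illustrative assumptions rather than the project's API.

```python
from typing import Optional


def build_static_data(
    data: dict, vars_updated: tuple, use_static_data: bool
) -> Optional[dict]:
    """Return the frozen values, or None if they must come from the first update."""
    if not use_static_data:
        # "Loop the first update" route: nothing to collect yet.
        return None

    # "Frozen data from the configuration" route: everything must be there.
    missing = [name for name in vars_updated if name not in data]
    if missing:
        raise ValueError(f"Static data requested but variables are missing: {missing}")
    return {name: data[name] for name in vars_updated}


# Hypothetical variable name, used only to exercise the function.
from_config = build_static_data({"leaf_area": 2.5}, ("leaf_area",), True)
from_first_update = build_static_data({}, ("leaf_area",), False)
```

The boolean only selects the route; the frozen values themselves always come from the data object, so no separate xr.Dataset has to be passed in.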

davidorme · 2024-01-31T12:15:00Z

> I guess that the model can hold a single copy of that single step data, but then in the Data object it will need to update timestep after timestep to create the time series (when relevant), so other models can use it. I think that models that use that data should not need to worry about it being static or not, but just take the appropriate time from the Data object and go on with it, right?

Yup - agreed.

dalonsoa · 2024-07-12T11:00:35Z

Next week, this one. If you have any update or comment on what was said above, please let me know.

TaranRallings

Makes sense to me.

dalonsoa · 2024-07-24T05:38:10Z

I've been trying to progress on this before going on leave but I've hit a blocker - precisely because of the brand new variables system.

For the case of the frozen model - update once and then repeat for future iterations - things are clear as the inputs are the same as a normal model. I have not implemented it but I do not foresee any problem there: it will make a local copy of the relevant data (the vars_updated) after that first iteration and just keep using that down the line.

Now, for the pre-set option things have gotten more complicated after implementing the variables system. As discussed above, that preset data will need to be passed as part of the data object, but that would mean that the model has different vars required for update when it is static or when it is not: in the first case, the pre-set data will need to be present, but in the normal case it will not, in general (or at least, not necessarily). As these variables are set before the model instances are initialised, when collecting the models, there is no way to modify them at runtime based on the model input settings (the static flag).

There are a couple of options that come to my mind, none of them satisfactory:

Create static model subclasses with the right variables, eg PlantStaticModel, but that defeats the purpose of having these simple flags to switch between model types.
Postpone the validation step to after the models have been initialised, so the flags can be used to define one or other set of variables depending on the model being preset or not, but that means risking initialising models that are incompatible, in addition to not allowing the variable system to define the model execution order (future work).
Pass the pre-set data directly to the model, without using the data object. It will be a question of updating the from_config methods of the different models to load that data if the appropriate flags and paths to the relevant file are provided in the config. This will create a local static_data: Data object that can be used to update the main data object each iteration. This might be the least problematic option if we use the same tooling of the data system to load and validate the preset data.

It will be good to have your feedback on this, so I can work on it as soon as I'm back from leave.

davidorme · 2024-08-06T14:57:37Z

Yeah - that's a pain. First thoughts:

I don't like the static subclasses or postponed validation options - admittedly a knee jerk response though.
The last option feels cleaner.

Having said that, I haven't got my head around how we work with avoiding running code in static mode models.

If we insist that a static model can either run in frozen mode or in preset, but you have to preset all variables required by the model, then I can see a route for the static data object for preset variables. If we have a static_data attribute for BaseModel, then the super.__init__ and super.update methods could include an automatic check if all the required variables for that method are present in static_data and then simply copy them across to data and set a flag that allows the subclass to shortcut the actual method code. Something like:

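```python
def __init__(self, data, static_data, core_components, **kwargs):
    # The base class copies any preset variables from static_data into data.
    super().__init__(
        data=data, static_data=static_data, core_components=core_components, **kwargs
    )

    # Skip the model-specific initialisation if everything was preset.
    if self.entirely_static_init:
        return
```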

However that can't work if a user doesn't provide all the required variables, because the actual model code needs to run to fill in the missing variables for the method. At that point, if we are to avoid a whole ton of unpleasant if var not in self.static_data statements, we would have to run the actual model and then overwrite the calculations from static_data at the end of the method.
If we're accepting that computational expense - and I'm not sure we should - then another way we could do this might be to have a "static" flag in the data setup. That could then be added as an attribute to the variable in Data and the code to set data could then simply refuse to set "static" variables. We've gone to the expense of calculating it, but the model isn't allowed to set it. This doesn't raise an error, just logs that it is refusing to overwrite a static variable. That removes the need for code at the end of the method overwriting with the preset data.

This reminds me a little of #188!

Thinking about it - that "static" flag might allow us to avoid the extra static_data object.

A model might start and immediately run the super.__init__,
That runs a BaseModel._check_entirely_static method and that looks at Data noting if all the required variables are present and flagged as static.
If they are all there and static, we can set a flag to shortcut the whole method.
If only some are there and static, then we have to calculate them at each time step but refuse to actually set them and hold the static values.
And if any are there but not static - we throw an error? Attempt to provide an internally calculated variable.
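The steps above could be sketched roughly as follows. This is only an illustration of the idea: the toy `Data` class, its `load` and `is_static` methods, the free function `check_entirely_static` and the variable names are all assumptions, not the existing Data API.

```python
import logging

LOGGER = logging.getLogger("static_sketch")


class Data:
    """Toy variable store with a per-variable "static" flag (illustrative)."""

    def __init__(self) -> None:
        self._values: dict = {}
        self._static: set = set()

    def load(self, name: str, value, static: bool = False) -> None:
        """Load a variable from the configuration, optionally flagged static."""
        self._values[name] = value
        if static:
            self._static.add(name)

    def is_static(self, name: str) -> bool:
        return name in self._static

    def __contains__(self, name: str) -> bool:
        return name in self._values

    def __getitem__(self, name: str):
        return self._values[name]

    def __setitem__(self, name: str, value) -> None:
        if name in self._static:
            # No error: just log and keep the static value.
            LOGGER.info("Refusing to overwrite static variable '%s'", name)
            return
        self._values[name] = value


def check_entirely_static(data: Data, required: list) -> bool:
    """True if every required variable is present and static, so the method can be skipped."""
    present = [name for name in required if name in data]
    not_static = [name for name in present if not data.is_static(name)]
    if not_static:
        raise ValueError(
            f"Attempt to provide internally calculated variables: {not_static}"
        )
    return len(present) == len(required)


data = Data()
data.load("soil_temperature", 20.0, static=True)  # hypothetical variable
data["soil_temperature"] = 25.0  # ignored, the variable is static
```

With only some of the required variables static, `check_entirely_static` returns False: the model still runs, and the store silently keeps the static values.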

dalonsoa · 2024-08-21T15:09:48Z

I've been working on this and after a few trials and errors I think there are only 3 scenarios involving the static flag that make sense:

The flag is provided and ALL the variables that would have been created/updated in init and/or update are provided as inputs: In this case, we just bypass init and/or update. This applies to init and update independently.
The flag is provided and NOT ALL of those variables are provided, only some: We raise an error. Updating some and not others risks inconsistencies between the variables updated by the model. This applies to init and update independently.
The flag is provided and NONE of those variables are provided: In this case we run the methods once and then bypass them in any future call. This only applies to update, since init is run only once, anyway. For those variables that are a time series, we might need to update the Data object with the previous value - which is always the same - but I'm not sure.
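As a rough illustration of the three scenarios, the decision can be written as one small function over sets of variable names. The names `StaticMode`, `resolve_static_mode` and the local `ConfigurationError` stand-in are assumptions for this sketch, not the prototype's code.

```python
from enum import Enum


class ConfigurationError(Exception):
    """Local stand-in for the project's configuration error."""


class StaticMode(Enum):
    BYPASS = "bypass"      # all variables provided: skip the method
    RUN_ONCE = "run_once"  # none provided: run once, then bypass


def resolve_static_mode(produced: set, available: set) -> StaticMode:
    """Decide what a static model does with one method (init or update).

    produced: variables the method would create or update.
    available: variables already present in the data object.
    """
    present = produced & available
    if present == produced:
        return StaticMode.BYPASS
    if not present:
        return StaticMode.RUN_ONCE
    missing = sorted(produced - present)
    raise ConfigurationError(
        f"Static model needs all or none of its variables; missing: {missing}"
    )


all_given = resolve_static_mode({"a", "b"}, {"a", "b", "c"})
none_given = resolve_static_mode({"a", "b"}, {"c"})
```

The function would be called separately for init and for update. For init, RUN_ONCE is simply the normal behaviour, since init only ever runs once.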

I will implement a prototype of how that bypassing can be done with minimal fuss.

dalonsoa · 2024-08-21T17:21:32Z

Ok, so here you have another take at this. The code seems verbose because I tried to be thorough with the error messages, but actual lines changed are very few.

This implements two methods:

One that checks if the setup (the new init, see below) needs to be bypassed based on the presence of the static flags and the variables available in the data object.
Another that checks if the update method should be run once when the model is static and the relevant variables are not present in the data object.

For this to work, the following changes will need to be done to all the models:

Everything in init after the super().__init__(...) call should be moved to a new abstract method called _setup. I guess we can use setup if you don't have any use for that one anymore. This method should take as input everything that is not already passed to super().__init__(...), as that will be already available.
The subclasses init method should be removed. So no custom init anymore, just custom _setup. If it is present, there will be an exception.
Subclasses will need to override _update and NOT update, as it is now. If they override update, there will be an error, as with __init__.
In the from_config method, the models will need to read a static flag from the configuration and pass it to the class constructor. If the flag is not present, the default is not being static.
Optionally, the input files will need to contain the data that will be used by the static model when it is operating in preset mode - i.e. all data provided externally.
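One way to enforce the "no custom init, no custom update" rule from the list above is `__init_subclass__`, which runs when a subclass is defined. The sketch below is an assumption about how that could look, with a dict standing in for the data object. `vars_populated_by_init`, `ToyModel` and `define_bad_model` are illustrative names, and the static handling in `update` is reduced to a no-op.

```python
from abc import ABC, abstractmethod


class BaseModel(ABC):
    """Toy base class; only the override guard and the setup bypass are sketched."""

    # Hypothetical attribute: variables that _setup writes into data.
    vars_populated_by_init: tuple = ()

    def __init_subclass__(cls, **kwargs) -> None:
        super().__init_subclass__(**kwargs)
        for name, alternative in (("__init__", "_setup"), ("update", "_update")):
            if name in cls.__dict__:
                raise TypeError(
                    f"{cls.__name__} must not override {name}; "
                    f"override {alternative} instead."
                )

    def __init__(self, data: dict, static: bool = False, **kwargs) -> None:
        self.data = data
        self._static = static
        # Preset case: everything _setup would create is already in data.
        preset = all(name in data for name in self.vars_populated_by_init)
        if not (static and preset):
            self._setup(**kwargs)

    def update(self, time_index: int) -> None:
        if not self._static:  # static handling omitted in this sketch
            self._update(time_index)

    @abstractmethod
    def _setup(self, **kwargs) -> None:
        """Model-specific setup: whatever used to follow super().__init__()."""

    @abstractmethod
    def _update(self, time_index: int) -> None:
        """Model-specific update."""


class ToyModel(BaseModel):
    vars_populated_by_init = ("pool",)

    def _setup(self, initial_pool: float = 1.0) -> None:
        self.data["pool"] = initial_pool

    def _update(self, time_index: int) -> None:
        self.data["pool"] += 1.0


def define_bad_model() -> None:
    """Defining a subclass with its own __init__ fails at class creation."""

    class BadModel(BaseModel):
        def __init__(self) -> None:
            pass


normal = ToyModel(data={}, initial_pool=2.0)
preset = ToyModel(data={"pool": 5.0}, static=True, initial_pool=2.0)
```

Here `normal` runs `_setup`, while `preset` skips it because the flag is set and the variable is already in the data.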

OK, so give it some thoughts and we can move from here. The pending things/open questions are:

Do we need to update the data object even when the model is static, copying the previous entry in the case of time series?
Do we need to do anything to load the data files related to preset data?
I guess that we will need to update the schemas to allow for a static flag, right?

jacobcook1995 · 2024-08-22T08:46:31Z

> Do we need to update the data object even when the model is static, copying the previous entry in the case of time series?

The data object currently stores the present state of the model variables as well as time series data for inputs, i.e. regional climate projections over time. The variables that represent the current state get output at every time step (I think we currently output all of them, but it could be a user defined subset). There's nothing within the data object that records the previous state of the model itself. I feel like this means the data object doesn't need to be updated for static models as the present state is by definition unchanging.

I guess we might want to change how the data object is used so that previous values are stored (for at least some variables) at some point in the future. But I feel like this would be a fairly big refactor which would also inevitably change some of the static model logic so probably isn't worth hedging against

dalonsoa · 2024-09-03T04:43:43Z

@davidorme , did you have a chance to look into the new version of this?

davidorme · 2024-09-03T08:40:42Z

@dalonsoa I haven't had a chance to review this new version yet. I'll try to have a look this week, though I have a mix of school inset days and thesis vivas to juggle.

dalonsoa · 2024-09-03T15:47:07Z

No rush. I was mentioning it just in reference to #541

dalonsoa · 2024-10-24T15:40:09Z

OK, I got the errors:

For "soil":

Non-static model soil requires none of the variables in vars_populated_by_first_update or vars_updated to be present in the data object. Present variables: ['soil_c_pool_microbe', 'soil_c_pool_necromass', 'soil_c_pool_lmwc', 'soil_enzyme_maom', 'soil_enzyme_pom', 'soil_c_pool_pom', 'soil_c_pool_maom']

For "litter":

Non-static model litter requires none of the variables in vars_populated_by_first_update or vars_updated to be present in the data object. Present variables: ['c_p_ratio_below_structural', 'c_n_ratio_above_structural', 'litter_pool_above_structural', 'litter_pool_above_metabolic', 'c_p_ratio_above_structural', 'litter_pool_below_structural', 'lignin_below_structural', 'lignin_above_structural', 'c_p_ratio_below_metabolic', 'c_p_ratio_above_metabolic', 'c_n_ratio_below_metabolic', 'c_n_ratio_below_structural', 'litter_pool_below_metabolic', 'c_n_ratio_woody', 'c_p_ratio_woody', 'litter_pool_woody', 'c_n_ratio_above_metabolic', 'lignin_woody']

So, the solution would be to get rid of those from the data object in the example, but chances are that will break a lot of things.

davidorme · 2024-10-24T15:53:13Z

Or @jacobcook1995 is this simply that you've now implemented the calculation of things we had to feed in by hand before, and we just need to update the data? That seems quite likely - there's been a lot of movement on those models.

jacobcook1995 · 2024-10-25T10:29:52Z

This error seems to be related to the content of vars_updated (rather than vars_populated_by_first_update). They are all things that the soil and litter models track but no other model uses. I'm not sure if I have misunderstood vars_updated but I assumed it was about the output from the models rather than the inputs to the models?

jacobcook1995 · 2024-10-25T10:33:30Z

> This error seems to be related to the content of vars_updated (rather than vars_populated_by_first_update). They are all things that the soil and litter models track but no other model uses. I'm not sure if I have misunderstood vars_updated but I assumed it was about the output from the models rather than the inputs to the models?

Actually this isn't right, sorry: the animal model needs to know about the size of the litter pools, so there is a dependence there.

jacobcook1995 · 2024-10-25T10:38:04Z

All of those variables do need to stay in the example data though as they are required for model initialisation. Maybe it's that they are all in vars_required_for_init, vars_required_for_update and vars_updated which might not be the correct approach
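To illustrate the overlap, here is a sketch of a check that ignores variables the model also needs for initialisation. This is only one possible reading of the problem, loosely in the spirit of the later commit "Remove vars requires for init from update check". The function name and the `example_flux` variable are hypothetical, and the actual change may differ.

```python
def conflicting_variables(
    vars_updated: tuple,
    vars_populated_by_first_update: tuple,
    vars_required_for_init: tuple,
    available: set,
) -> list:
    """Variables a non-static model would produce but that are already in the data."""
    produced = set(vars_updated) | set(vars_populated_by_first_update)
    # Assumption: variables needed to initialise the model legitimately live
    # in the data object, so they are excluded from the check.
    candidates = produced - set(vars_required_for_init)
    return sorted(candidates & available)


# Pool variables are both required for init and updated, so they do not conflict.
no_conflict = conflicting_variables(
    vars_updated=("litter_pool_woody", "lignin_woody"),
    vars_populated_by_first_update=(),
    vars_required_for_init=("litter_pool_woody", "lignin_woody"),
    available={"litter_pool_woody", "lignin_woody"},
)

# A variable that is only ever produced by the model still triggers the check.
conflict = conflicting_variables(
    vars_updated=("litter_pool_woody", "example_flux"),
    vars_populated_by_first_update=(),
    vars_required_for_init=("litter_pool_woody",),
    available={"litter_pool_woody", "example_flux"},
)
```

A non-empty result would correspond to the "Present variables" list in the error messages quoted above.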

jacobcook1995

In general looks good to me, just had a query about the error message that was causing problems above

virtual_ecosystem/core/base_model.py

alexdewar

LGTM! I've put some small comments and suggestions.

docs/source/using_the_ve/configuration/config.md

virtual_ecosystem/core/base_model.py

virtual_ecosystem/models/abiotic/abiotic_model.py

tests/core/test_base_model.py

tests/models/abiotic/test_abiotic_model.py

docs/source/using_the_ve/configuration/config.md

Co-authored-by: Alex Dewar <[email protected]>

dalonsoa · 2024-11-06T09:21:08Z

I'm planning to work on #581 from next week, but for that I really need this PR agreed and merged. Is there any barrier for that to happen?

vgro

Apologies for the delayed response, I was pretty much out of office for the last 6 weeks. I do not see any reason why you shouldn't merge this branch, in particular if it stops you from proceeding.

dalonsoa · 2024-12-11T11:17:16Z

Yes, please. We really need to merge this - or ditch it altogether. I've already had to fix merge conflicts several times because - obviously - you keep working and adding stuff incompatible with the changes here, and today I've wasted 3h trying to incorporate the latest changes to the animal model without success because in the data object there's some data that should not be there.

@TaranRallings , I get this error:

virtual_ecosystem.core.exceptions.ConfigurationError: Non-static model animal requires none of the variables in vars_populated_by_first_update or vars_updated to be present in the data object. Present variables: total_animal_respiration

Either total_animal_respiration should not be in the dummy_animal_data object used by the prepared_animal_model_instance fixture, or total_animal_respiration should not be in either vars_populated_by_first_update or vars_updated. I cannot fix that myself.

So, keeping aside this being a barrier or not, once @TaranRallings has fixed the above issue, please merge or close this PR without merging, but do not leave it lingering around for longer.

Adds psudocode for static models

a96e90a

dalonsoa requested review from davidorme, jacobcook1995, vgro and robewers01 January 23, 2024 17:09

dalonsoa marked this pull request as draft January 23, 2024 17:09

jacobcook1995 approved these changes Jan 24, 2024

vgro approved these changes Jan 29, 2024

davidorme approved these changes Jan 29, 2024

jacobcook1995 requested a review from TaranRallings July 12, 2024 12:03

TaranRallings approved these changes Jul 15, 2024

Integrate renamed project

45fb958

dalonsoa added 3 commits August 21, 2024 17:58

Add bypassing of methods

6f9b5c5

Prevent overriding init

f0f90f7

Prevent overriding update

fcdc3ee

Merge branch 'develop' into frozen_models

32a08d4

dalonsoa mentioned this pull request Sep 3, 2024

Retiring BaseModel.setup #541

Open

9 tasks

dalonsoa changed the title ~~Adds psudocode for static models~~ Adds support for static models Oct 25, 2024

jacobcook1995 reviewed Oct 25, 2024

virtual_ecosystem/core/base_model.py (outdated, resolved)

dalonsoa added 5 commits October 25, 2024 14:21

♻️ Remove vars requires for init from update check.

05ff05b

♻️ Adapt cli and main tests.

5d3a200

♻️ Adapt litter tests.

0171525

♻️ Adapt soil tests.

763289f

♻️ Clean up a bit hydrology tests.

7a7e27b

alexdewar approved these changes Oct 29, 2024

dalonsoa commented Oct 30, 2024

docs/source/using_the_ve/configuration/config.md (outdated, resolved)

dalonsoa and others added 5 commits October 30, 2024 16:17

Apply suggestions from code review

d691a34

Co-authored-by: Alex Dewar <[email protected]>

♻️ Include reviewers comments.

75f1253

✅ Fix failing tests.

1ce3aa4

Merge branch 'develop' into frozen_models

cc0590d

Merge branch 'develop' into frozen_models

2bcb3e9

dalonsoa and others added 2 commits November 6, 2024 09:25

♻️ Fix issue resulting from merge conflict.

4948a80

Merge branch 'develop' into frozen_models

e70adb2

vgro approved these changes Dec 9, 2024

dalonsoa added 2 commits December 11, 2024 11:00

🔀 Fix conflicts and finish merge.

23ab6d7

♻️ Remove unnecessary call to setup.

d513125

TaranRallings added 2 commits December 13, 2024 11:55

Fixing error with animal respiration in testing data.

78d1a2a

Merge branch 'develop' into frozen_models

732cea3

TaranRallings merged commit 1a92bef into develop Dec 13, 2024
13 checks passed

davidorme mentioned this pull request Jan 1, 2025

Function to generate data / folder structure from a template #649

Open
