Background
While our current validation functionality works well enough on smaller files and on hubs with less complex config files, it can be much slower on larger files or more complex configs. This has also been noted and reported by the community (e.g. #86).
The most time-consuming checks are those verifying that the combinations of values are valid and that all required value combinations have been submitted.
There are a number of reasons/bottlenecks:
The size of the expanded grid of all possible values in a complex hub can be very large.
The expansion of value combinations which are effectively invalid because their value is tied to the value of another variable (e.g. target_end_date, which only has a single valid value dependent on origin_date and horizon; see "Columns where values are dependent on the value of other columns cause problems in value combination validation", #38). Expanding the values of such task ids unnecessarily increases the size of the expanded value grid, while the actual validation is performed via the optional validation check hubValidations::opt_check_tbl_horizon_timediff().
The use of conc_rows to then split the submitted table and check for required values in check_tbl_values_required.
These are likely the most effective areas to direct effort to improve performance.
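To make the second bottleneck concrete, here is a rough base R sketch of how much a dependent task id inflates the grid. The task id values and the assumption that target_end_date equals origin_date plus 7 days per horizon step are made up for illustration and are not taken from any real hub config.

```r
# Hypothetical task id values (illustrative only).
task_ids <- list(
  origin_date = as.character(seq(as.Date("2023-01-07"), by = "week", length.out = 20)),
  horizon = 0:3,
  location = sprintf("%02d", 1:50),
  target = c("inc hosp", "inc death")
)

# Number of meaningful combinations: every row has exactly one valid
# target_end_date, so this is also the number of valid rows.
n_valid <- prod(lengths(task_ids))

# All target_end_date values that can occur, assuming
# target_end_date = origin_date + 7 * horizon days.
od_h <- expand.grid(
  origin_date = task_ids$origin_date,
  horizon = task_ids$horizon,
  stringsAsFactors = FALSE
)
target_end_date <- unique(as.character(as.Date(od_h$origin_date) + 7 * od_h$horizon))

# Naively expanding target_end_date like any other task id multiplies the
# grid by the number of distinct target_end_date values.
grid_full <- expand.grid(
  c(task_ids, list(target_end_date = target_end_date)),
  stringsAsFactors = FALSE
)

inflation <- nrow(grid_full) / n_valid
```

In this toy setup `inflation` equals `length(target_end_date)`, even though only one target_end_date per origin_date/horizon pair can ever be valid.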
Specific Actions
Perform memory-intensive validations in a piecemeal way: Once "Allow subsetting of expand_model_out_val_grid() by output type" (#98) is implemented, we should refactor any checks that use expand_model_out_val_grid() to perform the checks one output type at a time. This way we avoid holding the full expanded grid in memory at any one time.
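A minimal sketch of what checking one output type at a time could look like. Everything here is hypothetical: `valid_values` stands in for the config, `expand_one_output_type()` stands in for a subsettable expand_model_out_val_grid() (whose eventual interface from #98 is not known here), and rows are compared via pasted keys purely for brevity.

```r
# Hypothetical valid values per output type (stand-in for the hub config).
valid_values <- list(
  mean = list(location = c("US", "CA"), horizon = 1:2, output_type_id = NA),
  quantile = list(location = c("US", "CA"), horizon = 1:2,
                  output_type_id = c(0.25, 0.5, 0.75))
)

# Hypothetical helper: expand the grid for a single output type only.
expand_one_output_type <- function(output_type) {
  grid <- expand.grid(valid_values[[output_type]], stringsAsFactors = FALSE)
  grid$output_type <- output_type
  grid
}

# Collapse the chosen columns of each row into a single comparison key.
row_key <- function(df, cols) do.call(paste, c(df[cols], sep = "|"))

# Flag invalid rows, holding only one output type's grid in memory at a time.
check_values_piecemeal <- function(tbl) {
  cols <- c("location", "horizon", "output_type", "output_type_id")
  invalid <- logical(nrow(tbl))
  for (ot in unique(tbl$output_type)) {
    idx <- which(tbl$output_type == ot)
    if (!ot %in% names(valid_values)) {
      invalid[idx] <- TRUE
      next
    }
    grid <- expand_one_output_type(ot)
    invalid[idx] <- !row_key(tbl[idx, , drop = FALSE], cols) %in% row_key(grid, cols)
  }
  invalid
}

tbl <- data.frame(
  location = c("US", "CA", "MX"),
  horizon = c(1, 2, 1),
  output_type = c("mean", "quantile", "quantile"),
  output_type_id = c(NA, 0.5, 0.5),
  stringsAsFactors = FALSE
)
invalid_rows <- check_values_piecemeal(tbl)
```

Peak memory is then bounded by the largest single output type grid rather than by the sum of all of them.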
Memoise expand_model_out_val_grid(): As this function is called a number of times but always returns the same result for the same config, it's a good candidate for memoisation (see "Memoise expand_model_out_vals_grid", #85).
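For illustration, a tiny hand-rolled memoiser in base R. This is only a sketch of the idea, not how #85 is or will be implemented; in practice a package such as memoise would probably be a better fit. `memoise_one_arg()` and `expand_grid_slow()` are hypothetical names, with the latter standing in for expand_model_out_val_grid().

```r
# Wrap a one-argument function so repeated calls with an identical argument
# return the cached result instead of recomputing it.
memoise_one_arg <- function(f) {
  cache <- list()  # each entry is list(arg = <input>, value = <result>)
  function(x) {
    for (entry in cache) {
      if (identical(entry$arg, x)) {
        return(entry$value)
      }
    }
    value <- f(x)
    cache[[length(cache) + 1L]] <<- list(arg = x, value = value)
    value
  }
}

# Hypothetical stand-in for the expensive grid expansion; counts its calls.
n_calls <- 0
expand_grid_slow <- function(config) {
  n_calls <<- n_calls + 1
  expand.grid(config, stringsAsFactors = FALSE)
}

expand_grid_memo <- memoise_one_arg(expand_grid_slow)

config <- list(location = c("US", "CA"), horizon = 1:3)
g1 <- expand_grid_memo(config)  # computed
g2 <- expand_grid_memo(config)  # served from the cache
```

The linear scan over cached entries is fine when only a handful of distinct configs are seen per session, which is the situation described above.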
Optimise conc_rows: I've already tried and failed to improve the performance of this function, but given it's the main bottleneck in check_tbl_values_required, it feels important to revisit it and try again.
Introduce a mechanism for excluding task ids from the expanded grid of valid values: This relates to task ids like target_end_date, for which expanding their values is meaningless yet can be very memory consuming. For such task ids, validation would involve:
Validating that the unique values in the task id column are valid with respect to the config (instead of checking them as part of combinations).
Using custom/optional functions to validate expected properties/relationships of such variables.
In expanded grids, such task ids would likely be encoded as NAs.
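The three points above could be sketched roughly as follows. All names are hypothetical (`expand_grid_excluding()`, `check_excluded_values()`, `timediff_matches()`), the config is a toy list, and the 7 days per horizon step is an assumption; `timediff_matches()` is only a stand-in for the kind of check opt_check_tbl_horizon_timediff() performs, not its actual implementation.

```r
# Toy config (illustrative values only).
config <- list(
  origin_date = c("2023-01-07", "2023-01-14"),
  horizon = 1:2,
  target_end_date = c("2023-01-14", "2023-01-21", "2023-01-28")
)
excluded <- "target_end_date"

# Expand only the non-excluded task ids; excluded ones are encoded as NA.
expand_grid_excluding <- function(config, excluded) {
  grid <- expand.grid(config[setdiff(names(config), excluded)],
                      stringsAsFactors = FALSE)
  for (col in excluded) {
    grid[[col]] <- NA_character_
  }
  grid[names(config)]
}

# Check the unique values of each excluded column against the config,
# independently of any value combinations.
check_excluded_values <- function(tbl, config, excluded) {
  vapply(
    excluded,
    function(col) all(unique(tbl[[col]]) %in% config[[col]]),
    logical(1)
  )
}

# Custom relationship check (assumes 7 days per horizon step).
timediff_matches <- function(tbl, days_per_horizon = 7) {
  as.Date(tbl$target_end_date) ==
    as.Date(tbl$origin_date) + days_per_horizon * tbl$horizon
}

grid <- expand_grid_excluding(config, excluded)

tbl <- data.frame(
  origin_date = c("2023-01-07", "2023-01-14"),
  horizon = c(1, 2),
  target_end_date = c("2023-01-14", "2023-01-21"),
  stringsAsFactors = FALSE
)
values_ok <- check_excluded_values(tbl, config, excluded)
timediff_ok <- timediff_matches(tbl)
```

Here the second row passes the value check (its target_end_date is listed in the config) but fails the relationship check, which is exactly the case the combination grid does not need to cover.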