-
Notifications
You must be signed in to change notification settings - Fork 4
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
93/PR 3/3 - Propagate performance optimisation throughout checks & validation functions #109
93/PR 3/3 - Propagate performance optimisation throughout checks & validation functions #109
Conversation
…task-ids. Related to #93
Merge branch '98/subset-grid-by-out-type' into 93/3-batch-value-validation # Conflicts: # R/check_tbl_spl_compound_tid.R # R/v3-sample-utils.R # tests/testthat/test-check_tbl_spl_compound_tid.R # tests/testthat/test-check_tbl_spl_non_compound_tid.R
# } | ||
# details <- details_bullets_div(details) | ||
# } | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is just obsolete commented out code which I just removed
) | ||
} | ||
tbl | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These two function also seem not to be being used anymore so deleted
@@ -5,18 +5,23 @@ | |||
#' @inherit check_tbl_colnames params | |||
#' @inherit check_tbl_col_types return | |||
#' @export | |||
check_tbl_values_required <- function(tbl, round_id, file_path, hub_path) { | |||
check_tbl_values_required <- function(tbl, round_id, file_path, hub_path, | |||
derived_task_ids = NULL) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unfortunately, this function cannot be batched by output type. Rather it needs to continue to be batched by model task to ensure that if optional task ID values are supplied for a given model task, the correct required output types are also supplied. As such, we need to evaluate model tasks as a whole rather than splitting them unless we complicate this already very complicated function even more. I experimented a bit but decided it wasn't worth the effort at this time. Being able to ignore derived task ids already confers speed ups and I'm hoping I'll be able to gain more through speeding up conc_rows()
in the next round of optimisations.
"i" = "{.arg output_types} must be members of: {.val {round_output_types}}" | ||
), | ||
call = call | ||
) | ||
} | ||
valid_output_types | ||
output_types | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the fix requested by @elray1 here: #107 (review)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall, looks good!
- I read through the unit tests fairly carefully. Had a couple of minor questions about whether we might want to check that the
derived_task_ids
are truly ignored (not that I doubt they are) - I read the code changes moderately carefully. Most looked clear. I don't know all the places where we might want/need to add the new
derived_task_ids
argument, but I trust that you found them. There is a lot of logic in some places, I'm not fresh on this code base, and it's not all documented as thoroughly as I would find helpful. So basically I gave up on trying to understand all the code updates. Asked a question in one place. The fact that the unit tests are passing is reassuring :) - Made a suggestion in the documentation but then saw you had written similar text elsewhere.
I will wait to approve changes till I see what your thoughts are on a couple of these, but also not specifically requesting changes.
@@ -319,3 +319,31 @@ test_that("validate_submission handles overriding output type id data type corre | |||
)[["col_types"]] | |||
) | |||
}) | |||
|
|||
test_that("Ignoring derived_task_ids in validate_submission works", { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
similar question to in another test -- to test whether derived_task_ids
were actually ignored, it seems like a test would involve setting a derived task id to something invalid and then seeing that an error was not raised?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I actually think maybe the question is not relevant where I asked it first, but may be relevant here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great points about the tests! I sort of knew from the benchmarks that it was working but admittedly, this wasn't being that explicitly tested throughout (mainly just in the expand_model_out_grid()
tests.
So added a bunch of tests throughout following your suggestion of introducing a deliberate error in 23b4655
Co-authored-by: Evan Ray <[email protected]>
This PR completes the first stage of optimising validation performance described in #93
It builds on the functionality introduced to:
and propagates these to relevant
check_*()
andvalidate_*()
functions to enable output type validation batching as well as the creation of more performant expanded grids during validation.This has lead to significant performance improvements, especially when dealing with complex configs with derived task IDs (see below)
I've also added a section to the documentation to help admins configure their validation workflows to make use of the
derived_taks_ids
feature as at the moment this can only be done manually (but see hubverse-org/schemas#96 for future plans).Benchmarks
So far I have tested with a test model output file and new Respicast config file supplied by @M-7th which includes a derived
target_end_date
task id and have got extremely promising results with these fixes!!View Respicast config
System Info
Created on 2024-08-16 with reprex v2.1.0