Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

98/ Subset expanded grid of valid values by output type #107

Merged
merged 40 commits into from
Aug 16, 2024
Merged
Show file tree
Hide file tree
Changes from 9 commits
Commits
Show all changes
40 commits
Select commit Hold shift + click to select a range
70fcdc2
Add argument to create output type subsets of valid value grids
annakrystalli Aug 8, 2024
4468d1b
streamline subsetting of round_config
annakrystalli Aug 8, 2024
6db473c
Document
annakrystalli Aug 8, 2024
3ef92ff
Move config_tasks round utilities to separate file
annakrystalli Aug 9, 2024
21d2f85
Add arg to specify derived task-ids to ignore. Relates to #93
annakrystalli Aug 9, 2024
2aa70e9
optimise check_tbl_values by output_type batching & ignoring derived …
annakrystalli Aug 9, 2024
93a4bf2
Ignore derived task-ids in check_tbl_values_required
annakrystalli Aug 12, 2024
76c8254
Add some comments in test
annakrystalli Aug 13, 2024
27ef0e6
Remove deprecated TODO note
annakrystalli Aug 12, 2024
cead075
delete dplyr loading
annakrystalli Aug 13, 2024
c9ea7e0
Don't lint helper
annakrystalli Aug 13, 2024
a6d0981
style to fix lint issues
annakrystalli Aug 13, 2024
8815afc
Fix lint issues
annakrystalli Aug 13, 2024
2a2e948
add match_tbl_to_model_task function
annakrystalli Aug 14, 2024
2bc332c
refactor check_tbl_value_col to work on smaller subsets of data
annakrystalli Aug 14, 2024
7fe78b4
propagate output subsetting and derived_task_ids arg to spl checks
annakrystalli Aug 14, 2024
1b003f2
propagate output subsetting and derived_task_ids arg to higher level …
annakrystalli Aug 14, 2024
0686ff4
Fix error_tbl bug by using original tbl rowids
annakrystalli Aug 14, 2024
df0815e
remove unused functions
annakrystalli Aug 14, 2024
4f18bca
Fix linter issues
annakrystalli Aug 14, 2024
809a340
Bump version
annakrystalli Aug 14, 2024
451b7a2
Add more detail to NEWS
annakrystalli Aug 14, 2024
d204dae
Merge branch '98/subset-grid-by-out-type' into 93/add-derived-tid-arg
annakrystalli Aug 14, 2024
c6c5423
Add missing early return.
annakrystalli Aug 15, 2024
70afb1d
Make ignored derived task id message clearer
annakrystalli Aug 15, 2024
74ad9a4
rerun snapshot
annakrystalli Aug 15, 2024
1811ede
remove superfluous check
annakrystalli Aug 15, 2024
2cce180
Merge pull request #108 from hubverse-org/93/add-derived-tid-arg
annakrystalli Aug 15, 2024
4a45645
merge earlier PRs
annakrystalli Aug 15, 2024
edbf939
Throw error if invalid output_type supplied instead of ignoring.
annakrystalli Aug 15, 2024
28715b3
add output_types and derived_task_ids arguments to submission_tmpl
annakrystalli Aug 15, 2024
8ec8261
Add note on derived task IDs in pkgdown docs
annakrystalli Aug 15, 2024
2163c5c
Add fn comments
annakrystalli Aug 16, 2024
23b4655
add tests that explicitly check derived_task_ids are ignored
annakrystalli Aug 16, 2024
7b8a953
Update R/check_tbl_values.R
annakrystalli Aug 16, 2024
29c3909
Merge pull request #109 from hubverse-org/93/3-batch-value-validation
annakrystalli Aug 16, 2024
50a1cf9
Merge branch 'main' into 98/subset-grid-by-out-type
annakrystalli Aug 16, 2024
d3af4a7
Fix merge errors
annakrystalli Aug 16, 2024
3359fd9
Use inst example hub, not testdata one in submission_tmpl examples
annakrystalli Aug 16, 2024
bc18984
Add note in NEWS re submission_tmpl args. Correct formating.
annakrystalli Aug 16, 2024
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
3 changes: 3 additions & 0 deletions .lintr
Original file line number Diff line number Diff line change
Expand Up @@ -4,3 +4,6 @@ linters: linters_with_defaults(
object_length_linter = object_length_linter(length = 50L),
cyclocomp_linter = cyclocomp_linter(complexity_limit = 20L)
)
exclusions: list(
"tests/testthat/helper.R"
)
6 changes: 5 additions & 1 deletion R/check_tbl_spl_compound_taskid_set.R
Original file line number Diff line number Diff line change
Expand Up @@ -59,7 +59,11 @@ check_tbl_spl_compound_taskid_set <- function(tbl, round_id, file_path, hub_path
details = compile_msg(compound_taskid_set),
errors = compile_errors(compound_taskid_set),
error = TRUE,
compound_taskid_set = if (check) { compound_taskid_set } else { NA }
compound_taskid_set = if (check) {
compound_taskid_set
} else {
NA
}
)
}

Expand Down
2 changes: 0 additions & 2 deletions R/check_tbl_spl_compound_tid.R
Original file line number Diff line number Diff line change
Expand Up @@ -34,8 +34,6 @@ check_tbl_spl_compound_tid <- function(tbl, round_id, file_path, hub_path,
}

hash_tbl <- spl_hash_tbl(tbl, round_id, config_tasks, compound_taskid_set)
# TODO: Currently, samples must strictly match the compound task ID set expectations
# and cannot handle coarser-grained compound task ID sets.
n_tbl <- hash_tbl[hash_tbl$n_compound_idx > 1L, ]

check <- nrow(n_tbl) == 0L
Expand Down
77 changes: 67 additions & 10 deletions R/expand_model_out_grid.R
Original file line number Diff line number Diff line change
Expand Up @@ -20,6 +20,8 @@
#' in the round. Can be used to override the compound task ID set defined in the
#' config. If `NULL` is provided for a given modeling task, a compound task ID set of
#' all task IDs is used.
#' @param output_types character vector of output type names to include.
#' Use to subset for grids for specific output types.
#'
#' @return If `bind_model_tasks = TRUE` (default) a tibble or arrow table
#' containing all possible task ID and related output type ID
Expand Down Expand Up @@ -126,16 +128,12 @@ expand_model_out_grid <- function(config_tasks,
as_arrow_table = FALSE,
bind_model_tasks = TRUE,
include_sample_ids = FALSE,
compound_taskid_set = NULL) {
round_idx <- hubUtils::get_round_idx(config_tasks, round_id)
compound_taskid_set = NULL,
output_types = NULL) {
checkmate::assert_list(compound_taskid_set, null.ok = TRUE)
output_type_id_datatype <- rlang::arg_match(output_type_id_datatype)

round_config <- purrr::pluck(
config_tasks,
"rounds",
round_idx
)
output_types <- validate_output_types(output_types, config_tasks, round_id)
round_config <- get_round_config(config_tasks, round_id)

task_id_l <- purrr::map(
round_config[["model_tasks"]],
Expand All @@ -158,7 +156,13 @@ expand_model_out_grid <- function(config_tasks,
output_type_l <- purrr::map(
round_config[["model_tasks"]],
function(.x) {
.x[["output_type"]]
out <- .x[["output_type"]]
if (is.null(output_types)) {
out
} else {
mt_output_types <- output_types[output_types %in% names(out)]
out[mt_output_types]
}
}
) %>%
purrr::map(
Expand Down Expand Up @@ -258,7 +262,6 @@ process_mt_grid_outputs <- function(x, config_tasks, all_character,
as_arrow_table = TRUE,
bind_model_tasks = TRUE,
output_type_id_datatype = output_type_id_datatype) {

if (bind_model_tasks) {
# To bind multiple modeling task grids together, we need to ensure they contain
# the same columns. Any missing columns are padded with NAs.
Expand Down Expand Up @@ -304,6 +307,9 @@ process_mt_grid_outputs <- function(x, config_tasks, all_character,

# Pad any columns in all_cols missing in x of with new NA columns
pad_missing_cols <- function(x, all_cols) {
if (ncol(x) == 0L) {
return(x)
}
if (inherits(x, "data.frame")) {
x[, all_cols[!all_cols %in% names(x)]] <- NA
return(x[, all_cols])
Expand Down Expand Up @@ -472,3 +478,54 @@ extract_mt_output_type_ids <- function(x, config_tid) {
}
)
}


get_round_config <- function(config_tasks, round_id) {
round_idx <- hubUtils::get_round_idx(config_tasks, round_id)
purrr::pluck(
config_tasks,
"rounds",
round_idx
)
}

get_round_output_types <- function(config_tasks, round_id) {
round_config <- get_round_config(config_tasks, round_id)
purrr::map(
round_config[["model_tasks"]],
~ .x[["output_type"]]
)
}

get_round_output_type_names <- function(config_tasks, round_id,
collapse = TRUE) {
out <- get_round_output_types(config_tasks, round_id) %>%
purrr::map(names)

if (collapse) {
purrr::flatten_chr(out) %>%
unique()
} else {
out
}
}

validate_output_types <- function(output_types, config_tasks, round_id,
call = rlang::caller_call()) {
checkmate::assert_character(output_types, null.ok = TRUE)
if (is.null(output_types)) {
return(NULL)
}
round_output_types <- get_round_output_type_names(config_tasks, round_id)
valid_output_types <- intersect(output_types, round_output_types)
if (length(valid_output_types) == 0L) {
cli::cli_abort(
c(
"x" = "{.val {output_types}} {?is/are} not valid output type{?s}.",
"i" = "{.arg output_types} must be members of: {.val {round_output_types}}"
),
call = call
)
}
valid_output_types
}
2 changes: 1 addition & 1 deletion R/v3-sample-utils.R
Original file line number Diff line number Diff line change
Expand Up @@ -59,7 +59,7 @@ spl_hash_tbl <- function(tbl, round_id, config_tasks, compound_taskid_set = NULL
)
) %>%
purrr::compact() %>%
purrr::imap( ~ dplyr::mutate(.x, mt_id = as.integer(.y))) %>% # add mt_id
purrr::imap(~dplyr::mutate(.x, mt_id = as.integer(.y))) %>% # add mt_id
purrr::list_rbind()
}

Expand Down
6 changes: 5 additions & 1 deletion man/expand_model_out_grid.Rd

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

100 changes: 100 additions & 0 deletions tests/testthat/_snaps/expand_model_out_grid.md
Original file line number Diff line number Diff line change
Expand Up @@ -485,6 +485,106 @@
10 2022-12-26 wk ahead inc flu h~ 1 01 sample 4
# i 32 more rows

# expand_model_out_grid output type subsetting works

Code
expand_model_out_grid(config_tasks, round_id = "2022-12-26",
include_sample_ids = TRUE, bind_model_tasks = FALSE, output_types = c("pmf",
"sample"), )
Output
[[1]]
# A tibble: 6 x 6
forecast_date target horizon location output_type output_type_id
<date> <chr> <int> <chr> <chr> <chr>
1 2022-12-26 wk ahead inc flu ho~ 2 US sample 1
2 2022-12-26 wk ahead inc flu ho~ 2 01 sample 1
3 2022-12-26 wk ahead inc flu ho~ 2 02 sample 1
4 2022-12-26 wk ahead inc flu ho~ 1 US sample 2
5 2022-12-26 wk ahead inc flu ho~ 1 01 sample 2
6 2022-12-26 wk ahead inc flu ho~ 1 02 sample 2

[[2]]
# A tibble: 30 x 6
forecast_date target horizon location output_type output_type_id
<date> <chr> <int> <chr> <chr> <chr>
1 2022-12-26 wk flu hosp rate c~ 2 US pmf large_decrease
2 2022-12-26 wk flu hosp rate c~ 1 US pmf large_decrease
3 2022-12-26 wk flu hosp rate c~ 2 01 pmf large_decrease
4 2022-12-26 wk flu hosp rate c~ 1 01 pmf large_decrease
5 2022-12-26 wk flu hosp rate c~ 2 02 pmf large_decrease
6 2022-12-26 wk flu hosp rate c~ 1 02 pmf large_decrease
7 2022-12-26 wk flu hosp rate c~ 2 US pmf decrease
8 2022-12-26 wk flu hosp rate c~ 1 US pmf decrease
9 2022-12-26 wk flu hosp rate c~ 2 01 pmf decrease
10 2022-12-26 wk flu hosp rate c~ 1 01 pmf decrease
# i 20 more rows


---

Code
expand_model_out_grid(config_tasks, round_id = "2022-12-26",
include_sample_ids = TRUE, bind_model_tasks = FALSE, output_types = "sample", )
Output
[[1]]
# A tibble: 6 x 6
forecast_date target horizon location output_type output_type_id
<date> <chr> <int> <chr> <chr> <chr>
1 2022-12-26 wk ahead inc flu ho~ 2 US sample 1
2 2022-12-26 wk ahead inc flu ho~ 2 01 sample 1
3 2022-12-26 wk ahead inc flu ho~ 2 02 sample 1
4 2022-12-26 wk ahead inc flu ho~ 1 US sample 2
5 2022-12-26 wk ahead inc flu ho~ 1 01 sample 2
6 2022-12-26 wk ahead inc flu ho~ 1 02 sample 2

[[2]]
# A tibble: 0 x 0


---

Code
expand_model_out_grid(config_tasks, round_id = "2022-12-26",
include_sample_ids = TRUE, bind_model_tasks = TRUE, output_types = "sample", )
Output
# A tibble: 6 x 6
forecast_date target horizon location output_type output_type_id
<date> <chr> <int> <chr> <chr> <chr>
1 2022-12-26 wk ahead inc flu ho~ 2 US sample 1
2 2022-12-26 wk ahead inc flu ho~ 2 01 sample 1
3 2022-12-26 wk ahead inc flu ho~ 2 02 sample 1
4 2022-12-26 wk ahead inc flu ho~ 1 US sample 2
5 2022-12-26 wk ahead inc flu ho~ 1 01 sample 2
6 2022-12-26 wk ahead inc flu ho~ 1 02 sample 2

---

Code
expand_model_out_grid(config_tasks, round_id = "2022-12-26",
include_sample_ids = FALSE, bind_model_tasks = TRUE, output_types = c(
"random", "sample"), )
annakrystalli marked this conversation as resolved.
Show resolved Hide resolved
Output
# A tibble: 6 x 6
forecast_date target horizon location output_type output_type_id
<date> <chr> <int> <chr> <chr> <chr>
1 2022-12-26 wk ahead inc flu ho~ 2 US sample <NA>
2 2022-12-26 wk ahead inc flu ho~ 1 US sample <NA>
3 2022-12-26 wk ahead inc flu ho~ 2 01 sample <NA>
4 2022-12-26 wk ahead inc flu ho~ 1 01 sample <NA>
5 2022-12-26 wk ahead inc flu ho~ 2 02 sample <NA>
6 2022-12-26 wk ahead inc flu ho~ 1 02 sample <NA>

---

Code
expand_model_out_grid(config_tasks, round_id = "2022-12-26",
include_sample_ids = FALSE, bind_model_tasks = FALSE, output_types = c(
"random"), )
Condition
Error in `expand_model_out_grid()`:
x "random" is not valid output type.
i `output_types` must be members of: "sample", "mean", and "pmf"

# expand_model_out_grid errors correctly

Code
Expand Down
3 changes: 0 additions & 3 deletions tests/testthat/helper.R
Original file line number Diff line number Diff line change
@@ -1,6 +1,3 @@
suppressPackageStartupMessages({
library(dplyr)
})
# Function to create a valid sample submission for the testdata/hub-spl hub
create_spl_file <- function(round_id, compound_taskid_set = NULL,
hub_path = test_path("testdata/hub-spl"),
Expand Down
22 changes: 11 additions & 11 deletions tests/testthat/test-check_tbl_spl_compound_taskid_set.R
Original file line number Diff line number Diff line change
Expand Up @@ -93,17 +93,17 @@ test_that("Different compound_taskid_sets work", {
# Mock the config file to include all task ids a derived task id depends on
# in the compound_taskid_set but exclude the derived task id itself.
# Currently will fail
config_tasks_full_ctids <- purrr::modify_in(
hubUtils::read_config_file(
fs::path(hub_path, "hub-config", "tasks.json")
),
list(
"rounds", 1, "model_tasks", 2,
"output_type", "sample",
"output_type_id_params", "compound_taskid_set"
),
~ c("reference_date", "horizon", "location", "variant")
)
config_tasks_full_ctids <- purrr::modify_in(
hubUtils::read_config_file(
fs::path(hub_path, "hub-config", "tasks.json")
),
list(
"rounds", 1, "model_tasks", 2,
"output_type", "sample",
"output_type_id_params", "compound_taskid_set"
),
~ c("reference_date", "horizon", "location", "variant")
)

mockery::stub(
check_tbl_spl_compound_taskid_set,
Expand Down
13 changes: 7 additions & 6 deletions tests/testthat/test-check_tbl_spl_compound_tid.R
Original file line number Diff line number Diff line change
Expand Up @@ -50,10 +50,10 @@ test_that("Overriding compound_taskid_set in check_tbl_spl_compound_tid works",
c("reference_date", "horizon")
)
tbl_coarse <- create_spl_file("2022-10-22",
compound_taskid_set = compound_taskid_set,
write = FALSE,
out_datatype = "chr",
n_samples = 1L
compound_taskid_set = compound_taskid_set,
write = FALSE,
out_datatype = "chr",
n_samples = 1L
)
elray1 marked this conversation as resolved.
Show resolved Hide resolved


Expand All @@ -65,9 +65,10 @@ test_that("Overriding compound_taskid_set in check_tbl_spl_compound_tid works",
)
)

# Validation providing coarser compound taskid set succeeds
# Validation providing coarser compound taskid set succeeds
expect_snapshot(
check_tbl_spl_compound_tid(tbl_coarse, round_id, file_path, hub_path,
compound_taskid_set = compound_taskid_set)
compound_taskid_set = compound_taskid_set
)
elray1 marked this conversation as resolved.
Show resolved Hide resolved
)
})
14 changes: 7 additions & 7 deletions tests/testthat/test-check_tbl_spl_n.R
Original file line number Diff line number Diff line change
Expand Up @@ -77,10 +77,10 @@ test_that("Overriding compound_taskid_set in check_tbl_spl_compound_tid works",
c("reference_date", "horizon")
)
tbl_coarse <- create_spl_file("2022-10-22",
compound_taskid_set = compound_taskid_set,
write = FALSE,
out_datatype = "chr",
n_samples = 1L
compound_taskid_set = compound_taskid_set,
write = FALSE,
out_datatype = "chr",
n_samples = 1L
)
elray1 marked this conversation as resolved.
Show resolved Hide resolved

# Validation of coarser files should return check failure.
Expand All @@ -104,9 +104,9 @@ test_that("Overriding compound_taskid_set in check_tbl_spl_compound_tid works",

# Create 100 spls of each compound idx
tbl_full <- create_spl_file("2022-10-22",
compound_taskid_set = compound_taskid_set,
write = FALSE,
out_datatype = "chr"
compound_taskid_set = compound_taskid_set,
write = FALSE,
out_datatype = "chr"
)
elray1 marked this conversation as resolved.
Show resolved Hide resolved

# This succeeds!
Expand Down
3 changes: 2 additions & 1 deletion tests/testthat/test-check_tbl_spl_non_compound_tid.R
Original file line number Diff line number Diff line change
Expand Up @@ -68,6 +68,7 @@ test_that("Overriding compound_taskid_set in check_tbl_spl_compound_tid works",
# also succeeds.
expect_snapshot(
check_tbl_spl_non_compound_tid(tbl_coarse, round_id, file_path, hub_path,
compound_taskid_set = compound_taskid_set)
compound_taskid_set = compound_taskid_set
)
elray1 marked this conversation as resolved.
Show resolved Hide resolved
)
})
Loading
Loading