Add validate-pr and validate-submission vignettes. Resolves #56 & #64

hubverse-org · Dec 22, 2023 · commit df290cb
1 parent fde519f

Showing 2 changed files with 320 additions and 0 deletions.
diff --git a/vignettes/articles/validate-pr.Rmd b/vignettes/articles/validate-pr.Rmd
@@ -0,0 +1,183 @@
+---
+title: "Validating Pull Requests on GitHub"
+---
+
+```{r, include = FALSE}
+knitr::opts_chunk$set(
+  collapse = TRUE,
+  comment = "#>"
+)
+```
+
+```{r setup}
+library(hubValidations)
+```
+
+## Running validation checks on a Pull Request with `validate_pr()`
+
+The `validate_pr()` function is designed to validate team submissions made through Pull Requests on GitHub.
+Only model output and model metadata files are validated individually, by running `validate_submission()` on each file.
+Hub config files are, however, also validated as part of the checks.
+Any other files included in the PR are ignored, but are flagged in a message.
+
+### Deploying `validate_pr()` through a GitHub Action workflow
+
+The most common way to deploy `validate_pr()` is through a GitHub Action that triggers when a pull request containing changes to model output or model metadata files is opened. 
+The hubverse maintains the [**`validate-submission.yaml`**](https://github.com/Infectious-Disease-Modeling-Hubs/hubverse-actions/tree/main/validate-submission) GitHub Action workflow template for deploying `validate_pr()`.
+
+The latest release of the workflow can be added to a hub's GitHub Action workflows using the `hubCI` package:
+
+```{r, eval = FALSE}
+hubCI::use_hub_github_action("validate-submission")
+```
+
+
+The pertinent section of the workflow is:
+
+```yaml
+      - name: Run validations
+        env:
+          PR_NUMBER: ${{ github.event.number }}
+        run: |
+          library("hubValidations")
+          v <- hubValidations::validate_pr(
+              gh_repo = Sys.getenv("GITHUB_REPOSITORY"),
+              pr_number = Sys.getenv("PR_NUMBER"),
+              skip_submit_window_check = FALSE
+          )
+          hubValidations::check_for_errors(v, verbose = TRUE)
+        shell: Rscript {0}
+```
+Here, `validate_pr()` is called on the contents of the current Pull Request and the results (an S3 `<hub_validations>` class object) are stored in `v`. `check_for_errors()` is then used to signal whether validation has passed or failed overall and to summarise any validation failures.
+
+
+### Skipping submission window checks
+
+Most hubs require that model output files for a given round are submitted within a submission window [defined in the `"submissions_due"` property of the `tasks.json` hub config file](https://hubdocs.readthedocs.io/en/latest/quickstart-hub-admin/tasks-config.html#setting-up-submissions-due).
+
+`validate_pr()` includes submission window checks for model output files and returns a `<warning/check_failure>` condition class object if a file is submitted outside the accepted submission window.
+
+To disable submission window checks, set argument `skip_submit_window_check` to `TRUE`.
+
+### Configuring file modification/deletion/renaming checks
+
+For most hubs, **modification, renaming or deletion of previously submitted model output files** and **deletion or renaming of previously submitted model metadata files** are not desirable without justification. Such changes should therefore trigger validation failure and notify hub maintainers of the files affected.
+At the same time, most hubs prefer to allow modifications to model output files within their submission window.
+
+Reflecting these preferences, by default `validate_pr()` checks for modification, renaming or deletion of previously submitted model output files and for deletion or renaming of previously submitted model metadata files. It appends an `<error/check_error>` condition class object to the output for each modification, deletion or renaming detected.
+It does, however, allow modifications to model output files within their submission window.
+
+
+```{r}
+# Clone the "test-mod-del" branch of the test hub into a temporary directory
+temp_hub <- fs::path(tempdir(), "mod_del_hub")
+gert::git_clone(
+  url = "https://github.com/Infectious-Disease-Modeling-Hubs/ci-testhub-simple",
+  path = temp_hub,
+  branch = "test-mod-del"
+)
+```
+
+
+```{r}
+# Validate PR #6 of the test hub, skipping the submission window check
+v <- validate_pr(
+  hub_path = temp_hub,
+  gh_repo = "Infectious-Disease-Modeling-Hubs/ci-testhub-simple",
+  pr_number = 6,
+  skip_submit_window_check = TRUE
+)
+
+v
+```
+
+
+These settings can be modified if required through arguments `file_modification_check` and `allow_submit_window_mods`.
+
+- **`file_modification_check`** controls whether modification/deletion checks are performed and what is returned if modifications/deletions are detected. It accepts one of the following values:
+
+  - **`"error"`**: Appends an `<error/check_error>` condition class object for each applicable modified/deleted file. Will result in validation workflow failure.
+  - **`"warning"`**: Appends a `<warning/check_failure>` condition class object for each applicable modified/deleted file. Will result in validation workflow failure.
+  - **`"message"`**: Appends a `<message/check_info>` condition class object for each applicable modified/deleted file. Will not result in validation workflow failure.
+  - **`"none"`**: No modification/deletion checks are performed.
+
+- **`allow_submit_window_mods`** controls whether modifications/deletions of model output files are allowed within their submission windows. It is set to `TRUE` by default but can be set to `FALSE` if modifications/deletions are not allowed, regardless of timing.
+It is ignored when checking model metadata files, as well as when `file_modification_check` is set to `"none"`.
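+
+To make the interaction between these two arguments concrete, the chunk below sketches the decision rule described above as a small standalone function. This is a simplified illustration under stated assumptions, not `hubValidations` code. `mod_check_class()` and its `file_type` and `in_submit_window` arguments are hypothetical. The function only returns the name of the condition class that would be appended for a single flagged file, or `NA` if nothing would be appended.
+
+```{r}
+# Hypothetical helper (not part of hubValidations): returns the condition
+# class appended for one flagged modified/renamed/deleted file, or NA if
+# nothing would be appended. For model metadata files only deletion and
+# renaming are flagged, so it assumes the change is one that is checked.
+mod_check_class <- function(file_modification_check = c("error", "warning", "message", "none"),
+                            allow_submit_window_mods = TRUE,
+                            file_type = c("model_output", "model_metadata"),
+                            in_submit_window = FALSE) {
+  file_modification_check <- match.arg(file_modification_check)
+  file_type <- match.arg(file_type)
+
+  # "none" switches modification/deletion checks off entirely
+  if (file_modification_check == "none") {
+    return(NA_character_)
+  }
+  # The submission window exemption only applies to model output files
+  if (file_type == "model_output" && allow_submit_window_mods && in_submit_window) {
+    return(NA_character_)
+  }
+  switch(file_modification_check,
+    error = "check_error",
+    warning = "check_failure",
+    message = "check_info"
+  )
+}
+
+mod_check_class("warning")
+mod_check_class("error", in_submit_window = TRUE)
+mod_check_class("error", allow_submit_window_mods = FALSE, in_submit_window = TRUE)
+```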
+
+
+<div class="alert alert-warning" role="alert">
+
+#### Warning
+
+Note that to establish **relative** submission windows when performing modification/deletion checks with `allow_submit_window_mods` set to `TRUE`, the reference date is taken to be the `round_id` extracted from the file path.
+This is because dates cannot be extracted from the columns of deleted files.
+If hub submission window reference dates do not match the round IDs in file paths, `allow_submit_window_mods` will currently not work correctly and is best set to `FALSE`.
+This only applies to hubs/rounds where submission windows are determined relative to a reference date, not when explicit submission window start and end dates are provided in the config.
+
+For more details on submission window config, see [Setting up `"submissions_due"`](https://hubdocs.readthedocs.io/en/latest/quickstart-hub-admin/tasks-config.html#setting-up-submissions-due) in the hubverse hubDocs.
+
+</div>
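+
+The following sketch shows how a relative submission window can be derived from the round ID in a file path, as described in the warning above. It assumes that file names start with a round ID in `YYYY-MM-DD` format (as in `2022-10-08-team1-goodmodel.csv`). It uses hypothetical offsets of 6 days before and 1 day after that date, whereas real offsets come from the hub's `tasks.json` config. The helper names `window_from_path()` and `in_window()` are hypothetical and not part of `hubValidations`.
+
+```{r}
+# Hypothetical helpers (not part of hubValidations). Assumes the file name
+# starts with a "YYYY-MM-DD" round ID and uses illustrative day offsets.
+window_from_path <- function(file_path, start = -6L, end = 1L) {
+  round_id <- as.Date(substr(basename(file_path), 1L, 10L))
+  list(start = round_id + start, end = round_id + end)
+}
+
+# TRUE if a single date falls within the window (inclusive)
+in_window <- function(date, window) {
+  date >= window$start && date <= window$end
+}
+
+w <- window_from_path("team1-goodmodel/2022-10-08-team1-goodmodel.csv")
+w
+in_window(as.Date("2022-10-05"), w)
+```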
+
+
+## Checking for validation failures with `check_for_errors()`
+
+`check_for_errors()` is used to inspect a `hub_validations` class object, determine whether overall validations have passed or failed and summarise any detected errors/failures.
+
+### Validation failure
+
+If any elements of the `hub_validations` object contain `<error/check_error>`, `<warning/check_failure>` or `<error/check_exec_error>` condition class objects, the function throws an error and prints the messages from the failing checks.
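+
+Conceptually, the pass/fail decision depends only on the classes of the elements. The chunk below is a simplified sketch of that rule, not the `hubValidations` implementation. `has_failures()` and `mock_check()` are hypothetical helpers, and the mock objects only mimic the class structure of real `<hub_check>` objects.
+
+```{r}
+# Hypothetical sketch (not the hubValidations implementation)
+failing_classes <- c("check_error", "check_failure", "check_exec_error")
+
+# TRUE if any element inherits from at least one failing class
+has_failures <- function(v) {
+  any(vapply(v, inherits, logical(1), what = failing_classes))
+}
+
+# Mock objects that only mimic the class structure of <hub_check> results
+mock_check <- function(message, status, super) {
+  structure(
+    list(message = message, call = NULL),
+    class = c(status, "hub_check", super, "condition")
+  )
+}
+
+v_ok <- list(file_exists = mock_check("File exists.", "check_success", "message"))
+v_bad <- c(v_ok, list(valid_vals = mock_check("Invalid values.", "check_failure", "warning")))
+
+has_failures(v_ok)
+has_failures(v_bad)
+```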
+
+```{r, error=TRUE}
+# Clone the "pr-missing-taskid" branch of the test hub
+temp_hub <- fs::path(tempdir(), "invalid_sb_hub")
+gert::git_clone(
+  url = "https://github.com/Infectious-Disease-Modeling-Hubs/ci-testhub-simple",
+  path = temp_hub,
+  branch = "pr-missing-taskid"
+)
+
+# Validate PR #5, which is expected to fail validation
+v_fail <- validate_pr(
+  hub_path = temp_hub,
+  gh_repo = "Infectious-Disease-Modeling-Hubs/ci-testhub-simple",
+  pr_number = 5,
+  skip_submit_window_check = TRUE
+)
+
+check_for_errors(v_fail)
+```
+
+### Validation success
+
+If all validation checks pass, `check_for_errors()` invisibly returns `TRUE` and prints:
+
+```
+✔ All validation checks have been successful.
+```
+
+```{r}
+temp_hub <- fs::path(tempdir(), "valid_sb_hub")
+gert::git_clone(
+  url = "https://github.com/Infectious-Disease-Modeling-Hubs/ci-testhub-simple",
+  path = temp_hub,
+  branch = "pr-valid"
+)
+
+v_pass <- validate_pr(
+  hub_path = temp_hub,
+  gh_repo = "Infectious-Disease-Modeling-Hubs/ci-testhub-simple",
+  pr_number = 4,
+  skip_submit_window_check = TRUE
+)
+
+check_for_errors(v_pass)
+```
+
+
+### Verbose output
+
+To print the results of all checks rather than only summarising the checks that failed, set argument `verbose` to `TRUE`.
+
+```{r, error=TRUE}
+check_for_errors(v_fail, verbose = TRUE)
+
+
+check_for_errors(v_pass, verbose = TRUE)
+```
diff --git a/vignettes/articles/validate-submission.Rmd b/vignettes/articles/validate-submission.Rmd
@@ -0,0 +1,137 @@
+---
+title: "Validating submissions locally"
+---
+
+```{r, include = FALSE}
+knitr::opts_chunk$set(
+  collapse = TRUE,
+  comment = "#>"
+)
+```
+
+```{r setup}
+library(hubValidations)
+```
+
+While most hubs will have automated validation systems set up to check contributions during submission, `hubValidations` also provides functionality for validating files locally before submitting them.
+For this, submitting teams can use `validate_submission()` to validate their model output files prior to submitting.
+
+
+### Structure of `hub_validations` object
+
+
+`validate_submission()` returns an S3 object of class `<hub_validations>`. Each named element contains the result of an individual check and inherits from subclass `<hub_check>`. The name of each element is the name of the check.
+
+```{r}
+hub_path <- system.file("testhubs/simple", package = "hubValidations")
+validate_submission(hub_path,
+  file_path = "team1-goodmodel/2022-10-08-team1-goodmodel.csv"
+)
+```
+
+
+The super class returned depends on the status of the check:
+
+- If a check succeeds, a `<message/check_success>` condition class object is returned.
+
+- If a check is skipped, a `<message/check_info>` condition class object is returned.
+
+- Checks vary with respect to whether they return an `<error/check_error>` or `<warning/check_failure>` condition class object if the check fails. 
+Ultimately, both will cause overall validation to fail and the two classes are used primarily to communicate the severity of a failing check.
+
+### Validation early return
+
+Some checks are critical to downstream checks. If one of these fails, validation stops early and returns the results of the checks up to and including the failed critical check.
+Critical checks generally return an `<error/check_error>` condition class object when they fail.
+Any problems identified will need to be resolved and the function rerun for validation to proceed further.
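+
+The early-return behaviour can be pictured as running checks in sequence and breaking out of the loop once a critical check fails. The chunk below is a simplified sketch of that idea, not the `hubValidations` implementation. `run_checks()` and the check names are hypothetical, and results are recorded as class names rather than full `<hub_check>` objects.
+
+```{r}
+# Hypothetical sketch (not the hubValidations implementation): run checks
+# in order and stop after the first failing critical check.
+run_checks <- function(checks, critical) {
+  results <- list()
+  for (name in names(checks)) {
+    passed <- checks[[name]]()
+    results[[name]] <- if (passed) "check_success" else "check_error"
+    if (!passed && name %in% critical) {
+      break # early return: later checks are never run
+    }
+  }
+  results
+}
+
+# Illustrative checks; the second one fails and is marked critical
+checks <- list(
+  check_a = function() TRUE,
+  check_b = function() FALSE,
+  check_c = function() TRUE
+)
+res <- run_checks(checks, critical = "check_b")
+names(res)
+```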
+
+
+```{r}
+validate_submission(hub_path,
+  file_path = "team1-goodmodel/2022-10-15-hub-baseline.csv"
+)
+```
+
+### Execution errors
+
+If an execution error occurs in any of the checks, an `<error/check_exec_error>` condition class object is returned instead. For validation purposes, this has the same downstream effects as an `<error/check_error>` object.
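+
+One way to picture this is a wrapper that catches R errors raised while a check runs and converts them into a condition object, instead of letting validation crash. The sketch below is hypothetical: `safe_check()` is not a `hubValidations` function, and the returned object only mimics the class of a real `<error/check_exec_error>`.
+
+```{r}
+# Hypothetical wrapper (not part of hubValidations): converts an R error
+# raised while running a check into a "check_exec_error" condition object.
+safe_check <- function(check_fn, ...) {
+  tryCatch(
+    check_fn(...),
+    error = function(e) {
+      structure(
+        list(message = conditionMessage(e), call = conditionCall(e)),
+        class = c("check_exec_error", "error", "condition")
+      )
+    }
+  )
+}
+
+res <- safe_check(function(x) log(x), "a")
+class(res)
+res$message
+```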
+
+
+### Checking for errors with `check_for_errors()`
+
+You can check whether your file passes validation overall by passing the `hub_validations` object to `check_for_errors()`.
+
+If validation fails, an error will be thrown and the failing checks will be summarised.
+
+```{r, error=TRUE}
+validate_submission(hub_path,
+  file_path = "team1-goodmodel/2022-10-08-team1-goodmodel.csv"
+) %>%
+  check_for_errors()
+```
+
+
+
+### Skipping the submission window check
+
+If you are preparing your submission before the submission window opens, you might want to skip the submission window check.
+You can do so by setting argument `skip_submit_window_check` to `TRUE`.
+
+With this setting, the file from the previous example, which was valid except for failing the submission window check, now passes overall validation.
+
+```{r}
+validate_submission(hub_path,
+  file_path = "team1-goodmodel/2022-10-08-team1-goodmodel.csv",
+  skip_submit_window_check = TRUE
+) %>%
+  check_for_errors()
+```
+
+
+
+## Structure of a `<hub_check>` object
+
+Let's look more closely at the structure of the first few elements of the `hub_validations` object returned by `validate_submission()`:
+
+```{r}
+v <- validate_submission(hub_path,
+  file_path = "team1-goodmodel/2022-10-08-team1-goodmodel.csv"
+)
+
+str(head(v))
+```
+
+Each `<hub_check>` object contains the following elements:
+
+- `message`: the result message containing details about the check.
+- `where`: where the check was performed, usually the model output file name.
+- `call`: the function used to perform the check.
+- `use_cli_format`: whether the message is formatted using cli format; almost always `TRUE`.
+
+### Extra information
+
+Some `<hub_check>` objects contain extra information about the failing check to help identify affected rows in submissions.
+
+For example, the `<hub_check>` object returned by the `valid_vals` check contains an additional element called `error_tbl`, with details of the invalid value combinations in the affected rows. This check verifies that all columns in a model output file (excluding the `value` column) contain valid combinations of task ID / output type / output type ID values.
+
+To access `error_tbl` from the output of `validate_submission()` stored in an object `v`, you would use:
+
+```{r, eval=FALSE}
+v$valid_vals$error_tbl
+```
+
+
+## `validate_submission()` check details
+
+```{r, echo=FALSE}
+library(kableExtra)
+arrow::read_csv_arrow(system.file("check_table.csv", package = "hubValidations")) %>%
+  dplyr::select(-"parent fun", -"check fun") %>%
+  dplyr::mutate("Extra info" = dplyr::case_when(
+    is.na(.data$`Extra info`) ~ "",
+    TRUE ~ .data$`Extra info`
+  )) %>%
+  knitr::kable(caption = "Details of checks performed by `validate_submission()`") %>%
+  kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive")) %>%
+  column_spec(1, bold = TRUE)
+```