Add validate-pr and validate-submission vignettes. Resolves #56 & #64
annakrystalli committed Dec 22, 2023
1 parent fde519f commit df290cb
Showing 2 changed files with 320 additions and 0 deletions.
183 changes: 183 additions & 0 deletions vignettes/articles/validate-pr.Rmd
@@ -0,0 +1,183 @@
---
title: "Validating Pull Requests on GitHub"
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

```{r setup}
library(hubValidations)
```

## Running validation checks on a Pull Request with `validate_pr()`

The `validate_pr()` function is designed to validate team submissions made through Pull Requests on GitHub.
Only model output and model metadata files are individually validated using `validate_submission()` on each file.
As part of checks, however, hub config files are also validated.
Any other files included in the PR are ignored but flagged in a message.

### Deploying `validate_pr()` through a GitHub Action workflow

The most common way to deploy `validate_pr()` is through a GitHub Action that triggers when a pull request containing changes to model output or model metadata files is opened.
The hubverse maintains the [**`validate-submission.yaml`**](https://github.com/Infectious-Disease-Modeling-Hubs/hubverse-actions/tree/main/validate-submission) GitHub Action workflow template for deploying `validate_pr()`.

The latest release of the workflow can be added to a hub's GitHub Action workflows using the `hubCI` package:
```{r, eval = FALSE}
hubCI::use_hub_github_action("validate-submission")
```


The pertinent section of the workflow is:

```yaml
      - name: Run validations
        env:
          PR_NUMBER: ${{ github.event.number }}
        run: |
          library("hubValidations")
          v <- hubValidations::validate_pr(
              gh_repo = Sys.getenv("GITHUB_REPOSITORY"),
              pr_number = Sys.getenv("PR_NUMBER"),
              skip_submit_window_check = FALSE
          )
          hubValidations::check_for_errors(v, verbose = TRUE)
        shell: Rscript {0}
```
Here, `validate_pr()` is called on the contents of the current Pull Request and the result (an S3 `<hub_validations>` class object) is stored in `v`. `check_for_errors()` is then used to signal whether validation has passed or failed overall and to summarise any validation failures.


### Skipping submission window checks

Most hubs require that model output files for a given round are submitted within a submission window [defined in the `"submission_due"` property of the `tasks.json` hub config file](https://hubdocs.readthedocs.io/en/latest/quickstart-hub-admin/tasks-config.html#setting-up-submissions-due).

`validate_pr()` includes submission window checks for model output files and returns a `<warning/check_failure>` condition class object if a file is submitted outside the accepted submission window.

To disable submission window checks, argument `skip_submit_window_check` can be set to `TRUE`.

### Configuring file modification/deletion/renaming checks

For most hubs, **modification, renaming or deletion of previously submitted model output files** or **deletion/renaming of previously submitted model metadata files** is not desirable without justification. They should therefore trigger validation failure and notify hub maintainers of the files affected.
At the same time, most hubs prefer to allow modifications to model output files within their allowed submission window.

Reflecting these preferences, by default, `validate_pr()` checks for modification, renaming or deletion of previously submitted model output files and deletion/renaming of previously submitted model metadata files, and appends an `<error/check_error>` class object to the output for each file modification/deletion/renaming detected.
It does however allow modifications to model output files within their allowed submission window.


```{r}
temp_hub <- fs::path(tempdir(), "mod_del_hub")
gert::git_clone(
  url = "https://github.com/Infectious-Disease-Modeling-Hubs/ci-testhub-simple",
  path = temp_hub,
  branch = "test-mod-del"
)
```


```{r}
v <- validate_pr(
  hub_path = temp_hub,
  gh_repo = "Infectious-Disease-Modeling-Hubs/ci-testhub-simple",
  pr_number = 6,
  skip_submit_window_check = TRUE
)
v
```


These settings can be modified if required through the arguments `file_modification_check` and `allow_submit_window_mods`.

- **`file_modification_check`** controls whether modification/deletion checks are performed and what is returned if modifications/deletions are detected. It accepts one of the following values:

  - **`"error"`**: Appends an `<error/check_error>` condition class object for each applicable modified/deleted file. Will result in validation workflow failure.
  - **`"warning"`**: Appends a `<warning/check_warning>` condition class object for each applicable modified/deleted file. Will result in validation workflow failure.
  - **`"message"`**: Appends a `<message/check_info>` condition class object for each applicable modified/deleted file. Will not result in validation workflow failure.
  - **`"none"`**: No modification/deletion checks performed.

- **`allow_submit_window_mods`** controls whether modifications/deletions of model output files are allowed within their submission windows. It is set to `TRUE` by default but can be set to `FALSE` if modifications/deletions are not allowed, regardless of timing.
  It is ignored when checking model metadata files as well as when `file_modification_check` is set to `"none"`.
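
As a rough summary of the options listed above, the following sketch maps each `file_modification_check` value to the condition class it appends and to whether that class fails the workflow. `mod_check_outcome()` is a hypothetical helper written purely for illustration; it is not part of `hubValidations`, and the class labels are copied from the descriptions above.

```r
# Hypothetical helper for illustration only; not a hubValidations function.
# Maps a `file_modification_check` setting to the condition class described
# above and to whether that class fails the validation workflow.
mod_check_outcome <- function(file_modification_check = c("error", "warning", "message", "none")) {
  setting <- match.arg(file_modification_check)
  switch(setting,
    error = list(class = "error/check_error", fails_workflow = TRUE),
    warning = list(class = "warning/check_warning", fails_workflow = TRUE),
    message = list(class = "message/check_info", fails_workflow = FALSE),
    # No check is run, so nothing is appended to the output
    none = list(class = NA_character_, fails_workflow = FALSE)
  )
}

mod_check_outcome("message")$fails_workflow
mod_check_outcome()$class # uses the first option, "error", as the default
```

In short, only `"message"` and `"none"` let a Pull Request containing modified or deleted files pass.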


<div class="alert alert-warning" role="alert">

#### Warning

Note that to establish **relative** submission windows when performing modification/deletion checks and `allow_submit_window_mods` is `TRUE`, the reference date is taken as the `round_id` extracted from the file path.
This is because we cannot extract dates from columns of deleted files.
If hub submission window reference dates do not match round IDs in file paths, currently `allow_submit_window_mods` will not work correctly and is best set to `FALSE`.
This only relates to hubs/rounds where submission windows are determined relative to a reference date and not when explicit submission window start and end dates are provided in the config.

For more details on submission window config see [Setting up `"submission_due"`](https://hubdocs.readthedocs.io/en/latest/quickstart-hub-admin/tasks-config.html#setting-up-submissions-due) in the hubverse hubDocs.

</div>
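
To make the warning above more concrete, here is a minimal sketch of how a relative submission window could be derived from a file path. It assumes file names begin with a `YYYY-MM-DD` round ID and that the window is given as day offsets (`start`, `end`) from that date. Both helper functions and the offset values are hypothetical, and the sketch ignores times of day and time zones, so it should not be read as the package's actual implementation.

```r
# Hypothetical helpers for illustration only; not hubValidations functions.

# Assumes the file name starts with a YYYY-MM-DD round ID
round_id_from_path <- function(file_path) {
  as.Date(substr(basename(file_path), 1, 10))
}

# `start` and `end` are assumed day offsets relative to the round ID date
in_relative_window <- function(file_path, start, end, today = Sys.Date()) {
  ref <- round_id_from_path(file_path)
  today >= ref + start && today <= ref + end
}

path <- "team1-goodmodel/2022-10-08-team1-goodmodel.csv"
in_relative_window(path, start = -6, end = 1, today = as.Date("2022-10-05"))
in_relative_window(path, start = -6, end = 1, today = as.Date("2022-10-10"))
```

If the reference date configured for a round differs from the date in the file name, a calculation like this would place the window on the wrong days, which is why `allow_submit_window_mods` is best set to `FALSE` in that situation.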


## Checking for validation failures with `check_for_errors()`

`check_for_errors()` is used to inspect a `hub_validations` class object, determine whether overall validations have passed or failed and summarise any detected errors/failures.

### Validation failure

If any elements of the `hub_validations` object contain `<error/check_error>`, `<warning/check_warning>` or `<error/check_exec_error>` condition class objects, the function throws an error and prints the messages from the failing checks.

```{r, error=TRUE}
temp_hub <- fs::path(tempdir(), "invalid_sb_hub")
gert::git_clone(
  url = "https://github.com/Infectious-Disease-Modeling-Hubs/ci-testhub-simple",
  path = temp_hub,
  branch = "pr-missing-taskid"
)
v_fail <- validate_pr(
  hub_path = temp_hub,
  gh_repo = "Infectious-Disease-Modeling-Hubs/ci-testhub-simple",
  pr_number = 5,
  skip_submit_window_check = TRUE
)
check_for_errors(v_fail)
```
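
The pass/fail rule described at the start of this section can be sketched in base R with a toy stand-in for a `hub_validations` object. Everything here is an assumption made for illustration: `new_check()` and `toy_check_for_errors()` are hypothetical, the class vectors are simplified, and the rule is reduced to "any element inheriting from `error` or `warning` fails".

```r
# Toy stand-in for a <hub_validations> object: a named list of conditions.
# The class vectors are simplified assumptions, not the package's real ones.
new_check <- function(message, subclass, superclass) {
  structure(
    list(message = message),
    class = c(subclass, "hub_check", superclass, "condition")
  )
}

toy_v <- list(
  file_exists = new_check("File exists.", "check_success", "message"),
  valid_vals = new_check("Invalid values found.", "check_error", "error")
)

# Simplification: any element inheriting from "error" or "warning" fails
toy_check_for_errors <- function(x) {
  failed <- vapply(x, inherits, logical(1), what = c("error", "warning"))
  if (any(failed)) {
    stop(
      "Validation failed: ",
      paste(names(x)[failed], collapse = ", "),
      call. = FALSE
    )
  }
  invisible(TRUE)
}

toy_check_for_errors(toy_v[1]) # only the passing check: returns TRUE invisibly
try(toy_check_for_errors(toy_v)) # includes the failing check: throws an error
```

Throwing an error rather than returning `FALSE` is what makes a call like this suitable for a CI step, since the script exits with a non-zero status when validation fails.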

### Validation success

If all validation checks pass, `check_for_errors()` returns `TRUE` silently and prints:

```
✔ All validation checks have been successful.
```

```{r}
temp_hub <- fs::path(tempdir(), "valid_sb_hub")
gert::git_clone(
  url = "https://github.com/Infectious-Disease-Modeling-Hubs/ci-testhub-simple",
  path = temp_hub,
  branch = "pr-valid"
)
v_pass <- validate_pr(
  hub_path = temp_hub,
  gh_repo = "Infectious-Disease-Modeling-Hubs/ci-testhub-simple",
  pr_number = 4,
  skip_submit_window_check = TRUE
)
check_for_errors(v_pass)
```


### Verbose output

If printing the results of all checks is preferred instead of just summarising the results of checks that failed, argument `verbose` can be set to `TRUE`.

```{r, error=TRUE}
check_for_errors(v_fail, verbose = TRUE)
check_for_errors(v_pass, verbose = TRUE)
```
137 changes: 137 additions & 0 deletions vignettes/articles/validate-submission.Rmd
@@ -0,0 +1,137 @@
---
title: "Validating submissions locally"
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

```{r setup}
library(hubValidations)
```

While most hubs will have automated validation systems set up to check contributions during submission, `hubValidations` also provides functionality for validating files locally before submitting them.
For this, submitting teams can use `validate_submission()` to validate their model output files prior to submitting.


### Structure of `hub_validations` object


`validate_submission()` returns an S3 `<hub_validations>` class object. Each named element contains the result of an individual check and inherits from subclass `<hub_check>`. The name of each element is the name of the check.

```{r}
hub_path <- system.file("testhubs/simple", package = "hubValidations")
validate_submission(hub_path,
  file_path = "team1-goodmodel/2022-10-08-team1-goodmodel.csv"
)
```


The superclass returned depends on the status of the check:

- If a check succeeds, a `<message/check_success>` condition class object is returned.

- If a check is skipped, a `<message/check_info>` condition class object is returned.

- Checks vary with respect to whether they return an `<error/check_error>` or `<warning/check_failure>` condition class object if the check fails.
Ultimately, both will cause overall validation to fail and the two classes are used primarily to communicate the severity of a failing check.
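
As an informal illustration of the class scheme above, the sketch below builds a few toy conditions in base R and reads off each check's status from its classes. The `add_class()` helper, the check names and the messages are made up for this example; real `<hub_check>` objects carry more classes and fields.

```r
# Hypothetical helper: prepend a status subclass to a base R condition
add_class <- function(cnd, subclass) {
  class(cnd) <- c(subclass, class(cnd))
  cnd
}

# Toy results keyed by check name (names and messages are illustrative)
toy_v <- list(
  check_a = add_class(simpleMessage("Check passed."), "check_success"),
  check_b = add_class(simpleWarning("Check failed."), "check_failure"),
  check_c = add_class(simpleError("Check failed badly."), "check_error")
)

# The first class of each element gives the status of the check
status <- vapply(toy_v, function(x) class(x)[1], character(1))
status

# Both the warning and the error superclass count as failing
failing <- names(toy_v)[vapply(toy_v, inherits, logical(1), what = c("error", "warning"))]
failing
```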

### Validation early return

Some checks are critical to downstream checks. If one of them fails, validation stops early and returns the results of the checks up to and including the critical check that failed.
Such checks generally return an `<error/check_error>` condition class object.
Any problems identified will need to be resolved and the function rerun for validation to proceed further.


```{r}
validate_submission(hub_path,
  file_path = "team1-goodmodel/2022-10-15-hub-baseline.csv"
)
```
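
The early return behaviour can be pictured with a small base R sketch. The check functions, the `critical` vector and `run_checks()` are all hypothetical stand-ins written for this example; they only mimic the control flow described above and are not how `hubValidations` is implemented.

```r
# Hypothetical checks for illustration; each returns a list with a
# `passed` flag. These are not hubValidations functions.
checks <- list(
  first_check = function(path) list(passed = TRUE),
  second_check = function(path) list(passed = FALSE), # critical check fails
  third_check = function(path) list(passed = TRUE)
)
critical <- c("first_check", "second_check")

run_checks <- function(path, checks, critical) {
  results <- list()
  for (name in names(checks)) {
    results[[name]] <- checks[[name]](path)
    # Stop early: downstream checks cannot run reliably after this failure
    if (!results[[name]]$passed && name %in% critical) {
      return(results)
    }
  }
  results
}

res <- run_checks("team1-goodmodel/2022-10-15-hub-baseline.csv", checks, critical)
names(res) # "third_check" is absent because it was never run
```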

### Execution Errors

If an execution error occurs in any of the checks, an `<error/check_exec_error>` is returned instead. For validation purposes, this results in the same downstream effects as an `<error/check_error>` object.


### Checking for errors with `check_for_errors()`

You can check whether your file will pass validation overall by passing the `hub_validations` object to `check_for_errors()`.

If validation fails, an error will be thrown and the failing checks will be summarised.

```{r, error=TRUE}
validate_submission(hub_path,
  file_path = "team1-goodmodel/2022-10-08-team1-goodmodel.csv"
) %>%
  check_for_errors()
```



### Skipping the submission window check

If you are preparing your submission prior to the submission window opening, you might want to skip the submission window check.
You can do so by setting argument `skip_submit_window_check` to `TRUE`.

As a result, the previous file, which was valid except for failing the submission window check, now passes overall validation.

```{r}
validate_submission(hub_path,
  file_path = "team1-goodmodel/2022-10-08-team1-goodmodel.csv",
  skip_submit_window_check = TRUE
) %>%
  check_for_errors()
```



## Structure of a `<hub_check>` object

Let's look more closely at the structure of the first few elements of the `hub_validations` object returned by `validate_submission()`:

```{r}
v <- validate_submission(hub_path,
  file_path = "team1-goodmodel/2022-10-08-team1-goodmodel.csv"
)
str(head(v))
```

Each `<hub_check>` object contains the following elements:

- `message`: the result message containing details about the check.
- `where`: where the check was performed, usually the model output file name.
- `call`: the function used to perform the check.
- `use_cli_format`: whether the message is formatted using cli format, almost always `TRUE`.

### Extra information

Some `<hub_check>` objects contain extra information about the failing check to help identify affected rows in submissions.

For example, the `valid_vals` check verifies that all columns in a model output file (excluding the `value` column) contain valid combinations of task ID / output type / output type ID values. The `<hub_check>` object it returns contains an additional element called `error_tbl`, with details of the invalid value combinations in the rows affected.

To access `error_tbl` from the output of `validate_submission()` stored in an object `v`, you would use:

```{r, eval=FALSE}
v$valid_vals$error_tbl
```


## `validate_submission()` check details

```{r, echo=FALSE}
library(kableExtra)
arrow::read_csv_arrow(system.file("check_table.csv", package = "hubValidations")) %>%
  dplyr::select(-"parent fun", -"check fun") %>%
  dplyr::mutate("Extra info" = dplyr::case_when(
    is.na(.data$`Extra info`) ~ "",
    TRUE ~ .data$`Extra info`
  )) %>%
  knitr::kable(caption = "Details of checks performed by `validate_submission()`") %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive")) %>%
  column_spec(1, bold = TRUE)
```
