Writing custom check functions vignette #127

Merged
merged 48 commits into from
Oct 11, 2024
48 commits
859b878
Add dependency section deps
annakrystalli Oct 3, 2024
1b9b943
Update navbar drop down topics
annakrystalli Oct 3, 2024
9023f5f
add step to link to dev version of docs if appropriate
annakrystalli Oct 3, 2024
0989285
Separate section on caller env object into child
annakrystalli Oct 3, 2024
a2203e2
Commit draft
annakrystalli Oct 3, 2024
c1c1443
Merge branch 'main' into ak/custom-fn-dev-article/121
annakrystalli Oct 4, 2024
646186e
Add child sections on managing custom fns & additional dependencies. …
annakrystalli Oct 4, 2024
da3e652
Additional info
annakrystalli Oct 4, 2024
1d33cf0
rename custom-functions article to deploying-custom-functions
annakrystalli Oct 4, 2024
dc6bfa8
skip submission window check
annakrystalli Oct 4, 2024
ca693a2
Appease linter!
annakrystalli Oct 4, 2024
28e9589
Update NEWS
annakrystalli Oct 4, 2024
91b10f3
Use Rmd instead of qmd
annakrystalli Oct 4, 2024
593bc6d
correct netlify preview GA dev status determination step
annakrystalli Oct 4, 2024
61ca2d9
Clarify the value of `round_id_col`.
annakrystalli Oct 10, 2024
f353818
Add info about validations_cfg_path caller env object
annakrystalli Oct 10, 2024
58b48cb
Typo
annakrystalli Oct 10, 2024
429e251
Add a sentence describing what the example check does
annakrystalli Oct 10, 2024
1af57d5
Clarify location of src dir
annakrystalli Oct 10, 2024
38a2253
Update vignettes/articles/writing-custom-fns.Rmd
annakrystalli Oct 10, 2024
a6960cf
Update vignettes/articles/writing-custom-fns.Rmd
annakrystalli Oct 10, 2024
d1c5a99
Minor edit
annakrystalli Oct 10, 2024
da66ada
remove rownames frm pkg deps datatable
annakrystalli Oct 10, 2024
c187767
Improve flow of entence and include example
annakrystalli Oct 10, 2024
103bbd4
Add note about further examples
annakrystalli Oct 10, 2024
b0a5c23
attempt to fix vignette references
annakrystalli Oct 10, 2024
6c85f6c
remove 'articles/' for href
annakrystalli Oct 10, 2024
ad28d54
Update vignettes/articles/children/_add-deps-source.Rmd
annakrystalli Oct 10, 2024
e160926
suppress comment prefixes when printing fn bodies
annakrystalli Oct 10, 2024
1a4b72c
Update vignettes/articles/writing-custom-fns.Rmd
annakrystalli Oct 10, 2024
63e9f64
Update vignettes/articles/writing-custom-fns.Rmd
annakrystalli Oct 10, 2024
f293f08
correct typo
annakrystalli Oct 10, 2024
dd1be46
Rework inputs/arguments sections
annakrystalli Oct 10, 2024
9441780
Add more details on required inputs and how objects are passed
annakrystalli Oct 10, 2024
9b58ee5
Update vignettes/articles/writing-custom-fns.Rmd
annakrystalli Oct 10, 2024
acade59
reword
annakrystalli Oct 10, 2024
b02411e
use comment=NA
annakrystalli Oct 10, 2024
4ffd044
Update vignettes/articles/writing-custom-fns.Rmd
annakrystalli Oct 10, 2024
e9fe91b
Fix headers
annakrystalli Oct 10, 2024
b4f9f3f
add link to overriding args section
annakrystalli Oct 11, 2024
ef940f3
Highlight and clarify important aspects of useing args in deployment
annakrystalli Oct 11, 2024
51e4f54
Add spaces
annakrystalli Oct 11, 2024
2bd8666
add alert role
annakrystalli Oct 11, 2024
ca499d7
Update vignettes/articles/writing-custom-fns.Rmd
annakrystalli Oct 11, 2024
821a577
Formatting updates
annakrystalli Oct 11, 2024
75dff02
Update vignettes/articles/writing-custom-fns.Rmd
annakrystalli Oct 11, 2024
ae573f9
Update vignettes/articles/writing-custom-fns.Rmd
annakrystalli Oct 11, 2024
ed6189b
Fix headings
annakrystalli Oct 11, 2024
12 changes: 10 additions & 2 deletions .github/workflows/pkgdown-netlify-preview.yaml
@@ -54,12 +54,20 @@ jobs:
branch: gh-pages
folder: docs

- id: deploy-dir
  name: Determine dev status
  run: |
    if [[ $(grep -c -E 'sion. ([0-9]*\.){3}' ${{ github.workspace }}/DESCRIPTION) == 1 ]]; then
      echo 'dir=./docs/dev' >> $GITHUB_OUTPUT
    else
      echo 'dir=./docs' >> $GITHUB_OUTPUT
    fi
- name: Deploy PR preview to Netlify
if: contains(env.isPush, 'false')
id: netlify-deploy
uses: nwtgck/actions-netlify@v2
uses: nwtgck/actions-netlify@v3
with:
publish-dir: './docs'
publish-dir: '${{ steps.deploy-dir.outputs.dir }}'
production-branch: main
github-token: ${{ secrets.GITHUB_TOKEN }}
deploy-message:
2 changes: 2 additions & 0 deletions DESCRIPTION
@@ -55,9 +55,11 @@ Imports:
yaml
Suggests:
covr,
DT,
gert,
kableExtra,
mockery,
pak,
readr,
rmarkdown,
testthat (>= 3.2.0),
4 changes: 4 additions & 0 deletions NEWS.md
@@ -1,5 +1,9 @@
# hubValidations (development version)

* Added:
- new vignette on how to create custom validation checks for hub validations (#121)
- new section on how to manage additional dependencies required by custom validation functions (#22).

# hubValidations 0.7.0

* Added function `create_custom_check()` for creating custom validation check function files from templates (#121).
2 changes: 1 addition & 1 deletion R/check_tbl_unique_round_id.R
@@ -2,7 +2,7 @@
#'
#' @param round_id_col Character string. The name of the column containing
#' `round_id`s. Usually, the value of round property `round_id` in hub `tasks.json`
#' config file.
#' config file. Defaults to `NULL`, in which case it is determined from the config if applicable.
#' @inheritParams check_tbl_colnames
#' @return
#' Depending on whether validation has succeeded, one of:
7 changes: 5 additions & 2 deletions _pkgdown.yml
@@ -67,8 +67,11 @@ navbar:
- text: Validating submissions locally
href: articles/validate-submission.html
- text: -------
- text: Including custom validation functions
href: articles/custom-functions.html
- text: "Custom validation checks"
- text: Writing custom validation functions
href: articles/writing-custom-fns.html
- text: Deploying custom validation functions
href: articles/deploying-custom-functions.html
development:
mode: auto

2 changes: 1 addition & 1 deletion man/check_tbl_match_round_id.Rd

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

2 changes: 1 addition & 1 deletion man/check_tbl_unique_round_id.Rd


2 changes: 1 addition & 1 deletion man/check_valid_round_id_col.Rd


2 changes: 1 addition & 1 deletion man/validate_model_data.Rd


2 changes: 1 addition & 1 deletion man/validate_submission.Rd


3 changes: 2 additions & 1 deletion tests/testthat/_snaps/check_tbl_values_required.md
@@ -224,7 +224,8 @@
---

Code
check_for_errors(validate_submission(hub_path, file_path))
check_for_errors(validate_submission(hub_path, file_path,
skip_submit_window_check = TRUE))
Message

-- 2024-10-02-UMass-HMLR.parquet ----
5 changes: 4 additions & 1 deletion tests/testthat/test-check_tbl_values_required.R
@@ -277,7 +277,10 @@ test_that("(#123) check_tbl_values_required works with all optional output types
)
# Ensure that req_vals check is the only one that fails
expect_snapshot(
check_for_errors(validate_submission(hub_path, file_path)),
check_for_errors(validate_submission(
hub_path, file_path,
skip_submit_window_check = TRUE
)),
error = TRUE
)
})
24 changes: 24 additions & 0 deletions vignettes/articles/children/_add-deps-pkg.Rmd
@@ -0,0 +1,24 @@
### Deploying custom functions as a package

To deploy custom functions managed as a package in `src/validations`, you can use the `pkg` configuration property in the `validations.yml` file to specify the package namespace.

For example, if you have created a simple package in `src/validations/` with a `cstm_check_tbl_example.R` script in `src/validations/R` that defines a `cstm_check_tbl_example()` function, you can use the following configuration in your `validations.yml` file to source the function from the installed `validations` package namespace:

```
default:
  validate_model_data:
    custom_check:
      fn: "cstm_check_tbl_example"
      pkg: "validations"
```

To ensure the package (and any dependencies it requires) is installed and available during validation, you must add the package to the `setup-r-dependencies` step in the `hubverse-actions` `validate-submission.yaml` GitHub Action workflow of your hub, like so:

```yaml
- uses: r-lib/actions/setup-r-dependencies@v2
  with:
    packages: |
      any::hubValidations
      any::sessioninfo
      local::./src/validations
```
53 changes: 53 additions & 0 deletions vignettes/articles/children/_add-deps-source.Rmd
@@ -0,0 +1,53 @@

## Available dependencies

**All `hubValidations` exported functions are available** for use in your custom check functions, as well as functions from the hubverse packages **`hubUtils`**, **`hubAdmin`** and **`hubData`**.

```{r, echo=FALSE}
get_deps <- function(pkg) {
  suppressMessages(pak::pkg_deps(pkg))
}
memoise_pkg_deps <- memoise::memoise(get_deps)
pkgs <- memoise_pkg_deps(".")[, c("package", "version")]
```

In addition, **functions in packages from the `hubValidations` dependency tree are also generally available**, both locally (once `hubValidations` is installed) and in the hubverse `validate-submission` GitHub Action.

Functions from these packages can be used in your custom checks without specifying them as additional dependencies.

```{r, echo=FALSE}
pkgs[order(pkgs$package), ] |>
  DT::datatable(rownames = FALSE)
```


## Additional dependencies

If any custom functions you are deploying depend on additional packages, you will need to ensure these packages are available during validation.

The simplest way to ensure they are available is to edit the `setup-r-dependencies` step in the `hubverse-actions` [`validate-submission.yaml`](https://github.com/hubverse-org/hubverse-actions/blob/main/validate-submission/validate-submission.yaml) GitHub Action workflow of your hub and add any additional dependency to the `packages` field list.

In the following hypothetical example, we add the `additionalPackage` package to the list of standard dependencies:

```yaml
- uses: r-lib/actions/setup-r-dependencies@v2
  with:
    packages: |
      any::hubValidations
      any::sessioninfo
      any::additionalPackage
```

Note that this ensures the additional dependency is available during validation on GitHub but does not guarantee it will be installed locally for hub administrators or submitting teams. Indeed, such missing dependencies could lead to execution errors in custom checks when running `validate_submission()` locally.

You could use documentation, such as your hub's README, to communicate additional dependencies required for validation to submitting teams. Even better, you could add a check to the top of your function to catch missing dependencies and provide a helpful error message to the user.

```{r, eval=FALSE}
# Stop early with an informative message if the extra package is missing
if (!requireNamespace("additionalPackage", quietly = TRUE)) {
  stop(
    "Package 'additionalPackage' must be installed to run the full validation check. ",
    "Please install and try again."
  )
}
```
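
When a custom check relies on several additional packages, the same idea can be wrapped in a small helper. The sketch below is illustrative only: `check_deps_installed()` is a hypothetical name and not part of `hubValidations`, and the packages passed in the example call are ones that ship with R, chosen so that the call succeeds.

```r
# Hypothetical helper (not part of hubValidations): stop with a single,
# informative error listing every required package that is not installed.
check_deps_installed <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  missing_pkgs <- pkgs[!installed]
  if (length(missing_pkgs) > 0) {
    stop(
      "Package(s) ", paste0("'", missing_pkgs, "'", collapse = ", "),
      " must be installed to run the full validation check. ",
      "Please install and try again.",
      call. = FALSE
    )
  }
  invisible(TRUE)
}

# Example call, as it might appear at the top of a custom check function
check_deps_installed(c("stats", "utils"))
```

Reporting all missing packages at once may save users from repeated install-and-retry cycles when running `validate_submission()` locally.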

20 changes: 20 additions & 0 deletions vignettes/articles/children/_custom-fn-available-args.Rmd
@@ -0,0 +1,20 @@
Each of the `validate_*()` functions **contains a number of standard objects in its caller environment** which are **available for downstream check functions to use as arguments** and are **passed automatically to arguments** of optional/custom functions **with the same name**. Therefore, values for such arguments do not need to be included in the function deployment configuration but [**can be overridden through a function's `args` configuration**](deploying-custom-functions.html#deploying-optional-hubvalidations-functions) in `validations.yml` during deployment.

**All `validate_*()` functions will contain the following five objects in their caller environment:**

- **`file_path`**: character string of path to file being validated relative to the `model-output` directory.
- **`hub_path`**: character string of path to hub.
- **`round_id`**: character string of `round_id` derived from the model file name.
- **`file_meta`**: named list containing `round_id`, `team_abbr`, `model_abbr` and `model_id` details.
- **`validations_cfg_path`**: character string of path to `validations.yml` file. Defaults to `hub-config/validations.yml`.

**`validate_model_data()` will contain the following additional objects:**

- **`tbl`**: a tibble of the model output data being validated.
- **`tbl_chr`**: a tibble of the model output data being validated with all columns coerced to character type.
- **`round_id_col`**: character string of name of `tbl` column containing `round_id` information. Defaults to `NULL` and is usually determined from the `tasks.json` config, if applicable, unless explicitly provided as an argument to `validate_model_data()`.
- **`output_type_id_datatype`**: character string. The value of the `output_type_id_datatype` argument. This value is useful in functions like `hubData::create_hub_schema()` or `hubValidations::expand_model_out_grid()` to set the data type of the `output_type_id` column.
- **`derived_task_ids`**: character vector or `NULL`. The value of the `derived_task_ids` argument, i.e. the names of task IDs whose values depend on other task IDs.


The `args` configuration can be used to override objects from the caller environment as well as defaults during deployment.
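
The name-based matching described above can be illustrated with a few lines of base R. This is a conceptual sketch only and is not how `hubValidations` is implemented internally: `call_with_env_args()`, `cstm_check_example()` and the example file path are hypothetical names and values introduced here for illustration.

```r
# Hypothetical illustration of name-based argument matching.
# This is NOT the hubValidations implementation.
call_with_env_args <- function(fn, caller_env, args = list()) {
  fn_arg_names <- names(formals(fn))
  # Collect caller environment objects whose names match the function's arguments
  available <- intersect(fn_arg_names, ls(caller_env))
  env_args <- mget(available, envir = caller_env)
  # Explicitly configured `args` take precedence over caller environment objects
  final_args <- utils::modifyList(env_args, args)
  do.call(fn, final_args)
}

# Stand-in for the environment of a validate_*() function
caller_env <- new.env()
caller_env$file_path <- "team1-goodmodel/2022-10-08-team1-goodmodel.csv"
caller_env$round_id <- "2022-10-08"

# Hypothetical check with two arguments matched by name and one default
cstm_check_example <- function(file_path, round_id, prefix = "Checked") {
  paste(prefix, basename(file_path), "for round", round_id)
}

# `file_path` and `round_id` are picked up automatically
call_with_env_args(cstm_check_example, caller_env)

# Values supplied through `args` override both environment objects and defaults
call_with_env_args(
  cstm_check_example, caller_env,
  args = list(round_id = "2022-10-15", prefix = "Re-checked")
)
```

In this sketch, anything listed in `args` wins over both the function's defaults and the objects found in the caller environment, which mirrors the override behaviour described for `validations.yml`.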
@@ -1,5 +1,5 @@
---
title: "Include custom validation functions"
title: "Deploying custom validation functions"
---

```{r, include = FALSE}
@@ -17,7 +17,7 @@ library(hubValidations)
Custom validation functions can be included and configured within standard `hubValidations` workflows by **including a `validations.yml` file in the `hub-config` directory**. Alternatively, an appropriately structured file can be included at a different location and the path to the file provided through the `validations_cfg_path` argument.

`hubValidations` uses the [`config`](https://rstudio.github.io/config/articles/inheritance.html) package to get validation configuration. This allows for configuration inheritance and the ability to include executable R code.
See the `confog` package vignette on [inheritance and R expressions](https://rstudio.github.io/config/articles/inheritance.html) for more details.
See the `config` package vignette on [inheritance and R expressions](https://rstudio.github.io/config/articles/inheritance.html) for more details.

## `validations.yml` structure

@@ -34,37 +34,24 @@ Within the default configuration, individual checks can be configured for each o
- **`fn`:** The name of the check function to be run, as character string (required).
- **`pkg`:** The name of the package namespace from which to get check function. Must be supplied if function is distributed as part of a package.
- **`source`:** Path to `.R` script containing function code to be sourced. If relative, should be relative to the hub's directory root. Must be supplied if function is not part of a package and only exists as a script.
- **`args`:** A yaml dictionary of key/value pairs or arguments to be passed to the custom function. Values can be yaml lists or even executable R code (optional).
- **`args`:** A yaml dictionary of key/value pairs of arguments and their values to be passed to the custom function. Values can be yaml lists or even executable R code (optional).

Note that each of the `validate_*()` functions contain a standard objects in their call environment which are passed automatically to any custom check function and therefore do not need including in the `args` configuration.

- **`validate_model_file`:**
- `file_path`: character string of path to file being validated relative to the `model-output` directory.
- `hub_path`: character string of path to hub.
- `round_id`: character string of `round_id`
- `file_meta`: named list containing `round_id`, `team_abbr`, `model_abbr` and `model_id` details.
- **`validate_model_data`:**
- `tbl`: a tibble of the model output data being validated.
- `file_path`: character string of path to file being validated relative to the `model-output` directory.
- `hub_path`: character string of path to hub.
- `round_id`: character string of `round_id`
- `file_meta`: named list containing `round_id`, `team_abbr`, `model_abbr` and `model_id` details.
- `round_id_col`: character string of name of `tbl` column containing `round_id` information.
- **`validate_model_metadata`:**
- `file_path`: character string of path to file being validated relative to the `model-output` directory.
- `hub_path`: character string of path to hub.
- `round_id`: character string of `round_id`
- `file_meta`: named list containing `round_id`, `team_abbr`, `model_abbr` and `model_id` details.

The `args` configuration can be used to override objects from the caller environment as well as defaults.


Here's an example configuration for a single check (`opt_check_tbl_horizon_timediff()`) to be run as part of the `validate_model_data()` validation function which checks the content of the model data submission files.

```{r child="children/_custom-fn-available-args.Rmd", echo=FALSE, results="asis"}
```

#### Deploying optional `hubValidations` functions

Here's an example configuration for a single optional `hubValidations` check, `opt_check_tbl_horizon_timediff()`, which checks that the temporal difference between the values in two date columns (defined by additional arguments `t0_colname` & `t1_colname`) is equal to a time period defined by horizon values (contained in a column defined by `horizon_colname`) and the length of a single horizon defined by argument `timediff`.

The check is to be run as part of the `validate_model_data()` validation function which checks the content of the model data submission files.

```{r, eval=FALSE, code=readLines(system.file('testhubs/flusight/hub-config/validations.yml', package = 'hubValidations'))}
```

The above configuration file relies on default values for arguments `horizon_colname` (`"horizon"`) and `timediff` (`lubridate::weeks()`). We can use the `validations.yml` `args` list to override the default values. Here's an example that includes **executable r code** as the value of an argument.
The above configuration file relies on default values for arguments `horizon_colname` (`"horizon"`) and `timediff` (`lubridate::weeks()`). We can **use the `validations.yml` `args` list to override the `horizon_colname` and `timediff` argument default values**.

In this example, we **also include executable R code** as the value of the `timediff` argument.

```
default:
@@ -79,6 +66,19 @@ default:
timediff: !expr lubridate::weeks(2)
```
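
To see how an `!expr` value is resolved, the following sketch writes a small configuration to a temporary file and reads it back with `config::get()`. It is an illustration under stated assumptions rather than a recommended hub setup: the check name `horizon_timediff` and the `"horizons"` column name are placeholders, and base R's `as.difftime()` stands in for `lubridate::weeks(2)` so that the example does not depend on `lubridate`.

```r
# Write an illustrative configuration to a temporary file.
# `as.difftime()` is a base R stand-in for `lubridate::weeks(2)`.
cfg_path <- tempfile(fileext = ".yml")
writeLines(
  c(
    'default:',
    '  validate_model_data:',
    '    horizon_timediff:',
    '      fn: "opt_check_tbl_horizon_timediff"',
    '      pkg: "hubValidations"',
    '      args:',
    '        horizon_colname: "horizons"',
    '        timediff: !expr as.difftime(2, units = "weeks")'
  ),
  cfg_path
)

# Read the default configuration; values tagged `!expr` are evaluated as R code
cfg <- config::get(file = cfg_path)
check_args <- cfg$validate_model_data$horizon_timediff$args

check_args$horizon_colname # a plain YAML string
class(check_args$timediff) # an R object produced by evaluating the expression
```

Because the expression is evaluated when the configuration is read, any package it refers to needs to be installed wherever validation runs.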

#### Deploying custom functions

The above example involved an optional `hubValidations` function. To deploy a custom function that is not part of `hubValidations` or any other package, you should store the script containing the function in the `src/validations/R/` directory (relative to the root of your hub) and include the path to the script in the `source` property of the configuration file.

```
default:
  validate_model_data:
    custom_check:
      fn: "cstm_check_tbl_example"
      source: "src/validations/R/cstm_check_tbl_example.R"
```
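
The difference between the `pkg` and `source` properties can be sketched in base R as follows. This is a conceptual illustration, not the `hubValidations` implementation: `resolve_check_fn()` is a hypothetical helper, it assumes `source` is given relative to the hub root, and the stand-in check simply returns a logical value, whereas a real check function would return a `hubValidations` check object.

```r
# Hypothetical helper; NOT the hubValidations implementation.
resolve_check_fn <- function(fn, pkg = NULL, source = NULL, hub_path = ".") {
  if (!is.null(pkg)) {
    # Function distributed in a package: look it up among the package's exports
    return(getExportedValue(pkg, fn))
  }
  if (!is.null(source)) {
    # Function stored in a script: source it into a fresh environment.
    # Assumes `source` is relative to the hub's directory root.
    env <- new.env()
    sys.source(file.path(hub_path, source), envir = env)
    return(get(fn, envir = env, mode = "function"))
  }
  stop("One of `pkg` or `source` must be supplied.", call. = FALSE)
}

# Create a throwaway hub directory containing a stand-in custom check script
hub_path <- tempfile("hub")
script_dir <- file.path(hub_path, "src", "validations", "R")
dir.create(script_dir, recursive = TRUE)
writeLines(
  "cstm_check_tbl_example <- function(tbl) nrow(tbl) > 0",
  file.path(script_dir, "cstm_check_tbl_example.R")
)

# Resolve the function from the script, as the `source` property would
check_fn <- resolve_check_fn(
  "cstm_check_tbl_example",
  source = "src/validations/R/cstm_check_tbl_example.R",
  hub_path = hub_path
)
check_fn(data.frame(x = 1))

# Resolve a function from a package namespace, as the `pkg` property would
pkg_fn <- resolve_check_fn("median", pkg = "stats")
```

Sourcing into a fresh environment keeps helper objects defined in the script from leaking into the session in which validation is run.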


### Round specific configuration

Additional round specific configurations can be included in `validations.yml` that can add to or override default configurations.
@@ -159,6 +159,12 @@ arrow::read_csv_arrow(system.file("check_table.csv", package = "hubValidations")
```


## Managing dependencies of custom sourced functions
# Managing dependencies of custom functions

TODO
If any custom functions you are deploying depend on additional packages, you will need to ensure these packages are available during validation.

```{r child="children/_add-deps-source.Rmd", echo=FALSE, results="asis"}
```

```{r child="children/_add-deps-pkg.Rmd", echo=FALSE, results="asis"}
```