Issue an error (instead of a warning) if incoming column data types don't match the hub schema #111
Comments
While I'm not averse to this suggestion, I want to make sure any misconceptions are cleared up first. As stated in the docs, both errors and warnings are considered validation failures and will throw an error via

The error vs warning distinction is admittedly used primarily for purposes internal to the validation process, i.e. an error usually causes an early return of the validation process because downstream tests cannot be performed until the failed error check is addressed. In the case of deviation from the hub schema, because the rest of the validation checks are performed on all-character versions of the data, validation can continue despite data type differences. Hence this failure is handled as a warning and the rest of validation continues. That doesn't mean that the warning can be ignored. It will still trigger an error in

So while I could change this to an error, I don't really want it to cause an early return when it's possible to run further checks. Doing so would go against the paradigm all other checks are following. Having said that, I could make it clearer in the docs that warning vs error has more to do with the validation process.

Having said all that, while

So I'm definitely up for thinking through how to make warnings more visible in general, e.g. here are some other options in

cli::cli_text("{cli::symbol$cross} error | {cli::symbol$circle_cross} warning | {cli::symbol$checkbox_on} warning")
#> ✖ error | ⓧ warning | ☒ warning

and perhaps also issuing an overall summary when printing a

cli::cli_h2(cli::format_inline("{cli::symbol$tick} File valid"))
#>
#> ── ✔ File valid ──
#>
cli::cli_h2(cli::format_inline("{cli::symbol$cross} File invalid"))
#>
#> ── ✖ File invalid ──
#>
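The paradigm described in this comment can be sketched in base R. This is only a minimal illustration under stated assumptions: `new_check()`, `run_checks()` and `any_failures()` are hypothetical helpers invented here, not hubValidations functions, and the check names are made up.

```r
# Hypothetical helpers for illustration; these are not hubValidations functions.

# Record the outcome of one check. `on_fail` is the severity used if it fails.
new_check <- function(name, passed, on_fail = c("warning", "error")) {
  on_fail <- match.arg(on_fail)
  list(name = name, status = if (passed) "success" else on_fail)
}

# Run checks in order. An "error" stops the run early because later checks
# depend on it; a "warning" is recorded and the run continues.
run_checks <- function(checks) {
  results <- list()
  for (check in checks) {
    res <- check()
    results[[res$name]] <- res
    if (res$status == "error") break
  }
  results
}

# Final gate: warnings and errors both count as validation failures.
any_failures <- function(results) {
  statuses <- vapply(results, function(x) x$status, character(1))
  any(statuses %in% c("warning", "error"))
}

# A data type mismatch is recorded as a warning, so the last check still runs.
results <- run_checks(list(
  function() new_check("file_exists", TRUE, "error"),
  function() new_check("col_types_match_schema", FALSE, "warning"),
  function() new_check("values_valid", TRUE, "error")
))
length(results)        # 3
any_failures(results)  # TRUE

# A failed error-level check stops the run before later checks execute.
blocked <- run_checks(list(
  function() new_check("file_exists", FALSE, "error"),
  function() new_check("values_valid", TRUE, "error")
))
length(blocked)        # 1
```

In this sketch the severity only controls whether the run stops early; the final gate treats both severities as a failed validation.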
@annakrystalli Thanks for the detailed explanation of what's happening under the hood! Taking a step back from the underlying technical issues, I keep coming back to this statement:
I'm wary of applying R coding conventions as guidance for overall hub usability. Given that hub admins have the option to merge a model-output submission even if the overall validations fail, a reasonable user could assume that merging warnings won't break anything in the same way that merging an error would. A model output file that doesn't conform to its hub's schema is an error that can impact downstream operations (especially for hubs that accept parquet submissions), so my .02 is that we should treat it as such.

All that said, given that this is an edge case with a low probability of occurring in general hub usage, addressing it might not be a high priority for us.

[To reproduce the "invalid schema" warning with a hub that accepts FIPS codes for locations, you'd have to submit a model output file that contains only 2-digit FIPS codes and no character values like
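As a rough illustration of how the edge case above can arise, here is a base R sketch. It uses `read.csv()` as a stand-in for the real reading code, and it assumes a hub schema that declares `location` as character; the file contents and column names are made up.

```r
# Illustrative file: every location is a 2-digit FIPS code, no character values.
csv_path <- tempfile(fileext = ".csv")
writeLines(c("location,value", "01,10.5", "25,7.2"), csv_path)

# With type inference, the all-digit column is read as integer.
inferred <- read.csv(csv_path)
class(inferred$location)  # "integer"
inferred$location         # 1 25 (the leading zero of "01" is lost)

# Reading with the types an assumed schema declares keeps the codes as text.
as_schema <- read.csv(
  csv_path,
  colClasses = c(location = "character", value = "numeric")
)
class(as_schema$location)  # "character"
as_schema$location         # "01" "25"
```

The lost leading zero is one example of the kind of downstream impact a silent type mismatch can have.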
Opinion

Giving my opinion though it was not requested, when I run a program, the messages I expect are:

The blogdown package does a really good job at this: https://alison.netlify.app/ares-kind-tools/#120

R's warnings are a bit weird because they exist in a grey area between "things I need to address" and "ignore". They always appear after your run is finished and often are hidden if you have more than 50.

Suggestion

For what it's worth, I believe calling it warning/error in this case is mostly semantic. The output of I have a suggestion: since they both result in an error, this might be solved by converting the following to hubValidations/R/capture_check_cnd.R Line 85 in 8c8b2f4

(I tested this and the only thing that changes are minor attributes of snapshot tests). This way, they are both presented as different types of errors, with the difference being whether the error prevented downstream checks from being run.
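One way to picture "different types of errors" is with classed conditions in base R. This is a sketch only: `new_failure()` and the class names `blocking_failure` and `non_blocking_failure` are hypothetical and are not the classes hubValidations uses.

```r
# Hypothetical constructor and class names, for illustration only.
new_failure <- function(message, blocks_downstream = FALSE) {
  errorCondition(
    message,
    class = if (blocks_downstream) "blocking_failure" else "non_blocking_failure"
  )
}

schema_failure <- new_failure("Column data types do not match the hub schema.")
inherits(schema_failure, "error")    # TRUE
inherits(schema_failure, "warning")  # FALSE

# Callers can still decide whether to keep validating based on the subclass.
next_step <- tryCatch(
  stop(schema_failure),
  non_blocking_failure = function(e) "run remaining checks",
  blocking_failure = function(e) "return early"
)
next_step  # "run remaining checks"
```

Both kinds of failure are errors here, and the subclass alone records whether downstream checks could still run.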
Thanks for the input, both! I feel more comfortable changing all

I also feel that for users (admins and teams), the print method is the most important aspect of communicating the type and implications of different errors, so I've also modified the print method and added more info to the vignette.

One thing this removes, though, is a warning class object as an output of any checks. While
When researching hubverse-org/hubverse-transform#14, I realized that parquet model-output files with a column data type that doesn't match the config-derived schema will result in a validation warning instead of an error (full convo is in the linked issue, see relevant snippet below).
Such a mismatch can result in downstream errors when working with the data (for both local/github and cloud-based hubs). Therefore, we should consider re-classifying the warning to an error.
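A minimal base R sketch of the proposed behaviour, comparing a submission's column types with the expected ones and raising an error on mismatch. The schema, the data, and the error message are invented for illustration; this is not how hubValidations derives or checks the schema.

```r
# Hypothetical schema and submission, for illustration only.
expected_types <- c(location = "character", value = "double")
submission <- data.frame(location = c(25L, 36L), value = c(10.5, 7.2))

# Compare each column's storage type with the type the schema expects.
actual_types <- vapply(submission[names(expected_types)], typeof, character(1))
mismatched <- names(expected_types)[actual_types != expected_types]
mismatched  # "location"

# Escalate the mismatch to an error rather than a warning.
outcome <- tryCatch(
  {
    if (length(mismatched) > 0) {
      stop("Column data types do not match hub schema: ",
           paste(mismatched, collapse = ", "))
    }
    "valid"
  },
  error = function(e) conditionMessage(e)
)
outcome  # "Column data types do not match hub schema: location"
```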