diff --git a/docs/source/user-guide/model-output.md b/docs/source/user-guide/model-output.md
index 089ae77e..da2bf963 100644
--- a/docs/source/user-guide/model-output.md
+++ b/docs/source/user-guide/model-output.md
@@ -128,7 +128,7 @@ Validation of forecast values occurs in two steps:
 
 > Note the difference in the following discussion between [hubverse schema](https://github.com/hubverse-org/schemas) - the schema which hub config files are validated against - and [`arrow schema`](https://arrow.apache.org/docs/11.0/r/reference/Schema.html) - the mapping of model output columns to data types.
 
-Because we store model output data as separate files but open them as a single `arrow` dataset using the `hubData` package, for a hub to be successfully accessed as an `arrow dataset`, it is necesssary to ensure that all files conform to the same [`arrow schema`](https://arrow.apache.org/docs/11.0/r/reference/Schema.html) (i.e. share the same column data types) across the lifetime of the hub. This means that additions of new rounds should not change the overall hub schema at a later date (i.e. after submissions have already started being collected).
+Because we store model output data as separate files but open them as a single [`arrow` dataset](https://arrow.apache.org/docs/r/reference/Dataset.html) using the `hubData` package, for a hub to be [successfully accessed and fully queryable across all columns as an `arrow dataset`](https://arrow.apache.org/docs/r/articles/dataset.html), it is necessary to ensure that all files conform to the same [`arrow schema`](https://arrow.apache.org/docs/11.0/r/reference/Schema.html) (i.e. share the same column data types) across the lifetime of the hub. This means that additions of new rounds should not change the overall hub schema at a later date (i.e. after submissions have already started being collected).
 
 Many common task IDs are covered by the [hubverse schema](#model-tasks-tasks-json-interactive-schema), are validated during hub config validation and should therefore have consistent and stable data types. However, there are a number of situations where a single consistent data type cannot be guaranteed, e.g.:
 - New rounds introducing changes in custom task ID value data types, which are not covered by the hubverse schema.