Clarify what expectation of successful interaction with hub data is

hubverse-org · Aug 5, 2024 · f330a80 · f330a80
1 parent 48d98c7
commit f330a80
Showing 1 changed file with 1 addition and 1 deletion.
diff --git a/docs/source/user-guide/model-output.md b/docs/source/user-guide/model-output.md
@@ -128,7 +128,7 @@ Validation of forecast values occurs in two steps:
 
 > Note the difference in the following discussion between [hubverse schema](https://github.com/hubverse-org/schemas) - the schema which hub config files are validated against - and [`arrow schema`](https://arrow.apache.org/docs/11.0/r/reference/Schema.html) - the mapping of model output columns to data types.
 
-Because we store model output data as separate files but open them as a single `arrow` dataset using the `hubData` package, for a hub to be successfully accessed as an `arrow dataset`, it is necesssary to ensure that all files conform to the same [`arrow schema`](https://arrow.apache.org/docs/11.0/r/reference/Schema.html) (i.e. share the same column data types) across the lifetime of the hub. This means that additions of new rounds should not change the overall hub schema at a later date (i.e. after submissions have already started being collected). 
+Because we store model output data as separate files but open them as a single [`arrow` dataset](https://arrow.apache.org/docs/r/reference/Dataset.html) using the `hubData` package, for a hub to be [successfully accessed and fully queryable across all columns as an `arrow dataset`](https://arrow.apache.org/docs/r/articles/dataset.html), it is necesssary to ensure that all files conform to the same [`arrow schema`](https://arrow.apache.org/docs/11.0/r/reference/Schema.html) (i.e. share the same column data types) across the lifetime of the hub. This means that additions of new rounds should not change the overall hub schema at a later date (i.e. after submissions have already started being collected). 
 
 Many common task IDs are covered by the [hubverse schema](#model-tasks-tasks-json-interactive-schema), are validated during hub config validation and should therefore have consistent and stable data types. However, there are a number of situations where a single consistent data type cannot be guaranteed, e.g.:
 - New rounds introducing changes in custom task ID value data types, which are not covered by the hubverse schema.