- Added `subset_task_id_names()` function to subset task ID names from a character vector of column names (#149).
- Added functions `subset_task_id_cols()` and `subset_std_cols()` to subset a `model_out_tbl` or submission `tbl` to task ID or standard (non-task ID) columns respectively (#149).
- `schema_id` version checks silenced by default in `read_config()` and `read_config_file()`.
- Add and export `hubValidations` functions `get_hub_timezone()`, `get_hub_model_output_dir()` and `get_hub_file_formats()` for extracting hub metadata to `hubUtils` package.
- Add new function `get_hub_derived_task_ids()` to extract round or hub level derived task ID values from a `tasks.json` config file.
- Add family of functions for extracting the version number from a variety of sources:
  - `get_version_config()`: extract version from a `<config>` class object.
  - `get_version_config_file()`: extract version from a config file by specifying a `config_path`.
  - `get_version_hub()`: extract version from a config file by specifying a `hub_path`.
- Add family of functions for comparing the version number extracted from a variety of sources to a given version number (#171):
  - `version_equal()`: check whether a schema version property is equal to the given version number.
  - `version_gte()`: check whether a schema version property is equal to or greater than the given version number.
  - `version_gt()`: check whether a schema version property is greater than the given version number.
  - `version_lte()`: check whether a schema version property is equal to or less than the given version number.
  - `version_lt()`: check whether a schema version property is less than the given version number.
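The five comparisons listed above can be illustrated with base R alone. This is a minimal sketch, not the `hubUtils` implementation: `parse_schema_version()` is a hypothetical helper introduced here, and the version strings are made up for the example.

```r
# Hypothetical helper (not part of hubUtils): drop a leading "v" and parse
# the remainder as a base R version object so components compare numerically.
parse_schema_version <- function(x) {
  numeric_version(sub("^v", "", x))
}

config_version <- parse_schema_version("v3.0.1")

is_equal <- config_version == parse_schema_version("v3.0.1")  # equal to
is_gte   <- config_version >= parse_schema_version("v3.0.0")  # equal to or greater than
is_gt    <- config_version >  parse_schema_version("v3.0.1")  # greater than
is_lte   <- config_version <= parse_schema_version("v4.0.0")  # equal to or less than
is_lt    <- config_version <  parse_schema_version("v3.0.1")  # less than

# Components are compared as numbers, so 3.0.10 sorts after 3.0.9.
multi_digit_ok <- parse_schema_version("v3.0.10") > parse_schema_version("v3.0.9")
```

Parsing into a version object rather than comparing raw strings is what keeps multi-digit components such as `10` ordered correctly.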
- `<config>` class objects now have a `type` attribute to track what type of config they contain (i.e. `"tasks"` or `"admin"`).
- `read_config()` and `read_config_file()` will attempt to coerce their output to a `<config>` class object, with a warning if unsuccessful (#173).
- Add `as_config()` function to coerce a config list to a `<config>` class object (from the `hubAdmin` package) (#173).
- Fix bug in `extract_schema_version()` where only single digits from each version component were being extracted.
- Fix documentation for `get_schema_version_latest()` to no longer use `v1.0.0`.
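One way a single-digit extraction bug like the one fixed in `extract_schema_version()` can arise is a pattern that allows only one digit per version component. The sketch below uses base R regular expressions. The patterns and the schema URL are assumptions for illustration, not the package's actual code.

```r
# Hypothetical schema id; the URL is illustrative only.
schema_id <- "https://example.org/schemas/v3.10.2/tasks-schema.json"

# One digit per component: fails to match the two-digit minor version "10".
single_digit <- regmatches(
  schema_id,
  regexpr("v[0-9]\\.[0-9]\\.[0-9]", schema_id)
)

# One or more digits per component: captures the full version string.
multi_digit <- regmatches(
  schema_id,
  regexpr("v[0-9]+\\.[0-9]+\\.[0-9]+", schema_id)
)
```

Here `single_digit` comes back empty while `multi_digit` holds the complete version, which is the behavior the fix describes.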
- First submission to CRAN
- Removed `hubData` dependency.
- Bug fix: Corrected bug in v3 config utilities so that configs are detected as `v3` if they are `v3.0.0` or above, not just `v3.0.0`. Thanks to @M-7th for reporting.
- Remove `hubAdmin` Suggests dependency by moving test hub configuration validation to CI (resolved: @annakrystalli, #158)
- Add `read_config_file()` helper function to read a JSON config file from a file path.
- Add `extract_schema_version()` helper function to extract the schema version from a schema `id` or config `schema_version` property character string.
- Add helpers `is_v3_config`, `is_v3_config_file` and `is_v3_config_hub` to check whether a config object, file or hub is using schema version 3.
- Missing dependency (`jsonlite`) bug fix.
- First major release of `hubUtils` package containing significant breaking changes. Much of the package has been moved and split across two smaller and more dedicated packages:
  - `hubData` package: contains functions for connecting to and interacting with hub data.
    - Exported functions moved to `hubData`: `connect_hub()`, `connect_model_output()`, `expand_model_out_val_grid()`, `create_model_out_submit_tmpl()`, `coerce_to_character()`, `coerce_to_hub_schema()` and `create_hub_schema()`.
    - `hubUtils` functions re-exported to `hubData`: `as_model_out_tbl()`, `validate_model_out_tbl()`, `model_id_split()` and `model_id_merge()`.
  - `hubAdmin` package: contains functions for administering Hubs, in particular creating and validating hub configuration files. Exported functions moved to `hubAdmin`:
    - Functions for creating config files: `create_config()`, `create_model_task()`, `create_model_tasks()`, `create_output_type()`, `create_output_type_cdf()`, `create_output_type_mean()`, `create_output_type_median()`, `create_output_type_pmf()`, `create_output_type_quantile()`, `create_output_type_sample()`, `create_round()`, `create_rounds()`, `create_target_metadata()`, `create_target_metadata_item()`, `create_task_id()`, `create_task_ids()`.
    - Functions for validating config files: `validate_config()`, `validate_model_metadata_schema()`, `validate_hub_config()`, `view_config_val_errors()`.
- Minor internal bug fixes and documentation updates.
- Added US and European location datasets. These can be used e.g. when assigning location task ID values for `tasks.json` config files programmatically (#127).
- `connect_hub()` and `connect_model_output()` now identify and report on files that are present and should have been opened but for which a connection was not successful (#124).
- Introduced a number of minor documentation clarifications and bug fixes (#129, #128, #121, #130).
- Added `validate_model_metadata_schema()` function and included it as part of `validate_hub_config()` (#110 & #112).
- Added `load_model_metadata()` function to compile hub model metadata.
- Added `coerce_to_character()` function for coercing all model output columns to character. This can be much faster than coercing with `coerce_to_hub_schema()`, especially for dates.
- Added the following parameters to `expand_model_out_val_grid()`:
  - `all_character`: allow for returning all character columns.
  - `as_arrow_table`: allow for returning an arrow data table.
  - `bind_model_tasks`: allow for returning list of model task level grids.
- Bug fix. Handle situation in `expand_model_out_val_grid()` when `required_vals_only = TRUE` yet required task ID columns are not consistent across modeling tasks. The function now pads missing task ID column values with `NA`s.
- Introduced `coerce_to_hub_schema()` function and applied it to `create_model_out_submit_tmpl()` & `expand_model_out_val_grid()` to ensure column data types in returned tibbles are consistent with the hub's schema (#100).
- Fixed bug where optional `mean`/`median` output types were being included erroneously when `required_vals_only = TRUE`.
- Exported function `get_round_task_id_names()` (#99).
- Memoized function `read_config()` (#101).
- Fixed bug (#95 & #97) which was causing `connect_hub()` to error when `"csv"` was an accepted hub file format but there were no CSV files in the model output directory. Now `connect_hub()` checks for the presence of files of each accepted file format and only opens datasets for file formats for which files exist. If there are no files of any accepted file format in the model output directory, the S3 `hub_connection` object returned consists of an empty list.
- Fixed bug (#96) which required `hubUtils` to be loaded for `std_colnames` to be internally available.
- Changed default behavior of `create_model_out_submit_tmpl()`. Function now, by default, returns rows of complete cases only and the behavior is controlled by argument `complete_cases_only`. Argument `remove_empty_cols` was also removed.
- Support for Hubs using schema earlier than v2.0.0 deprecated. Currently a warning is issued when interacting with such Hubs. Support will eventually be retired completely and errors will be produced with Hubs using older config schema.
- Added `create_model_out_submit_tmpl()` for generating round specific model output template tibbles (#82).
- Added lower level utilities:
  - `expand_model_out_val_grid()`: for creating an expanded grid of valid task ID and output type ID values across round modeling tasks and output types.
  - `get_round_idx()`: for getting an integer index of the element in `config_tasks$rounds` that a character round identifier maps to.
  - `get_round_ids()`: for getting a list or character vector of Hub round IDs.
- Added additional `tasks.json` validation checks via `validate_config()`:
  - Check that all task_id and output_type_id values are unique across `required` and `optional` properties.
  - In rounds where `round_id_from_variable` is `TRUE`, check that the specification of the task_id set as `round_id` is consistent across modeling tasks.
  - Check that `round_id` values are unique across rounds.
- Exported object `std_colnames` which contains standard column names used in hubverse model output data files, for use in other hubverse packages (#88).
- Added `as_model_out_tbl()` function to standardize model output data by converting to a `model_out_tbl` S3 class object (#32, #33, #63, #64, #66).
- To support back-compatibility with model output data in older hubs, added functions `model_id_merge()` and `model_id_split()` to create `model_id` column from separate `team_abbr` and `model_abbr` columns and vice versa (#63).
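The merge and split described for `model_id` can be sketched in base R as follows. These are hypothetical stand-ins, not the `hubUtils` functions: the helper names, the `"-"` separator, and the sample data are all assumptions, and the split assumes neither abbreviation contains the separator.

```r
# Hypothetical helper: combine team_abbr and model_abbr into model_id.
merge_model_id <- function(tbl, sep = "-") {
  tbl$model_id <- paste(tbl$team_abbr, tbl$model_abbr, sep = sep)
  tbl[, setdiff(names(tbl), c("team_abbr", "model_abbr")), drop = FALSE]
}

# Hypothetical helper: split model_id back into team_abbr and model_abbr.
split_model_id <- function(tbl, sep = "-") {
  parts <- strsplit(tbl$model_id, sep, fixed = TRUE)
  tbl$team_abbr <- vapply(parts, `[`, character(1), 1)
  tbl$model_abbr <- vapply(parts, `[`, character(1), 2)
  tbl[, setdiff(names(tbl), "model_id"), drop = FALSE]
}

# Made-up data in the older two-column layout.
old_style <- data.frame(
  team_abbr = c("hub", "teamA"),
  model_abbr = c("baseline", "ens"),
  value = c(1.5, 2.0),
  stringsAsFactors = FALSE
)

merged <- merge_model_id(old_style)
round_trip <- split_model_id(merged)
```

Splitting on a fixed separator only round-trips cleanly when the separator is reserved, which is why the sketch states that assumption up front.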
- Added argument `output_type_id_datatype` to `connect_hub()` to allow overriding default behavior of automatically detecting the `output_type_id` column data type from the `tasks.json` config file (#70).
- Exposed `create_hub_schema()` argument `partitions` to `connect_hub()` function to accommodate custom hub partitioning.
- Added argument `partition_names` to `connect_model_output()` to accommodate custom hub partitioning.
- Added argument `schema` to `connect_model_output()` to allow for overriding default `arrow` schema auto-detection.
- Moved `jsonvalidate` package to Imports so Hub administrator functionality is accessible through standard installation.
- Removed argument `format` from `create_hub_schema()` which now creates the same schema from a `tasks.json` config file, regardless of the data file format (#80).
- New function `validate_hub_config()` allows maintainers to check the validity of hub config files in a single call. Function `view_config_val_errors()` also modified to create combined report for hub config files from output of `validate_hub_config()`.
- Breaking change: All `model-output` data are expected to have `output_type` & `output_type_id` instead of `type` & `type_id` respectively.
- `connect_hub()` now automatically determines the `output_type_id` column data type from the `tasks.json` config file coercing to the highest possible data type, "character" being the lowest denominator.
- Introduced function `create_hub_schema()` for determining the schema for data in a hub's model-output directory from a `tasks.json` config file.
- `connect_hub()` now allows establishing connections to hubs with multiple file type formats.
- `create_output_type_categorical()` function was renamed to `create_output_type_pmf()`.
- When extracting data via a hub connection, the column containing model identification information, inferred from `model-output` data directory partitions, was renamed from "model" to "model_id".
- Re-implemented `connect_hub()` function to open connection to `model-output` data implemented through an `arrow` `FileSystemDataset` object. This allows users to create custom `dplyr` queries to access model output data.
- Added functionality to help create JSON configuration files.
- Added `validate_config()` function to validate JSON configuration files against Hub schema as well as function `view_config_val_errors()` for viewing a concise and easier to navigate table of validation errors.
- Added a `NEWS.md` file to track changes to the package.