For a hub to be successfully accessed as an arrow dataset, column data types should not change from round to round.
Generally, task IDs covered by our schema shouldn't change data type in later rounds, since their type is largely fixed by the schema. However:
- some task IDs accept more than one data type, and
- custom task IDs are beyond our control.

Both have the potential to vary between modeling tasks and rounds and to change over time, which could indeed cause problems downstream. This is mainly a problem for parquet files, but it has a small chance of causing problems in CSVs too.
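To illustrate why a per-round type change breaks dataset access, here is a minimal Python sketch (not arrow's actual logic) that merges hypothetical per-round `{column: type}` schemas into one schema and rejects any column whose type differs between rounds. It loosely mimics the requirement that an arrow dataset has a single schema. The function name `unify_round_schemas` and the type labels are illustrative assumptions.

```python
def unify_round_schemas(round_schemas):
    """Combine per-round {column: type} mappings into a single schema.

    Raises TypeError when the same column has different types in different
    rounds, loosely mimicking why a single-schema dataset cannot be opened.
    """
    unified = {}     # column -> type recorded the first time it was seen
    first_seen = {}  # column -> round in which that type was recorded
    for round_id, schema in round_schemas.items():
        for column, dtype in schema.items():
            if column not in unified:
                unified[column] = dtype
                first_seen[column] = round_id
            elif unified[column] != dtype:
                raise TypeError(
                    f"column {column!r} is {unified[column]} in round "
                    f"{first_seen[column]} but {dtype} in round {round_id}"
                )
    return unified
```

For example, `{"r1": {"horizon": "int32"}, "r2": {"horizon": "string"}}` raises, whereas rounds that agree on every column return one merged schema.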
Dynamic check for more than one data type in task ID columns
Develop a dynamic config-level validation check that:
- validates that task ID values across all rounds and modeling tasks share a single data type;
- if they do not, determines the simplest data type that can encode all values;
- if later rounds introduce a change in data type, issues a warning that hub integrity might be affected by the change.
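A minimal Python sketch of the check described above, assuming the config has been parsed from a `tasks.json`-like file into nested dicts (`rounds` → `model_tasks` → `task_ids` → `required`/`optional` value arrays, either of which may be null). The coercion order (logical < integer < double < character) is an assumption modeled on R's type hierarchy, and the names `check_task_id_types`, `simplest_type` and `value_type` are hypothetical, not part of any existing hubverse package.

```python
import warnings

# Assumed coercion order, least to most general; it mirrors R's
# logical < integer < double < character hierarchy.
TYPE_ORDER = ["logical", "integer", "double", "character"]


def value_type(value):
    """Map a single JSON-decoded value to a coarse type name."""
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return "logical"
    if isinstance(value, int):
        return "integer"
    if isinstance(value, float):
        return "double"
    return "character"


def simplest_type(values):
    """Return the least general type that can encode every non-null value."""
    types = {value_type(v) for v in values if v is not None}
    if not types:
        return None
    return max(types, key=TYPE_ORDER.index)


def check_task_id_types(config):
    """Return {task_id: type} across all rounds, warning when a later round's
    values imply a different type than the type established by earlier rounds."""
    overall = {}
    for round_idx, rnd in enumerate(config["rounds"]):
        # Pool each task ID's values across all modeling tasks in this round.
        round_values = {}
        for task in rnd["model_tasks"]:
            for task_id, spec in task["task_ids"].items():
                values = (spec.get("required") or []) + (spec.get("optional") or [])
                round_values.setdefault(task_id, []).extend(values)

        for task_id, values in round_values.items():
            round_type = simplest_type(values)
            if round_type is None:
                continue  # no non-null values to type in this round
            previous = overall.get(task_id)
            if previous is None:
                overall[task_id] = round_type
                continue
            if previous != round_type:
                warnings.warn(
                    f"Task ID {task_id!r} is {round_type} in round index {round_idx} "
                    f"but {previous} in earlier rounds; hub integrity might be affected."
                )
            # Keep the simplest type that still encodes every value seen so far.
            overall[task_id] = max(previous, round_type, key=TYPE_ORDER.index)
    return overall
```

In this sketch a task ID whose values are integers in one round and strings such as `"1 week"` in a later round triggers one warning and resolves to `character`. Date-formatted strings are treated as plain character values here; a real check might want to recognise dates separately.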