Orient to Caltrans Step 1: Add tests and docs for data grain #340

Open
jkarpen opened this issue Aug 7, 2024 · 1 comment
Labels
will close Label to flag issues that will close this sprint

Comments

@jkarpen
Collaborator

jkarpen commented Aug 7, 2024

Almost all of the tables in the dbt project have an intended data grain. In most cases they are time series for devices, and the grain is a combination of a device ID and a timestamp at a particular aggregation level. However, this data grain is not well documented or tested! Please:

  1. Review the data models in the dbt project and identify the intended grain of each (please talk to Ken if you need help).
  2. Add documentation where appropriate to better indicate the intended grain of the models.
  3. Add uniqueness and not-null tests on the grain columns to enforce the grain, so long as the tests are not too costly.
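
As a rough illustration of steps 2 and 3, here is a minimal sketch of a schema file for a station-level hourly table. The file path, model name, and column names are hypothetical, and the composite uniqueness test assumes the `dbt_utils` package is installed in the project. The `data_tests:` key needs dbt v1.8 or later; on older versions the same entries would go under `tests:`.

```yaml
# models/schema.yml (hypothetical path, model, and column names)
version: 2

models:
  - name: station_hourly_agg  # hypothetical model name
    description: >
      Hourly aggregates per station.
      Grain: one row per station_id per sample_hour.
    data_tests:
      # Enforces the documented grain as a composite key.
      # Assumes the dbt_utils package is installed.
      - dbt_utils.unique_combination_of_columns:
          combination_of_columns:
            - station_id
            - sample_hour
    columns:
      - name: station_id
        description: Station identifier; part of the grain.
        data_tests:
          - not_null
      - name: sample_hour
        description: Start of the hourly aggregation window; part of the grain.
        data_tests:
          - not_null
```

Stating the grain in the model description keeps the documentation and the test next to each other, so a change to one is more likely to prompt a change to the other.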

Caveats:

  • A station is made up of multiple detectors, one in each lane. In some cases we use a station+lane combination to indicate a unique detector, in other cases we use a detector ID. We may want to standardize on the latter, but in the meantime, know that a table with station and lane is at the detector level, and a table with station only is at the station level.
  • Uniqueness tests on larger tables may be expensive. Do some performance tests and use best judgment on whether they are appropriate for a given dbt model.
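
One possible way to handle both caveats is sketched below: a detector-level table keyed by station and lane, with the uniqueness test restricted to a recent slice of data to keep it cheap. The model name, the column names, and the `sample_date` filter column are hypothetical; the date expression is warehouse-specific and would need adjusting. The test again assumes `dbt_utils` is installed.

```yaml
# Hypothetical detector-level model keyed by station + lane.
version: 2

models:
  - name: detector_five_minute_agg  # hypothetical model name
    description: >
      Five-minute aggregates per detector.
      Grain: one row per station_id + lane per sample_timestamp
      (station + lane identifies a single detector).
    data_tests:
      - dbt_utils.unique_combination_of_columns:
          combination_of_columns:
            - station_id
            - lane
            - sample_timestamp
          config:
            # Only test the most recent week to limit the scan cost.
            # The expression is an assumption; adapt it to the warehouse.
            where: "sample_date >= current_date - 7"
            # Report duplicates as warnings rather than failing the run.
            severity: warn
```

Whether a filtered test like this is enough, or a full-table test is affordable, is the kind of judgment call that the performance tests should inform.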
@summer-mothwood
Contributor

Part of this project will be to switch to the new data_tests syntax in the YAML files for all Caltrans models that currently have data tests: https://docs.getdbt.com/docs/build/data-tests

From dbt v1.8, "tests" are now called "data tests" to disambiguate from unit tests. The YAML key tests: is still supported as an alias for data_tests:. Refer to New data_tests: syntax for more information.
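
For reference, the rename amounts to changing the key and leaving the test entries untouched. The sketch below shows the same hypothetical model before and after; the model and column names are placeholders, not taken from the project.

```yaml
# Before: legacy key, still accepted as an alias in dbt v1.8+
version: 2
models:
  - name: example_model  # hypothetical
    columns:
      - name: id
        tests:
          - unique
          - not_null
---
# After: new key introduced in dbt v1.8
version: 2
models:
  - name: example_model  # hypothetical
    columns:
      - name: id
        data_tests:
          - unique
          - not_null
```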

@jkarpen added the "will close" label (Label to flag issues that will close this sprint) on Jan 3, 2025