[Feature] new configuration to run tests on only the "new" data for snapshots and incremental models #10877

Open
graciegoheen opened this issue Oct 17, 2024 · 2 comments
Labels
enhancement New feature or request paper_cut A small change that impacts lots of users in their day-to-day

Comments

@graciegoheen
Contributor

graciegoheen commented Oct 17, 2024

Describe the feature

A more general solution for #10236 and #10864 and dbt-labs/dbt-snowflake#1198

Folks want to be able to test just their "new" data before it's inserted into their existing table. This is relevant for:

  • snapshots
  • incremental models

For example, you may want to test that your unique_key is actually unique for the "new" data. If that test fails, the "new" data should not be inserted into the existing table.

We should create a new data test configuration that lets you configure a given data test to run only on the "new" data.

models:
  - name: my_model
    columns:
      - name: id
        data_tests:
          - unique:
              new_records_only: true  # name/spec TBD

Acceptance criteria:

  • you can set this config on a data test for an incremental model
  • you can set this config on a data test for a snapshot
  • when set, the data test will only run on "new" data before insert/merge
  • if data test fails, the subsequent insert/merge will be skipped
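
One way to read the acceptance criteria above is as a gate in front of the insert/merge. The sketch below is plain Python, not dbt code: `gated_merge`, `is_unique`, and the list-of-dicts "tables" are hypothetical stand-ins that only illustrate the intended control flow (test the "new" rows first, skip the merge on failure).

```python
from typing import Callable

Row = dict[str, object]


def is_unique(rows: list[Row], key: str) -> bool:
    """Return True if no two rows share the same value for `key`."""
    values = [row[key] for row in rows]
    return len(values) == len(set(values))


def gated_merge(
    existing: list[Row],
    new_rows: list[Row],
    key: str,
    test: Callable[[list[Row], str], bool],
) -> tuple[list[Row], bool]:
    """Run `test` on the new rows only; merge them only if it passes.

    Returns the resulting table and a flag saying whether the merge ran.
    """
    if not test(new_rows, key):
        # Test failed: skip the insert/merge and leave the table untouched.
        return existing, False
    # Test passed: upsert by key (a new row replaces an existing row with the same key).
    merged = {row[key]: row for row in existing}
    merged.update({row[key]: row for row in new_rows})
    return list(merged.values()), True
```

With this shape, a batch containing two rows with the same `id` leaves the existing table unchanged, while a clean batch is merged in.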
@QuentinCoviaux

This reminds me of this talk from Coalesce on how the team at Lyst did this by overriding the incremental macro.

For the spec, I'm thinking you'd probably want to be able to test the new data in the tmp table, but also probably the final table.

For unique checks, the way I think about it is that even if you assert that your new records are in fact unique, that uniqueness might not hold once they are combined with the old records.
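
To make the uniqueness concern concrete, here is a small Python sketch. It assumes an append-only insert (no merge on a key), and `duplicate_keys` is a hypothetical helper written for this illustration, not anything from dbt.

```python
from collections import Counter


def duplicate_keys(rows, key):
    """Return the sorted key values that appear more than once in `rows`."""
    counts = Counter(row[key] for row in rows)
    return sorted(value for value, n in counts.items() if n > 1)


existing = [{"id": 1}, {"id": 2}]
new = [{"id": 2}, {"id": 3}]

# The new batch is unique on its own...
new_only = duplicate_keys(new, "id")

# ...but appending it to the existing rows duplicates id 2.
combined = duplicate_keys(existing + new, "id")
```

Here `new_only` is empty while `combined` contains `2`, which is why a test on the tmp table alone may not be enough for unique checks.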

Super excited to have this for other tests though!

@NaomiJohnson

Thanks @QuentinCoviaux for sharing my talk

@graciegoheen we have built an in-house solution, in Snowflake, that tests the necessary data while ensuring that primary key testing is always 100% accurate.
The implementation varies depending on whether the incremental model:

  1. does a merge or delete+insert where the primary key is part of the unique_key
  2. or, does a delete+insert where the PK is not part of the unique_key or it's just an insert

It has also been built so that the user only has to add unique_test_key (i.e. the primary key) to the model config when the unique_key isn't the primary key.
So it's all automatic.

I would be happy to chat it through with you.

@graciegoheen graciegoheen added the paper_cut A small change that impacts lots of users in their day-to-day label Dec 20, 2024