[CT-3099] Commands & configs naming changes: expansion of dbt testing coverage #8606

Closed
Tracked by #8283
graciegoheen opened this issue Sep 8, 2023 · 6 comments
Labels
Impact: CA Impact: Exp user docs [docs.getdbt.com] Needs better documentation

Comments

@graciegoheen
Contributor

graciegoheen commented Sep 8, 2023

Description

With the introduction of new dbt unit tests, we will need to adjust the naming conventions of our current dbt tests to avoid confusion.

Here's how I'm thinking about the distinction. We are expanding our testing coverage to include both:

  • dbt "data" tests: Test your data outputs (dbt models, snapshots, seeds, etc.) and inputs (dbt sources) in your warehouse to ensure your data is valid given your defined assertions. This is the type of testing we currently support.
  • dbt "unit" tests: Test your modeling logic using a small set of static inputs to validate that your code is working as expected. This is the new type of testing we're building as part of this initiative.

Commands

From the outcome of this spike, we will use the following commands for each type of test:
dbt test --select test_type:data
dbt test --select test_type:unit

  • If no test_type selector is specified, all test types will be included
  • The same selectors will work for dbt build and dbt list
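
To make the selection behaviour above concrete, here is a minimal Python sketch of how a test_type selector could filter a list of tests. This is not dbt's implementation: `SelectableTest` and `select_tests` are hypothetical names, and the sketch only handles the two values discussed in this issue (data and unit).

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class SelectableTest:
    """Hypothetical stand-in for a test node in a dbt project."""
    name: str
    test_type: str  # "data" or "unit"


def select_tests(nodes: List[SelectableTest], selector: Optional[str] = None) -> List[SelectableTest]:
    """Return the tests matching a 'test_type:<value>' selector.

    With no selector, every test type is included.
    """
    if selector is None:
        return list(nodes)
    method, _, value = selector.partition(":")
    if method != "test_type" or value not in {"data", "unit"}:
        raise ValueError(f"unsupported selector: {selector!r}")
    return [node for node in nodes if node.test_type == value]


# Illustrative project contents; the data test names are made up.
nodes = [
    SelectableTest("unique_orders_order_id", "data"),
    SelectableTest("not_null_orders_order_id", "data"),
    SelectableTest("test_is_valid_email_address", "unit"),
]

unit_only = select_tests(nodes, "test_type:unit")
everything = select_tests(nodes)  # no selector: all test types
```

Calling `select_tests(nodes)` with no selector mirrors the first bullet above: nothing is filtered out.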

Configs

I propose we leave test-paths as is, and specify that csv fixtures for unit tests must be defined in tests/fixtures, just like generic test macros must be defined in tests/generic. The test-paths folder(s) is for all the "reusable bits" of dbt tests and singular data tests.
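
As a rough illustration of the fixtures convention proposed above, the Python sketch below builds the locations where a named csv fixture would be expected, one per test-paths entry. The helper name `candidate_fixture_paths` is hypothetical, and mapping a fixture name to `<name>.csv` is an assumption, not something this issue specifies.

```python
from pathlib import PurePosixPath
from typing import List, Sequence


def candidate_fixture_paths(
    fixture: str, test_paths: Sequence[str] = ("tests",)
) -> List[PurePosixPath]:
    """Return <test-path>/fixtures/<fixture>.csv for each configured test path.

    Assumption: a fixture name maps directly to a csv file name.
    """
    return [PurePosixPath(path) / "fixtures" / f"{fixture}.csv" for path in test_paths]


# Default test-paths of ["tests"]; the fixture name is only an example.
paths = candidate_fixture_paths("my_model_a_fixture")
```

With the default test-paths this yields a single candidate under tests/fixtures, alongside tests/generic for generic test macros.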

We will also make a few changes to the model yml configurations:

models:
  - name: orders
    columns:
      - name: order_id
        data_tests:
          - unique
          - not_null

unit_tests:
  - name: test_is_valid_email_address # this is the unique name of the test
    model: my_model # name of the model I'm unit testing
    given: # optional: list of inputs to provide as fixtures
      - input: ref('users')
        rows:
         - {user_id: 1, email: [email protected],     email_top_level_domain: example.com}
         - {user_id: 2, email: [email protected],     email_top_level_domain: unknown.com}
         - {user_id: 3, email: badgmail.com,         email_top_level_domain: gmail.com}
         - {user_id: 4, email: missingdot@gmailcom,  email_top_level_domain: gmail.com}
      - input: ref('top_level_domains')
        rows:
         - {tld: example.com}
         - {tld: gmail.com}
    expect: # required: the expected output given the inputs above
      - {user_id: 1, is_valid_email_address: true}
      - {user_id: 2, is_valid_email_address: false}
      - {user_id: 3, is_valid_email_address: false}
      - {user_id: 4, is_valid_email_address: false}
  - name: test_another_column
    ...

and the dbt_project.yml configuration:

data-tests:
...

unit-tests:
...

Note: for backward compatibility, tests: will be an alias for data_tests: / data-tests:, but we will eventually deprecate tests:
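
One way to picture the alias is a normalization step that renames the legacy key before anything else reads the config. The Python sketch below is illustrative only: `normalize_test_keys` is a hypothetical helper, and both the warning text and the choice to reject a column that defines both keys are assumptions, not decisions made in this issue.

```python
import warnings
from typing import Any, Dict


def normalize_test_keys(column: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of a column entry with legacy `tests` renamed to `data_tests`."""
    normalized = dict(column)
    if "tests" in normalized:
        if "data_tests" in normalized:
            # Assumption: defining both keys is ambiguous, so reject it.
            raise ValueError("define either `tests` or `data_tests`, not both")
        warnings.warn(
            "`tests` is an alias for `data_tests` and will eventually be deprecated",
            DeprecationWarning,
            stacklevel=2,
        )
        # Move the legacy value under the new key.
        normalized["data_tests"] = normalized.pop("tests")
    return normalized


# A column written with the legacy key, based on the orders example above.
legacy_column = {"name": "order_id", "tests": ["unique", "not_null"]}
```

Passing `legacy_column` through `normalize_test_keys` gives the same column with its tests listed under `data_tests`, and leaves the input dict unchanged.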

Docs

We will need to analyze all the places where docs (dbt docs, explorer, and our documentation website) need to be updated for this change.

Note: the config naming changes will impact the IDE shortcuts for yml generation

@github-actions github-actions bot changed the title Commands & configs naming changes: expansion of dbt testing coverage [CT-3099] Commands & configs naming changes: expansion of dbt testing coverage Sep 8, 2023
@graciegoheen graciegoheen added the user docs [docs.getdbt.com] Needs better documentation label Sep 8, 2023
@MichelleArk
Contributor

MichelleArk commented Sep 11, 2023

From refinement:

In order to support:

dbt test --data
dbt test --unit

we'd likely do the if/else routing for task setup somewhere here, which would make it difficult with our current click setup to differentiate between which options are supported for which flag (--unit or --data).

But on the flip side, defining these as click subcommands (dbt test unit, dbt test data (?)) would make it difficult to preserve backwards compatibility with the previous dbt test command.

@graciegoheen
Contributor Author

^ @MichelleArk as discussed, let's spike the options to understand the technical challenges of them:

  1. dbt test --unit and dbt test --data
  2. dbt unit test and dbt data test
  3. dbt unit-test and dbt data-test

For all of these, we'd want to maintain legacy dbt test behavior to not introduce a breaking change - perhaps we'd eventually deprecate dbt test in the future.

@graciegoheen
Contributor Author

graciegoheen commented Sep 14, 2023

If we rename the config to unit-tests: instead of unit:, we will need to update the spec of the unit test definitions.

What would the implications be of switching to something like this:

unit_tests:
  - name: test_my_model
    model: my_model
    given:
      - input: ref('my_model_a')
        format: csv
        fixture: my_model_a_fixture
...

where "model" is a property of a given unit test, instead of defining all unit tests underneath a single model.
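
To get a feel for the flat layout, the Python sketch below groups such definitions back by their model property, which is roughly what a consumer would do to find every unit test for one model. The helper `group_by_model` and the second and third test entries are hypothetical; the duplicate-name check assumes test names must be unique, as the comment in the description suggests.

```python
from collections import defaultdict
from typing import Any, Dict, List

# Flat unit test definitions, each naming the model it targets.
unit_tests: List[Dict[str, Any]] = [
    {
        "name": "test_my_model",
        "model": "my_model",
        "given": [
            {"input": "ref('my_model_a')", "format": "csv", "fixture": "my_model_a_fixture"}
        ],
    },
    {"name": "test_my_model_edge_case", "model": "my_model", "given": []},  # hypothetical
    {"name": "test_other_model", "model": "other_model", "given": []},  # hypothetical
]


def group_by_model(tests: List[Dict[str, Any]]) -> Dict[str, List[str]]:
    """Map each model name to the names of the unit tests that target it."""
    seen = set()
    grouped: Dict[str, List[str]] = defaultdict(list)
    for test in tests:
        if test["name"] in seen:
            # Assumption: unit test names are unique across the project.
            raise ValueError(f"duplicate unit test name: {test['name']}")
        seen.add(test["name"])
        grouped[test["model"]].append(test["name"])
    return dict(grouped)


tests_by_model = group_by_model(unit_tests)
```

Because each test carries its own model, several tests for the same model can live in different files without sharing a parent block.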

@MichelleArk @gshank

@graciegoheen
Contributor Author

graciegoheen commented Sep 18, 2023

Notes from refinement:

unit_tests:
  - name: test_my_model
    model: my_model
    given:
      - input: ref('my_model_a')
        format: csv
        fixture: my_model_a_fixture
...
  • this sounds good!
  • more consistent

@graciegoheen
Contributor Author

graciegoheen commented Sep 18, 2023

I think the next step here is to open some smaller implementation tickets for each of the naming changes @dbeatty10

  1. update command syntax (blocked on [CT-3122] [spike] explore testing command options to understand technical complexities and determine best option #8651)
  2. update config blocks (unit-tests and data-tests) [CT-3147] [Implementation] update config block names for data tests and unit tests #8699

@graciegoheen
Contributor Author

Outcome of the spike #8651 -> unit tests and data tests are types of tests that can be selected using --select test_type:unit or --select test_type:data for dbt build and dbt test
