Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add generic test for exposure schema validation #530

Merged
merged 2 commits into from
Sep 28, 2023

Conversation

erikzaadi
Copy link
Contributor

@erikzaadi erikzaadi commented Sep 10, 2023

Added a generic test that will validate model columns and data types pending that the meta attribute is populated with a dict in the following schema:

{ 
    "referenced_columns": [
        {
            "column_name": "column_A",
            "data_type": "string"
        },
        {
            "column_name": "column_B",
            "data_type": "numeric",
            "node": "ref('table')"
        } 
    ]
}

Where the data_type is optional.

source is optional if you only have 1 depends_on node.

E.g in a exposures.yml:

exposures:

  - name: customers
    label: CustomersFTW
    type: dashboard
    maturity: high
    url: https://bi.tool/dashboards/1
    description: >
      Did someone say "exponential growth"?

    depends_on:
      - ref('customers')

    owner:
      name: Callum McData
      email: [email protected]
    meta:
      referenced_columns:
        - column_name: "customer_id"
          data_type: "numeric"
        - column_name: 'ZOMG'
          data_type: "string"
        - name: 'WithNoDataType'
          node: ref('customers')

@erikzaadi erikzaadi requested a review from ellakz September 10, 2023 08:21
@linear
Copy link

linear bot commented Sep 10, 2023

ELE-1703 Create generic test for exposure stability

Definition of done: Create a generic test defined per model. The test iterates on the defined exposures and looks for exposures that depend on the model, and checks the columns described in the exposure match the columns in the model (name and data type).

@github-actions
Copy link
Contributor

👋 @erikzaadi
Thank you for raising your pull request.
Please make sure to add tests and document all user-facing changes.
You can do this by editing the docs files in the elementary repository.

@erikzaadi erikzaadi force-pushed the ele-1703-create-generic-test-for-exposure-stability branch 8 times, most recently from b94d093 to 22e489c Compare September 10, 2023 12:03
@ellakz
Copy link
Contributor

ellakz commented Sep 10, 2023

@erikzaadi for clarity's sake I'd use "data_type" and not dtype :)
also can you please add tests and documentation?

@erikzaadi erikzaadi force-pushed the ele-1703-create-generic-test-for-exposure-stability branch 2 times, most recently from 40f34e0 to 7463179 Compare September 11, 2023 14:00
@erikzaadi erikzaadi force-pushed the ele-1703-create-generic-test-for-exposure-stability branch 3 times, most recently from a0010fe to aa7f9d7 Compare September 14, 2023 08:51
@elongl
Copy link
Member

elongl commented Sep 14, 2023

Personally think columns needs to be renamed to referenced_columns.

macros/edr/tests/test_exposure_schema_validity.sql Outdated Show resolved Hide resolved

{%- for exposure in exposures -%}
{%- if node in exposure.depends_on.nodes and exposure['meta'] | default(none) is not none -%}
{%- do matching_exposures.append(exposure) -%}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd expect the matching_exposures to only be exposures that are referencing the tested model in depends_on. This would also save a lot of unnecessary rendering when there are a lot of exposures.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's what this loop does

macros/edr/tests/test_exposure_schema_validity.sql Outdated Show resolved Hide resolved
@erikzaadi erikzaadi force-pushed the ele-1703-create-generic-test-for-exposure-stability branch from eeb4b09 to c3a11de Compare September 14, 2023 10:47
@erikzaadi erikzaadi marked this pull request as ready for review September 14, 2023 15:31
@erikzaadi erikzaadi changed the title [WIP] Add generic test for exposure schema validation Add generic test for exposure schema validation Sep 14, 2023
@erikzaadi erikzaadi force-pushed the ele-1703-create-generic-test-for-exposure-stability branch 4 times, most recently from fb1846c to 650f27c Compare September 18, 2023 14:40
@@ -78,6 +78,7 @@ jobs:
uses: actions/setup-python@v4
with:
python-version: "3.8.17"
cache: "pip"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool
Did you see performance improvements in the job?
I tried adding it once to another workflow and it didn't save as much as I hoped, but still I guess it's a good configuration

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not much, about 10-15 seconds :(

referenced_columns:
- name: id
data_type: numeric
source: ref('customers')
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that maybe instead of "source" we should call it "node"?
Since it can contain either models or sources.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense, @ellakz thoughts?

@erikzaadi erikzaadi force-pushed the ele-1703-create-generic-test-for-exposure-stability branch 9 times, most recently from b408673 to 9c1219b Compare September 27, 2023 05:24
@erikzaadi erikzaadi force-pushed the ele-1703-create-generic-test-for-exposure-stability branch from 9c1219b to f7038ca Compare September 27, 2023 09:03
Copy link
Contributor

@IDoneShaveIt IDoneShaveIt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall really nice job!
Left some comments - nothing to serious 🙂



def seed(dbt_project: DbtProject):
(seed_result,) = dbt_project.dbt_runner._run_command(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can't we use dbt_runner.seed command instead of the internal _run_command here?

@@ -37,6 +40,8 @@
{% do return("schema_change") %}
{% elif flattened_test.short_name | lower in python_tests %}
{% do return("python_test") %}
{% elif flattened_test.short_name | lower in dbt_tests %}
{% do return("dbt_test") %}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure this is the right thing to do.
Maybe the definition of test type is a bit off, but I do not think that we need to categorize ore none anomaly / schema / python tests as dbt test.

Maybe we need to put it in a category like 'integrity` and start having a better types because dbt test doesn't says much and can be really confusing.

@haritamar WDYT?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we add warnings foe the user in case the exposures he set don't contain the right meta format (or meta at all)?
I think that saying "Your test passed - your exposures are valid" while we didn't checked them because they have missing properties or something.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure if that is me being picky, but I think that we should had comments that explains the test behaviour as well.
Although it is a nice test, it is still hard to understand how it works just from reading it (for example, I didn't understand by myself that we need to make sure the exposure contains the right meta fields)

@erikzaadi erikzaadi force-pushed the ele-1703-create-generic-test-for-exposure-stability branch from 3b94f11 to b30d8b7 Compare September 28, 2023 05:08
Copy link
Contributor

@IDoneShaveIt IDoneShaveIt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@erikzaadi erikzaadi force-pushed the ele-1703-create-generic-test-for-exposure-stability branch from b30d8b7 to 31ad61b Compare September 28, 2023 12:17
@erikzaadi erikzaadi merged commit 31ad61b into master Sep 28, 2023
11 checks passed
@erikzaadi erikzaadi deleted the ele-1703-create-generic-test-for-exposure-stability branch September 28, 2023 13:40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants