Add generic test for exposure schema validation #530
Conversation
ELE-1703 Create generic test for exposure stability
Definition of done: create a generic test defined per model. The test iterates over the defined exposures, finds the exposures that depend on the model, and checks that the columns described in each exposure match the columns in the model (name and data type).
👋 @erikzaadi
@erikzaadi for clarity's sake I'd use `data_type` and not `dtype` :)
Personally think
{%- for exposure in exposures -%}
  {%- if node in exposure.depends_on.nodes and exposure['meta'] | default(none) is not none -%}
    {%- do matching_exposures.append(exposure) -%}
I'd expect `matching_exposures` to only contain exposures that reference the tested model in `depends_on`. This would also save a lot of unnecessary rendering when there are a lot of exposures.
That's what this loop does
@@ -78,6 +78,7 @@ jobs:
      uses: actions/setup-python@v4
      with:
        python-version: "3.8.17"
        cache: "pip"
Cool
Did you see performance improvements in the job?
I tried adding it once to another workflow and it didn't save as much as I hoped, but I still think it's a good configuration.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not much, about 10-15 seconds :(
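For reference, a sketch of scoping the cache key to a requirements file via `cache-dependency-path`, which `actions/setup-python@v4` supports; the file name here is an assumption, not taken from this workflow:

```yaml
- uses: actions/setup-python@v4
  with:
    python-version: "3.8.17"
    cache: "pip"
    # assumption: point at the repo's actual requirements file so the cache
    # only invalidates when dependencies change
    cache-dependency-path: "dev-requirements.txt"
```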
referenced_columns:
  - name: id
    data_type: numeric
    source: ref('customers')
I think that maybe instead of "source" we should call it "node", since it can contain either models or sources.
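For illustration, the earlier snippet with that rename applied (hypothetical - the key name is only a suggestion at this point):

```yaml
referenced_columns:
  - name: id
    data_type: numeric
    node: ref('customers')   # hypothetical rename of "source"
```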
Makes sense, @ellakz thoughts?
Overall really nice job!
Left some comments - nothing too serious 🙂
def seed(dbt_project: DbtProject):
    (seed_result,) = dbt_project.dbt_runner._run_command(
Can't we use the `dbt_runner.seed` command instead of the internal `_run_command` here?
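A minimal sketch of that suggestion, assuming the runner exposes a public `seed` method (signature not verified against this codebase):

```python
def seed(dbt_project: DbtProject):
    # Hypothetical: delegate to the public wrapper instead of the internal
    # _run_command; DbtProject is the test fixture used elsewhere in this PR.
    return dbt_project.dbt_runner.seed()
```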
@@ -37,6 +40,8 @@
  {% do return("schema_change") %}
{% elif flattened_test.short_name | lower in python_tests %}
  {% do return("python_test") %}
{% elif flattened_test.short_name | lower in dbt_tests %}
  {% do return("dbt_test") %}
I am not sure this is the right thing to do.
Maybe the definition of test type is a bit off, but I do not think that we need to categorize all non-anomaly / schema / python tests as `dbt_test`.
Maybe we need to put it in a category like 'integrity' and start having better types, because `dbt_test` doesn't say much and can be really confusing.
@haritamar WDYT?
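A minimal sketch of that alternative naming (assumption: only the returned label changes, the `dbt_tests` lookup stays as in the diff above):

```jinja
{% elif flattened_test.short_name | lower in dbt_tests %}
  {# hypothetical: a more descriptive category name than "dbt_test" #}
  {% do return("integrity") %}
```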
Can we add warnings for the user in case the exposures they set don't contain the right meta format (or meta at all)?
I think that saying "Your test passed - your exposures are valid" while we didn't actually check them because they have missing properties or something would be misleading.
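A rough sketch of what such a warning could look like inside the exposure loop, using dbt's built-in `exceptions.warn` (an assumption about placement, not code from this PR):

```jinja
{# Warn when an exposure that depends on the tested model has no usable meta,
   so a passing test isn't silently vacuous. #}
{%- if node in exposure.depends_on.nodes and exposure['meta'] | default(none) is none -%}
  {%- do exceptions.warn("Exposure '" ~ exposure.name ~ "' depends on this model but has no meta - skipping column validation for it.") -%}
{%- endif -%}
```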
Not sure if that is me being picky, but I think that we should add comments that explain the test behaviour as well.
Although it is a nice test, it is still hard to understand how it works just from reading it (for example, I didn't understand on my own that we need to make sure the exposure contains the right meta fields).
LGTM
Added a generic test that validates model columns and data types, provided that the exposure's `meta` attribute is populated with a dict in the following schema, where `data_type` is optional and `source` is optional if you only have one `depends_on` node. E.g. in an `exposures.yml`:
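A sketch of what such an exposure entry might look like, extrapolated from the `referenced_columns` snippet discussed above; the exposure name, owner, model and columns are illustrative, not taken from the PR:

```yaml
exposures:
  - name: customers_dashboard            # illustrative name
    type: dashboard
    owner:
      name: Analytics
      email: analytics@example.com       # illustrative owner
    depends_on:
      - ref('customers')
    meta:
      referenced_columns:
        - name: id
          data_type: numeric
          source: ref('customers')       # optional when there is a single depends_on node
        - name: customer_email           # data_type omitted - it is optional
```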