Add generic test for exposure schema validation #530
Conversation
ELE-1703 Create generic test for exposure stability
Definition of done: create a generic test defined per model. The test iterates over the defined exposures, finds the exposures that depend on the model, and checks that the columns described in each exposure match the columns in the model (name and data type).
👋 @erikzaadi
@erikzaadi for clarity's sake I'd use `data_type` and not `dtype` :)
Personally think
{%- for exposure in exposures -%}
  {%- if node in exposure.depends_on.nodes and exposure['meta'] | default(none) is not none -%}
    {%- do matching_exposures.append(exposure) -%}
I'd expect `matching_exposures` to only contain exposures that reference the tested model in `depends_on`. This would also save a lot of unnecessary rendering when there are a lot of exposures.
That's what this loop does
@@ -78,6 +78,7 @@ jobs:
      uses: actions/setup-python@v4
      with:
        python-version: "3.8.17"
        cache: "pip"
Cool
Did you see performance improvements in the job?
I tried adding it once to another workflow and it didn't save as much as I hoped, but I still think it's a good configuration.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not much, about 10-15 seconds :(
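For reference, a sketch of scoping the cache key to a requirements file via `cache-dependency-path`, which `actions/setup-python@v4` supports; the file name here is an assumption, not taken from this workflow:

```yaml
- uses: actions/setup-python@v4
  with:
    python-version: "3.8.17"
    cache: "pip"
    # assumption: point at the repo's actual requirements file so the cache
    # only invalidates when dependencies change
    cache-dependency-path: "dev-requirements.txt"
```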
referenced_columns:
  - name: id
    data_type: numeric
    source: ref('customers')
I think that maybe instead of "source" we should call it "node", since it can contain either models or sources.
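For illustration, the earlier snippet with that rename applied (hypothetical - the key name is only a suggestion at this point):

```yaml
referenced_columns:
  - name: id
    data_type: numeric
    node: ref('customers')   # hypothetical rename of "source"
```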
Makes sense, @ellakz thoughts?
Overall really nice job!
Left some comments - nothing too serious 🙂
def seed(dbt_project: DbtProject):
    (seed_result,) = dbt_project.dbt_runner._run_command(
Can't we use the `dbt_runner.seed` command instead of the internal `_run_command` here?
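A minimal sketch of that suggestion, assuming the runner exposes a public `seed` method (signature not verified against this codebase):

```python
def seed(dbt_project: DbtProject):
    # Hypothetical: delegate to the public wrapper instead of the internal
    # _run_command; DbtProject is the test fixture used elsewhere in this PR.
    return dbt_project.dbt_runner.seed()
```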
@@ -37,6 +40,8 @@
  {% do return("schema_change") %}
{% elif flattened_test.short_name | lower in python_tests %}
  {% do return("python_test") %}
{% elif flattened_test.short_name | lower in dbt_tests %}
  {% do return("dbt_test") %}
I am not sure this is the right thing to do.
Maybe the definition of test type is a bit off, but I do not think that we need to categorize all non-anomaly / schema / python tests as `dbt_test`.
Maybe we need to put it in a category like 'integrity' and start having better types, because `dbt_test` doesn't say much and can be really confusing.
@haritamar WDYT?
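A minimal sketch of that alternative naming (assumption: only the returned label changes, the `dbt_tests` lookup stays as in the diff above):

```jinja
{% elif flattened_test.short_name | lower in dbt_tests %}
  {# hypothetical: a more descriptive category name than "dbt_test" #}
  {% do return("integrity") %}
```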
Can we add warnings for the user in case the exposures they set don't contain the right meta format (or meta at all)?
I think that saying "Your test passed - your exposures are valid" while we didn't actually check them because they have missing properties or something would be misleading.
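A rough sketch of what such a warning could look like inside the exposure loop, using dbt's built-in `exceptions.warn` (an assumption about placement, not code from this PR):

```jinja
{# Warn when an exposure that depends on the tested model has no usable meta,
   so a passing test isn't silently vacuous. #}
{%- if node in exposure.depends_on.nodes and exposure['meta'] | default(none) is none -%}
  {%- do exceptions.warn("Exposure '" ~ exposure.name ~ "' depends on this model but has no meta - skipping column validation for it.") -%}
{%- endif -%}
```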
Not sure if that is me being picky, but I think that we should add comments that explain the test behaviour as well.
Although it is a nice test, it is still hard to understand how it works just from reading it (for example, I didn't understand on my own that we need to make sure the exposure contains the right meta fields).
LGTM
Added a generic test that validates model columns and data types, provided that the exposure's `meta` attribute is populated with a dict in the following schema, where `data_type` is optional and `source` is optional if you only have one `depends_on` node. E.g. in an `exposures.yml`:
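A sketch of what such an exposure entry might look like, extrapolated from the `referenced_columns` snippet discussed above; the exposure name, owner, model and columns are illustrative, not taken from the PR:

```yaml
exposures:
  - name: customers_dashboard            # illustrative name
    type: dashboard
    owner:
      name: Analytics
      email: analytics@example.com       # illustrative owner
    depends_on:
      - ref('customers')
    meta:
      referenced_columns:
        - name: id
          data_type: numeric
          source: ref('customers')       # optional when there is a single depends_on node
        - name: customer_email           # data_type omitted - it is optional
```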