Handle unknown `type_code` for model contracts #8887
Codecov Report: All modified and coverable lines are covered by tests ✅
Additional details and impacted files:
@@ Coverage Diff @@
## main dbt-labs/dbt-core#8887 +/- ##
==========================================
- Coverage 86.36% 86.30% -0.07%
==========================================
Files 177 177
Lines 26385 26385
==========================================
- Hits 22787 22771 -16
- Misses 3598 3614 +16
Flags with carried forward coverage won't be shown.
Would it be useful to add a debug-level warning when it fails to find a type? In particular if someone is using the
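    # Reverse-look up the driver's name for this type_code
    if type_code in string_types:
        return string_types[type_code].name
    else:
        # No lookup available: return a readable stand-in instead of failing
        return f"unknown type_code {type_code}"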
@rlh1994 brought up the idea of raising a debug-level warning log message in this case.
Seems like a valuable idea to explore. Maybe something like this?
logger.debug("unknown type_code: {}", type_code)
@MichelleArk how would you feel about adding this ☝️ debug output when the `type_code` isn't found?
It may be a little redundant given that `unknown type_code` will appear in the error message either way. So I'd either omit it or write a more informative message like "type_code not found in adapter-provided lookup, defaulting to 'unknown type_code'".
And we may actually need to use fire_event to fire a debug-level event as opposed to logger.debug.
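The more informative debug message suggested above can be sketched with Python's standard `logging` module. This is only an illustration under stated assumptions. dbt itself would fire a debug-level event through its own event system (`fire_event`), which this sketch does not reproduce. The logger name and the `lookup_with_debug` function are hypothetical. Note that stdlib `logging` uses %-style placeholders.

```python
import logging

# Hypothetical logger name; dbt routes such messages through its own events instead.
logger = logging.getLogger("type_code_lookup")


def lookup_with_debug(type_code, lookup):
    """Map a type_code to a name, logging at DEBUG level when no entry exists."""
    if type_code in lookup:
        return lookup[type_code]
    # %-style arguments are only formatted if a DEBUG record is actually emitted.
    logger.debug(
        "type_code %s not found in adapter-provided lookup, "
        "defaulting to 'unknown type_code'",
        type_code,
    )
    return f"unknown type_code {type_code}"


if __name__ == "__main__":
    # Show DEBUG output on stderr when run as a script.
    logging.basicConfig(level=logging.DEBUG)
    print(lookup_with_debug(790, {23: "INTEGER"}))  # unknown type_code 790
```

The message fires only on a miss, so a successful lookup stays silent even at DEBUG level.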
👍 Splitting this into its own issue:
That way we can ship this fix while still giving that debug message enough attention.
Just for complete visibility on this PR, I have confirmed that this works for user-created types in Postgres as well.
See below for a prototype of converting an unrecognized
1. Create user-space macros to do lookups:
{# Define your lookup table here #}
{% macro get_type_code_lookup_table() %}
  {% set lookup_table = {
      '790': 'MONEY',
      '2950': 'UUID',
  } %}
  {{ return(lookup_table) }}
{% endmacro %}

{# Look up an unrecognized `type_code` for a Column #}
{% macro lookup_type_code(data_type) %}
  {% set lookup_table = get_type_code_lookup_table() %}
  {# Attempt to match the pattern in the provided data_type string. #}
  {% set pattern = 'unknown type_code ' %}
  {% if data_type.startswith(pattern) %}
    {# Extract the type code and trim any leading/trailing whitespace. #}
    {% set type_code = data_type.replace(pattern, '').strip() %}
    {# Look up the extracted type code in the dictionary. #}
    {% if type_code in lookup_table %}
      {% set data_type = lookup_table[type_code] %}
    {% endif %}
  {% endif %}
  {{ return(data_type) }}
{% endmacro %}
2. Define an analysis to try it out:
{#- Get the column names and data types -#}
{%- set cols = get_column_schema_from_query(sql) -%}
SQL:
{{ sql | indent(2) }}
Columns:
{% for col in cols %}
{{ col }}
{%- endfor %}
User-provided `type_code` translations:
{% for col in cols %}
{{ col.name }} ({{ lookup_type_code(col.data_type) }})
{%- endfor %}
3. Compile the analysis:
dbt compile
4. Examine the output:
Nice!! Presumably the rationale for implementing this in Jinja space is to support user-defined types? In case it's helpful, my decision criteria for where adapter-related logic should live boil down to:
As much as I'd like this mapping of
Yep, you got it @MichelleArk! Your decision criteria for where adapter-related logic should live are super helpful 🤩 Doing In the meantime, the user-space Jinja macros in the comment above are a prototype of what users could do even if we don't choose to provide any formal Python or Jinja interfaces for them.
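The parsing done by the `lookup_type_code` Jinja macro can also be mirrored in plain Python, which makes it easy to check outside dbt. This is a hypothetical port. `TYPE_CODE_LOOKUP` and `lookup_type_code` simply echo the macros, and the table keeps string keys because the code is parsed back out of the `unknown type_code <n>` label.

```python
# Hypothetical user-maintained table, mirroring get_type_code_lookup_table().
TYPE_CODE_LOOKUP = {
    "790": "MONEY",
    "2950": "UUID",
}

PATTERN = "unknown type_code "


def lookup_type_code(data_type, lookup_table=TYPE_CODE_LOOKUP):
    """Translate an 'unknown type_code <n>' label; return any other value unchanged."""
    if data_type.startswith(PATTERN):
        # Drop the leading pattern and surrounding whitespace, leaving just the code.
        type_code = data_type[len(PATTERN):].strip()
        # Codes missing from the table fall through with the original label.
        return lookup_table.get(type_code, data_type)
    return data_type


print(lookup_type_code("unknown type_code 790"))   # MONEY
print(lookup_type_code("unknown type_code 1234"))  # unknown type_code 1234
print(lookup_type_code("INTEGER"))                 # INTEGER
```

Slicing off the prefix, rather than calling `replace` as the macro does, removes only the leading pattern. For labels of this form the result is the same.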
I'm interested in the release status of this feature on PyPI. I haven't found any details about its release in the recent tags or notes. Could you please update me on its current status?
Backport dbt-labs/dbt-core#8887 to make data contracts work correctly with custom PostgreSQL types that are unknown to dbt/psycopg2. The error messages are bad when contract validation on such types fails, but the contracts fundamentally work, which is a big improvement. See comments within the patch for details.
* Handle unknown `type_code` for model contracts
* Changelog entry
* Fix changelog entry
* Functional test for a `type_code` that is not recognized by psycopg2
* Functional tests for data type mismatches

(cherry picked from commit 6aeebc4)
resolves dbt-labs/dbt-postgres#54
resolves #8877
Problem
Postgres has built-in data types like `money`, `uuid`, etc. that aren't recognized by default. In addition, Postgres has the ability to create user-defined data types. In both cases, the `psycopg2` Python driver won't know how to do a reverse lookup from the `type_code` into a human-readable data type string (like `money` or `uuid`).
Other database platforms may have similar situations.
Solution
The proposed solution is to check if a lookup is available. If not, then provide a stand-in of the form `unknown type_code 4678`. This stand-in only applies if a data contract is broken. If the data contract is valid, then this text label isn't necessary.
Alternatives considered
1) Add missing data types as they come up
#8357 adds translations for some specific `type_code`s for dbt-postgres. But it doesn't cover user-defined types, and it wouldn't affect other adapters that can run into this issue.
2) Surface a macro in user space
Theoretically, we could create a macro in user space that users can override as needed to translate a `type_code` into a `data_type`.
Examples
Actual: `MONEY`, expected: `DECIMAL`
Actual: `DECIMAL`, expected: `MONEY`
Actual: `MONEY`, expected: `UUID`
Testing
Functional tests
* `type_code` fulfills the contract
* `type_code` in the data breaks the contract
* `type_code` specified in a broken contract
Manual tests
Spawned issues
* `type_code`s into `data_type`s #8900
* `type_code` fails to convert to a `data_type` dbt-postgres#38
Checklist