Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Handle unknown type_code for model contracts #8887

Merged
merged 5 commits into from
Oct 25, 2023

Conversation

dbeatty10
Copy link
Contributor

@dbeatty10 dbeatty10 commented Oct 24, 2023

resolves dbt-labs/dbt-postgres#54
resolves #8877

Problem

Postgres has built-in data types like money, uuid, etc that aren't recognized by default. In addition, Postgres has the ability to create user-defined data types. In both cases, the psycopg2 Python driver won't know how to do a reverse lookup from the type_code into a human-readable data type string (like money or uuid).

Other database platforms may have similar situations.

Solution

The proposed solution is to check if a lookup is available. If not, then provide a stand-in of the form unknown data_type 4678. This stand-in only applies if a data contract is broken. If the data contract is valid, then this text label isn't necessary.

Alternatives considered

1) Add missing data types as they come up

#8357 adds translations for some specific type_codes for dbt-postgres. But it doesn't cover user-defined types, and it wouldn't affect other adapters that can run into this issue.

2) Surface a macro in user space

Theoretically, we could create a macro in user space that users can override as-needed to do a translation from a type_code to a data_type.

Examples

Actual: MONEY, expected: DECIMAL

01:23:28    Compilation Error in model my_model (models/my_model.sql)
  This model has an enforced contract that failed.
  Please ensure the name, data_type, and number of columns in your contract match the columns in your model's definition.
  
  | column_name  | definition_type       | contract_type | mismatch_reason    |
  | ------------ | --------------------- | ------------- | ------------------ |
  | money_col_22 | unknown type_code 790 | DECIMAL       | data type mismatch |

Actual: DECIMAL, expected: MONEY

01:26:08    Compilation Error in model my_model (models/my_model.sql)
  This model has an enforced contract that failed.
  Please ensure the name, data_type, and number of columns in your contract match the columns in your model's definition.
  
  | column_name   | definition_type | contract_type         | mismatch_reason    |
  | ------------- | --------------- | --------------------- | ------------------ |
  | numeric_col_5 | DECIMAL         | unknown type_code 790 | data type mismatch |

Actual: MONEY, expected: UUID

01:27:00    Compilation Error in model my_model (models/my_model.sql)
  This model has an enforced contract that failed.
  Please ensure the name, data_type, and number of columns in your contract match the columns in your model's definition.
  
  | column_name  | definition_type       | contract_type          | mismatch_reason    |
  | ------------ | --------------------- | ---------------------- | ------------------ |
  | money_col_22 | unknown type_code 790 | unknown type_code 2950 | data type mismatch |

Testing

Functional tests

  1. Unrecognized type_code fulfills the contract
  2. Unrecognized type_code in the data breaks the contract
  3. Unrecognized type_code specified in a broken contract

Manual tests

  1. User-defined data types (manual by @rlh1994 here)
  2. Manually confirmed that the first functional test above fails when the change in 7b8aa2d is not included
Click to toggle log output
01:16:44  1 of 1 START sql view model test16981966020785324796_test_nonstandard_data_type.my_numeric_model  [RUN]
01:16:44  Unhandled error while executing 
790
01:16:44  1 of 1 ERROR creating sql view model test16981966020785324796_test_nonstandard_data_type.my_numeric_model  [ERROR in 0.08s]
01:16:44  
01:16:44  Finished running 1 view model in 0 hours 0 minutes and 0.28 seconds (0.28s).
01:16:44  
01:16:44  Completed with 1 error and 0 warnings:
01:16:44  
01:16:44    790
01:16:44  
01:16:44  Done. PASS=0 WARN=0 ERROR=1 SKIP=0 TOTAL=1

Spawned issues

Checklist

  • I have read the contributing guide and understand what's expected of me
  • I have run this code in development and it appears to resolve the stated issue
  • This PR includes tests
  • This PR has no interface changes (e.g. macros, cli, logs, json artifacts, config files, adapter interface, etc)
  • This PR does not include type annotations because there are no new or modified functions

@cla-bot cla-bot bot added the cla:yes label Oct 24, 2023
@dbeatty10 dbeatty10 changed the title Dbeatty/8877 unknown data type code Handle unknown type_code for model contracts Oct 24, 2023
@codecov
Copy link

codecov bot commented Oct 24, 2023

Codecov Report

All modified and coverable lines are covered by tests ✅

Comparison is base (35f46da) 86.36% compared to head (4c8a3b0) 86.30%.
Report is 1 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    dbt-labs/dbt-core#8887      +/-   ##
==========================================
- Coverage   86.36%   86.30%   -0.07%     
==========================================
  Files         177      177              
  Lines       26385    26385              
==========================================
- Hits        22787    22771      -16     
- Misses       3598     3614      +16     
Flag Coverage Δ
integration 83.08% <ø> (-0.14%) ⬇️
unit 64.58% <ø> (ø)

Flags with carried forward coverage won't be shown. Click here to find out more.

see 3 files with indirect coverage changes

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@rlh1994
Copy link
Contributor

rlh1994 commented Oct 24, 2023

Would it be useful to add a debug level warning when it fails to find a type? In particular if someone is using the column.Data_type from a get_column_schema_from_query call it might not be obvious why something is failing without directly outputting all the column data types?

if type_code in string_types:
return string_types[type_code].name
else:
return f"unknown type_code {type_code}"
Copy link
Contributor Author

@dbeatty10 dbeatty10 Oct 24, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rlh1994 brought up the idea of raising a debug-level warning log message in this case.

Seems like a valuable idea to explore. Maybe something like this?

            logger.debug("unknown type_code: {}", type_code)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@MichelleArk how would you feel about adding this ☝️ debug output when the type_code isn't found?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It may be a little redundant given unknown type_code will appear in the error message either way. So I'd either omit it or write a more informative message like"type_code not found in adapter-provided lookup, defaulting to 'unknown type_code'"

And we may actually need to use fire_event to fire a debuglevel event as opposed not logger.debug

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 Splitting this into its own issue:

That way we can ship this fix while still giving that debug message enough attention.

@rlh1994
Copy link
Contributor

rlh1994 commented Oct 25, 2023

Just for complete visibility on this PR, I have confirmed that this works for user created types in postgres as well.

@dbeatty10
Copy link
Contributor Author

See below for a prototype of converting an unrecognized type_code into the desired data_type.

1. Create user-space macros to do lookups:

macros/type_code.sql

{# Define your lookup table here #}
{% macro get_type_code_lookup_table() %}
  {% set lookup_table = {
      '790': 'MONEY',
      '2950': 'UUID',
  } %}
  {{ return(lookup_table) }}
{% endmacro %}


{# Look up an unrecognized `type_code` for a Column #}
{% macro lookup_type_code(data_type) %}

    {% set lookup_table = get_type_code_lookup_table() %}
    
    {# Attempt to match the pattern in the provided data_type string. #}
    {% set pattern = 'unknown type_code ' %}
    {% if data_type.startswith(pattern) %}
        {# Extract the type code and trim any leading/trailing whitespace. #}
        {% set type_code = data_type.replace(pattern, '').strip() %}
        
        {# Look up the extracted type code in the dictionary. #}
        {% if type_code in lookup_table %}
            {% set data_type = lookup_table[type_code] %}
        {% endif %}
    {% endif %}

  {{ return(data_type) }}

{% endmacro %}

2. Define an analysis to try it out:

analyses/get_column_schema_from_query.sql

{#- Get the column names and data types -#}
{%- set cols = get_column_schema_from_query(sql) -%}

SQL:

  {{ sql | indent(2) }}

Columns:
{% for col in cols %}
  {{ col }}
{%- endfor %}

User-provided `type_code` translations:
{% for col in cols %}
  {{ col.name }} ({{ lookup_type_code(col.data_type) }})
{%- endfor %}

3. Compile the analysis:

dbt compile

4. Examine the output:

target/compiled/my_project/analyses/get_column_schema_from_query.sql

SQL:

  select
    '12.34'::numeric as numeric_col,
    '12.34'::money as money_col,
    'f76838a4-7411-4f04-a960-4fc77fd3107b'::text as text_col,
    'f76838a4-7411-4f04-a960-4fc77fd3107b'::uuid as uuid_col

Columns:

  <Column numeric_col (DECIMAL)>
  <Column money_col (unknown type_code 790)>
  <Column text_col (TEXT)>
  <Column uuid_col (unknown type_code 2950)>

User-provided `type_code` translations:

  numeric_col (DECIMAL)
  money_col (MONEY)
  text_col (TEXT)
  uuid_col (UUID)

@dbeatty10 dbeatty10 marked this pull request as ready for review October 25, 2023 13:55
@dbeatty10 dbeatty10 requested review from a team as code owners October 25, 2023 13:55
@dbeatty10 dbeatty10 requested review from Fleid and emmyoop October 25, 2023 13:55
@dbeatty10 dbeatty10 added the ready_for_review Externally contributed PR has functional approval, ready for code review from Core engineering label Oct 25, 2023
@MichelleArk
Copy link
Contributor

MichelleArk commented Oct 25, 2023

prototype of converting an unrecognized type_code into the desired data_type

Nice!! Presumably the rationale for implementing this in jinja space to support user-defined types?

In case it's helpful, my decision criteria for where adapters-related logic should live boils down to:

  1. If the implementation does not or should not vary by adapter -> dbt-core
  2. If it's adapter specific + has only one objectively correct implementation -> adapter interface, implemented in python for maintainers to overwrite as necessary.
    • This is also easier to unit test for adapter maintainers!
  3. If it's adapter specific + should be flexible to end-user tweaks and customizations -> adapter interface, implemented in jinja (user space) for end-users to customize entirely how they see fit for their dbt project.

As much as I'd like this mapping of type_code to data_type to have a single 'correct' implementation per adapter, I can see how user-defined types put a bit of a wrench in that. It may be possible to split the implementation into base types in python (could follow a similar pattern to #8357) and user-defined types, but that may introduce more complexity than it's worth...

@dbeatty10
Copy link
Contributor Author

Nice!! Presumably the rationale for implementing this in jinja space to support user-defined types?

Yep, you got it @MichelleArk !

Your decision criteria for where adapters-related logic should live is super helpful 🤩

Doing type_code look-ups feels like it should be a separate feature request, so I just opened this up for further consideration and refinement:

In the meantime, the user space Jinja macros in the comment above are a prototype of what users could do even if we don't choose to provide any formal Python or Jinja interfaces for them.

@clementdavies56
Copy link

I'm interested in the release status of this feature on PyPI. I haven't found any details about its release in the recent tags or notes. Could you please update me on its current status?

benesch added a commit to benesch/materialize that referenced this pull request Dec 17, 2023
Backport dbt-labs/dbt-core#8887 to make data contracts work correctly
with custom PostgreSQL types that are unknown to dbt/psycopg2. The error
messages are bad when contract validation on such types fails, but the
contracts fundamentally work, which is a big improvement.

See comments within the patch for details.
benesch added a commit to benesch/materialize that referenced this pull request Dec 17, 2023
Backport dbt-labs/dbt-core#8887 to make data contracts work correctly
with custom PostgreSQL types that are unknown to dbt/psycopg2. The error
messages are bad when contract validation on such types fails, but the
contracts fundamentally work, which is a big improvement.

See comments within the patch for details.
github-actions bot pushed a commit that referenced this pull request Jan 19, 2024
* Handle unknown `type_code` for model contracts

* Changelog entry

* Fix changelog entry

* Functional test for a `type_code` that is not recognized by psycopg2

* Functional tests for data type mismatches

(cherry picked from commit 6aeebc4)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport 1.7.latest cla:yes ready_for_review Externally contributed PR has functional approval, ready for code review from Core engineering
Projects
None yet
6 participants