Feature: Ingest DBT Contract Information as a DataHub Data Contract #11927

Open
matthew-coudert-cko opened this issue Nov 22, 2024 · 1 comment
Labels
feature-request Request for a new feature to be added

Comments

@matthew-coudert-cko
Contributor

matthew-coudert-cko commented Nov 22, 2024

We (Checkout.com) have been using DataHub's data contract feature more heavily over the past few months, and have implemented a custom mapping between DBT contracts and DataHub data contracts. We propose implementing this as part of the native DBT Core ingestion, with the following functionality:

  1. DBT contracts prevent breaking changes (column removals or column type changes), so an enforced DBT contract maps to a schema contract in DataHub.
  2. DBT tests carrying a configurable tag (default: contract) have their assertions added to the data contract.
  3. Optionally, DBT constraints that are enforced by the target data platform (e.g. not_null in Snowflake) could also be added to the contract as always-passing assertions.

Example DBT Yaml:

- name: dbt_contract_test_view
  description: This view is used to test the data contract checks for the dbt models.
  config:
    contract:
      enforced: true # this adds a schema contract to the DataHub data contract.
  columns:
    - name: urn
      data_type: text
      description: The urn of the object.
      data_tests:
        - unique:
            tags: ['contract'] # this is included in the data contract
        - not_null # this is not

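The three rules in the proposal can be sketched as a small selection function. This is only an illustration under stated assumptions: the dictionaries are simplified stand-ins for dbt model and test metadata, and `ContractPlan`, `plan_contract`, `contract_tag`, and `include_constraints` are hypothetical names, not part of DataHub's or dbt's API.

```python
from dataclasses import dataclass, field

CONTRACT_TAG = "contract"  # assumed default; the proposal makes the tag configurable


@dataclass
class ContractPlan:
    """Hypothetical summary of what would be attached to a DataHub data contract."""
    include_schema: bool = False
    test_assertions: list = field(default_factory=list)
    constraint_assertions: list = field(default_factory=list)


def plan_contract(model, tests, contract_tag=CONTRACT_TAG, include_constraints=False):
    """Decide which parts of a dbt model end up in the data contract."""
    plan = ContractPlan()

    # Rule 1: an enforced dbt contract maps to a schema contract.
    contract_cfg = model.get("config", {}).get("contract", {})
    plan.include_schema = bool(contract_cfg.get("enforced", False))

    # Rule 2: only tests carrying the contract tag are added.
    for test in tests:
        if contract_tag in test.get("tags", []):
            plan.test_assertions.append(test["name"])

    # Rule 3 (optional): platform-enforced constraints, treated as always passing.
    if include_constraints:
        for column in model.get("columns", []):
            for constraint in column.get("constraints", []):
                plan.constraint_assertions.append(
                    f"{constraint['type']}:{column['name']}"
                )
    return plan


# Simplified stand-in for the example model above (test names are illustrative).
model = {
    "name": "dbt_contract_test_view",
    "config": {"contract": {"enforced": True}},
    "columns": [{"name": "urn", "data_type": "text"}],
}
tests = [
    {"name": "unique_urn", "tags": ["contract"]},
    {"name": "not_null_urn", "tags": []},
]
plan = plan_contract(model, tests)
```

For the example model, the plan includes the schema contract and the tagged `unique` test, while the untagged `not_null` test is left out. Constraints are opt-in here, mirroring the "optionally" in point 3.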
We're happy to contribute this if there's appetite, and happy to hide it behind a feature flag in the DBT config as well.

@matthew-coudert-cko matthew-coudert-cko changed the title Ingest DBT Contract Information as a DataHub Data Contract Feature: Ingest DBT Contract Information as a DataHub Data Contract Nov 22, 2024
@jjoyce0510
Collaborator

This is really cool.

I am sure others would be quite interested in this! As long as we can place it behind reasonably well-named feature flags with the appropriate early stage / incubating labeling, I think things should be fine!

To clarify, for dbt contracts you are minting net-new schema assertions in DataHub, is that right?
And for other DBT tests you are simply linking them to the contract for the assets?

@RyanHolstien RyanHolstien added the feature-request Request for a new feature to be added label Nov 26, 2024

3 participants