
bugfix/is-incremental-compatible #4

Merged
merged 7 commits into main from bugfix/is-incremental-compatible on Jul 17, 2024

Conversation

Collaborator

@fivetran-joemarkiewicz commented Jul 15, 2024

PR Overview

This PR will address the following Issue/Feature: Internal Ticket T-739367

This PR will result in the following new package version: v0.2.0

This will be a breaking change: the table format will change for all Databricks incremental models, and Databricks runtimes other than all-purpose clusters will no longer leverage an incremental strategy for the impacted models.

Please provide the finalized CHANGELOG entry which details the relevant changes included in this PR:

🚨 Breaking Changes 🚨

⚠️ Since the following changes result in the table format changing for Databricks users, we recommend running a --full-refresh after upgrading to this version to avoid possible incremental failures.

  • For Databricks All-Purpose clusters, the salesforce_marketing_cloud__events_enhanced model will now be materialized using the delta table format (previously parquet); a configuration sketch follows this list.
    • Delta tables are generally more performant than parquet and are also more widely available for Databricks users. Previously, the parquet file format was causing compilation issues on customers' managed tables.
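
For illustration only (this is not the package's actual model source), a Databricks incremental model making this kind of switch would typically pin the file format in its config block. The partition column and staging ref below are hypothetical placeholders:

-- Illustrative sketch only; the real salesforce_marketing_cloud__events_enhanced model may differ.
{{
    config(
        materialized='incremental',
        file_format='delta',
        incremental_strategy='insert_overwrite',
        partition_by=['date_day']
    )
}}

-- The ref below is a hypothetical staging model name used only for this sketch.
select *
from {{ ref('stg_salesforce_marketing_cloud__event') }}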

Documentation

  • Added details to the README to highlight the incremental strategies used within the salesforce_marketing_cloud__events_enhanced model.

Under the Hood

  • The is_incremental_compatible macro has been added to the package. This macro returns true if the Databricks runtime in use is an all-purpose cluster, or if any other supported non-Databricks destination is being used (a sketch of this pattern follows this list).
    • This update was applied because other Databricks runtimes (i.e. SQL Warehouse, endpoint, and external runtimes) do not support the insert_overwrite incremental strategy used in the salesforce_marketing_cloud__events_enhanced model.
  • In addition to the above, for Databricks users the salesforce_marketing_cloud__events_enhanced model will now leverage the incremental strategy only if the Databricks runtime is an all-purpose cluster; all other Databricks runtimes will not leverage an incremental strategy.
  • Added validation tests to the integration_tests/ folder to ensure the consistency and integrity of the salesforce_marketing_cloud__events_enhanced model for subsequent updates.
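
The snippet below is a minimal sketch of the pattern described above, not the macro or model code shipped in the package. It assumes an all-purpose cluster can be recognized by a cluster-style http_path on the target and that incompatible runtimes fall back to a table materialization; the macro name, the http_path heuristic, and the fallback are all assumptions for illustration.

{# macros/is_incremental_compatible_sketch.sql -- illustrative only, not the packaged macro #}
{% macro is_incremental_compatible_sketch() %}
    {% if target.type == 'databricks' %}
        {# Assumption: all-purpose clusters expose a cluster-style http_path
           (sql/protocolv1/...), while SQL warehouses use /sql/1.0/warehouses/... #}
        {{ return('sql/protocolv1' in (target.http_path | default(''))) }}
    {% else %}
        {# Non-Databricks destinations keep their incremental behavior. #}
        {{ return(true) }}
    {% endif %}
{% endmacro %}

{# In the model's config block, the materialization could then be chosen conditionally: #}
{{
    config(
        materialized='incremental' if is_incremental_compatible_sketch() else 'table'
    )
}}

With a pattern like this, runtimes that fail the check simply rebuild the model on every run instead of applying insert_overwrite, which is consistent with the validation results shown below.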

PR Checklist

Basic Validation

Please acknowledge that you have successfully performed the following commands locally:

  • dbt run --full-refresh && dbt test
  • dbt run (if incremental models are present) && dbt test

Before marking this PR as "ready for review" the following have been applied:

  • The appropriate issue has been linked, tagged, and properly assigned.
  • All necessary documentation and version upgrades have been applied.
  • docs were regenerated (unless this PR does not include any code or yml updates).
  • BuildKite integration tests are passing.
  • Detailed validation steps have been provided below.

Detailed Validation

Please share any and all of your validation steps:

Please see the validation tests passing below using the following variables for BigQuery:

vars:
  salesforce_marketing_cloud_database: <testing-project> 
  salesforce_marketing_cloud_schema: <testing-schema> 
  salesforce_marketing_cloud__link_enabled: false
  salesforce_marketing_cloud__list_enabled: false

[screenshot: passing validation test results]

Additionally, we can see the Databricks incremental logic is working as expected.

  • ✅ Databricks Runtime (all-purpose cluster) is using the insert-overwrite incremental logic on subsequent runs as expected.
  • ✅ Databricks SQL Warehouse (which also represents the behavior of the other non-all-purpose Databricks runtimes) is not using the incremental logic on subsequent runs, as expected.
  • To ensure the other warehouses are not impacted, I can confirm that Snowflake, BigQuery, Redshift, and Postgres are all still using the appropriate incremental logic on subsequent runs.
    • ✅ Snowflake
    • ✅ BigQuery
    • ✅ Postgres
    • ✅ Redshift

If you had to summarize this PR in an emoji, which would it be?

🧱

@fivetran-joemarkiewicz marked this pull request as ready for review July 15, 2024 15:26
Contributor

@fivetran-catfritz left a comment

@fivetran-joemarkiewicz Everything is running well locally! I ran full-refresh and incremental runs on BigQuery, Databricks all-purpose, and Databricks SQL, and confirmed all use the expected incremental strategy.

I have some comments/questions for the integration tests, so just let me know if you want to chat about those!

@@ -16,12 +16,21 @@ db=$1
echo `pwd`
cd integration_tests
dbt deps
if [ "$db" = "databricks-sql" ]; then
dbt seed --vars '{fivetran_platform_schema: sqlw_tests}' --target "$db" --full-refresh
Contributor

Suggested change
dbt seed --vars '{fivetran_platform_schema: sqlw_tests}' --target "$db" --full-refresh
dbt seed --vars '{salesforce_marketing_cloud_schema: sfmc_sqlw_tests}' --target "$db" --full-refresh

The var here just needs to be updated; I think this will fix the table-not-found messages in the last Buildkite run.
I would also make these vars specific to SFMC to keep them separate from other packages' SQL warehouse schemas.

Collaborator Author

Ah yes 🤦 that was a copy/paste error on my end. Thanks!

Collaborator Author

Updated

catalog: "{{ env_var('CI_DATABRICKS_DBT_CATALOG') }}"
host: "{{ env_var('CI_DATABRICKS_DBT_HOST') }}"
http_path: "{{ env_var('CI_DATABRICKS_SQL_DBT_HTTP_PATH') }}"
schema: sqlw_tests
Contributor

Update the schema name here too!

Collaborator Author

Updated

@@ -21,6 +21,10 @@ vars:
salesforce_marketing_cloud_link_send_identifier: "link_send_data"
salesforce_marketing_cloud_triggered_send_identifier: "triggered_send_data"

models:
+schema: "{{ 'sqlw_tests' if target.name == 'databricks-sql' else 'salesforce_marketing_cloud' }}"
Contributor

Update the schema name here too!

Collaborator Author

Updated

dbt seed --target "$db" --full-refresh
dbt run --target "$db" --full-refresh
dbt test --target "$db"
dbt run --target "$db"
dbt test --target "$db"
dbt run --vars '{salesforce_marketing_cloud__link_enabled: false, salesforce_marketing_cloud__list_enabled: false}' --full-refresh --target "$db"
dbt test --target "$db"

fi
dbt run-operation fivetran_utils.drop_schemas_automation --target "$db"
Contributor

In dbt_fivetran_log, do you recall why these lines were included in run_models.sh? It seems like everything is functioning here without them, so just curious!

Collaborator Author

Those were only necessary for Azure SQL Server support, which is different from Databricks SQL Warehouse. No need for them here since we are not supporting Azure SQL Server in this model yet.

Contributor

Makes sense--thank you!

README.md (outdated review comment, resolved)
@fivetran-joemarkiewicz
Collaborator Author

Thanks @fivetran-catfritz for the review and comments/suggestions! I just applied the updates and this is good for re-review. Let me know if you have any other comments.

Contributor

@fivetran-catfritz left a comment

So nice to have staging and transform all in one place. Thanks for the responses, and all lgtm!

@fivetran-avinash self-requested a review July 17, 2024 18:10
Contributor

@fivetran-avinash left a comment

@fivetran-joemarkiewicz Looking good! Just a few questions/suggestions before approving.

CHANGELOG.md (outdated review comment, resolved)
models/salesforce_marketing_cloud__events_enhanced.sql (outdated review comment, resolved)
@@ -16,12 +16,21 @@ db=$1
echo `pwd`
cd integration_tests
dbt deps
if [ "$db" = "databricks-sql" ]; then
dbt seed --vars '{salesforce_marketing_cloud_schema: sfmc_sqlw_tests}' --target "$db" --full-refresh
Contributor

Should there be a dbt compile statement after this line, similar to dbt_fivetran_log (link)?

Collaborator Author

A compile statement is not necessary. A compile will already be run as part of the run operation. The compile in Fivetran Log may have been for a particular reason, but I don't imagine we need to include it here.

dbt test --vars '{salesforce_marketing_cloud_schema: sfmc_sqlw_tests}' --target "$db"
dbt run --vars '{salesforce_marketing_cloud_schema: sfmc_sqlw_tests}' --target "$db"
dbt test --vars '{salesforce_marketing_cloud_schema: sfmc_sqlw_tests}' --target "$db"
dbt run --vars '{salesforce_marketing_cloud_schema: sfmc_sqlw_tests, salesforce_marketing_cloud__link_enabled: false, salesforce_marketing_cloud__list_enabled: false}' --full-refresh --target "$db"
Contributor

Should there be a dbt run (incremental) after the full-refresh, similar to fivetran_log? (link)

Collaborator Author

Yes, including an incremental run after a full refresh would be best here. I've added it to both the SQL Warehouse and standard sections of run_models.sh.

README.md (outdated review comment, resolved)
Contributor

@fivetran-avinash left a comment

Approved!

@fivetran-joemarkiewicz merged commit 8cf68f5 into main Jul 17, 2024
9 checks passed
@fivetran-joemarkiewicz deleted the bugfix/is-incremental-compatible branch July 17, 2024 18:58