
add key requirements about adapter requirement #6582

Open · wants to merge 3 commits into base: current

Conversation

@mirnawong1 (Contributor) commented Dec 3, 2024:

This PR adds documentation for the incremental strategy `microbatch`, which requires additional configs over the standard set (see the sketch after this list):
`dbt-postgres` requires a `unique_key`
`dbt-bigquery` requires a `partition_by`
`dbt-spark` requires a `partition_by`

Resolves #6576
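
For orientation, here is a minimal sketch of such a microbatch model on `dbt-postgres`; the model name and the `event_occurred_at`/`id` columns are hypothetical, not from this PR:

```sql
-- models/sessions.sql (hypothetical model on dbt-postgres)
{{
    config(
        materialized='incremental',
        incremental_strategy='microbatch',
        event_time='event_occurred_at',  -- hypothetical event-time column
        begin='2023-10-01',
        batch_size='day',
        unique_key='id'  -- the extra config dbt-postgres requires for microbatch
    )
}}

select * from {{ ref('stg_sessions') }}
```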



@mirnawong1 requested a review from a team as a code owner on December 3, 2024 15:17.

@github-actions bot added labels `content` (Improvements or additions to content), `Docs team` (Authored by the Docs team @dbt Labs), and `size: small` (This change will take 1 to 2 days to address) on Dec 3, 2024.
| `unique_key` | A column(s) (string or array) or expression for the record. Required for the `check` strategy. | N/A | String <br /> | Optional* |
| `partition_by` | A column(s) (string or array) or expression for the record. Required for the `check` strategy. | N/A | String | Optional* |

***Note:**
@matthewshaver (Contributor) commented Dec 3, 2024:

Suggested change:
- ***Note:**
+ *There are scenarios where the following optional configs become required:

@QMalcolm (Contributor) commented Dec 3, 2024:

I'm not sure if optional is the right terminology, but I also don't know of better terminology in this instance. Specifically, `unique_key` is, I believe, an optional config for all models that does something regardless of adapter and has core implications. This is in contrast to `partition_by`, which isn't actually used in core for anything; it is only ever used by the adapter implementations that support it. That is to say, defining `partition_by` on a model in, say, dbt-postgres will do nothing.
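
A hypothetical sketch of that contrast (model and column names are made up, not from this PR): both configs parse on any adapter, but only `unique_key` has core behavior, while `partition_by` takes effect only on adapters that implement it.

```sql
-- models/orders.sql (hypothetical model)
{{
    config(
        materialized='incremental',
        unique_key='order_id',                -- core config: meaningful on every adapter
        partition_by={'field': 'ordered_at'}  -- adapter config: per the comment above,
                                              -- this does nothing on dbt-postgres
    )
}}

select * from {{ ref('stg_orders') }}
```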

@QMalcolm (Contributor) left a comment:

One comment

Comment on lines +188 to +189
| `unique_key` | A column(s) (string or array) or expression for the record. Required for the `check` strategy. | N/A | String <br /> | Optional* |
| `partition_by` | A column(s) (string or array) or expression for the record. Required for the `check` strategy. | N/A | String | Optional* |
@QMalcolm (Contributor) commented:

Where did the "Required for the `check` strategy." part of the description come from? It feels odd here; as in, what is a `check` strategy in the context of defining a microbatch model?

Additionally, I feel weird about including `unique_key` and `partition_by` in the table. They are only relevant for specific adapters, not microbatch as a whole. The issue is that microbatch isn't a uniform strategy across the different adapters. That is (where ~= means "approximately equals"):

  • `dbt-postgres` microbatch ~= `merge`
  • `dbt-redshift` microbatch ~= `delete+insert`
  • `dbt-bigquery` microbatch ~= `insert_overwrite`
  • `dbt-spark` microbatch ~= `insert_overwrite`
  • `dbt-databricks` microbatch ~= `insert_overwrite`
  • `dbt-snowflake` microbatch ~= `delete+insert`

The `unique_key` is required for dbt-postgres's microbatch because its `merge` strategy requires a `unique_key`, and that `unique_key` is used to identify which rows in the data warehouse need to get merged. For dbt-bigquery and dbt-spark, their underlying implementation of `insert_overwrite` requires a `partition_by`. My understanding of that strategy is that the `partition_by` is what makes it efficient.
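
To make that concrete, here is a hedged sketch of a microbatch model on `dbt-bigquery`, where `partition_by` is required because the underlying `insert_overwrite` replaces whole partitions rather than merging rows (the model and column names are hypothetical):

```sql
-- models/page_views.sql (hypothetical model on dbt-bigquery)
{{
    config(
        materialized='incremental',
        incremental_strategy='microbatch',
        event_time='viewed_at',  -- hypothetical event-time column
        begin='2024-01-01',
        batch_size='day',
        partition_by={
            'field': 'viewed_at',
            'data_type': 'timestamp',
            'granularity': 'day'  -- partition granularity matching the batch size
        }
    )
}}

select * from {{ ref('stg_page_views') }}
```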

@mirnawong1 (PR author) replied:

Oh, thanks for catching this! This definitely shouldn't be there. It was tricky, and maybe we need a separate 'adapters-related' section to explain the microbatch x adapter strategy.

I'll rejig this!

| `begin` | The "beginning of time" for the microbatch model. This is the starting point for any initial or full-refresh builds. For example, a daily-grain microbatch model run on `2024-10-01` with `begin = '2023-10-01'` will process 366 batches (it's a leap year!) plus the batch for "today." | N/A | Date | Required |
| `batch_size` | The granularity of your batches. Supported values are `hour`, `day`, `month`, and `year`. | N/A | String | Required |
| `lookback` | Process X batches prior to the latest bookmark to capture late-arriving records. | `1` | Integer | Optional |
| `unique_key` | A column(s) (string or array) or expression for the record. Required for the `check` strategy. | N/A | String <br /> | Optional* |
@mirnawong1 (PR author) commented:

Suggested change:
- | `unique_key` | A column(s) (string or array) or expression for the record. Required for the `check` strategy. | N/A | String <br /> | Optional* |
+ | `unique_key` | A column(s) (string or array) or expression for the record. | N/A | String <br /> | Optional* |

| `batch_size` | The granularity of your batches. Supported values are `hour`, `day`, `month`, and `year`. | N/A | String | Required |
| `lookback` | Process X batches prior to the latest bookmark to capture late-arriving records. | `1` | Integer | Optional |
| `unique_key` | A column(s) (string or array) or expression for the record. Required for the `check` strategy. | N/A | String <br /> | Optional* |
| `partition_by` | A column(s) (string or array) or expression for the record. Required for the `check` strategy. | N/A | String | Optional* |
@mirnawong1 (PR author) commented:

Suggested change:
- | `partition_by` | A column(s) (string or array) or expression for the record. Required for the `check` strategy. | N/A | String | Optional* |
+ | `partition_by` | A column(s) (string or array) or expression for the record. | N/A | String | Optional* |


Successfully merging this pull request may close these issues.

[Core] Some adapter incremental strategies require additional config keys