From 47398ebd26d69c53458873e7c3e6447f7573087c Mon Sep 17 00:00:00 2001 From: mirnawong1 Date: Wed, 4 Dec 2024 13:37:20 +0000 Subject: [PATCH 1/4] add microbatch to data platform configs --- .../reference/resource-configs/bigquery-configs.md | 5 +++-- .../reference/resource-configs/postgres-configs.md | 1 + .../reference/resource-configs/redshift-configs.md | 1 + .../reference/resource-configs/snowflake-configs.md | 12 +++++++----- .../docs/reference/resource-configs/spark-configs.md | 3 ++- 5 files changed, 14 insertions(+), 8 deletions(-) diff --git a/website/docs/reference/resource-configs/bigquery-configs.md b/website/docs/reference/resource-configs/bigquery-configs.md index 9dd39c936b6..26b7c7e951f 100644 --- a/website/docs/reference/resource-configs/bigquery-configs.md +++ b/website/docs/reference/resource-configs/bigquery-configs.md @@ -426,8 +426,9 @@ Please note that in order for policy tags to take effect, [column-level `persist The [`incremental_strategy` config](/docs/build/incremental-strategy) controls how dbt builds incremental models. dbt uses a [merge statement](https://cloud.google.com/bigquery/docs/reference/standard-sql/dml-syntax) on BigQuery to refresh incremental tables. The `incremental_strategy` config can be set to one of two values: - - `merge` (default) - - `insert_overwrite` +- `merge` (default) +- `insert_overwrite` +- [`microbatch`](/docs/build/incremental-microbatch) ### Performance and cost diff --git a/website/docs/reference/resource-configs/postgres-configs.md b/website/docs/reference/resource-configs/postgres-configs.md index f2bf90a93c0..e71c6f1484d 100644 --- a/website/docs/reference/resource-configs/postgres-configs.md +++ b/website/docs/reference/resource-configs/postgres-configs.md @@ -11,6 +11,7 @@ In dbt-postgres, the following incremental materialization strategies are suppor - `append` (default when `unique_key` is not defined) - `merge` - `delete+insert` (default when `unique_key` is defined) +- [`microbatch`](/docs/build/incremental-microbatch) ## Performance optimizations diff --git a/website/docs/reference/resource-configs/redshift-configs.md b/website/docs/reference/resource-configs/redshift-configs.md index b033cd6267e..01c9bffd055 100644 --- a/website/docs/reference/resource-configs/redshift-configs.md +++ b/website/docs/reference/resource-configs/redshift-configs.md @@ -17,6 +17,7 @@ In dbt-redshift, the following incremental materialization strategies are suppor - `append` (default when `unique_key` is not defined) - `merge` - `delete+insert` (default when `unique_key` is defined) +- [`microbatch`](/docs/build/incremental-microbatch) All of these strategies are inherited from dbt-postgres. diff --git a/website/docs/reference/resource-configs/snowflake-configs.md b/website/docs/reference/resource-configs/snowflake-configs.md index 7bef180e3d3..9a81b9485fc 100644 --- a/website/docs/reference/resource-configs/snowflake-configs.md +++ b/website/docs/reference/resource-configs/snowflake-configs.md @@ -38,11 +38,11 @@ flags: The following configurations are supported. For more information, check out the Snowflake reference for [`CREATE ICEBERG TABLE` (Snowflake as the catalog)](https://docs.snowflake.com/en/sql-reference/sql/create-iceberg-table-snowflake). -| Field | Type | Required | Description | Sample input | Note | -| --------------------- | ------ | -------- | -------------------------------------------------------------------------------------------------------------------------- | ------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | -| Table Format | String | Yes | Configures the objects table format. | `iceberg` | `iceberg` is the only accepted value. | +| Field | Type | Required | Description | Sample input | Note | +| ------ | ----- | -------- | ------------- | ------------ | ------ | +| Table Format | String | Yes | Configures the objects table format. | `iceberg` | `iceberg` is the only accepted value. | | External volume | String | Yes(*) | Specifies the identifier (name) of the external volume where Snowflake writes the Iceberg table's metadata and data files. | `my_s3_bucket` | *You don't need to specify this if the account, database, or schema already has an associated external volume. [More info](https://docs.snowflake.com/en/sql-reference/sql/create-iceberg-table-snowflake#:~:text=Snowflake%20Table%20Structures.-,external_volume) | -| Base location Subpath | String | No | An optional suffix to add to the `base_location` path that dbt automatically specifies. | `jaffle_marketing_folder` | We recommend that you do not specify this. Modifying this parameter results in a new Iceberg table. See [Base Location](#base-location) for more info. | +| Base location Subpath | String | No | An optional suffix to add to the `base_location` path that dbt automatically specifies. | `jaffle_marketing_folder` | We recommend that you do not specify this. Modifying this parameter results in a new Iceberg table. See [Base Location](#base-location) for more info. | ### Example configuration @@ -472,6 +472,8 @@ The [`incremental_strategy` config](/docs/build/incremental-strategy) controls h Snowflake's `merge` statement fails with a "nondeterministic merge" error if the `unique_key` specified in your model config is not actually unique. If you encounter this error, you can instruct dbt to use a two-step incremental approach by setting the `incremental_strategy` config for your model to `delete+insert`. +Snowflake also supports the [`microbatch`](/docs/build/incremental-microbatch) incremental strategy. + ## Configuring table clustering dbt supports [table clustering](https://docs.snowflake.net/manuals/user-guide/tables-clustering-keys.html) on Snowflake. To control clustering for a or incremental model, use the `cluster_by` config. When this configuration is applied, dbt will do two things: @@ -701,4 +703,4 @@ flags: ``` - \ No newline at end of file + diff --git a/website/docs/reference/resource-configs/spark-configs.md b/website/docs/reference/resource-configs/spark-configs.md index 3b2174b8ff5..a52fd93eace 100644 --- a/website/docs/reference/resource-configs/spark-configs.md +++ b/website/docs/reference/resource-configs/spark-configs.md @@ -37,7 +37,8 @@ For that reason, the dbt-spark plugin leans heavily on the [`incremental_strateg - **`append`** (default): Insert new records without updating or overwriting any existing data. - **`insert_overwrite`**: If `partition_by` is specified, overwrite partitions in the with new data. If no `partition_by` is specified, overwrite the entire table with new data. - **`merge`** (Delta, Iceberg and Hudi file format only): Match records based on a `unique_key`; update old records, insert new ones. (If no `unique_key` is specified, all new data is inserted, similar to `append`.) - +- `microbatch` Implements the [microbatch strategy](/docs/build/incremental-microbatch) using `event_time` to define time-based ranges for filtering data. + Each of these strategies has its pros and cons, which we'll discuss below. As with any model config, `incremental_strategy` may be specified in `dbt_project.yml` or within a model file's `config()` block. ### The `append` strategy From c74d9be1a499a0a63a17af30b0689af4bf5cb59e Mon Sep 17 00:00:00 2001 From: Mirna Wong <89008547+mirnawong1@users.noreply.github.com> Date: Thu, 5 Dec 2024 09:38:48 +0000 Subject: [PATCH 2/4] Update website/docs/reference/resource-configs/bigquery-configs.md Co-authored-by: Matt Shaver <60105315+matthewshaver@users.noreply.github.com> --- website/docs/reference/resource-configs/bigquery-configs.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/website/docs/reference/resource-configs/bigquery-configs.md b/website/docs/reference/resource-configs/bigquery-configs.md index 26b7c7e951f..69e144b6162 100644 --- a/website/docs/reference/resource-configs/bigquery-configs.md +++ b/website/docs/reference/resource-configs/bigquery-configs.md @@ -425,7 +425,7 @@ Please note that in order for policy tags to take effect, [column-level `persist The [`incremental_strategy` config](/docs/build/incremental-strategy) controls how dbt builds incremental models. dbt uses a [merge statement](https://cloud.google.com/bigquery/docs/reference/standard-sql/dml-syntax) on BigQuery to refresh incremental tables. -The `incremental_strategy` config can be set to one of two values: +The `incremental_strategy` config can be set to one of the following values: - `merge` (default) - `insert_overwrite` - [`microbatch`](/docs/build/incremental-microbatch) From 2f6acf964c74c3275ce075973104857826222614 Mon Sep 17 00:00:00 2001 From: Mirna Wong <89008547+mirnawong1@users.noreply.github.com> Date: Thu, 5 Dec 2024 09:39:08 +0000 Subject: [PATCH 3/4] Update website/docs/reference/resource-configs/snowflake-configs.md Co-authored-by: Matt Shaver <60105315+matthewshaver@users.noreply.github.com> --- .../docs/reference/resource-configs/snowflake-configs.md | 6 ++++++ 1 file changed, 6 insertions(+) diff --git a/website/docs/reference/resource-configs/snowflake-configs.md b/website/docs/reference/resource-configs/snowflake-configs.md index 9a81b9485fc..73b34d1fc3b 100644 --- a/website/docs/reference/resource-configs/snowflake-configs.md +++ b/website/docs/reference/resource-configs/snowflake-configs.md @@ -470,6 +470,12 @@ In this example, you can set up a query tag to be applied to every query with th The [`incremental_strategy` config](/docs/build/incremental-strategy) controls how dbt builds incremental models. By default, dbt will use a [merge statement](https://docs.snowflake.net/manuals/sql-reference/sql/merge.html) on Snowflake to refresh incremental tables. +Snowflake supports the following incremental strategies: +- Merge (default) +- Append +- Delete+insert +- [`microbatch`](/docs/build/incremental-microbatch) + Snowflake's `merge` statement fails with a "nondeterministic merge" error if the `unique_key` specified in your model config is not actually unique. If you encounter this error, you can instruct dbt to use a two-step incremental approach by setting the `incremental_strategy` config for your model to `delete+insert`. Snowflake also supports the [`microbatch`](/docs/build/incremental-microbatch) incremental strategy. From 1236efc0959cebf953e705123d72ea9c7fed4688 Mon Sep 17 00:00:00 2001 From: Mirna Wong <89008547+mirnawong1@users.noreply.github.com> Date: Thu, 5 Dec 2024 09:39:40 +0000 Subject: [PATCH 4/4] Update website/docs/reference/resource-configs/snowflake-configs.md Co-authored-by: Matt Shaver <60105315+matthewshaver@users.noreply.github.com> --- website/docs/reference/resource-configs/snowflake-configs.md | 1 - 1 file changed, 1 deletion(-) diff --git a/website/docs/reference/resource-configs/snowflake-configs.md b/website/docs/reference/resource-configs/snowflake-configs.md index 73b34d1fc3b..d576b195b65 100644 --- a/website/docs/reference/resource-configs/snowflake-configs.md +++ b/website/docs/reference/resource-configs/snowflake-configs.md @@ -478,7 +478,6 @@ Snowflake supports the following incremental strategies: Snowflake's `merge` statement fails with a "nondeterministic merge" error if the `unique_key` specified in your model config is not actually unique. If you encounter this error, you can instruct dbt to use a two-step incremental approach by setting the `incremental_strategy` config for your model to `delete+insert`. -Snowflake also supports the [`microbatch`](/docs/build/incremental-microbatch) incremental strategy. ## Configuring table clustering