Replace environment variable with a project flag to gate microbatch functionality #10799

Merged
QMalcolm merged 28 commits into main from microbatch-project-flags on Nov 11, 2024

@MichelleArk MichelleArk commented Sep 30, 2024

Resolves #10798

Problem

We had gated the new microbatch feature behind an environment variable in the initial implementation of microbatch (#10594). However, for a better experience, we want people to be setting a project flag (AKA behavior flag).

Solution

Gate microbatch functionality behind a project flag (AKA behavior flag) instead of an environment variable.

Checklist

  • I have read the contributing guide and understand what's expected of me.
  • I have run this code in development, and it appears to resolve the stated issue.
  • This PR includes tests, or tests are not required or relevant for this PR.
  • This PR has no interface changes (e.g., macros, CLI, logs, JSON artifacts, config files, adapter interface, etc.) or this PR has already received feedback and approval from Product or DX.
  • This PR includes type annotations for new and modified functions.

@cla-bot cla-bot bot added the cla:yes label Sep 30, 2024

Thank you for your pull request! We could not find a changelog entry for this change. For details on how to document a change, see the contributing guide.

codecov bot commented Sep 30, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 89.07%. Comparing base (30b8a92) to head (0f6b7fb).
Report is 1 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main   #10799      +/-   ##
==========================================
- Coverage   89.12%   89.07%   -0.06%     
==========================================
  Files         183      183              
  Lines       23592    23626      +34     
==========================================
+ Hits        21027    21044      +17     
- Misses       2565     2582      +17     
Flag Coverage Δ
integration 86.37% <100.00%> (-0.14%) ⬇️
unit 62.79% <52.94%> (-0.03%) ⬇️

Components Coverage Δ
Unit Tests 62.79% <52.94%> (-0.03%) ⬇️
Integration Tests 86.37% <100.00%> (-0.14%) ⬇️

…with behavior flag changes

This is needed to get the tests to pass. This is only necessary until the changes in
dbt-adapters are merged. Said another way: we need to merge the changes to dbt-adapters
first, and then revert these dependency changes before merging these changes.
@QMalcolm QMalcolm marked this pull request as ready for review October 1, 2024 21:25
@QMalcolm QMalcolm requested a review from a team as a code owner October 1, 2024 21:25
@QMalcolm QMalcolm changed the title first pass: replace os env with project flag Replace environment variable with a project flag to gate micorbatch functionality Oct 1, 2024
Comment on lines 1 to 2
git+https://github.com/dbt-labs/dbt-adapters.git@main
git+https://github.com/dbt-labs/dbt-adapters.git@main#subdirectory=dbt-tests-adapter
git+https://github.com/dbt-labs/dbt-adapters.git@microbatch-behavior-flag
git+https://github.com/dbt-labs/dbt-adapters.git@microbatch-behavior-flag#subdirectory=dbt-tests-adapter

This is needed to get the tests to pass. This is only necessary until the changes in dbt-adapters are merged. Said another way: we need to merge the changes to dbt-adapters first, and then revert these dependency changes before merging this PR's changes.

@QMalcolm QMalcolm requested a review from a team October 1, 2024 22:00
@MichelleArk MichelleArk changed the title Replace environment variable with a project flag to gate micorbatch functionality Replace environment variable with a project flag to gate microbatch functionality Nov 4, 2024
…o `find_materialization_macro_candidate_by_name`

This is necessary because in the following commit we're going to add a function to get
whether or not a manifest should be run using the new microbatch functionality. Getting
the MacroCandidate is necessary because we need to know the locality of the macro.
…ity should be used

The new microbatch functionality is, unfortunately, potentially dangerous. That is,
it adds a new materialization strategy `microbatch`, which an end user could have
defined as a custom strategy previously. Additionally, we added config keys to nodes,
and as `config` is just a Dict[str, Any], it could contain anything, meaning
people could already be using the configs we're adding for different purposes. Thus
we need some intelligent gating. Specifically, something that adheres to the following:

cms = Custom Microbatch Strategy
abms = Adapter Builtin Microbatch Strategy
bf = Behavior flag
umb = Use Microbatch Batching
t/f/e = True/False/Error

| cms | abms | bf | umb |
| t   | t    | t  | t   |
| f   | t    | t  | t   |
| t   | f    | t  | t   |
| f   | f    | t  | e   |
| t   | t    | f  | f   |
| f   | t    | f  | t   |
| t   | f    | f  | f   |
| f   | f    | f  | e   |

(The above table assumes that there is a microbatch model present in the project)

In order to achieve this we need to check that either the microbatch behavior
flag is set to true OR the microbatch materialization being used is the _root_ microbatch
materialization (i.e. not custom). The function we added in this commit,
`use_microbatch_batches`, does just that.
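
The truth table above can be sketched in a few lines of Python. This is only an illustration, not the code from this PR: the function name `should_use_batches` and its three boolean parameters are hypothetical stand-ins for the table's `cms`, `abms`, and `bf` columns, and the `RuntimeError` is a placeholder for whatever error dbt actually raises.

```python
def should_use_batches(
    has_custom_strategy: bool,   # cms (hypothetical parameter)
    has_builtin_strategy: bool,  # abms (hypothetical parameter)
    behavior_flag: bool,         # bf (hypothetical parameter)
) -> bool:
    """Return the `umb` column of the table for one combination of inputs."""
    if not has_custom_strategy and not has_builtin_strategy:
        # No microbatch macro exists at all: the `e` rows of the table.
        raise RuntimeError("no microbatch strategy macro found")
    if behavior_flag:
        # Flag on: always batch, whichever macro is resolved.
        return True
    # Flag off: batch only when no custom macro shadows the builtin one.
    return not has_custom_strategy
```

With the flag off, a project that only has the adapter's builtin strategy still gets batched execution, which is why most users never need to touch the flag.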
@QMalcolm QMalcolm force-pushed the microbatch-project-flags branch from 9b3c122 to 543e024 Compare November 5, 2024 00:41
In 0349968 I had done this for the function
`find_materialization_macro_by_name`, but that wasn't the right function to
do it to, and will be reverted shortly. `find_materialization_macro_by_name`
is used for finding the general materialization macro, whereas `find_macro_by_name`
is more general. For the work we're doing, we need to find the microbatch
macro, which is not a materialization macro.
…_name` to `find_materialization_macro_candidate_by_name`"

This reverts commit 0349968.
…tead of `root`

Previously we were checking for a locality of `root`. However, a locality
of `root` means it was provided by a `package`. We want to check for a locality
of `core`, which basically means `builtin via dbt-core/adapters`. There is
another locality, `imported`, which I believe means it comes from another
package.
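
A minimal sketch of the locality check described here, assuming that when several macros share a name the user-provided one wins over an imported one, which wins over a builtin one. The `Locality` enum, the `MacroCandidate` dataclass, and the helpers `resolve` and `is_builtin` are simplified, hypothetical stand-ins written for this example; they are not dbt-core's own classes.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import List, Optional


class Locality(IntEnum):
    # Assumed ordering: a higher value wins when names collide.
    CORE = 1      # builtin via dbt-core/adapters
    IMPORTED = 2  # comes from another package
    ROOT = 3      # provided by the user's own project


@dataclass
class MacroCandidate:
    name: str
    locality: Locality


def resolve(candidates: List[MacroCandidate], name: str) -> Optional[MacroCandidate]:
    """Pick the highest-precedence macro with the given name, if any."""
    matches = [c for c in candidates if c.name == name]
    return max(matches, key=lambda c: c.locality, default=None)


def is_builtin(candidate: Optional[MacroCandidate]) -> bool:
    """True only when the resolved macro has `core` locality."""
    return candidate is not None and candidate.locality is Locality.CORE
```

Checking for `CORE` rather than `ROOT` is the point of this commit: a `ROOT` match is exactly the custom macro the gating is meant to detect.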
…in boolean checks

The method `use_microbatch_batches` is always invoked to evaluate an `if`
statement. In most instances, it is part of a logic chain (i.e. there are
multiple things being evaluated in the `if` statement). In `if` statements
where there are multiple things being evaluated, `use_microbatch_batches`
should come _last_ (or as late as possible). This is because it is likely
the most costly thing to evaluate in the logic chain, and thus any
short-circuiting via other evaluations in the `if` statement failing (and thus
avoiding invoking `use_microbatch_batches`) is desirable.
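
The ordering argument above relies on Python's short-circuit evaluation of `and`. A small sketch, with hypothetical helper names and a plain dict standing in for a model node, shows the costly check being skipped when a cheap one fails first:

```python
# Counter used only to show how often the costly check actually runs.
calls = {"expensive": 0}


def is_incremental(model: dict) -> bool:
    # Cheap check: a single dictionary lookup.
    return model.get("materialized") == "incremental"


def expensive_batch_check(model: dict) -> bool:
    # Hypothetical stand-in for the costly check; counts its invocations.
    calls["expensive"] += 1
    return model.get("incremental_strategy") == "microbatch"


def should_batch(model: dict) -> bool:
    # `and` short-circuits: the costly check runs only if the cheap one passes.
    return is_incremental(model) and expensive_batch_check(model)
```

Calling `should_batch({"materialized": "table"})` leaves the counter at zero, because the first operand is already false.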
git+https://github.com/dbt-labs/dbt-adapters.git@main
git+https://github.com/dbt-labs/dbt-adapters.git@main#subdirectory=dbt-tests-adapter
git+https://github.com/dbt-labs/dbt-adapters.git@microbatch-behavior-flag
git+https://github.com/dbt-labs/dbt-adapters.git@microbatch-behavior-flag#subdirectory=dbt-tests-adapter

reminder: let's make sure we bump the lower bound pin of dbt-adapters in setup.py as part of this PR once the dbt-adapters branches are merged.

        return "D020"

    def message(self) -> str:
        description = "The use of a custom microbatch macro outside of batched execution is deprecated. To use it with batched execution, set `flags.require_batched_execution_for_custom_microbatch_strategy` to `True` in `dbt_project.yml`. In the future this will be the default behavior."

@MichelleArk MichelleArk Nov 8, 2024

super nit, just because this is user-facing:

"In the future this will be the default behavior." -> "In the future batched execution will be the default behaviour."

@MichelleArk MichelleArk Nov 8, 2024

We could also mention the name of the custom macro that exists (e.g. get_incremental_microbatch_sql) as part of the message in case the user that's seeing this doesn't know about the custom strategy macro naming offhand.

@MichelleArk MichelleArk left a comment

✅ Can't approve a PR I "authored" but here is my pseudo-approval!

Thank you for taking this across the finish line @QMalcolm, clean + precise work ✨

@QMalcolm QMalcolm left a comment

Self approving because I didn't open the PR 🙈

… branch with behavior flag changes"

This reverts commit 5b9d7f7.
mirnawong1 commented Nov 11, 2024

hey @QMalcolm @MichelleArk , the docs team received this github issue.

i was reviewing this dbt-adapters PR and #10799 -- am i correct in thinking that the following needs to be added to the behavior flag doc:

the require_batched_execution_for_custom_microbatch_strategy flag defaults to False, which will raise a deprecation warning. Set it to True to allow custom microbatch strategies (like get_incremental_microbatch_sql) to use batched execution.

@QMalcolm QMalcolm merged commit 89caa33 into main Nov 11, 2024
58 of 60 checks passed
@QMalcolm QMalcolm deleted the microbatch-project-flags branch November 11, 2024 14:49
@QMalcolm

@mirnawong1 It's actually a little bit different than that. The flag require_batched_execution_for_custom_microbatch_strategy does default to False. However, a user only needs to set require_batched_execution_for_custom_microbatch_strategy to True if they have a custom microbatch macro. If they don't have a custom microbatch macro, everything will work. Additionally, they'll get a deprecation warning if and only if they have a custom microbatch strategy and the flag is False.
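
The warning condition described in this comment can be sketched as follows. This is only an illustration under stated assumptions: the helper `check_microbatch_flag`, the plain `flags` dict, and the use of Python's standard `warnings` module are all hypothetical choices for the example, whereas dbt reports deprecations through its own event system. The flag name and its default of False come from the discussion above.

```python
import warnings

FLAG_NAME = "require_batched_execution_for_custom_microbatch_strategy"


def check_microbatch_flag(has_custom_macro: bool, flags: dict) -> bool:
    """Warn if a custom microbatch macro exists and the flag is off.

    Returns True when a warning was issued, False otherwise.
    """
    # The flag defaults to False when it is not set in the project.
    flag_enabled = bool(flags.get(FLAG_NAME, False))
    if has_custom_macro and not flag_enabled:
        warnings.warn(
            f"Custom microbatch macro found; set `flags.{FLAG_NAME}` "
            "to `True` in `dbt_project.yml` to use batched execution.",
            DeprecationWarning,
            stacklevel=2,
        )
        return True
    return False
```

A project with no custom microbatch macro never reaches the warning branch, regardless of the flag's value.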

@mirnawong1

hey @QMalcolm oh ok i see! so require_batched_execution_for_custom_microbatch_strategy is for custom microbatch macros only.

so does this also mean we have another flag: require_builtin_microbatch_strategy (pr here), which replaces the DBT_EXPERIMENTAL_MICROBATCH env var? and they should set to true if they had the env_var already enabled?

@QMalcolm

@mirnawong1 There is no other flag. I recognize that as slightly confusing. For context, the only reason we gated microbatch in the first place was the fear of breaking people's projects who already had a custom microbatch macro. What we've done here is make it so one only needs to set a flag if they have a custom microbatch macro, instead of making everyone set a flag because someone else's project might have a microbatch macro.

For a person setting up a microbatch model for the first time, there is no longer a flag or environment variable they need to set (assuming they don't have a custom microbatch macro).

@mirnawong1

oh ok got it! so i can update the callout here and replace it with content informing users of the behavior flag should they have a custom microbatch macro only. all other users that are setting up incremental microbatch strategy for the first time w no existing custom macro don't need to do anything apart from their config.

thank you so much for clarifying this!

Successfully merging this pull request may close these issues.

Replace DBT_EXPERIMENTAL_MICROBATCH env var with behaviour flag
3 participants