added retry policy param to dbt assets decorator #18990

Closed
wants to merge 2 commits into from

Conversation

askvinni
Contributor

@askvinni askvinni commented Jan 3, 2024

Summary & Motivation

Noticed the parameter for retry policies is currently missing from the dbt assets decorator.

How I Tested These Changes

Test suite, added test for the new attribute.

@askvinni askvinni changed the title from "added retry policy arg to dbt assets decorator" to "added retry policy param to dbt assets decorator" Jan 3, 2024
@rexledesma rexledesma self-requested a review January 3, 2024 17:48
Contributor

@rexledesma rexledesma left a comment

Requesting changes for your feedback: I'm wondering if this is the right abstraction we should be exposing to our users to accomplish a "retry" in dbt.

To my knowledge, there are three cases in which a retry could be triggered:

  1. Syntax error
  2. Business logic error (e.g. a test assertion is failing)
  3. Connection flakiness

An existing Dagster retry policy accommodates (3), at the expense of (1) and (2). The entire materialization function will be run again, only for the user to encounter (1) and (2) again. (3) is accommodated, but ideally, the materialization function should only run from the point of failure (the flaky dbt model/test execution). Otherwise, a retry could potentially be incredibly expensive.

With the emergence of dbt retry (link), I think this retry is better served if users handle the retry on their own, in their decorated function. We should add documentation on how to accomplish this. This retry occurs from the point of failure, which alleviates the concerns about using the built-in Dagster retry policy.
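To make the cost argument above concrete, here is a minimal, library-free sketch comparing the two strategies. It does not call dbt or Dagster; `FlakyRunner`, `rerun_everything`, and `resume_from_failure` are hypothetical names introduced only for illustration, and each "step" stands in for a dbt model or test.

```python
from typing import Dict, List


class FlakyRunner:
    """Hypothetical stand-in for a dbt project: each step fails a fixed number of times."""

    def __init__(self, failures_before_success: Dict[str, int]):
        self.remaining_failures = dict(failures_before_success)
        self.executed: List[str] = []  # every execution, in order

    def run(self, step: str) -> bool:
        self.executed.append(step)
        if self.remaining_failures.get(step, 0) > 0:
            self.remaining_failures[step] -= 1
            return False
        return True


def rerun_everything(steps: List[str], runner: FlakyRunner, max_attempts: int = 2) -> bool:
    """Whole-function retry: every attempt starts again from the first step."""
    for _ in range(max_attempts):
        # all() stops at the first failing step, like a run that aborts on error.
        if all(runner.run(step) for step in steps):
            return True
    return False


def resume_from_failure(steps: List[str], runner: FlakyRunner, max_attempts: int = 2) -> bool:
    """Retry from the point of failure: completed steps are not run again."""
    pending = list(steps)
    for _ in range(max_attempts):
        while pending and runner.run(pending[0]):
            pending.pop(0)
        if not pending:
            return True
    return False


steps = ["seed", "stg_orders", "orders", "customers"]

full = FlakyRunner({"orders": 1})      # "orders" fails once (connection flakiness)
rerun_everything(steps, full)

resumed = FlakyRunner({"orders": 1})
resume_from_failure(steps, resumed)

print(len(full.executed), len(resumed.executed))
```

With one flaky step, the whole-function retry repeats the steps that had already succeeded, while the resume strategy only repeats the failed step and then continues. For a deterministic failure (cases 1 and 2), neither strategy succeeds, but the resume strategy still wastes less work.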

@askvinni
Contributor Author

askvinni commented Jan 3, 2024

Hey Rex, thanks for the comment. I was actually not aware of the dbt retry command. The purpose of this PR really was to solve flaky connections that we've been seeing recently. I agree that retrying with the command should be left to the users, so I'll just close this.

@askvinni askvinni closed this Jan 3, 2024
@rexledesma
Contributor

@askvinni Great, I'll add some documentation on the dbt retry capability in Dagster.

@askvinni askvinni deleted the dbt-retry-policy branch January 3, 2024 18:43
@rexledesma
Contributor

rexledesma commented Jan 4, 2024

Important

You'll need to be on dbt-core>=1.7.9, which adds --target-path support to dbt retry: https://github.com/dbt-labs/dbt-core/releases/tag/v1.7.9.
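A small guard for this version requirement could look like the sketch below. It is only an illustration: `parse_version` and `supports_retry_target_path` are hypothetical helpers, not part of dbt or dagster-dbt, and the parsing deliberately ignores pre-release suffixes such as `rc1`.

```python
import re
from typing import Tuple

# Minimum dbt-core version named in the note above.
MIN_DBT_CORE_FOR_RETRY_TARGET_PATH = (1, 7, 9)


def parse_version(version: str) -> Tuple[int, ...]:
    """Extract the leading major.minor.patch numbers from a version string."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"Unrecognized version string: {version!r}")
    return tuple(int(part) for part in match.groups())


def supports_retry_target_path(dbt_core_version: str) -> bool:
    """Return True if the given dbt-core version meets the minimum stated above."""
    return parse_version(dbt_core_version) >= MIN_DBT_CORE_FOR_RETRY_TARGET_PATH


for candidate in ("1.6.9", "1.7.4", "1.7.9"):
    print(candidate, supports_retry_target_path(candidate))
```

In a real project the version string could come from `importlib.metadata.version("dbt-core")`, assuming dbt-core is installed in the same environment.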

As a breadcrumb for anyone who sees this pull request, I'm providing a small code snippet to add dbt retry logic in the existing decorator.

If the dbt command fails, we issue a dbt retry on exception. This dbt retry takes parameters from the previously failed command (e.g. manifest, dagster_dbt_translator, and target_path) so that the retry can use the previous command's dbt artifacts to execute properly.

from dataclasses import replace

from dagster import AssetExecutionContext
from dagster_dbt import DbtCliResource, dbt_assets


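# dbt_manifest_path is assumed to be defined elsewhere and to point at the dbt project's manifest.json.
@dbt_assets(manifest=dbt_manifest_path)
def jaffle_shop_dbt_assets(context: AssetExecutionContext, dbt: DbtCliResource):
    dbt_invocation = dbt.cli(["build"], context=context)
    try:
        yield from dbt_invocation.stream()
    except Exception:
        # Reuse the failed invocation's artifacts so dbt retry resumes from the point of failure.
        dbt_retry_invocation = dbt.cli(
            ["retry"],
            manifest=dbt_invocation.manifest,
            dagster_dbt_translator=dbt_invocation.dagster_dbt_translator,
            target_path=dbt_invocation.target_path,
        )
        # Attach the context afterwards so that events are emitted for this run
        # without adding subsetting arguments to dbt retry.
        dbt_retry_invocation = replace(dbt_retry_invocation, context=context)

        yield from dbt_retry_invocation.stream()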

@askvinni
Contributor Author

Hey @rexledesma, I finally got around to trying this and sadly, it seems like the dbt retry command doesn't respect the target-path flag set. It's a known issue. Just wanted to flag this here in case someone else comes across the same thing.

@rexledesma
Contributor

rexledesma commented Jan 10, 2024

@askvinni Did you try the code snippet that I provided above? It doesn't use the --target-path argument, but instead sets the DBT_TARGET_PATH env var programmatically.

It works on my machine:

❯ dbt --version
Core:
  - installed: 1.7.4
  - latest:    1.7.4 - Up to date!

Plugins:
  - duckdb:    1.7.0 - Up to date!

Running the following commands on a modified jaffle_shop with a failing test produces the expected retry: the retry starts from the failed test, which fails again.

jaffle_shop on  main [!?] 🐍 (dagster) took 3s
❯ DBT_TARGET_PATH=target/new-path dbt build
14:52:07  Running with dbt=1.7.4
14:52:08  Registered adapter: duckdb=1.6.0
14:52:08  Unable to do partial parsing because saved manifest not found. Starting full parse.
14:52:09  Found 5 models, 3 seeds, 20 tests, 0 sources, 0 exposures, 0 metrics, 391 macros, 0 groups, 0 semantic models
14:52:09
14:52:09  Concurrency: 24 threads (target='dev')
14:52:09
14:52:09  1 of 28 START seed file main.raw_customers ..................................... [RUN]
14:52:09  2 of 28 START seed file main.raw_orders ........................................ [RUN]
14:52:09  3 of 28 START seed file main.raw_payments ...................................... [RUN]
14:52:09  3 of 28 OK loaded seed file main.raw_payments .................................. [INSERT 113 in 0.06s]
14:52:09  2 of 28 OK loaded seed file main.raw_orders .................................... [INSERT 99 in 0.07s]
14:52:09  1 of 28 OK loaded seed file main.raw_customers ................................. [INSERT 100 in 0.07s]
14:52:09  4 of 28 START sql view model main.stg_payments ................................. [RUN]
14:52:09  5 of 28 START sql view model main.stg_orders ................................... [RUN]
14:52:09  6 of 28 START sql view model main.stg_customers ................................ [RUN]
14:52:09  5 of 28 OK created sql view model main.stg_orders .............................. [OK in 0.07s]
14:52:09  7 of 28 START test accepted_values_stg_orders_status__nope ..................... [RUN]
14:52:09  8 of 28 START test not_null_stg_orders_order_id ................................ [RUN]
14:52:09  6 of 28 OK created sql view model main.stg_customers ........................... [OK in 0.07s]
14:52:09  9 of 28 START test unique_stg_orders_order_id .................................. [RUN]
14:52:09  4 of 28 OK created sql view model main.stg_payments ............................ [OK in 0.08s]
14:52:09  10 of 28 START test not_null_stg_customers_customer_id ......................... [RUN]
14:52:09  11 of 28 START test unique_stg_customers_customer_id ........................... [RUN]
14:52:09  12 of 28 START test accepted_values_stg_payments_payment_method__credit_card__coupon__bank_transfer__gift_card  [RUN]
14:52:09  13 of 28 START test not_null_stg_payments_payment_id ........................... [RUN]
14:52:09  14 of 28 START test unique_stg_payments_payment_id ............................. [RUN]
14:52:09  8 of 28 PASS not_null_stg_orders_order_id ...................................... [PASS in 0.08s]
14:52:09  7 of 28 FAIL 5 accepted_values_stg_orders_status__nope ......................... [FAIL 5 in 0.08s]
14:52:09  9 of 28 PASS unique_stg_orders_order_id ........................................ [PASS in 0.08s]
14:52:09  11 of 28 PASS unique_stg_customers_customer_id ................................. [PASS in 0.07s]
14:52:09  10 of 28 PASS not_null_stg_customers_customer_id ............................... [PASS in 0.07s]
14:52:09  13 of 28 PASS not_null_stg_payments_payment_id ................................. [PASS in 0.07s]
14:52:09  12 of 28 PASS accepted_values_stg_payments_payment_method__credit_card__coupon__bank_transfer__gift_card  [PASS in 0.07s]
14:52:09  14 of 28 PASS unique_stg_payments_payment_id ................................... [PASS in 0.07s]
14:52:09  15 of 28 SKIP relation main.customers .......................................... [SKIP]
14:52:09  16 of 28 SKIP relation main.orders ............................................. [SKIP]
14:52:09  17 of 28 SKIP test not_null_customers_customer_id .............................. [SKIP]
14:52:09  18 of 28 SKIP test unique_customers_customer_id ................................ [SKIP]
14:52:09  19 of 28 SKIP test accepted_values_orders_status__placed__shipped__completed__return_pending__returned  [SKIP]
14:52:09  20 of 28 SKIP test not_null_orders_amount ...................................... [SKIP]
14:52:09  21 of 28 SKIP test not_null_orders_bank_transfer_amount ........................ [SKIP]
14:52:09  22 of 28 SKIP test not_null_orders_coupon_amount ............................... [SKIP]
14:52:09  23 of 28 SKIP test not_null_orders_credit_card_amount .......................... [SKIP]
14:52:09  24 of 28 SKIP test not_null_orders_customer_id ................................. [SKIP]
14:52:09  25 of 28 SKIP test not_null_orders_gift_card_amount ............................ [SKIP]
14:52:09  26 of 28 SKIP test not_null_orders_order_id .................................... [SKIP]
14:52:09  27 of 28 SKIP test relationships_orders_customer_id__customer_id__ref_customers_  [SKIP]
14:52:09  28 of 28 SKIP test unique_orders_order_id ...................................... [SKIP]
14:52:09
14:52:09  Finished running 3 seeds, 3 view models, 20 tests, 2 table models in 0 hours 0 minutes and 0.31 seconds (0.31s).
14:52:09
14:52:09  Completed with 1 error and 0 warnings:
14:52:09
14:52:09  Failure in test accepted_values_stg_orders_status__nope (models/staging/schema.yml)
14:52:09    Got 5 results, configured to fail if != 0
14:52:09
14:52:09    compiled Code at target/new-path/compiled/jaffle_shop/models/staging/schema.yml/accepted_values_stg_orders_status__nope.sql
14:52:09
14:52:09  Done. PASS=13 WARN=0 ERROR=1 SKIP=14 TOTAL=28

jaffle_shop on  main [!?] 🐍 (dagster) took 3s
❯ DBT_TARGET_PATH=target/new-path dbt retry
14:52:15  Running with dbt=1.7.4
14:52:15  Registered adapter: duckdb=1.6.0
14:52:15  Warning: The state and target directories are the same: 'target'. This could lead to missing changes due to overwritten state including non-idempotent retries.
14:52:15  Found 5 models, 3 seeds, 20 tests, 0 sources, 0 exposures, 0 metrics, 391 macros, 0 groups, 0 semantic models
14:52:15
14:52:15  Concurrency: 24 threads (target='dev')
14:52:15
14:52:15  1 of 15 START test accepted_values_stg_orders_status__nope ..................... [RUN]
14:52:15  1 of 15 FAIL 5 accepted_values_stg_orders_status__nope ......................... [FAIL 5 in 0.03s]
14:52:15  2 of 15 SKIP relation main.customers ........................................... [SKIP]
14:52:15  3 of 15 SKIP relation main.orders .............................................. [SKIP]
14:52:15  4 of 15 SKIP test not_null_customers_customer_id ............................... [SKIP]
14:52:15  5 of 15 SKIP test unique_customers_customer_id ................................. [SKIP]
14:52:15  6 of 15 SKIP test accepted_values_orders_status__placed__shipped__completed__return_pending__returned  [SKIP]
14:52:15  7 of 15 SKIP test not_null_orders_amount ....................................... [SKIP]
14:52:15  8 of 15 SKIP test not_null_orders_bank_transfer_amount ......................... [SKIP]
14:52:15  9 of 15 SKIP test not_null_orders_coupon_amount ................................ [SKIP]
14:52:15  10 of 15 SKIP test not_null_orders_credit_card_amount .......................... [SKIP]
14:52:15  11 of 15 SKIP test not_null_orders_customer_id ................................. [SKIP]
14:52:15  12 of 15 SKIP test not_null_orders_gift_card_amount ............................ [SKIP]
14:52:15  13 of 15 SKIP test not_null_orders_order_id .................................... [SKIP]
14:52:15  14 of 15 SKIP test relationships_orders_customer_id__customer_id__ref_customers_  [SKIP]
14:52:15  15 of 15 SKIP test unique_orders_order_id ...................................... [SKIP]
14:52:15
14:52:15  Finished running 13 tests, 2 table models in 0 hours 0 minutes and 0.09 seconds (0.09s).
14:52:15
14:52:15  Completed with 1 error and 0 warnings:
14:52:15
14:52:15  Failure in test accepted_values_stg_orders_status__nope (models/staging/schema.yml)
14:52:15    Got 5 results, configured to fail if != 0
14:52:15
14:52:15    compiled Code at target/compiled/jaffle_shop/models/staging/schema.yml/accepted_values_stg_orders_status__nope.sql
14:52:15
14:52:15  Done. PASS=0 WARN=0 ERROR=1 SKIP=14 TOTAL=15

@askvinni
Contributor Author

askvinni commented Jan 12, 2024

@rexledesma this might be something that's fixed in 1.7.*, and we're running 1.6.9. Either way, my solution was to copy the run_results.json into the target/ folder, that way the retry did manage to read it. It's odd behavior and the workaround isn't exactly pretty, but it's good enough.

@Baksbany22

@askvinni Can you give a more detailed description of how you solved this problem?

@askvinni
Contributor Author

@Baksbany22 dbt has a target folder where it generates its manifest.json and other files, usually called target/, in the root directory where your dbt_project.yml file resides. Dagster's default behavior is to create a subfolder within the target/ folder with a UUID (e.g. target/my_dbt_assets-12345) for each CLI invocation. That's the folder in which run_results.json, dbt.log, and other files are created. My workaround is essentially to copy target/my_dbt_assets-12345/run_results.json to target/run_results.json so that the dbt retry command can find the file and execute properly.
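The copy step described above can be sketched with the standard library alone. This is an illustration under assumptions: `copy_run_results` is a hypothetical helper, and the two directories are passed in by the caller rather than discovered from dbt or Dagster.

```python
import shutil
from pathlib import Path


def copy_run_results(invocation_target: Path, project_target: Path) -> Path:
    """Copy run_results.json from a per-invocation folder into the project's target/ folder.

    invocation_target: e.g. target/my_dbt_assets-12345 (the folder of the failed invocation)
    project_target:    e.g. target/ (where older dbt retry versions look for artifacts)
    """
    source = invocation_target / "run_results.json"
    if not source.is_file():
        raise FileNotFoundError(f"run_results.json not found in {invocation_target}")
    project_target.mkdir(parents=True, exist_ok=True)
    # copy2 also preserves file metadata such as the modification time.
    return Path(shutil.copy2(source, project_target / "run_results.json"))
```

Inside a `@dbt_assets` function, the first argument would presumably be the failed invocation's `target_path`, as in the snippets elsewhere in this thread.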

@toddy86
Contributor

toddy86 commented Feb 14, 2024

@rexledesma I have tried this as well with the same result as @askvinni

@rexledesma
Contributor

Are you on dbt-core==1.6.*? Is it possible for you to upgrade to dbt-core==1.7.*?

If not, could you try Vinni's workaround?

@garethbrickman garethbrickman added the integration: dbt Related to dagster-dbt label Feb 14, 2024
@Baksbany22

Baksbany22 commented Feb 15, 2024

@toddy86 There are two ways to solve this problem:

  1. When calling dbt.cli, specify target_path. Then each launch of dbt models will not create a new directory: command = dbt.cli(["build"], context=context, target_path = dbt_target_path)

  2. The second solution to the problem is shown above. You need to copy run_results from the created directory to the "target" directory. It works fine on my local machine, but on our server we got the error: "source_file does not exist". Solved with time.sleep:

import os
import shutil
import time

from dagster import AssetExecutionContext
from dagster_dbt import DbtCliResource, dbt_assets


@dbt_assets(manifest=dbt_manifest_path)
def test_bi_dbt_assets(context: AssetExecutionContext, dbt: DbtCliResource):
    command = dbt.cli(["build"], context=context)  # , target_path=dbt_target_path)
    try:
        yield from command.stream()
    except Exception:
        # Give the server time to finish writing run_results.json.
        time.sleep(10)
        source_file = os.path.join(command.target_path, "run_results.json")
        if not os.path.exists(source_file):
            raise FileNotFoundError(f"source_file does not exist: {source_file}")
        # Copy the artifact into the project's target directory, where dbt retry looks for it.
        shutil.copy(source_file, r"/opt/dagster/app/dbt_project/target/")
        yield from dbt.cli(
            ["retry"],
            manifest=command.manifest,
            dagster_dbt_translator=command.dagster_dbt_translator,
            target_path=command.target_path,
        ).stream()
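As a possible alternative to the fixed `time.sleep(10)`, the wait could poll for the file and give up after a timeout. This is only a sketch: `wait_for_file` is a hypothetical helper, and whether polling is appropriate depends on why the file appears late on the server, which the thread does not establish.

```python
import time
from pathlib import Path


def wait_for_file(path: Path, timeout: float = 10.0, poll_interval: float = 0.2) -> bool:
    """Poll until `path` exists as a file; return False if `timeout` seconds pass first."""
    deadline = time.monotonic() + timeout
    while True:
        if path.is_file():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_interval)
```

This returns as soon as the file shows up, so the common case does not pay the full ten seconds, and the caller can raise its own error when the function returns False.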

@toddy86
Contributor

toddy86 commented Feb 19, 2024

> Are you on dbt-core==1.6.*? Is it possible for you to upgrade to dbt-core==1.7.*?
>
> If not, could you try Vinni's workaround?

I’m on dbt 1.7.x.

It isn’t critical for us to have these retries on individual dbt assets, so we are trying to brute-force this with a job-level retry. I haven’t tested it yet, but I’m presuming the job-level retry will only pick up the failed dbt assets (perhaps a wrong assumption; we shall see).

@the4thamigo-uk
Contributor

Also hitting this issue : noticed this
dbt-labs/dbt-core@d1e400e

@the4thamigo-uk
Contributor

@toddy86
Contributor

toddy86 commented Mar 3, 2024

https://github.com/dbt-labs/dbt-core/releases/tag/v1.7.9

Can confirm this is now working as expected after bumping to dbt v.1.7.9

@rexledesma
Contributor

Thanks for confirming the fix @the4thamigo-uk and @toddy86. I've updated #18990 (comment) with a disclaimer to be on dbt-core>=1.7.9.

@toddy86
Contributor

toddy86 commented Mar 8, 2024

@rexledesma There is a hidden gremlin in using the dbt retry command if you aren't on the lookout for it.

If you have a job which splits the dbt asset materializations into multiple steps (e.g. a job with partitioned and non-partitioned assets), and the parent task initially fails and retries, then some of the downstream dbt tasks can be skipped as Dagster interprets the upstream assets as being skipped.

Initial run where some models succeeded, but others failed
Screenshot 2024-03-08 at 1 11 08 pm

The dbt retry kicks in and all failed and skipped models are successfully built on the second try
Screenshot 2024-03-08 at 1 14 15 pm

Downstream dbt steps are incorrectly skipped, as assets successfully materialized in the retry run are incorrectly labelled as skipped

Screenshot 2024-03-08 at 1 16 53 pm

@rexledesma
Contributor

rexledesma commented Mar 8, 2024

@toddy86 Are you yielding Dagster events from the dbt retry? You need to be doing this. Just want to sanity check that you're calling yield from!

@dbt_assets(manifest=dbt_manifest_path)
def jaffle_shop_dbt_assets(context: AssetExecutionContext, dbt: DbtCliResource):
    dbt_invocation = dbt.cli(["build"], context=context)
    try:
        yield from dbt_invocation.stream()
    except:
+       yield from dbt.cli(
            ["retry"],
            manifest=dbt_invocation.manifest,
            dagster_dbt_translator=dbt_invocation.dagster_dbt_translator,
            target_path=dbt_invocation.target_path,
        ).stream()

@toddy86
Contributor

toddy86 commented Mar 9, 2024

@rexledesma yep, we are yielding. Full dbt asset code below (we have a thin wrapper around the dbt_assets)

def build_dbt_assets(  # noqa: PLR0913
    select: str = "fqn:*",
    exclude: str = "",
    mode: str = "build",
    name: Optional[str] = None,
    partitions_def: Optional[PartitionsDefinition] = None,
    backfill_policy: Optional[BackfillPolicy] = None,
    dbt_retry: bool = False,
) -> list[AssetsDefinition]:

    _exclude = exclude + " tag:exclude_dagster"
    @dbt_assets(
        name=name,
        manifest=dbt_manifest_path,
        select=select,
        exclude=_exclude,
        partitions_def=partitions_def,
        backfill_policy=backfill_policy,
        dagster_dbt_translator=CustomDagsterDbtTranslator(
            settings=DagsterDbtTranslatorSettings(enable_asset_checks=True),
        ),
    )
    def _assets(
        context: OpExecutionContext,
    ):
        dbt_build_args = [mode]
        if partitions_def:
            dbt_vars = {
                "start_date": context.partition_key_range.start,
                "end_date": context.partition_key_range.end,
            }
            dbt_build_args.extend(["--vars", json.dumps(dbt_vars)])

        command = dbt_resource.cli(dbt_build_args, context=context)
        try:
            yield from command.stream()
        except:  # noqa: E722
            if dbt_retry:
                yield from dbt_resource.cli(
                    ["retry"],
                    manifest=command.manifest,
                    dagster_dbt_translator=command.dagster_dbt_translator,
                    target_path=command.target_path,
                ).stream()
            else:
                raise
    return [_assets]

@rexledesma
Contributor

rexledesma commented Mar 11, 2024

@toddy86 Ah, I think this is because we need to pass context into the dbt retry so that it generates Output events instead of AssetMaterialization events. However, once we do this, we'll also try to add the subsetting selection arguments to dbt retry, which is not ideal.

Here's a workaround (under test in #20395) to ensure that the dbt invocation doesn't have the subsetting arguments, but the emitted events still use the context argument:

+ from dataclasses import replace

from dagster import AssetExecutionContext
from dagster_dbt import DbtCliResource, dbt_assets


@dbt_assets(manifest=dbt_manifest_path)
def jaffle_shop_dbt_assets(context: AssetExecutionContext, dbt: DbtCliResource):
    dbt_invocation = dbt.cli(["build"], context=context)
    try:
        yield from dbt_invocation.stream()
    except:
+       dbt_retry_invocation = dbt.cli(
+           ["retry"],
+           manifest=dbt_invocation.manifest,
+           dagster_dbt_translator=dbt_invocation.dagster_dbt_translator,
+           target_path=dbt_invocation.target_path,
+       )
+       dbt_retry_invocation = replace(dbt_retry_invocation, context=context)
+       
+       yield from dbt_retry_invocation.stream()

On my end, I'll see if I can have a fix out so we don't need to call replace.

rexledesma added a commit that referenced this pull request Mar 11, 2024
…vents (#20395)

## Summary & Motivation
Put
#18990 (comment)
under test.

## How I Tested These Changes
pytest
@toddy86
Contributor

toddy86 commented Mar 14, 2024

Thanks @rexledesma. I’m on leave for a few weeks, but I’ll give this a try once I’m back.

PedramNavid pushed a commit that referenced this pull request Mar 28, 2024
…vents (#20395)

## Summary & Motivation
Put
#18990 (comment)
under test.

## How I Tested These Changes
pytest
@the4thamigo-uk
Contributor

Hi @rexledesma, I am trying to get this to work. I am seeing the following though :

dagster._core.errors.DagsterInvariantViolationError: Compute for op "dbt_clickhouse_unpartitioned" returned an output "staging__stg_my_test__validation" multiple times

Stack Trace:
  File "/usr/local/lib/python3.10/site-packages/dagster/_core/execution/plan/execute_plan.py", line 282, in dagster_event_sequence_for_step
    for step_event in check.generator(step_events):
,  File "/usr/local/lib/python3.10/site-packages/dagster/_core/execution/plan/execute_step.py", line 523, in core_dagster_event_sequence_for_step
    for user_event in _step_output_error_checked_user_event_sequence(
,  File "/usr/local/lib/python3.10/site-packages/dagster/_core/execution/plan/execute_step.py", line 222, in _step_output_error_checked_user_event_sequence
    raise DagsterInvariantViolationError(

In this case a dbt test failed causing a dbt retry which failed with the above error. I am replacing the context as in your example. Do you see a similar issue?

@rexledesma
Contributor

@the4thamigo-uk I assume you're modeling your dbt tests as Dagster asset checks (cc @johannkm)

If that's the case, then what happened is:

  1. Your dbt invocation emitted AssetCheckResult's, with passed=False to represent a failed dbt test. Call this test A.
  2. This caused the dbt invocation to fail, preparing it for a retry.
  3. In the retry, the failed test A was retried, and a new AssetCheckResult event was emitted for it. However, an event for A had already been emitted.

If you want to do this retry scheme with Dagster asset checks, you'll need to ensure that the failed tests in (1) are not emitted in the event stream. Only the final result from the dbt retry should be emitted.
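The "emit only the final result" idea can be sketched without Dagster as follows. Everything here is a stand-in: `CheckResult` plays the role of `AssetCheckResult`, `InvocationFailed` plays the role of the error raised by a failed dbt invocation, and `stream_with_retry` is a hypothetical helper, not a dagster-dbt API.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Iterator


@dataclass(frozen=True)
class CheckResult:
    """Hypothetical stand-in for Dagster's AssetCheckResult."""
    check_name: str
    passed: bool


class InvocationFailed(Exception):
    """Hypothetical stand-in for the error raised when a dbt invocation fails."""


def stream_with_retry(
    first: Iterable[object], retry: Callable[[], Iterable[object]]
) -> Iterator[object]:
    """Yield events from `first`, withholding failed checks; on failure, yield the retry's events."""
    needs_retry = False
    try:
        for event in first:
            if isinstance(event, CheckResult) and not event.passed:
                # Hold back the failed check so it is not emitted twice.
                needs_retry = True
                continue
            yield event
    except InvocationFailed:
        needs_retry = True
    if needs_retry:
        # Only the retry's result for the failed check reaches the event stream.
        yield from retry()


def first_run() -> Iterator[object]:
    yield "materialize stg_orders"
    yield CheckResult("not_null_order_id", passed=True)
    yield CheckResult("accepted_values_status", passed=False)  # test A
    raise InvocationFailed("dbt build failed")


def retry_run() -> Iterator[object]:
    yield CheckResult("accepted_values_status", passed=True)
    yield "materialize orders"


events = list(stream_with_retry(first_run(), retry_run))
```

In this sketch each check name appears at most once in `events`, which is the property the invariant error above is complaining about. A real implementation would also need to decide what to do if the retry itself fails.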

@emirkmo

emirkmo commented Jun 7, 2024

@rexledesma Out of curiosity would calling dbt retry instead of the actual dbt command work if say context.retry_number != 0? I am especially thinking of when yielding AssetCheckResults for dbt tests (currently we hold back failed check results until the last retry) and trying to integrate dbt retry with Dagster retries.

Something like:

@dbt_assets(...)
def dbt_assets(context: AssetExecutionContext, dbt: DbtCliResource):
    # Run a full build on the first attempt and dbt retry on Dagster retries.
    dbt_command = "build" if context.retry_number == 0 else "retry"
    dbt_cli_invocation = dbt.cli([dbt_command], context=context, raise_on_error=True)
    try:
        yield from dbt_cli_invocation.stream()
    except DagsterDbtCliRuntimeError as err:
        raise RetryRequested(max_retries=1, seconds_to_wait=300) from err

Currently we do

@dbt_assets(...)
def dbt_assets(context: AssetExecutionContext, dbt: DbtCliResource):

    dbt_cli_invocation = dbt.cli(["build"], context=context, raise_on_error=True)
    failed_test_events = {}
    try:
        for dagster_event in dbt_cli_invocation.stream():
            if isinstance(dagster_event, AssetCheckResult) and not dagster_event.passed:
                failed_test_events[dagster_event.check_name] = dagster_event
                continue
            yield dagster_event
        if failed_test_events:
            # Only some failed tests, if something else failed, it would have already raised before getting here.
            raise DagsterDbtCliRuntimeError(description="failed_tests")
    except DagsterDbtCliRuntimeError as err:
 
        # Save run_results before retry potentially overwrites it.
        build_run_results = dbt_cli_invocation.get_artifact("run_results.json")

        dbt_retry_invocation = dbt.cli(
            ["retry"],
            manifest=dbt_cli_invocation.manifest,
            dagster_dbt_translator=dbt_cli_invocation.dagster_dbt_translator,
            target_path=dbt_cli_invocation.target_path,
        )
        dbt_retry_invocation = replace(dbt_retry_invocation, context=context)

        # (Technically you can add another try/catch and invoke builtin Dagster retry in case the issue is Network related.)
        yield from dbt_retry_invocation.stream()

Since we deploy on K8s with Docker, run_results.json and the other artifacts are not overwritten by another run (and I know that nowadays dbt_assets saves to a unique target folder anyway), so dbt retry can be run just fine even if other dbt commands ran in between.

@rexledesma
Contributor

> Out of curiosity would calling dbt retry instead of the actual dbt command work if say context.retry_number != 0?

Some thoughts that come to mind:

  • You would need to ensure that your dbt retry references the target path containing the previous dbt invocation's artifacts (e.g. run_results.json).
  • You should ensure that the artifacts persist across retries.
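The two conditions above can be expressed as a small guard. This is a sketch under assumptions: `choose_dbt_command` is a hypothetical helper, and falling back to a full build when the artifacts are missing is one possible policy, not something the thread prescribes.

```python
from pathlib import Path


def choose_dbt_command(retry_number: int, target_path: Path) -> str:
    """Return "retry" only when a previous attempt left run_results.json in target_path."""
    has_previous_results = (target_path / "run_results.json").is_file()
    if retry_number > 0 and has_previous_results:
        return "retry"
    # First attempt, or the artifacts did not persist across retries.
    return "build"
```

For this to be useful, `target_path` has to be the same directory on every attempt, for example a fixed path on storage that survives the retry.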

@lokofoko

Hi @rexledesma. To better understand how this works: with your snippet, will dbt retry run during the first run of the code? I would expect it to run only when I re-execute my job from failure, not on every regular run.

I am not sure how Dagster parses that function, but from the plain Python side it looks like the retry would always be executed whenever there is an error, and I don't understand the use case for this.

@rexledesma
Contributor

@lokofoko See #18990 (review) on what is happening here.

The point is that we are doing a dbt retry without needing a re-execution from failure: if the initial run failed because of connection flakiness, you can just do the retry within the same run. You shouldn't need to spin up a new run.

@the4thamigo-uk
Contributor

> @the4thamigo-uk I assume you're modeling your dbt tests as Dagster asset checks (cc @johannkm)
>
> If that's the case, then what happened is:
>
>   1. Your dbt invocation emitted AssetCheckResult's, with passed=False to represent a failed dbt test. Call this test A.
>   2. This caused the dbt invocation to fail, preparing it for a retry.
>   3. In the retry, the failed test A was retried, and a new AssetCheckResult event was emitted for it. However, an event for A had already been emitted.
>
> If you want to do this retry scheme with Dagster asset checks, you'll need to ensure that the failed tests in (1) are not emitted in the event stream. Only the final result from the dbt retry should be emitted.

@rexledesma Yes I think this is what happened. Can you provide an example of how to do this? Perhaps we need a final canonical example posted in this issue, or ideally in the docs?

@G14rb

G14rb commented Nov 14, 2024

@the4thamigo-uk to materialize AssetObservation instead of AssetCheckResult you have to set the settings property of DagsterDbtTranslator enable_asset_checks to False

from dataclasses import replace

from dagster import AssetExecutionContext
from dagster_dbt import DagsterDbtTranslator, DagsterDbtTranslatorSettings, DbtCliResource, dbt_assets

dagster_dbt_translator = DagsterDbtTranslator(
    settings=DagsterDbtTranslatorSettings(enable_asset_checks=False)
)

@dbt_assets(manifest=dbt_manifest_path, dagster_dbt_translator=dagster_dbt_translator)
def jaffle_shop_dbt_assets(context: AssetExecutionContext, dbt: DbtCliResource):
    dbt_invocation = dbt.cli(["build"], context=context)
    try:
        yield from dbt_invocation.stream()
    except:
        dbt_retry_invocation = dbt.cli(
            ["retry"],
            manifest=dbt_invocation.manifest,
            dagster_dbt_translator=dbt_invocation.dagster_dbt_translator,
            target_path=dbt_invocation.target_path,
        )
        dbt_retry_invocation = replace(dbt_retry_invocation, context=context)
        
        yield from dbt_retry_invocation.stream()

@the4thamigo-uk
Contributor

the4thamigo-uk commented Nov 20, 2024

> @the4thamigo-uk to materialize AssetObservation instead of AssetCheckResult you have to set the settings property of DagsterDbtTranslator enable_asset_checks to False

Thanks for the code above. However this means we never generate AssetCheckResult, but I thought you were suggesting in the earlier post that we should still emit it, but only once, either in the first invocation (if it succeeds), or the retry if it repeatedly fails.

> you'll need to ensure that the failed tests in (1) are not emitted in the event stream. Only the final result from the dbt retry should be emitted

Labels
integration: dbt Related to dagster-dbt
9 participants