Create `skip_nodes_if_on_run_start_fails` behavior change flag #10699

aranke · 2024-09-11T18:56:53Z

Resolves #7387

Problem

If there is a failure in an on-run-start hook, dbt should skip all selected resources from running. This is not true right now.
If there is a failure in an on-run-end hook, dbt should still write run_results.json for all previous results. This is true right now, and the GAP team is counting on it remaining true.

Solution

I'm documenting how this PR is currently implemented, open to questions/comments/concerns.

Behavior

on-run-start/end hooks are run in serial (--threads 1)
If a hook fails, subsequent hooks of the same type (start/end) are skipped (i.e., hooks “fail fast”)
Hook status is now saved in run results instead of panicking and exiting immediately
This status is written to run_results.json at the end of the run
Create a skip_nodes_if_on_run_start_fails project config flag
- If the flag is set and any on-run-start hook fails, mark all selected nodes as skipped
on-run-start/end hooks are always run, regardless of whether they passed or failed last time
- The exception here is if the source_freshness_run_project_hooks is set to False
Cosmetic changes to log output to make it more useful

Implementation

Modify the type signature of before_run to return a RunStatus instead of None so we can implement some railway-oriented programming
Combine safe_run_hooks and run_hooks into a single method to make logic easier to follow
- Create a new get_execution_status method to encapsulate RunStatus logic
Add HookNode to ResultNode union type
Change some existing tests to account for hooks now being part of run results (1 per hook) and new logging
Rewrite test_run_hooks → test_on_run_hooks with the following tests:
- Start hook fails, flag is None/False/True
- End hook fails, logs and run_results.json are still surfaced
- No hooks run if zero nodes selected
Add a new test for retry to run hooks even if they succeeded last time

Checklist

I have read the contributing guide and understand what's expected of me.
I have run this code in development, and it appears to resolve the stated issue.
This PR includes tests, or tests are not required or relevant for this PR.
This PR has no interface changes (e.g., macros, CLI, logs, JSON artifacts, config files, adapter interface, etc.) or this PR has already received feedback and approval from Product or DX.
This PR includes type annotations for new and modified functions.

github-actions · 2024-09-11T18:57:11Z

Thank you for your pull request! We could not find a changelog entry for this change. For details on how to document a change, see the contributing guide.

codecov · 2024-09-11T19:02:51Z

Codecov Report

Attention: Patch coverage is 98.14815% with 2 lines in your changes missing coverage. Please review.

Project coverage is 89.17%. Comparing base (5e3d418) to head (9accf08).
Report is 1 commits behind head on main.

Additional details and impacted files

@@           Coverage Diff           @@
##             main   #10699   +/-   ##
=======================================
  Coverage   89.16%   89.17%           
=======================================
  Files         183      183           
  Lines       23258    23288   +30     
=======================================
+ Hits        20738    20767   +29     
- Misses       2520     2521    +1

Flag	Coverage Δ
integration	`86.44% <96.29%> (-0.02%)`	⬇️
unit	`62.19% <23.14%> (-0.10%)`	⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Components	Coverage Δ
Unit Tests	`62.19% <23.14%> (-0.10%)`	⬇️
Integration Tests	`86.44% <96.29%> (-0.02%)`	⬇️

core/dbt/task/runnable.py

core/dbt/task/run.py

jtcohen6 · 2024-09-24T22:48:44Z

@aranke Behavior flag looks good, as does the "happy path" for skipping downstream nodes when it's enabled. When the behavior flag is not set / disabled, it works as before and the logs are even a bit nicer 👍

I found a few corner cases around dbt retry. Here's a reproduction case:

Create a project with three on-run-start hooks and two models
Put a syntax error in the second on-run-start hook (e.g. select column_does_not_exist)
dbt run
dbt retry - notice that the node numbering isn't right (models say they're 2 of 4 / 4 of 4)
Fix the syntax error
dbt retry - dbt successfully reruns the hooks, but then it also tries to materialize them as if they were models (?) using the default materialization (view)

# dbt_project.yml
on-run-start:
  - "select 1 as id"
  - "select column_does_not_exist"
  - "select 2 as id"

flags:
  skip_nodes_if_on_run_start_fails: True

The first run fails as expected, and skips models from running (good!):

% dbt run
22:44:11  Running with dbt=1.9.0-a1
22:44:11  Registered adapter: duckdb=1.8.3
22:44:11  Unable to do partial parsing because a project config has changed
22:44:11  Found 2 models, 3 operations, 4 data tests, 417 macros, 2 groups
22:44:11  
22:44:11  Concurrency: 1 threads (target='dev')
22:44:11  
22:44:11  1 of 3 START hook: testy.on-run-start.0 ........................................ [RUN]
22:44:11  1 of 3 OK hook: testy.on-run-start.0 ........................................... [OK in 0.01s]
22:44:11  2 of 3 START hook: testy.on-run-start.1 ........................................ [RUN]
22:44:11  2 of 3 ERROR hook: testy.on-run-start.1 ........................................ [ERROR in 0.01s]
22:44:11  3 of 3 SKIP hook: testy.on-run-start.2 ......................................... [SKIP in 0.00s]
22:44:11  
22:44:11  1 of 2 SKIP relation main.my_first_dbt_model ................................... [SKIP]
22:44:11  2 of 2 SKIP relation main.my_second_dbt_model .................................. [SKIP]
22:44:11  
22:44:11  Finished running 3 project hooks, 1 table model, 1 view model in 0 hours 0 minutes and 0.07 seconds (0.07s).
22:44:11  
22:44:11  Completed with 2 errors and 0 warnings:
22:44:11  
22:44:11    testy.on-run-start.1 failed, error:
 Binder Error: Referenced column "column_does_not_exist" not found in FROM clause!
LINE 2: select column_does_not_exist
               ^
22:44:11  
22:44:11  Done. PASS=1 WARN=0 ERROR=1 SKIP=3 TOTAL=5

Now retry, with weird numbering. dbt thinks it's running 4 total models (I think because the failed/skipped hooks are counted):

% dbt retry
22:44:15  Running with dbt=1.9.0-a1
22:44:15  Registered adapter: duckdb=1.8.3
22:44:15  Found 2 models, 3 operations, 4 data tests, 417 macros, 2 groups
22:44:15  
22:44:15  Concurrency: 1 threads (target='dev')
22:44:15  
22:44:15  1 of 3 START hook: testy.on-run-start.0 ........................................ [RUN]
22:44:15  1 of 3 OK hook: testy.on-run-start.0 ........................................... [OK in 0.01s]
22:44:15  2 of 3 START hook: testy.on-run-start.1 ........................................ [RUN]
22:44:15  2 of 3 ERROR hook: testy.on-run-start.1 ........................................ [ERROR in 0.01s]
22:44:15  3 of 3 SKIP hook: testy.on-run-start.2 ......................................... [SKIP in 0.00s]
22:44:15  
22:44:15  1 of 4 SKIP relation main.my_second_dbt_model .................................. [SKIP]
22:44:15  4 of 4 SKIP relation main.my_first_dbt_model ................................... [SKIP]
22:44:15  
22:44:15  Finished running 3 project hooks, 1 table model, 1 view model in 0 hours 0 minutes and 0.06 seconds (0.06s).
22:44:15  
22:44:15  Completed with 2 errors and 0 warnings:
22:44:15  
22:44:15    testy.on-run-start.1 failed, error:
 Binder Error: Referenced column "column_does_not_exist" not found in FROM clause!
LINE 2: select column_does_not_exist
               ^
22:44:15  
22:44:15  Done. PASS=1 WARN=0 ERROR=1 SKIP=3 TOTAL=5

Fix the syntax error in second hook, retry again. dbt will successfully run the hooks and build the models, but it also tries to build the hooks as models...

% dbt retry
22:44:33  Running with dbt=1.9.0-a1
22:44:33  Registered adapter: duckdb=1.8.3
22:44:33  Unable to do partial parsing because a project config has changed
22:44:34  Found 2 models, 3 operations, 4 data tests, 417 macros, 2 groups
22:44:34  
22:44:34  Concurrency: 1 threads (target='dev')
22:44:34  
22:44:34  1 of 3 START hook: testy.on-run-start.0 ........................................ [RUN]
22:44:34  1 of 3 OK hook: testy.on-run-start.0 ........................................... [OK in 0.01s]
22:44:34  2 of 3 START hook: testy.on-run-start.1 ........................................ [RUN]
22:44:34  2 of 3 OK hook: testy.on-run-start.1 ........................................... [OK in 0.01s]
22:44:34  3 of 3 START hook: testy.on-run-start.2 ........................................ [RUN]
22:44:34  3 of 3 OK hook: testy.on-run-start.2 ........................................... [OK in 0.01s]
22:44:34  
22:44:34  1 of 4 START sql table model main.my_first_dbt_model ........................... [RUN]
22:44:34  1 of 4 OK created sql table model main.my_first_dbt_model ...................... [OK in 0.04s]
22:44:34  2 of 4 START sql view model main.testy-on-run-start-1 .......................... [RUN]
22:44:34  2 of 4 ERROR creating sql view model main.testy-on-run-start-1 ................. [ERROR in 0.01s]
22:44:34  3 of 4 START sql view model main.testy-on-run-start-2 .......................... [RUN]
22:44:34  3 of 4 ERROR creating sql view model main.testy-on-run-start-2 ................. [ERROR in 0.00s]
22:44:34  4 of 4 START sql view model main.my_second_dbt_model ........................... [RUN]
22:44:34  4 of 4 OK created sql view model main.my_second_dbt_model ...................... [OK in 0.01s]
22:44:34  
22:44:34  Finished running 5 project hooks, 1 table model, 1 view model in 0 hours 0 minutes and 0.14 seconds (0.14s).
22:44:34  
22:44:34  Completed with 2 errors and 0 warnings:
22:44:34  
22:44:34    Compilation Error in operation testy-on-run-start-1 (./dbt_project.yml)
  'None' has no attribute 'database'
  
  > in macro materialization_view_default (macros/materializations/models/view.sql)
  > called by operation testy-on-run-start-1 (./dbt_project.yml)
22:44:34  
22:44:34    Compilation Error in operation testy-on-run-start-2 (./dbt_project.yml)
  'None' has no attribute 'database'
  
  > in macro materialization_view_default (macros/materializations/models/view.sql)
  > called by operation testy-on-run-start-2 (./dbt_project.yml)
22:44:34  
22:44:34  Done. PASS=5 WARN=0 ERROR=2 SKIP=0 TOTAL=7

aranke · 2024-09-27T14:21:23Z

@jtcohen6 Thanks for reviewing the behavior here, and good catch!
retry picked up these nodes because they're now recorded as “failed” in run_results.json.

I've tweaked the logic in retry to ignore operation nodes unless we're retrying run-operation, this seems to fix both the mis-numbering issue and the attempt to materialize a start hook.

I've also added a test for this case, but here are the new logs with the fix applied:

❯ dbt run
14:18:53  1 of 3 START hook: jaffle_shop.on-run-start.0 .................................. [RUN]
14:18:53  1 of 3 OK hook: jaffle_shop.on-run-start.0 ..................................... [OK in 0.01s]
14:18:53  2 of 3 START hook: jaffle_shop.on-run-start.1 .................................. [RUN]
14:18:53  2 of 3 ERROR hook: jaffle_shop.on-run-start.1 .................................. [ERROR in 0.02s]
14:18:53  3 of 3 SKIP hook: jaffle_shop.on-run-start.2 ................................... [SKIP in 0.00s]
14:18:53  
14:18:53  1 of 5 SKIP relation main.stg_orders ........................................... [SKIP]
14:18:53  2 of 5 SKIP relation main.orders ............................................... [SKIP]
14:18:53  3 of 5 SKIP relation main.stg_payments ......................................... [SKIP]
14:18:53  4 of 5 SKIP relation main.customers ............................................ [SKIP]
14:18:53  5 of 5 SKIP relation main.stg_customers ........................................ [SKIP]

❯ dbt retry
14:19:32  1 of 3 START hook: jaffle_shop.on-run-start.0 .................................. [RUN]
14:19:32  1 of 3 OK hook: jaffle_shop.on-run-start.0 ..................................... [OK in 0.01s]
14:19:32  2 of 3 START hook: jaffle_shop.on-run-start.1 .................................. [RUN]
14:19:32  2 of 3 ERROR hook: jaffle_shop.on-run-start.1 .................................. [ERROR in 0.02s]
14:19:32  3 of 3 SKIP hook: jaffle_shop.on-run-start.2 ................................... [SKIP in 0.00s]
14:19:32  
14:19:32  1 of 5 SKIP relation main.stg_customers ........................................ [SKIP]
14:19:32  2 of 5 SKIP relation main.orders ............................................... [SKIP]
14:19:32  3 of 5 SKIP relation main.stg_payments ......................................... [SKIP]
14:19:32  4 of 5 SKIP relation main.stg_orders ........................................... [SKIP]
14:19:32  5 of 5 SKIP relation main.customers ............................................ [SKIP]

Fix syntax error

❯ dbt retry
14:20:57  1 of 3 START hook: jaffle_shop.on-run-start.0 .................................. [RUN]
14:20:57  1 of 3 OK hook: jaffle_shop.on-run-start.0 ..................................... [OK in 0.01s]
14:20:57  2 of 3 START hook: jaffle_shop.on-run-start.1 .................................. [RUN]
14:20:57  2 of 3 OK hook: jaffle_shop.on-run-start.1 ..................................... [OK in 0.01s]
14:20:57  3 of 3 START hook: jaffle_shop.on-run-start.2 .................................. [RUN]
14:20:57  3 of 3 OK hook: jaffle_shop.on-run-start.2 ..................................... [OK in 0.02s]
14:20:57  
14:20:57  1 of 5 START sql view model main.stg_customers ................................. [RUN]
14:20:57  2 of 5 START sql view model main.stg_orders .................................... [RUN]
14:20:57  3 of 5 START sql view model main.stg_payments .................................. [RUN]
14:20:57  1 of 5 OK created sql view model main.stg_customers ............................ [OK in 0.07s]
14:20:57  3 of 5 OK created sql view model main.stg_payments ............................. [OK in 0.07s]
14:20:57  2 of 5 OK created sql view model main.stg_orders ............................... [OK in 0.08s]
14:20:57  4 of 5 START sql table model main.customers .................................... [RUN]
14:20:57  5 of 5 START sql table model main.orders ....................................... [RUN]
14:20:57  4 of 5 OK created sql table model main.customers ............................... [OK in 0.04s]
14:20:57  5 of 5 OK created sql table model main.orders .................................. [OK in 0.05s]

aranke · 2024-09-27T14:24:15Z

I found a few corner cases around dbt retry.

@jtcohen6 are there any other corner cases here I should fix before merging in this PR? If not, would appreciate a 👍🏾 on the PR.

Thanks once again for taking the time to review this PR right before the Coalesce crunch.

jtcohen6

@aranke Thanks for fixing! That's the only issue I found in my testing. This looks fit to ship 👍

wip: refactor safe_run_hooks and add more metadata in

c54e228

cla-bot bot added the cla:yes label Sep 11, 2024

Add back comment about log_contextvars

7ad6c39

aranke added 18 commits September 11, 2024 20:27

make printer correct

7fcfb0c

fix existing tests

064f60e

Merge branch 'main' into fix_7387

59f558c

only execute nodes if pre-hooks succeeded

526b31c

Add skip messages for nodes

12b7694

Merge branch 'main' into fix_7387

8694ea5

One result per hook

75f64d0

o1 refactor

900d1cd

Merge branch 'main' into fix_7387

235aaec

fix tests

6970291

Add flag

66997f2

Fix existing tests

498a015

Write tests for on-run-start hooks

512b252

Add test for on-run-end hooks

37c62ab

do message test off run_results.json

ad1d952

add no hooks ran test

71ead6a

Add test for retry to always run hooks

0dcc944

Add changie

66fbae6

aranke changed the title ~~wip: refactor safe_run_hooks and add more metadata in~~ Add new skip_nodes_if_on_run_start_fails to allow skipping nodes if on-run-start hooks fail Sep 18, 2024

Update Features-20240918-162959.yaml

4234f84

aranke changed the title ~~Add new skip_nodes_if_on_run_start_fails to allow skipping nodes if on-run-start hooks fail~~ Add new skip_nodes_if_on_run_start_fails behavior change flag Sep 18, 2024

Update Features-20240918-162959.yaml

5abd9fb

aranke changed the title ~~Add new skip_nodes_if_on_run_start_fails behavior change flag~~ Create skip_nodes_if_on_run_start_fails behavior change flag Sep 18, 2024

Change tests

9c1ff8a

aranke marked this pull request as ready for review September 18, 2024 16:29

aranke requested a review from a team as a code owner September 18, 2024 16:29

aranke added 2 commits September 18, 2024 17:59

Merge branch 'main' into fix_7387

46a5f2f

Merge branch 'main' into fix_7387

3a42c72

peterallenwebb reviewed Sep 19, 2024

View reviewed changes

core/dbt/task/runnable.py Outdated Show resolved Hide resolved

core/dbt/task/runnable.py Outdated Show resolved Hide resolved

core/dbt/task/run.py Show resolved Hide resolved

Change formatting order, rename variable

53ee509

aranke requested a review from peterallenwebb September 20, 2024 13:11

peterallenwebb approved these changes Sep 20, 2024

View reviewed changes

graciegoheen added the user docs [docs.getdbt.com] Needs better documentation label Sep 24, 2024

aranke added 6 commits September 26, 2024 16:09

Merge branch 'main' into fix_7387

5df400c

merge from main

2b46907

make imports absolute

2ad3dd2

ignore operations unless we're in run-operation

dbfaa22

Add test for fixing retry hook

d808204

merge from main

9accf08

aranke requested a review from jtcohen6 September 27, 2024 14:22

jtcohen6 approved these changes Sep 27, 2024

View reviewed changes

aranke merged commit 2ff3f20 into main Sep 30, 2024
65 of 66 checks passed

aranke deleted the fix_7387 branch September 30, 2024 12:53

aranke mentioned this pull request Oct 1, 2024

Fix hook tests dbt-labs/dbt-postgres#153

Merged

4 tasks

dbeatty10 mentioned this pull request Oct 1, 2024

[Core] skip_nodes_if_on_run_start_fails behavior change flag dbt-labs/docs.getdbt.com#6172

Closed

dbeatty10 mentioned this pull request Oct 22, 2024

[Bug] return in finally can swallow exceptions #10904

Open

2 tasks

jtcohen6 mentioned this pull request Dec 12, 2024

Create a no-op exposure runner #11082

Merged

12 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Create `skip_nodes_if_on_run_start_fails` behavior change flag #10699

Create `skip_nodes_if_on_run_start_fails` behavior change flag #10699

aranke commented Sep 11, 2024 •

edited

Loading

github-actions bot commented Sep 11, 2024

codecov bot commented Sep 11, 2024 •

edited

Loading

jtcohen6 commented Sep 24, 2024 •

edited

Loading

aranke commented Sep 27, 2024 •

edited

Loading

aranke commented Sep 27, 2024

jtcohen6 left a comment

Create skip_nodes_if_on_run_start_fails behavior change flag #10699

Create skip_nodes_if_on_run_start_fails behavior change flag #10699

Conversation

aranke commented Sep 11, 2024 • edited Loading

Problem

Solution

Behavior

Implementation

Checklist

github-actions bot commented Sep 11, 2024

codecov bot commented Sep 11, 2024 • edited Loading

Codecov Report

jtcohen6 commented Sep 24, 2024 • edited Loading

aranke commented Sep 27, 2024 • edited Loading

aranke commented Sep 27, 2024

jtcohen6 left a comment

Choose a reason for hiding this comment

Create `skip_nodes_if_on_run_start_fails` behavior change flag #10699

Create `skip_nodes_if_on_run_start_fails` behavior change flag #10699

aranke commented Sep 11, 2024 •

edited

Loading

codecov bot commented Sep 11, 2024 •

edited

Loading

jtcohen6 commented Sep 24, 2024 •

edited

Loading

aranke commented Sep 27, 2024 •

edited

Loading