Fix rendering dbt tests with multiple parents (#1433)
If these two circumstances are met:
1. The dbt project has tests that rely on multiple parent models; and
2. The `DbtDag` or `DbtTaskGroup` uses `TestBehavior.AFTER_EACH` (default) or `TestBehavior.BUILD`

Cosmos 1.8.0 and previous versions would attempt to run the same test multiple times, once after each parent model run, likely failing if any of the parents hadn't been run yet.

This PR aims to fix this behaviour by not running tests with multiple dependencies within each task group / build task, and by adding those tests to run only once, after all parents have run.

# Related issues

Closes: #978
Closes: #1365

This change also sets the ground for adding support to tests that don't have any dependencies, a problem discussed in the following tickets:
* #959
* #1242
* #1279

# How to reproduce

There are two steps to reproduce this problem:
1. To create a representative dbt project
2. To create a Cosmos `DbtDag` that uses this dbt project to reproduce the original problem

## Representative dbt project

We created a dbt project named `multiple_parents_test` that has a test called `custom_test_combined_model` that depends on two models:
* combined_model
* model_a

The expectation from a user perspective is that, since `combined_model` depends on `model_a`, the `custom_test_combined_model` test will be run only once, after both models have run.

Definition of the test:
```
{% test custom_test_combined_model(model) %}
WITH source_data AS (
    SELECT id FROM {{ ref('model_a') }}
),
combined_data AS (
    SELECT id FROM {{ model }}
)
SELECT
    s.id
FROM
    source_data s
LEFT JOIN
    combined_data c
    ON s.id = c.id
WHERE
    c.id IS NULL
{% endtest %}
```

By running the following `dbt build` command, we confirm that the test depends on both models:

```
dbt build --select "+custom_test_combined_model_combined_model_"
11:59:29  Running with dbt=1.8.2
11:59:29  Registered adapter: postgres=1.8.1
11:59:29  Found 3 models, 6 data tests, 414 macros
11:59:29
11:59:30  Concurrency: 4 threads (target='dev')
11:59:30
11:59:30  1 of 9 START sql view model public.model_a ..................................... [RUN]
11:59:30  2 of 9 START sql view model public.model_b ..................................... [RUN]
11:59:30  1 of 9 OK created sql view model public.model_a ................................ [CREATE VIEW in 0.18s]
11:59:30  2 of 9 OK created sql view model public.model_b ................................ [CREATE VIEW in 0.18s]
11:59:30  3 of 9 START test unique_model_a_id ............................................ [RUN]
11:59:30  4 of 9 START test unique_model_b_id ............................................ [RUN]
11:59:30  4 of 9 PASS unique_model_b_id .................................................. [PASS in 0.05s]
11:59:30  3 of 9 PASS unique_model_a_id .................................................. [PASS in 0.06s]
11:59:30  5 of 9 START sql view model public.combined_model .............................. [RUN]
11:59:30  5 of 9 OK created sql view model public.combined_model ......................... [CREATE VIEW in 0.03s]
11:59:30  6 of 9 START test custom_test_combined_model_combined_model_ ................... [RUN]
11:59:30  7 of 9 START test not_null_combined_model_created_at ........................... [RUN]
11:59:30  8 of 9 START test not_null_combined_model_id ................................... [RUN]
11:59:30  9 of 9 START test not_null_combined_model_name ................................. [RUN]
11:59:30  7 of 9 PASS not_null_combined_model_created_at ................................. [PASS in 0.07s]
11:59:30  9 of 9 PASS not_null_combined_model_name ....................................... [PASS in 0.07s]
11:59:30  8 of 9 PASS not_null_combined_model_id ......................................... [PASS in 0.07s]
11:59:30  6 of 9 PASS custom_test_combined_model_combined_model_ ......................... [PASS in 0.08s]
11:59:30
11:59:30  Finished running 3 view models, 6 data tests in 0 hours 0 minutes and 0.50 seconds (0.50s).
11:59:30
11:59:30  Completed successfully
11:59:30
11:59:30  Done. PASS=9 WARN=0 ERROR=0 SKIP=0 TOTAL=9
```

This is what the pipeline topology looks like:

<img width="1020" alt="Screenshot 2024-12-27 at 11 39 31" src="https://github.com/user-attachments/assets/d8a8e628-2fd7-4959-b13f-3d289e7250ed" />

The source code structure for this dbt project:

```
├── dbt_project.yml
├── macros
│   └── custom_test_combined_model.sql
├── models
│   ├── combined_model.sql
│   ├── model_a.sql
│   ├── model_b.sql
│   └── schema.yml
└── profiles.yml
```

When running `dbt ls`, it displays:

```
dbt ls
11:40:58  Running with dbt=1.8.2
11:40:58  Registered adapter: postgres=1.8.1
11:40:58  Unable to do partial parsing because saved manifest not found. Starting full parse.
11:40:59  [WARNING]: Deprecated functionality
The `tests` config has been renamed to `data_tests`. Please see
https://docs.getdbt.com/docs/build/data-tests#new-data_tests-syntax for more
information.
11:40:59  Found 3 models, 6 data tests, 414 macros
my_dbt_project.combined_model
my_dbt_project.model_a
my_dbt_project.model_b
my_dbt_project.custom_test_combined_model_combined_model_
my_dbt_project.not_null_combined_model_created_at
my_dbt_project.not_null_combined_model_id
my_dbt_project.not_null_combined_model_name
my_dbt_project.unique_model_a_id
my_dbt_project.unique_model_b_id
```

## Behavior in Cosmos

The DAG `example_multiple_parents_test` uses this new dbt project:

```
import os
from datetime import datetime
from pathlib import Path

from cosmos import DbtDag, ProfileConfig, ProjectConfig
from cosmos.profiles import PostgresUserPasswordProfileMapping

DEFAULT_DBT_ROOT_PATH = Path(__file__).parent / "dbt"
DBT_ROOT_PATH = Path(os.getenv("DBT_ROOT_PATH", DEFAULT_DBT_ROOT_PATH))

profile_config = ProfileConfig(
    profile_name="default",
    target_name="dev",
    profile_mapping=PostgresUserPasswordProfileMapping(
        conn_id="example_conn",
        profile_args={"schema": "public"},
        disable_event_tracking=True,
    ),
)

example_multiple_parents_test = DbtDag(
    # dbt/cosmos-specific parameters
    project_config=ProjectConfig(
        DBT_ROOT_PATH / "multiple_parents_test",
    ),
    profile_config=profile_config,
    # normal dag parameters
    start_date=datetime(2023, 1, 1),
    dag_id="example_multiple_parents_test",
)
```

When trying to run it using:

```
airflow dags test example_multiple_parents_test
```

Users face the original error because Cosmos attempts to run the test after `model_a` has run but before `combined_model` has run:

<img width="861" alt="Screenshot 2024-12-27 at 12 10 36" src="https://github.com/user-attachments/assets/33ea7b71-ba49-4418-b194-4d3590fff1b8" />

Excerpt from the logs of the failing task:

```
[2024-12-27T12:07:33.564+0000] {taskinstance.py:2905} ERROR - Task failed with exception
Traceback (most recent call last):
  File "/Users/tati/Code/cosmos-clean/astronomer-cosmos/venvpy39/lib/python3.9/site-packages/airflow/models/taskinstance.py", line 465, in _execute_task
    result = _execute_callable(context=context, **execute_callable_kwargs)
  File "/Users/tati/Code/cosmos-clean/astronomer-cosmos/venvpy39/lib/python3.9/site-packages/airflow/models/taskinstance.py", line 432, in _execute_callable
    return execute_callable(context=context, **execute_callable_kwargs)
  File "/Users/tati/Code/cosmos-clean/astronomer-cosmos/venvpy39/lib/python3.9/site-packages/airflow/models/baseoperator.py", line 401, in wrapper
    return func(self, *args, **kwargs)
  File "/Users/tati/Code/cosmos-clean/astronomer-cosmos/cosmos/operators/local.py", line 796, in execute
    result = self.build_and_run_cmd(context=context, cmd_flags=self.add_cmd_flags())
  File "/Users/tati/Code/cosmos-clean/astronomer-cosmos/cosmos/operators/local.py", line 654, in build_and_run_cmd
    result = self.run_command(cmd=dbt_cmd, env=env, context=context)
  File "/Users/tati/Code/cosmos-clean/astronomer-cosmos/cosmos/operators/local.py", line 509, in run_command
    self.handle_exception(result)
  File "/Users/tati/Code/cosmos-clean/astronomer-cosmos/cosmos/operators/local.py", line 237, in handle_exception_dbt_runner
    raise AirflowException(f"dbt invocation completed with errors: {error_message}")
airflow.exceptions.AirflowException: dbt invocation completed with errors: custom_test_combined_model_combined_model_: Database Error in test custom_test_combined_model_combined_model_ (models/schema.yml)
  relation "public.combined_model" does not exist
  LINE 12:     SELECT id FROM "postgres"."public"."combined_model"
  ^
  compiled Code at target/run/my_dbt_project/models/schema.yml/custom_test_combined_model_combined_model_.sql
```

## Behaviour after this change

With this change, running the DAG mentioned above results in:

<img width="1264" alt="Screenshot 2024-12-27 at 15 44 17" src="https://github.com/user-attachments/assets/e0395a4d-dbae-4b63-a3c3-69ca79ad0b04" />

And it can be run successfully.

## Breaking Change?

This PR slightly changes the behaviour of Cosmos DAG rendering when using `TestBehavior.AFTER_EACH` or `TestBehavior.BUILD` when there are tests with multiple parents. Some may consider it a breaking change, but a bug fix is a better classification, since Cosmos did not support rendering many dbt projects that met these circumstances.

The behaviour change in those cases is that we're isolating tests that depend on multiple parents and running them outside of the per-node Cosmos TaskGroup (`TestBehavior.AFTER_EACH`) or build task (`TestBehavior.BUILD`).

This change will likely highlight any tests that depend on multiple models and were not failing previously, but were running as part of the tests of each parent model.
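
The rendering rule described in this PR can be illustrated with a small, plain-Python sketch. It is not the Cosmos implementation: `split_tests` and the `test_parents` mapping are hypothetical names introduced here, and the node names are borrowed from the example project above. The idea is only that single-parent tests stay attached to their model, while multi-parent tests are detached and scheduled once, downstream of all their parents.

```python
from typing import Dict, List, Tuple


def split_tests(
    test_parents: Dict[str, List[str]],
) -> Tuple[Dict[str, List[str]], Dict[str, List[str]]]:
    """Partition tests by number of parent models (hypothetical helper).

    Returns (attached, detached):
    - attached maps each model to the single-parent tests run right after it
    - detached maps each multi-parent test to all parents it must wait for
    """
    attached: Dict[str, List[str]] = {}
    detached: Dict[str, List[str]] = {}
    for test, parents in test_parents.items():
        if len(parents) == 1:
            # Safe to run inside the model's task group / build task.
            attached.setdefault(parents[0], []).append(test)
        elif len(parents) > 1:
            # Run once, only after every parent has run.
            detached[test] = sorted(parents)
        # Tests without parents are ignored in this sketch.
    return attached, detached


# Assumed test-to-parent mapping, using names from the example project.
test_parents = {
    "unique_model_a_id": ["model_a"],
    "unique_model_b_id": ["model_b"],
    "not_null_combined_model_id": ["combined_model"],
    "custom_test_combined_model_combined_model_": ["combined_model", "model_a"],
}
attached, detached = split_tests(test_parents)
```

In this sketch, `custom_test_combined_model_combined_model_` ends up in `detached` with both `combined_model` and `model_a` as upstream nodes, which mirrors the single standalone test task shown in the screenshot of the fixed DAG.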