-
Notifications
You must be signed in to change notification settings - Fork 177
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Resolve errors occurring when dbt_project_path is str and partial support dbt_project_path=None #605
Conversation
👷 Deploy Preview for amazing-pothos-a3bca0 processing.
|
I'm not 100% confident with some of the changes i made downstream - assuming test coverage is all good we should be fine, however, given i'm still not 100% familiar with the codebase here some more detailed review may be required. |
I have just noticed the above errors in some tests - i will check those out shortly |
Codecov ReportAttention:
Additional details and impacted files@@ Coverage Diff @@
## main #605 +/- ##
==========================================
- Coverage 93.32% 93.29% -0.03%
==========================================
Files 54 53 -1
Lines 2128 2089 -39
==========================================
- Hits 1986 1949 -37
+ Misses 142 140 -2
☔ View full report in Codecov by Sentry. |
dbt_project_path
is None
or str
dbt_project_path
is None
or strdbt_project_path
is None
or str
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks excellent. Thanks a lot for quickly picking up and solving the issue, @MrBones757 !
I left some minor comments in-line, and I strongly suggest we change the following example DAGs, which are used in the integration tests:
dev/dags/cosmos_manifest_example.py
: to havedbt_project_path
as Nonedev/dags/basic_cosmos_task_group.py
: to havedbt_project_path
as a string
This will complement the unit tests you created to help us verify the change works e2e.
We'll release this change as part of Cosmos 1.2.1, at the latest, early next week.
I think the latest commit should address all your feedback :) let me know if theres anything else that needs updating |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@MrBones757, we're super close to merging this. Thanks for addressing all the comments and the relevant refactoring on how we represent the DbtProjects within Cosmos.
While validating, I found an issue while running the DAG cosmos_manifest_example
using:
AIRFLOW_CONN_AIRFLOW_DB="postgres://postgres:[email protected]:5432/postgres" PYTHONPATH=`pwd` AIRFLOW_HOME=`pwd` AIRFLOW__CORE__DAGBAG_IMPORT_TIMEOUT=20000 AIRFLOW__CORE__DAG_FILE_PROCESSOR_TIMEOUT=20000 airflow dags test cosmos_manifest_example `date -Iseconds`
It raised the following exception:
[2023-10-23T15:07:16.149+0100] {dag.py:2760} ERROR - (astronomer-cosmos) - Task failed; ti=<TaskInstance: cosmos_manifest_example.customers.run manual__2023-10-23T15:07:10+01:00 [up_for_retry]>
Traceback (most recent call last):
File "/Users/tati/Code/astronomer-cosmos/venv-airflow27/lib/python3.11/site-packages/airflow/models/dag.py", line 2758, in test
_run_task(ti, session=session)
File "/Users/tati/Code/astronomer-cosmos/venv-airflow27/lib/python3.11/site-packages/airflow/models/dag.py", line 3902, in _run_task
ti._run_raw_task(session=session)
File "/Users/tati/Code/astronomer-cosmos/venv-airflow27/lib/python3.11/site-packages/airflow/utils/session.py", line 74, in wrapper
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/Users/tati/Code/astronomer-cosmos/venv-airflow27/lib/python3.11/site-packages/airflow/models/taskinstance.py", line 1516, in _run_raw_task
self._execute_task_with_callbacks(context, test_mode, session=session)
File "/Users/tati/Code/astronomer-cosmos/venv-airflow27/lib/python3.11/site-packages/airflow/models/taskinstance.py", line 1679, in _execute_task_with_callbacks
result = self._execute_task(context, task_orig)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/tati/Code/astronomer-cosmos/venv-airflow27/lib/python3.11/site-packages/airflow/models/taskinstance.py", line 1742, in _execute_task
result = execute_callable(context=context, **execute_callable_kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/tati/Code/cosmos-clean/astronomer-cosmos/cosmos/operators/local.py", line 445, in execute
self.build_and_run_cmd(context=context, cmd_flags=cmd_flags)
File "/Users/tati/Code/cosmos-clean/astronomer-cosmos/cosmos/operators/local.py", line 357, in build_and_run_cmd
result = self.run_command(cmd=dbt_cmd, env=env, context=context)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/tati/Code/cosmos-clean/astronomer-cosmos/cosmos/operators/local.py", line 198, in run_command
shutil.copytree(
File "/opt/homebrew/Cellar/[email protected]/3.11.6/Frameworks/Python.framework/Versions/3.11/lib/python3.11/shutil.py", line 561, in copytree
return _copytree(entries=entries, src=src, dst=dst, symlinks=symlinks,
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/homebrew/Cellar/[email protected]/3.11.6/Frameworks/Python.framework/Versions/3.11/lib/python3.11/shutil.py", line 466, in _copytree
srcname = os.path.join(src, srcentry.name)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "<frozen posixpath>", line 76, in join
TypeError: expected str, bytes or os.PathLike object, not NoneType
I checked in the CI, and it seems this is the only DAG not being run in any of our integration tests - something that also happens on Cosmos main
branch. The fix for this issue is here:
#619. Please, could you rebase and look into this issue?
Removed nested functions from graph.load_manual updated dags with alternate inputs to get better e2e test coverage
Stopped using hard-coded Profile Path for Tests and relying on default in old DbtProject - Source it from a ProfileConfig instead. Dont Pass project_path to DbtNode if it doesnt exist - Allow DbtNode to compare paths based on relative too. Changed Other DbtProject to LegacyDbtProject consistantly. Updated tests to pass the fully qualified project path, rather than root and name.
For now, while i'm testing, i've added a new field to ExecutionConfig called dbt_project_path_root, which i'm referencing in converter.py init as: I was thinking of just adding it as a default in converter, but, exposing it as a config option seemed like a better option, though, im aware we don't want to cause any backward compatibility issues with public facing interfaces like this, so happy to remove it from ExecutionConfig, and just hard-code (lookup AIRFLOW_HOME) in converter if thats a better approach I'm currently unable to test this locally due to some environment constraints on my development environment, though im working on this and should have more tests conducted soon to fully verify the validity of this solution. |
I have confirmed all unit and integration tests run successfully on a fresh vm across the required python/airflow version set/s Could you also re-validate the failing test you found on your end to ensure its working as expected? |
@MrBones757, thanks for looking into this, addressing it quickly, and finding a workaround for the issue! I also have mixed feelings about the approach of setting the path in the local operator. As you pointed out, I'm worried about edge cases and the possibility of us introducing a breaking change or unexpected behaviours. I understand that to execute the dbt commands in the LOCAL mode, we will need the original dbt project code & path. Implementing ticket #568 would solve the issue without needing the more abrupt change proposed in #597 - I see it as a first step to solving this problem. K8s and Docker users have complained about this existing behaviour. That said, fixing #568 is outside the scope of the current PR. My suggestion for us to wrap this PR:
Once we implement #568, we can update the Manifest example safely - adding the project path only to the execution configuration. I'm adding a comment to the ticket so we don't forget this. WDYT? |
@tatiana I'm happy to revert that change later, as long as we're sure that this wont break docker / k8s based deployments - i'm not 100% clear on where those execution methods get their paths from. Based on what i could see by following the code though, this property that im looking up in env when dbt_project_path is not defined seems to also control paths for docker / k8s too - can you confirm where the k8s / docker paths are controlled? my assumption was it used the same mechanism.. my thinking here, is that since there would be no root path when only a manifest is provided, dbt would have no base folder for its operations in the containerised modes and thus resul in errors locating files, unconfirmed though - purely an assumption |
@MrBones757 You are correct that Kubernetes and Docker rely on the value My main concern on 02ae25b is this configuration change:
This feels very opinionated and could lead to unexpected behaviours. I believe we should not set a default project path on the user's behalf. The documentation you mentioned (https://astronomer.github.io/astronomer-cosmos/getting_started/kubernetes.html) is just an example of how to use Cosmos with Kubernetes - but, ultimately - the user is the one who controls the project/path. The amazing PR you wrote solves the issue Since this PR has already grown a lot, I suggest we rename it to: And get it merged. All the work you did is super valuable - we'll only be missing the last cherry to wrap it up. Before merging this PR, I'll make sure I validate things continue working with the Cosmos K8s execution mode. |
dbt_project_path
is None
or str
that sounds good to me - to confirm the above: I will revert the change to the ExecutionConfig, and we will require the project path be defined still (by failing if it isnt?); Separately to that, i will have a discussion with you about the implementation of #568 as i believe i should have bandwith to do that. |
I cant see any other opportunities for adding coverage in the scope of this PR. Is this slight change required to be resolved to get this merged? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks great, @MrBones757 , thanks a lot for the contribution and for all the patience in addressing the feedback. The refactoring was amazing, and we're very close to allowing users to support dbt_project_path=None
!
…port dbt_project_path=None (#605) As part of the changes made #581, some downstream logic was missed relating to the handling of a None and String-based project dir. This MR attempts to remedy this issue by adding down steam support for the project dir being None (including generation of exceptions and guarding), as well as some property reference changes in the converter. Closes: #601 Co-authored-by: tabmra <[email protected]> (cherry picked from commit 2f8d0e2)
Bug fixes * Resolve errors occurring when `dbt_project_path` is str and partial support `dbt_project_path=None` by @MrBones757 in #605 * Fix running dbt tests that depend on multiple models (support `--indirect-selection` buildable) by @david-mag in #613 * Add tests to sources, snapshots and seeds when using `TestBehavior.AFTER_EACH` by @tatiana in #599 * Fix custom selector when select has a subset of tags of the models' tags by @david-mag in #606 Others * Update contributing guide docs by @raffifu in #591 * Fix running test that validates manifest-based DAGs by @tatiana in #619 * pre-commit updates in #604 and #621
Bug fixes * Resolve errors occurring when `dbt_project_path` is str and partial support `dbt_project_path=None` by @MrBones757 in #605 * Fix running dbt tests that depend on multiple models (support `--indirect-selection` buildable) by @david-mag in #613 * Add tests to sources, snapshots and seeds when using `TestBehavior.AFTER_EACH` by @tatiana in #599 * Fix custom selector when select has a subset of tags of the models' tags by @david-mag in #606 Others * Update contributing guide docs by @raffifu in #591 * Fix running test that validates manifest-based DAGs by @tatiana in #619 * pre-commit updates in #604 and #621 (cherry picked from commit 52c34a2)
Bug fixes * Resolve errors occurring when `dbt_project_path` is str and partial support `dbt_project_path=None` by @MrBones757 in #605 * Fix running dbt tests that depend on multiple models (support `--indirect-selection` buildable) by @david-mag in #613 * Add tests to sources, snapshots and seeds when using `TestBehavior.AFTER_EACH` by @tatiana in #599 * Fix custom selector when select has a subset of tags of the models' tags by @david-mag in #606 * Fix `LoadMode.AUTOMATIC` behaviour to use `LoadMode.DBT_LS` when `ProfileMapping` is used by @tatiana in #625 * Fix failure if `openlineage-common` raises a jinja exception by @tatiana in #626 Others * Update contributing guide docs by @raffifu in #591 * Remove unnecessary stack trace from Cosmos initialization by @tatiana in #624 * Fix running test that validates manifest-based DAGs by @tatiana in #619 * pre-commit updates in #604 and #621
Bug fixes * Resolve errors occurring when `dbt_project_path` is str and partial support `dbt_project_path=None` by @MrBones757 in #605 * Fix running dbt tests that depend on multiple models (support `--indirect-selection` buildable) by @david-mag in #613 * Add tests to sources, snapshots and seeds when using `TestBehavior.AFTER_EACH` by @tatiana in #599 * Fix custom selector when select has a subset of tags of the models' tags by @david-mag in #606 * Fix `LoadMode.AUTOMATIC` behaviour to use `LoadMode.DBT_LS` when `ProfileMapping` is used by @tatiana in #625 * Fix failure if `openlineage-common` raises a jinja exception by @tatiana in #626 Others * Update contributing guide docs by @raffifu in #591 * Remove unnecessary stack trace from Cosmos initialization by @tatiana in #624 * Fix running test that validates manifest-based DAGs by @tatiana in #619 * pre-commit updates in #604 and #621 (cherry picked from commit 635fb7b)
Bug fixes * Resolve errors occurring when `dbt_project_path` is str and partial support `dbt_project_path=None` by @MrBones757 in #605 * Fix running dbt tests that depend on multiple models (support `--indirect-selection` buildable) by @david-mag in #613 * Add tests to sources, snapshots and seeds when using `TestBehavior.AFTER_EACH` by @tatiana in #599 * Fix custom selector when select has a subset of tags of the models' tags by @david-mag in #606 * Fix `LoadMode.AUTOMATIC` behaviour to use `LoadMode.DBT_LS` when `ProfileMapping` is used by @tatiana in #625 * Fix failure if `openlineage-common` raises a jinja exception by @tatiana in #626 Others * Update contributing guide docs by @raffifu in #591 * Remove unnecessary stack trace from Cosmos initialization by @tatiana in #624 * Fix running test that validates manifest-based DAGs by @tatiana in #619 * pre-commit updates in #604 and #621 (cherry picked from commit 635fb7b)
@MrBones757 We just published the release 1.2.1: |
… Rendering and Execution (#634) This MR finishes the work that was started in #605 to add full support for ProjectConfig.dbt_project_path = None, and implements #568. Within this PR, several things have been updated: 1 - Added project_path fields to RenderConfig and ExecutionConfig 2 - Simplified the consumption of RenderConfig in the dbtGraph class 3 - added option to configure different dbt executables for Rendering vs Execution. Closes: #568
… Rendering and Execution (#634) This MR finishes the work that was started in #605 to add full support for ProjectConfig.dbt_project_path = None, and implements #568. Within this PR, several things have been updated: 1 - Added project_path fields to RenderConfig and ExecutionConfig 2 - Simplified the consumption of RenderConfig in the dbtGraph class 3 - added option to configure different dbt executables for Rendering vs Execution. Closes: #568 (cherry picked from commit b64eb9a)
… Rendering and Execution (#634) This MR finishes the work that was started in #605 to add full support for ProjectConfig.dbt_project_path = None, and implements #568. Within this PR, several things have been updated: 1 - Added project_path fields to RenderConfig and ExecutionConfig 2 - Simplified the consumption of RenderConfig in the dbtGraph class 3 - added option to configure different dbt executables for Rendering vs Execution. Closes: #568 (cherry picked from commit b64eb9a)
… Rendering and Execution (astronomer#634) This MR finishes the work that was started in astronomer#605 to add full support for ProjectConfig.dbt_project_path = None, and implements astronomer#568. Within this PR, several things have been updated: 1 - Added project_path fields to RenderConfig and ExecutionConfig 2 - Simplified the consumption of RenderConfig in the dbtGraph class 3 - added option to configure different dbt executables for Rendering vs Execution. Closes: astronomer#568
… Rendering and Execution (astronomer#634) This MR finishes the work that was started in astronomer#605 to add full support for ProjectConfig.dbt_project_path = None, and implements astronomer#568. Within this PR, several things have been updated: 1 - Added project_path fields to RenderConfig and ExecutionConfig 2 - Simplified the consumption of RenderConfig in the dbtGraph class 3 - added option to configure different dbt executables for Rendering vs Execution. Closes: astronomer#568
… Rendering and Execution (#634) This MR finishes the work that was started in #605 to add full support for ProjectConfig.dbt_project_path = None, and implements #568. Within this PR, several things have been updated: 1 - Added project_path fields to RenderConfig and ExecutionConfig 2 - Simplified the consumption of RenderConfig in the dbtGraph class 3 - added option to configure different dbt executables for Rendering vs Execution. Closes: #568
… Rendering and Execution (#634) This MR finishes the work that was started in #605 to add full support for ProjectConfig.dbt_project_path = None, and implements #568. Within this PR, several things have been updated: 1 - Added project_path fields to RenderConfig and ExecutionConfig 2 - Simplified the consumption of RenderConfig in the dbtGraph class 3 - added option to configure different dbt executables for Rendering vs Execution. Closes: #568
Description
As part of the changes made #581, some downstream logic was missed relating to the handling of a None and String based project dir. This MR attempts to remedy this issue, by adding down steam support for project dir being None (including generation of exceptions and guarding), as well as some property reference changes in converter.
Related Issue(s)
closes #601
Breaking Change?
Checklist