Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Resolve errors occurring when dbt_project_path is str and partial support dbt_project_path=None #605

Merged
merged 7 commits into from
Oct 25, 2023
Merged

Resolve errors occurring when dbt_project_path is str and partial support dbt_project_path=None #605

merged 7 commits into from
Oct 25, 2023

Conversation

MrBones757
Copy link
Contributor

Description

As part of the changes made #581, some downstream logic was missed relating to the handling of a None and String based project dir. This MR attempts to remedy this issue, by adding down steam support for project dir being None (including generation of exceptions and guarding), as well as some property reference changes in converter.

Related Issue(s)

closes #601

Breaking Change?

Checklist

  • I have made corresponding changes to the documentation (if required)
  • I have added tests that prove my fix is effective or that my feature works

@MrBones757 MrBones757 temporarily deployed to external October 17, 2023 04:08 — with GitHub Actions Inactive
@netlify
Copy link

netlify bot commented Oct 17, 2023

👷 Deploy Preview for amazing-pothos-a3bca0 processing.

Name Link
🔨 Latest commit 12792b7
🔍 Latest deploy log https://app.netlify.com/sites/amazing-pothos-a3bca0/deploys/6537a5a00b44630008680737

@MrBones757 MrBones757 marked this pull request as ready for review October 17, 2023 04:09
@MrBones757 MrBones757 requested a review from a team as a code owner October 17, 2023 04:09
@MrBones757 MrBones757 requested a review from a team October 17, 2023 04:09
@MrBones757
Copy link
Contributor Author

@tatiana

I'm not 100% confident with some of the changes i made downstream - assuming test coverage is all good we should be fine, however, given i'm still not 100% familiar with the codebase here some more detailed review may be required.

@MrBones757
Copy link
Contributor Author

I have just noticed the above errors in some tests - i will check those out shortly

@MrBones757 MrBones757 temporarily deployed to external October 17, 2023 09:50 — with GitHub Actions Inactive
@MrBones757 MrBones757 temporarily deployed to external October 17, 2023 21:04 — with GitHub Actions Inactive
@codecov
Copy link

codecov bot commented Oct 17, 2023

Codecov Report

Attention: 1 lines in your changes are missing coverage. Please review.

Comparison is base (1bf3a3e) 93.32% compared to head (12792b7) 93.29%.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #605      +/-   ##
==========================================
- Coverage   93.32%   93.29%   -0.03%     
==========================================
  Files          54       53       -1     
  Lines        2128     2089      -39     
==========================================
- Hits         1986     1949      -37     
+ Misses        142      140       -2     
Files Coverage Δ
cosmos/config.py 96.87% <100.00%> (+3.06%) ⬆️
cosmos/converter.py 96.82% <100.00%> (-0.32%) ⬇️
cosmos/dbt/graph.py 98.77% <100.00%> (-0.03%) ⬇️
cosmos/dbt/parser/project.py 90.04% <100.00%> (ø)
cosmos/dbt/selector.py 96.52% <75.00%> (-0.83%) ⬇️

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@tatiana tatiana temporarily deployed to external October 17, 2023 21:43 — with GitHub Actions Inactive
@tatiana tatiana added this to the 1.2.1 milestone Oct 18, 2023
@tatiana tatiana temporarily deployed to external October 19, 2023 09:18 — with GitHub Actions Inactive
@tatiana tatiana changed the title resolved errors occuring when dbt_project_path was None, or str Resolve errors occurring when dbt_project_path is None or str Oct 19, 2023
@tatiana tatiana changed the title Resolve errors occurring when dbt_project_path is None or str Resolve errors occurring when dbt_project_path is None or str Oct 19, 2023
Copy link
Collaborator

@tatiana tatiana left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks excellent. Thanks a lot for quickly picking up and solving the issue, @MrBones757 !

I left some minor comments in-line, and I strongly suggest we change the following example DAGs, which are used in the integration tests:

  • dev/dags/cosmos_manifest_example.py: to have dbt_project_path as None
  • dev/dags/basic_cosmos_task_group.py: to have dbt_project_path as a string
    This will complement the unit tests you created to help us verify the change works e2e.

We'll release this change as part of Cosmos 1.2.1, at the latest, early next week.

cosmos/converter.py Outdated Show resolved Hide resolved
cosmos/dbt/graph.py Outdated Show resolved Hide resolved
tests/test_converter.py Outdated Show resolved Hide resolved
tests/test_converter.py Outdated Show resolved Hide resolved
@MrBones757 MrBones757 temporarily deployed to external October 19, 2023 10:28 — with GitHub Actions Inactive
@MrBones757
Copy link
Contributor Author

I think the latest commit should address all your feedback :) let me know if theres anything else that needs updating

@MrBones757 MrBones757 temporarily deployed to external October 21, 2023 06:05 — with GitHub Actions Inactive
@MrBones757 MrBones757 temporarily deployed to external October 23, 2023 08:50 — with GitHub Actions Inactive
@MrBones757 MrBones757 temporarily deployed to external October 23, 2023 09:34 — with GitHub Actions Inactive
@MrBones757 MrBones757 temporarily deployed to external October 23, 2023 10:41 — with GitHub Actions Inactive
Copy link
Collaborator

@tatiana tatiana left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@MrBones757, we're super close to merging this. Thanks for addressing all the comments and the relevant refactoring on how we represent the DbtProjects within Cosmos.

While validating, I found an issue while running the DAG cosmos_manifest_example using:

AIRFLOW_CONN_AIRFLOW_DB="postgres://postgres:[email protected]:5432/postgres" PYTHONPATH=`pwd` AIRFLOW_HOME=`pwd` AIRFLOW__CORE__DAGBAG_IMPORT_TIMEOUT=20000 AIRFLOW__CORE__DAG_FILE_PROCESSOR_TIMEOUT=20000 airflow dags test cosmos_manifest_example `date -Iseconds`

It raised the following exception:

[2023-10-23T15:07:16.149+0100] {dag.py:2760} ERROR - (astronomer-cosmos) - Task failed; ti=<TaskInstance: cosmos_manifest_example.customers.run manual__2023-10-23T15:07:10+01:00 [up_for_retry]>
Traceback (most recent call last):
  File "/Users/tati/Code/astronomer-cosmos/venv-airflow27/lib/python3.11/site-packages/airflow/models/dag.py", line 2758, in test
    _run_task(ti, session=session)
  File "/Users/tati/Code/astronomer-cosmos/venv-airflow27/lib/python3.11/site-packages/airflow/models/dag.py", line 3902, in _run_task
    ti._run_raw_task(session=session)
  File "/Users/tati/Code/astronomer-cosmos/venv-airflow27/lib/python3.11/site-packages/airflow/utils/session.py", line 74, in wrapper
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/Users/tati/Code/astronomer-cosmos/venv-airflow27/lib/python3.11/site-packages/airflow/models/taskinstance.py", line 1516, in _run_raw_task
    self._execute_task_with_callbacks(context, test_mode, session=session)
  File "/Users/tati/Code/astronomer-cosmos/venv-airflow27/lib/python3.11/site-packages/airflow/models/taskinstance.py", line 1679, in _execute_task_with_callbacks
    result = self._execute_task(context, task_orig)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/tati/Code/astronomer-cosmos/venv-airflow27/lib/python3.11/site-packages/airflow/models/taskinstance.py", line 1742, in _execute_task
    result = execute_callable(context=context, **execute_callable_kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/tati/Code/cosmos-clean/astronomer-cosmos/cosmos/operators/local.py", line 445, in execute
    self.build_and_run_cmd(context=context, cmd_flags=cmd_flags)
  File "/Users/tati/Code/cosmos-clean/astronomer-cosmos/cosmos/operators/local.py", line 357, in build_and_run_cmd
    result = self.run_command(cmd=dbt_cmd, env=env, context=context)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/tati/Code/cosmos-clean/astronomer-cosmos/cosmos/operators/local.py", line 198, in run_command
    shutil.copytree(
  File "/opt/homebrew/Cellar/[email protected]/3.11.6/Frameworks/Python.framework/Versions/3.11/lib/python3.11/shutil.py", line 561, in copytree
    return _copytree(entries=entries, src=src, dst=dst, symlinks=symlinks,
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/Cellar/[email protected]/3.11.6/Frameworks/Python.framework/Versions/3.11/lib/python3.11/shutil.py", line 466, in _copytree
    srcname = os.path.join(src, srcentry.name)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "<frozen posixpath>", line 76, in join
TypeError: expected str, bytes or os.PathLike object, not NoneType

I checked in the CI, and it seems this is the only DAG not being run in any of our integration tests - something that also happens on Cosmos main branch. The fix for this issue is here:
#619. Please, could you rebase and look into this issue?

tabmra added 5 commits October 24, 2023 10:32
Removed nested functions from graph.load_manual
updated dags with alternate inputs to get better e2e test coverage
Stopped using hard-coded Profile Path for Tests and relying on default in old DbtProject - Source it from a ProfileConfig instead.
Dont Pass project_path to DbtNode if it doesnt exist - Allow DbtNode to compare paths based on relative too.
Changed Other DbtProject to LegacyDbtProject consistantly.
Updated tests to pass the fully qualified project path, rather than root and name.
@MrBones757
Copy link
Contributor Author

For now, while i'm testing, i've added a new field to ExecutionConfig called dbt_project_path_root, which i'm referencing in converter.py init as:
dbt_project_path_runtime_root = execution_config.dbt_project_path_root
This way, we can pass this to the operator's args, in combination with the project_name from ProjectConfig to get the path at runtime.

I was thinking of just adding it as a default in converter, but, exposing it as a config option seemed like a better option, though, im aware we don't want to cause any backward compatibility issues with public facing interfaces like this, so happy to remove it from ExecutionConfig, and just hard-code (lookup AIRFLOW_HOME) in converter if thats a better approach

I'm currently unable to test this locally due to some environment constraints on my development environment, though im working on this and should have more tests conducted soon to fully verify the validity of this solution.

@MrBones757
Copy link
Contributor Author

MrBones757 commented Oct 24, 2023

I have confirmed all unit and integration tests run successfully on a fresh vm across the required python/airflow version set/s
I think my choice to add execution_config.dbt_project_path_root in this PR is questionable, and alternate implementation recommendations would be appreciated.

Could you also re-validate the failing test you found on your end to ensure its working as expected?

@tatiana
Copy link
Collaborator

tatiana commented Oct 24, 2023

@MrBones757, thanks for looking into this, addressing it quickly, and finding a workaround for the issue!

I also have mixed feelings about the approach of setting the path in the local operator. As you pointed out, I'm worried about edge cases and the possibility of us introducing a breaking change or unexpected behaviours.

I understand that to execute the dbt commands in the LOCAL mode, we will need the original dbt project code & path. Implementing ticket #568 would solve the issue without needing the more abrupt change proposed in #597 - I see it as a first step to solving this problem. K8s and Docker users have complained about this existing behaviour. That said, fixing #568 is outside the scope of the current PR.

My suggestion for us to wrap this PR:

  1. Undo the changes from the last commit, 02ae25b.
  2. Revert the manifest example; since it uses LOCAL, it needs the project path (I'm sorry for my misleading suggestion)
  3. Rely on the test https://github.com/astronomer/astronomer-cosmos/pull/605/files#diff-f56d97c6216a37d8ca841f31c63b704d05b7e3a42d6dbe783411d6fef8204615R116 to cover the change we introduced related to supporting Manifest, no project path & a non-local Cosmos execution mode.

Once we implement #568, we can update the Manifest example safely - adding the project path only to the execution configuration. I'm adding a comment to the ticket so we don't forget this.

WDYT?

@MrBones757
Copy link
Contributor Author

@tatiana I'm happy to revert that change later, as long as we're sure that this wont break docker / k8s based deployments - i'm not 100% clear on where those execution methods get their paths from. Based on what i could see by following the code though, this property that im looking up in env when dbt_project_path is not defined seems to also control paths for docker / k8s too - can you confirm where the k8s / docker paths are controlled? my assumption was it used the same mechanism..

my thinking here, is that since there would be no root path when only a manifest is provided, dbt would have no base folder for its operations in the containerised modes and thus resul in errors locating files, unconfirmed though - purely an assumption

@tatiana
Copy link
Collaborator

tatiana commented Oct 24, 2023

@MrBones757 You are correct that Kubernetes and Docker rely on the value dbt_project_path.

My main concern on 02ae25b is this configuration change:

dbt_project_path_root: str | Path = Path(os.environ.get("AIRFLOW_HOME", "/dbt")) / "dags" / "dbt"

This feels very opinionated and could lead to unexpected behaviours. I believe we should not set a default project path on the user's behalf. The documentation you mentioned (https://astronomer.github.io/astronomer-cosmos/getting_started/kubernetes.html) is just an example of how to use Cosmos with Kubernetes - but, ultimately - the user is the one who controls the project/path.

The amazing PR you wrote solves the issue Resolve errors occurring when dbt_project_path is str - and it does 90% of what is necessary for us to support dbt_project_path=None. It does the hardest part of the job, which is validating configurations and doing the necessary changes during DAG parsing time. However, I think for us to have the full support for dbt_project_path=None, during execution, we'll also need to implement #568 - in a separate PR.

Since this PR has already grown a lot, I suggest we rename it to:
Resolve errors occurring when dbt_project_path is str and partial support dbt_project_path=None

And get it merged. All the work you did is super valuable - we'll only be missing the last cherry to wrap it up.

Before merging this PR, I'll make sure I validate things continue working with the Cosmos K8s execution mode.

@MrBones757 MrBones757 changed the title Resolve errors occurring when dbt_project_path is None or str Resolve errors occurring when dbt_project_path is str and partial support dbt_project_path=None Oct 24, 2023
@MrBones757
Copy link
Contributor Author

MrBones757 commented Oct 24, 2023

that sounds good to me - to confirm the above:

I will revert the change to the ExecutionConfig, and we will require the project path be defined still (by failing if it isnt?);

Separately to that, i will have a discussion with you about the implementation of #568 as i believe i should have bandwith to do that.

@MrBones757 MrBones757 temporarily deployed to external October 24, 2023 11:08 — with GitHub Actions Inactive
@MrBones757
Copy link
Contributor Author

I cant see any other opportunities for adding coverage in the scope of this PR. Is this slight change required to be resolved to get this merged?

Copy link
Collaborator

@tatiana tatiana left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks great, @MrBones757 , thanks a lot for the contribution and for all the patience in addressing the feedback. The refactoring was amazing, and we're very close to allowing users to support dbt_project_path=None!

@tatiana tatiana merged commit 2f8d0e2 into astronomer:main Oct 25, 2023
39 of 40 checks passed
@MrBones757 MrBones757 deleted the fix-project-path-errors branch October 25, 2023 13:16
tatiana pushed a commit that referenced this pull request Oct 25, 2023
…port dbt_project_path=None (#605)

As part of the changes made #581, some downstream logic was missed
relating to the handling of a None and String-based project dir. This MR
attempts to remedy this issue by adding down steam support for the project
dir being None (including generation of exceptions and guarding), as
well as some property reference changes in the converter.

Closes: #601

Co-authored-by: tabmra <[email protected]>
(cherry picked from commit 2f8d0e2)
tatiana added a commit that referenced this pull request Oct 25, 2023
Bug fixes

* Resolve errors occurring when `dbt_project_path` is str and partial support `dbt_project_path=None` by @MrBones757 in #605
* Fix running dbt tests that depend on multiple models (support `--indirect-selection` buildable) by @david-mag in #613
* Add tests to sources, snapshots and seeds when using `TestBehavior.AFTER_EACH` by @tatiana in #599
* Fix custom selector when select has a subset of tags of the models' tags by @david-mag in #606

Others

* Update contributing guide docs by @raffifu in #591
* Fix running test that validates manifest-based DAGs by @tatiana in #619
* pre-commit updates in #604 and #621
tatiana added a commit that referenced this pull request Oct 25, 2023
Bug fixes

* Resolve errors occurring when `dbt_project_path` is str and partial support `dbt_project_path=None` by @MrBones757 in #605
* Fix running dbt tests that depend on multiple models (support `--indirect-selection` buildable) by @david-mag in #613
* Add tests to sources, snapshots and seeds when using `TestBehavior.AFTER_EACH` by @tatiana in #599
* Fix custom selector when select has a subset of tags of the models' tags by @david-mag in #606

Others

* Update contributing guide docs by @raffifu in #591
* Fix running test that validates manifest-based DAGs by @tatiana in #619
* pre-commit updates in #604 and #621

(cherry picked from commit 52c34a2)
tatiana added a commit that referenced this pull request Oct 25, 2023
Bug fixes

* Resolve errors occurring when `dbt_project_path` is str and partial support `dbt_project_path=None` by @MrBones757 in #605
* Fix running dbt tests that depend on multiple models (support `--indirect-selection` buildable) by @david-mag in #613
* Add tests to sources, snapshots and seeds when using `TestBehavior.AFTER_EACH` by @tatiana in #599
* Fix custom selector when select has a subset of tags of the models' tags by @david-mag in #606
* Fix `LoadMode.AUTOMATIC` behaviour to use `LoadMode.DBT_LS` when `ProfileMapping` is used by @tatiana in #625
* Fix failure if `openlineage-common` raises a jinja exception by @tatiana in #626

Others

* Update contributing guide docs by @raffifu in #591
* Remove unnecessary stack trace from Cosmos initialization by @tatiana in #624
* Fix running test that validates manifest-based DAGs by @tatiana in #619
* pre-commit updates in #604 and #621
tatiana added a commit that referenced this pull request Oct 25, 2023
Bug fixes

* Resolve errors occurring when `dbt_project_path` is str and partial support `dbt_project_path=None` by @MrBones757 in #605
* Fix running dbt tests that depend on multiple models (support `--indirect-selection` buildable) by @david-mag in #613
* Add tests to sources, snapshots and seeds when using `TestBehavior.AFTER_EACH` by @tatiana in #599
* Fix custom selector when select has a subset of tags of the models' tags by @david-mag in #606
* Fix `LoadMode.AUTOMATIC` behaviour to use `LoadMode.DBT_LS` when `ProfileMapping` is used by @tatiana in #625
* Fix failure if `openlineage-common` raises a jinja exception by @tatiana in #626

Others

* Update contributing guide docs by @raffifu in #591
* Remove unnecessary stack trace from Cosmos initialization by @tatiana in #624
* Fix running test that validates manifest-based DAGs by @tatiana in #619
* pre-commit updates in #604 and #621

(cherry picked from commit 635fb7b)
@tatiana tatiana mentioned this pull request Oct 25, 2023
tatiana added a commit that referenced this pull request Oct 25, 2023
Bug fixes

* Resolve errors occurring when `dbt_project_path` is str and partial
support `dbt_project_path=None` by @MrBones757 in #605
* Fix running dbt tests that depend on multiple models (support
`--indirect-selection` buildable) by @david-mag in #613
* Add tests to sources, snapshots and seeds when using
`TestBehavior.AFTER_EACH` by @tatiana in #599
* Fix custom selector when select has a subset of tags of the models'
tags by @david-mag in #606
* Fix `LoadMode.AUTOMATIC` behaviour to use `LoadMode.DBT_LS` when
`ProfileMapping` is used by @tatiana in #625
* Fix failure if `openlineage-common` raises a jinja exception by
@tatiana in #626

Others

* Update contributing guide docs by @raffifu in #591
* Remove unnecessary stack trace from Cosmos initialization by @tatiana
in #624
* Fix running test that validates manifest-based DAGs by @tatiana in
#619
* pre-commit updates in #604 and #621

(cherry picked from commit 635fb7b)
@tatiana
Copy link
Collaborator

tatiana commented Oct 25, 2023

tatiana pushed a commit that referenced this pull request Nov 3, 2023
… Rendering and Execution (#634)

This MR finishes the work that was started in #605 to add full support
for ProjectConfig.dbt_project_path = None, and implements #568.

Within this PR, several things have been updated:
1 - Added project_path fields to RenderConfig and ExecutionConfig
2 - Simplified the consumption of RenderConfig in the dbtGraph class
3 - added option to configure different dbt executables for Rendering vs
Execution.

Closes: #568
tatiana pushed a commit that referenced this pull request Nov 6, 2023
… Rendering and Execution (#634)

This MR finishes the work that was started in #605 to add full support
for ProjectConfig.dbt_project_path = None, and implements #568.

Within this PR, several things have been updated:
1 - Added project_path fields to RenderConfig and ExecutionConfig
2 - Simplified the consumption of RenderConfig in the dbtGraph class
3 - added option to configure different dbt executables for Rendering vs
Execution.

Closes: #568
(cherry picked from commit b64eb9a)
tatiana pushed a commit that referenced this pull request Nov 6, 2023
… Rendering and Execution (#634)

This MR finishes the work that was started in #605 to add full support
for ProjectConfig.dbt_project_path = None, and implements #568.

Within this PR, several things have been updated:
1 - Added project_path fields to RenderConfig and ExecutionConfig
2 - Simplified the consumption of RenderConfig in the dbtGraph class
3 - added option to configure different dbt executables for Rendering vs
Execution.

Closes: #568
(cherry picked from commit b64eb9a)
LennartKloppenburg pushed a commit to LennartKloppenburg/astronomer-cosmos that referenced this pull request Nov 17, 2023
… Rendering and Execution (astronomer#634)

This MR finishes the work that was started in astronomer#605 to add full support
for ProjectConfig.dbt_project_path = None, and implements astronomer#568.

Within this PR, several things have been updated:
1 - Added project_path fields to RenderConfig and ExecutionConfig
2 - Simplified the consumption of RenderConfig in the dbtGraph class
3 - added option to configure different dbt executables for Rendering vs
Execution.

Closes: astronomer#568
LennartKloppenburg pushed a commit to LennartKloppenburg/astronomer-cosmos that referenced this pull request Dec 17, 2023
… Rendering and Execution (astronomer#634)

This MR finishes the work that was started in astronomer#605 to add full support
for ProjectConfig.dbt_project_path = None, and implements astronomer#568.

Within this PR, several things have been updated:
1 - Added project_path fields to RenderConfig and ExecutionConfig
2 - Simplified the consumption of RenderConfig in the dbtGraph class
3 - added option to configure different dbt executables for Rendering vs
Execution.

Closes: astronomer#568
tatiana pushed a commit that referenced this pull request Jul 4, 2024
… Rendering and Execution (#634)

This MR finishes the work that was started in #605 to add full support
for ProjectConfig.dbt_project_path = None, and implements #568.

Within this PR, several things have been updated:
1 - Added project_path fields to RenderConfig and ExecutionConfig
2 - Simplified the consumption of RenderConfig in the dbtGraph class
3 - added option to configure different dbt executables for Rendering vs
Execution.

Closes: #568
tatiana pushed a commit that referenced this pull request Jul 29, 2024
… Rendering and Execution (#634)

This MR finishes the work that was started in #605 to add full support
for ProjectConfig.dbt_project_path = None, and implements #568.

Within this PR, several things have been updated:
1 - Added project_path fields to RenderConfig and ExecutionConfig
2 - Simplified the consumption of RenderConfig in the dbtGraph class
3 - added option to configure different dbt executables for Rendering vs
Execution.

Closes: #568
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Fix converting dbt projects that have ProjectConfig.dbt_project_path set as strings
2 participants