Fix/changes current working dir when using a dbt project dir #9596

Conversation

rariyama
Contributor

resolves #8997
I closed the previous PR by mistake, so I am submitting this one instead.

Problem

There is a difference in behaviour between the dbt CLI and dbtRunner.invoke.
For example, executing dbtRunner().invoke(["deps", "--project-dir", "project_dir"]) in Python changes the
current working directory to project_dir, whereas running dbt deps from the CLI leaves the caller's directory unchanged.

Solution

The directory changes because dbt.task.base.move_to_nearest_project_dir() is called in CleanTask, DepsTask and InitTask. The function changes the working directory to the path passed as its argument.
This can be resolved by using those classes as context managers, so that the directory change is confined to the with statement.
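
As a rough illustration of that idea (not the code in this PR), the sketch below confines a directory change to a with block. The helper name preserve_cwd is hypothetical, and a temporary directory stands in for the dbt project directory.

```python
import os
import tempfile
from contextlib import contextmanager


@contextmanager
def preserve_cwd():
    """Restore the caller's working directory when the block exits."""
    original = os.getcwd()
    try:
        yield original
    finally:
        # Runs on normal exit and also when the block raises.
        os.chdir(original)


def demo():
    """Return (cwd before, cwd inside the block, cwd after the block)."""
    before = os.getcwd()
    with tempfile.TemporaryDirectory() as project_dir:
        with preserve_cwd():
            # Stand-in for what move_to_nearest_project_dir() does.
            os.chdir(project_dir)
            inside = os.getcwd()
        after = os.getcwd()
    return before, inside, after
```

Calling demo() should return the same path for the first and third values, while the second value is the temporary directory.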

Checklist

  • I have read the contributing guide and understand what's expected of me
  • I have run this code in development and it appears to resolve the stated issue
  • This PR includes tests, or tests are not required/relevant for this PR
  • This PR has no interface changes (e.g. macros, cli, logs, json artifacts, config files, adapter interface, etc) or this PR has already received feedback and approval from Product or DX
  • This PR includes type annotations for new and modified functions

@rariyama rariyama requested a review from a team as a code owner February 19, 2024 05:17
@rariyama rariyama requested a review from ChenyuLInx February 19, 2024 05:17
@cla-bot cla-bot bot added the cla:yes label Feb 19, 2024
Contributor

@ChenyuLInx ChenyuLInx left a comment

Thanks for adding this @rariyama
I wonder if you could add __enter__ and __exit__ to just the base task, so we only do it once?

Also tagging @MichelleArk to see if there are any thoughts on introducing this with-statement pattern. It looks good to me.

@rariyama
Contributor Author

Hi @ChenyuLInx . Thanks for your comments.
I agree with your idea, so I updated my code as follows in commit 31ba979.

  • Added __enter__ and __exit__ to BaseTask
  • Each child task class inherits from BaseTask and therefore behaves as a context manager.
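
A minimal sketch of that structure, for readers who want to see the shape of it. These classes are hypothetical stand-ins rather than dbt's actual BaseTask or DepsTask, and the temporary directory plays the role of the project directory.

```python
import os
import tempfile


class BaseTask:
    """Hypothetical stand-in for a base task; not dbt's actual class."""

    def __enter__(self):
        # Remember where the caller was when the with block started.
        self._orig_dir = os.getcwd()
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        # Go back to the remembered directory; exceptions still propagate.
        os.chdir(self._orig_dir)

    def run(self):
        raise NotImplementedError


class DepsLikeTask(BaseTask):
    """Hypothetical child task; it inherits the context-manager behaviour."""

    def __init__(self, project_dir):
        self.project_dir = project_dir

    def run(self):
        os.chdir(self.project_dir)


def demo():
    """Return (directory moved inside the block, directory restored after)."""
    start = os.getcwd()
    with tempfile.TemporaryDirectory() as project_dir:
        with DepsLikeTask(project_dir) as task:
            task.run()
            moved = os.getcwd() != start
        restored = os.getcwd() == start
    return moved, restored
```

Because __enter__ and __exit__ live only on the base class, any further subclass gets the same restore behaviour without repeating it.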

I would like to add test code for this update. I think adding the following code to the integration tests would be enough. What do you think?

    def test_dbtRunner_invoke(self):
        pd = "<test-project-path>"
        dr = dbtRunner()
        commands = ["deps", "clean", "init"]

        for command in commands:
            req = [command, "--project-dir", pd, "--profiles-dir", pd]
            before_dir = os.getcwd()
            res = dr.invoke(req)
            self.assertEqual(res.success, True)
            # The working directory must be unchanged after the command is invoked.
            self.assertEqual(before_dir, os.getcwd())

@dbeatty10 dbeatty10 added the community This PR is from a community member label Mar 22, 2024
@dbeatty10
Contributor

@rariyama I saw your message in #8997 (comment) and checked the comments in this PR.

Could you do these two things? After that we can have one of our engineers take another look:

  1. Resolve the conflicts
  2. Add the test below:

    def test_runner_invoke(self, project):
        project_dir = project.project_root
        runner = dbtRunner()
        commands = ["deps", "clean", "init"]
        
        # TODO: change to some other directory other than the dbt project root

        for command in commands:
            before_dir = os.getcwd()
            res = runner.invoke([command, "--project-dir", project_dir, "--profiles-dir", project_dir])
            self.assertEqual(res.success, True)
            self.assertEqual(before_dir, os.getcwd())

The key thing will be to make sure the test actually captures the scenario in the bug report, i.e., you'll want to make sure that the test fails without your changes and passes with them.
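
The TODO in the test above matters for exactly this reason. The sketch below, which uses a hypothetical leaky_invoke function in place of the buggy behaviour and temporary directories in place of a real project, suggests why a before/after comparison only catches the bug when the test starts somewhere other than the project root.

```python
import os
import tempfile


def leaky_invoke(project_dir):
    """Hypothetical stand-in for the bug: change directory, never restore it."""
    os.chdir(project_dir)


def check_detects_leak(start_dir, project_dir):
    """Return True if comparing cwd before and after notices the leak."""
    original = os.getcwd()
    try:
        os.chdir(start_dir)
        before = os.getcwd()
        leaky_invoke(project_dir)
        return before != os.getcwd()
    finally:
        # Leave the temporary directories before they are deleted.
        os.chdir(original)


def demo():
    """Return (detected from project root, detected from another directory)."""
    with tempfile.TemporaryDirectory() as project_dir:
        with tempfile.TemporaryDirectory() as elsewhere:
            from_root = check_detects_leak(project_dir, project_dir)
            from_elsewhere = check_detects_leak(elsewhere, project_dir)
    return from_root, from_elsewhere
```

Starting in the project root makes the leaked chdir a no-op, so the comparison passes even with the bug present. Starting elsewhere exposes it.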

@rariyama
Contributor Author

Hi @dbeatty10. Thank you for checking my comment.
I resolved the conflicts in commit fc22afd and added a test assertion in commit c75835b.

I confirmed that the results of both the unit tests and the integration tests are as expected, in the following ways.

  • Added the test assertion without my changes, and the integration test fails.
    • That is, I added the test assertion to the source code of the main branch and the test failed.
  • Added my changes, and both the unit tests and the integration tests pass.
    • This means my changes resolve the issue.

I ran the tests with the commands make test and make integration.

If there are no remaining tasks for me, could you please have the engineers review it?


codecov bot commented Aug 15, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 88.92%. Comparing base (9ca1bc5) to head (b7aa1c6).
Report is 16 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #9596      +/-   ##
==========================================
- Coverage   88.93%   88.92%   -0.02%     
==========================================
  Files         180      180              
  Lines       22735    22760      +25     
==========================================
+ Hits        20220    20239      +19     
- Misses       2515     2521       +6     
Flag Coverage Δ
integration 86.23% <100.00%> (-0.02%) ⬇️
unit 62.33% <11.76%> (-0.05%) ⬇️

Flags with carried forward coverage won't be shown.

Components Coverage Δ
Unit Tests 62.33% <11.76%> (-0.05%) ⬇️
Integration Tests 86.23% <100.00%> (-0.02%) ⬇️

@rariyama
Contributor Author

Hello Team. Could someone please review my PR?

Comment on lines 94 to 99

before_dir = os.getcwd()
res = dbt.invoke(args)
after_dir = os.getcwd()
# The directory has not been changed after running dbt command.
assert before_dir == after_dir
Contributor

It seems like this assertion would be better if it were directly within a test rather than here. Maybe here instead?

What do you think @ChenyuLInx?

Contributor

I think this is great!! It also means the check covers all future commands.

Contributor Author

I added test code to tests/functional/dbt_runner/test_dbt_runner.py in commit 26ec953.

Comment on lines 92 to 99
project.project_root = str(Path(project.project_root).resolve())
super().__init__(args=args)
# N.B. This is a temporary fix for a bug when using relative paths via
# --project-dir with deps. A larger overhaul of our path handling methods
# is needed to fix this the "right" way.
# See GH-7615
project.project_root = str(Path(project.project_root).resolve())
self.project = project

move_to_nearest_project_dir(project.project_root)
self.cli_vars = args.vars
self.project = project
Contributor

@rariyama Did you happen to check if preserving this order worked or not?

I'm wondering if you changed the ordering intentionally or if it was just an accidental side-effect when resolving merge conflicts.

i.e., I'm curious if this would work (which is basically identical to here except for removing the move_to_nearest_project_dir(project.project_root) line):

        super().__init__(args=args)

        # N.B. This is a temporary fix for a bug when using relative paths via
        # --project-dir with deps.  A larger overhaul of our path handling methods
        # is needed to fix this the "right" way.
        # See GH-7615
        project.project_root = str(Path(project.project_root).resolve())
        self.project = project

        self.cli_vars = args.vars

Contributor Author

@rariyama rariyama Aug 21, 2024

I restored the original order of the assignments, as you suggested, in commit 4191735.
I confirmed that the integration tests pass in my local environment.

Contributor

@ChenyuLInx ChenyuLInx left a comment

Thanks @rariyama for thinking through this and submitting this PR. This change makes sense to me!
Since this modifies working-directory behavior, I would like to play with it a bit more locally before approving and merging it.

@rariyama
Contributor Author

@ChenyuLInx
Thank you for checking my PR. Please try running it in your local environment. If there is anything you want to ask or anything I need to fix, please let me know.

@@ -202,6 +200,7 @@ def lock(self) -> None:
fire_event(DepsLockUpdating(lock_filepath=lock_filepath))

def run(self) -> None:
move_to_nearest_project_dir(self.args.project_dir)
Contributor

Did move_to_nearest_project_dir(self.args.project_dir) need to move to this location in order for things to work?

Or was it "nice-to-have" for some reason?

Contributor Author

There is a reason for moving the move_to_nearest_project_dir() function. The DepsTask class inherits from the BaseTask class and is designed to behave as a context manager. In BaseTask, the current directory is obtained within __enter__(), and os.chdir() is called within __exit__() to return to the previously obtained directory.

Normally, since __init__() is executed before __enter__(), if move_to_nearest_project_dir() is called inside __init__(), the directory after the move will be recorded, and it will not return correctly to the original directory.

The sequence of events can be described as follows:

  1. __init__().move_to_nearest_project_dir() # Moves to the project directory.
  2. __enter__().os.getcwd() # Gets the current directory, which is the project directory.
  3. __exit__().os.chdir() # Moves "back" to the project directory obtained in step 2, not to the original directory.

By moving the move_to_nearest_project_dir() function to the run() function, the order of execution changes as follows:

  1. __enter__().os.getcwd() # Gets the current directory, i.e., the directory from which dbt was invoked.
  2. DepsTask.run().move_to_nearest_project_dir() # Moves to the project directory to execute dbt deps.
  3. __exit__().os.chdir() # Returns to the directory obtained in step 1.

This is why it was necessary to move the move_to_nearest_project_dir() function.
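
The two sequences can be reproduced with a small sketch. The classes below are hypothetical stand-ins, not dbt's code; the only difference between them is whether the directory change happens in __init__() or in run().

```python
import os
import tempfile


class RestoringTask:
    """Hypothetical base: record cwd in __enter__, restore it in __exit__."""

    def __enter__(self):
        self._orig_dir = os.getcwd()
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        os.chdir(self._orig_dir)

    def run(self):
        pass


class MoveInInit(RestoringTask):
    def __init__(self, project_dir):
        # Runs before __enter__, so __enter__ records the project directory.
        os.chdir(project_dir)


class MoveInRun(RestoringTask):
    def __init__(self, project_dir):
        self.project_dir = project_dir

    def run(self):
        # Runs after __enter__, so __enter__ recorded the caller's directory.
        os.chdir(self.project_dir)


def cwd_restored(task_cls):
    """Return True if the starting directory is current again after the block."""
    start = os.getcwd()
    with tempfile.TemporaryDirectory() as project_dir:
        try:
            with task_cls(project_dir) as task:
                task.run()
            return os.getcwd() == start
        finally:
            # Leave the temporary directory before it is deleted.
            os.chdir(start)
```

Under these assumptions, cwd_restored(MoveInRun) is True and cwd_restored(MoveInInit) is False, which matches the two orderings described above.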

Contributor

That is an amazing explanation -- thank you!

@ChenyuLInx
Contributor

@rariyama it looks like the new test you just introduced is failing on the deps command. I think this is due to the project fixture missing from the test's input arguments:
def test_directory_does_not_change(self, dbt: dbtRunner, project) -> None:
I pushed a commit to your branch as this is a small fix, hope that's okay with you.
I did some local testing with dbtRunner and the behavior looks good to me.

@ChenyuLInx ChenyuLInx merged commit b56d96d into dbt-labs:main Sep 3, 2024
59 of 60 checks passed
@ChenyuLInx
Contributor

@rariyama I went ahead and merged this change! Thank you so much for fixing it!

@rariyama
Contributor Author

rariyama commented Sep 4, 2024

Hi @ChenyuLInx
Thank you for the review. If I have another request for a review in the future, I would appreciate your assistance again.

Labels
cla:yes community This PR is from a community member
3 participants