Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BUG] Exclude password-like fields for considering reparse #9844

Merged
merged 8 commits into from
Apr 4, 2024
Merged
Show file tree
Hide file tree
Changes from 4 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
6 changes: 6 additions & 0 deletions .changes/unreleased/Fixes-20240402-135556.yaml
Original file line number Diff line number Diff line change
@@ -0,0 +1,6 @@
kind: Fixes
body: Exclude password-like fields for considering reparse
time: 2024-04-02T13:55:56.169953-07:00
custom:
Author: ChenyuLInx
Issue: "9795"
13 changes: 9 additions & 4 deletions core/dbt/parser/manifest.py
Original file line number Diff line number Diff line change
Expand Up @@ -988,10 +988,15 @@ def build_manifest_state_check(self):
env_var_str += f"{key}:{config.profile_env_vars[key]}|"
profile_env_vars_hash = FileHash.from_contents(env_var_str)

# Create a FileHash of the profile file
profile_path = os.path.join(get_flags().PROFILES_DIR, "profiles.yml")
with open(profile_path) as fp:
profile_hash = FileHash.from_contents(fp.read())
# Create a hash of the connection keys, which user has access to in
# jinja context. Thus keys here may affect the parsing result.

# Renaming this variable mean that we will have to do a whole lot more
# change to make sure the previous manifest can be loaded correctly.
# This is an example of naming should be chosen based on the functionality
# rather than the implementation details.
connection_keys = config.credentials._connection_keys()
Copy link
Contributor

@MichelleArk MichelleArk Apr 2, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like credentials._connection_keys() is just the set of names of credential keys, e.g:

('account', 'user', 'database', 'warehouse', 'role', 'schema', 'authenticator', 'oauth_client_id', 'query_tag', 'client_session_keep_alive', 'host', 'port', 'proxy_host', 'proxy_port', 'protocol', 'connect_retries', 'connect_timeout', 'retry_on_database_errors', 'retry_all', 'insecure_mode', 'reuse_connections')

If we hash just the key names without the values, then partial parsing won't retrigger even if one of the values changes, as it should. We should use list(config.credentials.credential_info()) instead. with_aliases=False looks appropriate here as well.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch! This also mean I probably should add an integration test to capture the behavior

Copy link
Contributor

@QMalcolm QMalcolm Apr 2, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You beat me to my suggestion on tests 😂

profile_hash = FileHash.from_contents(pprint.pformat(connection_keys))

# Create a FileHashes for dbt_project for all dependencies
project_hashes = {}
Expand Down
23 changes: 23 additions & 0 deletions tests/unit/test_parse_manifest.py
Original file line number Diff line number Diff line change
Expand Up @@ -106,6 +106,18 @@ def _new_file(self, searched, name, match):


class TestPartialParse(unittest.TestCase):
def setUp(self) -> None:
mock_project = MagicMock(RuntimeConfig)
mock_project.cli_vars = ""
mock_project.args = MagicMock()
mock_project.args.profile = "test"
mock_project.args.target = "test"
mock_project.project_env_vars = {}
mock_project.profile_env_vars = {}
mock_project.project_target_path = "mock_target_path"
mock_project.credentials = MagicMock()
self.mock_project = mock_project

@patch("dbt.parser.manifest.ManifestLoader.build_manifest_state_check")
@patch("dbt.parser.manifest.os.path.exists")
@patch("dbt.parser.manifest.open")
Expand All @@ -122,6 +134,17 @@ def test_partial_parse_file_path(self, patched_open, patched_os_exist, patched_s
# if specified in flags, we use the specified path
patched_open.assert_called_with("specified_partial_parse_path", "rb")

def test_partial_parse_profile_change(self):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The name of this test seems to imply that we're asserting that something with partial parsing did or didn't happen. However looking at the test it seems to only assert that the file hash has changed, which is not indicative itself of whether a full or partial parse happened.

# This test validate that the profile_hash is updated when the connection keys change
profile_hash = "6e94a0aef218fd7aef18b257f0ba9fc33c92a2bc9788fc751868e43ab398137f"
self.mock_project.credentials._connection_keys.return_value = "test"
set_from_args(Namespace(), {})
manifest = ManifestLoader(self.mock_project, {})
assert manifest.manifest.state_check.profile_hash.checksum == profile_hash
self.mock_project.credentials._connection_keys.return_value = "test1"
manifest = ManifestLoader(self.mock_project, {})
assert manifest.manifest.state_check.profile_hash.checksum != profile_hash


class TestFailedPartialParse(unittest.TestCase):
@patch("dbt.tracking.track_partial_parser")
Expand Down