Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SNOW-1869388 add memoization to to_selectable #2815

Merged

Conversation

sfc-gh-aalam
Copy link
Contributor

@sfc-gh-aalam sfc-gh-aalam commented Dec 29, 2024

  1. Which Jira issue is this PR addressing? Make sure that there is an accompanying issue to your PR.

    Fixes SNOW-1869388

  2. Fill out the following pre-review checklist:

    • I am adding a new automated test(s) to verify correctness of my new code
      • If this test skips Local Testing mode, I'm requesting review from @snowflakedb/local-testing
    • I am adding new logging messages
    • I am adding a new telemetry message
    • I am adding new credentials
    • I am adding a new dependency
    • If this is a new feature/behavior, I'm adding the Local Testing parity changes.
    • I acknowledge that I have ensured my changes to be thread-safe. Follow the link for more information: Thread-safe Developer Guidelines
  3. Please describe how your code solves the related issue.

    The process of adding WithQueryBlocks during repeated subquery elimination optimization takes place as follows:

  • Identify repeated sub-query blocks
  • Add a WithQueryBlock - a snowflake LogicalPlan node
  • Create a SnowflakePlan out of the WithQueryBlock.

When parents for the WithQueryBlocks are updated, if the direct parent is a Selectable, we convert the SnowflakePlan into a Selectable using to_selectable. In case the WithQueryBlock has multiple parents, which by construction will always be true, a new Selectable object is created for each parent. See example below where new selectable nodes are highlighted in gray:

image

When large query block is applied on this plan, and the WithQueryBlock is chosen at eligible, since all these blocks have same statistics, they will all be eligible together and possible be materialized we create redundant materializations. The correct construction for such a case should look like the plan shown below:

image

@sfc-gh-aalam sfc-gh-aalam added the NO-CHANGELOG-UPDATES This pull request does not need to update CHANGELOG.md label Dec 29, 2024
@sfc-gh-aalam sfc-gh-aalam marked this pull request as ready for review January 2, 2025 19:13
@sfc-gh-aalam sfc-gh-aalam requested review from a team as code owners January 2, 2025 19:13
@@ -79,6 +79,7 @@ def mock_session(mock_analyzer) -> Session:
fake_session = mock.create_autospec(Session)
fake_session._cte_optimization_enabled = False
fake_session._analyzer = mock_analyzer
fake_session._lock = mock.MagicMock()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we have a more descriptive name to this lock -- similar to _plan_lock below?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this lock is actually part of Session object and used as a general purpose lock for serialization:

self._lock = create_rlock(self._conn._thread_safe_session_enabled)

Copy link
Contributor

@sfc-gh-helmeleegy sfc-gh-helmeleegy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, thanks!

selectable._is_valid_for_replacement = (
snowflake_plan._is_valid_for_replacement
)
self._to_selectable_memo_dict[snowflake_plan._uuid] = selectable
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we probably should remember the original logical plan id instead of the resolved plan id, because the different snowflake plan might be created for the same orignal plan node

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yea it's true

return self._to_selectable_memo_dict[plan._uuid]
snowflake_plan = self.resolve(plan)
selectable = SelectSnowflakePlan(snowflake_plan, analyzer=self)
selectable._is_valid_for_replacement = (
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in fact, it should always be true in query generator instead of propogating from the snowlfake plan

selectable = SelectSnowflakePlan(snowflake_plan, analyzer=self)
selectable._is_valid_for_replacement = snowflake_plan._is_valid_for_replacement
return selectable
with self.session._lock:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why do we need a lock here, but not at other place in query_generator?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't need it. query generator is not shared between threads so we should be good. That's why it is not added in other places.

@sfc-gh-aalam sfc-gh-aalam merged commit 10c612e into main Jan 3, 2025
40 checks passed
@sfc-gh-aalam sfc-gh-aalam deleted the aalam-SNOW-1869388-add-memoization-to-to_selectable branch January 3, 2025 18:29
@github-actions github-actions bot locked and limited conversation to collaborators Jan 3, 2025
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
NO-CHANGELOG-UPDATES This pull request does not need to update CHANGELOG.md
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants