[SPIKE] Experiment with performance of catalog queries #9506

Open
1 task done
Tracked by #9425
dbeatty10 opened this issue Feb 1, 2024 · 3 comments
Labels
enhancement New feature or request

Comments

@dbeatty10
Contributor

Housekeeping

  • I am a maintainer of dbt-core

Short description

For dbt-bigquery and dbt-snowflake, experiment with different settings for relation_count:

  • how does it affect the performance of the catalog query over a range of values?
  • is there a point at which the query becomes too big to be accepted for execution?
    • can we determine the size of the submitted query so that it does not exceed the 1 MB query-size limits set by Snowflake and BigQuery?
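To reason about the 1 MB limit before running anything against a warehouse, one option is to measure the byte size of the generated SQL and split the selected relations into batches that stay under it. This is only an illustrative sketch: `build_catalog_query`, `batch_relations`, and the simplified `information_schema.columns` query are hypothetical stand-ins, not the SQL that dbt-snowflake or dbt-bigquery actually generate, and "1 MB" is assumed here to mean 1,000,000 bytes to stay on the conservative side.

```python
# Hypothetical sketch: NOT dbt's real catalog SQL. It only shows how query size
# grows with the number of relations filtered on, and how to batch under a limit.

# Assumption: treat "1 MB" conservatively as 1,000,000 bytes of UTF-8 query text.
MAX_QUERY_BYTES = 1_000_000

HEADER = (
    "select table_schema, table_name, column_name, data_type\n"
    "from information_schema.columns\n"
    "where "
)
SEPARATOR = "\n   or "


def predicate(schema: str, name: str) -> str:
    """One filter clause per selected relation (hypothetical shape)."""
    return f"(table_schema = '{schema}' and table_name = '{name}')"


def build_catalog_query(relations: list[tuple[str, str]]) -> str:
    return HEADER + SEPARATOR.join(predicate(s, n) for s, n in relations)


def batch_relations(
    relations: list[tuple[str, str]], max_bytes: int = MAX_QUERY_BYTES
) -> list[list[tuple[str, str]]]:
    """Greedily pack relations into batches whose query text fits in max_bytes.

    Sizes are tracked incrementally so each relation is measured only once.
    A relation whose clause alone exceeds max_bytes still gets its own
    (oversized) batch, since it cannot be split further.
    """
    header_size = len(HEADER.encode("utf-8"))
    sep_size = len(SEPARATOR.encode("utf-8"))
    batches: list[list[tuple[str, str]]] = []
    current: list[tuple[str, str]] = []
    size = header_size
    for schema, name in relations:
        pred_size = len(predicate(schema, name).encode("utf-8"))
        added = pred_size + (sep_size if current else 0)
        if current and size + added > max_bytes:
            # Close the current batch and start a new one with this relation.
            batches.append(current)
            current, size = [], header_size
            added = pred_size
        current.append((schema, name))
        size += added
    if current:
        batches.append(current)
    return batches
```

For a given selection, `len(batch_relations(relations))` is then the number of queries needed to stay under the assumed limit; swapping in the adapter's real SQL template would give realistic numbers.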

Acceptance criteria

We can see a graph of the time to run the catalog query (in seconds) on the y-axis vs. the number of selected nodes on the x-axis.

We'd generally expect it to look like one of the curves below (ideally the constant time blue one, but I'm guessing not 😉):

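[Image: candidate curves of catalog query runtime vs. number of selected nodes, including a constant-time (blue) curve]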
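The acceptance criterion could be driven by a small harness that times the catalog query at several node counts and estimates which kind of curve the results follow. In this sketch `run_query` is a hypothetical callable you would supply (for example, a wrapper that runs a catalog query over `n` selected nodes); the log-log slope is one simple way, among others, to tell a roughly constant curve from a linear or quadratic one.

```python
import math
import statistics
import time
from typing import Callable, Dict, Iterable


def time_catalog_query(
    run_query: Callable[[int], object],  # hypothetical: runs the catalog query for n nodes
    node_counts: Iterable[int],
    repeats: int = 3,
) -> Dict[int, float]:
    """Median wall-clock seconds of run_query(n) for each node count n."""
    results: Dict[int, float] = {}
    for n in node_counts:
        samples = []
        for _ in range(repeats):
            start = time.perf_counter()
            run_query(n)
            samples.append(time.perf_counter() - start)
        results[n] = statistics.median(samples)
    return results


def growth_exponent(results: Dict[int, float]) -> float:
    """Least-squares slope of log(seconds) vs. log(nodes).

    Roughly 0 suggests constant time, 1 linear, 2 quadratic.
    Needs at least two distinct node counts and strictly positive timings.
    """
    points = [(math.log(n), math.log(t)) for n, t in results.items()]
    mean_x = sum(x for x, _ in points) / len(points)
    mean_y = sum(y for _, y in points) / len(points)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in points)
    var = sum((x - mean_x) ** 2 for x, _ in points)
    return cov / var
```

Plotting the returned dictionary (node count vs. seconds) gives the requested graph; the exponent is a quick numeric summary to go alongside it.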

Impact to Adapters

Depending on the results of the experiment, we may choose to use different values for relation_count in dbt-bigquery and/or dbt-snowflake. Alternatively, we may choose to change our implementation in some way.
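To make the two options concrete, here is a hypothetical sketch of how a `relation_count` threshold might steer the catalog query: at or below the threshold, filter by the selected relations; above it, either fall back to whole schemas or (as one possible alternative implementation) split the relations into chunks of at most `relation_count`. `plan_catalog_queries` and `CatalogPlan` are invented names, the default of 100 is taken from the threshold mentioned under Context, and whether dbt compares with `<` or `<=` is an assumption.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple

# (database, schema, identifier): a hypothetical stand-in for dbt's relation objects
Relation = Tuple[str, str, str]


@dataclass
class CatalogPlan:
    strategy: str  # "by_relations" or "by_schemas"
    units: list    # relation chunks for "by_relations"; (database, schema) pairs for "by_schemas"


def plan_catalog_queries(
    relations: Sequence[Relation], relation_count: int = 100, chunk: bool = False
) -> CatalogPlan:
    """Choose how to fetch catalog metadata for the selected relations (sketch)."""
    if len(relations) <= relation_count:
        # Small selection: one query filtered to exactly these relations.
        return CatalogPlan("by_relations", [list(relations)])
    if chunk:
        # Alternative: several relation-filtered queries of bounded size.
        chunks = [
            list(relations[i : i + relation_count])
            for i in range(0, len(relations), relation_count)
        ]
        return CatalogPlan("by_relations", chunks)
    # Fallback: query every distinct schema that contains a selected relation.
    schemas = sorted({(db, schema) for db, schema, _ in relations})
    return CatalogPlan("by_schemas", schemas)
```

The experiment would then tell us which `relation_count` (and which strategy above the threshold) keeps runtime and query size acceptable on each adapter.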

Context

The work was initially performed in #8521 / #8648.

Then #9394 expressed the expectation that we'd get the benefits of #8648 even when more than 100 nodes are selected.

@ChenyuLInx
Contributor

@dbeatty10 why only bigquery and snowflake?

@ChenyuLInx
Contributor

Another dimension to consider: how many objects are in the schema.

@dbeatty10
Contributor Author

@ChenyuLInx yeah, it makes good sense to do both bigquery and snowflake, and also to consider the number of objects within the schema. 👍
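Combining both suggestions, the experiment becomes a grid over adapter, number of objects in the schema, and number of selected nodes. A minimal sketch of enumerating that grid; `Trial` and `experiment_grid` are hypothetical names, and any sizes passed in are placeholders rather than recommended values.

```python
from itertools import product
from typing import Iterable, List, NamedTuple


class Trial(NamedTuple):
    adapter: str
    objects_in_schema: int
    selected_nodes: int


def experiment_grid(
    adapters: Iterable[str],
    schema_sizes: Iterable[int],
    node_counts: Iterable[int],
) -> List[Trial]:
    """Cross adapters x schema sizes x selected-node counts, skipping impossible cells."""
    return [
        Trial(adapter, size, nodes)
        for adapter, size, nodes in product(adapters, schema_sizes, node_counts)
        if nodes <= size  # cannot select more relations than the schema holds
    ]
```

For example, `experiment_grid(["bigquery", "snowflake"], [100, 1_000], [10, 100, 500])` yields one trial per valid combination, each of which can be timed separately.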

@martynydbt martynydbt assigned aranke and unassigned aranke Apr 25, 2024
@graciegoheen graciegoheen removed this from the v1.9 milestone Nov 11, 2024