[CT-2271] [Feature] Compute seed file hashes incrementally #7124
Comments
Thanks for opening this issue and the associated PR, @noppaz! Excited to see you and @acurtis-evi collaborate and combine the complementary pieces.
Related to #6875
Current behavior
From the caveats to state comparison for seeds:
Proposed behavior
Lightly edited from here:
Functional approval
@jtcohen6 indicated the following in the discussion in Slack:
Acceptance criteria
PR #7125 aims to solve two issues, so the development pane to the right doesn't link to the PR at the moment.
I've added an issue for documentation updates here: dbt-labs/docs.getdbt.com#2958
Listing the
Here's another:
I made the update here, so you should see both linked now.
Is this your first time submitting a feature request?
Describe the feature
This relates to a conversation on Slack which also resulted in issue #7117.
If we allow dbt to hash larger seed files, I think these hashes should be computed incrementally to avoid allocating too much memory. This will matter most in CI, where memory is more limited.
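To illustrate what "incrementally" could mean here, below is a minimal sketch that feeds a file to hashlib in fixed-size chunks rather than reading it whole. The function name sha256_of_file and the 1 MiB chunk size are hypothetical choices for illustration, not part of dbt.

```python
import hashlib


def sha256_of_file(path: str, chunk_size: int = 1024 * 1024) -> str:
    """Return the hex SHA-256 digest of a file, reading it chunk by chunk.

    Hypothetical helper for illustration only; not dbt's actual API.
    """
    hasher = hashlib.sha256()
    with open(path, "rb") as handle:
        # iter() with a sentinel keeps calling read() until it returns b"" at EOF,
        # so at most chunk_size bytes of file data are held in memory at a time.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest()
```

Peak memory then depends on the chunk size rather than the file size. One caveat with this sketch: it hashes raw bytes, so the digest may differ from one computed over decoded text if the file is not read the same way in both cases.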
My idea is to add another classmethod to the FileHash dataclass which has the same signature as from_contents. This method can then be used specifically for seed file hashes, and from_contents can continue to be used alongside it.
Describe alternatives you've considered
The alternative is to go with what we have, which could spike memory usage for those who override the default limit and have a lot of large seed files. Yes, this could be an anti-pattern, but there's no reason to limit the user like that here.
Who will this benefit?
I would argue that the default limit should be increased from 1 MB, but since the above PR makes it possible to override the limit with an environment variable, I would be fine with keeping the default at 1 MB for now.
Therefore, the projects that benefit most are the more advanced ones that override the 1 MB limit with the environment variable.
Are you interested in contributing this feature?
Yes, I will create the PR
Anything else?
No response