Automatically cache zh and h1 hash values for all downloaded providers #27811
Comments
A little update, because I've just seen my weekly firewall report. Running |
@apparentlymart @jbardin sorry for tagging you directly, but this issue does not seem to be getting any traction, while the problem is still there, as evidenced by the large number of linked issues. The lack of checksum caching has grown into a painful problem: I look after an internal repository of 200+ Terraform modules, and any mass provider upgrade becomes a challenge. I work from home on typical residential broadband, and when locking 200+ modules requires downloading something like 25 GB and waiting for a few hours, it's an indication that something is not right (with proper caching it would have required only a few minutes). I would really appreciate it even if you just left a comment on the proposal in this PR.
This problem doesn't require using multiple platforms. Even if you use multiple hosts with the same platform (for example, all Linux), if one person has the provider cached, running init will only generate the zh hash, but if someone else then runs init and doesn't have that version of the provider cached, it will add all the h1 hashes as well.
Current Terraform Version
Use-cases
The dependency locking mechanism quickly turned from a highly anticipated feature into a major pain point after we upgraded to Terraform 0.14. This is due to the combination of provider caching and running Terraform on multiple platforms (see attempted solutions and a list of related issues below).
Ideally we would want Terraform to automatically fetch all available hashes for each provider and add them to the lock file. But according to the explanation here it is currently not supported by the registry API and is unlikely to be fixed in 0.14.
So we would like to mitigate at least the performance impact of dependency locking while we don't have a more comprehensive solution.
Attempted Solutions
After each provider version upgrade we need to run `terraform init -upgrade` once and then `terraform providers lock` twice in each affected module. For cached providers, `terraform init` only populates a single `h1` hash, which is calculated from the cached provider's package. Then the first run of `terraform providers lock` without parameters populates all `zh` hashes, and finally the second run of e.g. `terraform providers lock -platform=windows_amd64 -platform=darwin_amd64` adds `h1` hashes for the additional platforms.
The main problem with the `terraform providers lock` command is that it never caches its results anywhere, and it never cooperates with the provider cache to check whether the provider has already been downloaded. Instead, it always fetches the file with SHA256 signatures (from which the `zh` hashes originate). Then it always downloads the provider package for a given platform and calculates the `h1` hash for it. Finally, it discards all downloaded files and never persists the hashes anywhere apart from the `.terraform.lock.hcl` file. As a result, when locking providers for 3 different platforms in each of 10 modules with 3 providers per module, `terraform providers lock` will download and throw away provider binaries 3×10×3=90 times. Given the typical size of a provider package, that can add up to several GB of needlessly fetched data, and the operation takes a long time over an average internet connection.
Unfortunately, since the new lock file feature cannot be turned off, we are facing a dilemma: either reduce the overhead somehow, or permanently ban the lock files through `.gitignore`, so that they are never shared.
Proposal
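To put the numbers from the example above in context, here is a minimal Python sketch that counts simulated package downloads with and without a shared hash cache. It is only an illustration and does not reflect how Terraform works internally: the module and provider names are placeholders, `linux_amd64` as the third platform is an assumption, and it assumes every module pins the same provider versions, so a hash computed once for a given provider and platform can be reused.

```python
from itertools import product

# Hypothetical inputs mirroring the example in the text:
# 10 modules, each using the same 3 providers, locked for 3 platforms.
MODULES = [f"module-{i}" for i in range(10)]
PROVIDERS = ["provider-a", "provider-b", "provider-c"]  # placeholder names
PLATFORMS = ["linux_amd64", "windows_amd64", "darwin_amd64"]  # linux_amd64 is assumed


def count_downloads(use_cache: bool) -> int:
    """Count simulated package downloads needed to lock every module."""
    # (provider, platform) -> hash; stands in for a hypothetical on-disk cache.
    hash_cache = {}
    downloads = 0
    for _module, provider, platform in product(MODULES, PROVIDERS, PLATFORMS):
        key = (provider, platform)
        if use_cache and key in hash_cache:
            continue  # hash already known, nothing to fetch
        downloads += 1  # simulate fetching the package and hashing it
        hash_cache[key] = f"h1:<hash of {provider} for {platform}>"
    return downloads


print(count_downloads(use_cache=False))  # 90
print(count_downloads(use_cache=True))   # 9
```

Under these assumptions, the uncached run matches the 90 downloads mentioned above, while a cache keyed by provider and platform needs only one download per unique package (3 providers × 3 platforms = 9).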
Since signatures and hashes, like provider packages themselves, are considered immutable, there does not seem to be any good reason why the signatures couldn't also be cached in some file inside `~/.terraform.d`. Both `terraform init` and `terraform providers lock` should cooperate with the provider cache instead of fighting or ignoring it. Every time `init` or `lock` runs, it should check the cached signatures first. If it needs to download (and store in a local cache) a new version of a provider (or a package for a different platform), it also needs to update the signature cache with whatever it has just downloaded and/or calculated. Even if in the future the registry protocol allows querying for `h1` hashes without downloading the provider package, storing those responses in the cache will still make sense for performance reasons.
For backward compatibility, this feature may need to be disabled by default and enabled via a new configuration key in `~/.terraformrc`. But enabling the provider cache should always imply enabling the signature cache, because otherwise the entire integration with the cache is broken.
References