docker: Enable cache for Docker builds #3821

Draft · wants to merge 2 commits into base: main
Conversation

@echoix (Member) commented Jun 14, 2024

Currently, none of the Docker images use caching. Two of the four Docker builds compile PDAL from source, a step that takes 8.3 minutes each, so roughly 2 × 8.3 minutes could be saved on that step alone.

However, we would usually want a clean build, built from scratch, for a release, so that if a caching issue were ever present, it wouldn't affect the released image. But since the Docker builds triggered for releasebranch_8_* are published for each commit (backport) (i.e. docker pull osgeo/grass-gis:releasebranch_8_4-ubuntu_wxgui), as are commits to main, we're effectively already creating "release"-level Docker builds every time. A question to weigh is whether caching could be used for these or not.
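As a rough sketch (not part of this PR's actual diff), enabling the GitHub Actions cache backend in build-push-action would look something like the following; the step name, Dockerfile path, and tag are placeholders:

```yaml
# Hypothetical excerpt of a Docker build job; file path and tag are placeholders.
- name: Build and push
  uses: docker/build-push-action@v5
  with:
    context: .
    file: docker/ubuntu_wxgui/Dockerfile
    push: true
    tags: osgeo/grass-gis:releasebranch_8_4-ubuntu_wxgui
    cache-from: type=gha        # read layers from the GitHub Actions cache
    cache-to: type=gha,mode=max # write all intermediate layers back to it
```

mode=max caches every intermediate layer (not just the final stage), which matters here because the expensive PDAL compilation happens in an intermediate stage.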

Another question I have is whether the RUN instruction installing packages (like apk add on Alpine or apt-get install on Debian-based distros) should have its cache invalidated at some point or not. Otherwise, until a layer above the install step invalidates the cache (a changed base image digest or PDAL version, or a change to the text of the install instruction itself), that layer might never be invalidated: Docker does not check whether the contents a RUN instruction would produce have changed (only COPY and ADD do that: https://docs.docker.com/build/cache/invalidation/#general-rules). Alternatively, as the third point of https://docs.docker.com/build/cache/invalidation/#run-instructions suggests, we might need a dedicated build stage just for installing the dependencies, in order to use the --no-cache-filter option. But then the question is how we decide when to run that step without cache. This becomes equivalent to just having periodic no-cache builds to pick up new packages.
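A minimal sketch of such a dependency stage, assuming a Debian-based image (the stage name and package list are placeholders):

```dockerfile
# Hypothetical stage isolating package installation so it can be targeted
# by --no-cache-filter; stage and package names are placeholders.
FROM ubuntu:22.04 AS deps
RUN apt-get update \
    && apt-get install -y --no-install-recommends build-essential cmake \
    && rm -rf /var/lib/apt/lists/*

# Later stages reuse the cached deps layer unless it is explicitly invalidated.
FROM deps AS build
# ... compile PDAL, GRASS GIS, etc.
```

A rebuild of just that stage could then be forced with `docker buildx build --no-cache-filter=deps .`; if I read the docs correctly, build-push-action exposes the same option through its no-cache-filters input.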

To implement the no-cache builds easily, but only sometimes, the Docker workflows should be able to run on workflow_dispatch (manual runs) or scheduled runs, where the no-cache option (available in build-push-action) could be set. Allowing runs on workflow_dispatch would also make it possible to launch manual runs for a specific ref (branch), as long as it's in the same repo (meaning unmerged PRs could only be launched in the fork, but at least they could run). But this is for another PR, if ever needed.
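For reference, a hedged sketch of what those triggers might look like; the input name and cron schedule are made up:

```yaml
# Hypothetical workflow triggers; input name and schedule are placeholders.
on:
  schedule:
    - cron: "0 4 * * 1"  # e.g. a weekly forced rebuild
  workflow_dispatch:
    inputs:
      no_cache:
        description: "Build without cache"
        type: boolean
        default: false

# ...then, in the build step:
#   with:
#     no-cache: ${{ github.event_name == 'schedule' || inputs.no_cache }}
```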

The reason for choosing the GitHub Actions cache over the inline, local, or registry cache backends (https://docs.docker.com/build/cache/backends/#backends) is that at least with this one, we can manually delete caches when wanted. My second choice would be the registry type with mode=max, but without the chance to intervene when a caching issue occurs.
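For comparison, the registry-backend fallback would look roughly like this (the cache ref is a placeholder, not an existing image):

```yaml
# Hypothetical registry cache configuration; the cache ref is a placeholder.
cache-from: type=registry,ref=osgeo/grass-gis:buildcache
cache-to: type=registry,ref=osgeo/grass-gis:buildcache,mode=max
```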

It is currently a draft, as I also still need to validate the size of the cache that would be uploaded, to make sure we don't exceed the 10 GB limit and cause cache thrashing.
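The per-repository cache usage can be inspected (and entries deleted) with the gh CLI, assuming a gh version where the cache commands are built in (≥ 2.32):

```
# List Actions caches with their sizes (repo is a placeholder).
gh cache list --repo OSGeo/grass --sort size_in_bytes --limit 50

# Delete a specific cache entry by key if it misbehaves.
gh cache delete <cache-key> --repo OSGeo/grass
```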

@echoix echoix marked this pull request as draft June 14, 2024 19:59
@echoix (Member, Author) commented Jun 14, 2024

@mmacata I'd like your thoughts on this, especially given your experience with multiple local builds using the same existing cache for the install step (layer). Does it invalidate often enough? I'm leaving this as a draft, but it would be ready anytime if the discussion here doesn't call for more changes.

@github-actions github-actions bot added the CI Continuous integration label Jun 14, 2024
@landam landam added this to the 8.5.0 milestone Jun 16, 2024