Multi-architecture Docker images and parallel builds #3430

Merged: 11 commits into pybamm-team:develop from docker-images-fix on Oct 20, 2023

agriyakhetarpal
Member

@agriyakhetarpal agriyakhetarpal commented Oct 9, 2023

Description

This PR uses QEMU to build images for aarch64/arm64 platforms in addition to amd64. It also fixes the error reported in #3316 (comment), where one of the images received an improper tag, by using a build-matrix setup to create the images. Related to #3312.

Type of change

Please add a line in the relevant section of CHANGELOG.md to document the change (include PR #) - note reverse order of PR #s. If necessary, also add to the list of breaking changes.

  • New feature (non-breaking change which adds functionality)
  • Optimization (back-end change that speeds up the code)
  • Bug fix (non-breaking change which fixes an issue)

Key checklist:

  • No style issues: $ pre-commit run (or $ nox -s pre-commit) (see CONTRIBUTING.md for how to set this up to run automatically when committing locally, in just two lines of code)
  • All tests pass: $ python run-tests.py --all (or $ nox -s tests)
  • The documentation builds: $ python run-tests.py --doctest (or $ nox -s doctests)

You can run integration tests, unit tests, and doctests together at once, using $ python run-tests.py --quick (or $ nox -s quick).

Further checks:

  • Code is commented, particularly in hard-to-understand areas
  • Tests added that prove fix is effective or that feature works

@agriyakhetarpal
Member Author

agriyakhetarpal commented Oct 9, 2023

I opened this PR from @arjxn-py's fork because that repository has credentials for Docker Hub. Currently, the ALL and JAX image builds are going to fail (logs) because conda cannot find a suitable version of jax; we might need to bump the versions or relax the requirements (0.4.18 has aarch64 wheels and is importable). I will open a separate issue for this soon, because it looks like we might need to update them anyway for GPU support on ARM-based macOS machines and possibly on Windows; see the relevant discussion in #3371 and #3423.
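
For anyone wanting to confirm the importability point inside a built container, a minimal sketch follows. It assumes nothing about the images beyond a working Python interpreter; `check_import` is a hypothetical helper written for this illustration, not part of PyBaMM or the workflow.

```python
import importlib
import platform


def check_import(module_name):
    """Try to import a module; return (ok, detail) instead of raising."""
    try:
        module = importlib.import_module(module_name)
    except ImportError as exc:
        # Covers missing packages and wheels that fail to load on this machine.
        return False, str(exc)
    return True, getattr(module, "__version__", "unknown")


if __name__ == "__main__":
    # Report the machine type so the result can be tied to amd64 or arm64.
    print("machine:", platform.machine())
    for name in ("jax", "jaxlib"):
        ok, detail = check_import(name)
        print(name, "OK" if ok else "FAILED", detail)
```

Running this once in the amd64 image and once in the arm64 image would show whether the jax failure is specific to one architecture.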

@codecov

codecov bot commented Oct 10, 2023

Codecov Report

All modified lines are covered by tests ✅

Comparison is base (774d56d) 99.58% compared to head (3966a61) 99.58%.
Report is 40 commits behind head on develop.

Additional details and impacted files
@@           Coverage Diff            @@
##           develop    #3430   +/-   ##
========================================
  Coverage    99.58%   99.58%           
========================================
  Files          256      256           
  Lines        19998    20003    +5     
========================================
+ Hits         19915    19920    +5     
  Misses          83       83           

see 1 file with indirect coverage changes


@agriyakhetarpal
Member Author

I did some digging and found jax-ml/jax#13608 (comment), which sheds some light on this. One needs to build jaxlib from source to run it inside a Docker container on M-series macOS systems. That is overkill, because arm64 macOS wheels for jaxlib are widely available and PyBaMM can use the JAX solver without the build-time requirements, which are (or rather, were) the main cause of our tricky installation procedure. Should we exclude the JAX and ALL images from arm64 builds, and document this exclusion with a short note in the Docker installation guide?

@agriyakhetarpal agriyakhetarpal requested review from arjxn-py and Saransh-cpp and removed request for arjxn-py October 10, 2023 01:22
@agriyakhetarpal
Member Author

I realised that specifying the platforms in a matrix might overwrite images depending on which image and platform build gets completed (and thereby pushed) first. For now, adding conditional steps looks like an apt solution to control what images are built and pushed.
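
The overwrite concern can be illustrated with a toy model. This is only a sketch under stated assumptions: the registry is modelled as a plain dict, and the tag `pybamm:latest`, the digests, and both helper functions are hypothetical names invented for this example rather than anything taken from the workflow.

```python
# Toy model of a registry: each tag maps to {platform: image digest}.
# A real registry stores manifests; the dict only mimics "a tag points to one thing".


def push_single_platform(registry, tag, platform, digest):
    """A single-platform push makes the tag point at that one image only."""
    registry[tag] = {platform: digest}


def push_multi_platform(registry, tag, images):
    """A multi-platform push publishes one entry covering every platform."""
    registry[tag] = dict(images)


# Case 1: two matrix jobs push the same tag separately; the later push wins.
registry = {}
push_single_platform(registry, "pybamm:latest", "linux/amd64", "sha-amd64")
push_single_platform(registry, "pybamm:latest", "linux/arm64", "sha-arm64")
overwritten = registry["pybamm:latest"]

# Case 2: one job builds both platforms and pushes them together.
registry = {}
push_multi_platform(
    registry,
    "pybamm:latest",
    {"linux/amd64": "sha-amd64", "linux/arm64": "sha-arm64"},
)
combined = registry["pybamm:latest"]

print(sorted(overwritten))  # only the platform that was pushed last
print(sorted(combined))     # both platforms
```

In the first case the surviving platform depends purely on job ordering, which is why platforms are better kept out of the matrix dimension that shares a tag.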

@arjxn-py
Member

> I realised that specifying the platforms in a matrix might overwrite images depending on which image and platform build gets completed

Are these images compatible with other platforms as well?

@agriyakhetarpal
Member Author

agriyakhetarpal commented Oct 10, 2023

> Are these images compatible with other platforms as well?

Yes, pulling the image now works natively because Docker auto-detects my platform, so I don't have to specify --platform linux/amd64, which made things slower by running under QEMU.
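
As a rough illustration of what platform auto-detection means for a multi-architecture image, the sketch below picks the entry that matches the host from a mapping of platforms to digests. It is a simplified model, not Docker's actual resolution logic; `MACHINE_TO_DOCKER_ARCH`, `docker_platform`, `select_image`, and the digests are all hypothetical.

```python
import platform

# Machine names as reported by platform.machine(), mapped to Docker architecture names.
MACHINE_TO_DOCKER_ARCH = {
    "x86_64": "amd64",
    "AMD64": "amd64",
    "aarch64": "arm64",
    "arm64": "arm64",
}


def docker_platform(machine=None):
    """Return a Docker-style platform string such as 'linux/arm64'."""
    machine = machine or platform.machine()
    arch = MACHINE_TO_DOCKER_ARCH.get(machine)
    if arch is None:
        raise ValueError(f"unrecognised machine type: {machine!r}")
    return f"linux/{arch}"


def select_image(manifest_list, machine=None):
    """Return the digest built for the host platform, or None if there is none."""
    return manifest_list.get(docker_platform(machine))


# An image published for both platforms resolves natively on either host.
multi_arch = {"linux/amd64": "sha-amd64", "linux/arm64": "sha-arm64"}
print(select_image(multi_arch, "arm64"))

# An amd64-only image has no native entry for an arm64 host.
amd64_only = {"linux/amd64": "sha-amd64"}
print(select_image(amd64_only, "arm64"))
```

When no native entry exists, the only option is to request another platform explicitly and run it under emulation, which is the slower path.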

P.S. I actually forgot to un-set push: true (sorry!), so the latest images built by the workflow on this branch of your fork ended up being pushed to Docker Hub, because your repository also had the login credentials set up. They have both arm64 and amd64 builds. The error with jax not being importable is also fixed if I use the latest version of jax, i.e., 0.4.18. I will open an issue about that soon, as mentioned above.

Member

@Saransh-cpp Saransh-cpp left a comment

Looks amazing, thanks, @agriyakhetarpal! I'll request @arjxn-py's review before merging this.

.github/workflows/docker.yml (outdated review thread, resolved)
Member

@arjxn-py arjxn-py left a comment

Thanks @agriyakhetarpal, looks awesome.
But when I look at the images on Docker Hub, it looks like the latest image is getting updated instead of the all image.

@agriyakhetarpal
Member Author

> Thanks @agriyakhetarpal, looks awesome.
> But when I look at the images on Docker Hub, it looks like the latest image is getting updated instead of the all image.

Yes, this workflow when merged will update the images and fix that issue (already identified in #3316 (comment))

@arjxn-py
Member

> Yes, this workflow when merged will update the images and fix that issue

Best then, happy to go forward with this 🚀

@agriyakhetarpal
Member Author

Thanks for the review @arjxn-py; the workflow won't run in forks now, but could you un-set the secrets in your repository settings just in case? It would be good for security reasons, say, in case a password breach occurs. No repository other than this one should have them, to minimise the chances of unauthorised access.

@arjxn-py
Member

arjxn-py commented Oct 18, 2023

> could you un-set the secrets in your repository settings just in case?

Sure I'll do that.
Edit: Done.

@Saransh-cpp Saransh-cpp merged commit 2c75354 into pybamm-team:develop Oct 20, 2023
33 checks passed
@agriyakhetarpal agriyakhetarpal deleted the docker-images-fix branch October 21, 2023 01:34
3 participants