-
Notifications
You must be signed in to change notification settings - Fork 94
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add support for multi-arch docker images #399
Conversation
The version of mambaforge appropriate for the architecture that the docker build is running on is built this way. Ref pangeo-data#396
conda-lock also needs to know though. |
/condalock |
.github/workflows/CondaLock.yml
Outdated
- name: Specify which architectures to build lockfiles for | ||
run: | | ||
# space delimited list of architectures to send to conda-lock | ||
echo "BUILD_ARCHS='64 aarch64 ppc64le'" > ${GITHUB_ENV} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i think it'd be best to just start with one additional? aarch64 and add others down the line if need be
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@scottyhq yeah i agree I mostly added both to test what packages are needed to be fixed for ppc64le
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's build just for linux-aarch64
first as mentioned at #396 (comment)
.github/workflows/CondaLock.yml
Outdated
conda-lock lock --mamba -k explicit -f environment.yml -p linux-64 | ||
../generate-packages-list.py conda-linux-64.lock > packages.txt | ||
for ARCH in ${BUILD_ARCHS}; do | ||
conda-lock lock --mamba -k explicit -f environment.yml -p linux-${ARCH} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this might be a good time to switch to a unified lockfile and actually add conda-lock to the base
environment. (Our current usage of conda-lock pre-dates the unified lockfile format now shown here https://github.com/conda-incubator/conda-lock#basic-usage). Installing the environment with conda-lock would actually recognize the host platform and choose the correct set of packages:
conda-lock lock -f environment.yml -p linux-64 aarch64
conda-lock install -n pangeo-notebook
the drawback is that if someone just has conda or mamba installed they dont recognize the new unified lockfile format... so you could not do conda env create -f conda-lock.yml
...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@scottyhq ok that's pretty cool and we clearly should just move to that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thinking that we might need to update from conda-lock=1.4 to conda-lock=2.x first at
- conda-lock=1.4 |
Also FYI, mamba
doesn't support installing from the unified conda-lock.yml
, but micromamba
does according to https://mamba.readthedocs.io/en/latest/user_guide/micromamba.html#conda-lock-yaml-spec-files. See also mamba-org/mamba#1884 (comment)
Makefile
Outdated
conda-lock lock --mamba -k explicit -f environment.yml -p linux-aarch64; \ | ||
../generate-packages-list.py conda-linux-aarch64.lock > packages-aarch64.txt; \ | ||
conda-lock lock --mamba -k explicit -f environment.yml -p linux-ppc64le; \ | ||
../generate-packages-list.py conda-linux-ppc64le.lock > packages-ppc64le.txt; \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
see comment above on unified lockfile. but i still really like a simple text file as well though! maybe we create a artifacts
subdirectory to organize files not actually used by docker build?
Makefile
Outdated
conda-lock lock --mamba -k explicit -f environment.yml -f ../pangeo-notebook/environment.yml -f ../base-notebook/environment.yml -p linux-aarch64; \ | ||
../generate-packages-list.py conda-linux-aarch64.lock > packages-aarch64.txt; \ | ||
conda-lock lock --mamba -k explicit -f environment.yml -f ../pangeo-notebook/environment.yml -f ../base-notebook/environment.yml -p linux-ppc64le; \ | ||
../generate-packages-list.py conda-linux-ppc64le.lock > packages-ppc64le.txt; \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TBH I have rarely used this Makefile and am always just building things via GitHub Actions ;) I've put in a little more time to embracing docker buildx
and think it might be the right tool to evolve this repo. in particular the bake
tool for several related images https://github.com/scottyhq/pangeo-buildx#usage
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@scottyhq I completely agree - I love that repo and think that's what this should be :D What do you think needs to happen to complete that transition?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Awesome! Mainly just more time :) I abandoned it for a bit with the "dont fix what isn't broken" rationale :) but it's also nice to keep evolving so I'll try to open a PR here this week and would love your review
@yuvipanda @weiji14 I'm thinking we should go ahead and merge this. Likely there will be some things to work out, but linux and mac ARM support will be a nice addition |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@yuvipanda @weiji14 I'm thinking we should go ahead and merge this. Likely there will be some things to work out, but linux and mac ARM support will be a nice addition
Yep, agree with going ahead with this. Let me just test running conda-lock
locally on some of these images to check that there are no serious issues. Also a few minor suggestions below 👇
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, tested things locally, and locking on base-notebook
and pangeo-notebook
worked, but ml-notebook
and pytorch-notebook
didn't. Maybe we just do base-/pangeo-
for now in this PR, and I can do a follow-up one for ml-/pytorch-
?
.github/workflows/CondaLock.yml
Outdated
cd ml-notebook | ||
conda-lock lock -f environment.yml -f ../pangeo-notebook/environment.yml -f ../base-notebook/environment.yml -p linux-64 -p linux-aarch64 -p osx-64 -p osx-arm64 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tensorflow doesn't have a linux-aarch64 package yet - conda-forge/tensorflow-feedstock#136
Could not lock the environment for platform linux-aarch64
Could not solve for environment specs
The following package could not be installed
└─ tensorflow >=2.15.0 cuda120* does not exist (perhaps a typo or a missing channel).
There are osx-64/osx-arm64 packages on conda-forge, though only CPU builds. Not sure if the osx-arm64 build actually uses the Apple Sillicon GPU.
.github/workflows/CondaLock.yml
Outdated
cd pytorch-notebook | ||
conda-lock lock -f environment.yml -f ../pangeo-notebook/environment.yml -f ../base-notebook/environment.yml -p linux-64 -p linux-aarch64 -p osx-64 -p osx-arm64 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pytorch does have linux-aarch64/osx-64/osx-arm64 packages on conda-forge, but we set a cuda120
pin in the environment.yml file so locking fails:
Could not lock the environment for platform linux-aarch64
Could not solve for environment specs
The following packages are incompatible
├─ pytorch >=2.1.0 cuda120* is not installable because it requires
│ └─ __cuda, which is missing on the system;
└─ torchvision >=0.16.1 cuda120* does not exist (perhaps a typo or a missing channel).
Co-authored-by: Wei Ji <[email protected]>
Co-authored-by: Wei Ji <[email protected]>
- name: Set up QEMU | ||
uses: docker/setup-qemu-action@v3 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, will need to use virtualization, until someone can pay us to use native arm64 runners (https://github.blog/changelog/2024-06-24-github-actions-ubuntu-24-04-image-now-available-for-arm64-runners) 🙂
Added some commits to update the Makefile and Actions to actually build for multiple platforms, and it seems to be working! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pre-approving since this is mostly ready, but have a look at the suggestions first. We'll probably need to disable osx-64
for ml-notebook
and pytorch-notebook
because of the (Linux-only) CUDA pin set in the environment.yml file.
- name: Set up QEMU | ||
uses: docker/setup-qemu-action@v3 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, will need to use virtualization, until someone can pay us to use native arm64 runners (https://github.blog/changelog/2024-06-24-github-actions-ubuntu-24-04-image-now-available-for-arm64-runners) 🙂
Co-authored-by: Wei Ji <[email protected]>
base-notebook
andpangeo-notebook
for:ppc64le (POWER8/9)Ref #396