Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Docker image workflow fails on the master branch #191

Open
IshaanDesai opened this issue Feb 6, 2024 · 10 comments
Open

Docker image workflow fails on the master branch #191

IshaanDesai opened this issue Feb 6, 2024 · 10 comments

Comments

@IshaanDesai
Copy link
Member

IshaanDesai commented Feb 6, 2024

Due to changes in https://github.com/precice/python-bindings/blame/develop/.github/workflows/build-docker.yml#L36-L44 the workflow to update the Docker image is triggered on the master branch. This workflow fails on the master branch: https://github.com/precice/python-bindings/actions/runs/7799939617/job/21271746402 because it tries to build the Docker image using the precice/precice:latest image, which does not have user precice. Previously this did not happen because the workflow was only run for the develop branch.

The question now is, should we change something in the Docker recipe of the bindings or of preCICE to address the error, or shall we just build the Docker image of the bindings always on precice/precice:develop.

@MakisH
Copy link
Member

MakisH commented Feb 6, 2024

What the actual error is

Triggered from a push to the master branch, the build-and-release-docker-image workflow is attempting to install the bindings as a user (precice), but the user does not exist:

ERROR: failed to solve: process "/bin/bash -c python3 -m pip install --user --upgrade pip" did not complete successfully: unable to find user precice: no matching entries in passwd file

Why this is triggered now

The changes in #186 only enabled the workflow to use the correct preCICE Docker image tag when building on master and made a few more things configurable when running the workflow manually. They also enabled executing the workflow on master, as we (may) need such a Docker image to exist for some use cases of the system tests.

So, while this only appears now, it was always an issue that was not triggered. The changes of @valentin-seitz are correct, provided that both images are consistent (keep reading).

It was not triggered so far, because it would only be triggered on master when the workflow changes would be merged into master.

What triggers this error

Indeed, the problem here is the current inconsistency in the setup between the precice/precice:latest and the precice/precice:develop images on Dockerhub.

These originate from this GitHub Actions workflow of preCICE:

We do have plenty of images we eventually need to clean up. They were so far serving different purposes, but we need to synchronize them to cover more use cases. I summon @fsimonis for that.

How to proceed

  1. One solution is to add a precice user in the released Docker image of preCICE.
  2. Another solution is to not install in the user directory. Instead, use a virtual environment.
  3. Finally, we could just remove the on: push: master and "hide" the problem again. @valentin-seitz is there now actually any use case where we need the bindings to be built on master? Because we now don't fetch containers from other repositories anymore.

In the system tests Dockerfile (which is a multi-stage build), we manually create a user (named precice). This is mainly needed for the OpenFOAM cases and potentially any other solvers that assume that a (non-root) user exists.

@fsimonis
Copy link
Member

fsimonis commented Feb 6, 2024

The images serve different purposes:

  • precice/precice:develop is based on ci image, which comes with a bunch of tools and user precice, useful to test bindings without having to compile precice from scratch.
  • precice/precice:latest and version tags are based on a clean ubuntu and are for user-customisation.

Another solution is to not install in the user directory. Instead, use a virtual environment.

This is the way to go. pip on Ubuntu 24.04 doesn't allow installing via pip even when passing --user. venv is a requirement now.

@MakisH
Copy link
Member

MakisH commented Feb 6, 2024

* `precice/precice:latest` and version tags are based on a clean ubuntu and are for user-customisation.

Do we know of any users that use these, what they do with them, and whether it would actually make more sense to provide a user there?

The naming is definitely confusing. I would always expect two images with same names but different tags to only differ in (time-dependent) versions.

@fsimonis
Copy link
Member

fsimonis commented Feb 8, 2024

Do we know of any users that use these, what they do with them, and whether it would actually make more sense to provide a user there?

I am not aware of any users, it was a request at some point. By whom, I don't know.

Personally, I don't see a big value as we provide a debian package. A reproducible image specifies the version of the base image. So, the user can copy&paste the url of the matching deb image, or even checkin the deb package into the source tree.

The naming is definitely confusing. I would always expect two images with same names but different tags to only differ in (time-dependent) versions.

I agree with all points. What's the solution though?
A precice/precice:tag should really be some kind of release image to distribute.
The :develop tag was an idea to make bindings easier to test.

Maybe we need some kind of tooling or develop image which installs precice in an ubuntu ci image for all versions.

Some ideas:

  • precice/precice-dev
  • precice/precice-devel
  • precice/precice-tooling
  • precice/ci-precice
  • precice/base-develop

@MakisH
Copy link
Member

MakisH commented Feb 8, 2024

Maybe we need some kind of tooling or develop image which installs precice in an ubuntu ci image for all versions.

So, something like the current precice:develop, but better named and advertised among developers. I like the idea.

Some ideas:

  • precice-dev / precice-devel: It still somehow refers to develop, which could still be misunderstood as the development version of preCICE if one does not have full context.
  • precice-tooling: I would not understands this as "tooling".
  • ci-precice: To be confused with ci-images.
  • base-develop: I like this the most, as developers know what this is, but no user could think that this is something for them.

@BenjaminRodenberg
Copy link
Member

A remark from my side: As a first step I would suggest to change the following section of the code:

run: |
if [[ '${{ env.BINDINGS_REF }}' == 'master' ]]; then
echo "PRECICE_TAG=latest" >> "$GITHUB_ENV"
echo "TAG=latest" >> "$GITHUB_ENV"
echo "Building TAG: latest"
else
echo "PRECICE_TAG=${{ env.BINDINGS_REF }}" >> "$GITHUB_ENV"
echo "TAG=${{ env.BINDINGS_REF }}" >> "$GITHUB_ENV"
echo "Building TAG: ${{ env.BINDINGS_REF }}"
fi

We did a pre-release that succeeded and only when we merged into master we ran into a failure. This is too late. A pre-release exists exactly for the purpose of testing the packaging and distribution workflow and this is currently not possible. Before changing the way how things are organized and named I think we should make sure that the problem Ishaan desribes above can be reproduced and this part can be tested without doing an actual live release.

@BenjaminRodenberg
Copy link
Member

  • One solution is to add a precice user in the released Docker image of preCICE.

  • Another solution is to not install in the user directory. Instead, use a virtual environment.

For a long term solution we should go for both. One does not exclude the other.

The venv approach seems to be the way to go from the python perspective as @fsimonis mentions above. But this only cures the symptom.

Not having the user precice in the released Docker image but in the develop image is confusing and requires us to implement solutions for edge cases. Moreover, I think it is extremely risky, because if something breaks, it breaks during a release which is bad. If we need precice in develop, why not also make precice available everywhere? The alternative is to entirely remove the user precice from all images, but I cannot judge whether this is possible or not.

@fsimonis
Copy link
Member

fsimonis commented Feb 13, 2024

ci-precice: To be confused with ci-images.

The image we need is precice/ci-ubuntu-20XX + preCICE installation, so precice/ci-precice is not only spot on in terms of naming, but it also keeps the naming of ci-images consistent.

Symptom or not, there are two issues that need to be resolved:

  1. Provide consistent images for bindings/adapter CIs
  2. Not messing with the distro-managed python state and use a venv instead, which is enforced by the pip version shipped with Ubuntu 24.04 LTS. Full story in PEP 668. This is essentially another issues, so we can stop discussing this here.

I propose the following:

  1. We remove precice/precice:develop and use the image purely for releases.
  2. We add precice/XXX and create tags for releases, latest release, and nightly develop versions. These images are based on ci-ubuntu-2204 until we feel like upgrading and install preCICE on top, preferably preCICE Release with Debug log enabled.

For some reason, DockerHub only provides per-image statistics and no per-tag statistics, so we cannot differentiate between our CI pulls and people actually using the release images. At the time of writing, the image is at 15101 pulls total. Thus, separating release from developer images would allow us to track meaningful downloads of the release images.

Workflows exists and need to be moved around a bit.

So, we only need a name for this image precice/XXX. I'll create a poll in our dev channel.

@MakisH
Copy link
Member

MakisH commented Feb 13, 2024

Transferring some points from the discussion in the dev channel.

@IshaanDesai is wondering what is wrong with precice:develop. The problem is that precice:latest and precice:develop are completely unrelated images. Tags are meant to clarify different variants/versions of one image, not of two different things. It is like "Bob as a child, Bob as an adult, Bob as a student, Bob as a cook", but not "Bob Presley" and "Bob Travolta". This is the original cause of confusion, that the two images are unrelated, but they both somehow end up providing some kind of preCICE. So, the name needs to change.

@BenjaminRodenberg asks "Why can we not use the same dockerimage for the release and the CI?". I like this direction, and we essentially need to restructure the images so that the CI is based on the same base image as the release. I would imagine that the base image builds and installs the Debian package, based on some tag. @fsimonis What additional tool or other change wouldn't we be able to install on top of that?

@BenjaminRodenberg also suggested keeping the CI images under a separate namespace/organization, such as precice/precice and ci-precice/precice. I also like this direction (on top of the previous suggestion), as it would give a clean view of the user-facing images.

We could then have precice/precice:develop as a user-facing image that releases a develop build of preCICE, built in release mode, and with no additional bells and whistles.

@fsimonis raised the point that we currently don't even know if people are using the released images. To that, I think the main question is whether we want people to use those. If yes, we can advertise them and some will do. He also raised the fact that we don't have statistics. This is important, but I would not see it as the primary goal here. Our Debian packages are also used by CI workflows.

@BenjaminRodenberg suggested that we use the GHCR for hosting the CI images. This is essentially the same argument as using a separate namespace. In any case, not pushing to a separate image repository helps with the bandwidth.

Before continuing, let's try to agree on what images we actually need for which use cases. The system tests don't need any, they are now decoupled from what each repository is doing.

A. precice/precice:latest could be a production image that users could integrate in their workflows. I do not have any user in mind that currently needs this. If they use Docker, they probably build their own image anyway, based on their own requirements. The need for this is, therefore, hypothetical. Remember that we are a library that needs to be linked to, not a microservice.
B. precice/precice:develop (where :develop could also be :nightly) could be a preview for the same audience as (A).
C. precice/xxx:latest could be a testing image for the Python bindings that exposes the latest preCICE released version, installed. What else is needed? This image is currently needed here.
D. precice/xxx:develop could be a preview image for the same audience as (C).

Regarding the name, if we want xxx to be based on ci-images, and based on the name these images currently have, then precice/ci-ubuntu-2204-precice and similar would be the natural extension. With the assumptions that (i) we don't go too many layers deeper and that (ii) we don't provide images we don't actually use anywhere.

Don't forget that we also have the GitHub Action: https://github.com/precice/setup-precice-action This would be a better starting point for CI.

Overall, I have the feeling that we have (because of exploration) too many options at the moment. We need to merge and simplify, but first we need to map the needs and use cases.

@fsimonis
Copy link
Member

This issue morphed into a medusa of a discussion. I'll open separate issues:

"The problem is that precice:latest and precice:develop are completely unrelated images."

precice/precice#1947

"Why can we not use the same dockerimage for the release and the CI?"

precice/ci-images#6

@BenjaminRodenberg also suggested keeping the CI images under a separate namespace/organization, such as precice/precice and ci-precice/precice

precice/ci-images#7

precice/precice:develop (where :develop could also be :nightly) could be a preview for the same audience as (A).

precice/precice#1948

The venv story

#197

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants