Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ci(linux32): do make Javascript Actions work #1790

Closed
wants to merge 1 commit into from

Conversation

dscho
Copy link
Member

@dscho dscho commented Sep 14, 2024

I propose this alternative to 9c26185 (ci: use regular action versions for linux32 job, 2024-09-12), keeping the updates to Javascript Actions.

The benefit is that it keeps the 32-bit container used in the linux32 job clean of any 64-bit libraries so that we don't accidentally end up testing 64-bit stuff without wanting it.

For good measure, I also reported this problem with that deprecation at actions/upload-artifact#616 even though I know that the GitHub Actions team saw a headcount-losing reorg recently and therefore I do not really expect that they have any bandwidth to help with this. So this work-around is the best we can do for now, I believe.

Cc: Jeff King [email protected]

In February 2023, older `actions/upload-artifact` were deprecated:
https://github.blog/changelog/2024-02-13-deprecation-notice-v1-and-v2-of-the-artifact-actions/
This was recently followed by brown-outs.

However, the `linux32` job relied on those, as there are well-documented
problems (see actions/runner#2115 for example)
running modern, Javascript-based Actions in 32-bit only containers.

To get the CI builds to work again, a work-around was implemented in
https://lore.kernel.org/git/[email protected]
to let the 32-bit container make use of the 64-bit node 20 provided by
the Actions runner.

This, however, runs the risk of using 64-bit executables when we
purposefully chose a Docker image that only contains 32-bit bits and
pieces so that accidental use of 64-bit libraries or executables would
not happen.

Let's go about this the other way round instead, by overriding the amd64
version of node 20 the Actions runner provides with an x86 one (which is
"officially unofficial" by virtue of being hosted on
unofficial-builds.nodejs.org).

This allows us to stop using the now-deprecated versions of
`actions/checkout` and `actions/upload-artifact` before these Actions
became Javascript-based Actions.

Signed-off-by: Johannes Schindelin <[email protected]>
@dscho dscho self-assigned this Sep 14, 2024
@dscho
Copy link
Member Author

dscho commented Sep 14, 2024

/submit

Copy link

gitgitgadget bot commented Sep 14, 2024

Submitted as [email protected]

To fetch this version into FETCH_HEAD:

git fetch https://github.com/gitgitgadget/git/ pr-1790/dscho/use-x86-node-in-linux32-v1

To fetch this version to local tag pr-1790/dscho/use-x86-node-in-linux32-v1:

git fetch --no-tags https://github.com/gitgitgadget/git/ tag pr-1790/dscho/use-x86-node-in-linux32-v1

Copy link

gitgitgadget bot commented Sep 14, 2024

On the Git mailing list, Jeff King wrote (reply to this):

On Sat, Sep 14, 2024 at 12:42:39AM +0000, Johannes Schindelin via GitGitGadget wrote:

> To get the CI builds to work again, a work-around was implemented in
> https://lore.kernel.org/git/[email protected]
> to let the 32-bit container make use of the 64-bit node 20 provided by
> the Actions runner.
> 
> This, however, runs the risk of using 64-bit executables when we
> purposefully chose a Docker image that only contains 32-bit bits and
> pieces so that accidental use of 64-bit libraries or executables would
> not happen.

How big a risk is this? In my experience with multiarch systems, the
difficulty is the opposite: convincing the non-base toolchain to work at
all, rather than using it accidentally. Especially as the solution in my
patch is not configuring apt for multiarch support, where you could
accidentally install gcc:amd64 versus gcc:i386. Even if we were to
accidentally bring in the cross-platform compiler via the
gcc-10-x86-64-linux-gnu package, you'd have to point $(CC) at it
explicitly.

But maybe others have had a different experience?

> Let's go about this the other way round instead, by overriding the amd64
> version of node 20 the Actions runner provides with an x86 one (which is
> "officially unofficial" by virtue of being hosted on
> unofficial-builds.nodejs.org).

I'm not totally opposed to this direction, but I'm a little concerned
about the stability/maintenance of the solution. In particular:

> +        NODE_URL=https://unofficial-builds.nodejs.org/download/release/v20.17.0/node-v20.17.0-linux-x86.tar.gz &&

Will this URL work forever? Looking at the release/ directory, it looks
like it should hang around. They have entries going back to 2019 (which
is not all that old, but I suspect that's when they started the build
repository).

The flip side is: will node20 be sufficient for Actions forever? Node16
was already deprecated in Actions in 2023, and it was released in 2021
(looks like Node releases have a 2-year lifespan in general). So node20
takes us to April 2026 or so.

Of course my solution has similar problems. Probably node24 or whatever
comes next will need another glibc bump.

I was mildly surprised that the build you reference didn't run into that
problem, actually. But I double-checked and it appears to run OK in
Xenial; it needs 2.17, according to some "objdump -T" sleuthing. I'm not
sure why that's so different from the official 64-bit builds.

> +        curl -Lo /tmp/node.tar.gz $NODE_URL &&
> +        tar -C /__e/node20 -x --strip-components=1 -f /tmp/node.tar.gz

This is pretty intimate with how Actions work (both the node20 version
and the "/__e/" magic). It's hard to say if/when that would bite us.

>     For good measure, I also reported this problem with that deprecation at
>     https://github.com/actions/upload-artifact/issues/616 even though I know
>     that the GitHub Actions team saw a headcount-losing reorg recently and
>     therefore I do not really expect that they have any bandwidth to help
>     with this. So this work-around is the best we can do for now, I believe.

I think it's reasonably well known there already. Here are some relevant
issues I came across:

  - https://github.com/actions/checkout/issues/334
  - https://github.com/actions/checkout/issues/1590
  - https://github.com/actions/setup-node/issues/922
  - https://github.com/actions/runner/issues/2906

Most folks are hitting the glibc issue rather than the i386 one, but the
core of the problem is the same.

IMHO the ultimate solution is a statically-linked node binary (you'd
still be relying on the kernel, but that has a very good track record of
userspace compatibility).

-Peff

Copy link

gitgitgadget bot commented Sep 14, 2024

This patch series was integrated into seen via git@388a5f4.

Copy link

gitgitgadget bot commented Sep 14, 2024

This patch series was integrated into next via git@e937339.

Copy link

gitgitgadget bot commented Sep 14, 2024

On the Git mailing list, Junio C Hamano wrote (reply to this):

Jeff King <[email protected]> writes:

> I'm not totally opposed to this direction, but I'm a little concerned
> about the stability/maintenance of the solution. In particular:
>
>> +        NODE_URL=https://unofficial-builds.nodejs.org/download/release/v20.17.0/node-v20.17.0-linux-x86.tar.gz &&
> ...
>> +        curl -Lo /tmp/node.tar.gz $NODE_URL &&
>> +        tar -C /__e/node20 -x --strip-components=1 -f /tmp/node.tar.gz
>
> This is pretty intimate with how Actions work (both the node20 version
> and the "/__e/" magic). It's hard to say if/when that would bite us.

Thanks for clearly expressing the uneasiness I felt, which I could
not turn into concrete words, when I saw the patch first time.

Each of these approaches may have its pros and cons, but I somehow
do not see that the newly proposed alternative is 10x better than
what was reviewed and queued already to be worth the effort to
replace it.

Thanks, both.

@dscho
Copy link
Member Author

dscho commented Sep 14, 2024

The Git maintainer seems to have made up their mind about rejecting the argument that the linux32 job should stay clean of 64-bit libraries/executables. So it seems that this "is not productive use of [my] time". Again.

@dscho dscho closed this Sep 14, 2024
@dscho dscho deleted the use-x86-node-in-linux32 branch September 14, 2024 19:51
Copy link

gitgitgadget bot commented Sep 15, 2024

On the Git mailing list, Jeff King wrote (reply to this):

On Sat, Sep 14, 2024 at 10:17:16AM -0700, Junio C Hamano wrote:

> Each of these approaches may have its pros and cons, but I somehow
> do not see that the newly proposed alternative is 10x better than
> what was reviewed and queued already to be worth the effort to
> replace it.

That's my feeling, too, but I'd reserve final judgement to see Dscho's
response; it's possible I am under-estimating the 32/64-bit confusion
risk.

I'd also note that his patch does not require bumping the distro
version, which would let us continue testing that old version in GitHub
Actions. That might be worth considering.

-Peff

Copy link

gitgitgadget bot commented Sep 15, 2024

On the Git mailing list, Junio C Hamano wrote (reply to this):

Jeff King <[email protected]> writes:

> On Sat, Sep 14, 2024 at 10:17:16AM -0700, Junio C Hamano wrote:
>
>> Each of these approaches may have its pros and cons, but I somehow
>> do not see that the newly proposed alternative is 10x better than
>> what was reviewed and queued already to be worth the effort to
>> replace it.
>
> That's my feeling, too, but I'd reserve final judgement to see Dscho's
> response; it's possible I am under-estimating the 32/64-bit confusion
> risk.

FWIW, what you said matches my recollection from years ago ;-) back
when I had to deal with that.

> I'd also note that his patch does not require bumping the distro
> version, which would let us continue testing that old version in GitHub
> Actions. That might be worth considering.

Yes, that is true.

Considering that 16.04 has passed its expiration date for standard
support a few years ago, I am not sure how many more years of
practical/unsupported use and testing we would be getting by giving
cycles for the release in CI, though.

Thanks.

Copy link

gitgitgadget bot commented Sep 17, 2024

On the Git mailing list, Johannes Schindelin wrote (reply to this):

Hi,

On Sun, 15 Sep 2024, Junio C Hamano wrote:

> Jeff King <[email protected]> writes:
>
> > On Sat, Sep 14, 2024 at 10:17:16AM -0700, Junio C Hamano wrote:
> >
> >> Each of these approaches may have its pros and cons, but I somehow
> >> do not see that the newly proposed alternative is 10x better than
> >> what was reviewed and queued already to be worth the effort to
> >> replace it.

Installing lib64stdc++ indeed does not rely on the implementation detail
that `/__e/node20/` contains the Node version used to execute the Actions
in those Docker containers.

Of course, the fact that installing lib64stdc++ (and no other 64-bit
library) "fixes" the 64-bit Node version is also an implementation detail.

Between GitHub Actions' and Node's development speed, I personally would
expect the latter implementation detail to be changed many times over
before the former implementation detail would be changed.

In particular given that mapping the "externals" by any other name than
`/__e/` risks breaking existing GitHub workflows that might make use of
exactly that directory name, I consider the chances for that name change
to be negligible. It probably won't change, ever.

Of course, my favorite solution would be for `actions/runner` to be fixed
so that it detects the situation and uses a 32-bit variant in that case
[*1*].

Business forces work against this wish, though, so I don't see this
happening.

The next best thing, in my mind, is to come up with a solution that is
general enough that other projects could follow this example, which is
what I did.

And yes, the idea of mixing 32-bit and 64-bit things in a container that
was specifically used to only have 32-bit things still does not convince
me, it still looks like a much better idea to either stick with a
32-bit-only container, or to just do away with the complexity of a
container altogether if the environment does not need to be free of 64-bit
anyway (but why did we bother with that in the first place, then?).

Ciao,
Johannes

Footnote *1*: I have added quite a few details about this here:
https://github.com/actions/upload-artifact/issues/616#issuecomment-2350667347

Copy link

gitgitgadget bot commented Sep 17, 2024

On the Git mailing list, Junio C Hamano wrote (reply to this):

Johannes Schindelin <[email protected]> writes:

> Of course, my favorite solution would be for `actions/runner` to be fixed
> so that it detects the situation and uses a 32-bit variant in that case
> [*1*].

Yeah, that would be ideal.

> The next best thing, in my mind, is to come up with a solution that is
> general enough that other projects could follow this example, which is
> what I did.

I guess both patterns can be followed if they discover either, and
it made me wonder if this isn't a problem already solved by others,
but at the end of the day, any solution that was good enough and
unblocks us quickly was needed; the ones that were reviewed earlier
have been fast tracked down to 'maint' as of last night.  That does
not mean we won't have to further improve it, of course.  At least
until 32-bit archs no longer matter, that is ;-).

Thanks.

Copy link

gitgitgadget bot commented Sep 19, 2024

On the Git mailing list, Jeff King wrote (reply to this):

On Tue, Sep 17, 2024 at 02:20:41PM +0200, Johannes Schindelin wrote:

> Installing lib64stdc++ indeed does not rely on the implementation detail
> that `/__e/node20/` contains the Node version used to execute the Actions
> in those Docker containers.
> 
> Of course, the fact that installing lib64stdc++ (and no other 64-bit
> library) "fixes" the 64-bit Node version is also an implementation detail.

Yes, though "provide a 64-bit c/c++ runtime" does not seem all that
exotic. I'd be much more worried about the implicit library version
dependencies that exist, but that is true even on 64-bit systems. As
shown in the issues I linked earlier, lots of people are hitting a
similar problem with containers that have an older glibc.

The real solution there (IMHO) is a statically linked node in the runner
images, but I'm not sure of the reasons they haven't pursued that yet.

In the absence of that, the fact that your solution uses a node build
which (for reasons I'm still not sure I understand) seems to be OK with
an older glibc feels like a compelling reason. In fact, I find that much
more compelling than the risk of 32/64-bit confusion. I think part of
what I was responding to was the focus on that in your commit message.

> In particular given that mapping the "externals" by any other name than
> `/__e/` risks breaking existing GitHub workflows that might make use of
> exactly that directory name, I consider the chances for that name change
> to be negligible. It probably won't change, ever.

Fair enough. Going into this whole problem, I was not clear where /__e/
was coming from. I had thought at first it was something being carried
along by the Actions themselves (and thus action-specific and likely to
change). But it looks like it is part of the runner image and just
mounted into the container volume, so it is a magic keyword that Actions
and runner images both have to depend on.

> Of course, my favorite solution would be for `actions/runner` to be fixed
> so that it detects the situation and uses a 32-bit variant in that case
> [*1*].

Yes, me too (and preferably statically linked :) ).

> And yes, the idea of mixing 32-bit and 64-bit things in a container that
> was specifically used to only have 32-bit things still does not convince
> me, it still looks like a much better idea to either stick with a
> 32-bit-only container, or to just do away with the complexity of a
> container altogether if the environment does not need to be free of 64-bit
> anyway (but why did we bother with that in the first place, then?).

I think 32-bit builds directly inside a 64-bit runner image is a good
way to do the thing you were initially worried about: accidentally using
tools of the wrong type. Doing the whole build and test within the
32-bit container is something we'd want to keep.

The other alternative, which neither of us shown patches for, but which
I mentioned (courtesy of Ed) in the original thread is: do the Actions
outside the 32-bit container, run docker ourselves mounting the repo,
and then build and test inside the container. That's apparently how
libgit2 does it. It sidesteps the issue entirely, as the container never
runs anything external. And it would be applicable to all projects, no
matter what's in their containers (32-bit, old distros, even qemu of
alternate architectures).

I'm not sure how much work it would be to do, though.

Anyway, it sounds like Junio has merged my patches down to get things
moving again. I'm OK if you want to rebase your proposed fix on top of
that.

-Peff

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant