Add GitHub Action runners to builder #422
Comments
Honestly don't think running our own is a good idea, they are a way bigger surface than any other CI we could run. |
I think we can make it reasonably secure with nix-community/srvos#50 |
Sorry but I guess I have a strong opinion on this, very uncomfortable with us hosting github actions runners vs. probably anything else. |
Maybe we should talk about our security posture as a whole. Here are some things that I see:
a. Hydra is running as root. I wouldn't be surprised if the Perl code had a bunch of security holes allowing escalation given that nobody is super fluent in it.
So we already rely on trust quite a bit. Most of our system is one kernel elevation exploit away from being pwned (as is the main Hydra, by the way).
We could eliminate (c) by forcing users that want to publish to the shared cache to use our hosted GitHub Actions runner. I don't have a really good answer for (a) and (b). Maybe there are some mitigations that we can put in place? |
Yeah the current situation probably isn't the best but I see self hosting actions as biting off more than we can chew.
IIRC we started with nixpkgs hydra, switched to the upstream hydra flake, switched back to nixpkgs hydra. At the moment we're running it for three projects, hard to say that it's worth the hassle. I'd just drop hydra in favour of buildbot and do what we can to help those projects move over.
Afraid I don't understand this? |
Can you expand a bit on the issue relative to the other options? Is it the maintenance overhead? |
We can have the node set up with |
I have to agree with both sides in some points.
Github Actions
I think there are some further hardenings required to the github actions runner service to bring it to the same level as our sandboxed hercules builds (nix-community/srvos#50). After that I don't think it is inherently more insecure than our nix builds.
Hydra
We currently do not build hydra pull requests, which would limit the attack surface to people having a repo explicitly. I do not see anything running as root though. This might have been the case historically. There could probably be more hardening done for individual services though.
Cachix key
However after that removing the cachix key from our github org would give us some security benefits (also this might be a regression for macOS support where we do not have any builder).
FOD
The network access for fixed-output derivations is far from ideal. I don't think it could actually be used to compromise other builds but it could be used to send out spam or DDoS other services. I think as a first step we should configure something like squid to at least filter some tcp ports - I don't think we need udp support at all for our fetchers. |
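As a rough illustration of the kind of port filtering suggested for FOD network access, the policy could be sketched in Python as below. This is not a squid configuration. The allowed port set (HTTP and HTTPS only) and the function name are assumptions made for the example; the thread does not say which ports fetchers actually need.

```python
# Hypothetical sketch of an egress policy for fixed-output derivation fetches.
# ALLOWED_TCP_PORTS is an assumption (HTTP and HTTPS); the thread does not
# specify which ports should be let through.
ALLOWED_TCP_PORTS = frozenset({80, 443})


def is_fetch_allowed(protocol: str, port: int) -> bool:
    """Return True if an outgoing connection would pass the filter."""
    # UDP (and anything else that is not TCP) is rejected outright.
    if protocol.lower() != "tcp":
        return False
    # Only explicitly allowed TCP ports pass.
    return port in ALLOWED_TCP_PORTS
```

Under these assumptions a fetch over `("tcp", 443)` passes, while `("tcp", 25)` (the spam case) and `("udp", 53)` are rejected.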
A sub-point in that security posture is that a user who is given "trusted-user" can probably escalate access, because they can run arbitrary post-build-hooks as the nix-daemon user ID (root). Not sure if Hydra is given that access? |
Yeah, it seems rather complicated and fragile with a dependency tree unlike anything else we're running. Also unlike hercules and hydra wouldn't we be allowing PR builds from anyone on our infra? If so that doesn't seem like a good idea.
Isn't cache poisoning still an issue with this? (same as the other CI systems I guess, except it builds PRs and we'd be maintaining it instead of github?) |
Should we look at isolating the CI systems from the hosts nix store and hosting separate caches for projects instead of the everyone using one big cache? |
If you do not allow users to control the nix daemon that is used to build packages, they can only send derivations and the result is directly uploaded to the binary cache. This is different from how actions currently work, where they could modify packages and then push them to the cache in github actions. |
Isolating the host store from the CI store might be worthwhile but also probably requires a bit of trial and error (maybe a containerized nix-daemon?). At least in theory I do not see how cache poisoning can be done without an exploit. It would be great if we could somehow keep one cache because it makes us more efficient as a community, even if we have to put more thought into how we can design it in a secure way. |
I've thought about something like this a couple of times, might be interesting to pursue.
https://github.com/zhaofengli/attic might be something we could use when it matures.
So the token should really be removed but I don't think that we should self host github runners. For cached builds we already have hercules and hydra, we could set up buildkite and
We can't really do anything about caching darwin builds without setting up hardware, and at the moment there don't seem to be darwin equivalents of our current methods of managing deployment, secrets, etc so we'd be managing it manually anyway. I wouldn't say it's a good solution but I don't really see an issue with projects that want to stay on actions needing to use their own cachix cache. Perhaps as a compromise we could have another untrusted shared cache just for actions, either cachix or something we host ourselves? |
Aside from the runner discussion, the architecture that I found works best with Nix is to have a central machine, with remote builders attached to it. Everybody should be on the same network because the nix daemon protocol is sensitive to latency. The remote builders have a
So my idea was that we put the GitHub Actions runners, Hydra and Hercules agents on that central host and let the remote builds dispatch the jobs in the short term. Remove the shared Cachix auth token from GitHub so we get this quick win.
I think we also want to build a long-term plan but I don't know what it looks like yet exactly. It probably involves hardening Nix itself or wrapping it in sandboxes.
So back to the initial topic, I don't understand why Buildkite is OK, but GitHub Actions isn't. Both are packaged and work fine. GitHub Actions has a bit more security because we're able to hide the join token, and I believe that it also benefits from GitHub's SRE proactive measures to combat abuse. Both are proprietary. Buildkite started restricting their free plan. And GitHub Actions integrates the best with GitHub of course. This is what I see, but maybe I'm missing some data? |
The token needs to be removed but that does not mean that we have to have a self hosted actions runner. It is an ongoing maintenance and security burden that I don't think is really comparable to anything else we're running. It's a moving target, and I don't think it's much of a stretch to imagine we end up having to maintain it in nixpkgs if it gets neglected.
Once we start with self hosted actions I think we're basically making a long term commitment to the org.
Buildkite module, packaging, etc is very simple compared to actions runner.
Ah, I wasn't aware of this, I wouldn't have suggested it.
Does hercules support this? |
I'm sorry, but this is not how we generate consensus or generally should be operating in this project. Everybody has opinions. The way to generate a shared understanding is to explain your position. Instead, I had to work hard here to get the information out, and it's still not super clear what the issue is. At first, I thought it was a security concern, or an open source reason, but apparently, it's related to the packaging. I agree that the package is more complicated than, for example, BuildKite. But then buildbot is fine, which has tens of thousands more LOC. I don't know how you put those things in relationship with each other. We don't have to use GitHub Actions. I'm suggesting it because we have a working module that has been secured reasonably well and will be maintained independently. It's also fine if we disagree. What I don't want is to be beholden to opinions and feelings if they are not accompanied by a rational discussion. |
I agree that my comment you're quoting wasn't helpful but this seems a bit out of proportion? I can have multiple concerns and I had mentioned it previously:
Also from the same comment as above this security concern doesn't seem to have been addressed yet:
I didn't mention LOC? The packaging and module for buildbot (and hercules/hydra as well really) is also fairly simple compared to github actions. The other CI systems don't have the 30 day update window that github has for the runner, which is what I meant by "moving target" (and which doesn't seem to have been mentioned previously), and I don't think github cares about how complicated it is for us to maintain. I think we can say with a reasonable level of certainty that buildbot, hercules and hydra will still be functional on nixos 3/6/12 months from now, I don't see that we can say the same about the github runner. |
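To make the "moving target" concern concrete, here is a small Python sketch of the deadline arithmetic behind a 30 day update window. It assumes the window is counted from the release date of the newer runner version; the function names are made up for illustration and are not part of any GitHub or nixpkgs tooling.

```python
from datetime import date, timedelta

# The 30 day window mentioned in the thread. Counting it from the release
# date of the newer runner version is an assumption of this sketch.
UPDATE_WINDOW = timedelta(days=30)


def update_deadline(new_release: date) -> date:
    """Last day on which the packaged runner may still lag behind."""
    return new_release + UPDATE_WINDOW


def is_within_window(new_release: date, today: date) -> bool:
    """True while the older packaged runner is still inside the window."""
    return today <= update_deadline(new_release)
```

With a hypothetical release on 2023-01-01, `update_deadline` gives 2023-01-31, so a nixpkgs update that only lands in February would miss the window.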
Alright, I think I made my point.
Regarding security, the attack surface is smaller than FOD builds. GHA is running in a systemd unit that is more sandboxed than a FOD build.
The GHA runners are regularly updated in nixpkgs, with five maintainers in total. They are also getting exercised quite a bit as several customers use them. I understand the instinctive reaction of thinking this is bad, but if I look at the facts, it's really not that bad.
With something like Buildbot and Hydra we have an additional DB to manage. There is also a lot more code in total to run that can go wrong. There are new attack surfaces on the UI frontend bits.
I think Buildkite was already put aside. So that leaves us with Hercules CI, Garnix and GHA. |
I don't understand?
So we're okay with running unreviewed PRs on our own hardware?
Doesn't mean that it'll always be updated in the 30 day window or that it'll still even work on NixOS in the future?
What are these customers' expectations regarding GHA?
I don't understand this either?
This seems to be the first time Garnix has been mentioned? I'm not familiar with it beyond seeing it on a couple of my PRs against Numtide repos, looks okay I guess? Still mentions that it is "beta" but not sure what that means?
I'll try explaining my view another way: If we're going to start pushing the org onto self hosted GHA (so they can still have cached builds and we can remove the shared token), I want to be able to say to the org that it will actually be a reliable service. I don't see that I can say that, as we've no guarantee that we get the updates done inside the 30 day window or that we can even get the updates to build and run at all with NixOS/Nixpkgs. We're basically just hoping that github doesn't screw us? I am looking at whether we can use the upstream binary via some workaround (fhsenv, container, etc), which may mean this becomes less of an issue, but so far all I've done is skim through some stuff. |
I don't understand where the fear and uncertainty are coming from. We depend on GitHub not to screw us on so many dimensions, and the platform has been stable for us, hasn't it? We have multiple maintainers of the package that can react to the 30 day window; it's used in production. FODs are already running arbitrary code on our machines. That being said, I will stop pushing for this. One thing I 100% agree on is that the infra team should not take on more infrastructure than they can manage. |
We're using https://github.com/numtide/srvos/blob/master/roles/github-actions-runner.nix in a number of places now, maybe the community could also benefit from having faster CI by pushing the builds to permanent machines?
Eg: nix-community/nix-vscode-extensions#4