
Use in k3s alongside normal OCI images #136

Closed
MagicRB opened this issue Apr 14, 2024 · 13 comments

MagicRB commented Apr 14, 2024

I've managed to pull apart this flake and make it fit into NixNG and the rest. I have figured out that

plugins."io.containerd.grpc.v1.cri".containerd = {
  snapshotter = "overlayfs";
  disable_snapshot_annotations = true;
};

makes normal OCI images work in kubernetes, while

plugins."io.containerd.grpc.v1.cri".containerd = {
  snapshotter = "nix";
  image-service-endpoint = "unix:///run/nix-snapshotter/nix-snapshotter.sock";
  disable_snapshot_annotations = false;
};

makes nix:0/nix/store/... image refs work. Is it possible to have both work at the same time inside one kubelet?
I know that on the command line I can specify --snapshotter or CONTAINERD_SNAPSHOTTER and set them to nix, which will again make nix-native containers work, but I'm not sure how to specify that in kubernetes.


MagicRB commented Apr 14, 2024

I found this in containerd: containerd/containerd#6899. It still doesn't solve my problem though, as I need to run both OCI and nix-native images in one Pod, which I don't think is possible as of now.

A possible way to hack this together would be container (not pod) annotations and then patching this part of containerd:
https://github.com/containerd/containerd/blob/main/pkg/cri/server/container_create.go#L189
(the PR I linked earlier seems to have completely vanished from containerd afaict)


MagicRB commented Apr 14, 2024

plugins."io.containerd.grpc.v1.cri".containerd = {
  snapshotter = "nix";
  image-service-endpoint = "unix:///run/nix-snapshotter/nix-snapshotter.sock";
  disable_snapshot_annotations = false;
};

plugins."io.containerd.transfer.v1.local".unpack_config = [
  {
    platform = "${GOOS}/${GOARCH}";
    snapshotter = "nix";
  }
];

proxy_plugins.nix = {
  type = "snapshot";
  address = "/run/nix-snapshotter/nix-snapshotter.sock";
};

Even with that config I cannot seem to get k3s to cooperate for any containers, nix or OCI, I keep getting

failed to create containerd container: failed to create snapshot: missing parent "k8s.io/8/sha256:1021ef88c7974bfff89c5a0ec4fd3160daac6c48a075f74cff721f85dd104e68" bucket: not found

Which is also exactly what I get with the runtimeSnapshotter support in containerd.
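For reference, the runtimeSnapshotter support I mean is the per-runtime snapshotter field that the CRI plugin grew in containerd 1.7. A rough sketch in the same style as the config above — the nix-runc handler name, and the idea of selecting it from Kubernetes via a RuntimeClass whose handler is "nix-runc", are illustrative assumptions, not something I've verified works:

```nix
plugins."io.containerd.grpc.v1.cri".containerd = {
  # Default snapshotter for ordinary OCI images.
  snapshotter = "overlayfs";
  runtimes.nix-runc = {
    runtime_type = "io.containerd.runc.v2";
    # Per-runtime snapshotter override (containerd 1.7+); a Kubernetes
    # RuntimeClass with handler = "nix-runc" would select this runtime.
    snapshotter = "nix";
  };
};
```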


MagicRB commented Apr 14, 2024

I'm on a wild debugging streak right now, sorry for derailing this issue, but I can't make heads or tails of this. The missing parent issue was resolved by just resetting the whole k3s cluster, due to some weird state issue. Now, even with --log-level debug, I only see log lines beginning with [image-service], never [nix-snapshotter]. Furthermore, the snapshot directory gets created correctly, but all the directories inside are empty. I also straced the nix-snapshotter process and I don't see any mount syscalls happening.


MagicRB commented Apr 15, 2024

I haven't tested this theory, but is it possible I'm running into #129? I will test preloading the images later today. EDIT: that did not help; I tried with nix2container load but it still won't exec properly.

elpdt852 (Collaborator) commented:

> Is it possible to have both work at the same time inside one kubelet?

Yes, nix-snapshotter is backwards compatible, so you should be able to resolve and run regular OCI images and nix-snapshotter images in the same pod. The same image can also have interleaved regular OCI layers & nix-snapshotter layers.

On the server side, we just configure the kubelet directly via:
https://github.com/pdtpartners/nix-snapshotter/blob/main/modules/nixos/tests/kubernetes.nix#L36
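In NixOS terms, the relevant kubelet setting is roughly the following sketch; this assumes the stock services.kubernetes module, and the linked kubernetes.nix is the authoritative version:

```nix
services.kubernetes.kubelet.extraOpts = lib.concatStringsSep " " [
  # Point the kubelet's CRI image service at nix-snapshotter's socket;
  # the runtime service keeps talking to containerd as usual.
  "--image-service-endpoint=unix:///run/nix-snapshotter/nix-snapshotter.sock"
];
```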

Note that Kubernetes doesn't need to know about nix-snapshotter other than it being a CRI image service. When the kubelet eventually asks containerd to spawn a container, containerd knows which snapshotter to use based on plugins."io.containerd.grpc.v1.cri".containerd.snapshotter:
https://github.com/pdtpartners/nix-snapshotter/blob/main/modules/common/containerd.nix#L66-L68

Using --snapshotter or CONTAINERD_SNAPSHOTTER with either ctr or nerdctl is purely client-side, and applies when talking to containerd's native gRPC interface directly (as opposed to the CRI interface, which is what crictl and the kubelet use).

> whole k3s cluster due to some weird state issue

Note that rootless k3s doesn't support nix-snapshotter yet (#120), but rootful k3s is working.

> I also straced the nix-snapshotter process and I don't see any mount syscalls happening.

Do you have the nix CLI available in the PATH for the nix-snapshotter process?
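Under NixOS, one way to guarantee that is to put nix on the service's PATH; a minimal sketch, assuming the unit is named nix-snapshotter:

```nix
systemd.services.nix-snapshotter = {
  # Make the `nix` CLI visible to the snapshotter process,
  # which needs it at runtime to realize store paths.
  path = [ pkgs.nix ];
};
```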

> I'm on a wild debugging streak right now

It would help to have some kind of reproduction case; it seems like there are many moving pieces while you're making it fit with NixNG.


MagicRB commented Apr 15, 2024

> Do you have the nix CLI available in the PATH for the nix-snapshotter process?

Yes, I'll verify. EDIT: verified.

> It would help to have some kind of reproduction case, it seems like there's many moving pieces while you're making it fit with NixNG.

There are way too many moving parts, yes; it's hard for me to produce a reproducible example. I am trying to find something in the logs pointing me to the bit I missed, but so far I'm having no luck.

It's as if the nix-snapshotter code path doesn't even trigger. It does end up in its nix directory, so it does go through there, but then it just passes straight down to the fallback overlayfs.

> Note that rootless k3s doesn't support nix-snapshotter yet: #120, but rootful k3s is working.

I'm running rootful, always have been. The problem seems to have arisen when I added nix-snapshotter to an existing containerd and k3s.

I've already incorporated both of the links you provided, and I've verified in the logs that both containerd and the kubelet know about and use nix-snapshotter. I've also verified that k3s, and therefore containerd, correctly calls into nix-snapshotter, so the problem must be somewhere inside it, where it doesn't actually trigger. I'm trying to make sense of how nix-snapshotter works so I can figure out why it doesn't seem to trigger at all in my environment.


hinshun commented Apr 15, 2024

Make sure you’re using this containerd: https://github.com/pdtpartners/nix-snapshotter/blob/main/modules/flake/overlays.nix#L8.

If you aren’t seeing mounts, the likely causes are:

  • It's failing to retrieve the annotations from the image manifest
  • It's failing to create the gcroots for the nix packages
  • There's a bug in nix-snapshotter

If you could provide a gist with containerd and nix-snapshotter logs, as well as “kubectl describe pod xyz”, that’ll help as well.


MagicRB commented Apr 16, 2024

Ah, I'm using k3s's internal containerd; could that be the culprit? I'll provide the logs later today.


MagicRB commented Apr 16, 2024

Oh, and after I am done, I'll start a draft of the manual install doc, at least for the rootful k3s situation.


MagicRB commented Apr 16, 2024

I found the bug: I wasn't using the k3s from this flake. I'm trying to fix that currently; it's a deeper bug in NixNG somewhere. For some reason assert (pkgs.k3s == cfg.package); foo fails, where

package = mkPackageOption pkgs "k3s" {};

Why it fails is beyond me. Going through pkgs directly, I get the version from the overlay; going through the option, I end up with the default nixpkgs version.

EDIT: fixed. I have a weird way of passing options through from a NixOS module to the underlying NixNG module: I just copied the option definition, which proceeded to use the pkgs from the overarching NixOS system, not the NixNG container system... I am now recompiling k3s.
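To illustrate the failure mode with hypothetical names (this is my reconstruction, not the actual NixNG code):

```nix
# The option definition was copied into the outer NixOS module, so
# `pkgs` here is the *host* package set and the default resolves to
# nixpkgs' k3s, not the overlayed one:
package = lib.mkPackageOption pkgs "k3s" {};

# The fix, conceptually: resolve the package against the NixNG
# system's own pkgs and pass it through explicitly, e.g.
#   services.k3s.package = nixngPkgs.k3s;  # hypothetical option path
```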

On a side note, I am working on a NixOS module which, inside a systemd-nspawn container, spins up a completely self-contained instance of NixNG with k3s, postgres, and all the rest. The idea is to have everything in one nice network namespace so it doesn't pollute the host as much.


hinshun commented Apr 16, 2024

Yes, that makes sense. Sorry, I should've been more clear: rootless only doesn't work because it can only use its embedded containerd; i.e., rootful only works with external containerd.

This repo provides overlays for both containerd & k3s so that embedded & external both work, but we're still working on upstreaming these bug fixes.


MagicRB commented Apr 16, 2024

Yeah, I got it. I think the takeaway here is that I need to help with the manual install doc :) and that once you get this running, you CANNOT switch snapshotters or play with them in any way, shape, or form. I had to reset the state of containerd multiple times; whenever it was throwing weird errors, I reset it.

While we're here, would there be any interest in the NixNG code? As the author, I would be very happy to find someone who had an interest in it. I stand behind the fact that distroless is nice, until it isn't; in my experience most things do not work, and NixNG is as distroless as it can be.

And thanks for the help, I'll finalize my modules and then draft the docs for this little journey :)

MagicRB closed this as completed Apr 16, 2024

hinshun commented Apr 16, 2024

As nix-snapshotter stabilizes, I’m moving in the direction of upstreaming NixOS modules, Home Manager modules, etc. This repo is only incubating the changes, so the overlays won’t be necessary later. So I’d rather the NixNG repo be the source of truth for nix-snapshotter NixNG modules.

If you can provide a reproduction of the snapshotter instability, I'm happy to take a look.
