-
Notifications
You must be signed in to change notification settings - Fork 61
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use the nydus-snapshotter by default #267
Use the nydus-snapshotter by default #267
Conversation
This will be used, either by the admin or by our tests, to set the correct debug level we want for the payload we're shipping, including the pre-install one. The default is "false", as usually folks won't need fully verbose logs in production. Signed-off-by: Fabiano Fidêncio <[email protected]>
That was a typo that was there since d1268cf, when the code was moved to this repo. Signed-off-by: ChengyuZhu6 <[email protected]> Signed-off-by: Fabiano Fidêncio <[email protected]>
Signed-off-by: ChengyuZhu6 <[email protected]> Signed-off-by: Fabiano Fidêncio <[email protected]>
17a9248 introduced this, and was calling the wrong method. Signed-off-by: ChengyuZhu6 <[email protected]> Signed-off-by: Fabiano Fidêncio <[email protected]>
/test |
aeb6b99
to
7f4ccb7
Compare
7f4ccb7
to
3505319
Compare
/test |
1 similar comment
/test |
This will reduce the risk of something breaking on the container / host due to the lack of libraries that needed to be loaded with a dynamically linked binary. Signed-off-by: Fabiano Fidêncio <[email protected]>
As we'll need to use it to properly clean up the nydus snapshots when uninstalling nydus. Signed-off-by: Fabiano Fidêncio <[email protected]>
db7b913
to
8dbfc63
Compare
Nydus snapshotter will be used for replacing the forked containerd, allowing us to pull the image inside the guest, and this commits is adding the ability to build it. Getting it to be used will come later on on this series. Signed-off-by: ChengyuZhu6 <[email protected]> Signed-off-by: Fabiano Fidêncio <[email protected]>
8dbfc63
to
66e697e
Compare
This is a release of nydus-snapshotter that has absolutely **ZERO** code changed, but that adds a tarball for s390x. This has been done in order to work around the issue we have seen building the binaries for s390x as part of our CI, at least till nydus-snapshotter releases a s390x tarball, which is being tracked as part of containerd/nydus-snapshotter#548 ``` The error we're facing is: 22.67 -> unzip /go/pkg/mod/cache/download/github.com/freddierice/go-losetup/@v/v0.0.0-20220711213114-2a14873012db.zip: zip: checksum error 22.69 go: downloading github.com/rs/xid v1.4.0 22.69 go: downloading github.com/containerd/stargz-snapshotter/estargz v0.14.3 22.69 go: downloading golang.org/x/net v0.10.0 22.71 go: downloading github.com/docker/distribution v2.8.2+incompatible 22.77 -> unzip /go/pkg/mod/cache/download/github.com/rs/xid/@v/v1.4.0.zip: zip: checksum error 22.83 go: downloading github.com/moby/locker v1.0.1 24.73 pkg/filesystem/stargz_adaptor.go:19:2: github.com/KarpelesLab/[email protected]: Get "https://proxy.golang.org/github.com/%21karpeles%21lab/reflink/@v/v1.0.1.zip": local error: tls: bad record MAC ``` This is not ideal, but this is "good enough" to unblock this release. Signed-off-by: Fabiano Fidêncio <[email protected]>
66e697e
to
4f3a56c
Compare
/test |
ee51469
to
a062ef1
Compare
/test |
1 similar comment
/test |
/test-kata-qemu-tdx |
1 similar comment
/test-kata-qemu-tdx |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@fidencio amazing work as always! Ready to land this IMO.
I will debug the TDX CI for one day before getting it merged, so if we have enough approvals, Tomorrow at noon, UTC, I will get this one merged regardless of the state of the TDX CI. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tested a bit on my node. I think we might not be uninstalling nydus at all (see comment). Otherwise looks good.
1663f90
to
796b637
Compare
/test |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
We might see more SEV/SNP failures if other PRs (or the baseline) have run given that they will leave the nodes in a bad state. Should work if we rerun a few times tho.
Cool, thanks! Right now I'm re-working the uninstall as it made the jobs related to uninstall / reinstall break. |
796b637
to
1adf99d
Compare
/test |
3e5ab7f
to
a1be903
Compare
/test |
The nydus-snapshotter is going to be used in order to replace our dependency on the forked containerd. For now, we're not actually removing the forked containerd dependency, but rather giving the admin the choice to use nydus-snapshotter. Using the forked containerd will be deprecated as part of our **next** release. Signed-off-by: ChengyuZhu6 <[email protected]> Signed-off-by: Fabiano Fidêncio <[email protected]>
This is done for the kata-containers based payloads. Enclave CC will still keep using the forked containerd Signed-off-by: ChengyuZhu6 <[email protected]> Signed-off-by: Fabiano Fidêncio <[email protected]>
Signed-off-by: Fabiano Fidêncio <[email protected]>
The condition to set the snapshotter to be used is defintely wrong, as it'd never ever set the snapshotter variable. Signed-off-by: Fabiano Fidêncio <[email protected]>
…ging` ``` panic: odd number of arguments passed as key-value pairs for logging [recovered] panic: odd number of arguments passed as key-value pairs for logging ... github.com/confidential-containers/operator/controllers.(*CcRuntimeReconciler).monitorCcRuntimeInstallation(0xc0002d4190) /workspace/controllers/ccruntime_controller.go:473 +0x3bf ``` Signed-off-by: Huang Huang <[email protected]>
On operator_tests.bats's teardown let's print the description of all pods in the confidential-containers namespace to help on debug fails. Signed-off-by: Wainer dos Santos Moschetta <[email protected]>
This should be treated in a better way than adding an one minute sleep, as just having the sleep here won't do us any good in the future. What's basically happening, and forcing us to do this, is the fact that the Uninstall and postUninstall daemonsets are being started at exactly the same time, leading to a race condition when changing the containerd configuration. When looking at the kata-containers payload code, we see that the the label is only set after containerd is successfully reconfigured, and looking at this function we see we shouldn't reach this part before the label is set. However, that's not what we're facing ... In order to unblock this PR to get merged, we're good enough. But this needs proper investigation and hopefully we'll see this patch being reverted sooner than later. Signed-off-by: Fabiano Fidêncio <[email protected]>
a1be903
to
619eb49
Compare
/test |
1 similar comment
/test |
@@ -226,6 +226,20 @@ func (r *CcRuntimeReconciler) processCcRuntimeDeleteRequest() (ctrl.Result, erro | |||
if r.ccRuntime.Spec.Config.PostUninstall.Image == "" { | |||
controllerutil.RemoveFinalizer(r.ccRuntime, RuntimeConfigFinalizer) | |||
} else if r.ccRuntime.Spec.Config.PostUninstall.Image != "" { | |||
// FXIME: This should be treated in a better way, as just having the sleep |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: FXIME-> FIXME
This looks reasonable. We'll focus on a proper fix in the next release
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will fix the fixme in the next iteration, just in order to save some trees and unblock peer-pods folks. :-)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/lgtm
Thanks @fidencio
This PR is a rework of #263, but I don't want to lose that public reference of what's been done.
Please, take a look at each commit for more info on what it's doing and why.