Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use the nydus-snapshotter by default #267

Conversation

fidencio
Copy link
Member

This PR is a rework of #263, but I don't want to lose that public reference of what's been done.

Please, take a look at each commit for more info on what it's doing and why.

This will be used, either by the admin or by our tests, to set the
correct debug level we want for the payload we're shipping, including
the pre-install one.

The default is "false", as usually folks won't need fully verbose logs
in production.

Signed-off-by: Fabiano Fidêncio <[email protected]>
That was a typo that was there since d1268cf, when the code was
moved to this repo.

Signed-off-by: ChengyuZhu6 <[email protected]>
Signed-off-by: Fabiano Fidêncio <[email protected]>
17a9248 introduced this, and was calling the wrong method.

Signed-off-by: ChengyuZhu6 <[email protected]>
Signed-off-by: Fabiano Fidêncio <[email protected]>
@fidencio
Copy link
Member Author

/test

@fidencio fidencio force-pushed the topic/enable-nydus-snapshotter-usage-for-pulling-on-the-guest branch from aeb6b99 to 7f4ccb7 Compare October 20, 2023 15:38
@fidencio fidencio force-pushed the topic/enable-nydus-snapshotter-usage-for-pulling-on-the-guest branch from 7f4ccb7 to 3505319 Compare October 24, 2023 09:15
@fidencio
Copy link
Member Author

/test

1 similar comment
@fidencio
Copy link
Member Author

/test

This will reduce the risk of something breaking on the container / host
due to the lack of libraries that needed to be loaded with a dynamically
linked binary.

Signed-off-by: Fabiano Fidêncio <[email protected]>
As we'll need to use it to properly clean up the nydus snapshots when
uninstalling nydus.

Signed-off-by: Fabiano Fidêncio <[email protected]>
@fidencio fidencio force-pushed the topic/enable-nydus-snapshotter-usage-for-pulling-on-the-guest branch 3 times, most recently from db7b913 to 8dbfc63 Compare October 24, 2023 10:53
Nydus snapshotter will be used for replacing the forked containerd,
allowing us to pull the image inside the guest, and this commits is
adding the ability to build it.

Getting it to be used will come later on on this series.

Signed-off-by: ChengyuZhu6 <[email protected]>
Signed-off-by: Fabiano Fidêncio <[email protected]>
@fidencio fidencio force-pushed the topic/enable-nydus-snapshotter-usage-for-pulling-on-the-guest branch from 8dbfc63 to 66e697e Compare October 24, 2023 11:36
This is a release of nydus-snapshotter that has absolutely **ZERO** code
changed, but that adds a tarball for s390x.

This has been done in order to work around the issue we have seen
building the binaries for s390x as part of our CI, at least till
nydus-snapshotter releases a s390x tarball, which is being tracked as
part of containerd/nydus-snapshotter#548
```

The error we're facing is:
22.67 -> unzip
   /go/pkg/mod/cache/download/github.com/freddierice/go-losetup/@v/v0.0.0-20220711213114-2a14873012db.zip:
zip: checksum error
22.69 go: downloading github.com/rs/xid v1.4.0
22.69 go: downloading github.com/containerd/stargz-snapshotter/estargz
   v0.14.3
22.69 go: downloading golang.org/x/net v0.10.0
22.71 go: downloading github.com/docker/distribution v2.8.2+incompatible
22.77 -> unzip
   /go/pkg/mod/cache/download/github.com/rs/xid/@v/v1.4.0.zip: zip:
checksum error
22.83 go: downloading github.com/moby/locker v1.0.1
24.73 pkg/filesystem/stargz_adaptor.go:19:2:
   github.com/KarpelesLab/[email protected]: Get
"https://proxy.golang.org/github.com/%21karpeles%21lab/reflink/@v/v1.0.1.zip":
local error: tls: bad record MAC
```

This is not ideal, but this is "good enough" to unblock this release.

Signed-off-by: Fabiano Fidêncio <[email protected]>
@fidencio fidencio force-pushed the topic/enable-nydus-snapshotter-usage-for-pulling-on-the-guest branch from 66e697e to 4f3a56c Compare October 24, 2023 11:42
@fidencio
Copy link
Member Author

/test

@fidencio fidencio force-pushed the topic/enable-nydus-snapshotter-usage-for-pulling-on-the-guest branch from ee51469 to a062ef1 Compare October 24, 2023 12:47
@fidencio
Copy link
Member Author

/test

1 similar comment
@fidencio
Copy link
Member Author

/test

@fidencio
Copy link
Member Author

/test-kata-qemu-tdx

1 similar comment
@fidencio
Copy link
Member Author

/test-kata-qemu-tdx

Copy link
Member

@wainersm wainersm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@fidencio amazing work as always! Ready to land this IMO.

@fidencio
Copy link
Member Author

I will debug the TDX CI for one day before getting it merged, so if we have enough approvals, Tomorrow at noon, UTC, I will get this one merged regardless of the state of the TDX CI.

Copy link
Member

@fitzthum fitzthum left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tested a bit on my node. I think we might not be uninstalling nydus at all (see comment). Otherwise looks good.

@fidencio fidencio force-pushed the topic/enable-nydus-snapshotter-usage-for-pulling-on-the-guest branch from 1663f90 to 796b637 Compare October 25, 2023 07:45
@fidencio
Copy link
Member Author

/test

Copy link
Member

@fitzthum fitzthum left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

We might see more SEV/SNP failures if other PRs (or the baseline) have run given that they will leave the nodes in a bad state. Should work if we rerun a few times tho.

@fidencio
Copy link
Member Author

We might see more SEV/SNP failures if other PRs (or the baseline) have run given that they will leave the nodes in a bad state. Should work if we rerun a few times tho.

Cool, thanks!

Right now I'm re-working the uninstall as it made the jobs related to uninstall / reinstall break.

@fidencio fidencio force-pushed the topic/enable-nydus-snapshotter-usage-for-pulling-on-the-guest branch from 796b637 to 1adf99d Compare October 25, 2023 18:14
@fidencio
Copy link
Member Author

/test

@fidencio fidencio force-pushed the topic/enable-nydus-snapshotter-usage-for-pulling-on-the-guest branch from 3e5ab7f to a1be903 Compare October 25, 2023 18:56
@fidencio
Copy link
Member Author

/test

fidencio and others added 7 commits October 25, 2023 21:07
The nydus-snapshotter is going to be used in order to replace our
dependency on the forked containerd.

For now, we're not actually removing the forked containerd dependency,
but rather giving the admin the choice to use nydus-snapshotter.

Using the forked containerd will be deprecated as part of our **next**
release.

Signed-off-by: ChengyuZhu6 <[email protected]>
Signed-off-by: Fabiano Fidêncio <[email protected]>
This is done for the kata-containers based payloads. Enclave CC will
still keep using the forked containerd

Signed-off-by: ChengyuZhu6 <[email protected]>
Signed-off-by: Fabiano Fidêncio <[email protected]>
The condition to set the snapshotter to be used is defintely wrong, as
it'd never ever set the snapshotter variable.

Signed-off-by: Fabiano Fidêncio <[email protected]>
…ging`

```
panic: odd number of arguments passed as key-value pairs for logging [recovered]
        panic: odd number of arguments passed as key-value pairs for logging
...
github.com/confidential-containers/operator/controllers.(*CcRuntimeReconciler).monitorCcRuntimeInstallation(0xc0002d4190)
        /workspace/controllers/ccruntime_controller.go:473 +0x3bf
```

Signed-off-by: Huang Huang <[email protected]>
On operator_tests.bats's teardown let's print the description of all
pods in the confidential-containers namespace to help on debug fails.

Signed-off-by: Wainer dos Santos Moschetta <[email protected]>
This should be treated in a better way than adding an one minute sleep,
as just having the sleep here won't do us any good in the future.

What's basically happening, and forcing us to do this, is the fact that
the Uninstall and postUninstall daemonsets are being started at exactly
the same time, leading to a race condition when changing the containerd
configuration.

When looking at the kata-containers payload code, we see that the the
label is only set after containerd is successfully reconfigured, and
looking at this function we see we shouldn't reach this part before the
label is set.  However, that's not what we're facing ...

In order to unblock this PR to get merged, we're good enough.  But this
needs proper investigation and hopefully we'll see this patch being
reverted sooner than later.

Signed-off-by: Fabiano Fidêncio <[email protected]>
@fidencio fidencio force-pushed the topic/enable-nydus-snapshotter-usage-for-pulling-on-the-guest branch from a1be903 to 619eb49 Compare October 25, 2023 19:07
@fidencio
Copy link
Member Author

/test

1 similar comment
@fidencio
Copy link
Member Author

/test

@@ -226,6 +226,20 @@ func (r *CcRuntimeReconciler) processCcRuntimeDeleteRequest() (ctrl.Result, erro
if r.ccRuntime.Spec.Config.PostUninstall.Image == "" {
controllerutil.RemoveFinalizer(r.ccRuntime, RuntimeConfigFinalizer)
} else if r.ccRuntime.Spec.Config.PostUninstall.Image != "" {
// FXIME: This should be treated in a better way, as just having the sleep
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: FXIME-> FIXME
This looks reasonable. We'll focus on a proper fix in the next release

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will fix the fixme in the next iteration, just in order to save some trees and unblock peer-pods folks. :-)

Copy link
Member

@bpradipt bpradipt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm
Thanks @fidencio

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants