RFC: enclave-cc improvement ideas #235

mythi · 2023-09-29T09:46:19Z

I gave an enclave-cc Q3 '23 update and talked about improvement ideas in the Sept 14 community call (the last ~18 minutes or so).

I'll summarize the improvement ideas here so that we can follow up with a detailed discussion.

The existing enclave-cc challenges mirror the "Kata-CC" challenges:

Image pulling on the "host" to be able to share layers between pods/containers
use of forked containerd for the "image service" functionality

Additionally, the enclave-cc implementation is sub-optimal with non-encrypted containers since it uses Occlum encrypted sefs even when that isn't strictly necessary.

So far we have successfully demonstrated that unmodified containers can be made libOS aware by mounting a "libOS layer" during runtime. I'm calling this "overlayos". :-) With the existing setup we are also using CoCo KBS/KBC to attest enclave-agent to get access to layer encryption secrets.

In today's architecture, enclave-agent is responsible for preparing the application image "bundle" and shim-rune is responsible for creating the overlayos mount of the application image + "Occlum libOS layer" (installed on each SGX-enabled host by the CoCo operator).
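To make the "overlayos" layering concrete, here is a minimal sketch of how the mount options could be assembled, assuming the mount is an ordinary overlayfs mount with the libOS layer as the top-most lower directory. The function name and all paths are hypothetical and this is not taken from shim-rune.

```python
from typing import Sequence


def overlayos_mount_options(
    libos_layer: str,
    image_layers: Sequence[str],
    upperdir: str,
    workdir: str,
) -> str:
    """Build an overlayfs option string with the libOS layer stacked on top.

    image_layers must be ordered top-most first, which is the order
    overlayfs expects for its colon-separated lowerdir list.
    """
    # The libOS layer goes first so its files shadow same-named image files.
    lower = ":".join([libos_layer, *image_layers])
    return f"lowerdir={lower},upperdir={upperdir},workdir={workdir}"


# Hypothetical paths, for illustration only.
options = overlayos_mount_options(
    "/opt/libos-layer",
    ["/layers/app-2", "/layers/app-1"],
    "/run/pod/upper",
    "/run/pod/work",
)
```

The resulting string is what would be passed as the data argument of an overlay mount, for example via `mount -t overlay overlay -o <options> <target>`.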

From Kata-CC we have learned that image layers can be processed by a containerd "remote snapshotter". It's also possible to make the snapshotter configuration per runtime handler.

Proposal 1. - encrypted containers without a forked containerd

a. move per pod enclave-agent to a per node snapshotter run in an Occlum TEE
b. drop existing shim-rune (the Occlum TEE snapshotter provides mounts info to shim-runc to process)
c. #18 can be mitigated by socat to translate /tmp/tee-snapshotter.sock to a format Occlum understands
d. explore if Occlum hostfs mounts can be leveraged to store unpacked layers for improved layer sharing

Proposal 2. - unencrypted containers with user-defined policies + Gramine

To address #73, the problem of intermediate file encryption when non-encrypted images are processed, and the layer sharing problems, a slightly different approach is proposed. It still follows the "overlayos" approach but by using Gramine libOS and its built-in policy/manifest mechanism.

a. have a non-TEE snapshotter that handles image layers the same way as the containerd built-in overlayfs snapshotter but adds a Gramine libOS layer on top (installed on each host by the CoCo operator)
b. the user-provided manifest + enclave signature is either bind mounted from a configMap or pulled by the snapshotter from an OCI v1.1 enabled registry (with OCI reference types it's possible to build a link between the manifest + enclave signature blob and the image they are for).
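As a rough illustration of the OCI v1.1 link mentioned in (b), the sketch below builds a referrer manifest whose `subject` field points at the application image manifest. The `artifactType` and layer media types are made-up placeholders rather than registered types, and the function names are hypothetical. Only the overall manifest shape (`subject`, `artifactType`, the empty config descriptor) follows the OCI image spec v1.1.

```python
import hashlib

OCI_MANIFEST = "application/vnd.oci.image.manifest.v1+json"
OCI_EMPTY = "application/vnd.oci.empty.v1+json"


def descriptor(media_type: str, blob: bytes) -> dict:
    """Return an OCI content descriptor for the given blob."""
    return {
        "mediaType": media_type,
        "digest": "sha256:" + hashlib.sha256(blob).hexdigest(),
        "size": len(blob),
    }


def referrer_manifest(image_manifest: bytes, gramine_manifest: bytes,
                      enclave_sig: bytes) -> dict:
    """Build a manifest that attaches the two blobs to an existing image."""
    return {
        "schemaVersion": 2,
        "mediaType": OCI_MANIFEST,
        # Placeholder artifact type, not a registered media type.
        "artifactType": "application/vnd.example.gramine.bundle.v1",
        # OCI v1.1 artifacts without a real config use the empty JSON blob.
        "config": descriptor(OCI_EMPTY, b"{}"),
        "layers": [
            descriptor("application/vnd.example.gramine.manifest.v1", gramine_manifest),
            descriptor("application/vnd.example.gramine.sig.v1", enclave_sig),
        ],
        # The subject descriptor is what ties this artifact to the image.
        "subject": descriptor(OCI_MANIFEST, image_manifest),
    }
```

A snapshotter could then discover such artifacts for a given image digest through the registry's referrers API (`GET /v2/<name>/referrers/<digest>`) instead of relying on a naming convention.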

The user-provided manifest approach addresses the same needs as the "container metadata validation" already approved for Kata-CC.

Summary:

Proposal 1 meets the needs of encrypted container use cases and is enabled through its own runtimeClass.
In parallel with (and complementary to) Proposal 1, Proposal 2 is added as its own runtimeClass to enable finer-grained TCB controls and better layer sharing.

Extras:
The two proposals keep enclave-cc compatible with the CoCo goal that unmodified containers can run in TEEs. To further tighten the link between Proposal 2 and the use of CoCo components, I've built a demonstrator showing how Proposal 2 uses the CoCo KBS protocol and KBS to enable k8s sealed secrets for enclave-cc:

# experimental(!) https://github.com/mythi/kbs/tree/gramine-preload-lib

loader.env.LD_PRELOAD = "libsecret_prov.so"
loader.env.SECRET_PROVISION_CONSTRUCTOR = "1"
loader.env.SECRET_PROVISION_SET_KEY = "gramine/encrypted-files/wrap-key"
loader.env.SECRET_PROVISION_CA_CHAIN_PATH = "/ca.crt"
loader.env.SECRET_PROVISION_SERVERS = "https://localhost:8080/"
…
fs.mounts = [
  …,
  { type = "encrypted", path = "/inside/enclave/path", uri = "file:/inside/container/from/kubelet", key_name = "wrap-key" },
]
…


dcmiddle · 2023-10-02T16:23:32Z

cc: @mikbras @monavij @fitzthum

fitzthum · 2023-10-03T16:54:19Z

Hm. I don't have the full picture, but I think you're right that the second proposal is a bit like the nydus approach that we are adding elsewhere in the project. The nydus snapshotter is untrusted and we rely on some additional artifacts for verification/decryption.

The first proposal is very interesting. If the image management runs at a node level, it would be shared between different pods or maybe even different clients. This seems a little weird, but I guess the idea is that each pod would be able to verify the evidence of the image management enclave and the API would be designed in a way to limit privilege. I wonder if this approach could make sense for other runtime classes as well? I'm not sure how the overhead would compare to the untrusted snapshotter ideas.

jiangliu · 2023-10-07T08:08:28Z

I'm not familiar with enclave-cc architecture, but the proposal seems reasonable.

Proposal 1. - encrypted containers without a forked containerd

a. move per pod enclave-agent to a per node snapshotter run in an Occlum TEE
b. drop existing shim-rune (the Occlum TEE snapshotter provides mounts info to shim-runc to process)
c. https://github.com/confidential-containers/enclave-cc/issues/18 can be mitigated by socat to translate /tmp/tee-snapshotter.sock to a format Occlum understands
d. explore if Occlum hostfs mounts can be leveraged to store unpacked layers for improved layer sharing

Proposal 1 actually provides a node-level filesystem server with encryption for enclave-cc.
For a normal FUSE-based snapshotter, the filesystem is provisioned by the remote snapshotter through FUSE.
For enclave-cc, we just get rid of the FUSE part and build another communication channel between the encrypting file server and the libOS layer.

mythi · 2023-10-12T05:32:16Z

maybe even different clients. This seems a little weird, but I guess the idea is that each pod would be able to verify the evidence of the image management enclave and the API would be designed in a way to limit privilege. I wonder if this approach could make

The current idea is to use sealing (#149). Originally we talked about some LA-attested channel between the two but setting this up between a snapshotter and a pod is tricky. Moreover, SGX provides a flexible mechanism to protect data between enclaves signed by the same author.

With the snapshotter (run in an enclave), runtime metadata (entrypoint, envs) and the per image encrypted fs key can be sealed. The per image "init" can be trusted to use that information only and start the app using the runtime metadata provided.

The open question is how to trust this runtime metadata, but at least it could use the image's config. Perhaps KBS get-resource would also work.

mythi mentioned this issue May 2, 2024

remove reqs-deploy payload confidential-containers/operator#369

Open
