Integrate k3s and multus
Signed-off-by: Manuel Buil <[email protected]>
manuelbuil committed Apr 15, 2024
1 parent 1fe0371 commit 70189b8
Showing 13 changed files with 414 additions and 9 deletions.
54 changes: 54 additions & 0 deletions docs/adrs/multus.md
@@ -0,0 +1,54 @@
# Record architecture decisions

Date: 2024-04-15

## Status

Dismissed

## Context

### Multus

Multus is a CNI multiplexer that allows pods to have multiple network interfaces. Some users already run K3s with Multus, but it is not obvious how to configure it to work with K3s or how to add the additional pieces it needs (e.g. IPAM or additional CNI plugins). We could make this easier by creating an integration with Multus.

We will wait before including whereabouts. That project uses very old dependencies, which would bring in CVEs.

### Design suggestion

Add multus to the k3s-charts repo. That multus chart will consume the tarball we generate in rke2-charts, i.e. both rke2 and k3s will use the same chart with minimal diffs (e.g. the Chart name will be k3s-multus instead of rke2-multus).

Then, multus will be consumed in the same way as traefik:
* The chart gets downloaded with `make download`
* The chart tarball gets embedded in the k3s binary with `go generate` and included in `pkg/static/zz_generated_bindata.go`
* The HelmChart manifest pointing to the chart tarball gets embedded in the k3s binary with `go generate` and included in `pkg/deploy/zz_generated_bindata.go`

K3s will include a new `--multus` boolean flag. When that flag is true, the HelmChart manifest that installs multus is left in place; otherwise the manifest is skipped.
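
A minimal sketch of the flag semantics described above. The `controlConfig` type and both helper functions are hypothetical stand-ins for illustration, not the real K3s types; only the handling of the `Skips` and `Disables` maps follows what this commit does in `pkg/cli/server/server.go`.

```go
package main

import "fmt"

// controlConfig is a hypothetical, trimmed-down stand-in for the K3s
// control configuration. Only the fields needed for the sketch are kept.
type controlConfig struct {
	Multus   bool
	Skips    map[string]bool
	Disables map[string]bool
}

// newControlConfig is a hypothetical helper that initialises the maps.
func newControlConfig(multus bool) *controlConfig {
	return &controlConfig{
		Multus:   multus,
		Skips:    map[string]bool{},
		Disables: map[string]bool{},
	}
}

// applyMultusFlag marks the multus manifest as skipped and disabled
// unless the flag was set, so the default behaviour is "not installed".
func applyMultusFlag(c *controlConfig) {
	if !c.Multus {
		c.Skips["multus"] = true
		c.Disables["multus"] = true
	}
}

func main() {
	for _, enabled := range []bool{false, true} {
		c := newControlConfig(enabled)
		applyMultusFlag(c)
		fmt.Printf("--multus=%v -> manifest skipped: %v\n", enabled, c.Skips["multus"])
	}
}
```

Because the feature is opt-in, the manifest is skipped by default and only kept when the flag is explicitly enabled.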

The multus chart will install a daemonset that:
* deploys the necessary binaries (multus and common CNI plugins) in each node
* generates the correct CNI plugin configuration
* installs the required CRDs

It is unfortunate that the daemonset stays idle forever after doing its job instead of exiting, but the alternatives are worse.

## Alternatives

* K3s creates a job that picks the multus and whereabouts CNI plugins from the `image-build-cni-plugins` and copies them to each node. However, configuring jobs to run on every node is not easy and is very error-prone, so we rejected this idea.

* K3s includes the multus and whereabouts CNI plugins as part of its multi-exec cni binary. However, the whereabouts binary uses very old dependencies, which would bring in CVEs. Moreover, the size of the K3s binary would increase by more than 10%, which is not acceptable for something that the vast majority of K3s users will not enable.

### Limitations

The multus and cni-plugins images do not support the ARM architecture, so that architecture is not supported in this first release.

### Airgap

We are creating a different tarball that includes the multus images:
* docker.io/rancher/hardened-multus-cni
* docker.io/rancher/hardened-cni-plugins
* docker.io/rancher/mirrored-library-busybox
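
As a rough illustration of the packaging rule, the sketch below lists which airgap tarballs would be produced for a given architecture. The function name `airgapTarballs` is hypothetical; the tarball name prefixes and the `arm` exclusion mirror the check in `scripts/package-airgap` from this commit.

```go
package main

import "fmt"

// airgapTarballs returns the base names of the airgap tarballs built for
// the given architecture. The multus tarball is left out on "arm",
// mirroring the `${ARCH} != arm` check in scripts/package-airgap.
func airgapTarballs(arch string) []string {
	names := []string{"k3s-airgap-images-" + arch}
	if arch != "arm" {
		names = append(names, "multus-airgap-images-"+arch)
	}
	return names
}

func main() {
	for _, arch := range []string{"amd64", "arm64", "arm"} {
		fmt.Println(arch, airgapTarballs(arch))
	}
}
```

Keeping the multus images in their own tarball means users who never enable the flag do not have to download them.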

## Decision

The decision was to dismiss this idea because it turned out to be too complicated. The main problem is that, because we install Multus using a helm chart, we need to specify the directory where the CNI binaries are (data-dir), and it must be the same on all nodes. This is complicated because the path depends on the K3s build, so different K3s versions will almost certainly have different paths. As a consequence, we could only use this feature in homogeneous clusters. We could make the CNI binaries not depend on data-dir, as we do when not deploying Flannel, but that changes typical K3s deployments and might create extra problems. As this PR was meant to reduce users' problems, creating different ones does not really help.
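
The data-dir problem can be sketched as follows. This is an illustration under stated assumptions, not the K3s implementation: `renderManifest` and `cniBinDir` are hypothetical helpers, and the two build hashes are made up. It shows that a `binDir` value rendered on the server from its own binary location does not match the path on a node running a different K3s build.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// renderManifest replaces every placeholder key (for example
// "%{DATA_DIR}%") in the manifest with its value.
func renderManifest(manifest string, vars map[string]string) string {
	for key, value := range vars {
		manifest = strings.ReplaceAll(manifest, key, value)
	}
	return manifest
}

// cniBinDir derives the CNI binary directory from the location of a CNI
// binary such as host-local. path.Dir is used because the sketch only
// handles slash-separated paths.
func cniBinDir(hostLocalPath string) string {
	return path.Dir(hostLocalPath)
}

func main() {
	const manifest = "binDir: %{DATA_DIR}%"

	// Hypothetical build hashes: each K3s build unpacks into its own directory.
	serverBin := cniBinDir("/var/lib/rancher/k3s/data/aaaa1111/bin/host-local")
	agentBin := cniBinDir("/var/lib/rancher/k3s/data/bbbb2222/bin/host-local")

	// The manifest is rendered once, on the server, with the server's path.
	fmt.Println(renderManifest(manifest, map[string]string{"%{DATA_DIR}%": serverBin}))
	fmt.Println("agent binaries live in:", agentBin)
	fmt.Println("paths match:", serverBin == agentBin)
}
```

Since the chart values are rendered once for the whole cluster, any node whose build hash differs from the server's would be pointed at a directory that does not exist on it.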
14 changes: 14 additions & 0 deletions manifests/multus.yaml
@@ -0,0 +1,14 @@
---
apiVersion: helm.cattle.io/v1
kind: HelmChart
metadata:
  name: multus
  namespace: kube-system
spec:
  chart: https://%{KUBERNETES_API}%/static/charts/multus-4.0.201+upv4.0.2-build2024020802.tgz
  valuesContent: |-
    config:
      cni_conf:
        confDir: /var/lib/rancher/k3s/agent/etc/cni/net.d
        binDir: %{DATA_DIR}%
        kubeconfig: /var/lib/rancher/k3s/agent/etc/cni/net.d/multus.d/multus.kubeconfig
6 changes: 6 additions & 0 deletions pkg/cli/cmds/server.go
@@ -105,6 +105,7 @@ type Server struct {
	EtcdS3Folder       string
	EtcdS3Timeout      time.Duration
	EtcdS3Insecure     bool
	Multus             bool
	ServiceLBNamespace string
}

@@ -489,6 +490,11 @@ var ServerFlags = []cli.Flag{
		Usage:       "(experimental/components) Enable embedded distributed container registry; requires use of embedded containerd",
		Destination: &ServerConfig.EmbeddedRegistry,
	},
	&cli.BoolFlag{
		Name:        "multus",
		Usage:       "(experimental/networking) Enable multus",
		Destination: &ServerConfig.Multus,
	},
	NodeNameFlag,
	WithNodeIDFlag,
	NodeLabels,
6 changes: 6 additions & 0 deletions pkg/cli/server/server.go
@@ -174,6 +174,7 @@ func run(app *cli.Context, cfg *cmds.Server, leaderControllers server.CustomCont
	serverConfig.ControlConfig.EncryptSecrets = cfg.EncryptSecrets
	serverConfig.ControlConfig.EtcdExposeMetrics = cfg.EtcdExposeMetrics
	serverConfig.ControlConfig.EtcdDisableSnapshots = cfg.EtcdDisableSnapshots
	serverConfig.ControlConfig.Multus = cfg.Multus
	serverConfig.ControlConfig.VLevel = cmds.LogConfig.VLevel
	serverConfig.ControlConfig.VModule = cmds.LogConfig.VModule

@@ -397,6 +398,11 @@ func run(app *cli.Context, cfg *cmds.Server, leaderControllers server.CustomCont
		serverConfig.ControlConfig.Disables["ccm"] = true
	}

	if !serverConfig.ControlConfig.Multus {
		serverConfig.ControlConfig.Skips["multus"] = true
		serverConfig.ControlConfig.Disables["multus"] = true
	}

	tlsMinVersionArg := getArgValueFromList("tls-min-version", serverConfig.ControlConfig.ExtraAPIArgs)
	serverConfig.ControlConfig.TLSMinVersion, err = kubeapiserverflag.TLSVersion(tlsMinVersionArg)
	if err != nil {
1 change: 1 addition & 0 deletions pkg/daemons/config/types.go
@@ -157,6 +157,7 @@ type CriticalControlArgs struct {
	FlannelIPv6Masq    bool         `cli:"flannel-ipv6-masq"`
	FlannelExternalIP  bool         `cli:"flannel-external-ip"`
	EgressSelectorMode string       `cli:"egress-selector-mode"`
	Multus             bool         `cli:"multus"`
	ServiceIPRange     *net.IPNet   `cli:"service-cidr"`
	ServiceIPRanges    []*net.IPNet `cli:"service-cidr"`
}
23 changes: 23 additions & 0 deletions pkg/deploy/zz_generated_bindata.go

Some generated files are not rendered by default.

9 changes: 9 additions & 0 deletions pkg/server/server.go
@@ -4,6 +4,7 @@ import (
	"context"
	"fmt"
	"os"
	"os/exec"
	"path"
	"path/filepath"
	"runtime/debug"
@@ -270,6 +271,13 @@ func stageFiles(ctx context.Context, sc *Context, controlConfig *config.Control)
		dnsIPFamilyPolicy = "RequireDualStack"
	}

	// Find the /var/lib/rancher/k3s/data/${SHA}/bin/ directory. Same procedure we use in pkg/agent/config/config.go
	hostLocal, err := exec.LookPath("host-local")
	if err != nil {
		return errors.Wrap(err, "failed to find host-local")
	}
	CNIBinDir := filepath.Dir(hostLocal)

	templateVars := map[string]string{
		"%{CLUSTER_DNS}%":      controlConfig.ClusterDNS.String(),
		"%{CLUSTER_DNS_LIST}%": fmt.Sprintf("[%s]", util.JoinIPs(controlConfig.ClusterDNSs)),
@@ -279,6 +287,7 @@ func stageFiles(ctx context.Context, sc *Context, controlConfig *config.Control)
		"%{SYSTEM_DEFAULT_REGISTRY}%":     registryTemplate(controlConfig.SystemDefaultRegistry),
		"%{SYSTEM_DEFAULT_REGISTRY_RAW}%": controlConfig.SystemDefaultRegistry,
		"%{PREFERRED_ADDRESS_TYPES}%":     addrTypesPrioTemplate(controlConfig.FlannelExternalIP),
		"%{DATA_DIR}%":                    CNIBinDir,
	}

	skip := controlConfig.Skips
31 changes: 27 additions & 4 deletions pkg/static/zz_generated_bindata.go

Large diffs are not rendered by default.

3 changes: 3 additions & 0 deletions scripts/airgap/image-list-multus.txt
@@ -0,0 +1,3 @@
docker.io/rancher/hardened-multus-cni:v4.0.2-build20240208
docker.io/rancher/hardened-cni-plugins:v1.4.0-build20240122
docker.io/rancher/mirrored-library-busybox:1.36.1
21 changes: 16 additions & 5 deletions scripts/package-airgap
@@ -5,12 +5,23 @@ cd $(dirname $0)/..

. ./scripts/version.sh

function createTarball() {
    images=$(cat "${1}")
    xargs -n1 docker pull <<< "${images}"
    docker save ${images} -o dist/artifacts/${2}-${ARCH}.tar
    zstd --no-progress -T0 -16 -f --long=25 dist/artifacts/${2}-${ARCH}.tar -o dist/artifacts/${2}-${ARCH}.tar.zst
    pigz -v -c dist/artifacts/${2}-${ARCH}.tar > dist/artifacts/${2}-${ARCH}.tar.gz
}


airgap_image_file='scripts/airgap/image-list.txt'
images=$(cat "${airgap_image_file}")
xargs -n1 docker pull <<< "${images}"
docker save ${images} -o dist/artifacts/k3s-airgap-images-${ARCH}.tar
zstd --no-progress -T0 -16 -f --long=25 dist/artifacts/k3s-airgap-images-${ARCH}.tar -o dist/artifacts/k3s-airgap-images-${ARCH}.tar.zst
pigz -v -c dist/artifacts/k3s-airgap-images-${ARCH}.tar > dist/artifacts/k3s-airgap-images-${ARCH}.tar.gz
multus_airgap_image_file='scripts/airgap/image-list-multus.txt'
createTarball ${airgap_image_file} "k3s-airgap-images"
# multus and cni-plugins image do not support arm yet
if [ ${ARCH} != arm ]; then
    createTarball ${multus_airgap_image_file} "multus-airgap-images"
fi
if [ ${ARCH} = amd64 ]; then
    cp "${airgap_image_file}" dist/artifacts/k3s-images.txt
    cp "${multus_airgap_image_file}" dist/artifacts/multus-images.txt
fi
74 changes: 74 additions & 0 deletions tests/e2e/amd64_resource_files/multus-pods.yaml
@@ -0,0 +1,74 @@
apiVersion: "k8s.cni.cncf.io/v1"
kind: NetworkAttachmentDefinition
metadata:
  name: macvlan-conf
spec:
  config: '{
      "cniVersion": "0.3.1",
      "plugins": [
        {
          "type": "macvlan",
          "capabilities": { "ips": true },
          "master": "eth1",
          "mode": "bridge",
          "ipam": {
            "type": "static",
            "routes": [
              {
                "dst": "0.0.0.0/0",
                "gw": "10.1.1.1"
              }
            ]
          }
        }, {
          "capabilities": { "mac": true },
          "type": "tuning"
        }
      ]
    }'
---

apiVersion: v1
kind: Pod
metadata:
  labels:
    app: pod-macvlan
  name: pod-macvlan
  annotations:
    k8s.v1.cni.cncf.io/networks: '[
            { "name": "macvlan-conf",
              "ips": [ "10.1.1.101/24" ],
              "mac": "c2:b0:57:49:47:f1",
              "gateway": [ "10.1.1.1" ]
            }]'
spec:
  containers:
  - image: praqma/network-multitool
    imagePullPolicy: Always
    name: multitool
    securityContext:
      capabilities:
        add: ["NET_ADMIN","NET_RAW"]
---

apiVersion: v1
kind: Pod
metadata:
  labels:
    app: pod2-macvlan
  name: pod2-macvlan
  annotations:
    k8s.v1.cni.cncf.io/networks: '[
            { "name": "macvlan-conf",
              "ips": [ "10.1.1.102/24" ],
              "mac": "c2:b0:57:45:47:f1",
              "gateway": [ "10.1.1.1" ]
            }]'
spec:
  containers:
  - image: praqma/network-multitool
    imagePullPolicy: Always
    name: multitool
    securityContext:
      capabilities:
        add: ["NET_ADMIN","NET_RAW"]
84 changes: 84 additions & 0 deletions tests/e2e/multus/Vagrantfile
@@ -0,0 +1,84 @@
ENV['VAGRANT_NO_PARALLEL'] = 'no'
NODE_ROLES = (ENV['E2E_NODE_ROLES'] ||
  ["server-0", "agent-0" ])
NODE_BOXES = (ENV['E2E_NODE_BOXES'] ||
  ['generic/ubuntu2004', 'generic/ubuntu2004'])
GITHUB_BRANCH = (ENV['E2E_GITHUB_BRANCH'] || "master")
RELEASE_VERSION = (ENV['E2E_RELEASE_VERSION'] || "")
GOCOVER = (ENV['E2E_GOCOVER'] || "")
NODE_CPUS = (ENV['E2E_NODE_CPUS'] || 2).to_i
NODE_MEMORY = (ENV['E2E_NODE_MEMORY'] || 2048).to_i
# This key must be created using tailscale web
TAILSCALE_KEY = (ENV['E2E_TAILSCALE_KEY'] || "")
NETWORK4_PREFIX = "10.10.10"
install_type = ""

def provision(vm, roles, role_num, node_num)
  vm.box = NODE_BOXES[node_num]
  vm.hostname = "#{roles[0]}-#{role_num}"
  node_ip4 = "#{NETWORK4_PREFIX}.#{100+node_num}"
  vm.network "private_network", ip: node_ip4, netmask: "255.255.255.0"

  # Dir.exist? and File.exist? replace the exists? aliases removed in Ruby 3.2
  scripts_location = Dir.exist?("./scripts") ? "./scripts" : "../scripts"
  vagrant_defaults = File.exist?("./vagrantdefaults.rb") ? "./vagrantdefaults.rb" : "../vagrantdefaults.rb"
  load vagrant_defaults

  defaultOSConfigure(vm)
  addCoverageDir(vm, roles, GOCOVER)
  install_type = getInstallType(vm, RELEASE_VERSION, GITHUB_BRANCH)

  vm.provision "Ping Check", type: "shell", inline: "ping -c 2 k3s.io"

  if roles.include?("server") && role_num == 0
    server_IP = nil
    vm.provision :k3s, run: 'once' do |k3s|
      k3s.config_mode = '0644' # side-step https://github.com/k3s-io/k3s/issues/4321
      k3s.args = "server "
      k3s.config = <<~YAML
        cluster-init: true
        node-ip: #{node_ip4}
        token: vagrant
        multus: true
      YAML
      k3s.env = ["K3S_KUBECONFIG_MODE=0644", install_type]
    end
  end
  if roles.include?("agent")
    vm.provision :k3s, run: 'once' do |k3s|
      k3s.config_mode = '0644' # side-step https://github.com/k3s-io/k3s/issues/4321
      k3s.args = "agent "
      k3s.config = <<~YAML
        server: https://#{NETWORK4_PREFIX}.100:6443
        node-ip: #{node_ip4}
        token: vagrant
      YAML
      k3s.env = ["K3S_KUBECONFIG_MODE=0644", install_type]
    end
  end
end

Vagrant.configure("2") do |config|
  config.vagrant.plugins = ["vagrant-k3s", "vagrant-reload", "vagrant-libvirt", "vagrant-scp"]
  config.vm.provider "libvirt" do |v|
    v.cpus = NODE_CPUS
    v.memory = NODE_MEMORY
  end

  if NODE_ROLES.kind_of?(String)
    NODE_ROLES = NODE_ROLES.split(" ", -1)
  end
  if NODE_BOXES.kind_of?(String)
    NODE_BOXES = NODE_BOXES.split(" ", -1)
  end

  # Must iterate on the index, vagrant does not understand iterating
  # over the node roles themselves
  NODE_ROLES.length.times do |i|
    name = NODE_ROLES[i]
    config.vm.define name do |node|
      roles = name.split("-", -1)
      role_num = roles.pop.to_i
      provision(node.vm, roles, role_num, i)
    end
  end
end