Add ADR about failed multus integration #9434
base: master
Conversation
LGTM. I usually just write it up in the final format the ADR will be accepted in, so it's marked as Approved, with the Design Suggestion section written as if it were the Decision... that way people can approve, and you don't need to update it again and get another round of approvals just for minor rewording.
docs/adrs/multus.md
Outdated
The suggestion is to include a new `--multus` boolean flag. When that flag is activated, we deploy a Kubernetes job that will run on that node and will:
* Run the [cni-plugins image](https://github.com/rancher/image-build-cni-plugins), which deploys on the node all the widely used CNI binaries (e.g. macvlan), the multus binary, and the whereabouts binary
* Install the required multus CRD `network-attachment-definitions.k8s.cni.cncf.io` and the whereabouts CRDs `ippools.whereabouts.cni.cncf.io` & `overlappingrangeipreservations.whereabouts.cni.cncf.io`
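For illustration only, the per-node install job the ADR describes might look roughly like this sketch; the names, image reference, and host paths below are all assumptions, not the actual manifest:

```yaml
# Hypothetical sketch of the proposed install job; image and paths are placeholders.
apiVersion: batch/v1
kind: Job
metadata:
  name: multus-install
  namespace: kube-system
spec:
  template:
    spec:
      hostNetwork: true
      restartPolicy: OnFailure
      containers:
        - name: cni-plugins
          image: example/cni-plugins:latest   # placeholder for the rancher cni-plugins build
          volumeMounts:
            - name: cni-bin
              mountPath: /host/opt/cni/bin    # target path is an assumption
      volumes:
        - name: cni-bin
          hostPath:
            path: /opt/cni/bin
```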
I would probably do like we do with disabled components in RKE2, and ship these as a packaged manifest, and then just add them to the `--disable` list if `--multus` isn't set.
docs/adrs/multus.md
Outdated
* Create the appropriate Multus CNI config file and place it in the right path
* Start the whereabouts controller

These tasks are more or less what the upstream Multus and whereabouts daemonsets do but, at least for Multus, it seems overkill to have a daemonset running forever just to deploy the binaries and create the correct multus CNI config. A job that runs the `cni-plugins` image once the node comes up seems like a better solution.
I'm not sure a Job is the best way to do this, given that the process for assigning job pods to specific nodes is a bit clumsy. We would need to reinvent much of the logic around creating and managing job-node assignment, and this logic already exists in the daemonset controller. I think we should either drop the required files directly from the k3s supervisor process, or use a daemonset to ensure that the install pod gets run on every node.
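As a rough sketch of the daemonset alternative mentioned here (names and image are hypothetical): a DaemonSet gets one install pod on every node without reimplementing any node-assignment logic:

```yaml
# Hypothetical installer DaemonSet; name and image are placeholders.
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: multus-install
  namespace: kube-system
spec:
  selector:
    matchLabels: {app: multus-install}
  template:
    metadata:
      labels: {app: multus-install}
    spec:
      hostNetwork: true
      containers:
        - name: install
          image: example/cni-plugins:latest  # placeholder
```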
If it is too clumsy, then perhaps it is best for k3s to do that job itself. But I don't want to increase the size of k3s by adding too many binaries that most people won't use... I'm starting to wonder if the idea of the user who suggested adding "macvlan" to the CNI multi-exec might be the best one :D
macvlan + multus + whereabouts +.... I'll have fun in the next days
Adding multus+whereabouts to the multi-exec is not worth it:
- The Multus binary increases the size of k3s significantly. Given that only a handful of users will benefit from this, I don't think it makes sense
- The whereabouts binary has very old dependencies. That project actually looks somewhat unmaintained. In any case, it would pull in a lot of CVEs and we don't want that
Open question: do we want to add the multus images to the core k3s airgap image list, or should we have an optional image tarball, like we do for rke2?
Good point; the latter probably makes more sense. What would the disadvantages be?
The disadvantage of adding an optional tarball is that we have to update build scripts, QA processes, and documentation to handle optional tarballs. We have never had more than one for k3s. How big are the additional images?
Yeah. To minimize that pain, I'd keep the same name for the current tarball, `k3s-airgap-images-${ARCH}.tar`, and the images file, `k3s-images.txt`.
Codecov Report — Attention: Patch coverage is
@@ Coverage Diff @@
## master #9434 +/- ##
==========================================
- Coverage 48.66% 43.77% -4.89%
==========================================
Files 158 151 -7
Lines 14017 13580 -437
==========================================
- Hits 6821 5945 -876
- Misses 5906 6416 +510
+ Partials 1290 1219 -71
the k3s-multus chart has the same amd64 nodeSelector issue as the rke2 chart. We should fix that before we add the chart to k3s, since we do fully support arm64.
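A sketch of the kind of fix meant here (the exact field placement depends on the actual chart values): instead of pinning the pods to amd64, match every architecture k3s supports:

```yaml
# Hypothetical values fragment: schedule on both supported architectures
# rather than hardcoding kubernetes.io/arch: amd64.
affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
        - matchExpressions:
            - key: kubernetes.io/arch
              operator: In
              values: ["amd64", "arm64"]
```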
This PR is postponed for the April release
pkg/server/server.go
Outdated
@@ -279,6 +287,7 @@ func stageFiles(ctx context.Context, sc *Context, controlConfig *config.Control)
	"%{SYSTEM_DEFAULT_REGISTRY}%": registryTemplate(controlConfig.SystemDefaultRegistry),
	"%{SYSTEM_DEFAULT_REGISTRY_RAW}%": controlConfig.SystemDefaultRegistry,
	"%{PREFERRED_ADDRESS_TYPES}%": addrTypesPrioTemplate(controlConfig.FlannelExternalIP),
	"%{DATA_DIR}%": CNIBinDir,
Let's not call this `DATA_DIR` - this could be confusing, as normally that would refer to the value of the `--data-dir` flag.
"%{DATA_DIR}%": CNIBinDir, | |
"%{CNI_BIN_DIR}%": CNIBinDir, |
There are other problems with injecting the server's CNI bin dir path into the chart like this though:
- CNI bin dir is version dependent. The value will change whenever the server is upgraded, and will only work if all nodes are on the exact same build of k3s:
brandond@seago:~$ grep bin_dir /var/lib/rancher/k3s/agent/etc/containerd/config.toml
bin_dir = "/var/lib/rancher/k3s/data/0bd989dbe82165818ee98da4cc70b043adabccc23827c9bbd24b1460d4f3f2e1/bin"
- CNI bin dir is prefixed by --data-dir, which may be set to different values on different hosts.
You're going to need to find some way to set this on a per-node basis, instead of injecting it via the chart. You could MAYBE get away with hardcoding it as `/var/lib/rancher/k3s/data/current`, but this only fixes problem 1, not problem 2.
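One possible per-node approach (a sketch under assumptions, not the agreed design) is to have something on the node resolve `bin_dir` out of containerd's rendered config instead of templating it into the chart. With a hypothetical sample value inlined so the snippet is self-contained:

```shell
# Sketch: parse bin_dir out of a containerd config fragment on the node.
# The sample line is hypothetical; the real file lives at
# /var/lib/rancher/k3s/agent/etc/containerd/config.toml.
sample='bin_dir = "/var/lib/rancher/k3s/data/abc123/bin"'
cni_bin_dir=$(printf '%s\n' "$sample" | sed -n 's/^bin_dir = "\(.*\)"$/\1/p')
echo "$cni_bin_dir"
```

This resolves the version-dependent hash and any custom `--data-dir` prefix locally, rather than assuming every node matches the server.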
Thanks Brad, those are great points I did not think about
pkg/daemons/config/types.go
Outdated
@@ -157,6 +157,7 @@ type CriticalControlArgs struct {
	FlannelIPv6Masq bool `cli:"flannel-ipv6-masq"`
	FlannelExternalIP bool `cli:"flannel-external-ip"`
	EgressSelectorMode string `cli:"egress-selector-mode"`
	Multus bool `cli:"multus"`
looks like this needs tidying
pkg/cli/server/server.go
Outdated
@@ -397,6 +398,11 @@ func run(app *cli.Context, cfg *cmds.Server, leaderControllers server.CustomCont
	serverConfig.ControlConfig.Disables["ccm"] = true
}

if !serverConfig.ControlConfig.Multus {
Do we require that the bundled multus be used alongside the embedded flannel? Should we raise an error if multus is enabled, but flannel backend is set to none?
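If such a guard is added, it could be as simple as the following sketch; the function name and error message are made up for illustration, not the actual k3s code:

```go
package main

import (
	"errors"
	"fmt"
)

// validateMultus is a hypothetical guard for the review question above:
// reject --multus when the embedded flannel backend is disabled.
func validateMultus(multus bool, flannelBackend string) error {
	if multus && flannelBackend == "none" {
		return errors.New("--multus requires the embedded flannel backend")
	}
	return nil
}

func main() {
	fmt.Println(validateMultus(true, "none"))
	fmt.Println(validateMultus(true, "vxlan"))
}
```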
manifests/multus.yaml
Outdated
config:
  cni_conf:
    confDir: /var/lib/rancher/k3s/agent/etc/cni/net.d
    binDir: %{DATA_DIR}%
- binDir: %{DATA_DIR}%
+ binDir: %{CNI_BIN_DIR}%
Signed-off-by: Manuel Buil <[email protected]>
docs/adrs/multus.md
Outdated
## Alternatives

* Deploy multus and whereabouts like traefik, reusing the multus chart in rke2-charts. However, that will deploy a daemonset for both multus and whereabouts, which will consume more resources
Considering Brad's comment above about using a daemonset, this looks like it would be the simplest way to deploy multus+whereabouts?
I think you are reading an old version of the ADR 🤔. The latest version explains the reasons why this option is complicated. In the end, it was the option I tried
Proposed Changes
I worked for a while on the integration of k3s and Multus, but it was pretty complicated. This ADR tries to explain what was tried and why I finally gave up
Types of Changes
New feature
Verification
Testing
Linked Issues
User-Facing Change
Further Comments