[fix] [rke2] - set node-ip, use internal-only-ips for rke2 registration, set the ip type properly for the machine addresses #156

Merged
merged 2 commits into from
Mar 4, 2024

Conversation

AshleyDumaine
Contributor

@AshleyDumaine AshleyDumaine commented Feb 29, 2024

What type of PR is this?

/kind bug

What this PR does / why we need it:
Adds the private IP as the node-ip and the public IP to the tls-san config for RKE2, as we had to do for K3s, so the TLS certs are valid. Without node-ip, container logs can't be retrieved, and without the tls-san addition on top of that, server joining stops working. Ideally registration would be done via controlPlaneEndpoint, though support for that is still in progress.
I'm aware that relying on the ordering of hostname -I output is brittle per the manpage, but I don't have a better solution at this time for this workaround.
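
As a rough illustration of the workaround described above, the following Go sketch classifies `hostname -I`-style output by address range instead of by position, then renders a config fragment with the `node-ip` and `tls-san` keys. This is not the PR's actual implementation: the helper names `splitAddrs` and `renderConfig` are hypothetical, and the sample addresses are taken from the certificate error quoted later in this conversation.

```go
package main

import (
	"fmt"
	"net"
	"strings"
)

// splitAddrs (hypothetical helper) takes whitespace-separated addresses, such
// as the output of `hostname -I`, and sorts the IPv4 entries into private and
// public lists by address range rather than by their position in the output.
func splitAddrs(out string) (private, public []string) {
	for _, f := range strings.Fields(out) {
		ip := net.ParseIP(f)
		if ip == nil || ip.To4() == nil {
			continue // not a valid IPv4 address
		}
		if ip.IsLoopback() || ip.IsLinkLocalUnicast() {
			continue // neither a usable node IP nor a useful SAN
		}
		if ip.IsPrivate() {
			private = append(private, f)
		} else {
			public = append(public, f)
		}
	}
	return private, public
}

// renderConfig (hypothetical helper) builds a YAML fragment that sets node-ip
// and lists extra tls-san entries.
func renderConfig(nodeIP string, sans []string) string {
	var b strings.Builder
	fmt.Fprintf(&b, "node-ip: %s\n", nodeIP)
	if len(sans) > 0 {
		b.WriteString("tls-san:\n")
		for _, s := range sans {
			fmt.Fprintf(&b, "  - %s\n", s)
		}
	}
	return b.String()
}

func main() {
	// Sample input in the shape of `hostname -I` output; the order is arbitrary.
	private, public := splitAddrs("172.234.208.182 192.168.144.181")
	if len(private) == 0 {
		fmt.Println("no private address found")
		return
	}
	fmt.Print(renderConfig(private[0], public))
}
```

Classifying by range avoids depending on output order, at the cost of assuming the private interface always uses an RFC 1918 address.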

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Fixes #

Special notes for your reviewer:

TODOs:

  • squashed commits
  • includes documentation
  • adds unit tests
  • adds or updates e2e tests

Collaborator

@rahulait rahulait left a comment

tested this, works fine. LGTM

Seeing issues with other control plane nodes joining the cluster:

Feb 29 19:13:21 rah1-control-plane-wrg4n rke2[3412]: time="2024-02-29T19:13:21Z" level=fatal msg="starting kubernetes: preparing server: CA cert validation failed: Get \"https://172.234.204.152:9345/cacerts\": tls: failed to verify certificate: x509: certificate is valid for 10.43.0.1, 127.0.0.1, 172.234.208.182, 192.168.144.181, ::1, not 172.234.204.152"
Feb 29 19:13:21 rah1-control-plane-wrg4n systemd[1]: rke2-server.service: Main process exited, code=exited, status=1/FAILURE
Feb 29 19:13:21 rah1-control-plane-wrg4n systemd[1]: rke2-server.service: Failed with result 'exit-code'.
Feb 29 19:13:21 rah1-control-plane-wrg4n systemd[1]: Failed to start Rancher Kubernetes Engine v2 (server).

We might need the node-external-ip flag or something similar that adds the node's public IP as well.
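
The failure in the log above comes down to the join address not being among the certificate's IP SANs. A minimal Go sketch of that check, using `crypto/x509`'s `VerifyHostname` on an unsigned in-memory certificate, might look like the following. The helper name `sanCovers` is hypothetical, and the SAN list is copied from the log line; this only reproduces the SAN comparison, not RKE2's own validation path.

```go
package main

import (
	"crypto/x509"
	"fmt"
	"net"
)

// sanCovers (hypothetical helper) returns nil when addr matches one of the
// given IP SANs, and a hostname error otherwise. It relies on the standard
// library's VerifyHostname, which compares IP addresses only against IP SANs.
func sanCovers(sans []string, addr string) error {
	cert := &x509.Certificate{}
	for _, s := range sans {
		if ip := net.ParseIP(s); ip != nil {
			cert.IPAddresses = append(cert.IPAddresses, ip)
		}
	}
	return cert.VerifyHostname(addr)
}

func main() {
	// SANs as reported in the error message above.
	sans := []string{"10.43.0.1", "127.0.0.1", "172.234.208.182", "192.168.144.181", "::1"}

	// The address the joining server contacted is not in the list.
	fmt.Println(sanCovers(sans, "172.234.204.152"))

	// An address that is in the list passes the check.
	fmt.Println(sanCovers(sans, "172.234.208.182"))
}
```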

@rahulait rahulait self-requested a review February 29, 2024 19:25
@AshleyDumaine
Contributor Author

> We might need the node-external-ip flag or something similar that adds the node's public IP as well.

Unfortunately we can't set that with cloud-provider=external; rke2 fails to start in that case:

Feb 29 21:37:08 test-rke2-control-plane-jm9sv rke2[1092]: time="2024-02-29T21:37:08Z" level=fatal msg="can't set node-external-ip while using cloud provider"
Feb 29 21:37:08 test-rke2-control-plane-jm9sv systemd[1]: rke2-server.service: Main process exited, code=exited, status=1/FAILURE
Feb 29 21:37:08 test-rke2-control-plane-jm9sv systemd[1]: rke2-server.service: Failed with result 'exit-code'.
Feb 29 21:37:08 test-rke2-control-plane-jm9sv systemd[1]: Failed to start Rancher Kubernetes Engine v2 (server).

@AshleyDumaine AshleyDumaine changed the title from "[fix] add node-ip to rke2 drop-in config" to "[fix] [rke2] - set node-ip, add public ip to tls-san" Feb 29, 2024
@AshleyDumaine AshleyDumaine added the rke2 (Pull requests pertaining to the rke2 flavor) and bug (Something isn't working) labels Feb 29, 2024
@AshleyDumaine
Contributor Author

This might actually be fixable by setting the IP type correctly in the machine controller. I just noticed we always set it to clusterv1.MachineExternalIP at https://github.com/linode/cluster-api-provider-linode/blob/main/controller/linodemachine_controller.go#L383, which will cause issues with RKE2 registration. I'll try the fix there instead of this workaround and see if it helps.
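
To make the idea concrete, here is a minimal sketch of choosing the address type per IP instead of always using the external type. It is not the controller's actual code: `MachineAddress` and its type constants are local stand-ins mirroring the Cluster API names so the example compiles on its own, `buildAddresses` is a hypothetical helper, and classifying by private range is an assumption (the real controller may rely on metadata from the Linode API instead).

```go
package main

import (
	"fmt"
	"net"
)

// MachineAddressType and MachineAddress are local stand-ins for the Cluster
// API types of the same names, declared here only to keep the sketch
// self-contained.
type MachineAddressType string

const (
	MachineExternalIP MachineAddressType = "ExternalIP"
	MachineInternalIP MachineAddressType = "InternalIP"
)

type MachineAddress struct {
	Type    MachineAddressType
	Address string
}

// buildAddresses (hypothetical helper) labels each IP as internal when it
// falls in a private range and as external otherwise. Unparseable entries
// are skipped.
func buildAddresses(ips []string) []MachineAddress {
	addrs := make([]MachineAddress, 0, len(ips))
	for _, s := range ips {
		ip := net.ParseIP(s)
		if ip == nil {
			continue
		}
		t := MachineExternalIP
		if ip.IsPrivate() {
			t = MachineInternalIP
		}
		addrs = append(addrs, MachineAddress{Type: t, Address: s})
	}
	return addrs
}

func main() {
	for _, a := range buildAddresses([]string{"172.234.208.182", "192.168.144.181"}) {
		fmt.Printf("%s %s\n", a.Type, a.Address)
	}
}
```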

@AshleyDumaine
Contributor Author

Unfortunately, even after fixing the external vs. internal IP type for the machine addresses in our controller, we still need the change that sets node-ip and adds the public IP to the tls-san.

@AshleyDumaine AshleyDumaine changed the title from "[fix] [rke2] - set node-ip, add public ip to tls-san" to "[fix] [rke2] - set node-ip, add public ip to tls-san, set the ip type properly for the machine addresses" Mar 1, 2024
@AshleyDumaine AshleyDumaine added the go (Pull requests that update Go code) label Mar 1, 2024
@AshleyDumaine AshleyDumaine force-pushed the rke2-tls branch 2 times, most recently from 09d8e27 to 848ff2b Compare March 1, 2024 15:47
@rahulait
Collaborator

rahulait commented Mar 1, 2024

Tested this, LGTM

@AshleyDumaine AshleyDumaine changed the title from "[fix] [rke2] - set node-ip, add public ip to tls-san, set the ip type properly for the machine addresses" to "[fix] [rke2] - set node-ip, use internal-only-ips for rke2 registration, set the ip type properly for the machine addresses" Mar 4, 2024
Collaborator

@rahulait rahulait left a comment

LGTM

@AshleyDumaine AshleyDumaine merged commit 109e9a0 into main Mar 4, 2024
6 checks passed
@AshleyDumaine AshleyDumaine deleted the rke2-tls branch March 4, 2024 16:48
amold1 pushed a commit that referenced this pull request May 17, 2024
…on, set the ip type properly for the machine addresses (#156)

* add workaround to get both container logs working and server join working without controlPlaneEndpoint registration support on rke2

* fix ip type setting for addresses on machine controller
Labels
bug (Something isn't working), go (Pull requests that update Go code), rke2 (Pull requests pertaining to the rke2 flavor)