Kubelet can't running after renew certificates #2054

Closed
pytimer opened this issue Mar 5, 2020 · 24 comments
Labels
priority/awaiting-more-evidence Lowest priority. Possibly useful, but not yet enough support to actually get it done.

Comments

@pytimer

pytimer commented Mar 5, 2020

Is this a BUG REPORT or FEATURE REQUEST?

BUG REPORT

Versions

kubeadm version (use kubeadm version):

kubeadm version: &version.Info{Major:"1", Minor:"17", GitVersion:"v1.17.2", GitCommit:"59603c6e503c87169aea6106f57b9f242f64df89", GitTreeState:"clean", BuildDate:"2020-01-18T23:27:49Z", GoVersion:"go1.13.5", Compiler:"gc", Platform:"linux/amd64"}

Environment:

  • Kubernetes version (use kubectl version):
Client Version: version.Info{Major:"1", Minor:"17", GitVersion:"v1.17.2", GitCommit:"59603c6e503c87169aea6106f57b9f242f64df89", GitTreeState:"clean", BuildDate:"2020-01-18T23:30:10Z", GoVersion:"go1.13.5", Compiler:"gc", Platform:"linux/amd64"}
The connection to the server localhost:8080 was refused - did you specify the right host or port?
  • Cloud provider or hardware configuration:
    bare-metal
  • OS (e.g. from /etc/os-release):
NAME="CentOS Linux"
VERSION="7 (Core)"
ID="centos"
ID_LIKE="rhel fedora"
VERSION_ID="7"
PRETTY_NAME="CentOS Linux 7 (Core)"
ANSI_COLOR="0;31"
CPE_NAME="cpe:/o:centos:centos:7"
HOME_URL="https://www.centos.org/"
BUG_REPORT_URL="https://bugs.centos.org/"

CENTOS_MANTISBT_PROJECT="CentOS-7"
CENTOS_MANTISBT_PROJECT_VERSION="7"
REDHAT_SUPPORT_PRODUCT="centos"
REDHAT_SUPPORT_PRODUCT_VERSION="7"
  • Kernel (e.g. uname -a):
Linux k8s-236 3.10.0-957.12.2.el7.x86_64 #1 SMP Mon May 20 08:41:20 EDT 2019 x86_64 x86_64 x86_64 GNU/Linux
  • Others:

What happened?

I used kubeadm to renew the cluster certificates. The kubeadm alpha certs renew all command ran without errors, and I confirmed that the certificates had changed. But when I restarted the kubelet, it failed to start and exited. I don't know why.
How should I renew the control plane certificates and the kubelet certificates? Which docs can I reference?

What you expected to happen?

All cluster certificates are renewed successfully and the kubelet keeps running.

How to reproduce it (as minimally and precisely as possible)?

  1. Create a cluster with kubeadm.
  2. Enable kubelet dynamic config.
  3. Change the host date and time so that the certificates expire.
  4. Run kubeadm alpha certs renew all.
  5. Run systemctl restart kubelet.

Anything else we need to know?

kubelet log:

server.go:821] Client rotation is on, will bootstrap in background
bootstrap.go:265] part of the existing bootstrap client certificate is expired: 2021-03-04 14:19:56 +0000 UTC
systemd[1]: kubelet.service: main process exited, code=exited, status=255/n/a
server.go:273] failed to run Kubelet: unable to load bootstrap kubeconfig: stat /etc/kubernetes/bootstrap-kubelet.conf: no such file or directory
@SataQiu
Member

SataQiu commented Mar 5, 2020

/cc @fabriziopandini
I have reproduced this problem. After the certificates expire, execute kubeadm alpha certs renew all, and kubelet will not be able to restart normally.

@neolit123
Member

systemd[1]: kubelet.service: main process exited, code=exited, status=255/n/a
server.go:273] failed to run Kubelet: unable to load bootstrap kubeconfig: stat /etc/kubernetes/bootstrap-kubelet.conf: no such file or directory

a missing bootstrap-kubelet.conf should not be an issue as long as there is a kubelet.conf. see the kubelet flags:

--bootstrap-kubeconfig string
    Path to a kubeconfig file that will be used to get client certificate for kubelet. If the file specified by --kubeconfig does not exist, the bootstrap kubeconfig is used to request a client certificate from the API server. On success, a kubeconfig file referencing the generated client certificate and key is written to the path specified by --kubeconfig. The client certificate and key file will be stored in the directory pointed by --cert-dir.

i'm going to test this as well.

@neolit123
Member

enable kubelet dynamic config.

please note that this feature is not really supported by kubeadm anymore and we don't have e2e tests for it...

@pytimer
Author

pytimer commented Mar 5, 2020

please note that this feature is not really supported by kubeadm anymore and we don't have e2e tests for it...

I enabled kubelet dynamic config manually, not through a kubeadm command.

@pytimer
Author

pytimer commented Mar 5, 2020

systemd[1]: kubelet.service: main process exited, code=exited, status=255/n/a
server.go:273] failed to run Kubelet: unable to load bootstrap kubeconfig: stat /etc/kubernetes/bootstrap-kubelet.conf: no such file or directory

a missing bootstrap-kubelet.conf should not be an issue as long as there is a kubelet.conf. see the kubelet flags:

Does the kubelet try kubelet.conf first and, if that fails, retry with bootstrap-kubelet.conf?

@neolit123
Member

neolit123 commented Mar 5, 2020

Does the kubelet try kubelet.conf first and, if that fails, retry with bootstrap-kubelet.conf?

there should not be an error if the bootstrap-kubelet.conf is missing, but the kubelet.conf is present.

@neolit123
Member

@pytimer @SataQiu i cannot reproduce the issue. here are my steps:

kubeadm init ... --kubernetes-version=v1.17.3
kubeadm alpha certs renew all
systemctl restart kubelet

if you are doing the steps exactly like so:

change the host date and time so that the certificates expire.
run kubeadm alpha certs renew all command
systemctl restart kubelet

by running certs renew you are generating certs with the "future date".
i don't think this is correct.

/priority awaiting-more-evidence

@k8s-ci-robot k8s-ci-robot added the priority/awaiting-more-evidence Lowest priority. Possibly useful, but not yet enough support to actually get it done. label Mar 5, 2020
@pytimer
Author

pytimer commented Mar 5, 2020

@pytimer @SataQiu i cannot reproduce the issue. here are my steps:

kubeadm init ... --kubernetes-version=v1.17.3
kubeadm alpha certs renew all
systemctl restart kubelet

if you are doing the steps exactly like so:

change the host date and time so that the certificates expire.
run kubeadm alpha certs renew all command
systemctl restart kubelet

by running certs renew you are generating certs with the "future date".
i don't think this is correct.

I changed the host datetime to simulate expired certificates. Why do you think this is wrong? Can you explain? I don't understand. If this is wrong, how should I test that the certs renew command works? Thanks.

@neolit123
Member

I changed the host datetime to simulate expired certificates.

ok, i was able to understand better what you are trying to do.

change the host date and time so that the certificates expire.

on a running cluster this means the kubelet now has an invalid client certificate for the API server stored in the file kubelet.conf.

run kubeadm alpha certs renew all command

it's important to note that this command does not update kubelet.conf. the client credentials in there are managed by the kubelet.

systemctl restart kubelet

if the year has changed and if you run this command the kubelet will find an outdated certificate in the kubelet.conf file and it will start looking for bootstrap-kubelet.conf. but nowadays we delete this file after TLS bootstrap for security reasons.

so the kubelet will fail because it has no credentials to connect to the API server.

normally this will not happen because the kubelet certificate manager has logic to monitor for certificate expiration and it will auto-rotate your client certificates once ~70% of the expiration period is reached. by forcing a date you are bypassing this mechanism and you end up with a certificate which was not rotated.
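
As a rough illustration of the ~70% figure in the paragraph above, the arithmetic can be sketched in shell. The function name rotation_point_epoch is hypothetical, the 70% default is only the approximate value quoted in this comment, and the kubelet's real threshold is an internal detail that may differ or include jitter.

```shell
# Hypothetical helper: given a certificate's notBefore and notAfter as Unix
# epoch seconds, print the epoch at which PERCENT of the lifetime has elapsed.
# PERCENT defaults to 70, the approximate figure quoted in the comment above.
rotation_point_epoch() {
  local not_before="$1" not_after="$2" percent="${3:-70}"
  echo $(( not_before + (not_after - not_before) * percent / 100 ))
}

# Example: a certificate valid for 365 days (31536000 s) starting at epoch 0.
# rotation_point_epoch 0 31536000     # prints 22075200
# 22075200 s / 86400 s per day = 255.5 days.
```

For a one-year certificate this works out to roughly 255 days. If the host clock is moved past the expiry date in one jump, a kubelet that was never running during that window has no chance to act on this threshold.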

If this is wrong, how should I test that the certs renew command works?

i suggest you don't touch your system date.
modify the kubeadm source code to issue certificates that expire after 10 minutes and try to renew them after expiration. this will let you verify the certs renew works.
but again, kubelet.conf is not managed by kubeadm.

hope this explains the issue.
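
To see which client certificate a kubeconfig such as kubelet.conf actually uses, something like the following sketch may help. The function names are hypothetical, and it assumes a single user entry, GNU base64 (for the -d flag), and openssl on the PATH. It handles both forms a kubeconfig can take: embedded client-certificate-data, or a client-certificate path.

```shell
# Hypothetical helper: print the PEM client certificate used by a kubeconfig,
# whether embedded (client-certificate-data) or referenced by path
# (client-certificate). Only the first matching entry is considered.
kubeconfig_client_cert() {
  local kubeconfig="$1" data path
  data=$(awk '$1 == "client-certificate-data:" { print $2; exit }' "$kubeconfig")
  if [ -n "$data" ]; then
    printf '%s' "$data" | base64 -d
    return
  fi
  path=$(awk '$1 == "client-certificate:" { print $2; exit }' "$kubeconfig")
  if [ -n "$path" ]; then
    cat "$path"
    return
  fi
  echo "no client certificate found in $kubeconfig" >&2
  return 1
}

# Print the expiry line (notAfter=...) of that certificate.
kubeconfig_client_cert_enddate() {
  kubeconfig_client_cert "$1" | openssl x509 -noout -enddate
}

# Example, run as root on the node:
# kubeconfig_client_cert_enddate /etc/kubernetes/kubelet.conf
```

If the reported date is in the past, the kubelet falls into the bootstrap path described above, regardless of what kubeadm alpha certs renew all has renewed.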

@pytimer
Author

pytimer commented Mar 6, 2020

Thanks @neolit123

modify the kubeadm source code to issue certificates that expire after 10 minutes and try to renew them after expiration.

The kubeadm certs renew command works well.

normally this will not happen because the kubelet certificate manager has logic to monitor for certificate expiration and it will auto-rotate your client certificates once ~70% of the expiration period is reached.

@neolit123 When I set the host time so that the kubelet client certificate is between 70% and 100% of its lifetime, the kubelet does not rotate the certificate; a new certificate is not created until the kubelet is restarted. Do you know the reason? I can't find any documentation about it.

@neolit123
Member

@neolit123 When I set the host time so that the kubelet client certificate is between 70% and 100% of its lifetime, the kubelet does not rotate the certificate; a new certificate is not created until the kubelet is restarted. Do you know the reason? I can't find any documentation about it.

i don't think that updating the host time is a valid approach for testing. this could be tripping the kubelet client cert rotation. but yes, the exact process is not documented fully:
https://kubernetes.io/docs/tasks/tls/certificate-rotation/

@neolit123
Member

neolit123 commented Mar 6, 2020

if the kubelet client certificate expired for some reason you can renew it manually. look at the contents of openssl x509 -in /var/lib/kubelet/pki/kubelet-client-current.pem -text and search online how to use the /etc/kubernetes/pki/ca.{key|crt} to sign a new one.
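
A minimal sketch of what signing a new client certificate with the cluster CA could look like, using only openssl. This is not an official procedure: the function name and output file names are hypothetical, the 365-day validity is an arbitrary choice, and the subject (O=system:nodes, CN=system:node:<node name>) is the identity format Kubernetes expects for kubelet clients. The troubleshooting page linked later in this thread describes a kubeadm-based alternative that is likely preferable.

```shell
# Hypothetical helper: create a key and a CA-signed client certificate for a
# node in OUT_DIR. Arguments: CA cert, CA key, node name, output directory.
sign_node_client_cert() {
  local ca_crt="$1" ca_key="$2" node="$3" out="$4"
  # New private key for the kubelet client identity.
  openssl genrsa -out "$out/kubelet-client.key" 2048 &&
  # CSR carrying the node identity in the subject.
  openssl req -new -key "$out/kubelet-client.key" \
    -subj "/O=system:nodes/CN=system:node:${node}" \
    -out "$out/kubelet-client.csr" &&
  # Sign with the CA; keep the serial file in OUT_DIR so nothing is written
  # next to the CA files themselves.
  openssl x509 -req -in "$out/kubelet-client.csr" \
    -CA "$ca_crt" -CAkey "$ca_key" \
    -CAserial "$out/ca.srl" -CAcreateserial \
    -days 365 -out "$out/kubelet-client.crt"
}

# Example, on a control plane node that holds the CA key (node name assumed):
# sign_node_client_cert /etc/kubernetes/pki/ca.crt /etc/kubernetes/pki/ca.key k8s-236 /tmp/kubelet-cert
```

The node name must match the name the node is registered under, otherwise the API server will treat the kubelet as a different node and deny its requests.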

@dimm0

dimm0 commented Mar 31, 2020

I've hit this in 1.17.1, after master node rebooted
/var/lib/kubelet/pki/kubelet-client-current.pem is renewed. kubeadm alpha certs check-expiration says all are renewed. Is there anything else to renew?

Please help!

@dimm0

dimm0 commented Mar 31, 2020

I manually copied the renewed cert and key from /var/lib/kubelet/pki/kubelet-client-current.pem to /etc/kubernetes/kubelet.conf, and it started. Phew!

@neolit123
Member

instead of copying the cert/key inside the kubeconfig, see what is suggested here:
https://kubernetes.io/docs/tasks/administer-cluster/kubeadm/kubeadm-certs/#check-certificate-expiration

Warning:
On nodes created with kubeadm init, prior to kubeadm version 1.17...

@dimm0

dimm0 commented Mar 31, 2020

Ahh! I see, thanks!

@wfsly

wfsly commented Aug 9, 2021

I had a cluster where all certificates except ca.crt had expired, so the control plane was down as well.
I fixed it by renewing the certificates and recreating bootstrap-kubelet.conf.

  1. Run kubeadm alpha certs renew all on each node and restart the kubelet. In my case this alone did not work.
  2. kubectl config set-cluster kubernetes --certificate-authority=/etc/kubernetes/pki/ca.crt --embed-certs=true --server=https://%v:%v --kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf
  3. on master node, export BOOTSTRAP_TOKEN=$(echo {{ <node_ip> }} | md5sum | awk '{print $1}' ). on worker node, export BOOTSTRAP_TOKEN=$(kubeadm token create --description kubelet-bootstrap-token --groups system:bootstrappers:$(echo <nodeIP> | md5sum | awk '{print $1}' ) --kubeconfig /root/.kube/config)
  4. kubectl config set-credentials kubelet-bootstrap --token=${BOOTSTRAP_TOKEN} --kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf
  5. kubectl config set-context default --cluster=kubernetes --user=kubelet-bootstrap --kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf
  6. kubectl config use-context default --kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf
  7. systemctl restart kubelet

@nodesocket

nodesocket commented Feb 4, 2022

Just ran into this and I'm blocked. The cluster is currently down. Running Kubernetes v1.15.12.

I ran the following on the master/control plane:

ubuntu@kubernetes-master:~/.kube$ sudo kubeadm alpha certs renew all

certificate embedded in the kubeconfig file for the admin to use and for kubeadm itself renewed
certificate for serving the Kubernetes API renewed
certificate the apiserver uses to access etcd renewed
certificate for the API server to connect to kubelet renewed
certificate embedded in the kubeconfig file for the controller manager to use renewed
certificate for liveness probes to healtcheck etcd renewed
certificate for etcd nodes to communicate with each other renewed
certificate for serving etcd renewed
certificate for the front proxy client renewed
certificate embedded in the kubeconfig file for the scheduler manager to use renewed

Then rebooted the master/control plane.

However, kubelet is failing to start still:

Feb  4 04:01:52 kubernetes-master systemd[1]: Started Kubernetes systemd probe.
Feb  4 04:01:52 kubernetes-master kubelet[7251]: I0204 04:01:52.544916    7251 server.go:425] Version: v1.15.12
Feb  4 04:01:52 kubernetes-master kubelet[7251]: I0204 04:01:52.545090    7251 plugins.go:103] No cloud provider specified.
Feb  4 04:01:52 kubernetes-master kubelet[7251]: I0204 04:01:52.545103    7251 server.go:789] Client rotation is on, will bootstrap in background
Feb  4 04:01:52 kubernetes-master kubelet[7251]: E0204 04:01:52.546511    7251 bootstrap.go:263] Part of the existing bootstrap client certificate is expired: 2022-01-26 02:08:58 +0000 UTC
Feb  4 04:01:52 kubernetes-master kubelet[7251]: F0204 04:01:52.546545    7251 server.go:273] failed to run Kubelet: unable to load bootstrap kubeconfig: stat /etc/kubernetes/bootstrap-kubelet.conf: no such file or directory
Feb  4 04:01:52 kubernetes-master systemd[1]: kubelet.service: Main process exited, code=exited, status=255/n/a
Feb  4 04:01:52 kubernetes-master systemd[1]: kubelet.service: Failed with result 'exit-code'

All the certificates are showing as valid:

ubuntu@kubernetes-master:~/.kube$ sudo kubeadm alpha certs check-expiration
CERTIFICATE                EXPIRES                  RESIDUAL TIME   EXTERNALLY MANAGED
admin.conf                 Feb 04, 2023 04:00 UTC   364d            no
apiserver                  Feb 04, 2023 04:00 UTC   364d            no
apiserver-etcd-client      Feb 04, 2023 04:00 UTC   364d            no
apiserver-kubelet-client   Feb 04, 2023 04:00 UTC   364d            no
controller-manager.conf    Feb 04, 2023 04:00 UTC   364d            no
etcd-healthcheck-client    Feb 04, 2023 04:00 UTC   364d            no
etcd-peer                  Feb 04, 2023 04:00 UTC   364d            no
etcd-server                Feb 04, 2023 04:00 UTC   364d            no
front-proxy-client         Feb 04, 2023 04:00 UTC   364d            no
scheduler.conf             Feb 04, 2023 04:00 UTC   364d            no

What am I missing? Do I need to manually update a configuration file?

@neolit123
Member

neolit123 commented Feb 4, 2022 via email

@nodesocket

@neolit123 thanks for the reply.

The fix for me was to copy the contents of /etc/kubernetes/admin.conf specifically the keys client-certificate-data and client-key-data and paste those new strings into the file /etc/kubernetes/kubelet.conf under the same keys. Then just a sudo service kubelet restart.

Why is this not more obvious when invoking kubeadm alpha certs renew all? If it's not going to update kubelet.conf there should be a warning message or something. This cost me a few hours of time and downtime last night.

@neolit123
Member

neolit123 commented Feb 5, 2022

The fix for me was to copy the contents of /etc/kubernetes/admin.conf specifically the keys client-certificate-data and client-key-data and paste those new strings into the file /etc/kubernetes/kubelet.conf under the same keys. Then just a sudo service kubelet restart.

this is not ideal, as it grants the kubelet client super-admin credentials. you are also hardcoding non-rotatable credentials.
the link i've mentioned has steps to generate better scoped credentials:
https://kubernetes.io/docs/setup/production-environment/tools/kubeadm/troubleshooting-kubeadm/#kubelet-client-cert

Why is this not more obvious when invoking kubeadm alpha certs renew all? If it's not going to update kubelet.conf there should be a warning message or something. This cost me a few hours of time and downtime last night.

this is documented:
https://kubernetes.io/docs/tasks/administer-cluster/kubeadm/kubeadm-certs/

kubelet.conf is not included in the list above because kubeadm configures kubelet for automatic certificate renewal with rotatable certificates under /var/lib/kubelet/pki. To repair an expired kubelet client certificate see Kubelet client certificate rotation fails.

but back to the original problem, for the credentials to expire something must have gone wrong on the node, or you had hardcoded kubelet.conf credentials instead of pointing to the rotatable symlink (explained in the docs).
the kubelet rotates the symlinked files every ~8 months.

EDIT: also check this warning in the docs:

On nodes created with kubeadm init, prior to kubeadm version 1.17, there is a bug where you manually have to modify the contents of kubelet.conf. After kubeadm init finishes, you should update kubelet.conf to point to the rotated kubelet client certificates, by replacing client-certificate-data and client-key-data with:
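
The quoted warning is cut off before it names the replacement values. As a sketch only, and assuming the target is the /var/lib/kubelet/pki/kubelet-client-current.pem file mentioned earlier in this thread (which holds both the certificate and the key), the edit could be previewed like this. The function name is hypothetical; check the linked docs for the exact values before changing a real node.

```shell
# Hypothetical helper: print a copy of a kubeconfig in which the embedded
# client-certificate-data / client-key-data entries are replaced by file
# references to PEM. The input file itself is not modified.
point_kubeconfig_at_pem() {
  local kubeconfig="$1" pem="$2"
  sed -E \
    -e "s|^([[:space:]]*)client-certificate-data:.*|\1client-certificate: ${pem}|" \
    -e "s|^([[:space:]]*)client-key-data:.*|\1client-key: ${pem}|" \
    "$kubeconfig"
}

# Example: write the result to a new file and compare it with the original.
# point_kubeconfig_at_pem /etc/kubernetes/kubelet.conf \
#   /var/lib/kubelet/pki/kubelet-client-current.pem > /tmp/kubelet.conf.new
# diff /etc/kubernetes/kubelet.conf /tmp/kubelet.conf.new
```

Writing to a separate file first keeps the original kubelet.conf intact until the diff has been reviewed; the kubelet needs a restart after the file is swapped in.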

@nodesocket

nodesocket commented Feb 5, 2022

@neolit123 thanks for the detailed reply.

I think the core problem was that the original Kubernetes cluster was really old and had been upgraded via kubeadm a few times. I want to completely rebuild from scratch using 1.21.5. I assume doing this will use the correct rotatable symlinks and update kubelet.conf to point to those symlinks?

This is of course an on-prem cluster (not cloud managed) and created using kubeadm.

sudo kubeadm init --config /tmp/kubernetes-master-config.yaml | sudo tee /tmp/kubeadm-init.log

@neolit123
Member

neolit123 commented Feb 5, 2022 via email

@nikunj-diwan

nikunj-diwan commented Aug 9, 2023

Hi @neolit123,

Can you please help me.

I faced a certificate expiry issue and renewed the certificates, but /var/lib/kubelet/pki/kubelet-client-current.pem was not updated. So, following the steps in the Kubernetes link below, I regenerated the kubelet.conf file. I made a mistake: I changed the node name in the newly generated kubelet.conf file, and because of that it is now giving permission-related errors and the node does not come into the Ready state.

Client Version: version.Info{Major:"1", Minor:"27", GitVersion:"v1.27.1"
Server Version: version.Info{Major:"1", Minor:"24", GitVersion:"v1.24.2"
kubeadm version: &version.Info{Major:"1", Minor:"27", GitVersion:"v1.27.1"
Kubernetes v1.27.1
Operating system: Red Hat Enterprise Linux Server release 7.9 (Maipo)
crictl version v1.26.0
Docker version: 23.0.3

https://kubernetes.io/docs/setup/production-environment/tools/kubeadm/troubleshooting-kubeadm/#kubelet-client-cert

kubelet error.txt

Thanks,
Nikunj

8 participants