Questions:
- Get the logs of the cluster components.
Solution
Where the logs live depends on how your cluster was deployed.
For the deployment done in Cluster Architecture, Installation & Configuration, here is how to get them:
# Kubelet on all nodes
sudo journalctl -u kubelet
# API server
kubectl -n kube-system logs kube-apiserver-k8s-controlplane
# Controller Manager
kubectl -n kube-system logs kube-controller-manager-k8s-controlplane
# Scheduler
kubectl -n kube-system logs kube-scheduler-k8s-controlplane
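If the API server itself is down, kubectl logs is unavailable. As a fallback, the static pod logs can be read directly on the control-plane node; a sketch, assuming a kubeadm-style layout under /var/log and a containerd runtime (paths and tooling may differ on your cluster):

```shell
# Container logs are written under /var/log/pods, with symlinks in /var/log/containers
sudo ls /var/log/containers/ | grep kube-apiserver
sudo tail -n 50 /var/log/containers/kube-apiserver-*.log

# With a containerd runtime, crictl can do the same
sudo crictl ps -a --name kube-apiserver
sudo crictl logs <container-id>
```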
Questions:
- Create an nginx pod with a liveness and a readiness probe for the port 80.
Solution
pod-ness.yaml:
apiVersion: v1
kind: Pod
metadata:
  name: nginx
  labels:
    run: nginx
spec:
  containers:
  - name: nginx
    image: nginx:latest
    readinessProbe:
      httpGet:
        path: /
        port: 80
      initialDelaySeconds: 5
      periodSeconds: 5
    livenessProbe:
      httpGet:
        path: /
        port: 80
kubectl apply -f pod-ness.yaml
kubectl describe pods nginx
...
Liveness: http-get http://:80/ delay=0s timeout=1s period=10s #success=1 #failure=3
Readiness: http-get http://:80/ delay=5s timeout=1s period=5s #success=1 #failure=3
...
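To confirm that the probed endpoint actually answers on port 80, one option is to reach the pod from your workstation via a port-forward. A quick sketch (local port 8080 is an arbitrary choice):

```shell
# Forward local port 8080 to the pod's port 80, then hit it once
kubectl port-forward pod/nginx 8080:80 &
sleep 2
curl -I http://localhost:8080/   # a healthy nginx should answer 200 on /
kill %1
```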
Doc: https://kubernetes.io/docs/tasks/debug-application-cluster/resource-metrics-pipeline/
Questions:
- Install the metrics server and show metrics for nodes and for pods in
kube-system
namespace.
Solution
git clone https://github.com/kubernetes-sigs/metrics-server
# Add --kubelet-insecure-tls to metrics-server/manifests/base/deployment.yaml if necessary
...
containers:
- name: metrics-server
  image: gcr.io/k8s-staging-metrics-server/metrics-server:master
  imagePullPolicy: IfNotPresent
  args:
  - --cert-dir=/tmp
  - --secure-port=443
  - --kubelet-preferred-address-types=InternalIP,ExternalIP,Hostname
  - --kubelet-use-node-status-port
  - --metric-resolution=15s
  - --kubelet-insecure-tls
...
# Deploy the metrics server
kubectl apply -k metrics-server/manifests/base/
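Before running kubectl top, it can help to verify that the metrics API is registered and serving. A sketch, assuming the manifests above deploy a metrics-server deployment into kube-system:

```shell
# Wait until the deployment is fully rolled out
kubectl -n kube-system rollout status deployment metrics-server
# The APIService should report Available=True once metrics flow
kubectl get apiservice v1beta1.metrics.k8s.io
# Query the metrics API directly, bypassing kubectl top
kubectl get --raw /apis/metrics.k8s.io/v1beta1/nodes
```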
# Wait for the server to get metrics and show them
kubectl top nodes
NAME               CPU(cores)   CPU%   MEMORY(bytes)   MEMORY%
k8s-controlplane   271m         13%    1075Mi          28%
k8s-node-1         115m         5%     636Mi           33%
k8s-node-2         97m          4%     564Mi           29%
kubectl top pods -n kube-system
NAME                                       CPU(cores)   MEMORY(bytes)
coredns-558bd4d5db-6cdkr                   6m           11Mi
coredns-558bd4d5db-k9qxs                   5m           19Mi
etcd-k8s-controlplane                      27m          71Mi
kube-apiserver-k8s-controlplane            112m         312Mi
kube-controller-manager-k8s-controlplane   34m          56Mi
kube-flannel-ds-nr5ms                      4m           11Mi
kube-flannel-ds-vl79c                      5m           13Mi
kube-flannel-ds-xvp8z                      7m           14Mi
kube-proxy-jjvc9                           2m           20Mi
kube-proxy-mwwnn                           1m           17Mi
kube-proxy-wr4v7                           1m           21Mi
kube-scheduler-k8s-controlplane            8m           18Mi
metrics-server-ffc48cc6c-g92v8             6m           16Mi
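kubectl top can also sort its output and span namespaces, which is handy when hunting for the heaviest consumers. A sketch of common variants:

```shell
# Heaviest CPU consumers first
kubectl top pods -n kube-system --sort-by=cpu
# Heaviest memory consumers first, across all namespaces
kubectl top pods -A --sort-by=memory
```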
Doc: https://kubernetes.io/docs/concepts/cluster-administration/logging/
Questions:
- Get logs from the nginx pod deployed earlier and redirect them to a file.
Solution
kubectl logs nginx > nginx.log
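Beyond a plain redirect, kubectl logs has a few flags worth knowing. A sketch of common variants:

```shell
kubectl logs -f nginx                  # stream (follow) the log
kubectl logs --tail=20 nginx           # only the last 20 lines
kubectl logs --since=1h nginx          # only the last hour
kubectl logs -p nginx                  # previous instance, if the container restarted
kubectl logs nginx > nginx.log 2>&1    # also capture kubectl's own errors in the file
```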
Doc: https://kubernetes.io/docs/tasks/debug-application-cluster/debug-application/
Questions:
- Launch a pod with a busybox container that starts with the sheep 3600 command (this command doesn't exist).
- Get the logs from the pod, then correct the error so that it runs sleep 3600 instead.
Solution
podfail.yaml:
apiVersion: v1
kind: Pod
metadata:
  labels:
    run: podfail
  name: podfail
spec:
  containers:
  - image: busybox:latest
    name: podfail
    args:
    - sheep
    - "3600"
kubectl apply -f podfail.yaml
kubectl describe pods podfail
...
Warning Failed 5s (x2 over 6s) kubelet Error: failed to create containerd task: OCI runtime create failed: container_linux.go:367: starting container process caused: exec: "sheep": executable file not found in $PATH: unknown
...
kubectl delete -f podfail.yaml
# Change sheep to sleep
kubectl apply -f podfail.yaml
...
Normal Started 4s kubelet Started container podfail # not failing anymore
...
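The same diagnosis can be reached without reading the full describe output. A sketch that pulls only the pod's events, plus an imperative one-liner that would have created the same (broken) pod:

```shell
# Only the events that reference the podfail pod
kubectl get events --field-selector involvedObject.name=podfail
# Imperative equivalent of podfail.yaml; everything after -- becomes the container args
kubectl run podfail --image=busybox:latest -- sheep 3600
```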
Doc: https://kubernetes.io/docs/tasks/debug-application-cluster/debug-cluster/
Questions:
- Get logs from the control plane in the
kube-system
namespace.
Solution
# API server
kubectl -n kube-system logs kube-apiserver-k8s-controlplane
# Controller Manager
kubectl -n kube-system logs kube-controller-manager-k8s-controlplane
# Scheduler
kubectl -n kube-system logs kube-scheduler-k8s-controlplane
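When the exact pod names are unknown, the control-plane pods can be found by label instead; kubeadm labels its static pods with component=<name> and tier=control-plane. A sketch, assuming a kubeadm deployment:

```shell
# List all control-plane pods regardless of node name
kubectl -n kube-system get pods -l tier=control-plane
# Logs by component label rather than by pod name
kubectl -n kube-system logs -l component=kube-apiserver --tail=20
```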
Doc: https://kubernetes.io/docs/tasks/debug-application-cluster/debug-cluster/
Questions:
- Check the node status and the system logs for kubelet on the failing node.
Solution
kubectl describe node k8s-node-1
# From k8s-node-1 if reachable
sudo journalctl -u kubelet | grep -i error
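journalctl can narrow things further than a grep. A sketch of useful refinements (the node name k8s-node-1 comes from the cluster above):

```shell
# Only error-priority kubelet entries from the last hour
sudo journalctl -u kubelet --since "1 hour ago" -p err
# Follow live while reproducing the problem
sudo journalctl -u kubelet -f
# From the control plane: a quick summary of the node's conditions
kubectl get node k8s-node-1 -o jsonpath='{range .status.conditions[*]}{.type}={.status}{"\n"}{end}'
```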
Doc: https://kubernetes.io/docs/concepts/services-networking/dns-pod-service/
Questions:
- Check the kube-dns service running in the kube-system namespace and the endpoints behind it. Check the pods that serve the endpoints.
Solution
kubectl -n kube-system describe svc kube-dns
Name:              kube-dns
Namespace:         kube-system
Labels:            k8s-app=kube-dns
                   kubernetes.io/cluster-service=true
                   kubernetes.io/name=CoreDNS
Annotations:       prometheus.io/port: 9153
                   prometheus.io/scrape: true
Selector:          k8s-app=kube-dns
Type:              ClusterIP
IP Family Policy:  SingleStack
IP Families:       IPv4
IP:                10.96.0.10
IPs:               10.96.0.10
Port:              dns  53/UDP
TargetPort:        53/UDP
Endpoints:         10.244.1.7:53,10.244.1.8:53
Port:              dns-tcp  53/TCP
TargetPort:        53/TCP
Endpoints:         10.244.1.7:53,10.244.1.8:53
Port:              metrics  9153/TCP
TargetPort:        9153/TCP
Endpoints:         10.244.1.7:9153,10.244.1.8:9153
Session Affinity:  None
Events:            <none>
kubectl -n kube-system describe ep kube-dns
Name:         kube-dns
Namespace:    kube-system
Labels:       k8s-app=kube-dns
              kubernetes.io/cluster-service=true
              kubernetes.io/name=CoreDNS
Annotations:  endpoints.kubernetes.io/last-change-trigger-time: 2021-05-19T08:39:25Z
Subsets:
  Addresses:          10.244.1.7,10.244.1.8
  NotReadyAddresses:  <none>
  Ports:
    Name     Port  Protocol
    ----     ----  --------
    dns-tcp  53    TCP
    dns      53    UDP
    metrics  9153  TCP
Events:  <none>
kubectl -n kube-system get pods -l k8s-app=kube-dns -o wide
NAME                       READY   STATUS    RESTARTS   AGE    IP           NODE         NOMINATED NODE   READINESS GATES
coredns-558bd4d5db-6cdkr   1/1     Running   1          5d3h   10.244.1.8   k8s-node-1   <none>           <none>
coredns-558bd4d5db-k9qxs   1/1     Running   1          5d3h   10.244.1.7   k8s-node-1   <none>           <none>
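To confirm that the CoreDNS endpoints actually resolve names, a throwaway pod can query the service end to end. A sketch using the busybox:1.28 image (nslookup is known to be flaky in newer busybox builds):

```shell
# Resolve a service name through kube-dns, then clean up automatically
kubectl run dnstest --image=busybox:1.28 --rm -it --restart=Never -- nslookup kubernetes.default
```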