diff --git a/docs/advanced/prometheus.md b/docs/advanced/prometheus.md
new file mode 100644
index 0000000000..8857f9076c
--- /dev/null
+++ b/docs/advanced/prometheus.md
@@ -0,0 +1,180 @@
+---
+title: Monitoring KubeEdge Edge Nodes with Prometheus
+sidebar_position: 6
+---
+# Monitoring KubeEdge Edge Nodes with Prometheus
+
+## Environment Information
+
+| Component | Version |
+|------------| ---------------------------------- |
+| containerd | 1.7.2 |
+| k8s | 1.26.0 |
+| KubeEdge | 1.16.0 |
+| Jetson model | NVIDIA Jetson Xavier NX (16GB RAM) |
+
+> Regarding the KubeEdge version: this feature is recommended for v1.15.0 and above. Since v1.17.0 supports edge pods using InclusterConfig, the approach differs for versions before and after v1.17.0. This document uses v1.16.0 as the example.
+
+## Deploying Prometheus
+
+We can install quickly using the [Helm Charts](https://prometheus-community.github.io/helm-charts/) for [kube-prometheus](https://github.com/prometheus-operator/kube-prometheus), or install it manually.
+
+Check that your Kubernetes version is compatible with the kube-prometheus release you install.
+
+```shell
+git clone https://github.com/prometheus-operator/kube-prometheus.git
+cd kube-prometheus
+kubectl apply --server-side -f manifests/setup
+kubectl wait \
+  --for condition=Established \
+  --all CustomResourceDefinition \
+  --namespace=monitoring
+kubectl apply -f manifests/
+```
+
+This creates a ClusterIP Service for each of grafana, alertmanager, and prometheus. To access these services from outside the cluster, we can create the corresponding Ingress objects or use NodePort Services. For simplicity, we use NodePort here. Edit the three Services grafana, alertmanager-main, and prometheus-k8s and change the service type to NodePort:
+
+![](../../static/img/advanced/prometheus-svc.png)
+
+```shell
+kubectl edit svc grafana -n monitoring
+kubectl edit svc alertmanager-main -n monitoring
+kubectl edit svc prometheus-k8s -n monitoring
+```
+
+Recent versions of kube-prometheus also install NetworkPolicy objects, so the services remain unreachable even after NodePort is configured. Modify the NetworkPolicy objects to allow access from IP addresses in the 10.x network segment.
+
+![](../../static/img/advanced/NetworkPolicy.png)
+
+```shell
+kubectl edit NetworkPolicy prometheus-k8s -n monitoring
+kubectl edit NetworkPolicy grafana -n monitoring
+kubectl edit NetworkPolicy alertmanager-main -n monitoring
+```
+
+The prometheus and grafana services are now reachable via NodePort.
+
+![](../../static/img/advanced/prometheus-page.png)
+
+
+
+## Deploying KubeEdge
+
+After deploying KubeEdge, the node-exporter pod on the edge node fails to start.
+
+Inspecting the failed pod with `kubectl edit` shows that the kube-rbac-proxy container failed to start. Its logs show that kube-rbac-proxy tries to read the environment variables KUBERNETES_SERVICE_HOST and KUBERNETES_SERVICE_PORT, cannot obtain them, and therefore fails to start.
+
+![](../../static/img/advanced/kubeedge-error.png)
+
+According to the KubeEdge community at Huawei, KubeEdge v1.17 will set these two environment variables. See the [KubeEdge community proposal](https://github.com/wackxu/kubeedge/blob/4a7c00783de9b11e56e56968b2cc950a7d32a403/docs/proposals/edge-pod-list-watch-natively.md).
+
+It is also recommended to install edgemesh. Once it is installed, pods on the edge can access kubernetes.default.svc.cluster.local:443.
+
+#### 1. Install edgemesh
+
+1. Configure the cloudcore configmap
+
+   Run `kubectl edit cm cloudcore -n kubeedge` and set dynamicController=true.
+
+   After the modification, restart cloudcore by deleting its pod (substitute your own cloudcore pod name): `kubectl delete pod cloudcore-776ffcbbb9-s6ff8 -n kubeedge`
+
+2. Configure the edgecore modules: set metaServer=true and clusterDNS
+
+   ```shell
+   $ vim /etc/kubeedge/config/edgecore.yaml
+
+   modules:
+     ...
+     metaManager:
+       metaServer:
+         enable: true   # configure here
+     ...
+
+   modules:
+     ...
+     edged:
+       ...
+       tailoredKubeletConfig:
+         ...
+         clusterDNS:   # configure here
+         - 169.254.96.16
+     ...
+
+   # Restart edgecore
+   $ systemctl restart edgecore
+   ```
+
+   ![](../../static/img/advanced/clusterDNS.png)
+
+
+
+   After the modification, verify that it took effect:
+
+   ```
+   $ curl 127.0.0.1:10550/api/v1/services
+
+   {"apiVersion":"v1","items":[{"apiVersion":"v1","kind":"Service","metadata":{"creationTimestamp":"2021-04-14T06:30:05Z","labels":{"component":"apiserver","provider":"kubernetes"},"name":"kubernetes","namespace":"default","resourceVersion":"147","selfLink":"default/services/kubernetes","uid":"55eeebea-08cf-4d1a-8b04-e85f8ae112a9"},"spec":{"clusterIP":"10.96.0.1","ports":[{"name":"https","port":443,"protocol":"TCP","targetPort":6443}],"sessionAffinity":"None","type":"ClusterIP"},"status":{"loadBalancer":{}}},{"apiVersion":"v1","kind":"Service","metadata":{"annotations":{"prometheus.io/port":"9153","prometheus.io/scrape":"true"},"creationTimestamp":"2021-04-14T06:30:07Z","labels":{"k8s-app":"kube-dns","kubernetes.io/cluster-service":"true","kubernetes.io/name":"KubeDNS"},"name":"kube-dns","namespace":"kube-system","resourceVersion":"203","selfLink":"kube-system/services/kube-dns","uid":"c221ac20-cbfa-406b-812a-c44b9d82d6dc"},"spec":{"clusterIP":"10.96.0.10","ports":[{"name":"dns","port":53,"protocol":"UDP","targetPort":53},{"name":"dns-tcp","port":53,"protocol":"TCP","targetPort":53},{"name":"metrics","port":9153,"protocol":"TCP","targetPort":9153}],"selector":{"k8s-app":"kube-dns"},"sessionAffinity":"None","type":"ClusterIP"},"status":{"loadBalancer":{}}}],"kind":"ServiceList","metadata":{"resourceVersion":"377360","selfLink":"/api/v1/services"}}
+
+   ```
+
+3. Install edgemesh
+
+   ```shell
+   git clone https://github.com/kubeedge/edgemesh.git
+   cd edgemesh
+
+   kubectl apply -f build/crds/istio/
+
+   # Configure the PSK and relay node
+   vim build/agent/resources/04-configmap.yaml
+
+   relayNodes:
+   - nodeName: masternode ## your relay node name
+     advertiseAddress:
+     - x.x.x.x ## your relay node ip
+
+   kubectl apply -f build/agent/resources/
+   ```
+
+   ![](../../static/img/advanced/edgemesh.png)
+
+#### 2. Modify dnsPolicy
+
+After edgemesh is deployed, the two environment variables in node-exporter on the edge node are still empty, and kubernetes.default.svc.cluster.local:443 is still unreachable. The cause is that the DNS server configuration in the pod is wrong: it should be 169.254.96.16, but the pod uses the same DNS configuration as the host.
+
+```shell
+kubectl exec -it node-exporter-hcmfg -n monitoring -- sh
+Defaulted container "node-exporter" out of: node-exporter, kube-rbac-proxy
+$ cat /etc/resolv.conf
+nameserver 127.0.0.53
+```
+
+Change dnsPolicy to ClusterFirstWithHostNet, then restart node-exporter so that the pod picks up the correct DNS configuration.
+
+`kubectl edit ds node-exporter -n monitoring`
+
+    dnsPolicy: ClusterFirstWithHostNet
+    hostNetwork: true
+
+#### 3. Add environment variables
+
+Edit the edgecore service unit with `vim /etc/systemd/system/edgecore.service` and add the two `Environment` lines:
+
+![](../../static/img/advanced/env.png)
+
+```
+Environment=METASERVER_DUMMY_IP=kubernetes.default.svc.cluster.local
+Environment=METASERVER_DUMMY_PORT=443
+```
+
+After the modification, restart edgecore:
+
+```shell
+systemctl daemon-reload
+systemctl restart edgecore
+```
+
+**node-exporter is now running.**
+
+On the edge node, `curl http://127.0.0.1:9100/metrics` now returns the edge node's metrics, which confirms that its data is being collected.
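The final check can also be scripted. The following is a minimal sketch, not part of the original setup: `count_samples` is a hypothetical helper name, and it assumes the endpoint returns the Prometheus text exposition format, in which comment lines start with `#`. It reads metrics on stdin and prints how many sample lines start with a given metric name.

```shell
#!/bin/sh
# Sketch: summarize node-exporter output read from stdin.
# Assumption: the input uses the Prometheus text exposition format,
# where HELP/TYPE comment lines start with '#'.

# count_samples PREFIX
# Prints the number of sample (non-comment) lines that start with PREFIX.
count_samples() {
    grep -v '^#' | grep -c "^$1"
}

# Hypothetical usage on the edge node (needs curl and a running node-exporter):
#   curl -s http://127.0.0.1:9100/metrics | count_samples node_cpu_seconds_total
```

On the edge node, piping `curl -s http://127.0.0.1:9100/metrics` into `count_samples node_cpu_seconds_total` should print a non-zero count, assuming the CPU collector is enabled in your node-exporter configuration.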
+
diff --git a/i18n/zh/docusaurus-plugin-content-docs/current/advanced/prometheus.md b/i18n/zh/docusaurus-plugin-content-docs/current/advanced/prometheus.md
new file mode 100644
index 0000000000..6c07ac312a
--- /dev/null
+++ b/i18n/zh/docusaurus-plugin-content-docs/current/advanced/prometheus.md
@@ -0,0 +1,195 @@
+---
+title: 使用 Prometheus 监控 KubeEdge 边缘节点
+sidebar_position: 6
+---
+
+# 使用 Prometheus 监控 KubeEdge 边缘节点
+
+## 环境信息
+
+| 组件 | 版本 |
+| ---------- | ---------------------------------- |
+| containerd | 1.7.2 |
+| k8s | 1.26.0 |
+| KubeEdge | 1.16.0 |
+| Jetson 型号 | NVIDIA Jetson Xavier NX (16GB RAM) |
+
+> 关于 KubeEdge 版本说明:建议 1.15.0 及以上版本使用此功能。由于 v1.17.0 支持使用 InclusterConfig 的边缘 pod,因此 v1.17.0 之前和之后的版本的方法是不同的。本文档将以 v1.16.0 为例来说明操作步骤。
+
+## 部署 prometheus
+
+我们可以直接使用 [kube-prometheus](https://github.com/prometheus-operator/kube-prometheus) 的 [Helm Charts](https://prometheus-community.github.io/helm-charts/) 来进行快速安装,也可以直接手动安装。
+
+需要注意 Kubernetes 版本和 `kube-prometheus` 的兼容。
+
+```shell
+git clone https://github.com/prometheus-operator/kube-prometheus.git
+cd kube-prometheus
+kubectl apply --server-side -f manifests/setup
+kubectl wait \
+  --for condition=Established \
+  --all CustomResourceDefinition \
+  --namespace=monitoring
+kubectl apply -f manifests/
+```
+
+可以看到上面针对 grafana、alertmanager 和 prometheus 都创建了一个类型为 ClusterIP 的 Service,当然如果我们想要在外网访问这些服务的话可以通过创建对应的 Ingress 对象或者使用 NodePort 类型的 Service,我们这里为了简单,直接使用 NodePort 类型的服务即可,编辑 `grafana`、`alertmanager-main` 和 `prometheus-k8s` 这 3 个 Service,将服务类型更改为 NodePort:
+
+![](../../../../../static/img/advanced/prometheus-svc.png)
+
+```shell
+kubectl edit svc grafana -n monitoring
+kubectl edit svc alertmanager-main -n monitoring
+kubectl edit svc prometheus-k8s -n monitoring
+```
+
+由于最新版本的 kube-prometheus 设置了网络策略,即使配置了 NodePort 也无法访问。需要修改 NetworkPolicy,允许 10 网段的 IP 访问。
+
+![](../../../../../static/img/advanced/NetworkPolicy.png)
+
+
+
+```shell
+kubectl edit NetworkPolicy prometheus-k8s -n monitoring
+kubectl edit NetworkPolicy grafana -n monitoring
+kubectl edit NetworkPolicy alertmanager-main -n monitoring
+```
+
+这样就可以通过 NodePort 访问 prometheus 和 grafana 服务了。
+
+![](../../../../../static/img/advanced/prometheus-page.png)
+
+
+
+
+
+
+
+## 部署 KubeEdge
+
+部署完 KubeEdge 发现,node-exporter 在边缘节点的 pod 起不来。
+
+去节点上查看 node-exporter 容器日志,发现是其中的 kube-rbac-proxy 这个 container 启动失败,看这个 container 的 logs。发现是 kube-rbac-proxy 想要获取 KUBERNETES_SERVICE_HOST 和 KUBERNETES_SERVICE_PORT 这两个环境变量,但是获取失败,所以启动失败。
+
+![](../../../../../static/img/advanced/kubeedge-error.png)
+
+
+
+和华为 KubeEdge 的社区同学咨询,KubeEdge 1.17 版本将会增加这两个环境变量的设置。[KubeEdge 社区 proposals 链接](https://github.com/wackxu/kubeedge/blob/4a7c00783de9b11e56e56968b2cc950a7d32a403/docs/proposals/edge-pod-list-watch-natively.md)。
+
+另一方面,推荐安装 edgemesh,安装之后在 edge 的 pod 上就可以访问 kubernetes.default.svc.cluster.local:443 了。
+
+#### 1. edgemesh 部署
+
+1. 配置 cloudcore configmap
+
+   `kubectl edit cm cloudcore -n kubeedge` 设置 dynamicController=true。
+
+   修改完重启 cloudcore:`kubectl delete pod cloudcore-776ffcbbb9-s6ff8 -n kubeedge`
+
+2. 配置 edgecore 模块,配置 metaServer=true 和 clusterDNS
+
+   ```shell
+   $ vim /etc/kubeedge/config/edgecore.yaml
+
+   modules:
+     ...
+     metaManager:
+       metaServer:
+         enable: true   # 配置这里
+     ...
+
+   modules:
+     ...
+     edged:
+       ...
+       tailoredKubeletConfig:
+         ...
+         clusterDNS:   # 配置这里
+         - 169.254.96.16
+     ...
+
+   # 重启 edgecore
+   $ systemctl restart edgecore
+   ```
+
+
+
+
+
+   ![](../../../../../static/img/advanced/clusterDNS.png)
+
+
+
+   修改完验证是否修改成功
+
+   ```
+   $ curl 127.0.0.1:10550/api/v1/services
+
+   {"apiVersion":"v1","items":[{"apiVersion":"v1","kind":"Service","metadata":{"creationTimestamp":"2021-04-14T06:30:05Z","labels":{"component":"apiserver","provider":"kubernetes"},"name":"kubernetes","namespace":"default","resourceVersion":"147","selfLink":"default/services/kubernetes","uid":"55eeebea-08cf-4d1a-8b04-e85f8ae112a9"},"spec":{"clusterIP":"10.96.0.1","ports":[{"name":"https","port":443,"protocol":"TCP","targetPort":6443}],"sessionAffinity":"None","type":"ClusterIP"},"status":{"loadBalancer":{}}},{"apiVersion":"v1","kind":"Service","metadata":{"annotations":{"prometheus.io/port":"9153","prometheus.io/scrape":"true"},"creationTimestamp":"2021-04-14T06:30:07Z","labels":{"k8s-app":"kube-dns","kubernetes.io/cluster-service":"true","kubernetes.io/name":"KubeDNS"},"name":"kube-dns","namespace":"kube-system","resourceVersion":"203","selfLink":"kube-system/services/kube-dns","uid":"c221ac20-cbfa-406b-812a-c44b9d82d6dc"},"spec":{"clusterIP":"10.96.0.10","ports":[{"name":"dns","port":53,"protocol":"UDP","targetPort":53},{"name":"dns-tcp","port":53,"protocol":"TCP","targetPort":53},{"name":"metrics","port":9153,"protocol":"TCP","targetPort":9153}],"selector":{"k8s-app":"kube-dns"},"sessionAffinity":"None","type":"ClusterIP"},"status":{"loadBalancer":{}}}],"kind":"ServiceList","metadata":{"resourceVersion":"377360","selfLink":"/api/v1/services"}}
+
+   ```
+
+3. 安装 edgemesh
+
+   ```shell
+   git clone https://github.com/kubeedge/edgemesh.git
+   cd edgemesh
+
+   kubectl apply -f build/crds/istio/
+
+   # PSK 和 Relay Node 设置
+   vim build/agent/resources/04-configmap.yaml
+
+   relayNodes:
+   - nodeName: masternode ## your relay node name
+     advertiseAddress:
+     - x.x.x.x ## your relay node ip
+
+
+
+   kubectl apply -f build/agent/resources/
+   ```
+
+   ![](../../../../../static/img/advanced/edgemesh.png)
+
+#### 2. 修改 dnsPolicy
+
+edgemesh 部署完成后,edge 节点上的 node-exporter 中的两个环境变量还是空的,也无法访问 kubernetes.default.svc.cluster.local:443,原因是该 pod 中的 dns 服务器配置错误,应该是 169.254.96.16 的,但是却是跟宿主机一样的 dns 配置。
+
+```shell
+kubectl exec -it node-exporter-hcmfg -n monitoring -- sh
+Defaulted container "node-exporter" out of: node-exporter, kube-rbac-proxy
+$ cat /etc/resolv.conf
+nameserver 127.0.0.53
+```
+
+将 dnsPolicy 修改为 ClusterFirstWithHostNet,之后重启 node-exporter,dns 的配置正确
+
+`kubectl edit ds node-exporter -n monitoring`
+
+    dnsPolicy: ClusterFirstWithHostNet
+    hostNetwork: true
+
+#### 3. 添加环境变量
+
+`vim /etc/systemd/system/edgecore.service`
+
+![](../../../../../static/img/advanced/env.png)
+
+```
+Environment=METASERVER_DUMMY_IP=kubernetes.default.svc.cluster.local
+Environment=METASERVER_DUMMY_PORT=443
+```
+
+修改完重启 edgecore
+
+```shell
+systemctl daemon-reload
+systemctl restart edgecore
+```
+
+**node-exporter 变成 running!**
+
+在边缘节点 `curl http://127.0.0.1:9100/metrics` 可以发现采集到了边缘节点的数据。
+
diff --git a/static/img/advanced/NetworkPolicy.png b/static/img/advanced/NetworkPolicy.png
new file mode 100644
index 0000000000..19ce00811f
Binary files /dev/null and b/static/img/advanced/NetworkPolicy.png differ
diff --git a/static/img/advanced/clusterDNS.png b/static/img/advanced/clusterDNS.png
new file mode 100644
index 0000000000..db0c915bcb
Binary files /dev/null and b/static/img/advanced/clusterDNS.png differ
diff --git a/static/img/advanced/edgemesh.png b/static/img/advanced/edgemesh.png
new file mode 100644
index 0000000000..a44bde9c48
Binary files /dev/null and b/static/img/advanced/edgemesh.png differ
diff --git a/static/img/advanced/env.png b/static/img/advanced/env.png
new file mode 100644
index 0000000000..cb68672c07
Binary files /dev/null and b/static/img/advanced/env.png differ
diff --git a/static/img/advanced/kubeedge-error.png b/static/img/advanced/kubeedge-error.png
new file mode 100644
index 0000000000..a4dfbc8564
Binary files /dev/null and b/static/img/advanced/kubeedge-error.png differ
diff --git a/static/img/advanced/prometheus-page.png b/static/img/advanced/prometheus-page.png
new file mode 100644
index 0000000000..8bd02a91d3
Binary files /dev/null and b/static/img/advanced/prometheus-page.png differ
diff --git a/static/img/advanced/prometheus-svc.png b/static/img/advanced/prometheus-svc.png
new file mode 100644
index 0000000000..d1ccedeb83
Binary files /dev/null and b/static/img/advanced/prometheus-svc.png differ