Monitoring KubeEdge Edge Nodes with Prometheus (branch 1.15.0) #589

Open · wants to merge 4 commits into base: release-1.15
181 changes: 181 additions & 0 deletions docs/advanced/prometheus.md
@@ -0,0 +1,181 @@
---
title: Monitoring KubeEdge Edge Nodes with Prometheus
sidebar_position: 6
---
# Monitoring KubeEdge Edge Nodes with Prometheus

## Environment Information

| Component         | Version                             |
|-------------------|-------------------------------------|
| containerd        | 1.7.2                               |
| k8s               | 1.26.0                              |
| KubeEdge          | 1.15.1                              |
| Jetson model type | NVIDIA Jetson Xavier NX (16 GB RAM) |

> A note on the KubeEdge version: this feature is recommended for v1.15.0 and above. Since v1.17.0, edge pods support InClusterConfig, so the procedure differs for versions before and after v1.17.0. This document uses v1.15.1 to illustrate the steps.


## Deploying Prometheus

We can install quickly using the community [Helm Charts](https://prometheus-community.github.io/helm-charts/), or install [kube-prometheus](https://github.com/prometheus-operator/kube-prometheus) manually from its manifests, as shown below.

It is important to pay attention to the compatibility between the Kubernetes version and kube-prometheus.

```shell
git clone https://github.com/prometheus-operator/kube-prometheus.git
cd kube-prometheus
kubectl apply --server-side -f manifests/setup
kubectl wait \
--for condition=Established \
--all CustomResourceDefinition \
--namespace=monitoring
kubectl apply -f manifests/
```
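
Before moving on, it is worth checking that the monitoring stack has come up. A minimal check (`monitoring` is the default namespace used by kube-prometheus):

```shell
# All pods should eventually reach Running/Ready
kubectl get pods -n monitoring
kubectl get svc -n monitoring
```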

You can see that a ClusterIP-type Service has been created for each of grafana, alertmanager, and prometheus. If we want to access these services from outside the cluster, we can create corresponding Ingress objects or use NodePort-type Services. For simplicity, we use NodePort here. Edit the three Services (grafana, alertmanager-main, and prometheus-k8s) and change their type to NodePort:

![](../../static/img/advanced/image-20240524161614721.png)

```shell
kubectl edit svc grafana -n monitoring
kubectl edit svc alertmanager-main -n monitoring
kubectl edit svc prometheus-k8s -n monitoring
```
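
For reference, after editing, the relevant part of each Service looks roughly like the sketch below (shown for grafana; the `nodePort` value is only illustrative, any free port in the NodePort range works):

```yaml
# Fragment of the grafana Service after editing (illustrative)
spec:
  type: NodePort        # changed from ClusterIP
  ports:
    - name: http
      port: 3000
      targetPort: http
      nodePort: 30030   # example value
```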

Recent versions of kube-prometheus ship NetworkPolicy objects, so even with NodePort configured the services are still unreachable. You need to modify the NetworkPolicies to allow ingress from your client network (in this setup, the 10.x.x.x segment).

![](../../static/img/advanced/image-20240530111340823.png)

```
kubectl edit NetworkPolicy prometheus-k8s -n monitoring
kubectl edit NetworkPolicy grafana -n monitoring
kubectl edit NetworkPolicy alertmanager-main -n monitoring
```
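
A minimal sketch of the extra ingress rule to append to each of the three NetworkPolicies, assuming the clients sit in the 10.x.x.x segment mentioned above (adjust the CIDR to your own network):

```yaml
# Appended under spec.ingress of prometheus-k8s / grafana / alertmanager-main
- from:
    - ipBlock:
        cidr: 10.0.0.0/8   # client network used in this example
```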

Now you can access the prometheus and grafana services via NodePort.

![](../../static/img/advanced/image-20240530111642034.png)



## Deploying KubeEdge

After deploying KubeEdge, we found that the node-exporter pod on the edge node could not start.

Inspecting the failed pod with `kubectl edit` showed that the kube-rbac-proxy container failed to start. Its logs showed that kube-rbac-proxy needs the KUBERNETES_SERVICE_HOST and KUBERNETES_SERVICE_PORT environment variables, but they were not set, so it exited.

![](../../static/img/advanced/image-20240612153658785.png)

Consulting the KubeEdge community at Huawei, we learned that KubeEdge v1.17 will add support for setting these two environment variables; see the KubeEdge [community proposal](https://github.com/wackxu/kubeedge/blob/4a7c00783de9b11e56e56968b2cc950a7d32a403/docs/proposals/edge-pod-list-watch-natively.md).

In the meantime, it is recommended to install edgemesh. After installation, pods on the edge can access kubernetes.default.svc.cluster.local:443.

#### 1. Install edgemesh

1. Configure the cloudcore configmap

Run `kubectl edit cm cloudcore -n kubeedge` and set `dynamicController=true`, as sketched below.

After the modification, restart cloudcore, for example `kubectl delete pod cloudcore-776ffcbbb9-s6ff8 -n kubeedge` (substitute the name of your own cloudcore pod).
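
A rough sketch of the relevant fragment inside the cloudcore ConfigMap after the edit (the exact field layout may vary slightly between KubeEdge releases):

```yaml
# cloudcore configuration fragment
modules:
  ...
  dynamicController:
    enable: true   # set to true so the edge metaServer can list/watch cluster resources
```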

2. Configure the edgecore module, set metaServer=true and clusterDNS

```shell
$ vim /etc/kubeedge/config/edgecore.yaml

modules:
  ...
  metaManager:
    metaServer:
      enable: true        # configure here
  ...

modules:
  ...
  edged:
    ...
    tailoredKubeletConfig:
      ...
      clusterDNS:         # configure here
        - 169.254.96.16
  ...

# Restart edgecore
$ systemctl restart edgecore
```

![](../../static/img/advanced/image-20240329152628525.png)



After the modification, verify that it took effect:

```
$ curl 127.0.0.1:10550/api/v1/services

{"apiVersion":"v1","items":[{"apiVersion":"v1","kind":"Service","metadata":{"creationTimestamp":"2021-04-14T06:30:05Z","labels":{"component":"apiserver","provider":"kubernetes"},"name":"kubernetes","namespace":"default","resourceVersion":"147","selfLink":"default/services/kubernetes","uid":"55eeebea-08cf-4d1a-8b04-e85f8ae112a9"},"spec":{"clusterIP":"10.96.0.1","ports":[{"name":"https","port":443,"protocol":"TCP","targetPort":6443}],"sessionAffinity":"None","type":"ClusterIP"},"status":{"loadBalancer":{}}},{"apiVersion":"v1","kind":"Service","metadata":{"annotations":{"prometheus.io/port":"9153","prometheus.io/scrape":"true"},"creationTimestamp":"2021-04-14T06:30:07Z","labels":{"k8s-app":"kube-dns","kubernetes.io/cluster-service":"true","kubernetes.io/name":"KubeDNS"},"name":"kube-dns","namespace":"kube-system","resourceVersion":"203","selfLink":"kube-system/services/kube-dns","uid":"c221ac20-cbfa-406b-812a-c44b9d82d6dc"},"spec":{"clusterIP":"10.96.0.10","ports":[{"name":"dns","port":53,"protocol":"UDP","targetPort":53},{"name":"dns-tcp","port":53,"protocol":"TCP","targetPort":53},{"name":"metrics","port":9153,"protocol":"TCP","targetPort":9153}],"selector":{"k8s-app":"kube-dns"},"sessionAffinity":"None","type":"ClusterIP"},"status":{"loadBalancer":{}}}],"kind":"ServiceList","metadata":{"resourceVersion":"377360","selfLink":"/api/v1/services"}}

```

3. Install edgemesh

```
git clone https://github.com/kubeedge/edgemesh.git
cd edgemesh

kubectl apply -f build/crds/istio/

# Configure PSK and the relay node
vim 04-configmap.yaml

# In 04-configmap.yaml, set the relay node:
relayNodes:
  - nodeName: masternode     # your relay node name
    advertiseAddress:
      - x.x.x.x              # your relay node ip

kubectl apply -f build/agent/resources/
```
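
As a quick sanity check (a suggested command, not part of the original steps), verify that the edgemesh-agent pods are running on both the cloud and edge nodes:

```shell
kubectl get pods -n kubeedge -o wide | grep edgemesh-agent
```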

![](../../static/img/advanced/image-20240329154436074.png)

#### 2. Modify dnsPolicy

After edgemesh is deployed, the two environment variables in node-exporter on the edge node are still empty, and kubernetes.default.svc.cluster.local:443 is still unreachable. The reason is that the DNS server configured inside the pod is wrong: it should be 169.254.96.16, but the pod uses the same DNS configuration as the host.

```shell
kubectl exec -it node-exporter-hcmfg -n monitoring -- sh
Defaulted container "node-exporter" out of: node-exporter, kube-rbac-proxy
$ cat /etc/resolv.conf
nameserver 127.0.0.53
```

Change the dnsPolicy to ClusterFirstWithHostNet, then restart node-exporter.

`kubectl edit ds node-exporter -n monitoring`

```
dnsPolicy: ClusterFirstWithHostNet
hostNetwork: true
```
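
If you prefer a non-interactive change, the same two fields can be set with a patch; a sketch using `kubectl patch` (equivalent to the edit above):

```shell
kubectl patch ds node-exporter -n monitoring \
  -p '{"spec":{"template":{"spec":{"dnsPolicy":"ClusterFirstWithHostNet","hostNetwork":true}}}}'
```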

#### 3. Add environment variables

Edit the edgecore systemd unit: `vim /etc/systemd/system/edgecore.service`

![](../../static/img/advanced/image-20240329155133337.png)

```
Environment=METASERVER_DUMMY_IP=kubernetes.default.svc.cluster.local
Environment=METASERVER_DUMMY_PORT=443
```

After the modification, reload systemd and restart edgecore:

```
systemctl daemon-reload
systemctl restart edgecore
```

**node-exporter is now running!**

On the edge node, you can confirm that node metrics are being collected by running `curl http://127.0.0.1:9100/metrics`.
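
For example (a suggested spot check; `node_cpu_seconds_total` is a standard node_exporter metric):

```shell
# A non-empty result means node metrics are being exposed on the edge node
curl -s http://127.0.0.1:9100/metrics | grep '^node_cpu_seconds_total' | head
```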

196 changes: 196 additions & 0 deletions i18n/zh/docusaurus-plugin-content-docs/current/advanced/prometheus.md
@@ -0,0 +1,196 @@
---
title: 使用 Prometheus 监控 KubeEdge 边缘节点
sidebar_position: 6
---

# 使用 Prometheus 监控 KubeEdge 边缘节点

## 环境信息

| 组件 | 版本 |
| ---------- | ---------------------------------- |
| containerd | 1.7.2 |
| k8s | 1.26.0 |
| KubeEdge | 1.15.1 |
| Jetson 型号 | NVIDIA Jetson Xavier NX (16 GB RAM) |

> 关于 KubeEdge 版本说明:建议1.15.0及以上版本使用此功能。由于 v1.17.0 支持使用 InclusterConfig 的边缘 pod,因此 v1.17.0 之前和之后的版本的方法是不同的。本文档将以 v1.15.1 为例来说明操作步骤。


## 部署 prometheus

我们可以直接使用 [kube-prometheus](https://github.com/prometheus-operator/kube-prometheus) 的 [Helm Charts](https://prometheus-community.github.io/helm-charts/) 来进行快速安装,也可以直接手动安装。

需要注意 Kubernetes 版本和 `kube-prometheus` 的兼容。

```shell
git clone https://github.com/prometheus-operator/kube-prometheus.git
cd kube-prometheus
kubectl apply --server-side -f manifests/setup
kubectl wait \
--for condition=Established \
--all CustomResourceDefinition \
--namespace=monitoring
kubectl apply -f manifests/
```

可以看到上面针对 grafana、alertmanager 和 prometheus 都创建了一个类型为 ClusterIP 的 Service,当然如果我们想要在外网访问这两个服务的话可以通过创建对应的 Ingress 对象或者使用 NodePort 类型的 Service,我们这里为了简单,直接使用 NodePort 类型的服务即可,编辑 `grafana`、`alertmanager-main` 和 `prometheus-k8s` 这 3 个 Service,将服务类型更改为 NodePort:

![](../../../../../static/img/advanced/image-20240524161614721.png)

```shell
kubectl edit svc grafana -n monitoring
kubectl edit svc alertmanager-main -n monitoring
kubectl edit svc prometheus-k8s -n monitoring
```

由于最新版本的 kube-prometheus 设置了网络策略,即使配置了 NodePort 也无法访问。需要修改 NetworkPolicy,允许 10 网段的 IP 访问。

![](../../../../../static/img/advanced/image-20240530111340823.png)



```
kubectl edit NetworkPolicy prometheus-k8s -n monitoring
kubectl edit NetworkPolicy grafana -n monitoring
kubectl edit NetworkPolicy alertmanager-main -n monitoring
```

这样就可以通过 NodePort 访问 prometheus 和 grafana 服务了。

![](../../../../../static/img/advanced/image-20240530111642034.png)







## 部署 KubeEdge

部署完 KubeEdge 后发现,node-exporter 在边缘节点上的 pod 起不来。

去节点上查看 node-exporter 容器日志,发现是其中的 kube-rbac-proxy 这个 container 启动失败,看这个 container 的logs。发现是 kube-rbac-proxy 想要获取 KUBERNETES_SERVICE_HOST 和 KUBERNETES_SERVICE_PORT 这两个环境变量,但是获取失败,所以启动失败。

![](../../../../../static/img/advanced/image-20240612153658785.png)



和华为 KubeEdge 的社区同学咨询,KubeEdge 1.17版本将会增加这两个环境变量的设置。[KubeEdge 社区 proposals 链接](https://github.com/wackxu/kubeedge/blob/4a7c00783de9b11e56e56968b2cc950a7d32a403/docs/proposals/edge-pod-list-watch-natively.md)。

另一方面,推荐安装 edgemesh,安装之后在 edge 的 pod 上就可以访问 kubernetes.default.svc.cluster.local:443 了。

#### 1. edgemesh部署

1. 配置 cloudcore configmap

`kubectl edit cm cloudcore -n kubeedge` 设置 dynamicController=true.

修改完 重启 cloudcore `kubectl delete pod cloudcore-776ffcbbb9-s6ff8 -n kubeedge`

2. 配置 edgecore 模块,配置 metaServer=true 和 clusterDNS

```shell
$ vim /etc/kubeedge/config/edgecore.yaml

modules:
  ...
  metaManager:
    metaServer:
      enable: true        # 配置这里
  ...

modules:
  ...
  edged:
    ...
    tailoredKubeletConfig:
      ...
      clusterDNS:         # 配置这里
        - 169.254.96.16
  ...

# 重启 edgecore
$ systemctl restart edgecore
```





![](../../../../../static/img/advanced/image-20240329152628525.png)



修改完后,验证是否修改成功:

```
$ curl 127.0.0.1:10550/api/v1/services

{"apiVersion":"v1","items":[{"apiVersion":"v1","kind":"Service","metadata":{"creationTimestamp":"2021-04-14T06:30:05Z","labels":{"component":"apiserver","provider":"kubernetes"},"name":"kubernetes","namespace":"default","resourceVersion":"147","selfLink":"default/services/kubernetes","uid":"55eeebea-08cf-4d1a-8b04-e85f8ae112a9"},"spec":{"clusterIP":"10.96.0.1","ports":[{"name":"https","port":443,"protocol":"TCP","targetPort":6443}],"sessionAffinity":"None","type":"ClusterIP"},"status":{"loadBalancer":{}}},{"apiVersion":"v1","kind":"Service","metadata":{"annotations":{"prometheus.io/port":"9153","prometheus.io/scrape":"true"},"creationTimestamp":"2021-04-14T06:30:07Z","labels":{"k8s-app":"kube-dns","kubernetes.io/cluster-service":"true","kubernetes.io/name":"KubeDNS"},"name":"kube-dns","namespace":"kube-system","resourceVersion":"203","selfLink":"kube-system/services/kube-dns","uid":"c221ac20-cbfa-406b-812a-c44b9d82d6dc"},"spec":{"clusterIP":"10.96.0.10","ports":[{"name":"dns","port":53,"protocol":"UDP","targetPort":53},{"name":"dns-tcp","port":53,"protocol":"TCP","targetPort":53},{"name":"metrics","port":9153,"protocol":"TCP","targetPort":9153}],"selector":{"k8s-app":"kube-dns"},"sessionAffinity":"None","type":"ClusterIP"},"status":{"loadBalancer":{}}}],"kind":"ServiceList","metadata":{"resourceVersion":"377360","selfLink":"/api/v1/services"}}

```

3. 安装 edgemesh

```
git clone https://github.com/kubeedge/edgemesh.git
cd edgemesh

kubectl apply -f build/crds/istio/

# 配置 PSK 和 Relay Node
vim 04-configmap.yaml

# 在 04-configmap.yaml 中设置中继节点:
relayNodes:
  - nodeName: masternode     # your relay node name
    advertiseAddress:
      - x.x.x.x              # your relay node ip

kubectl apply -f build/agent/resources/
```

![](../../../../../static/img/advanced/image-20240329154436074.png)

#### 2. 修改dnsPolicy

edgemesh 部署完成后,edge 节点上 node-exporter 中的两个环境变量还是空的,也无法访问 kubernetes.default.svc.cluster.local:443。原因是该 pod 中的 DNS 服务器配置错误:应该是 169.254.96.16,但却与宿主机的 DNS 配置相同。

```shell
kubectl exec -it node-exporter-hcmfg -n monitoring -- sh
Defaulted container "node-exporter" out of: node-exporter, kube-rbac-proxy
$ cat /etc/resolv.conf
nameserver 127.0.0.53
```

将 dnsPolicy 修改为 ClusterFirstWithHostNet,之后重启 node-exporter,DNS 配置即正确。

`kubectl edit ds node-exporter -n monitoring`

```
dnsPolicy: ClusterFirstWithHostNet
hostNetwork: true
```

#### 3. 添加环境变量

编辑 edgecore 的 systemd 配置:`vim /etc/systemd/system/edgecore.service`

![](../../../../../static/img/advanced/image-20240329155133337.png)

```
Environment=METASERVER_DUMMY_IP=kubernetes.default.svc.cluster.local
Environment=METASERVER_DUMMY_PORT=443
```

修改完后,重新加载 systemd 并重启 edgecore:

```
systemctl daemon-reload
systemctl restart edgecore
```

**node-exporter 变成 Running 了!**

在边缘节点执行 `curl http://127.0.0.1:9100/metrics`,可以发现已经采集到了边缘节点的数据。

Binary file added static/img/advanced/image-20240329152628525.png
Binary file added static/img/advanced/image-20240329154436074.png
Binary file added static/img/advanced/image-20240329155133337.png
Binary file added static/img/advanced/image-20240524161614721.png
Binary file added static/img/advanced/image-20240530111340823.png
Binary file added static/img/advanced/image-20240530111642034.png
Binary file added static/img/advanced/image-20240604094828377.png
Binary file added static/img/advanced/image-20240612153658785.png