This repository has been archived by the owner on Jul 11, 2024. It is now read-only.

Exporter can create performance problems at scale #33

Open
moio opened this issue Nov 2, 2023 · 2 comments

@moio

moio commented Nov 2, 2023

I am looking at a use case with ~1.4k one-node clusters managed by Rancher, and I see prometheus-rancher-exporter generating considerable Kubernetes API load, especially to retrieve cluster and node information.

Here is an excerpt of the 10 slowest API calls within 8 minutes:

| RequestUri | Verb | UserAgent | ResponseStatus | Kubernetes API Time (seconds) |
|---|---|---|---|---|
| /apis/management.cattle.io/v3/clusters | list | prometheus-rancher-exporter/v0.0.0 (linux/amd64) kubernetes/$Format | {"metadata":{},"code":200} | 44.838 |
| /apis/management.cattle.io/v3/clusters | list | prometheus-rancher-exporter/v0.0.0 (linux/amd64) kubernetes/$Format | {"metadata":{},"code":200} | 42.098 |
| /apis/management.cattle.io/v3/clusters | list | prometheus-rancher-exporter/v0.0.0 (linux/amd64) kubernetes/$Format | {"metadata":{},"code":200} | 41.834 |
| /apis/management.cattle.io/v3/clusterroletemplatebindings | list | prometheus-rancher-exporter/v0.0.0 (linux/amd64) kubernetes/$Format | {"metadata":{},"code":200} | 40.35 |
| /apis/management.cattle.io/v3/clusters | list | prometheus-rancher-exporter/v0.0.0 (linux/amd64) kubernetes/$Format | {"metadata":{},"code":200} | 38.722 |
| /apis/management.cattle.io/v3/clusters | list | prometheus-rancher-exporter/v0.0.0 (linux/amd64) kubernetes/$Format | {"metadata":{},"code":200} | 38.708 |
| /apis/management.cattle.io/v3/nodes | list | prometheus-rancher-exporter/v0.0.0 (linux/amd64) kubernetes/$Format | {"metadata":{},"code":200} | 38.382 |
| /apis/management.cattle.io/v3/clusters | list | prometheus-rancher-exporter/v0.0.0 (linux/amd64) kubernetes/$Format | {"metadata":{},"code":200} | 38.239 |
| /apis/management.cattle.io/v3/clusters | list | prometheus-rancher-exporter/v0.0.0 (linux/amd64) kubernetes/$Format | {"metadata":{},"code":200} | 37.637 |
| /apis/management.cattle.io/v3/clusters | list | prometheus-rancher-exporter/v0.0.0 (linux/amd64) kubernetes/$Format | {"metadata":{},"code":200} | 37.626 |

All of these are due to prometheus-rancher-exporter (in fact, so are all of the top ~250 slowest calls in the sample I observed).

Unfortunately I do not know enough about the exporter's internals to suggest any solutions yet.

@moio moio changed the title Exporter can create problems at scale Exporter can create performance problems at scale Nov 2, 2023
@Anddd7

Anddd7 commented Feb 2, 2024

When we set a short timer interval, a similar situation occurred (although our cluster count is not yet that large).

What I found:

  • 14+4 goroutines start on every TIMER tick, which means 18 simultaneous API calls to the Rancher API server.
  • An API call to a Rancher server that manages a huge number of clusters/nodes returns a large JSON document.
  • Executing the API requests simultaneously, each with a large JSON response, can slow down the Rancher server and saturate network bandwidth, which leads to requests timing out; yet there is no timeout setting.
  • So some goroutines are still pending when the next TIMER tick arrives, and the situation gets progressively worse.

for ; ; <-ticker.C {
	resetGaugeVecMetrics(baseMetrics)
	log.Info("Updating Metrics")
	go getInstalledRancherVersion(client, baseMetrics)
	go getClusterConnectedState(client, baseMetrics)
	go getNumberOfClusters(client, baseMetrics)
	go getDistributions(client, baseMetrics)
	go getNumberOfNodes(client, baseMetrics)
	go getDownstreamClusterVersions(client, baseMetrics)
	go getNumberOfTokens(client, baseMetrics)
	go getNumberOfUsers(client, baseMetrics)
	go getNumberOfProjects(client, baseMetrics)
	go getProjectLabels(client, baseMetrics)
	go getProjectAnnotations(client, baseMetrics)
	go getProjectResources(client, baseMetrics)
	go getRancherCustomResources(client, baseMetrics)
	go getNodeInfo(client, baseMetrics)
}

@mattmattox

Could this be solved by using watch handlers? Fetch the current state once, then have the watchers just update a local cache.
