Fix OOM and `Handler timeout` issue by only returning one item in ListAllMetrics by default #623

CatherineF-dev · 2024-01-10T14:38:09Z

Adapted from #311

Also,

Instead of returning an empty list, this PR returns 1 metric resource item. An empty response is not acceptable for generic clients that list resources through API discovery, such as the namespace garbage collector and GitOps services like Config Sync and ArgoCD. An APIService that returns an empty list is invalid and causes an error in client-go.
Added a feature gate, list-full-custom-metrics, with default value = false.
metricsCache is the same as before.
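
The default behavior described above can be sketched roughly as follows. This is a minimal illustration, not the adapter's actual code: `visibleMetrics` and the metric names are hypothetical, and only the flag name `list-full-custom-metrics` comes from this PR.

```go
package main

import (
	"flag"
	"fmt"
)

// visibleMetrics is a hypothetical helper that decides which metric names
// are exposed through API discovery. With listFull=false it keeps at most
// one item, so the list is non-empty whenever metrics exist but stays small.
func visibleMetrics(all []string, listFull bool) []string {
	if listFull || len(all) <= 1 {
		return all
	}
	return all[:1]
}

func main() {
	// The flag name is taken from the PR; the wiring here is an assumption.
	listFull := flag.Bool("list-full-custom-metrics", false,
		"list all metric resources (for debugging)")
	flag.Parse()

	all := []string{"metric_a", "metric_b", "metric_c"} // placeholder names
	fmt.Println(visibleMetrics(all, *listFull))
}
```

Running the sketch without the flag prints a single-element list; passing `--list-full-custom-metrics` prints all three placeholder names.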

Tested:

# HPA target deployment has 5 pods 
# now
NAME                                                 CPU(cores)   MEMORY(bytes)
custom-metrics-stackdriver-adapter-b8844b4d9-86b9h   5m           19Mi

# before
NAME                                                  CPU(cores)   MEMORY(bytes)
custom-metrics-stackdriver-adapter-6878c4fc56-r8pnt   5m           45Mi

The memory drop will be more significant in a large cluster.

The "http2: stream closed" error comes from calls to /apis/custom.metrics.k8s.io/v1beta2. Since that endpoint now returns only 1 item, a timeout is unlikely.

ListMetrics is not used in HPA, so it's safe to change it.

Cons:

API discovery on /apis/custom.metrics.k8s.io/v1beta2 returns an incomplete resource list instead of listing all available metrics. This is fine, since customers can find the full metric names in GCP monitoring dashboards. When the feature gate list-full-custom-metrics is true, all custom metrics are returned.

kubectl get --raw "/apis/custom.metrics.k8s.io/v1beta2"

{"kind":"APIResourceList","apiVersion":"v1","groupVersion":"custom.metrics.k8s.io/v1beta2","resources":[{"name":"*/actions.googleapis.com|smarthome_action|camerastream|request_count","singularName":"","namespaced":true,"kind":"MetricValueList","verbs":["get"]}]}
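
To check how many resources a discovery response lists, something like the following could be used. It is only a sketch: `apiResourceList` and `countResources` are hypothetical names, and the struct mirrors just the fields needed here rather than the full Kubernetes type.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// apiResourceList mirrors only the fields of the discovery response
// that this sketch needs (hypothetical type, not the Kubernetes one).
type apiResourceList struct {
	GroupVersion string `json:"groupVersion"`
	Resources    []struct {
		Name string `json:"name"`
	} `json:"resources"`
}

// countResources returns how many resources a discovery response lists.
func countResources(raw []byte) (int, error) {
	var list apiResourceList
	if err := json.Unmarshal(raw, &list); err != nil {
		return 0, err
	}
	return len(list.Resources), nil
}

func main() {
	// Response body copied from the kubectl output above.
	raw := []byte(`{"kind":"APIResourceList","apiVersion":"v1","groupVersion":"custom.metrics.k8s.io/v1beta2","resources":[{"name":"*/actions.googleapis.com|smarthome_action|camerastream|request_count","singularName":"","namespaced":true,"kind":"MetricValueList","verbs":["get"]}]}`)

	n, err := countResources(raw)
	if err != nil {
		fmt.Println("decode error:", err)
		return
	}
	fmt.Println("resources listed:", n)
}
```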

CatherineF-dev · 2024-01-10T14:56:19Z

/assign @erain @courageJ

CatherineF-dev · 2024-01-10T15:42:43Z

/assign @erain

/assign @courageJ

shuaich · 2024-01-10T15:45:55Z

Do we know the root cause for "http2: stream closed" and could you elaborate how returning empty ListAllMetrics will solve this issue?

CatherineF-dev · 2024-01-10T15:51:30Z

ListAllMetrics is called during api discovery and isn't needed for HPA.

In this HPA case, if the number of target pods is 5, it will call this function 5 times at the same time. That is not scalable for a large cluster.

In my small cluster, ListAllMetrics is called 11 times during 60s, and each call returns 864502*2 bytes ≈ 1.73 megabytes.

kubectl get --raw "/apis/custom.metrics.k8s.io/v1beta2" | wc -c
# 864502

"http2: stream closed"

It's a timeout issue, so it will be fixed after returning empty values.
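
The arithmetic above can be reproduced with a few lines. This is only a sketch: `discoveryBytes` is a hypothetical helper, and the factor of 2 is taken from the comment as-is.

```go
package main

import "fmt"

// discoveryBytes multiplies the number of calls by the per-call size.
func discoveryBytes(calls, bytesPerCall int) int {
	return calls * bytesPerCall
}

func main() {
	const payload = 864502      // bytes reported by `wc -c` above
	const perCall = payload * 2 // factor of 2 taken from the comment as-is
	const calls = 11            // calls observed during 60s

	total := discoveryBytes(calls, perCall)
	fmt.Printf("per call: %.2f MB\n", float64(perCall)/1e6)
	fmt.Printf("per 60s:  %.2f MB\n", float64(total)/1e6)
}
```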

CatherineF-dev · 2024-01-10T15:54:41Z

/retest I just added more CI pipelines.


slash4 · 2024-01-22T14:01:20Z

Hi, Google sent me here :) I'm happy to see this issue has a fix, and I was wondering how I could implement it in our GKE. I tried manually editing the file custom-metrics-stackdriver-adapter/deploy/production/adapter_new_resource_model.yaml to target 0.14.2 directly, but I don't see any difference after upgrading.

Am I applying the correct fix to this particular issue?

Thanks a lot for your attention :)

CatherineF-dev · 2024-01-22T14:29:38Z

Hi @slash4, could you check the version by kubectl describe deployments -n custom-metrics | grep "custom-metrics-stackdriver-adapter"?

I didn't see this error in my cluster. I will use custom-metrics-stackdriver-adapter/deploy/production/adapter_new_resource_model.yaml to reproduce.

slash4 · 2024-01-22T14:33:56Z

Sure : here's the output of the command 👍

Name: custom-metrics-stackdriver-adapter
Labels: k8s-app=custom-metrics-stackdriver-adapter
run=custom-metrics-stackdriver-adapter
Selector: k8s-app=custom-metrics-stackdriver-adapter,run=custom-metrics-stackdriver-adapter
Labels: k8s-app=custom-metrics-stackdriver-adapter
run=custom-metrics-stackdriver-adapter
Service Account: custom-metrics-stackdriver-adapter
pod-custom-metrics-stackdriver-adapter:
Image: gcr.io/gke-release/custom-metrics-stackdriver-adapter:v0.14.2-gke.0
NewReplicaSet: custom-metrics-stackdriver-adapter-65dd5ddc76 (1/1 replicas created)
Normal ScalingReplicaSet 53m deployment-controller Scaled up replica set custom-metrics-stackdriver-adapter-79cc7d9f68 to 1
Normal ScalingReplicaSet 51m deployment-controller Scaled down replica set custom-metrics-stackdriver-adapter-5885cc597f to 0 from 1
Normal ScalingReplicaSet 45m deployment-controller Scaled up replica set custom-metrics-stackdriver-adapter-65dd5ddc76 to 1
Normal ScalingReplicaSet 45m deployment-controller Scaled down replica set custom-metrics-stackdriver-adapter-79cc7d9f68 to 0 from 1

I'm so grateful you responded so quickly. Thanks :)

CatherineF-dev · 2024-01-22T14:41:18Z

Ok.

@slash4

I couldn't reproduce this issue. Could you help find which line is raising this http2: stream closed error? You can find it in Cloud Logging by expanding the log entry.
Could you check whether it affects HPA, using kubectl describe hpa -n your_namespace? I think it won't.

CatherineF-dev · 2024-01-22T14:45:56Z

btw, I found this one https://stackoverflow.com/questions/67073909/error-scaling-up-in-hpa-in-gke-apiserver-was-unable-to-write-a-json-response-h, could you try this?

slash4 · 2024-01-22T14:51:48Z

The log entry reports line 117 of writers.go.

I confirm HPA isn't affected. I mean it works, even if I can occasionally see this message: FailedGetResourceMetric. It's just the massive amount of logs that bothers me :\

slash4 · 2024-01-22T15:04:14Z

Regarding this: https://stackoverflow.com/questions/67073909/error-scaling-up-in-hpa-in-gke-apiserver-was-unable-to-write-a-json-response-h I'm already in External mode.

Could it be the custom.googleapis.com part? On double-checking, I see that I access my custom metric with pipe separators and without the custom.googleapis.com prefix. But I don't think this is the problem.

CatherineF-dev · 2024-01-22T15:41:23Z

Ok, I understand that spam logs are annoying. If I need more information, I will let you know.

slash4 · 2024-01-22T16:47:23Z

Thanks a lot ! Sure, I'm here if you need me :)

CatherineF-dev · 2024-01-23T14:47:32Z

Hi @slash4, does your cluster still have spam logs? Just want to see whether spam logs are gone after x hours.

hariapollo · 2024-03-20T19:00:47Z

@CatherineF-dev @slash4 are the spam logs gone now? If yes, please let me know what the root cause was, since we are also seeing the same errors.

CatherineF-dev · 2024-03-20T19:10:18Z

Could you provide detailed steps to reproduce? @hariapollo

Are you using the latest custom-metrics-stackdriver-adapter?

hariapollo · 2024-03-21T07:05:45Z

Hey @CatherineF-dev, we were on v0.13.1-gke.0. Today I upgraded it to v0.14.2-gke.0. However, we are getting the logs below. Are they ignorable?

I0321 06:45:23.151222       1 trace.go:205] Trace[1544431631]: "List" url:/apis/external.metrics.k8s.io/v1beta1/namespaces/default/custom.googleapis.com|sidekiq_latency,user-agent:vpa-recommender/v0.0.0 (linux/amd64) kubernetes/$Format/metrics-horizontal-pod-autoscaler,audit-id:fb040344-251b-4334-8ce6-f602024474ad,client:10.126.0.4,accept:application/vnd.kubernetes.protobuf, */*,protocol:HTTP/2.0 (21-Mar-2024 06:45:22.460) (total time: 690ms):
Trace[1544431631]: ---"Listing from storage done" 690ms (06:45:23.151)
Trace[1544431631]: [690.652348ms] [690.652348ms] END
I0321 06:45:49.010286       1 trace.go:205] Trace[1858399614]: "List" url:/apis/external.metrics.k8s.io/v1beta1/namespaces/default/custom.googleapis.com|sidekiq_latency,user-agent:vpa-recommender/v0.0.0 (linux/amd64) kubernetes/$Format/metrics-horizontal-pod-autoscaler,audit-id:24bcfb93-881a-46ed-b187-479050861754,client:10.126.0.4,accept:application/vnd.kubernetes.protobuf, */*,protocol:HTTP/2.0 (21-Mar-2024 06:45:47.039) (total time: 1971ms):
Trace[1858399614]: ---"Listing from storage done" 1971ms (06:45:49.010)
Trace[1858399614]: [1.971096723s] [1.971096723s] END
I0321 06:46:38.090887       1 trace.go:205] Trace[142037306]: "List" url:/apis/external.metrics.k8s.io/v1beta1/namespaces/default/custom.googleapis.com|sidekiq_latency,user-agent:vpa-recommender/v0.0.0 (linux/amd64) kubernetes/$Format/metrics-horizontal-pod-autoscaler,audit-id:f3526283-fd52-44ba-a7c3-34c4b7d80ebc,client:10.126.0.4,accept:application/vnd.kubernetes.protobuf, */*,protocol:HTTP/2.0 (21-Mar-2024 06:46:36.763) (total time: 1327ms):
Trace[142037306]: ---"Listing from storage done" 1327ms (06:46:38.090)
Trace[142037306]: [1.327772248s] [1.327772248s] END
I0321 06:47:33.860706       1 trace.go:205] Trace[1479243979]: "List" url:/apis/external.metrics.k8s.io/v1beta1/namespaces/default/custom.googleapis.com|sidekiq_latency,user-agent:vpa-recommender/v0.0.0 (linux/amd64) kubernetes/$Format/metrics-horizontal-pod-autoscaler,audit-id:0c701559-68b2-4ed9-8b47-2c88c94c7391,client:10.126.0.4,accept:application/vnd.kubernetes.protobuf, */*,protocol:HTTP/2.0 (21-Mar-2024 06:47:33.347) (total time: 512ms):
Trace[1479243979]: ---"Listing from storage done" 512ms (06:47:33.860)
Trace[1479243979]: [512.732013ms] [512.732013ms] END
I0321 06:47:48.944403       1 trace.go:205] Trace[739596128]: "List" url:/apis/external.metrics.k8s.io/v1beta1/namespaces/default/custom.googleapis.com|sidekiq_latency,user-agent:vpa-recommender/v0.0.0 (linux/amd64) kubernetes/$Format/metrics-horizontal-pod-autoscaler,audit-id:d8ca1ad3-cc4c-4d0d-b6af-db963cb9a37e,client:10.126.0.4,accept:application/vnd.kubernetes.protobuf, */*,protocol:HTTP/2.0 (21-Mar-2024 06:47:47.955) (total time: 988ms):
Trace[739596128]: ---"Listing from storage done" 988ms (06:47:48.944)
Trace[739596128]: [988.624893ms] [988.624893ms] END
I0321 06:50:09.372708       1 trace.go:205] Trace[884902674]: "List" url:/apis/external.metrics.k8s.io/v1beta1/namespaces/default/custom.googleapis.com|sidekiq_latency,user-agent:vpa-recommender/v0.0.0 (linux/amd64) kubernetes/$Format/metrics-horizontal-pod-autoscaler,audit-id:5e706fde-41d2-401d-808f-b3f5c389fb3f,client:10.126.0.4,accept:application/vnd.kubernetes.protobuf, */*,protocol:HTTP/2.0 (21-Mar-2024 06:50:08.752) (total time: 620ms):
Trace[884902674]: ---"Listing from storage done" 619ms (06:50:09.372)
Trace[884902674]: [620.051172ms] [620.051172ms] END
I0321 06:54:00.525744       1 trace.go:205] Trace[349212798]: "List" url:/apis/external.metrics.k8s.io/v1beta1/namespaces/default/custom.googleapis.com|sidekiq_latency,user-agent:vpa-recommender/v0.0.0 (linux/amd64) kubernetes/$Format/metrics-horizontal-pod-autoscaler,audit-id:49af46bb-4de6-4bb4-b645-f4fa66a9aed3,client:10.126.0.4,accept:application/vnd.kubernetes.protobuf, */*,protocol:HTTP/2.0 (21-Mar-2024 06:53:59.776) (total time: 749ms):
Trace[349212798]: ---"Listing from storage done" 749ms (06:54:00.525)
Trace[349212798]: [749.50106ms] [749.50106ms] END
I0321 06:54:11.989052       1 trace.go:205] Trace[83724490]: "List" url:/apis/external.metrics.k8s.io/v1beta1/namespaces/default/custom.googleapis.com|sidekiq_latency,user-agent:vpa-recommender/v0.0.0 (linux/amd64) kubernetes/$Format/metrics-horizontal-pod-autoscaler,audit-id:b48d6159-9670-4629-bc39-7d1ff5250e68,client:10.126.0.4,accept:application/vnd.kubernetes.protobuf, */*,protocol:HTTP/2.0 (21-Mar-2024 06:54:10.872) (total time: 1116ms):
Trace[83724490]: ---"Listing from storage done" 1116ms (06:54:11.988)
Trace[83724490]: [1.1167639s] [1.1167639s] END
I0321 06:56:07.906453       1 trace.go:205] Trace[1017136168]: "List" url:/apis/external.metrics.k8s.io/v1beta1/namespaces/default/custom.googleapis.com|sidekiq_latency,user-agent:vpa-recommender/v0.0.0 (linux/amd64) kubernetes/$Format/metrics-horizontal-pod-autoscaler,audit-id:fa262d63-4728-4131-963b-9124273867c5,client:10.126.0.4,accept:application/vnd.kubernetes.protobuf, */*,protocol:HTTP/2.0 (21-Mar-2024 06:56:06.321) (total time: 1584ms):
Trace[1017136168]: ---"Listing from storage done" 1584ms (06:56:07.906)
Trace[1017136168]: [1.584645417s] [1.584645417s] END
I0321 06:57:49.306758       1 trace.go:205] Trace[1213281210]: "List" url:/apis/external.metrics.k8s.io/v1beta1/namespaces/default/custom.googleapis.com|sidekiq_latency,user-agent:vpa-recommender/v0.0.0 (linux/amd64) kubernetes/$Format/metrics-horizontal-pod-autoscaler,audit-id:d4953249-6306-4a4a-b5ca-41d2a8ee2fb2,client:10.126.0.4,accept:application/vnd.kubernetes.protobuf, */*,protocol:HTTP/2.0 (21-Mar-2024 06:57:48.567) (total time: 739ms):
Trace[1213281210]: ---"Listing from storage done" 739ms (06:57:49.306)
Trace[1213281210]: [739.291186ms] [739.291186ms] END
I0321 06:59:02.535682       1 trace.go:205] Trace[1926827826]: "List" url:/apis/external.metrics.k8s.io/v1beta1/namespaces/default/custom.googleapis.com|sidekiq_latency,user-agent:vpa-recommender/v0.0.0 (linux/amd64) kubernetes/$Format/metrics-horizontal-pod-autoscaler,audit-id:25d1f19b-ba1d-4ba8-84b6-c1751c2858a5,client:10.126.0.4,accept:application/vnd.kubernetes.protobuf, */*,protocol:HTTP/2.0 (21-Mar-2024 06:59:01.108) (total time: 1427ms):
Trace[1926827826]: ---"Listing from storage done" 1427ms (06:59:02.535)
Trace[1926827826]: [1.427284304s] [1.427284304s] END
I0321 06:59:05.526355       1 trace.go:205] Trace[495119366]: "List" url:/apis/external.metrics.k8s.io/v1beta1/namespaces/default/custom.googleapis.com|sidekiq_latency,user-agent:vpa-recommender/v0.0.0 (linux/amd64) kubernetes/$Format/metrics-horizontal-pod-autoscaler,audit-id:0ab8dea2-fa0a-4696-a4a1-b24e1aa64b42,client:10.126.0.4,accept:application/vnd.kubernetes.protobuf, */*,protocol:HTTP/2.0 (21-Mar-2024 06:59:03.413) (total time: 2112ms):
Trace[495119366]: ---"Listing from storage done" 2112ms (06:59:05.526)
Trace[495119366]: [2.112847632s] [2.112847632s] END

hariapollo · 2024-03-21T10:01:10Z

After upgrading to v0.14.2-gke.0, I got the error log below:

E0321 13:07:02.418167    3420 memcache.go:255] couldn't get resource list for external.metrics.k8s.io/v1beta1: Got empty response for: external.metrics.k8s.io/v1beta1
I0321 07:24:00.553580       1 adapter.go:200] serverOptions: {true true true true false   false false}
I0321 07:24:00.553651       1 adapter.go:210] ListFullCustomMetrics is disabled, which would only list 1 metric resource to reduce memory usage. Add --list-full-custom-metrics to list full metric resources for debugging.
I0321 07:24:01.754403       1 request.go:665] Waited for 1.00918148s due to client-side throttling, not priority and fairness, request: GET:https://10.124.0.1:443/apis/kafka.strimzi.io/v1beta1?timeout=32s
I0321 07:24:03.347423       1 serving.go:341] Generated self-signed cert (apiserver.local.config/certificates/apiserver.crt, apiserver.local.config/certificates/apiserver.key)
I0321 07:24:07.552005       1 requestheader_controller.go:169] Starting RequestHeaderAuthRequestController
I0321 07:24:07.552034       1 configmap_cafile_content.go:201] "Starting controller" name="client-ca::kube-system::extension-apiserver-authentication::client-ca-file"
I0321 07:24:07.552284       1 shared_informer.go:240] Waiting for caches to sync for client-ca::kube-system::extension-apiserver-authentication::client-ca-file
I0321 07:24:07.552087       1 configmap_cafile_content.go:201] "Starting controller" name="client-ca::kube-system::extension-apiserver-authentication::requestheader-client-ca-file"
I0321 07:24:07.552656       1 shared_informer.go:240] Waiting for caches to sync for client-ca::kube-system::extension-apiserver-authentication::requestheader-client-ca-file
I0321 07:24:07.552236       1 shared_informer.go:240] Waiting for caches to sync for RequestHeaderAuthRequestController
I0321 07:24:07.552617       1 dynamic_serving_content.go:129] "Starting controller" name="serving-cert::apiserver.local.config/certificates/apiserver.crt::apiserver.local.config/certificates/apiserver.key"
I0321 07:24:07.553506       1 secure_serving.go:256] Serving securely on [::]:443
I0321 07:24:07.553568       1 tlsconfig.go:240] "Starting DynamicServingCertificateController"
I0321 07:24:07.739320       1 shared_informer.go:247] Caches are synced for client-ca::kube-system::extension-apiserver-authentication::client-ca-file
E0321 07:24:07.740818       1 authentication.go:63] "Unable to authenticate the request" err="verifying certificate SN=23850158042506833490065535935480109479, SKID=, AKID=53:D7:E6:89:D7:7D:31:34:C7:C2:64:E4:50:89:76:08:7E:53:17:A9 failed: x509: certificate signed by unknown authority"
E0321 07:24:07.741585       1 authentication.go:63] "Unable to authenticate the request" err="verifying certificate SN=23850158042506833490065535935480109479, SKID=, AKID=53:D7:E6:89:D7:7D:31:34:C7:C2:64:E4:50:89:76:08:7E:53:17:A9 failed: x509: certificate signed by unknown authority"
E0321 07:24:07.741994       1 authentication.go:63] "Unable to authenticate the request" err="verifying certificate SN=23850158042506833490065535935480109479, SKID=, AKID=53:D7:E6:89:D7:7D:31:34:C7:C2:64:E4:50:89:76:08:7E:53:17:A9 failed: x509: certificate signed by unknown authority"
E0321 07:24:07.742335       1 authentication.go:63] "Unable to authenticate the request" err="verifying certificate SN=23850158042506833490065535935480109479, SKID=, AKID=53:D7:E6:89:D7:7D:31:34:C7:C2:64:E4:50:89:76:08:7E:53:17:A9 failed: x509: certificate signed by unknown authority"
E0321 07:24:07.742633       1 authentication.go:63] "Unable to authenticate the request" err="verifying certificate SN=23850158042506833490065535935480109479, SKID=, AKID=53:D7:E6:89:D7:7D:31:34:C7:C2:64:E4:50:89:76:08:7E:53:17:A9 failed: x509: certificate signed by unknown authority"
E0321 07:24:07.742915       1 authentication.go:63] "Unable to authenticate the request" err="verifying certificate SN=23850158042506833490065535935480109479, SKID=, AKID=53:D7:E6:89:D7:7D:31:34:C7:C2:64:E4:50:89:76:08:7E:53:17:A9 failed: x509: certificate signed by unknown authority"
E0321 07:24:07.743207       1 authentication.go:63] "Unable to authenticate the request" err="verifying certificate SN=23850158042506833490065535935480109479, SKID=, AKID=53:D7:E6:89:D7:7D:31:34:C7:C2:64:E4:50:89:76:08:7E:53:17:A9 failed: x509: certificate signed by unknown authority"
E0321 07:24:07.743494       1 authentication.go:63] "Unable to authenticate the request" err="verifying certificate SN=23850158042506833490065535935480109479, SKID=, AKID=53:D7:E6:89:D7:7D:31:34:C7:C2:64:E4:50:89:76:08:7E:53:17:A9 failed: x509: certificate signed by unknown authority"
E0321 07:24:07.743779       1 authentication.go:63] "Unable to authenticate the request" err="verifying certificate SN=23850158042506833490065535935480109479, SKID=, AKID=53:D7:E6:89:D7:7D:31:34:C7:C2:64:E4:50:89:76:08:7E:53:17:A9 failed: x509: certificate signed by unknown authority"
E0321 07:24:07.744091       1 authentication.go:63] "Unable to authenticate the request" err="verifying certificate SN=23850158042506833490065535935480109479, SKID=, AKID=53:D7:E6:89:D7:7D:31:34:C7:C2:64:E4:50:89:76:08:7E:53:17:A9 failed: x509: certificate signed by unknown authority"
E0321 07:24:07.744368       1 authentication.go:63] "Unable to authenticate the request" err="verifying certificate SN=23850158042506833490065535935480109479, SKID=, AKID=53:D7:E6:89:D7:7D:31:34:C7:C2:64:E4:50:89:76:08:7E:53:17:A9 failed: x509: certificate signed by unknown authority"
E0321 07:24:07.744583       1 authentication.go:63] "Unable to authenticate the request" err="verifying certificate SN=23850158042506833490065535935480109479, SKID=, AKID=53:D7:E6:89:D7:7D:31:34:C7:C2:64:E4:50:89:76:08:7E:53:17:A9 failed: x509: certificate signed by unknown authority"
E0321 07:24:07.744790       1 authentication.go:63] "Unable to authenticate the request" err="verifying certificate SN=23850158042506833490065535935480109479, SKID=, AKID=53:D7:E6:89:D7:7D:31:34:C7:C2:64:E4:50:89:76:08:7E:53:17:A9 failed: x509: certificate signed by unknown authority"
E0321 07:24:07.744965       1 authentication.go:63] "Unable to authenticate the request" err="verifying certificate SN=23850158042506833490065535935480109479, SKID=, AKID=53:D7:E6:89:D7:7D:31:34:C7:C2:64:E4:50:89:76:08:7E:53:17:A9 failed: x509: certificate signed by unknown authority"
E0321 07:24:07.745090       1 authentication.go:63] "Unable to authenticate the request" err="verifying certificate SN=23850158042506833490065535935480109479, SKID=, AKID=53:D7:E6:89:D7:7D:31:34:C7:C2:64:E4:50:89:76:08:7E:53:17:A9 failed: x509: certificate signed by unknown authority"
I0321 07:24:07.839380       1 shared_informer.go:247] Caches are synced for RequestHeaderAuthRequestController
E0321 07:24:07.840068       1 authentication.go:63] "Unable to authenticate the request" err="verifying certificate SN=23850158042506833490065535935480109479, SKID=, AKID=53:D7:E6:89:D7:7D:31:34:C7:C2:64:E4:50:89:76:08:7E:53:17:A9 failed: x509: certificate signed by unknown authority"
E0321 07:24:07.840227       1 authentication.go:63] "Unable to authenticate the request" err="verifying certificate SN=23850158042506833490065535935480109479, SKID=, AKID=53:D7:E6:89:D7:7D:31:34:C7:C2:64:E4:50:89:76:08:7E:53:17:A9 failed: x509: certificate signed by unknown authority"
E0321 07:24:07.840440       1 authentication.go:63] "Unable to authenticate the request" err="verifying certificate SN=23850158042506833490065535935480109479, SKID=, AKID=53:D7:E6:89:D7:7D:31:34:C7:C2:64:E4:50:89:76:08:7E:53:17:A9 failed: x509: certificate signed by unknown authority"
E0321 07:24:07.840605       1 authentication.go:63] "Unable to authenticate the request" err="verifying certificate SN=23850158042506833490065535935480109479, SKID=, AKID=53:D7:E6:89:D7:7D:31:34:C7:C2:64:E4:50:89:76:08:7E:53:17:A9 failed: x509: certificate signed by unknown authority"
E0321 07:24:07.840695       1 authentication.go:63] "Unable to authenticate the request" err="verifying certificate SN=23850158042506833490065535935480109479, SKID=, AKID=53:D7:E6:89:D7:7D:31:34:C7:C2:64:E4:50:89:76:08:7E:53:17:A9 failed: x509: certificate signed by unknown authority"
E0321 07:24:07.840720       1 authentication.go:63] "Unable to authenticate the request" err="verifying certificate SN=23850158042506833490065535935480109479, SKID=, AKID=53:D7:E6:89:D7:7D:31:34:C7:C2:64:E4:50:89:76:08:7E:53:17:A9 failed: x509: certificate signed by unknown authority"
E0321 07:24:07.840843       1 authentication.go:63] "Unable to authenticate the request" err="verifying certificate SN=23850158042506833490065535935480109479, SKID=, AKID=53:D7:E6:89:D7:7D:31:34:C7:C2:64:E4:50:89:76:08:7E:53:17:A9 failed: x509: certificate signed by unknown authority"
E0321 07:24:07.840865       1 authentication.go:63] "Unable to authenticate the request" err="verifying certificate SN=23850158042506833490065535935480109479, SKID=, AKID=53:D7:E6:89:D7:7D:31:34:C7:C2:64:E4:50:89:76:08:7E:53:17:A9 failed: x509: certificate signed by unknown authority"
E0321 07:24:07.840941       1 authentication.go:63] "Unable to authenticate the request" err="verifying certificate SN=23850158042506833490065535935480109479, SKID=, AKID=53:D7:E6:89:D7:7D:31:34:C7:C2:64:E4:50:89:76:08:7E:53:17:A9 failed: x509: certificate signed by unknown authority"
E0321 07:24:07.840977       1 authentication.go:63] "Unable to authenticate the request" err="verifying certificate SN=23850158042506833490065535935480109479, SKID=, AKID=53:D7:E6:89:D7:7D:31:34:C7:C2:64:E4:50:89:76:08:7E:53:17:A9 failed: x509: certificate signed by unknown authority"
E0321 07:24:07.841015       1 authentication.go:63] "Unable to authenticate the request" err="verifying certificate SN=23850158042506833490065535935480109479, SKID=, AKID=53:D7:E6:89:D7:7D:31:34:C7:C2:64:E4:50:89:76:08:7E:53:17:A9 failed: x509: certificate signed by unknown authority"
E0321 07:24:07.840069       1 authentication.go:63] "Unable to authenticate the request" err="verifying certificate SN=23850158042506833490065535935480109479, SKID=, AKID=53:D7:E6:89:D7:7D:31:34:C7:C2:64:E4:50:89:76:08:7E:53:17:A9 failed: x509: certificate signed by unknown authority"
E0321 07:24:07.841054       1 authentication.go:63] "Unable to authenticate the request" err="verifying certificate SN=23850158042506833490065535935480109479, SKID=, AKID=53:D7:E6:89:D7:7D:31:34:C7:C2:64:E4:50:89:76:08:7E:53:17:A9 failed: x509: certificate signed by unknown authority"
I0321 07:24:07.840090       1 shared_informer.go:247] Caches are synced for client-ca::kube-system::extension-apiserver-authentication::requestheader-client-ca-file
E0321 07:24:07.840157       1 authentication.go:63] "Unable to authenticate the request" err="verifying certificate SN=23850158042506833490065535935480109479, SKID=, AKID=53:D7:E6:89:D7:7D:31:34:C7:C2:64:E4:50:89:76:08:7E:53:17:A9 failed: x509: certificate signed by unknown authority"
E0321 07:24:07.840790       1 authentication.go:63] "Unable to authenticate the request" err="verifying certificate SN=23850158042506833490065535935480109479, SKID=, AKID=53:D7:E6:89:D7:7D:31:34:C7:C2:64:E4:50:89:76:08:7E:53:17:A9 failed: x509: certificate signed by unknown authority"
I0321 07:24:47.623921       1 trace.go:205] Trace[145270202]: "List" url:/apis/external.metrics.k8s.io/v1beta1/namespaces/default/custom.googleapis.com|sidekiq_latency,user-agent:vpa-recommender/v0.0.0 (linux/amd64) kubernetes/$Format/metrics-horizontal-pod-autoscaler,audit-id:565dde9d-e469-431f-a9dd-72eb798df7e2,client:10.126.0.4,accept:application/vnd.kubernetes.protobuf, */*,protocol:HTTP/2.0 (21-Mar-2024 07:24:46.075) (total time: 1548ms):
Trace[145270202]: ---"Listing from storage done" 1548ms (07:24:47.623)
Trace[145270202]: [1.548559548s] [1.548559548s] END
I0321 07:28:58.587083       1 trace.go:205] Trace[1540409068]: "List" url:/apis/external.metrics.k8s.io/v1beta1/namespaces/default/custom.googleapis.com|sidekiq_latency,user-agent:vpa-recommender/v0.0.0 (linux/amd64) kubernetes/$Format/metrics-horizontal-pod-autoscaler,audit-id:62aceb74-c5e7-4f22-8f76-c37ef2db794d,client:10.126.0.4,accept:application/vnd.kubernetes.protobuf, */*,protocol:HTTP/2.0 (21-Mar-2024 07:28:57.871) (total time: 715ms):
Trace[1540409068]: ---"Listing from storage done" 715ms (07:28:58.586)
Trace[1540409068]: [715.65654ms] [715.65654ms] END
I0321 07:34:06.973729       1 trace.go:205] Trace[993740446]: "List" url:/apis/external.metrics.k8s.io/v1beta1/namespaces/default/custom.googleapis.com|sidekiq_latency,user-agent:vpa-recommender/v0.0.0 (linux/amd64) kubernetes/$Format/metrics-horizontal-pod-autoscaler,audit-id:320d954c-e85d-45de-9511-398536215ea7,client:10.126.0.4,accept:application/vnd.kubernetes.protobuf, */*,protocol:HTTP/2.0 (21-Mar-2024 07:34:05.496) (total time: 1477ms):
Trace[993740446]: ---"Listing from storage done" 1477ms (07:34:06.973)
Trace[993740446]: [1.477411244s] [1.477411244s] END
I0321 07:36:31.582131       1 trace.go:205] Trace[71168755]: "List" url:/apis/external.metrics.k8s.io/v1beta1/namespaces/default/custom.googleapis.com|sidekiq_latency,user-agent:vpa-recommender/v0.0.0 (linux/amd64) kubernetes/$Format/metrics-horizontal-pod-autoscaler,audit-id:f707bdfd-1d77-4d58-8b00-caa6a6fcc2df,client:10.126.0.4,accept:application/vnd.kubernetes.protobuf, */*,protocol:HTTP/2.0 (21-Mar-2024 07:36:30.991) (total time: 590ms):
Trace[71168755]: ---"Listing from storage done" 590ms (07:36:31.581)
Trace[71168755]: [590.539263ms] [590.539263ms] END

CatherineF-dev · 2024-03-21T11:58:23Z

@hariapollo could you open a new issue? Trace logs are not related to this issue. They are ignorable.

hariapollo · 2024-03-21T12:06:19Z

Sure, you mean for the cert auth issue, right?

CatherineF-dev force-pushed the remove-list branch from 32a26a4 to 6d08ee5 Compare January 10, 2024 14:41

CatherineF-dev changed the title ~~Fix OOM issue and "http2: stream closed" issue by returning empty Lis…~~ Fix OOM and "http2: stream closed" issue by returning empty ListAllMetrics by default Jan 10, 2024

CatherineF-dev force-pushed the remove-list branch from 6d08ee5 to 5a3bd71 Compare January 10, 2024 15:26

CatherineF-dev requested a review from erain January 10, 2024 15:43

CatherineF-dev force-pushed the remove-list branch 3 times, most recently from 55df82b to a983c73 Compare January 11, 2024 15:51

CatherineF-dev changed the title ~~Fix OOM and "http2: stream closed" issue by returning empty ListAllMetrics by default~~ Fix OOM and "http2: stream closed" issue by only returning one item in ListAllMetrics by default Jan 11, 2024

courageJ reviewed Jan 11, 2024


custom-metrics-stackdriver-adapter/adapter.go

courageJ approved these changes Jan 14, 2024


Fix OOM issue and "http2: stream closed" issue by only returning one …

673d295

…item for ListCustomMetrics

CatherineF-dev force-pushed the remove-list branch from ee5a839 to 673d295 Compare January 15, 2024 15:25

CatherineF-dev merged commit 47ac134 into GoogleCloudPlatform:master Jan 15, 2024
3 checks passed

CatherineF-dev mentioned this pull request Jan 19, 2024

Custom metrics adapter spewing errors "apiserver was unable to write a fallback JSON response: http2: stream closed" #510

Open

CatherineF-dev changed the title ~~Fix OOM and "http2: stream closed" issue by only returning one item in ListAllMetrics by default~~ Fix OOM and Handler timeout issue by only returning one item in ListAllMetrics by default Jan 22, 2024

hariapollo mentioned this pull request Mar 21, 2024

Unable to authenticate the request err="verifying certificate failed: x509: certificate signed by unknown authority"" #673

Open
