Fix OOM and Handler timeout issue by only returning one item in ListAllMetrics by default #623

Merged


CatherineF-dev
Contributor

@CatherineF-dev CatherineF-dev commented Jan 10, 2024

Adapted from #311

Also,

  1. Instead of returning an empty list, this PR returns one metric resource item. Returning an empty response breaks generic clients that list resources through API discovery, such as the namespace garbage collector and GitOps services like Config Sync and ArgoCD. An APIService that returns an empty list is invalid and causes an error in client-go.

  2. Added a feature gate, list-full-custom-metrics, with a default value of false.

  3. metricsCache is unchanged from the previous implementation.
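
The behavior described in the list above can be sketched roughly as follows. This is a minimal illustration, not the adapter's actual code: the `MetricInfo` type and the `listAllMetrics` helper are hypothetical names, and the `listFull` argument stands in for the list-full-custom-metrics feature gate.

```go
package main

import "fmt"

// MetricInfo is a hypothetical stand-in for the adapter's metric descriptor type.
type MetricInfo struct {
	Resource   string
	Metric     string
	Namespaced bool
}

// listAllMetrics returns the whole cached list only when listFull is true.
// Otherwise it returns at most one item, which keeps the discovery response
// small while still being non-empty whenever the cache has any entries.
func listAllMetrics(cache []MetricInfo, listFull bool) []MetricInfo {
	if listFull || len(cache) <= 1 {
		return cache
	}
	return cache[:1]
}

func main() {
	// Illustrative metric names only; they are not taken from a real cluster.
	cache := []MetricInfo{
		{Resource: "*", Metric: "example.com|metric_a", Namespaced: true},
		{Resource: "*", Metric: "example.com|metric_b", Namespaced: true},
		{Resource: "*", Metric: "example.com|metric_c", Namespaced: true},
	}
	fmt.Println("default (gate off):", len(listAllMetrics(cache, false)))
	fmt.Println("gate on:", len(listAllMetrics(cache, true)))
}
```

With the gate off, the response size no longer grows with the number of metrics in the cache, which is the property the memory numbers below depend on.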

Fixes: #582, #545, #510, #458

Tested:

# HPA target deployment has 5 pods 
# now
NAME                                                 CPU(cores)   MEMORY(bytes)
custom-metrics-stackdriver-adapter-b8844b4d9-86b9h   5m           19Mi

# before
NAME                                                  CPU(cores)   MEMORY(bytes)
custom-metrics-stackdriver-adapter-6878c4fc56-r8pnt   5m           45Mi

The memory drop will be more significant in a large cluster.

"http2: stream closed" comes from calls to /apis/custom.metrics.k8s.io/v1beta2. Since that endpoint now returns only 1 item, a timeout error is unlikely.

ListMetrics is not used in HPA, so it's safe to change it.

Screenshot 2024-01-10 at 9 22 31 AM

Cons:

API discovery on /apis/custom.metrics.k8s.io/v1beta2 returns an incomplete resource list instead of listing all available metrics. This is fine since customers can find full metric names from GCP monitoring dashboards. When the feature gate list-full-custom-metrics is true, all custom metrics are returned.

kubectl get --raw "/apis/custom.metrics.k8s.io/v1beta2"

{"kind":"APIResourceList","apiVersion":"v1","groupVersion":"custom.metrics.k8s.io/v1beta2","resources":[{"name":"*/actions.googleapis.com|smarthome_action|camerastream|request_count","singularName":"","namespaced":true,"kind":"MetricValueList","verbs":["get"]}]}
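
A client that consumes this discovery response could check it along these lines. This is only a sketch: the struct types and the `parseDiscovery` helper are hypothetical and cover just the fields shown above, and the empty-list check is an illustration of the failure mode mentioned in the description, not client-go's actual implementation.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// apiResource and apiResourceList mirror only the fields visible in the
// response above; they are hypothetical local types, not the apimachinery ones.
type apiResource struct {
	Name       string   `json:"name"`
	Namespaced bool     `json:"namespaced"`
	Kind       string   `json:"kind"`
	Verbs      []string `json:"verbs"`
}

type apiResourceList struct {
	Kind         string        `json:"kind"`
	GroupVersion string        `json:"groupVersion"`
	Resources    []apiResource `json:"resources"`
}

// parseDiscovery decodes a discovery response and rejects an empty list,
// which is the case this PR avoids by always returning one item.
func parseDiscovery(raw []byte) (*apiResourceList, error) {
	var list apiResourceList
	if err := json.Unmarshal(raw, &list); err != nil {
		return nil, fmt.Errorf("decode discovery response: %w", err)
	}
	if len(list.Resources) == 0 {
		return nil, errors.New("empty resource list for " + list.GroupVersion)
	}
	return &list, nil
}

// sample is the response shown in the PR description.
const sample = `{"kind":"APIResourceList","apiVersion":"v1","groupVersion":"custom.metrics.k8s.io/v1beta2","resources":[{"name":"*/actions.googleapis.com|smarthome_action|camerastream|request_count","singularName":"","namespaced":true,"kind":"MetricValueList","verbs":["get"]}]}`

func main() {
	list, err := parseDiscovery([]byte(sample))
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("%s exposes %d resource(s); first: %s\n",
		list.GroupVersion, len(list.Resources), list.Resources[0].Name)
}
```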

@CatherineF-dev
Contributor Author

/assign @erain @courageJ

@CatherineF-dev CatherineF-dev changed the title from 'Fix OOM issue and "http2: stream closed" issue by returning empty Lis…' to 'Fix OOM and "http2: stream closed" issue by returning empty ListAllMetrics by default' on Jan 10, 2024
@CatherineF-dev
Contributor Author

/assign @erain

/assign @courageJ

@CatherineF-dev CatherineF-dev requested a review from erain January 10, 2024 15:43
@shuaich
Member

shuaich commented Jan 10, 2024

Do we know the root cause of "http2: stream closed", and could you elaborate on how returning an empty ListAllMetrics will solve this issue?

@CatherineF-dev
Contributor Author

CatherineF-dev commented Jan 10, 2024

ListAllMetrics is called during API discovery and isn't needed for HPA.

In this HPA case, if the number of target pods is 5, this function is called 5 times at the same time. That does not scale to a large cluster.

In my small cluster, ListAllMetrics is called 11 times within 60s, and each call returns 864502*2 bytes ≈ 1.73 megabytes.

kubectl get --raw "/apis/custom.metrics.k8s.io/v1beta2" | wc -c
# 864502
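
To put those figures together, here is a small sketch of the arithmetic. The response size, the factor of 2, and the call count are taken from the comment above as-is; the `discoveryBytes` helper is a hypothetical name used only for illustration.

```go
package main

import "fmt"

// discoveryBytes estimates the bytes produced by repeated full discovery
// listings: response size times a multiplier times the number of calls.
func discoveryBytes(responseBytes, factor, calls int) int {
	return responseBytes * factor * calls
}

func main() {
	const responseBytes = 864502 // from `wc -c` on the discovery response
	const factor = 2             // multiplier used in the comment; kept as-is
	const calls = 11             // calls observed during 60s

	perCall := discoveryBytes(responseBytes, factor, 1)
	total := discoveryBytes(responseBytes, factor, calls)
	fmt.Printf("per call: %d bytes (%.2f MB)\n", perCall, float64(perCall)/1e6)
	fmt.Printf("per 60s:  %d bytes (%.2f MB)\n", total, float64(total)/1e6)
}
```

Because the full listing grows with the number of metrics, this total grows with cluster size, whereas a one-item response stays roughly constant.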

"http2: stream closed"

It's a timeout issue, so it will be fixed after returning empty values.

@CatherineF-dev
Contributor Author

/retest I just added more CI pipelines.

@CatherineF-dev CatherineF-dev force-pushed the remove-list branch 3 times, most recently from 55df82b to a983c73 Compare January 11, 2024 15:51
@CatherineF-dev CatherineF-dev changed the title from 'Fix OOM and "http2: stream closed" issue by returning empty ListAllMetrics by default' to 'Fix OOM and "http2: stream closed" issue by only returning one item in ListAllMetrics by default' on Jan 11, 2024
@slash4

slash4 commented Jan 22, 2024

Hi, Google sent me here :) I'm happy to see this issue has a fix, and I was wondering how I could apply it in our GKE cluster. I tried manually editing the file custom-metrics-stackdriver-adapter/deploy/production/adapter_new_resource_model.yaml to target 0.14.2 directly, but I don't see any difference after upgrading.
Capture d’écran 2024-01-22 à 15 01 06
Am I applying the correct fix to this particular issue?

Thanks a lot for your attention :)

@CatherineF-dev
Contributor Author

Hi @slash4, could you check the version by kubectl describe deployments -n custom-metrics | grep "custom-metrics-stackdriver-adapter"?

I didn't see this error in my cluster. I will use custom-metrics-stackdriver-adapter/deploy/production/adapter_new_resource_model.yaml to reproduce it.

@slash4

slash4 commented Jan 22, 2024

Sure, here's the output of the command 👍

Name: custom-metrics-stackdriver-adapter
Labels: k8s-app=custom-metrics-stackdriver-adapter
run=custom-metrics-stackdriver-adapter
Selector: k8s-app=custom-metrics-stackdriver-adapter,run=custom-metrics-stackdriver-adapter
Labels: k8s-app=custom-metrics-stackdriver-adapter
run=custom-metrics-stackdriver-adapter
Service Account: custom-metrics-stackdriver-adapter
pod-custom-metrics-stackdriver-adapter:
Image: gcr.io/gke-release/custom-metrics-stackdriver-adapter:v0.14.2-gke.0
NewReplicaSet: custom-metrics-stackdriver-adapter-65dd5ddc76 (1/1 replicas created)
Normal ScalingReplicaSet 53m deployment-controller Scaled up replica set custom-metrics-stackdriver-adapter-79cc7d9f68 to 1
Normal ScalingReplicaSet 51m deployment-controller Scaled down replica set custom-metrics-stackdriver-adapter-5885cc597f to 0 from 1
Normal ScalingReplicaSet 45m deployment-controller Scaled up replica set custom-metrics-stackdriver-adapter-65dd5ddc76 to 1
Normal ScalingReplicaSet 45m deployment-controller Scaled down replica set custom-metrics-stackdriver-adapter-79cc7d9f68 to 0 from 1

I'm so grateful you responded so quickly. Thanks :)

@CatherineF-dev CatherineF-dev changed the title from 'Fix OOM and "http2: stream closed" issue by only returning one item in ListAllMetrics by default' to 'Fix OOM and Handler timeout issue by only returning one item in ListAllMetrics by default' on Jan 22, 2024
@CatherineF-dev
Contributor Author

Ok.

@slash4

  1. I couldn't reproduce this issue. Could you help find which line is raising this http2: stream closed error? You can find it in Cloud Logging by expanding the log entry.

  2. Could you check whether it affects HPA, using kubectl describe hpa -n your_namespace? I think it won't.

@CatherineF-dev
Contributor Author

@slash4

slash4 commented Jan 22, 2024

  1. The log entry reports line 117 of writers.go:
Capture d’écran 2024-01-22 à 15 51 31
  2. I confirm HPA isn't affected. It works, even though I occasionally see this message: FailedGetResourceMetric. It's just the massive amount of logs that bothers me :\

@slash4

slash4 commented Jan 22, 2024

Regarding this Stack Overflow question, I'm already in External mode: https://stackoverflow.com/questions/67073909/error-scaling-up-in-hpa-in-gke-apiserver-was-unable-to-write-a-json-response-h

Capture d’écran 2024-01-22 à 16 06 31

Could it be the custom.googleapis.com part? On double-checking, I see that I access my custom metric with pipe separators and without the custom.googleapis.com prefix. But I don't think this is the problem.

@CatherineF-dev
Contributor Author

Ok, I understand that spam logs are annoying. If I need more information, I will let you know.

@slash4

slash4 commented Jan 22, 2024

Thanks a lot! Sure, I'm here if you need me :)

@CatherineF-dev
Contributor Author

CatherineF-dev commented Jan 23, 2024

Hi @slash4, does your cluster still have spam logs? Just want to see whether spam logs are gone after x hours.

@hariapollo

hariapollo commented Mar 20, 2024

@CatherineF-dev @slash4 are the spam logs gone now? If yes, please let me know what the root cause was, since we are also seeing the same errors.
image

@CatherineF-dev
Contributor Author

Could you provide detailed steps to reproduce? @hariapollo

Are you using the latest custom-metrics-stackdriver-adapter?

@hariapollo

hariapollo commented Mar 21, 2024

Hey @CatherineF-dev, we were on v0.13.1-gke.0. Today I upgraded it to v0.14.2-gke.0. However, we are getting the logs below. Are they ignorable?

I0321 06:45:23.151222       1 trace.go:205] Trace[1544431631]: "List" url:/apis/external.metrics.k8s.io/v1beta1/namespaces/default/custom.googleapis.com|sidekiq_latency,user-agent:vpa-recommender/v0.0.0 (linux/amd64) kubernetes/$Format/metrics-horizontal-pod-autoscaler,audit-id:fb040344-251b-4334-8ce6-f602024474ad,client:10.126.0.4,accept:application/vnd.kubernetes.protobuf, */*,protocol:HTTP/2.0 (21-Mar-2024 06:45:22.460) (total time: 690ms):
Trace[1544431631]: ---"Listing from storage done" 690ms (06:45:23.151)
Trace[1544431631]: [690.652348ms] [690.652348ms] END
I0321 06:45:49.010286       1 trace.go:205] Trace[1858399614]: "List" url:/apis/external.metrics.k8s.io/v1beta1/namespaces/default/custom.googleapis.com|sidekiq_latency,user-agent:vpa-recommender/v0.0.0 (linux/amd64) kubernetes/$Format/metrics-horizontal-pod-autoscaler,audit-id:24bcfb93-881a-46ed-b187-479050861754,client:10.126.0.4,accept:application/vnd.kubernetes.protobuf, */*,protocol:HTTP/2.0 (21-Mar-2024 06:45:47.039) (total time: 1971ms):
Trace[1858399614]: ---"Listing from storage done" 1971ms (06:45:49.010)
Trace[1858399614]: [1.971096723s] [1.971096723s] END
I0321 06:46:38.090887       1 trace.go:205] Trace[142037306]: "List" url:/apis/external.metrics.k8s.io/v1beta1/namespaces/default/custom.googleapis.com|sidekiq_latency,user-agent:vpa-recommender/v0.0.0 (linux/amd64) kubernetes/$Format/metrics-horizontal-pod-autoscaler,audit-id:f3526283-fd52-44ba-a7c3-34c4b7d80ebc,client:10.126.0.4,accept:application/vnd.kubernetes.protobuf, */*,protocol:HTTP/2.0 (21-Mar-2024 06:46:36.763) (total time: 1327ms):
Trace[142037306]: ---"Listing from storage done" 1327ms (06:46:38.090)
Trace[142037306]: [1.327772248s] [1.327772248s] END
I0321 06:47:33.860706       1 trace.go:205] Trace[1479243979]: "List" url:/apis/external.metrics.k8s.io/v1beta1/namespaces/default/custom.googleapis.com|sidekiq_latency,user-agent:vpa-recommender/v0.0.0 (linux/amd64) kubernetes/$Format/metrics-horizontal-pod-autoscaler,audit-id:0c701559-68b2-4ed9-8b47-2c88c94c7391,client:10.126.0.4,accept:application/vnd.kubernetes.protobuf, */*,protocol:HTTP/2.0 (21-Mar-2024 06:47:33.347) (total time: 512ms):
Trace[1479243979]: ---"Listing from storage done" 512ms (06:47:33.860)
Trace[1479243979]: [512.732013ms] [512.732013ms] END
I0321 06:47:48.944403       1 trace.go:205] Trace[739596128]: "List" url:/apis/external.metrics.k8s.io/v1beta1/namespaces/default/custom.googleapis.com|sidekiq_latency,user-agent:vpa-recommender/v0.0.0 (linux/amd64) kubernetes/$Format/metrics-horizontal-pod-autoscaler,audit-id:d8ca1ad3-cc4c-4d0d-b6af-db963cb9a37e,client:10.126.0.4,accept:application/vnd.kubernetes.protobuf, */*,protocol:HTTP/2.0 (21-Mar-2024 06:47:47.955) (total time: 988ms):
Trace[739596128]: ---"Listing from storage done" 988ms (06:47:48.944)
Trace[739596128]: [988.624893ms] [988.624893ms] END
I0321 06:50:09.372708       1 trace.go:205] Trace[884902674]: "List" url:/apis/external.metrics.k8s.io/v1beta1/namespaces/default/custom.googleapis.com|sidekiq_latency,user-agent:vpa-recommender/v0.0.0 (linux/amd64) kubernetes/$Format/metrics-horizontal-pod-autoscaler,audit-id:5e706fde-41d2-401d-808f-b3f5c389fb3f,client:10.126.0.4,accept:application/vnd.kubernetes.protobuf, */*,protocol:HTTP/2.0 (21-Mar-2024 06:50:08.752) (total time: 620ms):
Trace[884902674]: ---"Listing from storage done" 619ms (06:50:09.372)
Trace[884902674]: [620.051172ms] [620.051172ms] END
I0321 06:54:00.525744       1 trace.go:205] Trace[349212798]: "List" url:/apis/external.metrics.k8s.io/v1beta1/namespaces/default/custom.googleapis.com|sidekiq_latency,user-agent:vpa-recommender/v0.0.0 (linux/amd64) kubernetes/$Format/metrics-horizontal-pod-autoscaler,audit-id:49af46bb-4de6-4bb4-b645-f4fa66a9aed3,client:10.126.0.4,accept:application/vnd.kubernetes.protobuf, */*,protocol:HTTP/2.0 (21-Mar-2024 06:53:59.776) (total time: 749ms):
Trace[349212798]: ---"Listing from storage done" 749ms (06:54:00.525)
Trace[349212798]: [749.50106ms] [749.50106ms] END
I0321 06:54:11.989052       1 trace.go:205] Trace[83724490]: "List" url:/apis/external.metrics.k8s.io/v1beta1/namespaces/default/custom.googleapis.com|sidekiq_latency,user-agent:vpa-recommender/v0.0.0 (linux/amd64) kubernetes/$Format/metrics-horizontal-pod-autoscaler,audit-id:b48d6159-9670-4629-bc39-7d1ff5250e68,client:10.126.0.4,accept:application/vnd.kubernetes.protobuf, */*,protocol:HTTP/2.0 (21-Mar-2024 06:54:10.872) (total time: 1116ms):
Trace[83724490]: ---"Listing from storage done" 1116ms (06:54:11.988)
Trace[83724490]: [1.1167639s] [1.1167639s] END
I0321 06:56:07.906453       1 trace.go:205] Trace[1017136168]: "List" url:/apis/external.metrics.k8s.io/v1beta1/namespaces/default/custom.googleapis.com|sidekiq_latency,user-agent:vpa-recommender/v0.0.0 (linux/amd64) kubernetes/$Format/metrics-horizontal-pod-autoscaler,audit-id:fa262d63-4728-4131-963b-9124273867c5,client:10.126.0.4,accept:application/vnd.kubernetes.protobuf, */*,protocol:HTTP/2.0 (21-Mar-2024 06:56:06.321) (total time: 1584ms):
Trace[1017136168]: ---"Listing from storage done" 1584ms (06:56:07.906)
Trace[1017136168]: [1.584645417s] [1.584645417s] END
I0321 06:57:49.306758       1 trace.go:205] Trace[1213281210]: "List" url:/apis/external.metrics.k8s.io/v1beta1/namespaces/default/custom.googleapis.com|sidekiq_latency,user-agent:vpa-recommender/v0.0.0 (linux/amd64) kubernetes/$Format/metrics-horizontal-pod-autoscaler,audit-id:d4953249-6306-4a4a-b5ca-41d2a8ee2fb2,client:10.126.0.4,accept:application/vnd.kubernetes.protobuf, */*,protocol:HTTP/2.0 (21-Mar-2024 06:57:48.567) (total time: 739ms):
Trace[1213281210]: ---"Listing from storage done" 739ms (06:57:49.306)
Trace[1213281210]: [739.291186ms] [739.291186ms] END
I0321 06:59:02.535682       1 trace.go:205] Trace[1926827826]: "List" url:/apis/external.metrics.k8s.io/v1beta1/namespaces/default/custom.googleapis.com|sidekiq_latency,user-agent:vpa-recommender/v0.0.0 (linux/amd64) kubernetes/$Format/metrics-horizontal-pod-autoscaler,audit-id:25d1f19b-ba1d-4ba8-84b6-c1751c2858a5,client:10.126.0.4,accept:application/vnd.kubernetes.protobuf, */*,protocol:HTTP/2.0 (21-Mar-2024 06:59:01.108) (total time: 1427ms):
Trace[1926827826]: ---"Listing from storage done" 1427ms (06:59:02.535)
Trace[1926827826]: [1.427284304s] [1.427284304s] END
I0321 06:59:05.526355       1 trace.go:205] Trace[495119366]: "List" url:/apis/external.metrics.k8s.io/v1beta1/namespaces/default/custom.googleapis.com|sidekiq_latency,user-agent:vpa-recommender/v0.0.0 (linux/amd64) kubernetes/$Format/metrics-horizontal-pod-autoscaler,audit-id:0ab8dea2-fa0a-4696-a4a1-b24e1aa64b42,client:10.126.0.4,accept:application/vnd.kubernetes.protobuf, */*,protocol:HTTP/2.0 (21-Mar-2024 06:59:03.413) (total time: 2112ms):
Trace[495119366]: ---"Listing from storage done" 2112ms (06:59:05.526)
Trace[495119366]: [2.112847632s] [2.112847632s] END

@hariapollo

After upgrading it to v0.14.2-gke.0, I got the error log below:

E0321 13:07:02.418167    3420 memcache.go:255] couldn't get resource list for external.metrics.k8s.io/v1beta1: Got empty response for: external.metrics.k8s.io/v1beta1
I0321 07:24:00.553580       1 adapter.go:200] serverOptions: {true true true true false   false false}
I0321 07:24:00.553651       1 adapter.go:210] ListFullCustomMetrics is disabled, which would only list 1 metric resource to reduce memory usage. Add --list-full-custom-metrics to list full metric resources for debugging.
I0321 07:24:01.754403       1 request.go:665] Waited for 1.00918148s due to client-side throttling, not priority and fairness, request: GET:https://10.124.0.1:443/apis/kafka.strimzi.io/v1beta1?timeout=32s
I0321 07:24:03.347423       1 serving.go:341] Generated self-signed cert (apiserver.local.config/certificates/apiserver.crt, apiserver.local.config/certificates/apiserver.key)
I0321 07:24:07.552005       1 requestheader_controller.go:169] Starting RequestHeaderAuthRequestController
I0321 07:24:07.552034       1 configmap_cafile_content.go:201] "Starting controller" name="client-ca::kube-system::extension-apiserver-authentication::client-ca-file"
I0321 07:24:07.552284       1 shared_informer.go:240] Waiting for caches to sync for client-ca::kube-system::extension-apiserver-authentication::client-ca-file
I0321 07:24:07.552087       1 configmap_cafile_content.go:201] "Starting controller" name="client-ca::kube-system::extension-apiserver-authentication::requestheader-client-ca-file"
I0321 07:24:07.552656       1 shared_informer.go:240] Waiting for caches to sync for client-ca::kube-system::extension-apiserver-authentication::requestheader-client-ca-file
I0321 07:24:07.552236       1 shared_informer.go:240] Waiting for caches to sync for RequestHeaderAuthRequestController
I0321 07:24:07.552617       1 dynamic_serving_content.go:129] "Starting controller" name="serving-cert::apiserver.local.config/certificates/apiserver.crt::apiserver.local.config/certificates/apiserver.key"
I0321 07:24:07.553506       1 secure_serving.go:256] Serving securely on [::]:443
I0321 07:24:07.553568       1 tlsconfig.go:240] "Starting DynamicServingCertificateController"
I0321 07:24:07.739320       1 shared_informer.go:247] Caches are synced for client-ca::kube-system::extension-apiserver-authentication::client-ca-file
E0321 07:24:07.740818       1 authentication.go:63] "Unable to authenticate the request" err="verifying certificate SN=23850158042506833490065535935480109479, SKID=, AKID=53:D7:E6:89:D7:7D:31:34:C7:C2:64:E4:50:89:76:08:7E:53:17:A9 failed: x509: certificate signed by unknown authority"
E0321 07:24:07.741585       1 authentication.go:63] "Unable to authenticate the request" err="verifying certificate SN=23850158042506833490065535935480109479, SKID=, AKID=53:D7:E6:89:D7:7D:31:34:C7:C2:64:E4:50:89:76:08:7E:53:17:A9 failed: x509: certificate signed by unknown authority"
E0321 07:24:07.741994       1 authentication.go:63] "Unable to authenticate the request" err="verifying certificate SN=23850158042506833490065535935480109479, SKID=, AKID=53:D7:E6:89:D7:7D:31:34:C7:C2:64:E4:50:89:76:08:7E:53:17:A9 failed: x509: certificate signed by unknown authority"
E0321 07:24:07.742335       1 authentication.go:63] "Unable to authenticate the request" err="verifying certificate SN=23850158042506833490065535935480109479, SKID=, AKID=53:D7:E6:89:D7:7D:31:34:C7:C2:64:E4:50:89:76:08:7E:53:17:A9 failed: x509: certificate signed by unknown authority"
E0321 07:24:07.742633       1 authentication.go:63] "Unable to authenticate the request" err="verifying certificate SN=23850158042506833490065535935480109479, SKID=, AKID=53:D7:E6:89:D7:7D:31:34:C7:C2:64:E4:50:89:76:08:7E:53:17:A9 failed: x509: certificate signed by unknown authority"
E0321 07:24:07.742915       1 authentication.go:63] "Unable to authenticate the request" err="verifying certificate SN=23850158042506833490065535935480109479, SKID=, AKID=53:D7:E6:89:D7:7D:31:34:C7:C2:64:E4:50:89:76:08:7E:53:17:A9 failed: x509: certificate signed by unknown authority"
E0321 07:24:07.743207       1 authentication.go:63] "Unable to authenticate the request" err="verifying certificate SN=23850158042506833490065535935480109479, SKID=, AKID=53:D7:E6:89:D7:7D:31:34:C7:C2:64:E4:50:89:76:08:7E:53:17:A9 failed: x509: certificate signed by unknown authority"
E0321 07:24:07.743494       1 authentication.go:63] "Unable to authenticate the request" err="verifying certificate SN=23850158042506833490065535935480109479, SKID=, AKID=53:D7:E6:89:D7:7D:31:34:C7:C2:64:E4:50:89:76:08:7E:53:17:A9 failed: x509: certificate signed by unknown authority"
E0321 07:24:07.743779       1 authentication.go:63] "Unable to authenticate the request" err="verifying certificate SN=23850158042506833490065535935480109479, SKID=, AKID=53:D7:E6:89:D7:7D:31:34:C7:C2:64:E4:50:89:76:08:7E:53:17:A9 failed: x509: certificate signed by unknown authority"
E0321 07:24:07.744091       1 authentication.go:63] "Unable to authenticate the request" err="verifying certificate SN=23850158042506833490065535935480109479, SKID=, AKID=53:D7:E6:89:D7:7D:31:34:C7:C2:64:E4:50:89:76:08:7E:53:17:A9 failed: x509: certificate signed by unknown authority"
E0321 07:24:07.744368       1 authentication.go:63] "Unable to authenticate the request" err="verifying certificate SN=23850158042506833490065535935480109479, SKID=, AKID=53:D7:E6:89:D7:7D:31:34:C7:C2:64:E4:50:89:76:08:7E:53:17:A9 failed: x509: certificate signed by unknown authority"
E0321 07:24:07.744583       1 authentication.go:63] "Unable to authenticate the request" err="verifying certificate SN=23850158042506833490065535935480109479, SKID=, AKID=53:D7:E6:89:D7:7D:31:34:C7:C2:64:E4:50:89:76:08:7E:53:17:A9 failed: x509: certificate signed by unknown authority"
E0321 07:24:07.744790       1 authentication.go:63] "Unable to authenticate the request" err="verifying certificate SN=23850158042506833490065535935480109479, SKID=, AKID=53:D7:E6:89:D7:7D:31:34:C7:C2:64:E4:50:89:76:08:7E:53:17:A9 failed: x509: certificate signed by unknown authority"
E0321 07:24:07.744965       1 authentication.go:63] "Unable to authenticate the request" err="verifying certificate SN=23850158042506833490065535935480109479, SKID=, AKID=53:D7:E6:89:D7:7D:31:34:C7:C2:64:E4:50:89:76:08:7E:53:17:A9 failed: x509: certificate signed by unknown authority"
E0321 07:24:07.745090       1 authentication.go:63] "Unable to authenticate the request" err="verifying certificate SN=23850158042506833490065535935480109479, SKID=, AKID=53:D7:E6:89:D7:7D:31:34:C7:C2:64:E4:50:89:76:08:7E:53:17:A9 failed: x509: certificate signed by unknown authority"
I0321 07:24:07.839380       1 shared_informer.go:247] Caches are synced for RequestHeaderAuthRequestController
E0321 07:24:07.840068       1 authentication.go:63] "Unable to authenticate the request" err="verifying certificate SN=23850158042506833490065535935480109479, SKID=, AKID=53:D7:E6:89:D7:7D:31:34:C7:C2:64:E4:50:89:76:08:7E:53:17:A9 failed: x509: certificate signed by unknown authority"
E0321 07:24:07.840227       1 authentication.go:63] "Unable to authenticate the request" err="verifying certificate SN=23850158042506833490065535935480109479, SKID=, AKID=53:D7:E6:89:D7:7D:31:34:C7:C2:64:E4:50:89:76:08:7E:53:17:A9 failed: x509: certificate signed by unknown authority"
E0321 07:24:07.840440       1 authentication.go:63] "Unable to authenticate the request" err="verifying certificate SN=23850158042506833490065535935480109479, SKID=, AKID=53:D7:E6:89:D7:7D:31:34:C7:C2:64:E4:50:89:76:08:7E:53:17:A9 failed: x509: certificate signed by unknown authority"
E0321 07:24:07.840605       1 authentication.go:63] "Unable to authenticate the request" err="verifying certificate SN=23850158042506833490065535935480109479, SKID=, AKID=53:D7:E6:89:D7:7D:31:34:C7:C2:64:E4:50:89:76:08:7E:53:17:A9 failed: x509: certificate signed by unknown authority"
E0321 07:24:07.840695       1 authentication.go:63] "Unable to authenticate the request" err="verifying certificate SN=23850158042506833490065535935480109479, SKID=, AKID=53:D7:E6:89:D7:7D:31:34:C7:C2:64:E4:50:89:76:08:7E:53:17:A9 failed: x509: certificate signed by unknown authority"
E0321 07:24:07.840720       1 authentication.go:63] "Unable to authenticate the request" err="verifying certificate SN=23850158042506833490065535935480109479, SKID=, AKID=53:D7:E6:89:D7:7D:31:34:C7:C2:64:E4:50:89:76:08:7E:53:17:A9 failed: x509: certificate signed by unknown authority"
E0321 07:24:07.840843       1 authentication.go:63] "Unable to authenticate the request" err="verifying certificate SN=23850158042506833490065535935480109479, SKID=, AKID=53:D7:E6:89:D7:7D:31:34:C7:C2:64:E4:50:89:76:08:7E:53:17:A9 failed: x509: certificate signed by unknown authority"
E0321 07:24:07.840865       1 authentication.go:63] "Unable to authenticate the request" err="verifying certificate SN=23850158042506833490065535935480109479, SKID=, AKID=53:D7:E6:89:D7:7D:31:34:C7:C2:64:E4:50:89:76:08:7E:53:17:A9 failed: x509: certificate signed by unknown authority"
E0321 07:24:07.840941       1 authentication.go:63] "Unable to authenticate the request" err="verifying certificate SN=23850158042506833490065535935480109479, SKID=, AKID=53:D7:E6:89:D7:7D:31:34:C7:C2:64:E4:50:89:76:08:7E:53:17:A9 failed: x509: certificate signed by unknown authority"
E0321 07:24:07.840977       1 authentication.go:63] "Unable to authenticate the request" err="verifying certificate SN=23850158042506833490065535935480109479, SKID=, AKID=53:D7:E6:89:D7:7D:31:34:C7:C2:64:E4:50:89:76:08:7E:53:17:A9 failed: x509: certificate signed by unknown authority"
E0321 07:24:07.841015       1 authentication.go:63] "Unable to authenticate the request" err="verifying certificate SN=23850158042506833490065535935480109479, SKID=, AKID=53:D7:E6:89:D7:7D:31:34:C7:C2:64:E4:50:89:76:08:7E:53:17:A9 failed: x509: certificate signed by unknown authority"
E0321 07:24:07.840069       1 authentication.go:63] "Unable to authenticate the request" err="verifying certificate SN=23850158042506833490065535935480109479, SKID=, AKID=53:D7:E6:89:D7:7D:31:34:C7:C2:64:E4:50:89:76:08:7E:53:17:A9 failed: x509: certificate signed by unknown authority"
E0321 07:24:07.841054       1 authentication.go:63] "Unable to authenticate the request" err="verifying certificate SN=23850158042506833490065535935480109479, SKID=, AKID=53:D7:E6:89:D7:7D:31:34:C7:C2:64:E4:50:89:76:08:7E:53:17:A9 failed: x509: certificate signed by unknown authority"
I0321 07:24:07.840090       1 shared_informer.go:247] Caches are synced for client-ca::kube-system::extension-apiserver-authentication::requestheader-client-ca-file
E0321 07:24:07.840157       1 authentication.go:63] "Unable to authenticate the request" err="verifying certificate SN=23850158042506833490065535935480109479, SKID=, AKID=53:D7:E6:89:D7:7D:31:34:C7:C2:64:E4:50:89:76:08:7E:53:17:A9 failed: x509: certificate signed by unknown authority"
E0321 07:24:07.840790       1 authentication.go:63] "Unable to authenticate the request" err="verifying certificate SN=23850158042506833490065535935480109479, SKID=, AKID=53:D7:E6:89:D7:7D:31:34:C7:C2:64:E4:50:89:76:08:7E:53:17:A9 failed: x509: certificate signed by unknown authority"
I0321 07:24:47.623921       1 trace.go:205] Trace[145270202]: "List" url:/apis/external.metrics.k8s.io/v1beta1/namespaces/default/custom.googleapis.com|sidekiq_latency,user-agent:vpa-recommender/v0.0.0 (linux/amd64) kubernetes/$Format/metrics-horizontal-pod-autoscaler,audit-id:565dde9d-e469-431f-a9dd-72eb798df7e2,client:10.126.0.4,accept:application/vnd.kubernetes.protobuf, */*,protocol:HTTP/2.0 (21-Mar-2024 07:24:46.075) (total time: 1548ms):
Trace[145270202]: ---"Listing from storage done" 1548ms (07:24:47.623)
Trace[145270202]: [1.548559548s] [1.548559548s] END
I0321 07:28:58.587083       1 trace.go:205] Trace[1540409068]: "List" url:/apis/external.metrics.k8s.io/v1beta1/namespaces/default/custom.googleapis.com|sidekiq_latency,user-agent:vpa-recommender/v0.0.0 (linux/amd64) kubernetes/$Format/metrics-horizontal-pod-autoscaler,audit-id:62aceb74-c5e7-4f22-8f76-c37ef2db794d,client:10.126.0.4,accept:application/vnd.kubernetes.protobuf, */*,protocol:HTTP/2.0 (21-Mar-2024 07:28:57.871) (total time: 715ms):
Trace[1540409068]: ---"Listing from storage done" 715ms (07:28:58.586)
Trace[1540409068]: [715.65654ms] [715.65654ms] END
I0321 07:34:06.973729       1 trace.go:205] Trace[993740446]: "List" url:/apis/external.metrics.k8s.io/v1beta1/namespaces/default/custom.googleapis.com|sidekiq_latency,user-agent:vpa-recommender/v0.0.0 (linux/amd64) kubernetes/$Format/metrics-horizontal-pod-autoscaler,audit-id:320d954c-e85d-45de-9511-398536215ea7,client:10.126.0.4,accept:application/vnd.kubernetes.protobuf, */*,protocol:HTTP/2.0 (21-Mar-2024 07:34:05.496) (total time: 1477ms):
Trace[993740446]: ---"Listing from storage done" 1477ms (07:34:06.973)
Trace[993740446]: [1.477411244s] [1.477411244s] END
I0321 07:36:31.582131       1 trace.go:205] Trace[71168755]: "List" url:/apis/external.metrics.k8s.io/v1beta1/namespaces/default/custom.googleapis.com|sidekiq_latency,user-agent:vpa-recommender/v0.0.0 (linux/amd64) kubernetes/$Format/metrics-horizontal-pod-autoscaler,audit-id:f707bdfd-1d77-4d58-8b00-caa6a6fcc2df,client:10.126.0.4,accept:application/vnd.kubernetes.protobuf, */*,protocol:HTTP/2.0 (21-Mar-2024 07:36:30.991) (total time: 590ms):
Trace[71168755]: ---"Listing from storage done" 590ms (07:36:31.581)
Trace[71168755]: [590.539263ms] [590.539263ms] END

@CatherineF-dev
Contributor Author

CatherineF-dev commented Mar 21, 2024

@hariapollo could you open a new issue? Trace logs are not related to this issue. They are ignorable.

@hariapollo

Sure, you mean for the cert auth issue, right?

Successfully merging this pull request may close these issues.

100% memory and CPU and never recovers
5 participants