
VPA updater constantly fails to match a container that doesn't even exist #6215

Closed
rkashasl opened this issue Oct 20, 2023 · 7 comments
Labels
kind/bug Categorizes issue or PR as related to a bug. lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale.

Comments

@rkashasl

Hello!
We are using the latest VPA chart:

---
apiVersion: source.toolkit.fluxcd.io/v1beta2
kind: HelmRepository
metadata:
  name: vertical-pod-autoscaler
  namespace: kube-system
spec:
  interval: 14m
  url: "https://cowboysysop.github.io/charts/"
---
apiVersion: helm.toolkit.fluxcd.io/v2beta1
kind: HelmRelease
metadata:
  name: vertical-pod-autoscaler
  namespace: kube-system
spec:
  install:
    createNamespace: true
    crds: CreateReplace
  upgrade:
    crds: CreateReplace
  releaseName: vertical-pod-autoscaler
  interval: 9m
  chart:
    spec:
      # renovate: registryUrl=https://cowboysysop.github.io/charts/
      chart: vertical-pod-autoscaler
      version: 7.2.0
      sourceRef:
        kind: HelmRepository
        name: vertical-pod-autoscaler
        namespace: kube-system
      interval: 14m
  values:
    admissionController:
      tolerations:
      - key: "arch"
        operator: "Equal"
        value: "arm64"
        effect: "NoSchedule"
      resources:
        limits:
          memory: 50Mi
        requests:
          cpu: 10m
          memory: 40Mi
    recommender:
      extraArgs:
        pod-recommendation-min-memory-mb: 30
      tolerations:
      - key: "arch"
        operator: "Equal"
        value: "arm64"
        effect: "NoSchedule"
      resources:
        limits:
          memory: 250Mi
        requests:
          cpu: 10m
          memory: 150Mi
    updater:
      tolerations:
      - key: "arch"
        operator: "Equal"
        value: "arm64"
        effect: "NoSchedule"
      resources:
        limits:
          memory: 50Mi
        requests:
          cpu: 10m
          memory: 50Mi

However, the vpa-updater pod keeps spamming errors about a cert-manager container:

vertical-pod-autoscaler-updater-7747d6547-qbt96 I1020 10:14:53.194314       1 capping.go:79] no matching Container found for recommendation cert-manager
vertical-pod-autoscaler-updater-7747d6547-qbt96 I1020 10:14:53.194621       1 capping.go:79] no matching Container found for recommendation cert-manager
vertical-pod-autoscaler-updater-7747d6547-qbt96 I1020 10:15:53.202479       1 capping.go:79] no matching Container found for recommendation cert-manager
vertical-pod-autoscaler-updater-7747d6547-qbt96 I1020 10:15:53.232326       1 capping.go:79] no matching Container found for recommendation cert-manager
vertical-pod-autoscaler-updater-7747d6547-qbt96 I1020 10:16:53.193146       1 capping.go:79] no matching Container found for recommendation cert-manager

Here is the cert-manager deployment and its VPAs:

---
apiVersion: source.toolkit.fluxcd.io/v1beta2
kind: HelmRepository
metadata:
  name: cert-manager
  namespace: cert-manager
spec:
  interval: 14m
  url: "https://charts.jetstack.io/"
---
apiVersion: helm.toolkit.fluxcd.io/v2beta1
kind: HelmRelease
metadata:
  name: cert-manager
  namespace: cert-manager
spec:
  install:
    createNamespace: true
    crds: CreateReplace
  upgrade:
    crds: CreateReplace
  interval: 9m
  chart:
    spec:
      # renovate: registryUrl=https://charts.jetstack.io/
      chart: cert-manager
      version: v1.13.1
      sourceRef:
        kind: HelmRepository
        name: cert-manager
        namespace: cert-manager
      interval: 14m
  values:
    installCRDs: true
    serviceAccount:
      create: false
      name: certmanager-oidc
    global:
      priorityClassName: above-average
    prometheus:
      enabled: true
      servicemonitor:
        enabled: true
        prometheusInstance: prometheus-kube-prometheus-prometheus
    ingressShim:
      defaultIssuerName: letsencrypt-prod
      defaultIssuerKind: ClusterIssuer
    webhook:
      tolerations:
      - key: "arch"
        operator: "Equal"
        value: "arm64"
        effect: "NoSchedule"
      resources:
        limits:
          memory: 64Mi
        requests:
          memory: 32Mi
          cpu: 10m
    cainjector:
      tolerations:
      - key: "arch"
        operator: "Equal"
        value: "arm64"
        effect: "NoSchedule"
      extraArgs:
      - "--leader-elect=false"
      resources:
        limits:
          memory: 512Mi
        requests:
          memory: 128Mi
          cpu: 10m
    resources:
      limits:
        memory: 384Mi
      requests:
        memory: 160Mi
        cpu: 10m
    tolerations:
    - key: "arch"
      operator: "Equal"
      value: "arm64"
      effect: "NoSchedule"
---
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: cert-manager
  namespace: cert-manager
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: cert-manager
  updatePolicy:
    updateMode: Recreate
    minReplicas: 1
---
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: cert-manager-cainjector
  namespace: cert-manager
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: cert-manager-cainjector
  updatePolicy:
    updateMode: Recreate
    minReplicas: 1
---
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: cert-manager-webhook
  namespace: cert-manager
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: cert-manager-webhook
  updatePolicy:
    updateMode: Recreate
    minReplicas: 1

And there is no container named cert-manager in the deployment for that recommendation to match:

Name:                   cert-manager
Namespace:              cert-manager
CreationTimestamp:      Wed, 18 Oct 2023 17:14:32 +0300
Labels:                 app=cert-manager
                        app.kubernetes.io/component=controller
                        app.kubernetes.io/instance=cert-manager
                        app.kubernetes.io/managed-by=Helm
                        app.kubernetes.io/name=cert-manager
                        app.kubernetes.io/version=v1.13.1
                        helm.sh/chart=cert-manager-v1.13.1
                        helm.toolkit.fluxcd.io/name=cert-manager
                        helm.toolkit.fluxcd.io/namespace=cert-manager
Annotations:            deployment.kubernetes.io/revision: 1
                        meta.helm.sh/release-name: cert-manager
                        meta.helm.sh/release-namespace: cert-manager
Selector:               app.kubernetes.io/component=controller,app.kubernetes.io/instance=cert-manager,app.kubernetes.io/name=cert-manager
Replicas:               1 desired | 1 updated | 1 total | 1 available | 0 unavailable
StrategyType:           RollingUpdate
MinReadySeconds:        0
RollingUpdateStrategy:  25% max unavailable, 25% max surge
Pod Template:
  Labels:           app=cert-manager
                    app.kubernetes.io/component=controller
                    app.kubernetes.io/instance=cert-manager
                    app.kubernetes.io/managed-by=Helm
                    app.kubernetes.io/name=cert-manager
                    app.kubernetes.io/version=v1.13.1
                    helm.sh/chart=cert-manager-v1.13.1
  Service Account:  certmanager-oidc
  Containers:
   cert-manager-controller:
    Image:       quay.io/jetstack/cert-manager-controller:v1.13.1
    Ports:       9402/TCP, 9403/TCP
    Host Ports:  0/TCP, 0/TCP
    Args:
      --v=2
      --cluster-resource-namespace=$(POD_NAMESPACE)
      --leader-election-namespace=kube-system
      --acme-http01-solver-image=quay.io/jetstack/cert-manager-acmesolver:v1.13.1
      --default-issuer-name=letsencrypt-prod
      --default-issuer-kind=ClusterIssuer
      --max-concurrent-challenges=60
    Limits:
      memory:  384Mi
    Requests:
      cpu:     10m
      memory:  160Mi
    Environment:
      POD_NAMESPACE:     (v1:metadata.namespace)
    Mounts:             <none>
  Volumes:              <none>
  Priority Class Name:  above-average
Conditions:
  Type           Status  Reason
  ----           ------  ------
  Progressing    True    NewReplicaSetAvailable
  Available      True    MinimumReplicasAvailable
OldReplicaSets:  <none>
NewReplicaSet:   cert-manager-7d47d666f8 (1/1 replicas created)
Events:          <none>
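Note that the chart names the container cert-manager-controller, while the recommendation refers to cert-manager. One possible workaround (a sketch, not verified against this chart) is to scope the VPA to the container name the deployment actually uses via resourcePolicy.containerPolicies:

```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: cert-manager
  namespace: cert-manager
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: cert-manager
  updatePolicy:
    updateMode: Recreate
    minReplicas: 1
  resourcePolicy:
    containerPolicies:
    # Restrict autoscaling to the container that actually exists
    # in the pod spec (cert-manager-controller, not cert-manager).
    - containerName: cert-manager-controller
      mode: Auto
```

This does not remove an already-stored stale recommendation, but it makes the intended container explicit.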

@rkashasl rkashasl added the kind/bug Categorizes issue or PR as related to a bug. label Oct 20, 2023
@universam1

Same issue here. It looks like the VPA recommender is actually broken; no recommendations are applied to new VPAs.

@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jan 31, 2024
@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle rotten
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Mar 1, 2024
@rkashasl
Author

rkashasl commented Mar 5, 2024

/remove-lifecycle rotten

@k8s-ci-robot k8s-ci-robot removed the lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. label Mar 5, 2024
@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jun 3, 2024
@voelzmo
Contributor

voelzmo commented Jun 3, 2024

Without additional information, I assume this is another instance of #6744, which could be fixed by #6745.

TL;DR: stale recommendations that no longer have a matching container can exist, e.g. when a Container in a Pod was renamed.
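To illustrate the stale-recommendation case with the container names from this issue (a hypothetical status fragment; the values are made up):

```yaml
# Hypothetical VPA status after a container rename: the old entry
# ("cert-manager") lingers and triggers the "no matching Container
# found" log on every updater loop until it is cleaned up.
status:
  recommendation:
    containerRecommendations:
    - containerName: cert-manager            # stale: container no longer exists
      target:
        cpu: 11m
        memory: 163Mi
    - containerName: cert-manager-controller # current container name
      target:
        cpu: 11m
        memory: 163Mi
```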

I'm closing this in favor of the above-mentioned issue. Feel free to re-open with additional information.

/close

@k8s-ci-robot
Contributor

@voelzmo: Closing this issue.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
