operator constantly tries to reconcile fluentbitagent because it tries to update daemonset labels #1837

siimaus · 2024-10-31T13:24:31Z

Logging operator constantly reports the following error:

DaemonSet.apps "rancher-logging-eks-fluentbit" is invalid: spec.selector: Invalid value: v1.LabelSelector{MatchLabels:map[string]string{"app.kubernetes.io/instance":"rancher-logging-eks", "app.kubernetes.io/managed-by":"rancher-logging-eks", "app.kubernetes.io/name":"fluentbit"}, MatchExpressions:[]v1.LabelSelectorRequirement(nil)}: field is immutable
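Rendered as YAML, the selector the operator is trying to apply (transcribed from the v1.LabelSelector value in the error) is the fragment below. In the apps/v1 API a DaemonSet's spec.selector cannot be changed after creation, so any difference between this selector and the one on the existing DaemonSet makes the update fail. This is only a readability aid; the selector on the live DaemonSet is not shown in this report.

```yaml
# Fragment of a DaemonSet spec, transcribed from the v1.LabelSelector in the
# error message above. Kubernetes rejects changes to spec.selector once the
# DaemonSet exists, which is what "field is immutable" refers to.
spec:
  selector:
    matchLabels:
      app.kubernetes.io/instance: rancher-logging-eks
      app.kubernetes.io/managed-by: rancher-logging-eks
      app.kubernetes.io/name: fluentbit
```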

When I enable workload recreation, as the logs suggest, by setting the following on the Logging object:
spec.enableRecreateWorkloadOnImmutableFieldChange: true
the operator keeps recreating the fluentbit agent daemonsets, either on some schedule or whenever the operator is restarted. Deleting the CRD and recreating it does not solve the issue.
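For reference, a minimal Logging resource with the flag described above might look like this. The name matches the Logging object in the operator log; controlNamespace: cattle-logging-system is an assumption based on the Helm release namespace of the FluentbitAgent and is not shown in this report.

```yaml
apiVersion: logging.banzaicloud.io/v1beta1
kind: Logging
metadata:
  name: rancher-logging-eks
spec:
  # Assumed value: the namespace the Rancher chart installs into.
  controlNamespace: cattle-logging-system
  # Lets the operator delete and recreate workloads (such as the fluentbit
  # DaemonSet) when an immutable field like spec.selector would change.
  enableRecreateWorkloadOnImmutableFieldChange: true
```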

The error seems to stem from an attempt to change the daemonset selector, but I am not familiar enough with the operator code to say why it tries to change the selector.

Describe the bug:
The operator constantly tries to recreate the fluentbit daemonsets because reconciliation fails.
Expected behaviour:
No attempts to recreate the daemonsets unless the FluentbitAgent CRD has changed.

Steps to reproduce the bug:

Install Rancher control plane (v2.9.0) on EKS
Add managed EKS cluster
Install Rancher Logging via Rancher apps.


Environment details:

Kubernetes version (e.g. v1.15.2): 1.30
Cloud-provider/provisioner (e.g. AKS, GKE, EKS, PKE etc): EKS
logging-operator version (e.g. 2.1.1): 4.8.0
Install method (e.g. helm or static manifests): Rancher logging helm chart - https://github.com/rancher/charts/tree/release-v2.9/charts/rancher-logging/104.1.2%2Bup4.8.0
Logs from the misbehaving component (and any other relevant logs):

{"level":"error","ts":"2024-10-31T13:03:37Z","msg":"Reconciler error","controller":"logging","controllerGroup":"logging.banzaicloud.io","controllerKind":"Logging","Logging":{"name":"rancher-logging-eks"},"namespace":"","name":"rancher-logging-eks","reconcileID":"158a57af-91ab-4485-a2dc-055889fc2e0e","error":"failed to reconcile resource: Object has to be recreated, but refusing to remove without explicitly being told so. Use logging.spec.enableRecreateWorkloadOnImmutableFieldChange to move on but make sure to understand the consequences. As of fluentd, to avoid data loss, make sure to use a persistent volume for buffers, which is the default, unless explicitly disabled or configured differently. As of fluent-bit, to avoid duplicated logs, make sure to configure a hostPath volume for the positions through logging.spec.fluentbit.spec.positiondb. : DaemonSet.apps "rancher-logging-eks-fluentbit" is invalid: spec.selector: Invalid value: v1.LabelSelector{MatchLabels:map[string]string{"app.kubernetes.io/instance":"rancher-logging-eks", "app.kubernetes.io/managed-by":"rancher-logging-eks", "app.kubernetes.io/name":"fluentbit"}, MatchExpressions:[]v1.LabelSelectorRequirement(nil)}: field is immutable","errorVerbose":"DaemonSet.apps "rancher-logging-eks-fluentbit" is invalid: spec.selector: Invalid value: v1.LabelSelector{MatchLabels:map[string]string{"app.kubernetes.io/instance":"rancher-logging-eks", "app.kubernetes.io/managed-by":"rancher-logging-eks", "app.kubernetes.io/name":"fluentbit"}, MatchExpressions:[]v1.LabelSelectorRequirement(nil)}: field is immutable\nObject has to be recreated, but refusing to remove without explicitly being told so. Use logging.spec.enableRecreateWorkloadOnImmutableFieldChange to move on but make sure to understand the consequences. As of fluentd, to avoid data loss, make sure to use a persistent volume for buffers, which is the default, unless explicitly disabled or configured differently. As of fluent-bit, to avoid duplicated logs, make sure to configure a hostPath volume for the positions through logging.spec.fluentbit.spec.positiondb. \ngithub.com/cisco-open/operator-tools/pkg/reconciler.(*GenericResourceReconciler).ReconcileResource\n\t/go/pkg/mod/github.com/cisco-open/[email protected]/pkg/reconciler/resource.go:515\ngithub.com/kube-logging/logging-operator/pkg/resources/fluentbit.(*Reconciler).Reconcile\n\t/usr/local/src/logging-operator/pkg/resources/fluentbit/fluentbit.go:149\ngithub.com/kube-logging/logging-operator/controllers/logging.(*LoggingReconciler).Reconcile\n\t/usr/local/src/logging-operator/controllers/logging/logging_controller.go:280\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile\n\t/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:114\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:311\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:261\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\t/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:222\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1695\nfailed to reconcile resource\ngithub.com/kube-logging/logging-operator/pkg/resources/fluentbit.(*Reconciler).Reconcile\n\t/usr/local/src/logging-operator/pkg/resources/fluentbit/fluentbit.go:151\ngithub.com/kube-logging/logging-operator/controllers/logging.(*LoggingReconciler).Reconcile\n\t/usr/local/src/logging-operator/controllers/logging/logging_controller.go:280\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile\n\t/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:114\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:311\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:261\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\t/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:222\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1695","stacktrace":"sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:324\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:261\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\t/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:222"}

Resource definition (possibly in YAML format) that caused the issue, without sensitive data:

apiVersion: logging.banzaicloud.io/v1beta1
kind: FluentbitAgent
metadata:
  annotations:
    meta.helm.sh/release-name: rancher-logging
    meta.helm.sh/release-namespace: cattle-logging-system
  labels:
    app.kubernetes.io/instance: rancher-logging
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/name: rancher-logging
    app.kubernetes.io/version: 4.8.0
    helm.sh/chart: rancher-logging-104.1.1_up4.8.0
  name: rancher-logging-eks

spec:
  disableKubernetesFilter: true
  extraVolumeMounts:
    - destination: /var/log/messages
      readOnly: true
      source: /var/log/messages
  image:
    repository: rancher/mirrored-fluent-fluent-bit
    tag: 2.2.0
  inputTail:
    Buffer_Chunk_Size: 1MB
    Buffer_Max_Size: 5MB
    Parser: syslog
    Path: /var/log/messages
    Tag: eks
  nodeSelector:
    kubernetes.io/os: linux
  podPriorityClassName: system-cluster-critical
  tolerations:
    - operator: Exists
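The operator's error message recommends a hostPath volume for the Fluent Bit position database before allowing recreation, so that a recreated DaemonSet does not produce duplicated logs. A minimal sketch for the FluentbitAgent above, assuming its spec accepts the positiondb volume the message refers to (the message names it as logging.spec.fluentbit.spec.positiondb); the host path is a hypothetical example, not a value from this report.

```yaml
# Hypothetical addition to the FluentbitAgent spec shown above.
# A hostPath-backed position DB lives on the node, so it outlives the pods
# and lets a recreated agent resume tailing from the recorded offsets.
spec:
  positiondb:
    hostPath:
      path: /var/lib/fluent-bit/pos  # hypothetical path on each node
```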

/kind bug


pepov · 2024-11-25T10:29:02Z

@siimaus thanks for the report and sorry for the long delay! Have you tried setting logging.spec.enableRecreateWorkloadOnImmutableFieldChange = true? This flag is required for the operator to recreate the agent daemonset when there is a change.

pepov · 2024-12-02T12:43:01Z

@siimaus sorry, I was mixing things up. Can you please provide your Logging resource as well? If you let the resource be recreated, what exactly is the difference between the original and the recreated resource?

siimaus added the bug Something isn't working label Oct 31, 2024

pepov added this to the 5.0 milestone Nov 25, 2024

pepov added the triage label Nov 25, 2024

pepov removed the triage label Nov 25, 2024

pepov removed this from the 5.0 milestone Nov 25, 2024

csatib02 modified the milestones: Fluentd, 5.x Nov 29, 2024
