
Strange lifecycle flow for redpanda cluster helm release #261

Closed
metacoma opened this issue Oct 4, 2024 · 2 comments


metacoma commented Oct 4, 2024

During integration testing, I deploy a single-node Redpanda cluster and have noticed that the deployment sometimes takes around 20 minutes, while it usually completes in 4-7 minutes.

After applying the Redpanda cluster CRD manifest:

---
apiVersion: cluster.redpanda.com/v1alpha2
kind: Redpanda
metadata:
  name: neo4j-cdc-stream
  namespace: redpanda
spec:
  chartRef:
    timeout: 1m0s
  clusterSpec:
    resources:
      cpu:
        cores: 100m
    external:
      domain: redpanda.local
      enabled: true
      type: NodePort
    tls:
      enabled: false
      certs:
        defaults:
          caEnabled: false
        external:
          caEnabled: false
    statefulset:
      replicas: 1
      initContainers:
        setDataDirOwnership:
          enabled: true
      livenessProbe:
        timeoutSeconds: 15
      readinessProbe:
        timeoutSeconds: 15
    storage:
      persistentVolume:
        enabled: true
        size: 1Gi

I observed the following strange behavior in the HelmRelease status:

$ kubectl -n redpanda get helmrelease -w
NAME        AGE     READY   STATUS
neo4j-cdc   3m42s   False   Could not load chart: failed to parse digest '': invalid checksum digest format
neo4j-cdc   4m5s    False   Could not load chart: GET http://source-controller.flux-system.svc.cluster.local./helmchart/redpanda/redpanda-neo4j-cdc/redpanda-5.9.5.tgz giving up after 10 attempt(s): Get "http://source-controller.flux-system.svc.cluster.local./helmchart/redpanda/redpanda-neo4j-cdc/redpanda-5.9.5.tgz": dial tcp 10.43.82.103:80: connect: connection refused
neo4j-cdc   4m5s    False   Could not load chart: failed to parse digest '': invalid checksum digest format
neo4j-cdc   6m23s   Unknown   Running 'install' action with timeout of 1m0s
neo4j-cdc   6m23s   Unknown   Running 'install' action with timeout of 1m0s
neo4j-cdc   7m5s    True      Helm install succeeded for release redpanda/neo4j-cdc.v1 with chart [email protected]
neo4j-cdc   7m35s   True      Helm install succeeded for release redpanda/neo4j-cdc.v1 with chart [email protected]
.... 
neo4j-cdc   34m   False   Could not load chart: GET http://source-controller.flux-system.svc.cluster.local./helmchart/redpanda/redpanda-neo4j-cdc/redpanda-5.9.5.tgz giving up after 10 attempt(s): Get "http://source-controller.flux-system.svc.cluster.local./helmchart/redpanda/redpanda-neo4j-cdc/redpanda-5.9.5.tgz": dial tcp 10.43.82.103:80: connect: connection refused
neo4j-cdc   34m   False   Could not load chart: GET http://source-controller.flux-system.svc.cluster.local./helmchart/redpanda/redpanda-neo4j-cdc/redpanda-5.9.5.tgz giving up after 10 attempt(s): Get "http://source-controller.flux-system.svc.cluster.local./helmchart/redpanda/redpanda-neo4j-cdc/redpanda-5.9.5.tgz": dial tcp 10.43.82.103:80: connect: connection refused
neo4j-cdc   39m   True    Helm install succeeded for release redpanda/neo4j-cdc.v1 with chart [email protected]
neo4j-cdc   39m   True    Helm install succeeded for release redpanda/neo4j-cdc.v1 with chart [email protected]
neo4j-cdc   39m   False   failed to verify artifact: computed checksum 'efe3fd90bce319c79f480e13ef5ce5543cbda4850863e07c7773b363a4116c6c' doesn't match advertised ''
neo4j-cdc   43m   False   Could not load chart: GET http://source-controller.flux-system.svc.cluster.local./helmchart/redpanda/redpanda-neo4j-cdc/redpanda-5.9.5.tgz giving up after 10 attempt(s): Get "http://source-controller.flux-system.svc.cluster.local./helmchart/redpanda/redpanda-neo4j-cdc/redpanda-5.9.5.tgz": dial tcp 10.43.82.103:80: connect: connection refused
neo4j-cdc   48m   True    Helm install succeeded for release redpanda/neo4j-cdc.v1 with chart [email protected]
neo4j-cdc   48m   True    Helm install succeeded for release redpanda/neo4j-cdc.v1 with chart [email protected]

I tried debugging the issue, specifically the error:

connection refused http://source-controller.flux-system.svc.cluster.local./helmchart/redpanda/redpanda-neo4j-cdc/redpanda-5.9.5.tgz giving up after 10 attempt(s): Get "http://source-controller.flux-system.svc.cluster.local./helmchart/redpanda/redpanda-neo4j-cdc/redpanda-5.9.5.tgz"

To my surprise, the source-controller Service exists, but inside the pod no process appears to be listening on the port the Service targets.

$ kubectl -n flux-system get svc source-controller -o jsonpath='{.spec.ports}'
[
  {
    "name": "http",
    "port": 80,
    "protocol": "TCP",
    "targetPort": "http"
  }
]
$ kubectl -n flux-system exec -ti deployment/source-controller -- netstat -nlp
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name    
tcp        0      0 :::8080                 :::*                    LISTEN      1/source-controller
tcp        0      0 :::9090                 :::*                    LISTEN      1/source-controller
tcp        0      0 :::9440                 :::*                    LISTEN      1/source-controller

I also found strange errors in the Flux helm-controller logs:

$ kubectl -n flux-system logs deployment/helm-controller
{
  "namespace": "redpanda",
  "name": "neo4j-cdc",
  "reconcileID": "6a8407fe-e2d5-41f6-9dd0-4ddd39bfa460",
  "error": "template: redpanda/templates/console/configmap-and-deployment.yaml:67:4: executing \"redpanda/templates/console/configmap-and-deployment.yaml\" at <include \"_shims.render-manifest\" (list \"console.ConfigMap\" $wrappedSecretValues)>: error calling include: template: redpanda/templates/_shims.tpl:279:25: executing \"_shims.render-manifest\" at <include $tpl (dict \"a\" (list $dot))>: error calling include: template: redpanda/charts/console/templates/_configmap.go.tpl:13:80: executing \"console.ConfigMap\" at <tpl (toYaml $values.console.config) $dot>: error calling tpl: cannot retrieve Template.Basepath from values inside tpl function: kafka:\n  brokers:\n  - neo4j-cdc-0.neo4j-cdc.redpanda.svc.cluster.local.:9093\n  sasl:\n    enabled: false\n  schemaRegistry:\n    enabled: true\n    tls:\n      caFilepath: \"\"\n      certFilepath: \"\"\n      enabled: false\n      insecureSkipTlsVerify: false\n      keyFilepath: \"\"\n    urls:\n    - http://neo4j-cdc-0.neo4j-cdc.redpanda.svc.cluster.local.:8081\n  tls:\n    caFilepath: \"\"\n    certFilepath: \"\"\n    enabled: false\n    insecureSkipTlsVerify: false\n    keyFilepath: \"\"\nredpanda:\n  adminApi:\n    enabled: true\n    tls:\n      caFilepath: \"\"\n      certFilepath: \"\"\n      enabled: false\n      insecureSkipTlsVerify: false\n      keyFilepath: \"\"\n    urls:\n    - http://neo4j-cdc.redpanda.svc.cluster.local.:9644: \"BasePath\" is not a value"
}

I don't fully understand where the issue lies. Maybe it is related to Flux? I'm using the standard Helm chart, without any custom parameters, to install it.

Environments:
Node Configurations:
single-node k8s, 6 CPU, 16 GB RAM
single-node k8s, 14 CPU, 16 GB RAM

Kubernetes Versions:
k3s: 1.29.X, 1.30.X

Flux Chart Version: 2.12.4, 2.3.0

Redpanda-Operator Chart Version: 0.4.20, 0.4.21, 0.4.27 (with image-tag: v2.2.2-24.2.4)

@david-yu
Contributor

david-yu commented Dec 9, 2024

@metacoma Could you try our latest release and set useFlux to false? https://docs.redpanda.com/current/deploy/deployment-option/self-hosted/kubernetes/k-production-deployment/?tab=tabs-1-helm-operator#deploy-a-redpanda-cluster

apiVersion: cluster.redpanda.com/v1alpha2
kind: Redpanda
metadata:
  name: redpanda
spec:
  chartRef:
    chartVersion: 5.9.16
    useFlux: false
  clusterSpec:
    #enterprise:
      #licenseSecretRef:
        #name: <secret-name>
        #key: <secret-key>
    statefulset:
      extraVolumes: |-
        - name: redpanda-io-config
          configMap:
            name: redpanda-io-config
      extraVolumeMounts: |-
        - name: redpanda-io-config
          mountPath: /etc/redpanda-io-config
      additionalRedpandaCmdFlags:
        - "--io-properties-file=/etc/redpanda-io-config/io-config.yaml"

@metacoma
Author

@david-yu thank you, useFlux: false works for me
