
Strange lifecycle flow for redpanda cluster helm release #261

Closed
metacoma opened this issue Oct 4, 2024 · 2 comments


metacoma commented Oct 4, 2024

During integration testing, I deploy a single-node Redpanda cluster and have noticed that the deployment sometimes takes around 20 minutes, while it usually completes in 4-7 minutes.

After applying the Redpanda cluster CRD manifest:

---
apiVersion: cluster.redpanda.com/v1alpha2
kind: Redpanda
metadata:
  name: neo4j-cdc-stream
  namespace: redpanda
spec:
  chartRef:
    timeout: 1m0s
  clusterSpec:
    resources:
      cpu:
        cores: 100m
    external:
      domain: redpanda.local
      enabled: true
      type: NodePort
    tls:
      enabled: false
      certs:
        defaults:
          caEnabled: false
        external:
          caEnabled: false
    statefulset:
      replicas: 1
      initContainers:
        setDataDirOwnership:
          enabled: true
      livenessProbe:
        timeoutSeconds: 15
      readinessProbe:
        timeoutSeconds: 15
    storage:
      persistentVolume:
        enabled: true
        size: 1Gi

I observed the following strange behavior in the HelmRelease status:

$ kubectl -n redpanda get helmrelease -w
NAME        AGE     READY   STATUS
neo4j-cdc   3m42s   False   Could not load chart: failed to parse digest '': invalid checksum digest format
neo4j-cdc   4m5s    False   Could not load chart: GET http://source-controller.flux-system.svc.cluster.local./helmchart/redpanda/redpanda-neo4j-cdc/redpanda-5.9.5.tgz giving up after 10 attempt(s): Get "http://source-controller.flux-system.svc.cluster.local./helmchart/redpanda/redpanda-neo4j-cdc/redpanda-5.9.5.tgz": dial tcp 10.43.82.103:80: connect: connection refused
neo4j-cdc   4m5s    False   Could not load chart: failed to parse digest '': invalid checksum digest format
neo4j-cdc   6m23s   Unknown   Running 'install' action with timeout of 1m0s
neo4j-cdc   6m23s   Unknown   Running 'install' action with timeout of 1m0s
neo4j-cdc   7m5s    True      Helm install succeeded for release redpanda/neo4j-cdc.v1 with chart [email protected]
neo4j-cdc   7m35s   True      Helm install succeeded for release redpanda/neo4j-cdc.v1 with chart [email protected]
.... 
neo4j-cdc   34m   False   Could not load chart: GET http://source-controller.flux-system.svc.cluster.local./helmchart/redpanda/redpanda-neo4j-cdc/redpanda-5.9.5.tgz giving up after 10 attempt(s): Get "http://source-controller.flux-system.svc.cluster.local./helmchart/redpanda/redpanda-neo4j-cdc/redpanda-5.9.5.tgz": dial tcp 10.43.82.103:80: connect: connection refused
neo4j-cdc   34m   False   Could not load chart: GET http://source-controller.flux-system.svc.cluster.local./helmchart/redpanda/redpanda-neo4j-cdc/redpanda-5.9.5.tgz giving up after 10 attempt(s): Get "http://source-controller.flux-system.svc.cluster.local./helmchart/redpanda/redpanda-neo4j-cdc/redpanda-5.9.5.tgz": dial tcp 10.43.82.103:80: connect: connection refused
neo4j-cdc   39m   True    Helm install succeeded for release redpanda/neo4j-cdc.v1 with chart [email protected]
neo4j-cdc   39m   True    Helm install succeeded for release redpanda/neo4j-cdc.v1 with chart [email protected]
neo4j-cdc   39m   False   failed to verify artifact: computed checksum 'efe3fd90bce319c79f480e13ef5ce5543cbda4850863e07c7773b363a4116c6c' doesn't match advertised ''
neo4j-cdc   43m   False   Could not load chart: GET http://source-controller.flux-system.svc.cluster.local./helmchart/redpanda/redpanda-neo4j-cdc/redpanda-5.9.5.tgz giving up after 10 attempt(s): Get "http://source-controller.flux-system.svc.cluster.local./helmchart/redpanda/redpanda-neo4j-cdc/redpanda-5.9.5.tgz": dial tcp 10.43.82.103:80: connect: connection refused
neo4j-cdc   48m   True    Helm install succeeded for release redpanda/neo4j-cdc.v1 with chart [email protected]
neo4j-cdc   48m   True    Helm install succeeded for release redpanda/neo4j-cdc.v1 with chart [email protected]

I tried debugging the issue, specifically the error:

connection refused http://source-controller.flux-system.svc.cluster.local./helmchart/redpanda/redpanda-neo4j-cdc/redpanda-5.9.5.tgz giving up after 10 attempt(s): Get "http://source-controller.flux-system.svc.cluster.local./helmchart/redpanda/redpanda-neo4j-cdc/redpanda-5.9.5.tgz"

To my surprise, the source-controller Service exists, but inside the pod no process appears to be listening on the port the Service targets.

$ kubectl -n flux-system get svc source-controller -o jsonpath='{.spec.ports}'
[
  {
    "name": "http",
    "port": 80,
    "protocol": "TCP",
    "targetPort": "http"
  }
]
$ kubectl -n flux-system exec -ti deployment/source-controller -- netstat -nlp
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name    
tcp        0      0 :::8080                 :::*                    LISTEN      1/source-controller
tcp        0      0 :::9090                 :::*                    LISTEN      1/source-controller
tcp        0      0 :::9440                 :::*                    LISTEN      1/source-controller

I also found strange errors in the Flux helm-controller logs:

$ kubectl -n flux-system logs deployment/helm-controller
{
  "namespace": "redpanda",
  "name": "neo4j-cdc",
  "reconcileID": "6a8407fe-e2d5-41f6-9dd0-4ddd39bfa460",
  "error": "template: redpanda/templates/console/configmap-and-deployment.yaml:67:4: executing \"redpanda/templates/console/configmap-and-deployment.yaml\" at <include \"_shims.render-manifest\" (list \"console.ConfigMap\" $wrappedSecretValues)>: error calling include: template: redpanda/templates/_shims.tpl:279:25: executing \"_shims.render-manifest\" at <include $tpl (dict \"a\" (list $dot))>: error calling include: template: redpanda/charts/console/templates/_configmap.go.tpl:13:80: executing \"console.ConfigMap\" at <tpl (toYaml $values.console.config) $dot>: error calling tpl: cannot retrieve Template.Basepath from values inside tpl function: kafka:\n  brokers:\n  - neo4j-cdc-0.neo4j-cdc.redpanda.svc.cluster.local.:9093\n  sasl:\n    enabled: false\n  schemaRegistry:\n    enabled: true\n    tls:\n      caFilepath: \"\"\n      certFilepath: \"\"\n      enabled: false\n      insecureSkipTlsVerify: false\n      keyFilepath: \"\"\n    urls:\n    - http://neo4j-cdc-0.neo4j-cdc.redpanda.svc.cluster.local.:8081\n  tls:\n    caFilepath: \"\"\n    certFilepath: \"\"\n    enabled: false\n    insecureSkipTlsVerify: false\n    keyFilepath: \"\"\nredpanda:\n  adminApi:\n    enabled: true\n    tls:\n      caFilepath: \"\"\n      certFilepath: \"\"\n      enabled: false\n      insecureSkipTlsVerify: false\n      keyFilepath: \"\"\n    urls:\n    - http://neo4j-cdc.redpanda.svc.cluster.local.:9644: \"BasePath\" is not a value"
}

I don't fully understand where the issue lies. Maybe it is related to Flux? I'm using the standard Helm chart, without any custom parameters, to install it.

Environments:
Node Configurations:
single-node k8s, 6 CPU, 16 GB RAM
single-node k8s, 14 CPU, 16 GB RAM

Kubernetes Versions:
k3s: 1.29.X, 1.30.X

Flux Chart Version: 2.12.4, 2.3.0

Redpanda-Operator Chart Version: 0.4.20, 0.4.21, 0.4.27 (with image-tag: v2.2.2-24.2.4)

@david-yu
Contributor

david-yu commented Dec 9, 2024

@metacoma Could you try our latest release and set useFlux to false? https://docs.redpanda.com/current/deploy/deployment-option/self-hosted/kubernetes/k-production-deployment/?tab=tabs-1-helm-operator#deploy-a-redpanda-cluster

apiVersion: cluster.redpanda.com/v1alpha2
kind: Redpanda
metadata:
  name: redpanda
spec:
  chartRef:
    chartVersion: 5.9.16
    useFlux: false
  clusterSpec:
    #enterprise:
      #licenseSecretRef:
        #name: <secret-name>
        #key: <secret-key>
    statefulset:
      extraVolumes: |-
        - name: redpanda-io-config
          configMap:
            name: redpanda-io-config
      extraVolumeMounts: |-
        - name: redpanda-io-config
          mountPath: /etc/redpanda-io-config
      additionalRedpandaCmdFlags:
        - "--io-properties-file=/etc/redpanda-io-config/io-config.yaml"

@metacoma
Author

@david-yu thank you, useFlux: false works for me
