
ziti-router: introduced health checks fail #226

Open
marvkis opened this issue Jun 16, 2024 · 5 comments

@marvkis
Contributor

marvkis commented Jun 16, 2024

Hi,

I tried to upgrade to the latest Helm charts. For my router, the deployment never became active and kept restarting. After some investigation I noticed that the health checks introduced in commit 3c2afc5 failed to succeed. After manually removing them from the deployment, it became active and everything worked as expected.
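
Roughly, the probes in the rendered deployment exec the agent inside the container, something like this (typed from memory, so the exact command and timing values may differ from the chart template):

  livenessProbe:
    exec:
      command:
        - ziti
        - agent
        - stats
    initialDelaySeconds: 10   # illustrative values, not necessarily the chart's defaults
    periodSeconds: 10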

I tried to execute them manually in the pod and it seems ziti agent stats fails to find the ziti router process:

[ziggy@core-router-858f76f8fb-fvtqm ~]$ ziti agent stats
Error: no processes found matching filter, use 'ziti agent list' to list candidates
Usage:
  ziti agent stats [flags]

Flags:
  -a, --app-alias string      Alias of host application to talk to (specified in host application)
  -i, --app-id string         Id of host application to talk to (like controller or router id)
  -t, --app-type string       Type of host application to talk to (like controller or router)
  -h, --help                  help for stats
  -p, --pid uint32            Process ID of host application to talk to
  -n, --process-name string   Process name of host application to talk to
      --tcp-addr string       Type of host application to talk to (like controller or router)
      --timeout duration      Operation timeout (default 5s)


no processes found matching filter, use 'ziti agent list' to list candidates
[ziggy@core-router-858f76f8fb-fvtqm ~]$ ziti agent list
╭─────┬────────────┬────────┬─────────────┬──────────┬─────────────┬───────────╮
│ PID │ EXECUTABLE │ APP ID │ UNIX SOCKET │ APP TYPE │ APP VERSION │ APP ALIAS │
├─────┼────────────┼────────┼─────────────┼──────────┼─────────────┼───────────┤
╰─────┴────────────┴────────┴─────────────┴──────────┴─────────────┴───────────╯

Hint: I'm using security contexts as an additional security layer on my containers:

  podSecurityContext:
    allowPrivilegeEscalation: false
    capabilities:
      drop:
      - ALL
    privileged: false
    readOnlyRootFilesystem: true
    runAsGroup: 2171
    runAsNonRoot: true
    runAsUser: 2171
    seccompProfile:
      type: RuntimeDefault
  securityContext:
    fsGroup: 2171
    runAsNonRoot: true

Might this prevent ziti agent stats from finding its process?

bye,
Chris

@qrkourier
Member

Yes, I think the read-only root filesystem might have prevented placing the IPC socket used by the probe. It would be more elegant to use the controller and router HTTP health checks, but a little work is needed to template those on a separate web listener's binding. That way they'll be published to the kubelet but not to all IP addresses, especially avoiding publishing them alongside the client API, because they could represent a DoS vector.
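
To illustrate, the dedicated listener could look something like this in the router config, assuming the router exposes the same health-checks web binding as the controller; the name, interface, and port here are just placeholders:

web:
  - name: health-checks
    bindPoints:
      - interface: 0.0.0.0:8081   # placeholder; narrowing where this binds is the templating work mentioned above
        address: localhost:8081
    apis:
      - binding: health-checks

The probes would then be httpGet checks against that listener instead of exec'ing the agent inside the container.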

@marvkis
Contributor Author

marvkis commented Jun 17, 2024

What is the path to the IPC socket? We could mount a tmpfs for it as an initial solution before migrating to the HTTP checks.

@qrkourier
Member

That might work. I've only observed the IPC socket in paths like this: /tmp/gops-agent.{pid}.sock.

@marvkis
Contributor Author

marvkis commented Jun 18, 2024

That was it. I just added

additionalVolumes: 
- name: tmp
  volumeType: emptyDir
  mountPath: /tmp

and the health checks started working. What do you think? Should we add an emptyDir on /tmp by default, or should we keep it this way? At the least I would extend it to limit the size of the emptyDir with sizeLimit: 1Mi. Call me paranoid, but just in case something 'unexpected' happens and someone finds a way to upload data to /tmp, the kubelet would evict the pod and we would be informed that something is going on here...
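
In plain Kubernetes terms, the size-limited variant would render to something like this in the pod spec (whether the chart's additionalVolumes passes sizeLimit through is something I'd still have to check):

volumes:
  - name: tmp
    emptyDir:
      sizeLimit: 1Mi   # exceeding the limit makes the kubelet evict the pod
volumeMounts:
  - name: tmp
    mountPath: /tmp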

@qrkourier
Member

It seems reasonable for us chart maintainers to define a non-root emptyDir volume for temp, or to find a way to influence the socket path and ensure a writable volume is available there. I'll support using /tmp until we have a reason not to.
