introduce a simple readiness probe that doesn't check bootstrapped status

Recently we have seen on mondaynet the node's RPC subsystem become
unresponsive, but the node does not crash.

We normally have a readiness probe to get alerted when this happens.

But the readiness probe is overkill: it checks whether the chain is
bootstrapped by measuring the age of the head block, failing if it's
over 10 minutes.
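
As a rough illustration of that check, the sketch below compares the head block's timestamp against a 600-second limit. The name `head_is_recent` and the constant `MAX_HEAD_AGE` are hypothetical; the sidecar's actual implementation is not part of this commit, and how a block exactly 600 seconds old is treated is an assumption.

```python
from datetime import datetime, timedelta, timezone

# Assumed limit: 10 minutes, as described in the commit message.
MAX_HEAD_AGE = timedelta(seconds=600)


def head_is_recent(
    head_timestamp: datetime,
    now: datetime,
    max_age: timedelta = MAX_HEAD_AGE,
) -> bool:
    """Return True when the head block is younger than max_age.

    A False result corresponds to the bootstrapped probe failing.
    """
    return now - head_timestamp < max_age


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    # A chain waiting for quorum stops producing blocks, so the head ages
    # past the limit and the probe starts failing.
    print(head_is_recent(now - timedelta(minutes=5), now))
    print(head_is_recent(now - timedelta(minutes=11), now))
```

On a freshly activated test network, the second case is the normal state until enough bakers come online, which is why this check is unsuitable there.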

We don't want this on test networks generally, but especially on
mondaynet. After activation, we wait for the website to be published,
other participants to come online, and quorum to be met.

With this probe enabled, the chain would be considered unbootstrapped,
the node would become unreachable over RPC and p2p, and we would never
reach quorum.

But we still want to be alerted when the RPC subsystem is down.

I'm introducing two readiness probe settings:

* `bootstrapped_readiness_probe`: identical to existing
  `readiness_probe`
* `rpc_readiness_probe`: checks for RPC only

Both are on by default, and the bootstrapped probe takes precedence when
both are enabled. For mondaynet, the following should be set so that only
the RPC probe applies:

```
nodes:
  nodex:
    bootstrapped_readiness_probe: false
```
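
The way the two settings combine can be sketched in Python. This is only a model of the template conditionals in this commit, not code from the chart: `flag_enabled` and `select_readiness_probe` are hypothetical names, and the returned dictionaries mirror the `readinessProbe` stanzas from `_containers.tpl`.

```python
from typing import Optional


def flag_enabled(node_vals: dict, key: str) -> bool:
    """Model `or (not (hasKey vals key)) vals.key`: a missing key counts as on."""
    return key not in node_vals or bool(node_vals[key])


def select_readiness_probe(node_vals: dict) -> Optional[dict]:
    """Return the probe the template would render, or None for no probe."""
    # The bootstrapped probe is checked first, so it wins when both are on.
    if flag_enabled(node_vals, "bootstrapped_readiness_probe"):
        return {"httpGet": {"path": "/is_synced", "port": 31732}}
    # The RPC-only probe is the fallback (the template's `else if` branch).
    if flag_enabled(node_vals, "rpc_readiness_probe"):
        return {"httpGet": {"path": "/version", "port": 8732}}
    return None


if __name__ == "__main__":
    # Settings suggested above for mondaynet.
    print(select_readiness_probe({"bootstrapped_readiness_probe": False}))
```

Under this model, setting only `rpc_readiness_probe: false` changes nothing, because the bootstrapped probe is still selected; both settings must be false to run a node with no readiness probe at all.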
nicolasochem committed Feb 13, 2024
1 parent 6f8cac1 commit 469338d
Showing 2 changed files with 19 additions and 11 deletions.
9 changes: 7 additions & 2 deletions charts/tezos/templates/_containers.tpl
@@ -171,11 +171,16 @@
 name: tezos-net
 - containerPort: 9932
 name: metrics
-{{- if or (not (hasKey $.node_vals "readiness_probe")) $.node_vals.readiness_probe }}
+{{- if or (not (hasKey $.node_vals "bootstrapped_readiness_probe")) $.node_vals.bootstrapped_readiness_probe }}
 readinessProbe:
 httpGet:
 path: /is_synced
 port: 31732
+{{- else if or (not (hasKey $.node_vals "rpc_readiness_probe")) $.node_vals.rpc_readiness_probe }}
+readinessProbe:
+httpGet:
+path: /version
+port: 8732
 {{- end }}
 {{- end }}
 {{- if .resources }}
@@ -254,7 +259,7 @@
 {{- end }}

 {{- define "tezos.container.sidecar" }}
-{{- if or (not (hasKey $.node_vals "readiness_probe")) $.node_vals.readiness_probe }}
+{{- if or (not (hasKey $.node_vals "bootstrapped_readiness_probe")) $.node_vals.bootstrapped_readiness_probe }}
 {{- $sidecarResources := dict "requests" (dict "memory" "80Mi") "limits" (dict "memory" "100Mi") -}}
 {{- include "tezos.generic_container" (dict "root" $
 "type" "sidecar"
21 changes: 12 additions & 9 deletions charts/tezos/values.yaml
@@ -150,15 +150,18 @@ accounts: {}
 # automatically for you.
 # - `node_selector`: Specify a kubernetes node selector in `key: value` format
 # for your tezos nodes.
-# - `readiness_probe`: Attach a probe to the node. The probe checks whether
-# the most recent block is recent enough. If not, the
-# services will be unreachable. Defaults to True.
-# True is good for RPC nodes, private nodes, and
-# self-contained private chains.
-# Recommended to set to False when bootstrapping a new
-# chain with external bakers, such as a new test chain.
-# Otherwise, the chain may become unreachable externally
-# while waiting for other nodes to come online.
+# - `rpc_readiness_probe`: Attach a probe to the node. The probe checks whether
+# the RPC service is responsive, which should always be the
+# case. Defaults to true.
+# - `bootstrapped_readiness_probe`: Checks whether the most recent block is less than
+# 600 seconds old. If not, the services will be unreachable.
+# Overrides `readiness_probe`. Defaults to True.
+# True is good for RPC nodes, private nodes, and
+# self-contained private chains.
+# Recommended to set to False when bootstrapping a new
+# chain with external bakers, such as a new test chain.
+# Otherwise, the chain may become unreachable externally
+# while waiting for other nodes to come online.
 # - `instances`: A list of nodes to fire up, each is a dictionary defining:
 # - `bake_using_accounts`: List of account names that should be used for baking.
 # - `authorized_keys`: List of account names that should be used as keys to