Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add default pod topology spread constraints to cluster deployment #245

Open
jawnsy opened this issue Aug 20, 2023 · 3 comments
Open

Add default pod topology spread constraints to cluster deployment #245

jawnsy opened this issue Aug 20, 2023 · 3 comments

Comments

@jawnsy
Copy link
Contributor

jawnsy commented Aug 20, 2023

Summary

Add a default pod topology spread constraint the cluster deployment to prefer scheduling on different nodes, so that failure of a node or zone does not result in an outage of the SpiceDB cluster.

Background

When running in a high availability configuration (multiple replicas of SpiceDB), the Kubernetes scheduler may place all the nodes in the same failure domain, such as a particular node or a particular availability zone. In rare instances, this can cause a brief outage:

  • If all replicas happen to be in the same node and the node fails, then it may take a moment for Kubernetes to detect that failure and re-schedule pods on other worker nodes in the cluster
  • If all replicas happen to be in the same zone and the entire availability zone fails, then it may take some time for Kubernetes to detect the failure

Workaround

A solution to this is to add pod topology spread constraints to the resulting pods, with a whenUnsatisfiable setting of ScheduleAnyway to prevent issues on single-zone clusters. Nodes and availability zones have standard well-known labels for this purpose:

The following SpiceDBCluster patches can do this:

---
apiVersion: authzed.com/v1alpha1
kind: SpiceDBCluster
metadata:
  name: spicedb
spec:
  channel: stable
  config:
    datastoreEngine: postgres
    logLevel: info
    replicas: 3
    serviceAccountName: spicedb
  patches:
    - kind: Deployment
      patch:
        op: add
        path: /spec/template/spec/topologySpreadConstraints
        value:
          - maxSkew: 1
            topologyKey: kubernetes.io/hostname
            whenUnsatisfiable: ScheduleAnyway
            labelSelector:
              matchLabels:
                authzed.com/cluster: spicedb
                authzed.com/cluster-component: spicedb
          - maxSkew: 1
            topologyKey: topology.kubernetes.io/zone
            whenUnsatisfiable: ScheduleAnyway
            labelSelector:
              matchLabels:
                authzed.com/cluster: spicedb
                authzed.com/cluster-component: spicedb
  secretName: spicedb
@jawnsy jawnsy changed the title Add default pod topology spread constraints Add default pod topology spread constraints to cluster deployment Aug 20, 2023
@ecordell
Copy link
Contributor

ecordell commented Aug 20, 2023

My rule of thumb for patches is that they are for things that the operator itself has no opinion about (i.e. no operator behavior changes with different topology constraints).

It can be hard to make good defaults for things like this. The ones you have here for example don't match what we use in some of our production environments, and is probably not what you'd want for test/staging SpiceDBClusters.

I'm open to making this easier to configure, but scheduling-related api fields tend to have more churn than other bits of kube, so using patches for this keeps the operator forward-compatible for longer.

@jawnsy
Copy link
Contributor Author

jawnsy commented Aug 20, 2023

Thanks for triaging, @ecordell! I understand that none of these constraints are going to be suitable for everyone.

We're okay carrying patches for this indefinitely, but if you have some thoughts on an approach that would make this easier to configure, I think that would be useful. I'd be happy to contribute a feature, but I don't have a straightforward way to test changes to the operator (getting a development environment for this stuff can be tricky)

@WillDaSilva
Copy link

@jawnsy Thanks for sharing the patch. Saved me some time trying to put it together myself. It'd be useful if something like this was added to the examples directory.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants