Add concepts guide on graceful shutdown #468

Merged
merged 5 commits into from
Oct 11, 2023
4 changes: 2 additions & 2 deletions modules/concepts/nav.adoc
@@ -10,10 +10,10 @@
** xref:resources.adoc[]
** xref:s3.adoc[]
** xref:tls_server_verification.adoc[]
** xref:pod_placement.adoc[]
** xref:overrides.adoc[]
** xref:duration.adoc[]
** xref:operations/index.adoc[]
*** xref:operations/cluster_operations.adoc[]
*** xref:operations/pod_placement.adoc[]
*** xref:operations/pod_disruptions.adoc[]
*** xref:operations/pod_placement.adoc[]
*** xref:operations/graceful_shutdown.adoc[]
35 changes: 35 additions & 0 deletions modules/concepts/pages/operations/graceful_shutdown.adoc
@@ -0,0 +1,35 @@
= Graceful shutdown

The article https://cloud.google.com/blog/products/containers-kubernetes/kubernetes-best-practices-terminating-with-grace[Kubernetes best practices: terminating with grace] describes how a graceful shutdown works in Kubernetes.

Each operator sets up the required shutdown mechanism for its product, provided the product supports graceful shutdown.
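For illustration, the generic Kubernetes building blocks behind such a mechanism are a `preStop` lifecycle hook, which Kubernetes runs before it sends `SIGTERM` to the container, and `spec.terminationGracePeriodSeconds`, which bounds how long the Pod may take to terminate before it is killed. The following is a minimal sketch of plain Kubernetes, not the Pod definition the operators actually generate; the Pod name, image and shutdown script are hypothetical placeholders:

[source,yaml]
----
apiVersion: v1
kind: Pod
metadata:
  name: example-worker # hypothetical name
spec:
  # Upper bound for the whole termination (preStop hook + SIGTERM handling), here 1 hour
  terminationGracePeriodSeconds: 3600
  containers:
    - name: worker
      image: example.com/worker:latest # hypothetical image
      lifecycle:
        preStop:
          exec:
            # Hypothetical script that signals the product to shut down gracefully
            # and only returns once running workloads have been moved away
            command: ["/bin/sh", "-c", "/opt/example/graceful-shutdown.sh"]
----

If the hook and the product's `SIGTERM` handling together take longer than `terminationGracePeriodSeconds`, Kubernetes kills the container with `SIGKILL`.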

The operators also configure a sensible default for how long Pods are granted to shut down properly without disrupting the availability of the product.
If the default values do not suit your needs, you can set the graceful shutdown timeout as follows:

[source,yaml]
----
spec:
  workers:
    config:
      gracefulShutdownTimeout: 1h # Set it for all worker roleGroups
    roleGroups:
      normal: # Will use 1h from the worker role config
        replicas: 1
      long: # Will use 6h from the roleGroup config below
        replicas: 1
        config:
          gracefulShutdownTimeout: 6h # Set it only for this specific roleGroup
----

The default timeouts for each product are documented in the respective operator's usage guide under `Operations -> Graceful shutdown`.

== Kubernetes cluster requirements
Pods must be able to take as long as they need to shut down gracefully without being killed.

Imagine that you set the graceful shutdown period to 24 hours.
In an on-premises Kubernetes cluster, for example, the infrastructure team may want to drain a Kubernetes node to perform regular maintenance, such as rebooting the node.
They usually have an upper limit on how long they wait for the Pods on that node to terminate, after which they reboot the node regardless of any Pods that are still running.

When setting up a production cluster, check with your Kubernetes administrator (or cloud provider) how much time your Pods actually get to terminate gracefully.
It is not sufficient to look at `spec.terminationGracePeriodSeconds` and conclude that the Pods have, for example, 24 hours to shut down gracefully, because an administrator can still reboot the node before that period has elapsed.
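As a worked example, assume (hypothetically) that your infrastructure team waits at most 2 hours for Pods to terminate when draining a node. A graceful shutdown timeout above that limit cannot be honored, so a sketch that stays below it, leaving headroom for the rest of the Pod termination, might look like this. The `workers` role is taken from the example above, and the value `90m` assumes the operator's duration format accepts minutes:

[source,yaml]
----
spec:
  workers:
    config:
      # Hypothetical drain limit of 2h: stay below it so that the Pods
      # can finish shutting down before the node is rebooted
      gracefulShutdownTimeout: 90m
----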
2 changes: 1 addition & 1 deletion modules/concepts/pages/operations/index.adoc
@@ -17,7 +17,7 @@ Make sure to go through the following checklist to achieve the maximum level of
Many HA capable products offer a way to gracefully shut down the service running within the Pod.
The flow is as follows: Kubernetes wants to shut down the Pod and calls a hook into the Pod, which in turn interacts with the product, signaling it to gracefully shut down.
The final deletion of the Pod is then blocked until the product has successfully migrated running workloads away from the Pod that is to be shut down.
Details covering the graceful shutdown mechanism are described in the actual operator documentation.
Details covering the graceful shutdown mechanism are described in xref:operations/graceful_shutdown.adoc[] as well as the actual operator documentation.
+
WARNING: Graceful shutdown is not implemented for all products yet. Check the documentation of the specific product operator to see whether it is supported (for example, xref:trino:usage-guide/operations/graceful-shutdown.adoc[the documentation for Trino]).
