From 0d35e590116e14bd29166591204c1b4827d6cecc Mon Sep 17 00:00:00 2001
From: Sebastian Bernauer
Date: Tue, 10 Oct 2023 12:01:50 +0200
Subject: [PATCH 1/5] Add concepts guide on graceful shutdown

---
 .../pages/operations/graceful_shutdown.adoc  | 25 +++++++++++++++++++
 modules/concepts/pages/operations/index.adoc |  2 +-
 2 files changed, 26 insertions(+), 1 deletion(-)
 create mode 100644 modules/concepts/pages/operations/graceful_shutdown.adoc

diff --git a/modules/concepts/pages/operations/graceful_shutdown.adoc b/modules/concepts/pages/operations/graceful_shutdown.adoc
new file mode 100644
index 000000000..28f5875f1
--- /dev/null
+++ b/modules/concepts/pages/operations/graceful_shutdown.adoc
@@ -0,0 +1,25 @@
+= Graceful shutdown
+
+The article https://cloud.google.com/blog/products/containers-kubernetes/kubernetes-best-practices-terminating-with-grace[Kubernetes best practices: terminating with grace] describes very well, how graceful shutdown works in Kubernetes.
+
+Our operators add the needed shutdown-mechanism for all the products that support graceful shutdown.
+
+They also configure a sensible amount of time Pods are granted to properly shut down without disrupting the availability of the product.
+If you are not satisfied with the default values, you can set the graceful shutdown timeout as follow:
+
+[source,yaml]
+----
+spec:
+  workers:
+    config:
+      gracefulShutdownTimeout: 1h # Set it for all worker roleGroups
+    roleGroups:
+      normal: # Will use 1h from the worker role config
+        replicas: 1
+      long: # Will use 6h from the roleGroup config below
+        replicas: 1
+        config:
+          gracefulShutdownTimeout: 6h # Set it only for this specific roleGroup
+----
+
+The individual default timeouts are documented in the specific operators at the `Operations -> Graceful shutdown` usage-guide.
diff --git a/modules/concepts/pages/operations/index.adoc b/modules/concepts/pages/operations/index.adoc
index b25ccdf54..1203113cb 100644
--- a/modules/concepts/pages/operations/index.adoc
+++ b/modules/concepts/pages/operations/index.adoc
@@ -17,7 +17,7 @@ Make sure to go through the following checklist to achieve the maximum level of
 Many HA capable products offer a way to gracefully shut down the service running within the Pod.
 The flow is as follows: Kubernetes wants to shut down the Pod and calls a hook into the Pod, which in turn interacts with the product, signaling it to gracefully shut down.
 The final deletion of the Pod is then blocked until the product has successfully migrated running workloads away from the Pod that is to be shut down.
-Details covering the graceful shutdown mechanism are described in the actual operator documentation.
+Details covering the graceful shutdown mechanism are described in xref:operations/graceful_shutdown.adoc[] as well as the actual operator documentation.
 +
 WARNING: Graceful shutdown is not implemented for all products yet. Please check the documentation specific to the product operator to see if it is supported (such as e.g. xref:trino:usage-guide/operations/graceful-shutdown.adoc[the documentation for Trino].
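
The YAML example in the new page relies on a precedence rule: a `gracefulShutdownTimeout` set in a roleGroup's `config` wins over the one set in the role's `config`, which in turn wins over the operator's default. The following is a minimal Python sketch of that lookup, not operator code. It assumes a simplified duration grammar (integer/unit pairs with units `d`, `h`, `m`, `s`), and `DEFAULT_TIMEOUT`, `parse_duration` and `resolve_timeout` are hypothetical names; the real per-product defaults live in each operator's graceful shutdown guide.

```python
import re

# Hypothetical operator-wide fallback used only for this sketch; the real
# per-product defaults are documented in each operator's usage guide.
DEFAULT_TIMEOUT = "15m"

# Simplified duration grammar assumed here: one or more <integer><unit>
# pairs with units d, h, m, s (e.g. "1h", "1h30m").
_UNIT_SECONDS = {"d": 86400, "h": 3600, "m": 60, "s": 1}
_PART = re.compile(r"(\d+)([dhms])")


def parse_duration(text: str) -> int:
    """Convert a duration such as "6h" or "1h30m" into seconds."""
    if not re.fullmatch(r"(?:\d+[dhms])+", text):
        raise ValueError(f"unsupported duration: {text!r}")
    return sum(int(n) * _UNIT_SECONDS[u] for n, u in _PART.findall(text))


def resolve_timeout(role: dict, role_group: str) -> int:
    """Pick gracefulShutdownTimeout: roleGroup config, then role config, then default."""
    group_cfg = role["roleGroups"][role_group].get("config", {})
    role_cfg = role.get("config", {})
    value = (
        group_cfg.get("gracefulShutdownTimeout")
        or role_cfg.get("gracefulShutdownTimeout")
        or DEFAULT_TIMEOUT
    )
    return parse_duration(value)


# The `workers` role from the YAML example, expressed as a Python dict.
workers = {
    "config": {"gracefulShutdownTimeout": "1h"},
    "roleGroups": {
        "normal": {"replicas": 1},
        "long": {"replicas": 1, "config": {"gracefulShutdownTimeout": "6h"}},
    },
}

print(resolve_timeout(workers, "normal"))  # 3600 (1h inherited from the role)
print(resolve_timeout(workers, "long"))    # 21600 (6h from the roleGroup)
```

Resolving the most specific level first mirrors the comments in the YAML example: `normal` inherits 1h from the role, while `long` overrides it with 6h.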
From e8126855b26396a655b15c514b582680cd7ed27a Mon Sep 17 00:00:00 2001
From: Sebastian Bernauer
Date: Tue, 10 Oct 2023 13:02:26 +0200
Subject: [PATCH 2/5] Apply suggestions from code review

Co-authored-by: Andrew Kenworthy
---
 modules/concepts/pages/operations/graceful_shutdown.adoc | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/modules/concepts/pages/operations/graceful_shutdown.adoc b/modules/concepts/pages/operations/graceful_shutdown.adoc
index 28f5875f1..293850174 100644
--- a/modules/concepts/pages/operations/graceful_shutdown.adoc
+++ b/modules/concepts/pages/operations/graceful_shutdown.adoc
@@ -1,11 +1,11 @@
 = Graceful shutdown
 
-The article https://cloud.google.com/blog/products/containers-kubernetes/kubernetes-best-practices-terminating-with-grace[Kubernetes best practices: terminating with grace] describes very well, how graceful shutdown works in Kubernetes.
+The article https://cloud.google.com/blog/products/containers-kubernetes/kubernetes-best-practices-terminating-with-grace[Kubernetes best practices: terminating with grace] describes how a graceful shutdown works in Kubernetes.
 
-Our operators add the needed shutdown-mechanism for all the products that support graceful shutdown.
+Our operators add the needed shutdown mechanism for their products that support graceful shutdown.
 
 They also configure a sensible amount of time Pods are granted to properly shut down without disrupting the availability of the product.
-If you are not satisfied with the default values, you can set the graceful shutdown timeout as follow:
+If you are not satisfied with the default values, you can set the graceful shutdown timeout as follows:
 
 [source,yaml]
 ----

From b39e3b766ded832ecc16d13321e9bf1f5fdf670e Mon Sep 17 00:00:00 2001
From: Sebastian Bernauer
Date: Tue, 10 Oct 2023 13:15:29 +0200
Subject: [PATCH 3/5] Add k8s requirements

---
 .../concepts/pages/operations/graceful_shutdown.adoc | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/modules/concepts/pages/operations/graceful_shutdown.adoc b/modules/concepts/pages/operations/graceful_shutdown.adoc
index 293850174..debc407c0 100644
--- a/modules/concepts/pages/operations/graceful_shutdown.adoc
+++ b/modules/concepts/pages/operations/graceful_shutdown.adoc
@@ -23,3 +23,13 @@ spec:
 ----
 
 The individual default timeouts are documented in the specific operators at the `Operations -> Graceful shutdown` usage-guide.
+
+== Kubernetes cluster requirements
+Pods need to have the ability to take as long as they need to gracefully shut down without getting killed.
+
+Imagine the situation that you set the graceful shutdown period to 24 hours.
+In case of e.g. an on-prem Kubernetes cluster the Kubernetes infrastructure team wants to drain the Kubernetes node, so that they can do regular maintenance, such as rebooting the node.
+They will have some upper limit on how long they will wait for Pods on the Node to terminate, until they will reboot the Kubernetes node regardless of still running Pods.
+
+When setting up a production cluster, you need to check with your Kubernetes administrator (or cloud provider) what time period your Pods have to terminate gracefully.
+It is not sufficient to have a look at the `spec.terminationGracePeriodSeconds` and come to the conclusion that the Pods have e.g. 24 hours to gracefully shut down, as e.g. an administrator can reboot the Kubernetes node before the time period is reached.
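
The new "Kubernetes cluster requirements" section boils down to this: the time a Pod can actually rely on is the smaller of what its spec grants (`spec.terminationGracePeriodSeconds`) and how long the administrators wait during node maintenance before rebooting anyway (for instance the `--timeout` they pass to `kubectl drain`, after which they may proceed regardless). The minimal Python sketch below makes that explicit. `effective_shutdown_window` is a hypothetical helper, and the 2 hour maintenance window is an assumed value standing in for whatever you agree on with your administrator or cloud provider.

```python
from datetime import timedelta
from typing import Optional


def effective_shutdown_window(
    termination_grace_period: timedelta, drain_limit: Optional[timedelta]
) -> timedelta:
    """Return how long Pods can actually count on for a graceful shutdown.

    termination_grace_period: what the Pod spec grants
        (spec.terminationGracePeriodSeconds).
    drain_limit: how long the Kubernetes administrators wait during node
        maintenance before rebooting the node anyway; None models the
        (hypothetical) case where they always wait for every Pod to exit.
    """
    if drain_limit is None:
        return termination_grace_period
    # Whichever limit is reached first ends the graceful shutdown.
    return min(termination_grace_period, drain_limit)


# Scenario from the text: a 24 hour graceful shutdown period, combined with a
# hypothetical 2 hour maintenance window agreed with the cluster administrators.
configured = timedelta(hours=24)
maintenance_window = timedelta(hours=2)

window = effective_shutdown_window(configured, maintenance_window)
if window < configured:
    print(f"Warning: only {window} of the configured {configured} are guaranteed")
```

In this scenario the configured 24 hours never come into play: after 2 hours the node is rebooted, so that agreed window is the number to size the graceful shutdown timeout against.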
From b4af463ca6901ccedc020edd7f3438d97d0c1c0b Mon Sep 17 00:00:00 2001
From: Sebastian Bernauer
Date: Tue, 10 Oct 2023 13:32:04 +0200
Subject: [PATCH 4/5] Apply suggestions from code review

Co-authored-by: Andrew Kenworthy
---
 modules/concepts/pages/operations/graceful_shutdown.adoc | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/modules/concepts/pages/operations/graceful_shutdown.adoc b/modules/concepts/pages/operations/graceful_shutdown.adoc
index debc407c0..2a2b5fe85 100644
--- a/modules/concepts/pages/operations/graceful_shutdown.adoc
+++ b/modules/concepts/pages/operations/graceful_shutdown.adoc
@@ -28,8 +28,8 @@ The individual default timeouts are documented in the specific operators at the
 Pods need to have the ability to take as long as they need to gracefully shut down without getting killed.
 
 Imagine the situation that you set the graceful shutdown period to 24 hours.
-In case of e.g. an on-prem Kubernetes cluster the Kubernetes infrastructure team wants to drain the Kubernetes node, so that they can do regular maintenance, such as rebooting the node.
-They will have some upper limit on how long they will wait for Pods on the Node to terminate, until they will reboot the Kubernetes node regardless of still running Pods.
+In the case of e.g. an on-premise Kubernetes cluster the Kubernetes infrastructure team may want to drain the Kubernetes node so that they can do regular maintenance, such as rebooting the node.
+They will have some upper limit on how long they will wait for Pods on the Node to terminate before they reboot the Kubernetes node, regardless of any Pods that are still running.
 
 When setting up a production cluster, you need to check with your Kubernetes administrator (or cloud provider) what time period your Pods have to terminate gracefully.
 It is not sufficient to have a look at the `spec.terminationGracePeriodSeconds` and come to the conclusion that the Pods have e.g. 24 hours to gracefully shut down, as e.g. an administrator can reboot the Kubernetes node before the time period is reached.

From fbdd71093c3098a9d9874a6fedff3b189e2036f6 Mon Sep 17 00:00:00 2001
From: Sebastian Bernauer
Date: Wed, 11 Oct 2023 12:19:24 +0200
Subject: [PATCH 5/5] fix nav

---
 modules/concepts/nav.adoc | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/modules/concepts/nav.adoc b/modules/concepts/nav.adoc
index 7d1a591a1..a2cc144e7 100644
--- a/modules/concepts/nav.adoc
+++ b/modules/concepts/nav.adoc
@@ -10,10 +10,10 @@
 ** xref:resources.adoc[]
 ** xref:s3.adoc[]
 ** xref:tls_server_verification.adoc[]
-** xref:pod_placement.adoc[]
 ** xref:overrides.adoc[]
 ** xref:duration.adoc[]
 ** xref:operations/index.adoc[]
 *** xref:operations/cluster_operations.adoc[]
-*** xref:operations/pod_placement.adoc[]
 *** xref:operations/pod_disruptions.adoc[]
+*** xref:operations/pod_placement.adoc[]
+*** xref:operations/graceful_shutdown.adoc[]