From 285cb0ff872c5461a78e62f683e9f7226a9d6351 Mon Sep 17 00:00:00 2001 From: Granville Barnett Date: Fri, 17 May 2024 13:48:55 +0100 Subject: [PATCH 01/15] Basic notes. --- .../cp-subsystem/pages/cp-subsystem.adoc | 20 +++++++++++++++++++ 1 file changed, 20 insertions(+) diff --git a/docs/modules/cp-subsystem/pages/cp-subsystem.adoc b/docs/modules/cp-subsystem/pages/cp-subsystem.adoc index f0400e778..47c4e78c6 100644 --- a/docs/modules/cp-subsystem/pages/cp-subsystem.adoc +++ b/docs/modules/cp-subsystem/pages/cp-subsystem.adoc @@ -273,3 +273,23 @@ group is not available anymore, no management tasks can be performed on the CP Subsystem. For instance, a new CP group cannot be created. In this case, the only solution is to wipe-out the whole CP Subsystem state by performing a force-reset. See xref:management.adoc#cp-subsystem-management-apis[CP Subsystem Management]. + +== Kubernetes + +Deployment of CP within Kubernetes is supported from Hazelcast 5.5 and covers the following scenarios +when using platform opertor our Helm Hazelcast Enterprise chart: + +- Deployment +- Pause: Scaling of compute to `0` +- Resume: Scaling of compute back to the number of members defined during Deployment +- Rolling Update +- Spurious pod restarts + +NOTE: CP is only supported on Kubernetes with CP persistence enabled. + +NOTE: The current limitation on CP in Kubernetes is that we do not support dynamic scaling of the cluster. +The number of members defined at the time of deployment is static and the CP members and CP group size +are expected to be equal to the total number of members (the cluster size) at the time of deployment. +Explicit removal and promotion of a CP member is not supported: Kubernetes has the responsibility of +restarting CP members should they be terminated. These restrictions will be removed in a subsequent +release of Hazelcast Enterprise. 
From 715207f1370109c862f864c9a77bcd00eb82a7a5 Mon Sep 17 00:00:00 2001 From: Granville Barnett Date: Fri, 17 May 2024 13:56:21 +0100 Subject: [PATCH 02/15] Persistence link --- docs/modules/cp-subsystem/pages/cp-subsystem.adoc | 7 ++++--- 1 file changed, 4 insertions(+), 3 deletions(-) diff --git a/docs/modules/cp-subsystem/pages/cp-subsystem.adoc b/docs/modules/cp-subsystem/pages/cp-subsystem.adoc index 47c4e78c6..dedddd5c8 100644 --- a/docs/modules/cp-subsystem/pages/cp-subsystem.adoc +++ b/docs/modules/cp-subsystem/pages/cp-subsystem.adoc @@ -276,8 +276,8 @@ a force-reset. See xref:management.adoc#cp-subsystem-management-apis[CP Subsyste == Kubernetes -Deployment of CP within Kubernetes is supported from Hazelcast 5.5 and covers the following scenarios -when using platform opertor our Helm Hazelcast Enterprise chart: +Deployment of CP within Kubernetes is supported from Hazelcast Enterprise 5.5 and covers the +following scenarios when using platform opertor our Helm Hazelcast Enterprise chart: - Deployment - Pause: Scaling of compute to `0` @@ -285,7 +285,7 @@ when using platform opertor our Helm Hazelcast Enterprise chart: - Rolling Update - Spurious pod restarts -NOTE: CP is only supported on Kubernetes with CP persistence enabled. +NOTE: CP is only supported on Kubernetes with CP xref:cp-subsystem:configuration.adoc#persistence[persistence enabled,window=_blank]. NOTE: The current limitation on CP in Kubernetes is that we do not support dynamic scaling of the cluster. The number of members defined at the time of deployment is static and the CP members and CP group size @@ -293,3 +293,4 @@ are expected to be equal to the total number of members (the cluster size) at th Explicit removal and promotion of a CP member is not supported: Kubernetes has the responsibility of restarting CP members should they be terminated. These restrictions will be removed in a subsequent release of Hazelcast Enterprise. 
+ From 11e35ab263922a39e4f5111ebe469686aa829e90 Mon Sep 17 00:00:00 2001 From: Granville Barnett Date: Fri, 17 May 2024 14:02:47 +0100 Subject: [PATCH 03/15] k8s ref --- docs/modules/cp-subsystem/pages/cp-subsystem.adoc | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/docs/modules/cp-subsystem/pages/cp-subsystem.adoc b/docs/modules/cp-subsystem/pages/cp-subsystem.adoc index dedddd5c8..2c1e32030 100644 --- a/docs/modules/cp-subsystem/pages/cp-subsystem.adoc +++ b/docs/modules/cp-subsystem/pages/cp-subsystem.adoc @@ -279,9 +279,9 @@ a force-reset. See xref:management.adoc#cp-subsystem-management-apis[CP Subsyste Deployment of CP within Kubernetes is supported from Hazelcast Enterprise 5.5 and covers the following scenarios when using platform opertor our Helm Hazelcast Enterprise chart: -- Deployment -- Pause: Scaling of compute to `0` -- Resume: Scaling of compute back to the number of members defined during Deployment +- Deployment: see xref:kubernetes:deploying-in-kubernetes.adoc[Deploying in Kubernetes,window=_blank]. +- Pause: scaling of compute to `0` +- Resume: scaling of compute back to the number of members defined during Deployment - Rolling Update - Spurious pod restarts From c607ab571f782624fd75314e117f2d978a1a228e Mon Sep 17 00:00:00 2001 From: Granville Barnett Date: Fri, 17 May 2024 14:12:49 +0100 Subject: [PATCH 04/15] important note on what is preferred installation method --- docs/modules/cp-subsystem/pages/cp-subsystem.adoc | 9 +++++++++ 1 file changed, 9 insertions(+) diff --git a/docs/modules/cp-subsystem/pages/cp-subsystem.adoc b/docs/modules/cp-subsystem/pages/cp-subsystem.adoc index 2c1e32030..1d3ae1389 100644 --- a/docs/modules/cp-subsystem/pages/cp-subsystem.adoc +++ b/docs/modules/cp-subsystem/pages/cp-subsystem.adoc @@ -276,6 +276,10 @@ a force-reset. 
See xref:management.adoc#cp-subsystem-management-apis[CP Subsyste == Kubernetes +IMPORTANT: We strongly encourage using xref:kubernetes:deploying-in-kubernetes.adoc#hazelcast-platform-operator-for-kubernetesopenshift[Hazelcast Platform Operator,window=_blank] for deployments into Kubernetes. When using Helm please make sure to use the official +`hazelcast/hazelcast-enterprise` Helm Chart (see xref:kubernetes:deploying-in-kubernetes.adoc#helm-chart[Hazelcast Platform Operator,window=_blank]) +within the limitations of what is described in this section. + Deployment of CP within Kubernetes is supported from Hazelcast Enterprise 5.5 and covers the following scenarios when using platform opertor our Helm Hazelcast Enterprise chart: @@ -285,7 +289,12 @@ following scenarios when using platform opertor our Helm Hazelcast Enterprise ch - Rolling Update - Spurious pod restarts +The method by which deployment, pause, resume and rolling update are performed will vary according +to the way that CP was deployed. See xref:kubernetes:deploying-in-kubernetes.adoc[Deploying in Kubernetes,window=_blank] +for more information. + NOTE: CP is only supported on Kubernetes with CP xref:cp-subsystem:configuration.adoc#persistence[persistence enabled,window=_blank]. +Hazelcast Enterprise is therefore a requirement. NOTE: The current limitation on CP in Kubernetes is that we do not support dynamic scaling of the cluster. 
The number of members defined at the time of deployment is static and the CP members and CP group size From 7145a2e16ae079fb927f43d179537591ad470d31 Mon Sep 17 00:00:00 2001 From: Granville Barnett Date: Fri, 17 May 2024 14:58:11 +0100 Subject: [PATCH 05/15] events --- .../cp-subsystem/pages/cp-subsystem.adoc | 41 +++++++++++++++++-- 1 file changed, 38 insertions(+), 3 deletions(-) diff --git a/docs/modules/cp-subsystem/pages/cp-subsystem.adoc b/docs/modules/cp-subsystem/pages/cp-subsystem.adoc index 1d3ae1389..8258f5775 100644 --- a/docs/modules/cp-subsystem/pages/cp-subsystem.adoc +++ b/docs/modules/cp-subsystem/pages/cp-subsystem.adoc @@ -277,15 +277,15 @@ a force-reset. See xref:management.adoc#cp-subsystem-management-apis[CP Subsyste == Kubernetes IMPORTANT: We strongly encourage using xref:kubernetes:deploying-in-kubernetes.adoc#hazelcast-platform-operator-for-kubernetesopenshift[Hazelcast Platform Operator,window=_blank] for deployments into Kubernetes. When using Helm please make sure to use the official -`hazelcast/hazelcast-enterprise` Helm Chart (see xref:kubernetes:deploying-in-kubernetes.adoc#helm-chart[Hazelcast Platform Operator,window=_blank]) -within the limitations of what is described in this section. +`hazelcast/hazelcast-enterprise` Helm Chart (see xref:kubernetes:deploying-in-kubernetes.adoc#helm-chart[Helm Chart,window=_blank]) +and configure within the limitations of what is described in this section. Deployment of CP within Kubernetes is supported from Hazelcast Enterprise 5.5 and covers the following scenarios when using platform opertor our Helm Hazelcast Enterprise chart: - Deployment: see xref:kubernetes:deploying-in-kubernetes.adoc[Deploying in Kubernetes,window=_blank]. 
- Pause: scaling of compute to `0` -- Resume: scaling of compute back to the number of members defined during Deployment +- Resume: scaling of compute back to the number of members defined during _Deployment_ - Rolling Update - Spurious pod restarts @@ -303,3 +303,38 @@ Explicit removal and promotion of a CP member is not supported: Kubernetes has t restarting CP members should they be terminated. These restrictions will be removed in a subsequent release of Hazelcast Enterprise. +We recommend setting xref:cp-subsystem:configuration.adoc#data-load-timeout-seconds[data-load-timeout-seconds,window=_blank] +to a value that spans the duration from the when the first pod is running to when last pod is running and completed its CP +intialisation procedure. This is particularly important if you intend to do a _resume_ scenario. Currently the only way to determine when a CP member has completed its initialisation is to consult the logs. Therefore, we recommend the following to determine a reasonable value for `data-load-timeout-seconds`: + +1. Load CP with an amount of data that is representative of your production use case +2. Pause the cluster +3. Resume the cluster and determine the duration in seconds between when first member in the `StatefulSet` running and when the last member in the `StatefulSet` is running and outputted an `INFO` level log message that matches the pattern `CP restore completed` as described shortly. + +If you are using a log aggregation service and wish to filter key startup events within CP then the `INFO` level patterns emitted by `CPPersistenceServiceImpl` can be used. + +[cols="1,1,1"] +|=== +|Phrase|Example Match|Description + +|`CP restore starting` +|`CP restore starting...in /data/cp-data/0e667605-c650-42b7-9625-376a213008a6; Timeout(s): 120` +| Denotes the starting of the entire CP restoration process. 
+ +|`CP restore completed` +|`CP restore completed...in /data/cp-data/0e667605-c650-42b7-9625-376a213008a6; Took(ms): 50387` +| Denotes completion of the entire CP restoration process. This includes notification to other CP members that this member has rejoined in addition to the loading of its persisted data. + +|`CP restore starting(CPGroupId` +|`CP restore starting(CPGroupId{name='METADATA', seed=0, groupId=0})...in /data/persistence/cp/212561fb-c2d5-442a-a4e0-a863fdf7074b/METADATA@0@0` +| Denotes the starting of loading a particular CP Group's data. + +|`CP restore starting(CPGroupId` +|`CP restore starting(CPGroupId{name='METADATA', seed=0, groupId=0})...in /data/persistence/cp/212561fb-c2d5-442a-a4e0-a863fdf7074b/METADATA@0@0` +| Denotes the starting of loading a particular CP Group's data. + +|`CP restore completed(CPGroupId` +|`CP restore completed(CPGroupId{name='METADATA', seed=0, groupId=0})...in /data/persistence/cp/212561fb-c2d5-442a-a4e0-a863fdf7074b/METADATA@0@0; Took(ms): 29` +| Denotes the starting of loading a particular CP Group's data. + +|=== From 4bdceff0c27a6e20f69534c38b660fe6da1fd347 Mon Sep 17 00:00:00 2001 From: Granville Barnett Date: Fri, 17 May 2024 15:04:15 +0100 Subject: [PATCH 06/15] Fix table --- docs/modules/cp-subsystem/pages/cp-subsystem.adoc | 12 ++++-------- 1 file changed, 4 insertions(+), 8 deletions(-) diff --git a/docs/modules/cp-subsystem/pages/cp-subsystem.adoc b/docs/modules/cp-subsystem/pages/cp-subsystem.adoc index 8258f5775..181c98b26 100644 --- a/docs/modules/cp-subsystem/pages/cp-subsystem.adoc +++ b/docs/modules/cp-subsystem/pages/cp-subsystem.adoc @@ -309,19 +309,19 @@ intialisation procedure. This is particularly important if you intend to do a _r 1. Load CP with an amount of data that is representative of your production use case 2. Pause the cluster -3. 
Resume the cluster and determine the duration in seconds between when first member in the `StatefulSet` running and when the last member in the `StatefulSet` is running and outputted an `INFO` level log message that matches the pattern `CP restore completed` as described shortly. +3. Resume the cluster and determine the duration in seconds between when first pod in the `StatefulSet` running and when the last pod in the `StatefulSet` is running and outputted an `INFO` level log message that matches the pattern `CP restore completed...in` as described shortly. -If you are using a log aggregation service and wish to filter key startup events within CP then the `INFO` level patterns emitted by `CPPersistenceServiceImpl` can be used. +If you are using a log aggregation service and wish to filter key startup events within CP then the `INFO` level patterns emitted by `CPPersistenceServiceImpl` can be used as detailed below. [cols="1,1,1"] |=== |Phrase|Example Match|Description -|`CP restore starting` +|`CP restore starting...in` |`CP restore starting...in /data/cp-data/0e667605-c650-42b7-9625-376a213008a6; Timeout(s): 120` | Denotes the starting of the entire CP restoration process. -|`CP restore completed` +|`CP restore completed...in` |`CP restore completed...in /data/cp-data/0e667605-c650-42b7-9625-376a213008a6; Took(ms): 50387` | Denotes completion of the entire CP restoration process. This includes notification to other CP members that this member has rejoined in addition to the loading of its persisted data. @@ -329,10 +329,6 @@ If you are using a log aggregation service and wish to filter key startup events |`CP restore starting(CPGroupId{name='METADATA', seed=0, groupId=0})...in /data/persistence/cp/212561fb-c2d5-442a-a4e0-a863fdf7074b/METADATA@0@0` | Denotes the starting of loading a particular CP Group's data. 
-|`CP restore starting(CPGroupId` -|`CP restore starting(CPGroupId{name='METADATA', seed=0, groupId=0})...in /data/persistence/cp/212561fb-c2d5-442a-a4e0-a863fdf7074b/METADATA@0@0` -| Denotes the starting of loading a particular CP Group's data. - |`CP restore completed(CPGroupId` |`CP restore completed(CPGroupId{name='METADATA', seed=0, groupId=0})...in /data/persistence/cp/212561fb-c2d5-442a-a4e0-a863fdf7074b/METADATA@0@0; Took(ms): 29` | Denotes the starting of loading a particular CP Group's data. From a83f971fa41e4ca8157ac7131786a8e02a56969f Mon Sep 17 00:00:00 2001 From: Granville Barnett Date: Fri, 17 May 2024 15:10:18 +0100 Subject: [PATCH 07/15] Minor corrections --- docs/modules/cp-subsystem/pages/cp-subsystem.adoc | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/docs/modules/cp-subsystem/pages/cp-subsystem.adoc b/docs/modules/cp-subsystem/pages/cp-subsystem.adoc index 181c98b26..7f73a3ac1 100644 --- a/docs/modules/cp-subsystem/pages/cp-subsystem.adoc +++ b/docs/modules/cp-subsystem/pages/cp-subsystem.adoc @@ -277,15 +277,15 @@ a force-reset. See xref:management.adoc#cp-subsystem-management-apis[CP Subsyste == Kubernetes IMPORTANT: We strongly encourage using xref:kubernetes:deploying-in-kubernetes.adoc#hazelcast-platform-operator-for-kubernetesopenshift[Hazelcast Platform Operator,window=_blank] for deployments into Kubernetes. When using Helm please make sure to use the official -`hazelcast/hazelcast-enterprise` Helm Chart (see xref:kubernetes:deploying-in-kubernetes.adoc#helm-chart[Helm Chart,window=_blank]) +`hazelcast/hazelcast-enterprise` xref:kubernetes:deploying-in-kubernetes.adoc#helm-chart[Helm Chart,window=_blank] and configure within the limitations of what is described in this section. 
Deployment of CP within Kubernetes is supported from Hazelcast Enterprise 5.5 and covers the -following scenarios when using platform opertor our Helm Hazelcast Enterprise chart: +following scenarios when using xref:kubernetes:deploying-in-kubernetes.adoc#hazelcast-platform-operator-for-kubernetesopenshift[Hazelcast Platform Operator,window=_blank] or our `hazelcast/hazelcast-enterprise` xref:kubernetes:deploying-in-kubernetes.adoc#helm-chart[Helm Chart,window=_blank]. - Deployment: see xref:kubernetes:deploying-in-kubernetes.adoc[Deploying in Kubernetes,window=_blank]. -- Pause: scaling of compute to `0` -- Resume: scaling of compute back to the number of members defined during _Deployment_ +- Pause: scaling of pods to `0` +- Resume: scaling of pods back to the same number of pods defined at the point of _Deployment_ - Rolling Update - Spurious pod restarts @@ -305,7 +305,7 @@ release of Hazelcast Enterprise. We recommend setting xref:cp-subsystem:configuration.adoc#data-load-timeout-seconds[data-load-timeout-seconds,window=_blank] to a value that spans the duration from the when the first pod is running to when last pod is running and completed its CP -intialisation procedure. This is particularly important if you intend to do a _resume_ scenario. Currently the only way to determine when a CP member has completed its initialisation is to consult the logs. Therefore, we recommend the following to determine a reasonable value for `data-load-timeout-seconds`: +intialisation procedure. This is particularly important if you intend to perform _resume_ scenarios. Currently the only way to determine when a CP member has completed its initialisation is to consult the logs. Therefore, we recommend the following to determine a reasonable value for `data-load-timeout-seconds`: 1. Load CP with an amount of data that is representative of your production use case 2. 
Pause the cluster From e639334f6b90ef1f9f705b5a9e932807b2212062 Mon Sep 17 00:00:00 2001 From: Granville Barnett Date: Fri, 17 May 2024 15:32:07 +0100 Subject: [PATCH 08/15] Supported CP member sizes --- docs/modules/cp-subsystem/pages/cp-subsystem.adoc | 2 ++ 1 file changed, 2 insertions(+) diff --git a/docs/modules/cp-subsystem/pages/cp-subsystem.adoc b/docs/modules/cp-subsystem/pages/cp-subsystem.adoc index 7f73a3ac1..eabb56ef7 100644 --- a/docs/modules/cp-subsystem/pages/cp-subsystem.adoc +++ b/docs/modules/cp-subsystem/pages/cp-subsystem.adoc @@ -289,6 +289,8 @@ following scenarios when using xref:kubernetes:deploying-in-kubernetes.adoc#haze - Rolling Update - Spurious pod restarts +We support 3-, 5- and 7-CP member deployments under the constraints discussed in this section. + The method by which deployment, pause, resume and rolling update are performed will vary according to the way that CP was deployed. See xref:kubernetes:deploying-in-kubernetes.adoc[Deploying in Kubernetes,window=_blank] for more information. From 40c6e8db45cf1129ab5ffb4dadbdfe38da25d310 Mon Sep 17 00:00:00 2001 From: Granville Barnett <140408555+gbarnett-hz@users.noreply.github.com> Date: Fri, 17 May 2024 16:16:14 +0100 Subject: [PATCH 09/15] Update docs/modules/cp-subsystem/pages/cp-subsystem.adoc Adopt - suggestion on IMPORTANT Helm wording Co-authored-by: rebekah-lawrence <142301480+rebekah-lawrence@users.noreply.github.com> --- docs/modules/cp-subsystem/pages/cp-subsystem.adoc | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/modules/cp-subsystem/pages/cp-subsystem.adoc b/docs/modules/cp-subsystem/pages/cp-subsystem.adoc index eabb56ef7..af63f930d 100644 --- a/docs/modules/cp-subsystem/pages/cp-subsystem.adoc +++ b/docs/modules/cp-subsystem/pages/cp-subsystem.adoc @@ -276,7 +276,7 @@ a force-reset.
See xref:management.adoc#cp-subsystem-management-apis[CP Subsyste == Kubernetes -IMPORTANT: We strongly encourage using xref:kubernetes:deploying-in-kubernetes.adoc#hazelcast-platform-operator-for-kubernetesopenshift[Hazelcast Platform Operator,window=_blank] for deployments into Kubernetes. When using Helm please make sure to use the official +IMPORTANT: We strongly encourage using xref:kubernetes:deploying-in-kubernetes.adoc#hazelcast-platform-operator-for-kubernetesopenshift[Hazelcast Platform Operator,window=_blank] for deployments into Kubernetes. If you choose to use Helm, use the official `hazelcast/hazelcast-enterprise` xref:kubernetes:deploying-in-kubernetes.adoc#helm-chart[Helm Chart,window=_blank] and configure within the limitations of what is described in this section. From 9ce030c169c41e8e7afc996efa06a2f79e417041 Mon Sep 17 00:00:00 2001 From: Granville Barnett <140408555+gbarnett-hz@users.noreply.github.com> Date: Fri, 17 May 2024 16:16:42 +0100 Subject: [PATCH 10/15] Update docs/modules/cp-subsystem/pages/cp-subsystem.adoc Adopt - suggestion on configuration wording Co-authored-by: rebekah-lawrence <142301480+rebekah-lawrence@users.noreply.github.com> --- docs/modules/cp-subsystem/pages/cp-subsystem.adoc | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/modules/cp-subsystem/pages/cp-subsystem.adoc b/docs/modules/cp-subsystem/pages/cp-subsystem.adoc index af63f930d..96dc83f08 100644 --- a/docs/modules/cp-subsystem/pages/cp-subsystem.adoc +++ b/docs/modules/cp-subsystem/pages/cp-subsystem.adoc @@ -278,7 +278,7 @@ a force-reset. See xref:management.adoc#cp-subsystem-management-apis[CP Subsyste IMPORTANT: We strongly encourage using xref:kubernetes:deploying-in-kubernetes.adoc#hazelcast-platform-operator-for-kubernetesopenshift[Hazelcast Platform Operator,window=_blank] for deployments into Kubernetes. 
If you choose to use Helm, use the official `hazelcast/hazelcast-enterprise` xref:kubernetes:deploying-in-kubernetes.adoc#helm-chart[Helm Chart,window=_blank] -and configure within the limitations of what is described in this section. +and configure within the limitations described in this section. Deployment of CP within Kubernetes is supported from Hazelcast Enterprise 5.5 and covers the following scenarios when using xref:kubernetes:deploying-in-kubernetes.adoc#hazelcast-platform-operator-for-kubernetesopenshift[Hazelcast Platform Operator,window=_blank] or our `hazelcast/hazelcast-enterprise` xref:kubernetes:deploying-in-kubernetes.adoc#helm-chart[Helm Chart,window=_blank]. From 1eaf7b596d4f392be5929249f88202b4c36dc0e1 Mon Sep 17 00:00:00 2001 From: Granville Barnett <140408555+gbarnett-hz@users.noreply.github.com> Date: Fri, 17 May 2024 16:19:34 +0100 Subject: [PATCH 11/15] Apply suggestions from code review More suggestions adopted. Co-authored-by: rebekah-lawrence <142301480+rebekah-lawrence@users.noreply.github.com> --- docs/modules/cp-subsystem/pages/cp-subsystem.adoc | 12 +++++++----- 1 file changed, 7 insertions(+), 5 deletions(-) diff --git a/docs/modules/cp-subsystem/pages/cp-subsystem.adoc b/docs/modules/cp-subsystem/pages/cp-subsystem.adoc index 96dc83f08..7687dfebb 100644 --- a/docs/modules/cp-subsystem/pages/cp-subsystem.adoc +++ b/docs/modules/cp-subsystem/pages/cp-subsystem.adoc @@ -295,7 +295,9 @@ The method by which deployment, pause, resume and rolling update are performed w to the way that CP was deployed. See xref:kubernetes:deploying-in-kubernetes.adoc[Deploying in Kubernetes,window=_blank] for more information. -NOTE: CP is only supported on Kubernetes with CP xref:cp-subsystem:configuration.adoc#persistence[persistence enabled,window=_blank]. +[NOTE] +==== +* CP is only supported on Kubernetes with CP xref:cp-subsystem:configuration.adoc#persistence[persistence enabled,window=_blank]. Hazelcast Enterprise is therefore a requirement. 
NOTE: The current limitation on CP in Kubernetes is that we do not support dynamic scaling of the cluster. @@ -321,18 +323,18 @@ If you are using a log aggregation service and wish to filter key startup events |`CP restore starting...in` |`CP restore starting...in /data/cp-data/0e667605-c650-42b7-9625-376a213008a6; Timeout(s): 120` -| Denotes the starting of the entire CP restoration process. +| Point at which the entire CP restoration process started. |`CP restore completed...in` |`CP restore completed...in /data/cp-data/0e667605-c650-42b7-9625-376a213008a6; Took(ms): 50387` -| Denotes completion of the entire CP restoration process. This includes notification to other CP members that this member has rejoined in addition to the loading of its persisted data. +| Point at which the entire CP restoration process completed, including notifying other CP members that the member has rejoined and the loading of its persisted data. |`CP restore starting(CPGroupId` |`CP restore starting(CPGroupId{name='METADATA', seed=0, groupId=0})...in /data/persistence/cp/212561fb-c2d5-442a-a4e0-a863fdf7074b/METADATA@0@0` -| Denotes the starting of loading a particular CP Group's data. +| Point at which a particular CP Group's data started loading. |`CP restore completed(CPGroupId` |`CP restore completed(CPGroupId{name='METADATA', seed=0, groupId=0})...in /data/persistence/cp/212561fb-c2d5-442a-a4e0-a863fdf7074b/METADATA@0@0; Took(ms): 29` -| Denotes the starting of loading a particular CP Group's data. +| Point at which a particular CP Group's data completed loading. |=== From f726e17ae9daf322e643781d04ec0f1f557d0159 Mon Sep 17 00:00:00 2001 From: Granville Barnett <140408555+gbarnett-hz@users.noreply.github.com> Date: Fri, 17 May 2024 16:20:32 +0100 Subject: [PATCH 12/15] Apply suggestions from code review Adopt limitation wording. 
Co-authored-by: rebekah-lawrence <142301480+rebekah-lawrence@users.noreply.github.com> --- docs/modules/cp-subsystem/pages/cp-subsystem.adoc | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/modules/cp-subsystem/pages/cp-subsystem.adoc b/docs/modules/cp-subsystem/pages/cp-subsystem.adoc index 7687dfebb..ec06aae1f 100644 --- a/docs/modules/cp-subsystem/pages/cp-subsystem.adoc +++ b/docs/modules/cp-subsystem/pages/cp-subsystem.adoc @@ -300,7 +300,7 @@ for more information. * CP is only supported on Kubernetes with CP xref:cp-subsystem:configuration.adoc#persistence[persistence enabled,window=_blank]. Hazelcast Enterprise is therefore a requirement. -NOTE: The current limitation on CP in Kubernetes is that we do not support dynamic scaling of the cluster. +* The current limitation on CP in Kubernetes is that we do not support dynamic scaling of the cluster. The number of members defined at the time of deployment is static and the CP members and CP group size are expected to be equal to the total number of members (the cluster size) at the time of deployment. Explicit removal and promotion of a CP member is not supported: Kubernetes has the responsibility of From 5918f1befe1cb8760fe519410de80daedb229a78 Mon Sep 17 00:00:00 2001 From: Granville Barnett Date: Fri, 17 May 2024 16:26:44 +0100 Subject: [PATCH 13/15] adopt suggestion --- docs/modules/cp-subsystem/pages/cp-subsystem.adoc | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/modules/cp-subsystem/pages/cp-subsystem.adoc b/docs/modules/cp-subsystem/pages/cp-subsystem.adoc index ec06aae1f..8f0a9b32f 100644 --- a/docs/modules/cp-subsystem/pages/cp-subsystem.adoc +++ b/docs/modules/cp-subsystem/pages/cp-subsystem.adoc @@ -315,7 +315,7 @@ intialisation procedure. This is particularly important if you intend to perform 2. Pause the cluster 3. 
Resume the cluster and determine the duration in seconds between when first pod in the `StatefulSet` running and when the last pod in the `StatefulSet` is running and outputted an `INFO` level log message that matches the pattern `CP restore completed...in` as described shortly. -If you are using a log aggregation service and wish to filter key startup events within CP then the `INFO` level patterns emitted by `CPPersistenceServiceImpl` can be used as detailed below. +If you are using a log aggregation service and want to filter key startup events within CP, you can use the `INFO` level patterns emitted by `CPPersistenceServiceImpl` as detailed below. [cols="1,1,1"] |=== From 57e000bd74b191afeff6affc0327ae988080719b Mon Sep 17 00:00:00 2001 From: Granville Barnett Date: Fri, 17 May 2024 16:28:08 +0100 Subject: [PATCH 14/15] end note section --- docs/modules/cp-subsystem/pages/cp-subsystem.adoc | 1 + 1 file changed, 1 insertion(+) diff --git a/docs/modules/cp-subsystem/pages/cp-subsystem.adoc b/docs/modules/cp-subsystem/pages/cp-subsystem.adoc index 8f0a9b32f..81829e65b 100644 --- a/docs/modules/cp-subsystem/pages/cp-subsystem.adoc +++ b/docs/modules/cp-subsystem/pages/cp-subsystem.adoc @@ -306,6 +306,7 @@ are expected to be equal to the total number of members (the cluster size) at th Explicit removal and promotion of a CP member is not supported: Kubernetes has the responsibility of restarting CP members should they be terminated. These restrictions will be removed in a subsequent release of Hazelcast Enterprise. 
+==== We recommend setting xref:cp-subsystem:configuration.adoc#data-load-timeout-seconds[data-load-timeout-seconds,window=_blank] to a value that spans the duration from the when the first pod is running to when last pod is running and completed its CP From 85e7c0558f97305e8852b4d92fca293abb7d8879 Mon Sep 17 00:00:00 2001 From: Granville Barnett <140408555+gbarnett-hz@users.noreply.github.com> Date: Fri, 17 May 2024 16:34:59 +0100 Subject: [PATCH 15/15] Update docs/modules/cp-subsystem/pages/cp-subsystem.adoc data-load-timeout-seconds suggestion Co-authored-by: rebekah-lawrence <142301480+rebekah-lawrence@users.noreply.github.com> --- docs/modules/cp-subsystem/pages/cp-subsystem.adoc | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/modules/cp-subsystem/pages/cp-subsystem.adoc b/docs/modules/cp-subsystem/pages/cp-subsystem.adoc index 81829e65b..1755caca0 100644 --- a/docs/modules/cp-subsystem/pages/cp-subsystem.adoc +++ b/docs/modules/cp-subsystem/pages/cp-subsystem.adoc @@ -309,7 +309,7 @@ release of Hazelcast Enterprise. ==== We recommend setting xref:cp-subsystem:configuration.adoc#data-load-timeout-seconds[data-load-timeout-seconds,window=_blank] -to a value that spans the duration from the when the first pod is running to when last pod is running and completed its CP +to a value that spans the duration from when the first pod is running to when the last pod is running and has completed its CP intialisation procedure. This is particularly important if you intend to perform _resume_ scenarios. Currently the only way to determine when a CP member has completed its initialisation is to consult the logs. Therefore, we recommend the following to determine a reasonable value for `data-load-timeout-seconds`: 1. Load CP with an amount of data that is representative of your production use case