[1.1] Feature: priority-fencing-delay #2043
This feature addresses the relevant topics and implements the ideas brought up in: ClusterLabs/fence-agents#308. This commit adds the priority-fencing-delay option (just the option, not the feature itself). Enforce the specified delay for fencing actions that target the lost nodes with the highest total resource priority when our cluster partition does not hold the majority of the nodes, so that the more significant nodes have a better chance of winning any fencing match. This is especially meaningful in a split-brain of a 2-node cluster. A promoted resource instance counts as its base priority + 1 if the base priority is not 0. If all the nodes have equal priority, then any pcmk_delay_base/max configured for the corresponding fencing resources will be applied. Otherwise, as long as it is set (even to 0), it takes precedence over any configured pcmk_delay_base/max. By default, priority fencing delay is disabled.
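The priority rule above can be sketched in C. This is a minimal, hypothetical model rather than the code from this PR: the struct `resource_instance` and the functions `node_total_priority()` and `priority_delay_for_target()` are invented for illustration, and treating a delay value <= 0 as "disabled" is an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified view of one resource instance on a node. */
struct resource_instance {
    int priority;   /* configured base priority */
    bool promoted;  /* true if the instance is in the promoted role */
};

/* A promoted instance counts as base priority + 1, unless the base is 0. */
static int instance_priority(const struct resource_instance *r)
{
    if (r->promoted && r->priority != 0) {
        return r->priority + 1;
    }
    return r->priority;
}

/* Total resource priority of a node: the sum over the instances it runs. */
int node_total_priority(const struct resource_instance *rs, size_t n)
{
    int total = 0;

    for (size_t i = 0; i < n; i++) {
        total += instance_priority(&rs[i]);
    }
    return total;
}

/* Delay (seconds) to request when fencing member `target`, or 0.
 * priorities[i]: total priority of member i; online[i]: member i is in
 * our partition. A delay <= 0 is treated as disabled (assumption). */
int priority_delay_for_target(int target, const int *priorities,
                              const bool *online, size_t members,
                              int priority_fencing_delay)
{
    size_t online_count = 0;
    int top, lowest;

    assert(members > 0 && target >= 0 && (size_t) target < members);

    if (priority_fencing_delay <= 0 || online[target]) {
        return 0;   /* disabled, or the target is not a lost node */
    }

    top = lowest = priorities[0];
    for (size_t i = 0; i < members; i++) {
        if (online[i]) {
            online_count++;
        }
        if (priorities[i] > top) {
            top = priorities[i];
        }
        if (priorities[i] < lowest) {
            lowest = priorities[i];
        }
    }

    if (online_count > members / 2) {
        return 0;   /* our partition holds the majority: no delay */
    }
    if (top == lowest) {
        return 0;   /* equal priorities: leave it to pcmk_delay_base/max */
    }
    return priorities[target] == top ? priority_fencing_delay : 0;
}
```

In a 2-node split-brain, each side sees one of two members online, so neither side has the majority; in this model only fencing aimed at the higher-priority node is delayed, which gives that node's side a head start.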
This is based on the existing test whitebox-imply-stop-on-fence.
A parameter value of -1 disables the enforced fencing delay. Operation fence() is now a wrapper for fence_with_delay().
…g delay. It can be specified with the --fence, --reboot or --unfence commands. The default value of -1 disables the enforced fencing delay.
Enforced fencing delay takes precedence over any pcmk_delay_base/max configured for the corresponding fencing resources. Enforced fencing delay is applied only for the first device in the first fencing topology level. Consistently use g_timeout_add_seconds() for pcmk_delay_base/max as well.
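The topology rule above can be illustrated with a hypothetical helper that decides which device receives the requested delay. The function name and the zero-based level/device indexing are assumptions made for this sketch; the value -1 (disabled) is mapped to no delay.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical: requested delay (seconds) to apply before running device
 * `device_index` in topology level `level_index`. Only the first device
 * of the first level gets the requested delay; all others get none. */
int requested_delay_for_device(size_t level_index, size_t device_index,
                               int requested_delay)
{
    assert(requested_delay >= -1);

    if (level_index == 0 && device_index == 0) {
        /* -1 means "disabled", so it contributes no delay */
        return requested_delay > 0 ? requested_delay : 0;
    }
    return 0;
}
```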
…delay with fencing topology
…bled. This commit also documents the upcoming new behavior as discussed in: ClusterLabs#2012. Any static/random delays that are introduced by `pcmk_delay_base/max` configured for the corresponding fencing resources will be added to this delay. This delay should be significantly greater than the maximum `pcmk_delay_base/max` (twice that value is a safe margin). By default, priority fencing delay is disabled.
… have equal priority. In any case, priority-fencing-delay won't take precedence over any configured pcmk_delay_base/max.
…uested fencing delay. Requested fencing delay doesn't take precedence over any configured pcmk_delay_base/max. A delay value of -1 now also disables any static/random fencing delays from pcmk_delay_base/max. No consumers use it for now.
This commit also documents the current behavior in the help:
- Any static/random delays from pcmk_delay_base/max will be added to the requested fencing delay.
- A delay value of -1 now also disables any static/random fencing delays from pcmk_delay_base/max.
…y_base is added. This commit also updates the log patterns to match the log changes.
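The delay arithmetic described in these commits can be sketched as follows. This is a hypothetical model, not Pacemaker's code: `device_delay()`, `total_fencing_delay()` and `priority_delay_is_safe()` are invented names, and the device delay is assumed to be `pcmk_delay_base` plus a random part chosen so that the sum stays within `pcmk_delay_max`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Assumed model of the per-device delay: static base plus a random part
 * in [0, max - base], so base + random never exceeds max. */
static int device_delay(int delay_base, int delay_max)
{
    int random_part = 0;

    assert(delay_base >= 0 && delay_max >= 0);
    if (delay_max > delay_base) {
        random_part = rand() % (delay_max - delay_base + 1);
    }
    return delay_base + random_part;
}

/* Total delay (seconds) before the fencing action.
 * requested == -1: disable everything, including pcmk_delay_base/max.
 * requested >= 0: the device's static/random delay is added on top. */
int total_fencing_delay(int requested, int delay_base, int delay_max)
{
    assert(requested >= -1);
    if (requested == -1) {
        return 0;
    }
    return requested + device_delay(delay_base, delay_max);
}

/* Guideline from the commit message: the priority delay should be
 * significantly greater than the largest device delay; twice is safe. */
bool priority_delay_is_safe(int priority_delay, int max_device_delay)
{
    return priority_delay >= 2 * max_device_delay;
}
```

Because the device delays are added rather than overridden, the priority delay only gives the favored side a head start if it dominates the largest device delay, which is what the "safely twice" guideline protects.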
Getting ready for the 1.1.23 release, it just now occurred to me that it would have been better not to backport this upstream. We can't bump the feature set in 1.1 (we need to guarantee rolling upgrades from any 1.1 to any 2.0). That means in a mixed-version cluster, this feature could start or stop working depending on which node is elected DC. It's a tough call: I can either let this be part of the release and let that be a known problem, or revert it upstream (distros of course could still backport it). What are your thoughts?
The 1.1 branch has feature set 3.0.14, while 2.0 versions have something >= 3.1.0, right? We could bump the 1.1 branch to 3.0.15, no?
Or do you mean we should support rolling upgrades to any older 2.0? Do we have to?
Right, we currently guarantee rolling upgrades from any mix of versions 1.1.11 or later to any higher version.
Rather than revert it entirely, I can ifdef the key sections with a new constant that users can define if they want to enable it, at the cost of upgrade compatibility.
It's hard to see why anyone would upgrade from the latest 1.1 to an outdated 2.0 :-) But of course they would lose the feature when upgrading to a 2.0 version that doesn't support it. OTOH, it's probably not really an incompatible change: cluster nodes incapable of handling a delay simply ignore it. Hrm, I'm not sure it's really an unacceptable limitation, compared with the benefit to 1.1 users of having the feature...
Ah, good idea!
I'll do that then. It's unlikely a user would upgrade from a newer 1.1 to an older 2.0, but an example of where that might happen is if someone switches from compiling their own on an older platform to stock packages on a newer platform with an older 2.0. Also, unless we did a feature set bump, a mixed-version cluster is allowed (in this case anything since 1.1.18). With such a cluster, the feature would start or stop working depending on which node is DC. (It's not wise to run such a cluster outside a rolling upgrade, but it is supported.) We could revise our support guarantees, but I'd rather do that at 3.0.0 than break a promise we've already made. I do like the current policy, though: it avoids needing a compatibility matrix and is intuitive enough to minimize surprises for anyone unfamiliar with the policy.
BTW, I see I overlooked this issue with fence-reaction in 1.1.22. :( It looks like everything else since 1.1.18 is acceptable.
See #2082 for -DENABLE_PRIORITY_FENCING_DELAY
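The compile-time opt-in discussed above can be illustrated roughly as follows. Only the macro name `ENABLE_PRIORITY_FENCING_DELAY` comes from this thread; the functions are hypothetical and do not reflect the actual changes in #2082.

```c
#include <assert.h>

/* Hypothetical: returns 1 if the feature was compiled in, e.g. with
 *   cc -DENABLE_PRIORITY_FENCING_DELAY ...
 * and 0 otherwise (the default build keeps the old behavior). */
int priority_fencing_delay_compiled_in(void)
{
#ifdef ENABLE_PRIORITY_FENCING_DELAY
    return 1;
#else
    return 0;
#endif
}

/* Hypothetical: the delay actually honored for a configured value. */
int effective_priority_fencing_delay(int configured_delay)
{
    assert(configured_delay >= 0);
#ifdef ENABLE_PRIORITY_FENCING_DELAY
    return configured_delay;
#else
    (void) configured_delay;   /* option compiled out: no priority delay */
    return 0;
#endif
}
```

Gating at build time keeps default builds free of any DC-dependent behavior in mixed-version clusters, while letting users who accept that trade-off opt in.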
Makes sense.
Didn't think of that either...
Alright.
Backports of #2012 and #2027 for 1.1 branch.
This feature addresses the relevant topics and implements the ideas brought up in:
ClusterLabs/fence-agents#308
Apply the specified delay to fencing actions that target the lost nodes with the highest total resource priority when our cluster partition does not hold the majority of the nodes, so that the more significant nodes have a better chance of winning any fencing match. This is especially meaningful in a split-brain of a 2-node cluster. A promoted resource instance counts as its base priority + 1 if the base priority is not 0. Any static/random delays that are introduced by `pcmk_delay_base/max` configured for the corresponding fencing resources will be added to this delay. This delay should be significantly greater than the maximum `pcmk_delay_base/max` (twice that value is a safe margin). By default, priority fencing delay is disabled.