Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature: priority-fencing-delay #2012

Merged
merged 9 commits into from
Mar 21, 2020

Conversation

gao-yan
Copy link
Member

@gao-yan gao-yan commented Mar 17, 2020

This feature addresses the relevant topics and implements the ideas
brought up from:

ClusterLabs/fence-agents#308

Enforce specified delay for the fencings that are targeting the lost
nodes with the highest total resource priority in case we don't
have the majority of the nodes in our cluster partition, so that
the more significant nodes potentially win any fencing match,
which is especially meaningful under split-brain of 2-node
cluster. A promoted resource instance takes the base priority + 1
on calculation if the base priority is not 0. If all the nodes
have equal priority, then any pcmk_delay_base/max configured for
the corresponding fencing resources will be applied. Otherwise as
long as it's set, even if to 0, it takes precedence over any
configured pcmk_delay_base/max. By default, priority fencing
delay is disabled.

@gao-yan
Copy link
Member Author

gao-yan commented Mar 17, 2020

Not sure if it's the timing, but we could either bump the version of libstonithd on releasing.

Copy link
Contributor

@kgaillot kgaillot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Beautiful work. I do think we can avoid the library bump. The tests are really nice to have.

lib/pengine/utils.c Show resolved Hide resolved
include/crm/stonith-ng.h Outdated Show resolved Hide resolved
lib/fencing/st_client.c Outdated Show resolved Hide resolved
daemons/fenced/fenced_commands.c Show resolved Hide resolved
lib/pengine/common.c Show resolved Hide resolved
@gao-yan gao-yan force-pushed the priority-fencing-delay branch 4 times, most recently from cf22ac4 to 6f09298 Compare March 19, 2020 17:29
const char *name, unsigned int timeout, unsigned int tolerance,
int delay)
{
return pcmk__fence_action(st, target, action, name, timeout, tolerancei, delay);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

extra "i". doesn't matter now but will save us a moment of confusion later :)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wow, good catch! You have eagle eyes :-)

delay_base, delay_max, cmd->action, device->id);
delay_base = delay_max;
// No enforced fencing delay
if (cmd->start_delay < 0 ) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not necessary to change, but you did define delay_enforced :-)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense.

@kgaillot
Copy link
Contributor

People are going to love this :)

I'm fine with it as is, let me know when you're ready to merge

@gao-yan
Copy link
Member Author

gao-yan commented Mar 20, 2020

It's ready to be merged.

I've received feedback about the idea from users. This definitely will be a very desirable feature also for 1.1 release. Could we make an exception to back-port this for 1.1 branch? Of course we (or basically I) will maintain this part for 1.1 branch as well. What do you think?

BTW, as suggested from libtool manual:
Update the version information only immediately before a public release of your software. More frequent updates are unnecessary, and only guarantee that the current interface number gets larger faster.

, we could bump the "revision" and the "age" for libstonithd library file right before the new release.

@gao-yan
Copy link
Member Author

gao-yan commented Mar 20, 2020

Please wait. Let me document about the priority of a promoted resource instance takes.

This feature addresses the relevant topics and implements the ideas
brought up from:

ClusterLabs/fence-agents#308

This commit adds priority-fencing-delay option (just the option, not the
feature itself).

Enforce specified delay for the fencings that are targeting the lost
nodes with the highest total resource priority in case we don't
have the majority of the nodes in our cluster partition, so that
the more significant nodes potentially win any fencing match,
which is especially meaningful under split-brain of 2-node
cluster. A promoted resource instance takes the base priority + 1
on calculation if the base priority is not 0. If all the nodes
have equal priority, then any pcmk_delay_base/max configured for
the corresponding fencing resources will be applied. Otherwise as
long as it's set, even if to 0, it takes precedence over any
configured pcmk_delay_base/max. By default, priority fencing
delay is disabled.
This is based on the existing test whitebox-imply-stop-on-fence.
A parameter value -1 disables enforced fencing delay.
Operation fence() is now a wrapper for fence_with_delay().
…g delay

It can be specified with --fence, --reboot or --unfence commands.
The default value -1 disables enforced fencing delay.
Enforced fencing delay takes precedence over any pcmk_delay_base/max
configured for the corresponding fencing resources.

Enforced fencing delay is applied only for the first device in the first
fencing topology level.

Consistently use g_timeout_add_seconds() for pcmk_delay_base/max as
well.
@gao-yan
Copy link
Member Author

gao-yan commented Mar 20, 2020

Please wait. Let me document about the priority of a promoted resource instance takes.

Done.

@kgaillot kgaillot merged commit 3e73aef into ClusterLabs:master Mar 21, 2020
@kgaillot
Copy link
Contributor

I've received feedback about the idea from users. This definitely will be a very desirable feature also for 1.1 release. Could we make an exception to back-port this for 1.1 branch? Of course we (or basically I) will maintain this part for 1.1 branch as well. What do you think?

Certainly, if you want to backport it, feel free.

@kgaillot
Copy link
Contributor

@wenningerk had some interesting feedback. His concern was if the cluster could get in a state where one node decided to use priority delay and the other decided to use static delay. Then it might not line up with the user's expectations.

His suggestion was to make the priority delay add on top of the maximum static delay. In the likely case where the user is only using one or the other, there's no difference from the current setup. But if someone specifies both, they would specify their static delay, and then specify a small priority delay that would sit on top of that.

What do you think?

@gao-yan
Copy link
Member Author

gao-yan commented Mar 27, 2020

@wenningerk had some interesting feedback. His concern was if the cluster could get in a state where one node decided to use priority delay and the other decided to use static delay. Then it might not line up with the user's expectations.

As long as priority-fencing-delay is configured, the only case where pcmk_delay_base could be applied is, that all the nodes are considered having equal priority. I assume @wenningerk meant what if split partitions have different views on resources status, right? First it's unlikely ... It might be possible if split-brain happens to happen right while a cluster transition is executing, specifically while any resource actions are already in-flight. But in case of such a situation, even without pcmk_fence_base being configured, cluster partitions might have different opinions about which has the higher priority anyway ... But apparently nobody could tell any cluster partitions the whole picture. I think what we could only and should do is respect each partitions' own views and opinions , and live with bad but small chances.

His suggestion was to make the priority delay add on top of the maximum static delay. In the likely case where the user is only using one or the other, there's no difference from the current setup. But if someone specifies both, they would specify their static delay, and then specify a small priority delay that would sit on top of that.

I was also thinking, in case that both priority delay and static delay are configured, whether the delays at different levels should "combine" under any situations ... Hmm, I don't see much point of that so far but rather the confusions that might introduce ...

Resources are more significant. A node is preferred usually not because it has a favored name :-) A node is considered more significant because it's hosting more significant resources. But resources move around and the preferred node switches. That'd make administrators kind of frustrated to have to manually intervene and change pcmk_delay_base settings at times getting the new preferences reflected. That's also the reason why we came up with this priority-fencing-delay to dynamically, automatically and directly reflect the significance of nodes.

If one anyway thinks static delay is a more suitable solution, I couldn't understand why they would need to configure a "small" priority-fencing-delay to add onto that basically affecting nothing, right?

OTOH, if one thinks priority-fencing-delay is the more suitable solution, I don't see how they would like to add a static delay and evaluate it as another factor by directly combining the delay values themselves..

But actually the good idea of configuring both priority-fencing-delay and pcmk_delay_* is consider pcmk_delay_* as the "tiebreaker" in case that all the nodes have equal priority. For that purpose, it's actually even recommended to configure both.

@gao-yan
Copy link
Member Author

gao-yan commented Mar 27, 2020

Well, the situations that I can think of where combining of pcmk_delay_base would be helpful are, in cases that both partitions think they have the higher priority themselves, or both think each other has the higher priority. So that pcmk_delay_base could be a remedy. If we are to combine pcmk_delay_base for that purpose, we'd have to tell users the configured values of priority-fencing-delay and pcmk_delay_base should be different enough...

@gao-yan
Copy link
Member Author

gao-yan commented Mar 27, 2020

Well, the situations that I can think of where combining of pcmk_delay_base would be helpful are, in cases that both partitions think they have the higher priority themselves, or both think each other has the higher priority. So that pcmk_delay_base could be a remedy. If we are to combine pcmk_delay_base for that purpose, we'd have to tell users the configured values of priority-fencing-delay and pcmk_delay_base should be different enough...

And of course it's practically meaningful only if priority-fencing-delay is bigger than pcmk_delay_base.

@kgaillot
Copy link
Contributor

kgaillot commented Mar 27, 2020

Your explanation makes sense to me.

@gao-yan
Copy link
Member Author

gao-yan commented Mar 27, 2020

Well, the situations that I can think of where combining of pcmk_delay_base would be helpful are, in cases that both partitions think they have the higher priority themselves, or both think each other has the higher priority. So that pcmk_delay_base could be a remedy. If we are to combine pcmk_delay_base for that purpose, we'd have to tell users the configured values of priority-fencing-delay and pcmk_delay_base should be different enough...

And of course it's practically meaningful only if priority-fencing-delay is bigger than pcmk_delay_base.

So the question is, do we want to introduce that and teach users the implications and the additional requirement of relevant configuration parts (complexity against usability) to deal with the kind of corner cases? :-)

@wenningerk
Copy link
Contributor

That touches exactly my concerns.
If we don't prevent configuration of both mechanisms (base + random and priority) - which we theoretically could to prevent configuration of something that is hard to understand and possibly not really working as expected - then it shouldn't be as hard as possible to configure something that doesn't behave as expected.
Thus the idea of adding the delays which as said doesn't make a difference if just one is configured.
Btw. in the case of both nodes thinking they should use priority-delay the most generic approach (not favoring a node) might be using the random addition - significantly smaller than priority-delay (which we could check and at least moan about) - to further reduce the chance of a death-match where both opponents kill each other simultaneously.

@wenningerk
Copy link
Contributor

As we are already using this here for discussion ...
What about adding something like priority-fencing-probe - optional of course - that would allow replacing the priority collection algorithm implemented here by whatever seems to be useful for a specific cluster.
I was e.g. thinking of prioritizing nodes by their connectivity. Nodes would independently do periodic checks of their connectivity to e.g. gateways (e.g. pingd) and memorize the results in some storage available to the other nodes as well (attributes in cib or whatever). When fencing is requested - possibly after a reevaluation of the connectivity of the local node - that data can be used by a priority-fencing-probe to derive a list of nodes and corresponding priorities.
To enable implementation of a combined approach - without having to reimplement the priority evaluation based on resource-priorities - that list could be provided as an imput to a priority-fencing-probe.

gao-yan added a commit to gao-yan/pacemaker that referenced this pull request Apr 1, 2020
…bled

This commit also documents the upcoming new behavior as discussed from:

ClusterLabs#2012

priority-fencing-delay will combine with any static/random delays that
are introduced by pcmk_delay_base/max configured for the corresponding
fencing resources. It's recommended to configure this delay to be at
least twice the maximum pcmk_delay_base/max.
gao-yan added a commit to gao-yan/pacemaker that referenced this pull request Apr 1, 2020
…bled

This commit also documents the upcoming new behavior as discussed from:

ClusterLabs#2012

priority-fencing-delay will combine with any static/random delays that
are introduced by pcmk_delay_base/max configured for the corresponding
fencing resources. This delay should be significantly greater than,
safely twice, the maximum pcmk_delay_base/max. By default, priority
fencing delay is disabled.
gao-yan added a commit to gao-yan/pacemaker that referenced this pull request Apr 1, 2020
…bled

This commit also documents the upcoming new behavior as discussed from:

ClusterLabs#2012

`priority-fencing-delay` will combine with any static/random delays that
are introduced by `pcmk_delay_base/max` configured for the corresponding
fencing resources. This delay should be significantly greater than,
safely twice, the maximum `pcmk_delay_base/max`. By default, priority
fencing delay is disabled.
gao-yan added a commit to gao-yan/pacemaker that referenced this pull request Apr 3, 2020
…bled

This commit also documents the upcoming new behavior as discussed from:

ClusterLabs#2012

Any static/random delays that are introduced by `pcmk_delay_base/max`
configured for the corresponding fencing resources will be added to this
delay. This delay should be significantly greater than, safely twice,
the maximum `pcmk_delay_base/max`. By default, priority fencing delay is
disabled.
gao-yan added a commit to gao-yan/pacemaker that referenced this pull request Apr 20, 2020
…bled

This commit also documents the upcoming new behavior as discussed from:

ClusterLabs#2012

Any static/random delays that are introduced by `pcmk_delay_base/max`
configured for the corresponding fencing resources will be added to this
delay. This delay should be significantly greater than, safely twice,
the maximum `pcmk_delay_base/max`. By default, priority fencing delay is
disabled.
gao-yan added a commit to gao-yan/pacemaker that referenced this pull request Apr 23, 2020
…bled

This commit also documents the upcoming new behavior as discussed from:

ClusterLabs#2012

Any static/random delays that are introduced by `pcmk_delay_base/max`
configured for the corresponding fencing resources will be added to this
delay. This delay should be significantly greater than, safely twice,
the maximum `pcmk_delay_base/max`. By default, priority fencing delay is
disabled.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants