Feature: service: add pre-start configuration validator #99
1) Currently the only hard errors are:
- sbd binary does not exist or is not executable.
- 'TimeoutStartSec' of service is shorter than 'msgwait' if 'SBD_DELAY_START' is enabled.
- 'TimeoutStartSec' of service is shorter than the delay value explicitly configured with 'SBD_DELAY_START'.
Hard errors prevent service from starting.

2) Warnings are given if:
- sbd devices do not exist.
- meta-data cannot be read from sbd devices.
On start-up, since sbd daemon waits for sbd devices to appear and get ready within the start-up timeout, such situations shouldn't prevent service from starting. Besides, the latter situation doesn't necessarily mean sbd devices are not correctly initialized. But warnings are given anyway.

3) Notices are given if:
- '/etc/sysconfig/sbd' does not exist.
- 'SBD_DEVICE' is not configured, in case it's unintentional.
- 'TimeoutStartUSec'/'TimeoutStartSec' setting of service somehow cannot be retrieved or recognized.
- 'TimeoutStartSec' of service is shorter than sbd start-up timeout.
- 'SBD_DELAY_START' is enabled but without an explicit delay value (it's recommended to set it to be longer than 'corosync token timeout + consensus timeout + pcmk_delay_max/base + msgwait').
- 'SBD_DELAY_START' is configured with an explicit delay value that is shorter than the configured 'msgwait'.
These are not strictly or necessarily mis-configuration. They shouldn't prevent service from starting either.
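The timeout-related rules above can be sketched roughly as follows. This is only an illustration, not the script from this patch: the function names, the `(severity, message)` tuples, and the simplified handling of 'SBD_DELAY_START' (only "yes" or a plain integer count as enabled) are assumptions, and "shorter than" is read as strictly less.

```python
from typing import List, Tuple

Finding = Tuple[str, str]  # (severity, message); hypothetical representation


def check_timeouts(timeout_start: int, msgwait: int,
                   delay_start: str, startup_timeout: int) -> List[Finding]:
    """Classify timeout findings; all values are in seconds.

    Assumption: delay_start is "yes", a plain integer string, or anything
    else (treated as disabled).
    """
    findings: List[Finding] = []
    value = delay_start.strip().lower()

    if value == "yes":
        # Enabled without an explicit value: compare against msgwait.
        if timeout_start < msgwait:
            findings.append(("error", "TimeoutStartSec is shorter than msgwait"))
        findings.append(("notice", "SBD_DELAY_START is enabled without an explicit delay"))
    elif value.isdigit():
        # Explicit delay value in seconds.
        delay = int(value)
        if timeout_start < delay:
            findings.append(("error", "TimeoutStartSec is shorter than SBD_DELAY_START"))
        if delay < msgwait:
            findings.append(("notice", "SBD_DELAY_START is shorter than msgwait"))

    if timeout_start < startup_timeout:
        findings.append(("notice", "TimeoutStartSec is shorter than sbd start-up timeout"))
    return findings


def may_start(findings: List[Finding]) -> bool:
    """Only hard errors block the service from starting."""
    return all(severity != "error" for severity, _ in findings)
```

For example, `check_timeouts(90, 120, "yes", 120)` yields one error and two notices, so `may_start` returns `False`, while warnings and notices alone never block the start.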
ppc64le on travis sucks recently - let's give it another try ... |
Ooh that grew into a complex thing ... I know I was the one to come up with the pre-exec-script ;-) Klaus |
I think so too. It's not for sbd to understand what mechanism starts its daemon.
But I don't think we are unnecessarily duplicating code here. The main purpose is to check TimeoutStartSec, which is beyond sbd's knowledge/tasks, but to achieve that it's probably unavoidable to parse and check the prerequisite configuration information here. Indeed it should align with the logic in the C code; for example, it should probably parse more relevant options like "-d" from SBD_OPTS.
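As a rough idea of what picking up "-d" from SBD_OPTS could look like, here is a minimal sketch. The function name and the device paths are made up for illustration, and the scan is deliberately naive: it does not know which other options take arguments, so it is not a faithful copy of the option parsing in the C code.

```python
import shlex
from typing import List


def devices_from_opts(sbd_opts: str) -> List[str]:
    """Collect arguments of "-d" from an SBD_OPTS-style string.

    Handles both "-d /dev/x" and "-d/dev/x". Simplification: every token
    that starts with "-d" is treated as the device option.
    """
    tokens = shlex.split(sbd_opts)  # honors shell-style quoting
    devices: List[str] = []
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        if tok == "-d" and i + 1 < len(tokens):
            devices.append(tokens[i + 1])  # separate argument
            i += 2
        elif tok.startswith("-d") and len(tok) > 2:
            devices.append(tok[2:])  # attached argument
            i += 1
        else:
            i += 1
    return devices


# Hypothetical device paths, for illustration only.
print(devices_from_opts("-v -d /dev/sdb1 -d/dev/sdc1"))
```

A real validator would need to mirror the option table of the binary so that arguments of other options are not misread.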
It does, but it's not harmful to get them again. I've actually been thinking the script could be more than just a pre-start checker called by systemd. It could be run directly by users/bootstraps as a configuration validator doing a sanity check during setup/bootstrapping, without or before starting the cluster. That's the reason why I added a "validate-all" action for it, which for now apparently does exactly the same as "pre-start" though :-)
I haven't figured out a way either. OTOH, I don't think it's for the binary to care about systemd settings as mentioned :-) |
I would have called it something like SBD_MAX_STARTUP_DELAY coming from /etc/sysconfig/sbd or wherever possibly overwritten by the pre-exec (if that works) without sbd having to explicitly know where it is coming from just to be fed into a consistency check. |
Not another option from sbd sysconfig please ;-) The existing options are already tough enough for users to understand and make right.
No matter where the checking is done, unless the checker explicitly says that a misconfiguration is caused by "TimeoutStartSec" from sbd.service, the error message won't be helpful and will only introduce confusion. |
Would've mentioned it in sbd sysconfig but usually one wouldn't have to touch it because the startup is setting it. And I guess a sentence in the remarks of that file could state that in a comprehensive way.
There you've really got a point ... |
Not saying this idea is ready for implementation but maybe thinking along these lines might lead to a general improvement: Was thinking something like ... Implementation wise that approach would just require some string-filtering done on all environment-variables. Everything else can be done over time. |
Have you considered putting the shared bits in a private C library, and doing the pre-start script in C? |
Hmm ... interesting idea. We could probably do a split that keeps the systemd-library-dependencies out of the basic sbd code base. |
Indeed putting the checking parts into a library, and systemd-specific things even into a separate library, would be beneficial for implementing a pre-start in C. I might be misunderstanding something here, but if a pre-start is not the way to go, I don't see any way of preventing the sbd binary from depending on the systemd library, right? And I'm not sure if it'd be worth it to make it depend on systemd just because of this, although systemd is commonly used and the dependency on systemd could be determined at compile time... OTOH generally, a daemon may have its systemd service file, but how could it be sure that it was started by systemd and is being limited by systemd timeouts? It could be started by an init-script or purely manually even if there's systemd.
Technically I think other projects face a similar issue. For example for pacemaker, 'shutdown-escalation' is not recommended to be configured, but it's still configurable. pacemaker-controld knows the value of 'shutdown-escalation', but nothing about TimeoutStopSec of pacemaker.service. What pacemaker does for now is set TimeoutStopSec=30min in the default pacemaker.service file, which is longer than the default 'shutdown-escalation' (20min).
So similarly, what we could easily do first is probably give the default sbd.service file a generally long enough TimeoutStartSec value, for example 10min, as suggested before? That'd likely resolve 99% of "mis-configuration"/"forgotten-configuration". I don't see much drawback in that. And it wouldn't conflict/overlap with any further improvements for configuration validation. |
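Any such comparison between a unit timeout and a daemon-side timeout first needs the value systemd reports converted to seconds. A minimal sketch, assuming the value arrives as a `TimeoutStartUSec=...` line such as `systemctl show -p TimeoutStartUSec sbd.service` prints, and handling only a subset of the time units systemd can emit; the function names are made up for illustration.

```python
import re
from typing import Optional

# Subset of systemd time-span units; other units are treated as unrecognized.
_UNIT_SECONDS = {"h": 3600.0, "min": 60.0, "s": 1.0, "ms": 0.001, "us": 0.000001}
_PART = re.compile(r"(\d+(?:\.\d+)?)(h|min|s|ms|us)")


def parse_timespan(text: str) -> Optional[float]:
    """Return seconds, float('inf') for 'infinity', or None if unrecognized."""
    text = text.strip()
    if text == "infinity":
        return float("inf")
    parts = text.split()
    if not parts:
        return None
    total = 0.0
    for part in parts:
        match = _PART.fullmatch(part)
        if match is None:
            return None  # unknown unit or malformed number
        total += float(match.group(1)) * _UNIT_SECONDS[match.group(2)]
    return total


def parse_show_line(line: str) -> Optional[float]:
    """Parse a 'TimeoutStartUSec=<span>' line; None if it is something else."""
    key, sep, value = line.strip().partition("=")
    if sep != "=" or key != "TimeoutStartUSec":
        return None
    return parse_timespan(value)
```

A `None` result corresponds to the "cannot be retrieved or recognized" case, which would only deserve a notice. With values in seconds, the pacemaker example reduces to checking that `parse_timespan("30min")` is greater than `parse_timespan("20min")`.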
Hrm, this could be interesting: |
It doesn't seem like it can be done per unit/service ... |