You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
God resurrects a process after god stop <task name> is run if the god config for that process uses keepalive, a custom stop command using -QUIT, and does not terminate within the stop_timeout period.
Reproduction
Setup
Create a simple script, fodder.rb, that just logs messages:
require'logger'@logger=Logger.new"/home/pair/fodder/output.log"@logger.info"Start feeding!"@feeding_time=truedefquit_feeding@feeding_time=falseendSignal.trap("QUIT"){quit_feeding}while@feeding_timedo@logger.info"More fodder!"sleep1end@logger.info"Starting graceful quit..."sleep_time=12@logger.info"Sleeping for #{sleep_time}s..."sleepsleep_time@logger.info"We're finally quitting, BYE!"
Create the following simple god config, god-fodder.rb, which uses keepalive, a custom stop command, and an interval:
Start god with the god-fodder config, then after a few seconds execute god's stop command:
$ god -c god-fodder.rb
$ god stop fodder
Output
The output from the fodder script:
I, [2017-10-24T15:10:29.797266 #30297] INFO -- : Start feeding!
I, [2017-10-24T15:10:29.797423 #30297] INFO -- : More fodder!
I, [2017-10-24T15:10:30.797660 #30297] INFO -- : More fodder!
I, [2017-10-24T15:10:31.798036 #30297] INFO -- : More fodder!
I, [2017-10-24T15:10:32.798408 #30297] INFO -- : More fodder!
I, [2017-10-24T15:10:33.798786 #30297] INFO -- : More fodder!
I, [2017-10-24T15:10:34.799169 #30297] INFO -- : More fodder!
I, [2017-10-24T15:10:35.799517 #30297] INFO -- : More fodder!
I, [2017-10-24T15:10:36.799891 #30297] INFO -- : More fodder!
I, [2017-10-24T15:10:37.800278 #30297] INFO -- : More fodder!
I, [2017-10-24T15:10:38.800637 #30297] INFO -- : More fodder!
I, [2017-10-24T15:10:39.801020 #30297] INFO -- : More fodder!
I, [2017-10-24T15:10:40.801379 #30297] INFO -- : Starting graceful quit...
I, [2017-10-24T15:10:40.801521 #30297] INFO -- : Sleeping for 12s...
I, [2017-10-24T15:10:50.689377 #30352] INFO -- : Start feeding!
I, [2017-10-24T15:10:50.689526 #30352] INFO -- : More fodder!
I, [2017-10-24T15:10:51.689765 #30352] INFO -- : More fodder!
I, [2017-10-24T15:10:52.690183 #30352] INFO -- : More fodder!
I, [2017-10-24T15:10:53.690517 #30352] INFO -- : More fodder!
I, [2017-10-24T15:10:54.690874 #30352] INFO -- : More fodder!
I, [2017-10-24T15:10:55.691255 #30352] INFO -- : More fodder!
I, [2017-10-24T15:10:56.691608 #30352] INFO -- : More fodder!
Output from god's log:
Oct 24 15:10:29 god[30255]: Syslog enabled.
Oct 24 15:10:29 god[30257]: Loading /etc/god/god.conf.rb
Oct 24 15:10:29 god[30257]: Using pid file directory: /var/run/god
Oct 24 15:10:29 god[30257]: Started on drbunix:///tmp/god.17165.sock
Oct 24 15:10:29 god[30257]: fodder move 'unmonitored' to 'init'
Oct 24 15:10:29 god[30257]: fodder moved 'unmonitored' to 'init'
Oct 24 15:10:29 god[30257]: fodder [trigger] process is not running (ProcessRunning)
Oct 24 15:10:29 god[30257]: fodder move 'init' to 'start'
Oct 24 15:10:29 god[30257]: fodder start: ruby /home/pair/fodder/fodder.rb
Oct 24 15:10:29 god[30257]: fodder moved 'init' to 'start'
Oct 24 15:10:29 god[30257]: fodder [trigger] process is running (ProcessRunning)
Oct 24 15:10:29 god[30257]: fodder move 'start' to 'up'
Oct 24 15:10:29 god[30257]: fodder registered 'proc_exit' event for pid 30297
Oct 24 15:10:29 god[30257]: fodder moved 'start' to 'up'
Oct 24 15:10:40 god[30257]: fodder stop: kill -QUIT `cat /var/run/god/fodder.pid`
Oct 24 15:10:40 god[30257]: fodder ensuring stop...
Oct 24 15:10:50 god[30257]: fodder still alive after 10s; sent SIGKILL
Oct 24 15:10:50 god[30257]: fodder move 'up' to 'unmonitored'
Oct 24 15:10:50 god[30257]: fodder deregistered 'proc_exit' event for pid 30322
Oct 24 15:10:50 god[30257]: fodder moved 'up' to 'unmonitored'
Oct 24 15:10:50 god[30257]: fodder [trigger] process 30297 exited {:pid=>30297, :exit_code=>9, :exit_signal=>17, :thread_group_id=>30297} (ProcessExits)
Oct 24 15:10:50 god[30257]: fodder move 'unmonitored' to 'start'
Oct 24 15:10:50 god[30257]: fodder start: ruby /home/pair/fodder/fodder.rb
Oct 24 15:10:50 god[30257]: fodder moved 'unmonitored' to 'start'
Oct 24 15:10:50 god[30257]: fodder [trigger] process is running (ProcessRunning)
Oct 24 15:10:50 god[30257]: fodder move 'start' to 'up'
Oct 24 15:10:50 god[30257]: fodder registered 'proc_exit' event for pid 30352
Oct 24 15:10:50 god[30257]: fodder moved 'start' to 'up'
Investigation
During initialization, god creates a new thread for an EventHandler loop and then registers the :process_exits condition used in the keepalive method with the EventHandler.
When a command is sent through the CLI (e.g. god stop fodder), a new thread is created. This new thread pushes a :stop event, followed by an :unmonitored event onto the god event driver queue.
The :stop event will be processed on the original monitoring thread (not the EventHandler loop thread or the CLI command thread), and if the :stop_timeout time period is reached and the original process is still running, a KILL signal will be sent through this original thread. This particular stop logic never tries to detach or stop tracking the PID, instead, god continues to track our PID process.
At this point the KILL signal triggers the :process_exits condition on the EventHandler thread, which then pushes an event responsible for starting the process back up onto the event queue. The original monitoring thread immediately tries to handle that trigger event.
IF a stop_grace is added to the god config, the pause in code execution, via sleep in lib/god/watch.rb after executing the action actually allows one of the thread loops to clean up after itself and clear out our events queue containing the most recently triggered :process_exits event. Once that thread cleans up, there's no action for the original monitoring thread to take since our app is registered as :unmonitored at this point. This particular behavior skirts the problem and allows god to move the process to unmonitored, but it does not fix the underlying issue.
The text was updated successfully, but these errors were encountered:
God resurrects a process after
god stop <task name>
is run if the god config for that process useskeepalive
, a custom stop command using-QUIT
, and does not terminate within thestop_timeout
period.Reproduction
Setup
Create a simple script,
fodder.rb
, that just logs messages:Create the following simple god config,
god-fodder.rb
, which useskeepalive
, a custom stop command, and an interval:Execution
Start god with the
god-fodder
config, then after a few seconds execute god's stop command:Output
The output from the
fodder
script:Output from god's log:
Investigation
During initialization, god creates a new thread for an
EventHandler
loop and then registers the:process_exits
condition used in thekeepalive
method with theEventHandler
.When a command is sent through the CLI (e.g.
god stop fodder
), a new thread is created. This new thread pushes a:stop
event, followed by an:unmonitored
event onto the god event driver queue.The
:stop
event will be processed on the original monitoring thread (not theEventHandler
loop thread or the CLI command thread), and if the:stop_timeout
time period is reached and the original process is still running, a KILL signal will be sent through this original thread. This particular stop logic never tries to detach or stop tracking the PID, instead, god continues to track our PID process.At this point the KILL signal triggers the
:process_exits
condition on theEventHandler
thread, which then pushes an event responsible for starting the process back up onto the event queue. The original monitoring thread immediately tries to handle that trigger event.IF a
stop_grace
is added to the god config, the pause in code execution, viasleep
in lib/god/watch.rb after executing the action actually allows one of the thread loops to clean up after itself and clear out our events queue containing the most recently triggered:process_exits
event. Once that thread cleans up, there's no action for the original monitoring thread to take since our app is registered as:unmonitored
at this point. This particular behavior skirts the problem and allows god to move the process to unmonitored, but it does not fix the underlying issue.The text was updated successfully, but these errors were encountered: