Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Odd solution to workers not wanting to stop with SIGTERM #150

Open
ghost opened this issue Jul 20, 2018 · 9 comments
Open

Odd solution to workers not wanting to stop with SIGTERM #150

ghost opened this issue Jul 20, 2018 · 9 comments

Comments

@ghost
Copy link

ghost commented Jul 20, 2018

I've been a longtime user of gearman-manager, thank you for your work on this!

Background: Just updated a worker server from centos 6.5 to 7.5. I noticed ^C (killing the manager while running in the foreground, and even background use) would cause the manager to hang until 60 seconds, when it will just forcefully kill the workers. Normally the workers will shutdown gracefully in a few seconds.

I was having a difficult time finding a solution, however, finally I just randomly added a line of code in the start_lib_worker method in GearmanPeclManager.php, and now GM will shutdown nicely.

        while(!$this->stop_work){

            if(@$thisWorker->work() ||
               $thisWorker->returnCode() == GEARMAN_IO_WAIT ||
               $thisWorker->returnCode() == GEARMAN_NO_JOBS) {

                if ($thisWorker->returnCode() == GEARMAN_SUCCESS) continue;

                if (!@$thisWorker->wait()){
                    if ($thisWorker->returnCode() == GEARMAN_NO_ACTIVE_FDS){
                        sleep(5);
                    }
                }

            }

            /**
             * Check the running time of the current child. If it has
             * been too long, stop working.
             */
            if($this->max_run_time > 0 && time() - $start > $this->max_run_time) {
                $this->log("Been running too long, exiting", GearmanManager::LOG_LEVEL_WORKER_INFO);
                $this->stop_work = true;
            }

            if(!empty($this->config["max_runs_per_worker"]) && $this->job_execution_count >= $this->config["max_runs_per_worker"]) {
                $this->log("Ran $this->job_execution_count jobs which is over the maximum({$this->config['max_runs_per_worker']}), exiting", GearmanManager::LOG_LEVEL_WORKER_INFO);
                $this->stop_work = true;
            }

        }

Adding any sort of line to evaluate does the trick:

        while(!$this->stop_work){

            if(@$thisWorker->work() ||
               $thisWorker->returnCode() == GEARMAN_IO_WAIT ||
               $thisWorker->returnCode() == GEARMAN_NO_JOBS) {

                if ($thisWorker->returnCode() == GEARMAN_SUCCESS) continue;

                if (!@$thisWorker->wait()){
                    if ($thisWorker->returnCode() == GEARMAN_NO_ACTIVE_FDS){
                        sleep(5);
                    }
                }

            }

            /**
             * Check the running time of the current child. If it has
             * been too long, stop working.
             */
            if($this->max_run_time > 0 && time() - $start > $this->max_run_time) {
                $this->log("Been running too long, exiting", GearmanManager::LOG_LEVEL_WORKER_INFO);
                $this->stop_work = true;
            }

            if(!empty($this->config["max_runs_per_worker"]) && $this->job_execution_count >= $this->config["max_runs_per_worker"]) {
                $this->log("Ran $this->job_execution_count jobs which is over the maximum({$this->config['max_runs_per_worker']}), exiting", GearmanManager::LOG_LEVEL_WORKER_INFO);
                $this->stop_work = true;
            }

        $true = false; # some extra code
        }

This is the most bizarre behavior I've witnessed, particularly since it doesn't output/echo any new characters or do anything at the IO level.

Do you have any possible explanation as to why doing this allows gearman manager to exit gracefully rather than having to SIGKILL the entire process? I tried running this on the v2 tag and v1 tag and the behavior is the same. I'm also running PHP 7.

Thanks!

@brianlmoon
Copy link
Owner

That is odd. I am running PHP 7 on Ununtu and have seen this issue. What settings do you have in your ini file?

@brianlmoon
Copy link
Owner

Also, what specific version of PHP 7?

@ghost
Copy link
Author

ghost commented Jul 21, 2018

That's comforting to know you've seen it before. Did you know that adding a line in the worker script "fixed" it? I'd give anything to know what was going on there behind the curtains.

Config:

[GearmanManager]
worker_dir=/var/www/vhosts/app.itemlogic.com/app/gearman
auto_update=1
include=*
count=2
max_worker_lifetime=3600
host=app-jobmaster
-vvvv -c /etc/gearman-manager/config.ini -l /var/log/gearman-manager.log

PHP:

[centos@stg-app-jobslave ~]$ php -v
PHP 7.0.30 (cli) (built: Apr 28 2018 08:14:08) ( NTS )
Copyright (c) 1997-2017 The PHP Group
Zend Engine v3.0.0, Copyright (c) 1998-2017 Zend Technologies

@brianlmoon
Copy link
Owner

Sorry, I meant I have not seen it. Sorry for the typo.

@kevcam4891
Copy link

@brianlmoon Do you use virtualbox? I can create an image and let you have it in case you want to have the specific environment.

@brianlmoon
Copy link
Owner

No. But I do use Docker.

@frasche
Copy link

frasche commented Aug 1, 2018

I noticed the same behavior in my VM Ubuntu 18.04 and PHP 7.2.7, pecl version only:
It calls stop_children() with SIGTERM, but this does not kill the process. Finally after 60 secs it calls stop_children with SIGKILL again and the process get killed.

After some research I replaced the

declare(ticks = 1);

in the manager script and in GearmanManager.php with

pcntl_async_signals(true);

http://php.net/manual/en/function.pcntl-async-signals.php

This of course only works with >= PHP 7.1 but it did the trick for me. All workers are shutting down now nicely within a few seconds.

Edit
Thinking a bit of this, I got to the following conclusion
Based on this bug report: https://bugs.php.net/bug.php?id=71448 and the comment of Nikita Popov:

Due to an implementation bug, the declare(ticks=1) directive leaked into different compilation units prior to PHP 7.0. This is not how declare() directives, which are per-file or per-scope, are supposed to work.

Using a single declare(ticks=1) in some class inside a multi-class autoloaded project doesn't make a lot of sense. Even on PHP 5.x declare(ticks=1) would only affect code that is compiled starting from that declaration, which means that depending on the loading order of classes your whole codebase might end up using ticks or maybe only a few classes that were loaded late. (Not to mention that using ticks will make your code many times slower, but that's a different issue.)

For PHP 7.1 we may make automatic pcntl signal handling work without ticks, as we now have a new internal mechanism for handling ordinary timeouts which we can reuse.

Due to this statement it seems to me, declare(ticks = 1) should not be used anymore for this. At least in a case where you use autoloading your classes in a larger project, it is almost unpredictible.

There are two possibilities:
For PHP >= 7.1: use of pcntl_async_signals(true); is the way to go replacement for declare(ticks) in posix/pcntl functions. Just replace the declare statements with pcntl_async_signals(true);

For PHP > 5.3 < 7.1: we may use pcntl_signal_dispatch(), but this needs to be called after every posix_kill() and after some other pcntl functions. After playing around a while with this, I had a smooth running version without ticks and pcntl_async_signals(true) and use only pcntl_signal_dispatch().

@brianlmoon
Copy link
Owner

Anyone looked into this further? I have declare ticks in my bin script here and apparently always have. I always understood it that you had to declare that in the initial script, at the very top. But if there is a better way going forward, I am all for it.

@ReduktorSpalin
Copy link
Contributor

ReduktorSpalin commented Feb 1, 2019

Unfortunately declaring ticks in class-scope might have unexpected outcome. I didn't see it coming as well, so please check two interesting cases. Those just blown my mind: https://github.com/ReduktorSpalin/php_ticks_demo .

So it figures placing declare in different parts might have a different outcome. I hope it will be helpful finding a solution to this problem. I am not sure what is right in context of this project and Gearman's case. Maybe current scenario is correct in general, but some part should be ported to other class.

I inherited an another class, but feel free to increase the complexity of references to other classes in order to get even better & bigger image of solutions that might take advantage of this change (e.g. until now I found it difficult to autoload different class into worker).

Tested on:

  • OS: Linux Mint 19.1
  • interpreter: PHP 7.3.1-1+ubuntu18.04.1

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants