Triggerer's async thread was blocked #830

Open

WytzeBruinsma opened this issue Feb 23, 2024 · 5 comments
Labels
kind/bug kind - things not working properly

Comments

@WytzeBruinsma

Checks

Chart Version

8.8.0

Kubernetes Version

Client Version: version.Info{Major:"1", Minor:"27", GitVersion:"v1.27.4", GitCommit:"fa3d7990104d7c1f16943a67f11b154b71f6a132", GitTreeState:"clean", BuildDate:"2023-07-19T12:20:54Z", GoVersion:"go1.20.6", Compiler:"gc", Platform:"darwin/arm64"}
Kustomize Version: v5.0.1
Server Version: version.Info{Major:"1", Minor:"27", GitVersion:"v1.27.7", GitCommit:"55a7e688f9220adca1c99b7903953911dd38b771", GitTreeState:"clean", BuildDate:"2023-11-03T12:18:23Z", GoVersion:"go1.20.10", Compiler:"gc", Platform:"linux/amd64"}

Helm Version

version.BuildInfo{Version:"v3.12.3", GitCommit:"3a31588ad33fe3b89af5a2a54ee1d25bfe6eaa5e", GitTreeState:"clean", GoVersion:"go1.20.7"}

Description

The Airflow triggerer pod is raising errors and slowing down Airflow processes. The error is: Triggerer's async thread was blocked for 0.23 seconds, likely by a badly-written trigger. Set PYTHONASYNCIODEBUG=1 to get more information on overrunning coroutines. I tried resolving this by increasing the resources, but even after removing all the limits and giving it 10 GB of RAM and plenty of CPU headroom it still raises this error. I also checked the response times of the Postgres database and couldn't find anything that could slow down the async process and cause this error. Please let me know what other steps I can take to resolve this error.
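For reference, the message comes from asyncio's debug facilities. A minimal standalone sketch (not Airflow code, only the standard library) of what the warning means: with debug mode enabled, which is what PYTHONASYNCIODEBUG=1 turns on, the event loop reports any step that blocks it for longer than loop.slow_callback_duration:

```python
# Minimal sketch (not Airflow code): asyncio debug mode flags event-loop
# steps that block longer than loop.slow_callback_duration.
import asyncio
import time


async def badly_written():
    time.sleep(0.3)  # blocks the whole event loop for 0.3 s


async def well_behaved():
    await asyncio.sleep(0.3)  # yields control back to the loop while waiting


async def main():
    loop = asyncio.get_running_loop()
    loop.set_debug(True)               # same effect as PYTHONASYNCIODEBUG=1
    loop.slow_callback_duration = 0.2  # report anything blocking > 0.2 s
    # badly_written gets flagged ("Executing <Task ...> took 0.3xx seconds");
    # well_behaved does not.
    await asyncio.gather(badly_written(), well_behaved())


asyncio.run(main())
```

Setting PYTHONASYNCIODEBUG=1 on the triggerer pod (through whatever extra-environment-variables mechanism your Helm values expose, if any) should make Airflow's log name the offending coroutine.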

Relevant Logs

2024-02-23 01:48:09.268	
[2024-02-23T00:48:09.267+0000] {triggerer_job_runner.py:573} INFO - Triggerer's async thread was blocked for 0.23 seconds, likely by a badly-written trigger. Set PYTHONASYNCIODEBUG=1 to get more information on overrunning coroutines.
2024-02-23 09:00:44.327	
[2024-02-23T08:00:44.325+0000] {triggerer_job_runner.py:573} INFO - Triggerer's async thread was blocked for 0.38 seconds, likely by a badly-written trigger. Set PYTHONASYNCIODEBUG=1 to get more information on overrunning coroutines.

Custom Helm Values

No response

WytzeBruinsma added the kind/bug label on Feb 23, 2024
@justplanenutz

We are seeing this as well, although the time values are a bit longer. Is there a tolerance variable we can set to make the async process a bit less time-sensitive?

@thesuperzapper
Member

@justplanenutz @WytzeBruinsma you should check the code of the trigger you are using; the message is probably correct that something is wrong with it. Did you write it yourself, or are you using one from the official providers?

Also, please note that this is an INFO-level log, so it's probably cosmetic.
Are you seeing any issues related to it?

For your reference, here is the code (in airflow itself) that detects this condition and writes the log:

https://github.com/apache/airflow/blob/2.9.2/airflow/jobs/triggerer_job_runner.py#L557-L582
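For anyone auditing their trigger code: the usual culprit is a blocking call (an HTTP request, a database query, time.sleep) inside the trigger's async run() method. Here is a minimal sketch of a polling trigger that keeps the event loop free; check_condition() and the module path in serialize() are hypothetical placeholders for your own logic:

```python
# Sketch of a well-behaved trigger; check_condition() stands in for whatever
# (possibly blocking) check your trigger performs.
import asyncio
from typing import Any, AsyncIterator, Dict, Tuple

from airflow.triggers.base import BaseTrigger, TriggerEvent


def check_condition(job_id: str) -> bool:
    """Placeholder for a blocking check (HTTP call, DB query, etc.)."""
    raise NotImplementedError


class ExamplePollingTrigger(BaseTrigger):
    def __init__(self, job_id: str, poll_interval: float = 30.0):
        super().__init__()
        self.job_id = job_id
        self.poll_interval = poll_interval

    def serialize(self) -> Tuple[str, Dict[str, Any]]:
        # The classpath must point at wherever this class actually lives.
        return (
            "my_package.triggers.ExamplePollingTrigger",
            {"job_id": self.job_id, "poll_interval": self.poll_interval},
        )

    async def run(self) -> AsyncIterator[TriggerEvent]:
        while True:
            # Blocking work must not run directly on the triggerer's event
            # loop; push it to a worker thread so other triggers keep running.
            done = await asyncio.get_running_loop().run_in_executor(
                None, check_condition, self.job_id
            )
            if done:
                yield TriggerEvent({"job_id": self.job_id, "status": "done"})
                return
            # Non-blocking wait between polls.
            await asyncio.sleep(self.poll_interval)
```

On Python 3.9+, asyncio.to_thread() is an alternative to run_in_executor() for moving the blocking call off the event loop.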

@justplanenutz

@thesuperzapper The trigger has been running fine in the past, so we're confident the code is sound. We normally set our logs at INFO as well; filtering them is not the issue.
We are currently looking for any deltas in the code base that may have aggravated an edge condition.

@justplanenutz

justplanenutz commented Jun 11, 2024

Our triggerer process is running in Kubernetes and we have collected metrics for CPU and memory usage. We noticed a significant increase in CPU and memory consumption just before the problems started. When we restart the pod, it's all good... so maybe a resource leak of some kind?
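One way to narrow that down beyond pod-level metrics is to watch per-process usage from inside the pod; a minimal sketch, assuming psutil is importable there (it is a regular Airflow dependency as far as I know):

```python
# Sketch: watch per-process memory/CPU of the triggerer from inside the pod.
import time

import psutil


def triggerer_procs():
    # Any process whose command line mentions "triggerer".
    return [
        p for p in psutil.process_iter(["pid", "cmdline"])
        if p.info["cmdline"] and "triggerer" in " ".join(p.info["cmdline"])
    ]


while True:
    for p in triggerer_procs():
        try:
            rss_mib = p.memory_info().rss / (1024 * 1024)
            # cpu_percent() measures since the previous call, so the first
            # sample reads 0.0.
            print(f"pid={p.pid} rss={rss_mib:.1f} MiB cpu={p.cpu_percent():.1f}%")
        except psutil.NoSuchProcess:
            pass  # process exited between listing and sampling
    time.sleep(60)
```

If the RSS of the triggerer process itself climbs steadily while trigger volume stays flat, that points at a leak worth reporting upstream.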

@thesuperzapper
Member

@justplanenutz In any case, it's very unlikely to be related to this chart.

You should probably raise an issue upstream if you figure out what was causing it; feel free to link it here if you do.
