watch_job_rules: true is causing huge memory consumption #16840

Closed
bgruening opened this issue Oct 12, 2023 · 6 comments

@bgruening
Member

Describe the bug

If we run with watch_job_rules: true we see a memory usage of up to 30GB per handler.

More information, including a py-spy trace, is here: https://gist.github.com/bgruening/7be19830e254b4bca99a3ae2e3844041

Disabling the watchdog, as done in usegalaxy-eu/infrastructure-playbook#937, solves the problem for us.

Feel free to close this issue if this seems to be an EU-specific thing. We are also using TPV, so also pinging @nuwang and @cat-bro.

@cat-bro
Contributor

cat-bro commented Oct 12, 2023

We downgraded watchdog from 3.0.0 to 2.2.1 in response to handlers consuming lots of memory.

@bgruening
Member Author

I can confirm:

(venv) galaxy@sn06:~$ pip list | grep watchdo
watchdog 3.0.0

Should we pin watchdog then, until someone can find the problem in 3.0.0?
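For anyone wanting to check a handler's environment programmatically, here is a minimal sketch in Python. It only encodes the two data points reported in this thread (3.0.0 showing high memory use, 2.2.1 working after a downgrade); every other version is treated as unknown. The function names `installed_watchdog_version` and `reported_status` are hypothetical helpers, not part of Galaxy or watchdog.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional

# Only the versions mentioned in this thread; anything else is unknown.
REPORTED = {
    "3.0.0": "reported-high-memory",  # seen with watch_job_rules: true
    "2.2.1": "reported-ok",           # downgrade target that worked
}


def installed_watchdog_version() -> Optional[str]:
    """Return the installed watchdog version, or None if it is not installed."""
    try:
        return version("watchdog")
    except PackageNotFoundError:
        return None


def reported_status(watchdog_version: Optional[str]) -> str:
    """Map a version string to what this thread reported about it."""
    if watchdog_version is None:
        return "not-installed"
    return REPORTED.get(watchdog_version, "unknown")


# Print the status for the current environment.
current = installed_watchdog_version()
print(f"watchdog {current}: {reported_status(current)}")
```

Run inside the Galaxy virtualenv, this gives the same information as the `pip list` check above, plus a hint as to whether the installed version matches one discussed here.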

@nsoranzo
Member

@mvdbeek
Member

mvdbeek commented Oct 13, 2023

We already ran into this during the admin training in Ghent. I thought the fix at the time was to move the job rules to a place where they aren't monitored by other watchers as well, so apologies that this was lost.

@bgruening
Member Author

No need to apologize, really. I'm not even sure why it has just hit us now. So glad I created an issue and you got a fix so fast. Awesome!

@mvdbeek
Member

mvdbeek commented Oct 13, 2023

I am fairly certain e9bc593 fixed this, though I'd appreciate a little test, of course. And thanks @cat-bro for figuring out the version at which this still worked; that made the fix easy.
