Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix deadlock in standalone mcrouter on invalid config #455

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

mszabo-wikia
Copy link
Contributor

@mszabo-wikia mszabo-wikia commented Dec 10, 2024

After 6c2142a, standalone mcrouter now deadlocks if the router configuration was incorrect. Spotted on #449, where test_unknown_named_handles started failing due to the mcrouter process being tested never exiting.

GDB indicates a deadlock between three threads:

(gdb) info threads
  Id   Target Id                                          Frame
* 1    Thread 0x7f7b4cce1e00 (LWP 211878) "mcrouter"      syscall () at ../sysdeps/unix/sysv/linux/x86_64/syscall.S:38
  2    Thread 0x7f7b43fff6c0 (LWP 211882) "mcr-cpuaux-0"  syscall () at ../sysdeps/unix/sysv/linux/x86_64/syscall.S:38
  3    Thread 0x7f7b49e0a6c0 (LWP 211879) "IOThreadPool0" syscall () at ../sysdeps/unix/sysv/linux/x86_64/syscall.S:38
  • Thread 1, the main thread, has triggered exit() and is waiting on the auxiliary thread pool to be destroyed.
  • Thread 2, an auxiliary thread pool thread, is in the process of destroying the CarbonRouterInstance due to
    6c2142a and is blocked on destroying the virtual EVBs by child proxies. These however are ultimately sourced from the IO thread pool which is also used by AsyncMcServer.
  • Thread 3, an IO thread pool thread, is blocked by AsyncMcServer which is waiting to be started by later initialization code that never runs due to the config error, preventing the IO thread pool itself from being destroyed.

Fix it by initializing the AsyncMcServer only after the router has been initialized.

@facebook-github-bot
Copy link
Contributor

@disylh has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@mszabo-wikia mszabo-wikia force-pushed the fix-oss-config-deadlock branch from 45b81a3 to 2858489 Compare December 11, 2024 11:00
After 6c2142a, standalone mcrouter now
deadlocks if router the configuration was incorrect. Spotted on
facebook#449, where
`test_unknown_named_handles` started failing due to the mcrouter process
being tested never existing.

GDB
[indicates](https://gist.github.com/mszabo-wikia/47916c5655deffdb95332e972a52caf8)
a deadlock between three threads:
```
(gdb) info threads
  Id   Target Id                                          Frame
* 1    Thread 0x7f7b4cce1e00 (LWP 211878) "mcrouter"      syscall () at ../sysdeps/unix/sysv/linux/x86_64/syscall.S:38
  2    Thread 0x7f7b43fff6c0 (LWP 211882) "mcr-cpuaux-0"  syscall () at ../sysdeps/unix/sysv/linux/x86_64/syscall.S:38
  3    Thread 0x7f7b49e0a6c0 (LWP 211879) "IOThreadPool0" syscall () at ../sysdeps/unix/sysv/linux/x86_64/syscall.S:38
```

Thread 1, the main thread, has triggered exit() and is waiting on the
auxiliary thread pool to be destroyed.
Thread 2, an auxiliary thread pool thread, is in the process of
destroying the CarbonRouterInstance due to
6c2142a and is blocked on destroying
the virtual EVBs by child proxies. These however are ultimately sourced
from the IO thread pool which is also used by AsyncMcServer.
Thread 3, an IO thread pool thread, is blocked by AsyncMcServer which is
waiting to be started by later initialization code that never runs due
to the config error, preventing the IO thread pool itself from being
destroyed.

Fix it by initializing the AsyncMcServer only after the router has been
initialized.
@mszabo-wikia mszabo-wikia force-pushed the fix-oss-config-deadlock branch from 2858489 to 080d8be Compare December 11, 2024 11:53
@mszabo-wikia
Copy link
Contributor Author

Formatting should hopefully be fixed in the latest commit (using https://github.com/facebook/folly/blob/abf77ba9340d7fb225c0a0fe6b355ca2a5652231/folly/.clang-format as a baseline)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants