Describe the bug
funcx-endpoint list determines whether an endpoint is live by checking for the PID recorded in the daemon.pid file. On multi-login-node systems this check fails because the process might be running on a different login node. As a result, funcx-endpoint list erroneously reports the endpoint as disconnected. When the user then tries to start the endpoint, the CLI wipes the current daemon.pid file in an attempt to clean up and starts a new endpoint with the same endpoint_id, leaving the endpoint in a broken state.
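To illustrate why a PID lookup cannot see across login nodes, here is a minimal sketch of a typical local liveness check. It is not the funcx-endpoint implementation; the function name pid_is_alive_locally is hypothetical, and the sketch assumes a POSIX system.

```python
import os


def pid_is_alive_locally(pid: int) -> bool:
    """Return True if a process with this PID exists on the current host.

    Hypothetical helper for illustration only. The lookup goes through the
    local kernel's process table, so a process started on another login
    node is invisible to it even when the PID file sits on a shared
    filesystem.
    """
    try:
        os.kill(pid, 0)  # signal 0 only performs error checking; nothing is delivered
    except ProcessLookupError:
        return False  # no process with this PID on this host
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True
```

Run on loginnode02 against a PID that was written on loginnode01, a check like this returns False (or matches an unrelated local process that happens to have the same PID), which is consistent with the disconnected status described above.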
To Reproduce
Steps to reproduce the behavior, e.g.:
Install funcx-endpoint==0.3.2 with Python 3.7/3.8 on cluster
Connect to loginnode01, one of several login nodes
Run funcx-endpoint configure test; funcx-endpoint start test
Connect to loginnode02
Run funcx-endpoint list; this will show test as disconnected
Run funcx-endpoint start test.
Expected behavior
funcx-endpoint list should not show an endpoint that is connected on another login node as disconnected. funcx-endpoint start should not wipe daemon.pid and start a duplicate endpoint with the same endpoint id.
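One possible way to get the expected behavior is to record the hostname next to the PID and to refuse to judge liveness from a different host. The following is a sketch under stated assumptions, not the project's fix: the JSON record format and the names write_pid_record and endpoint_status are hypothetical, and the issue only states that daemon.pid holds a PID.

```python
import json
import os
import socket
from pathlib import Path


def write_pid_record(path: Path) -> None:
    """Store the PID together with the host it runs on (hypothetical format)."""
    record = {"pid": os.getpid(), "hostname": socket.gethostname()}
    path.write_text(json.dumps(record))


def endpoint_status(path: Path) -> str:
    """Classify the endpoint without guessing about processes on other hosts."""
    if not path.exists():
        return "stopped"
    record = json.loads(path.read_text())
    if record["hostname"] != socket.gethostname():
        # The PID cannot be checked from here, so report that instead of
        # claiming the endpoint is disconnected or deleting the record.
        return f"unknown (started on {record['hostname']})"
    try:
        os.kill(record["pid"], 0)  # signal 0: existence check only (POSIX)
    except ProcessLookupError:
        return "stopped"
    except PermissionError:
        pass  # the process exists but belongs to another user
    return "running"
```

With a record like this, a start command could decline to clean up the PID file whenever the status is unknown, which would avoid launching a second endpoint with the same endpoint id.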
Distributed Environment
Running on a multi-login-node system