Is your feature request related to a problem? Please describe.
This is from a debugging conversation with @ravescovi. When an endpoint fails with a ZMQ error, the endpoint appears to start: a series of log messages announce connection steps, and even funcx-endpoint list indicates that the endpoint started, only for it to fail silently later. The delay before the failure is a problem, and so is the fact that funcx-endpoint list only says disconnected rather than failed.
Describe the solution you'd like
Ideally the endpoint would fail right away; however, this might be difficult since the failure happens in the endpoint interchange, which is a daemonized process. The next best option would be to have funcx-endpoint list be more descriptive about what failed.
Describe alternatives you've considered
This failure message shows up in interchange.stderr and isn't reported at the end of EndpointInterchange.log. Having this error go to EndpointInterchange.log would have been ideal. One option would be to squash EndpointInterchange.log, interchange.stderr and interchange.stdout into a single interchange.log. Having three places to check is pretty bad.
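To make the single-log idea concrete, here is a minimal sketch using only the Python standard library. It is not how the interchange is implemented. `StreamToLogger`, `setup_single_log` and the logger name `interchange` are hypothetical names made up for illustration. It only captures Python-level writes to `sys.stdout` and `sys.stderr`, such as tracebacks. Output written directly to the file descriptors by native code would still need separate handling.

```python
import logging
import sys


class StreamToLogger:
    """File-like object that forwards everything written to it to a logger."""

    def __init__(self, logger, level):
        self.logger = logger
        self.level = level

    def write(self, message):
        # Log each non-empty line separately so tracebacks stay readable.
        for line in message.rstrip().splitlines():
            self.logger.log(self.level, line)
        return len(message)

    def flush(self):
        for handler in self.logger.handlers:
            handler.flush()


def setup_single_log(path):
    """Create a logger that writes to one file (hypothetical helper)."""
    logger = logging.getLogger("interchange")
    logger.setLevel(logging.DEBUG)
    handler = logging.FileHandler(path)
    handler.setFormatter(
        logging.Formatter("%(asctime)s %(name)s %(levelname)s %(message)s")
    )
    logger.addHandler(handler)
    logger.propagate = False
    return logger


if __name__ == "__main__":
    log = setup_single_log("interchange.log")
    # After this point, print() output and uncaught tracebacks land in the same file.
    sys.stdout = StreamToLogger(log, logging.INFO)
    sys.stderr = StreamToLogger(log, logging.ERROR)
    print("interchange starting")
```

With something like this in the daemonized process, an uncaught ZMQ exception would appear at the end of the same file as the regular log messages, tagged as ERROR, instead of in a separate stderr file.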
Additional context
Following the instructions in #393 fixed the ZMQ issue.
Now that the endpoints are starting properly, Raf has got endpoints on Theta and Cooley and will report here if he sees any issue with them disconnecting.
For externally reported and tracked bugs in open source components, we still need to keep them in GitHub. However, it would be good to have a Clubhouse issue to track this on the board. Or, if this relates to other work already in Clubhouse, linking this issue to the Clubhouse issue would be good.