Since this library targets remote communication with ROS assets, I think disconnection and reconnection should be a core feature of the API, with effective and clean error handling.
Current state:
Client::new() is stubborn and blocks completely until a connection is established. This kind of breaks timeout support, but it does start the "stubborn connect", which I do like.
Subscriptions should survive disconnection, but there is currently no way to know whether they are disconnected. Should we push a disconnected error through them? We probably have to re-subscribe after we lose comms.
Individual publishes should fail while disconnected, but retrying them once the connection is re-established should be fine. However, do we need to re-advertise?
Service calls that experience a disconnect mid-call will likely hang forever, which is effectively a memory leak. We likely need to push an error back to the caller and flush in-flight service calls when we disconnect.
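To make the last point concrete, here is a minimal sketch of what flushing in-flight service calls could look like. It uses only the standard library, and every name in it (PendingCalls, ServiceCallError, flush_on_disconnect) is hypothetical, not something the library has today. It assumes each in-flight call is tracked by an id and the caller waits on a channel.

```rust
use std::collections::HashMap;
use std::sync::mpsc::{channel, Receiver, Sender};

/// Hypothetical error type; not part of the library's current API.
#[derive(Debug, PartialEq)]
enum ServiceCallError {
    Disconnected,
}

type CallResult = Result<String, ServiceCallError>;

/// Hypothetical registry of in-flight service calls, keyed by call id.
#[derive(Default)]
struct PendingCalls {
    waiting: HashMap<u64, Sender<CallResult>>,
}

impl PendingCalls {
    /// Register a call and hand back the receiver the caller waits on.
    fn register(&mut self, id: u64) -> Receiver<CallResult> {
        let (tx, rx) = channel();
        self.waiting.insert(id, tx);
        rx
    }

    /// Deliver a response to the matching caller, if it is still tracked.
    fn complete(&mut self, id: u64, response: String) {
        if let Some(tx) = self.waiting.remove(&id) {
            // The caller may have given up already; ignore a closed channel.
            let _ = tx.send(Ok(response));
        }
    }

    /// On disconnect, fail every in-flight call and empty the map so that
    /// nothing is left hanging. Returns how many calls were flushed.
    fn flush_on_disconnect(&mut self) -> usize {
        let flushed = self.waiting.len();
        for (_, tx) in self.waiting.drain() {
            let _ = tx.send(Err(ServiceCallError::Disconnected));
        }
        flushed
    }
}
```

The connection task would call flush_on_disconnect() as soon as it notices the socket is gone, so every waiting caller gets an error back instead of blocking forever.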
Overall, the library is in many ways pretty brittle to disconnection right now for moderately complex usages.
The ideal outcome of this issue is a significant improvement in how we approach disconnections, including systematic testing of disconnections (how do we create a disconnect in a test?).
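On the testing question, one possible approach is to stand up a throwaway local listener in the test and drop the server side to sever the link. The sketch below only shows this at the raw TCP level with the standard library. It is an assumption that the same trick carries over to a websocket connection to rosbridge (for example by putting a small proxy like this between the client and the real server), and client_sees_disconnect is a made-up helper name.

```rust
use std::io::Read;
use std::net::{TcpListener, TcpStream};

/// Connects a client to a throwaway local listener, then drops the server
/// side to sever the link. Returns true if the client observes the loss.
fn client_sees_disconnect() -> std::io::Result<bool> {
    // Port 0 asks the OS for any free port, so parallel tests do not collide.
    let listener = TcpListener::bind("127.0.0.1:0")?;
    let addr = listener.local_addr()?;

    let mut client = TcpStream::connect(addr)?;
    let (server_side, _peer) = listener.accept()?;

    // Simulate the remote end going away.
    drop(server_side);
    drop(listener);

    // A zero-byte read (EOF) or an I/O error both mean the peer is gone.
    let mut buf = [0u8; 16];
    let gone = match client.read(&mut buf) {
        Ok(0) => true,
        Ok(_) => false,
        Err(_) => true,
    };
    Ok(gone)
}
```

A test could trigger the drop at a chosen moment (mid-subscription, mid-service-call) and then assert on what the client surfaces to the caller.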
We've been picking at this issue intermittently as part of other upgrades, and I want to give an update on the current state.
Client::new() is still stubborn, but new_with_options() works with a timeout.
Subscribers still can't tell whether they are disconnected or whether some other upstream error, such as a deserialization failure, is preventing them from getting data.
The re-advertise problem is extremely challenging. We don't have a good way to tell whether rosbridge has accepted our message, and it is very hard to distinguish between reconnecting to the same rosbridge instance and connecting to a new one.
Service calls still have the issue described above, but there is a clear path to adding a timeout to them.
What has improved is that all ClientHandle calls check is_disconnected() up front and "fail fast" if currently disconnected. This prevents most of the deadlocks that could occur from disconnection, but service calls remain a challenge.
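For illustration, here is a rough sketch of how the fail-fast check and a service-call timeout could fit together. It uses only the standard library, with blocking channels in place of the library's async machinery. Handle, CallError, and call_service are hypothetical stand-ins and do not reflect the actual ClientHandle implementation. Only the idea of checking is_disconnected() first comes from the description above.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::mpsc::{channel, RecvTimeoutError, Sender};
use std::sync::Arc;
use std::time::Duration;

/// Hypothetical error type for this sketch.
#[derive(Debug, PartialEq)]
enum CallError {
    Disconnected,
    Timeout,
}

/// Hypothetical stand-in for a client handle; not the library's real type.
struct Handle {
    disconnected: Arc<AtomicBool>,
}

impl Handle {
    fn is_disconnected(&self) -> bool {
        self.disconnected.load(Ordering::SeqCst)
    }

    /// `send_request` stands in for whatever writes the request to the
    /// socket and arranges for the response to arrive on the given sender.
    fn call_service<F>(&self, send_request: F, timeout: Duration) -> Result<String, CallError>
    where
        F: FnOnce(Sender<String>),
    {
        // Fail fast: do not even send the request while disconnected.
        if self.is_disconnected() {
            return Err(CallError::Disconnected);
        }
        let (tx, rx) = channel();
        send_request(tx);
        // Bound the wait so a lost response cannot hang the caller forever.
        match rx.recv_timeout(timeout) {
            Ok(response) => Ok(response),
            Err(RecvTimeoutError::Timeout) => Err(CallError::Timeout),
            // The sender was dropped without a response, e.g. the
            // connection task went away mid-call.
            Err(RecvTimeoutError::Disconnected) => Err(CallError::Disconnected),
        }
    }
}
```

The timeout only bounds how long the caller waits. It does not clean up whatever state the connection task keeps for the call, so that state would still need to be flushed on disconnect.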