Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

How to deal with disconnection? #14

Open
Carter12s opened this issue Jun 28, 2022 · 1 comment
Open

How to deal with disconnection? #14

Carter12s opened this issue Jun 28, 2022 · 1 comment
Labels
enhancement New feature or request

Comments

@Carter12s
Copy link
Collaborator

Since this library is targeting communication with ROS assets remotely I think disconnection / re-connection should be a very core feature of the API with effective and clean error handling.

Current state:

  • Client::new() is stubborn and completely blocks until connection is established, kinda breaks timeout support, but it does start the "stubborn connect" which I do like
  • Subscriptions should survive disconnection, but impossible to know if they are disconnected? Should we push a disconnected error through them? We probably have to re-subscribe after we lose comms.
  • Publishers should have individual publishes fail, but once connection is re-established trying them again should be fine. However, do we need to re-advertise?
  • Service calls that experience a disconnect mid-call will likely hang forever and be effectively a memory leak. Likely need to be able to push an error back here and flush in-flight service calls when we disconnect.

Overall the library is in a lot of ways pretty brittle to disconnection right now for moderately complex usages.

The ideal outcome of this issue is a significant improvement to how we approach disconnections including systematic testing of disconnections (how do we create a disconnect in a test?)

@Carter12s Carter12s added the enhancement New feature or request label Jun 28, 2022
@Carter12s Carter12s added this to the V0.0.1 milestone Jun 28, 2022
@Carter12s
Copy link
Collaborator Author

We've been picking at this issue intermittently in other upgrades want to update on current state.

  • Client::new() is still stubborn, but new_with_options() works with timeout
  • Subscribers still can't tell if they are disconnected or some other upstream error is preventing them from getting data like a deserialization failure
  • The re-advertise problem is extremely challenging, we don't have a good way to tell if rosbridge has accepted our message, it is very challenging to distinguish between re-connecting to the same rosbridge instance or a new one.
  • Service calls still have the above issue, but clear path to adding timeout to them.

What has been improved is that all ClientHandle calls check is_disconnected() up front and "fail fast" if currently disconnected. This prevents most deadlocks that could occur from disconnection, but service_calls remain a challenge.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

1 participant