Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SDK-3926] Treat all transport errors as recoverable #1823

Merged

Conversation

lawrence-forooghian
Copy link
Collaborator

@lawrence-forooghian lawrence-forooghian commented Nov 6, 2023

This treats all transport errors as recoverable from the point of view of RTN14d.

The immediate reason for doing this is so that an error of OSStatus 9806 (errSSLClosedAbort) doesn't cause the client to transition to the FAILED state. But, as described in the commit message, this also reflects the broader approach that @paddybyers wants us to take for handling transport errors.

Resolves #1817.

@github-actions github-actions bot temporarily deployed to staging/pull/1823/features November 6, 2023 13:47 Inactive
@github-actions github-actions bot temporarily deployed to staging/pull/1823/jazzydoc November 6, 2023 13:50 Inactive
@lawrence-forooghian lawrence-forooghian force-pushed the 1817-treat-all-transport-errors-as-recoverable branch from 82d6180 to e716768 Compare November 6, 2023 14:13
@github-actions github-actions bot temporarily deployed to staging/pull/1823/features November 6, 2023 14:13 Inactive
@lawrence-forooghian
Copy link
Collaborator Author

@paddybyers this is pretty much the PR — I've got to pop out shortly and will add a further test when back to check that an arbitrary error does not provoke a transition to FAILED, but feel free to look at the general shape of things beforehand if you want.

@github-actions github-actions bot temporarily deployed to staging/pull/1823/jazzydoc November 6, 2023 14:19 Inactive
@paddybyers
Copy link
Member

Thanks. See ably/specification#171 (comment).

So if these changes implement that approach, we're good.

@lawrence-forooghian lawrence-forooghian force-pushed the 1817-treat-all-transport-errors-as-recoverable branch from e716768 to 46ee0b0 Compare November 6, 2023 15:41
@github-actions github-actions bot temporarily deployed to staging/pull/1823/features November 6, 2023 15:41 Inactive
@github-actions github-actions bot temporarily deployed to staging/pull/1823/jazzydoc November 6, 2023 15:45 Inactive
@lawrence-forooghian
Copy link
Collaborator Author

lawrence-forooghian commented Nov 6, 2023

As things stand in this PR:

  • No error that results from the process of establishing a WebSocket connection to Ably is considered fatal.
  • No WebSocket event (e.g. a closed connection) that occurs after the connection to Ably is established is considered fatal.
  • I haven't touched the logic regarding what it will do with a ProtocolMessage received from Ably on the realtime transport, which I assume follows the relevant spec points for DISCONNECTED and ERROR messages.

@lawrence-forooghian
Copy link
Collaborator Author

lawrence-forooghian commented Nov 6, 2023

@paddybyers is there more that needs to be done than what I've detailed above?

@lawrence-forooghian lawrence-forooghian marked this pull request as ready for review November 6, 2023 17:22
lawrence-forooghian added a commit that referenced this pull request Nov 6, 2023
This is a cherry-pick of 46ee0b0, to resolve #1823 on the
fix/socketrocket-fix branch which one of our customers is using.

TODO bring in test changes
lawrence-forooghian added a commit that referenced this pull request Nov 6, 2023
This is a cherry-pick of 46ee0b0, to resolve #1823 on the
fix/socketrocket-fix branch which one of our customers is using.

TODO bring in test changes
lawrence-forooghian added a commit that referenced this pull request Nov 6, 2023
This is a cherry-pick of 46ee0b0, to resolve #1823 on the
fix/socketrocket-fix branch which one of our customers is using.

TODO bring in test changes
lawrence-forooghian added a commit that referenced this pull request Nov 6, 2023
This is a cherry-pick of 46ee0b0, to resolve #1823 on the
fix/socketrocket-fix branch which one of our customers is using.

TODO bring in test changes
lawrence-forooghian added a commit that referenced this pull request Nov 6, 2023
This is a cherry-pick of 46ee0b0, to resolve #1823 on the
fix/socketrocket-fix branch which one of our customers is using.

TODO bring in test changes
Copy link
Collaborator

@maratal maratal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approving with minor comment.

Source/ARTWebSocketTransport.m Outdated Show resolved Hide resolved
I’m going to need to change this test in an upcoming commit and want the
original intent to be clear before I do.
The use of the non-completing authCallback already ensures a timeout.
Want to add a new test with another failure reason.
RTN14d says that a transport error should be considered recoverable if
it is

> a network failure, a timeout such as RTN14c, or a disconnected
> response, other than a token failure RTN14b)

However, it does not define what it means by a "network failure",
leading to each platform having to provide its own interpretation.

In particular, a client recently told us that they’ve been seeing lots
of OSStatus 9806 (errSSLClosedAbort) errors. This appears to indicate
some sort of failure to perform an SSL handshake. We don’t understand
the cause of this issue but we noticed that it was causing the client to
transition to the FAILED state.

Speaking to Paddy, he said that this error should be provoking a
connection retry, not failure. And more broadly, he indicated that,
basically, _all_ transport errors should be considered recoverable. So,
that’s what we do here. He’s raised a specification issue [1] for us to
properly specify this and to decide if there are any nuances to
consider, but is keen for us to implement this broad behaviour in
ably-cocoa ASAP to help the customer experiencing the errSSLClosedAbort
errors.

Resolves #1817.

[1] ably/specification#171
Copy link
Member

@paddybyers paddybyers left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just one question, but LGTM, thanks

Source/ARTRealtime.m Show resolved Hide resolved
lawrence-forooghian added a commit that referenced this pull request Nov 7, 2023
This is a cherry-pick of 86b7cc7, to resolve #1823 on the
fix/socketrocket-fix branch which one of our customers is using.
This was referenced Nov 7, 2023
@lawrence-forooghian lawrence-forooghian merged commit f3bf3c4 into main Nov 7, 2023
6 checks passed
@lawrence-forooghian lawrence-forooghian deleted the 1817-treat-all-transport-errors-as-recoverable branch November 7, 2023 17:25
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

errSSLClosedAbort is considered a terminal error
3 participants