Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use async-websocket instead of celluloid-io. #75

Merged
merged 4 commits into from
Sep 8, 2018

Conversation

dblock
Copy link
Collaborator

@dblock dblock commented Aug 27, 2018

No description provided.

@dblock dblock changed the title WIP: use async-websocket. Use async-websocket instead of celluloid-io. Aug 27, 2018
@dblock dblock force-pushed the async branch 4 times, most recently from 388d3d6 to 132b871 Compare August 28, 2018 17:29
@ioquatix
Copy link

It looks good. Can you explain in layman's terms how I can run this to reproduce the failure you are seeing?

@dblock
Copy link
Collaborator Author

dblock commented Aug 29, 2018

@ioquatix

  • What I am seeing with this change is that the ping thread doesn't always successfully restart the bot. It calls driver.close, but then not seeing the disconnect as expected, the bot doesn't exit and restart. This is new compared to celluloid implementation.

  • Still seeing Slack-side disconnects slack-ruby-client#208, but more info there.

@ioquatix
Copy link

What I am seeing with this change is that the ping thread doesn't always successfully restart the bot. It calls driver.close, but then not seeing the disconnect as expected, the bot doesn't exit and restart. This is new compared to celluloid implementation.

Once you close the driver, I don't think you can expect to see any further events. WDYT?

@dblock
Copy link
Collaborator Author

dblock commented Aug 29, 2018

Correct, the driver should exit run_loop in that case, terminate its thread, and gracefully get restarted from https://github.com/slack-ruby/slack-ruby-bot/blob/1708693843487f1c1f97765d0d7ebbda1ef34d24/lib/slack-ruby-bot/server.rb. I see it happen locally just fine and by forcing online? to return false as many times as I can watch it. Happens in production as well, a few times, then eventually stops doing it. Haven't had much time to debug this one though (I'm traveling ;). Needs code to log whether it was able to call driver.close, etc. (I am going to guess that yes, but didn't correctly abort the reactor.)

@dblock
Copy link
Collaborator Author

dblock commented Aug 29, 2018

Can confirm with some logs that the ping thread correctly calls client.close, but that the client fails to disconnect and restart.

W, [2018-08-29T15:08:34.492086 #216]  WARN -- : DOWN: name=..., id=..., 0 retries left
W, [2018-08-29T15:08:34.492392 #216]  WARN -- : RESTART: name=..., id=..., #<Slack::RealTime::Concurrency::Async::Client:0x007fec8376b138>
W, [2018-08-29T15:08:34.492697 #216]  WARN -- : Done pinging team 

@ioquatix for Celluloid we used for abort the connection with a driver.emit(:close, WebSocket::Driver::CloseEvent.new(1001, 'bot offline')) - looks like what we did in close doesn't always have the same effect. Any suggestions?

dblock referenced this pull request in slack-ruby/slack-ruby-client Sep 2, 2018
@ioquatix
Copy link

ioquatix commented Sep 2, 2018

Where is the code for the ping thread?

@dblock
Copy link
Collaborator Author

dblock commented Sep 2, 2018

@dblock
Copy link
Collaborator Author

dblock commented Sep 2, 2018

@ioquatix
Copy link

ioquatix commented Sep 2, 2018

My apologies but slack-ruby/slack-ruby-client#222 might be the first thing to check.

@dblock
Copy link
Collaborator Author

dblock commented Sep 3, 2018

Thanks for this @ioquatix, dblock/slack-ruby-client@f2062ec, testing.

@dblock dblock force-pushed the async branch 2 times, most recently from 377d0ac to 3a76f6a Compare September 6, 2018 19:23
@dblock
Copy link
Collaborator Author

dblock commented Sep 7, 2018

Emitting a close event on the web socket similarly to what I was doing with Celluloid makes this ping situation work as before.

driver.emit(:close, WebSocket::Driver::CloseEvent.new(1001, 'bot offline'))

Without this the socket doesn't close and just sits there waiting for more. Closing the driver isn't enough.

It sounds like this is related to slack-ruby/slack-ruby-client#222 (comment). Is the right(er) solution to call close on the socket here and ignore Async::Wrapper::Cancelled?

Or is the better solution to try and send a message over the websocket and expect the next read to return nil as per faye/websocket-driver-ruby#61 (comment)?

@ioquatix
Copy link

ioquatix commented Sep 7, 2018

In theory you can just ignore the cancelled error, but I feel like it's the wrong approach.

In almost every case where I've written network IO, the #read call fails, and doesn't need an external check to indicate that #read is broken. That being said, it's fine to catch the cancelled error (perhaps report it).

@dblock
Copy link
Collaborator Author

dblock commented Sep 7, 2018

@ioquatix I mean in this case I am force-closing the socket per your recommendation in slack-ruby-client#222. Is that incorrect?

@ioquatix
Copy link

ioquatix commented Sep 7, 2018

I think it's the best option given the circumstances.

@dblock dblock merged commit 2de6062 into slack-ruby:master Sep 8, 2018
@dblock dblock deleted the async branch September 8, 2018 22:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants