Implement Graceful Shutdown / Connection Draining #177
To clarify a bit more - this ticket will just be for graceful shutdown (connection draining), as restarting/reloading is much more complicated and will require us to do things with separate processes, which I am not keen to take on at the moment.
Yeah, I'm not sure what benefit that provides though. Once you stop accepting connections you need something to take its place. The only way you can do that is with a proxy in front of Daphne. If the proxy is routing connections to another instance, connection draining would prevent something that wouldn't happen anyway. Unless I'm missing something and there is a way to bind two processes to a port or socket or something.
Graceful shutdown is mostly so you can prevent new connections while you close out old ones, which is especially useful for WebSockets, which are more stateful than HTTP. New Linux kernels do in fact allow you to bind two processes to a port using SO_REUSEPORT.
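For illustration, here is a minimal Python sketch of that kernel feature (SO_REUSEPORT, Linux 3.9+); the port and backlog are arbitrary:

```python
import socket

# Each process that should share the port creates its own socket and
# sets SO_REUSEPORT *before* binding. The kernel then load-balances
# incoming connections across all sockets bound to that port.
sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEPORT, 1)
sock.bind(("0.0.0.0", 8000))
sock.listen(128)

# Run this script twice: both processes bind port 8000 without an
# "Address already in use" error, which is what would let a new server
# instance start accepting while the old one drains.
conn, addr = sock.accept()
print("accepted connection from", addr)
conn.close()
```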
Oh, that's awesome.
As discussed on #182: as for how to restart without losing connections generally, the best way right now would be to use a load balancer (e.g. HAProxy) or a process manager that supports graceful restarts itself, and swap servers in and out as you change them over. Not ideal, I know - it only really works at large scale with automation. Hopefully I'll have time for proper graceful restart soon.
So will Daphne handle SIGINT by exiting after all connections terminate with the current codebase?
It won't until I implement it, which is why this ticket is still open. Right now it will just hard-exit.
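To make the requested behavior concrete, here is a generic asyncio sketch of signal-triggered draining (not Daphne's code, just the shape of it): on SIGINT, stop accepting, let in-flight connections finish, then exit.

```python
import asyncio
import signal

active = set()  # tasks for in-flight connections

async def handle(reader, writer):
    # Stand-in for real request handling.
    await reader.read(1024)
    writer.write(b"HTTP/1.1 200 OK\r\nContent-Length: 2\r\n\r\nok")
    await writer.drain()
    writer.close()
    await writer.wait_closed()

async def tracked_handle(reader, writer):
    task = asyncio.current_task()
    active.add(task)
    try:
        await handle(reader, writer)
    finally:
        active.discard(task)

async def main():
    server = await asyncio.start_server(tracked_handle, "127.0.0.1", 8000)
    stop = asyncio.Event()
    asyncio.get_running_loop().add_signal_handler(signal.SIGINT, stop.set)

    await stop.wait()          # SIGINT received: begin graceful shutdown
    server.close()             # stop accepting new connections
    await server.wait_closed()
    if active:                 # drain: let in-flight requests finish
        await asyncio.gather(*active, return_exceptions=True)

asyncio.run(main())
```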
If it did that, it seems Circus would work fine. The file descriptor feature appears to work well with Circus. Edit: After spending some more time with this, the best solution I found was to put HAProxy after Nginx. It's heavier than I would have hoped for, but it allows me to set up multiple instances and put them into "drain" mode one by one. It has a web UI, and once an instance is drained I can load the new code and the users don't notice anything.
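For anyone wanting to script that drain step instead of clicking through the web UI: HAProxy's runtime API accepts a `set server <backend>/<server> state drain` command over its stats socket (the socket must be configured with `stats socket ... level admin`). A rough sketch, where the socket path, backend name `daphne`, and server name `app1` are all made-up placeholders:

```python
import socket

def haproxy_cmd(command, sock_path="/var/run/haproxy.sock"):
    """Send one command to HAProxy's stats/runtime socket."""
    s = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    s.connect(sock_path)
    s.sendall(command.encode() + b"\n")
    response = s.recv(4096).decode()
    s.close()
    return response

# Put one instance into drain mode: it finishes existing connections
# (including websockets) but receives no new ones.
print(haproxy_cmd("set server daphne/app1 state drain"))

# ...deploy new code, restart app1, then bring it back into rotation:
print(haproxy_cmd("set server daphne/app1 state ready"))
```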
+1, subscribing for notifications
To solve this problem, I started using Uvicorn & Gunicorn (those names make me laugh every time I write them...). Gunicorn can deploy your new code by spinning up new workers for you and then gracefully shutting down your old workers, so that you have no downtime. See the Gunicorn docs on signal handling for details.

Turns out it has a nice side effect too... a very nice side effect... it's like 10x faster (at least for my deployment; of course your mileage may vary). By "faster" I mean my server's CPU usage is much lower now. My server used to sit at ~20% CPU when I ran my "pummel the server" script. Same script, new interface server, CPU barely hits 2%. I rolled back to Daphne just to double check it! It holds.

I'm using Nginx as a proxy in front of Gunicorn. One weird thing I ran into is that if I had Nginx proxy to Gunicorn over a unix socket, I would get a weird exception somewhere deep inside channels (at request time). If I proxy from Nginx to Gunicorn over TCP, it all works great. So that's where I left it. I didn't look into it further -- just something to be aware of if you try it out.
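For reference, the setup described boils down to something like the following Gunicorn config file (Gunicorn configs are plain Python; the project name and worker count here are placeholders):

```python
# gunicorn.conf.py
# Run with: gunicorn myproject.asgi:application -c gunicorn.conf.py
# ("myproject" is a placeholder for your Django project.)

bind = "127.0.0.1:8000"   # proxy over TCP; a unix socket hit the
                          # exception mentioned above, so TCP it is
workers = 4

# Uvicorn ships a Gunicorn worker class that speaks ASGI, so Gunicorn
# manages the processes while Uvicorn serves the protocol.
worker_class = "uvicorn.workers.UvicornWorker"

# Grace period: how long old workers get to finish in-flight requests
# after a reload before being force-killed.
graceful_timeout = 30
```

A zero-downtime deploy is then a matter of sending HUP to the Gunicorn master process: it starts fresh workers on the new code and gracefully retires the old ones.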
@acu192 Does the HUP handling actually work for you? I've tried it myself but the HUP signal causes it to reload immediately and drop all of its websocket connections.
Yeah, it will drop websocket connections, but any "normal" HTTP connections should be drained gracefully before the old workers are shut down (I haven't tested it super well, but it does seem to work based on some basic experimentation I've done -- I've only had this setup for a few days now).

I don't know of a way to not have the websocket connections drop... since it's a long-lived TCP connection, if the connection-holding process dies it will have to drop. The only solution I know of would be to let those old workers live a long time to hold open those old websocket connections (I don't want to do that). Or do something like channels 1 did, where it had an entirely separate interface server (as its own process) which communicated with the workers via redis (or whatever channel layer). I was never a fan of that though -- channels 2 is way better in my opinion by having the workers be the interface servers as well.

In my case I don't mind if the websocket connections drop. They'll quickly reconnect and the user will never know. As long as the "normal" HTTP connections are all served (i.e. no one sees an error message when loading the page for the first time), then I'm happy in my case.
In my case I really need the websocket connections to drain. I have some pages where it wouldn't matter, but we are doing things like web-based SSH sessions. HAProxy is the only way I've found to drain websocket connections.
@agronick,

I can understand your problem with having the websockets disconnected, but as I've been told, websocket client connections should be built to withstand disconnections and reconnect/resync gracefully, without letting the user know (being practically stateless). For the most part, this is done by many websocket clients. I've built several that aren't even browser-based, and every time they reconnect, they either exchange synchronization information with the server, or they assume everything continues as it was before. YMMV, but this should be the case most of the time.

Maybe you want to put some extra connection-handling logic into your client/server code to handle disconnects.
Cheers,
--
László Károlyi
http://linkedin.com/in/karolyi
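As a concrete illustration of that reconnect/resync pattern, here is a minimal client-side sketch using the Python `websockets` library (the URL and the RESYNC message are invented for the example; a browser client would do the same thing with an `onclose` handler and a backoff timer):

```python
import asyncio
import websockets

async def run_client(url="ws://example.com/ws"):  # placeholder URL
    backoff = 1
    last_seen = None  # whatever state your protocol needs to resync
    while True:
        try:
            async with websockets.connect(url) as ws:
                backoff = 1  # connected: reset the backoff
                # Resync with the server before resuming normal traffic,
                # e.g. tell it the last message we processed.
                if last_seen is not None:
                    await ws.send(f"RESYNC {last_seen}")
                async for message in ws:
                    last_seen = message  # placeholder bookkeeping
        except (OSError, websockets.ConnectionClosed):
            # Server restarted or connection dropped: retry with
            # exponential backoff so the user never notices.
            await asyncio.sleep(backoff)
            backoff = min(backoff * 2, 30)

asyncio.run(run_client())
```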
We're not talking about the user disconnecting and reconnecting. We're talking about the process dying and a new one rebuilding the previous process's state in memory. Some things just can't be serialized and persisted. Other things aren't worth an exponentially larger development effort when connection draining solves the problem fine. Especially sockets: I don't know if there is even a way to hand a socket off from a process that is shutting down to a new process.
If you find uvicorn works better for you, then please use it! Daphne is a reference server and doesn't have as much active development, so it will likely never beat uvicorn on performance.
@andrewgodwin Thank you for working so hard to build channels! Btw, channels 2 is wonderful. All the changes are well worth breaking the interface from channels 1. It's great to see other projects (like uvicorn) adopting the ASGI standard as well. Very well done.
Seems this ticket was left behind. @andrewgodwin @carltongibson any plans in the near future to fix this? Thank you
@Ken4scholars No immediate plans, no. The next priority is updating to be fully ready for Django 3.1, which mostly involves making Channels ASGI v3 ready and updating the documentation there. If you would like to contribute, then here is an opportunity!
This is something I'd be interested in as well.
Any updates?
This was present in the old version of Channels. The changelog says:

0.9.4 (2016-03-08)
* Worker processes now exit gracefully (finish their current processing) when sent SIGTERM or SIGINT.

This is no longer the case.
With the new architecture in Channels 2, this ability will need to be moved to Daphne. Daphne cannot simply stop running; it will need some kind of API to load new code while continuing to service existing connections on the old processes.