Doc: deploying with multiple daphne processes (v1 -> v2) #182

Closed

jxrossel opened this issue Mar 7, 2018 · 21 comments
@jxrossel

jxrossel commented Mar 7, 2018

Hi,

With v1, I could run a single Daphne process and several workers, which let me bind Daphne directly to a port number.

With v2, the worker and the server share the same Daphne process, so the number of Daphne processes must be increased to reach a similar setup. The Channels doc mentions using a process supervisor to run multiple processes, but it doesn't state how a port can be shared between the Daphne processes.

I use Circus (http://circus.readthedocs.io/en/latest/tutorial/step-by-step/) as a process supervisor. Its doc describes having Circus create a socket bound to the desired port and then running multiple servers (Chaussette, in their case) against the resulting shared socket file descriptor. Chaussette's doc explicitly mentions that:

The particularity of Chaussette is that it can either bind a socket on a port like any other server does or run against already opened sockets.

Is that also true for Daphne? Can Daphne processes share the same socket file descriptor? If yes, could that be mentioned in the doc?

Thanks in advance!

@jxrossel jxrossel changed the title Doc: deploying with multiple daphne process (v1 -> v2) Doc: deploying with multiple daphne processes (v1 -> v2) Mar 7, 2018
@andrewgodwin
Member

So:

  • Daphne can use a shared file descriptor if you launch it with --fd 1 or similar. This allows process managers that can pass open file descriptors down to child processes (like Circus or systemd) to work with Daphne, just as they do with Chaussette. I will add this to the README. (A sketch follows this list.)

  • Modern Linux kernels support SO_REUSEPORT, which allows multiple processes to listen on the same port without a supervising process. This was sort-of covered in #177 (Implement Graceful Shutdown / Connection Draining), but I will just add it now as well, since I see no problem having it on by default.
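
For illustration, a minimal Circus sketch of the fd-sharing setup described above (the module path myproject.asgi and the port are placeholders, not from this thread); $(circus.sockets.daphne) expands to the file descriptor of the socket Circus opened:

    [socket:daphne]
    host = 127.0.0.1
    port = 8000

    [watcher:daphne]
    cmd = daphne --fd $(circus.sockets.daphne) myproject.asgi:application
    use_sockets = True
    numprocesses = 4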

@andrewgodwin
Member

Update: It seems Twisted does not yet have support for SO_REUSEPORT, so I can't do that yet. The fd thing, however, still works.

@agronick
Contributor

agronick commented Mar 7, 2018

It looks like it might be possible if you implement the socket code yourself.

In the meantime do you have any suggestions on how to do deployments without breaking everyone's connections?
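
As a rough sketch of what "implementing the socket code yourself" could look like (not Daphne's actual code): each process opens its own SO_REUSEPORT socket and hands the descriptor to Twisted:

    import socket
    from socket import AF_INET

    # SO_REUSEPORT (Linux 3.9+) lets several processes bind the same port;
    # the kernel load-balances incoming connections across them.
    sock = socket.socket(AF_INET, socket.SOCK_STREAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEPORT, 1)
    sock.bind(("0.0.0.0", 8000))
    sock.listen(128)

    # Twisted can then adopt the already-listening descriptor:
    # reactor.adoptStreamPort(sock.fileno(), AF_INET, factory)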

@andrewgodwin
Member

Unfortunately we use Twisted endpoint syntax, which means it's not as easy as that example (if we still want to support e.g. automatic SSL). As for deployment stuff, take a look at #177; I discussed that some in there.

@andrewgodwin
Member

Wait, I didn't, I'm misremembering. Let me put suggestions there now.

@jxrossel
Author

jxrossel commented Mar 8, 2018

Thanks for the answer. I can't make it work with Circus though (at least on Windows, see circus-tent/circus#1058). Do you know if Circus + Daphne (multi-process with shared socket) has already been tested?

@andrewgodwin
Member

I have not tested it, and I don't think file handle passing works super well on Windows, unfortunately.

@jxrossel
Author

jxrossel commented Mar 8, 2018

Ok, thanks for the input. I guess I'll stick with v1 then.

@andrewgodwin
Member

You are free to, but please be aware it will get no updates or support. I'd recommend developing on Windows Subsystem for Linux or inside a VM if you need to run more than one process on a development machine. If you need to run things in production, I recommend deploying onto Linux rather than Windows (but I realise this is not always possible).

@ricleal

ricleal commented Apr 26, 2018

Hi All,
Anyone found a solution for this (using systemd)?

I have a socket /usr/lib/systemd/system/daphne.socket to communicate with nginx:

[Unit]
Description=Daphne Socket for the API
PartOf=daphne@.service

[Socket]
ListenStream=/usr/local/reduction/dist/daphne.sock
SocketMode=0660
SocketUser=reduction
SocketGroup=reduction

[Install]
WantedBy=sockets.target

And a service /usr/lib/systemd/system/daphne@.service:

[Unit]
Description=Daphne Service For Django %I
After=syslog.target
After=network.target
After=postgresql-9.5.service
After=nginx.service

[Service]
Type=simple
RuntimeDirectory=daphne
PIDFile=/run/daphne.pid
WorkingDirectory=/usr/local/reduction/src

ExecStart=/usr/local/reduction/venv/bin/daphne --fd %i \
    server.asgi:application
User=reduction

[Install]
WantedBy=multi-user.target

Linux file descriptors are:

STDIN = 0
STDOUT = 1
STDERR = 2

Launching the service as sudo systemctl start daphne@{0..2}.service (which is the same as launching 3 daphne processes with --fd 0, --fd 1 and --fd 2), I get the errors:

2018-04-27 08:31:33,166 INFO     Configuring endpoint fd:fileno=0
2018-04-27 08:31:33,177 CRITICAL Listen failure: [Errno 88] Socket operation on non-socket

2018-04-27 08:31:33,471 INFO     Configuring endpoint fd:fileno=1
2018-04-27 08:31:33,481 CRITICAL Listen failure: index out of range

2018-04-27 08:36:31,439 INFO     Configuring endpoint fd:fileno=2
2018-04-27 08:36:31,455 CRITICAL Listen failure: index out of range

Launching with fds 3 to 5:

2018-04-27 08:38:42,867 INFO     Configuring endpoint fd:fileno=3
2018-04-27 08:38:42,876 CRITICAL Listen failure: [Errno 88] Socket operation on non-socket

2018-04-27 08:38:42,645 INFO     Configuring endpoint fd:fileno=4
2018-04-27 08:38:42,657 CRITICAL Listen failure: [Errno 88] Socket operation on non-socket

2018-04-27 08:38:42,692 INFO     Configuring endpoint fd:fileno=5
2018-04-27 08:38:42,724 CRITICAL Listen failure: index out of range

Any idea what I'm doing wrong?
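
One likely explanation: fds 0-2 are always the standard streams, so --fd 0..2 can never point at a socket, and systemd hands socket-activated services their sockets starting at fd 3 (SD_LISTEN_FDS_START), but only when the service is actually started through its .socket unit. A minimal check of what a service really received (assuming Python 3; the unit above listens on a Unix socket):

    import os
    import socket

    # systemd sets LISTEN_FDS when it socket-activates a service; the
    # passed sockets start at fd 3 (SD_LISTEN_FDS_START). Without
    # activation, fd 3 is not a socket, which matches
    # "[Errno 88] Socket operation on non-socket" above.
    if "LISTEN_FDS" in os.environ:
        sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM, fileno=3)
        print(sock.getsockname())
    else:
        print("not started via socket activation")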

@ricleal

ricleal commented Apr 27, 2018

I have something running... I tried a script that uses the same principle: https://gist.github.com/ricleal/7db44dc626c01c25f1b461ae5ad7d5b1

For the script, it works, but not with daphne... Maybe because it only accepts an int as the file descriptor. Is that a valid assumption?

Here are my scripts:

$ cat /usr/lib/systemd/system/daphne@.service

[Unit]
Description=Daphne Service For Django %I
After=syslog.target
After=network.target
After=postgresql-9.5.service
After=nginx.service

Requires=daphne.socket

[Service]
Type=simple
RuntimeDirectory=reduction

PIDFile=/run/daphne.pid

WorkingDirectory=/usr/local/reduction/src

ExecStart=/usr/local/reduction/venv/bin/daphne --fd %i  \
    server.asgi:application

[Install]
WantedBy=multi-user.target
$ cat /usr/lib/systemd/system/daphne.socket 
[Unit]
Description=Daphne Socket

[Socket]
ListenStream=/usr/local/reduction/dist/daphne.sock
Accept=yes

[Install]
WantedBy=sockets.target

The service starts as: sudo systemctl restart daphne.socket

When I refresh the browser, this is what journalctl shows:

usage: daphne [-h] [-p PORT] [-b HOST] [--websocket_timeout WEBSOCKET_TIMEOUT]
 [--websocket_connect_timeout WEBSOCKET_CONNECT_TIMEOUT]
 [-u UNIX_SOCKET] [--fd FILE_DESCRIPTOR] [-e SOCKET_STRINGS]
 [-v VERBOSITY] [-t HTTP_TIMEOUT] [--access-log ACCESS_LOG]
 [--ping-interval PING_INTERVAL] [--ping-timeout PING_TIMEOUT]
 [--application-close-timeout APPLICATION_CLOSE_TIMEOUT]
 [--ws-protocol [WS_PROTOCOLS [WS_PROTOCOLS ...]]]
 [--root-path ROOT_PATH] [--proxy-headers]
 application
daphne: error: argument --fd: invalid int value: '11-11470-991'

The name of the file descriptor is 11-11470-991, which is the fd number followed by the user and group ids.
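
That instance name comes from Accept=yes: systemd then spawns one service instance per incoming connection and derives the instance name (%i) from the connection, so it is no longer an integer fd. Keeping the default Accept=no hands the listening socket itself to a single long-lived service instance, e.g.:

    [Socket]
    ListenStream=/usr/local/reduction/dist/daphne.sock
    # Accept=no (the default): the service receives the listening socket
    # itself, instead of one spawned instance per accepted connection.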

@agronick
Contributor

agronick commented May 3, 2018 via email

@ricleal

ricleal commented May 4, 2018

Until someone finds a way of doing it, I have played with NGINX. Since my websockets start with /ws, I have one instance of daphne serving them, and multiple uWSGI workers serving the rest of the requests.

@karolyi
Contributor

karolyi commented May 8, 2018

@agronick care to share your solution's details? I'm about to deploy daphne 2 and I'm looking for a solution too.

@ricleal

ricleal commented May 8, 2018

It would be nice to have a solution. I think so far no one has managed to run daphne with multiple processes...

@ricleal

ricleal commented May 8, 2018

This only works on Ubuntu 16.04 (systemd v229). I can't get it working on RHEL 7 (systemd v219).
Also, Unix sockets don't appear to work; we need to use hosts and ports.

/usr/lib/systemd/system/daphne@.service

[Unit]
Description=Daphne Worker %i
After=syslog.target
After=network.target
After=nginx.service

Requires=daphne@%i.socket

[Service]
Type=simple
PIDFile=/run/daphne.pid
WorkingDirectory=/usr/local/reduction/src

ExecStart=/usr/local/reduction/venv/bin/daphne \
    -e systemd:domain=INET:index=0 \
    server.asgi:application

NonBlocking=true

[Install]
WantedBy=multi-user.target

/usr/lib/systemd/system/daphne@.socket

[Unit]
Description = Daphne Socket for worker %i

[Socket]
ListenStream = 8888
Service = daphne@%i.service
ReusePort=true

[Install]
WantedBy = sockets.target

Launch as: sudo systemctl daemon-reload && sudo systemctl start daphne@{1..4}.socket for 4 workers.

In your nginx you should have something like this:

    upstream app_server {
        server 0.0.0.0:8888;
    }
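
To complete the picture, a sketch of the server block that usually goes with that upstream (the Upgrade/Connection headers are what let websockets through; names are placeholders):

    server {
        listen 80;

        location / {
            proxy_pass http://app_server;
            proxy_http_version 1.1;
            proxy_set_header Upgrade $http_upgrade;
            proxy_set_header Connection "upgrade";
            proxy_set_header Host $host;
        }
    }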

@karolyi
Contributor

karolyi commented May 8, 2018

I'd vote for a non-vendor-locked solution; besides, the systemd variant lacks draining of the switched-off service.

I'll probably have to implement an extra message telling websocket clients to close their connections gracefully and then reconnect. Other than that, haproxy can keep new connections away from a draining Daphne instance, so with some extra internal implementation it can shut down gracefully once all the websocket connections are closed.
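
As a sketch of that haproxy idea (backend names, ports, and the admin socket path are assumed, not from this thread): run the old and new Daphne instances as servers in one backend, and flip the old one into drain state via the runtime API so it stops receiving new connections while existing websockets finish:

    backend daphne
        server daphne_a 127.0.0.1:8001 check
        server daphne_b 127.0.0.1:8002 check

    # At deploy time, over the haproxy admin socket (requires a
    # "stats socket ... level admin" in the haproxy config):
    #   echo "set server daphne/daphne_a state drain" | socat stdio /run/haproxy.sock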

@agronick
Contributor

The way I did it is to have Ansible set it all up. I can tune the number of workers with simple variables. Ansible creates the systemd service files and the Nginx config, so they always match up. If I want 20 workers tomorrow I can just run it and know the Nginx config is right, the sockets are there, and the service files are there.

I have two worker groups A and B. When A is running and I want to deploy new code I spin up B and put A into a drain state. That way all new connections get the new code and anyone who is still connected can take their time disconnecting.

I have Ansible poll for 4 hours, waiting for there to be 0 connections on the inactive worker group. If people haven't disconnected by then they get kicked off and usually automatically reconnect with Channels' websocket bridge.

When I want to deploy again I take A from offline, make it active, and put B into drain, constantly switching between the two. This limits the number of deployments you can do: usually one a day without interrupting anyone who is in the middle of something.

I would post the code but I work for a big company that likes to fire people over stupid things.
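
Since the original isn't shared, here is a rough shell sketch of just the "wait for drain" step (the port and timeout are assumptions for illustration):

    # Poll for up to 4 hours until the draining group's listener
    # (assumed here to be :8888) has zero established connections.
    deadline=$(( $(date +%s) + 4*3600 ))
    while [ "$(ss -Hn state established '( sport = :8888 )' | wc -l)" -gt 0 ]; do
        [ "$(date +%s)" -ge "$deadline" ] && break
        sleep 30
    done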

@karolyi
Contributor

karolyi commented May 12, 2018

@agronick this all seems fine, but what happens when you deploy often and your 'other' daphne instances aren't drained off yet?

Leaving this here for information: I talked to the guys on IRC, and they said it's better to make the clients and the implementation robust enough that they will reconnect and continue without data loss. In my case, this basically happens by using transaction IDs for the payments I control through websockets.

@agronick
Contributor

@karolyi Nothing stops you from having worker groups C, D, E, F, and G if you have the memory to run them. I find that if I'm doing two deployments in one day it is probably because the first one broke something.

I use the Channels websocket bridge to do reconnecting. But a big challenge is that you need to be able to get your consumer into the state it was in before it disconnected. If message 3 comes in on a new connection and the state wasn't set up with messages 1 and 2, you need to simulate processing messages 1 and 2 before you work with message 3.

@jxrossel
Author

Just a heads up: it actually works with Circus. The Windows issue comes from Twisted (on which Daphne is built): see circus-tent/circus#1058 (comment)
