Doc: deploying with multiple daphne processes (v1 -> v2) #182

Closed

jxrossel opened this issue Mar 7, 2018 · 21 comments
@jxrossel

jxrossel commented Mar 7, 2018

Hi,

With v1, I could run a single Daphne process and several workers, which let me bind Daphne directly to a port number.

With v2, the worker and the server share the same Daphne process, so the number of Daphne processes must be increased to reach a similar setup. The Channels doc mentions using a process supervisor to run multiple processes, but it doesn't state how a port can be shared between the Daphne processes.

I use Circus (http://circus.readthedocs.io/en/latest/tutorial/step-by-step/) as a process supervisor. Its doc describes having Circus create a socket bound to the desired port and then running multiple servers (Chaussette, in their case) against the resulting shared socket file descriptor. Chaussette's doc explicitly mentions that:

The particularity of Chaussette is that it can either bind a socket on a port like any other server does or run against already opened sockets.

Is that also true for Daphne? Can Daphne processes share the same socket file descriptor? If yes, could that be mentioned in the doc?

Thanks in advance!

@jxrossel jxrossel changed the title Doc: deploying with multiple daphne process (v1 -> v2) Doc: deploying with multiple daphne processes (v1 -> v2) Mar 7, 2018
@andrewgodwin
Member

So:

  • Daphne can use a shared file descriptor if you launch it with --fd 1 or similar. This allows process managers that can pass open file descriptors down to child processes (like Circus or systemd) to work with Daphne, just as they do with Chaussette. I will add this to the README. (A sketch follows this list.)

  • Modern Linux kernels support SO_REUSEPORT, which allows multiple processes to listen on the same port without a supervising process. This was sort-of covered in #177 (Implement Graceful Shutdown / Connection Draining), but I will just add it now as well, since I see no problem having it on by default.
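
For illustration, a minimal Circus sketch of the fd-sharing setup described above (the module path myproject.asgi and the port are placeholders, not from this thread); $(circus.sockets.daphne) expands to the file descriptor of the socket Circus opened:

    [socket:daphne]
    host = 127.0.0.1
    port = 8000

    [watcher:daphne]
    cmd = daphne --fd $(circus.sockets.daphne) myproject.asgi:application
    use_sockets = True
    numprocesses = 4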

@andrewgodwin
Member

Update: It seems Twisted does not yet have support for SO_REUSEPORT, so I can't do that yet. The fd thing, however, still works.

@agronick
Contributor

agronick commented Mar 7, 2018

It looks like it might be possible if you implement the socket code yourself.

In the meantime do you have any suggestions on how to do deployments without breaking everyone's connections?
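
As a rough sketch of what "implementing the socket code yourself" could look like (not Daphne's actual code): each process opens its own SO_REUSEPORT socket and hands the descriptor to Twisted:

    import socket
    from socket import AF_INET

    # SO_REUSEPORT (Linux 3.9+) lets several processes bind the same port;
    # the kernel load-balances incoming connections across them.
    sock = socket.socket(AF_INET, socket.SOCK_STREAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEPORT, 1)
    sock.bind(("0.0.0.0", 8000))
    sock.listen(128)

    # Twisted can then adopt the already-listening descriptor:
    # reactor.adoptStreamPort(sock.fileno(), AF_INET, factory)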

@andrewgodwin
Member

Unfortunately we use Twisted endpoint syntax, which means it's not as easy as that example (if we still want to support e.g. automatic SSL). As for deployment stuff, take a look at #177; I discussed that some in there.

@andrewgodwin
Member

Wait, I didn't, I'm misremembering. Let me put suggestions there now.

@jxrossel
Author

jxrossel commented Mar 8, 2018

Thanks for the answer. I can't make it work with Circus though (at least on Windows, see circus-tent/circus#1058). Do you know if Circus + Daphne (multi-process with shared socket) has already been tested?

@andrewgodwin
Member

I have not tested it, and I don't think file handle passing works super well on Windows, unfortunately.

@jxrossel
Author

jxrossel commented Mar 8, 2018

Ok, thanks for the input. I guess I'll stick with v1 then.

@andrewgodwin
Member

You are free to, but please be aware it will get no updates or support. I'd recommend developing on Windows Subsystem for Linux or inside a VM if you need to run more than one process on a development machine. If you need to run things in production, I recommend deploying onto Linux rather than Windows (but I realise this is not always possible).

@ricleal

ricleal commented Apr 26, 2018

Hi All,
Anyone found a solution for this (using systemd)?

I have a socket /usr/lib/systemd/system/daphne.socket to communicate with nginx:

[Unit]
Description=Daphne Socket for the API
PartOf=daphne@.service

[Socket]
ListenStream=/usr/local/reduction/dist/daphne.sock
SocketMode=0660
SocketUser=reduction
SocketGroup=reduction

[Install]
WantedBy=sockets.target

And a service /usr/lib/systemd/system/daphne@.service:

[Unit]
Description=Daphne Service For Django %I
After=syslog.target
After=network.target
After=postgresql-9.5.service
After=nginx.service

[Service]
Type=simple
RuntimeDirectory=daphne
PIDFile=/run/daphne.pid
WorkingDirectory=/usr/local/reduction/src

ExecStart=/usr/local/reduction/venv/bin/daphne --fd %i \
    server.asgi:application
User=reduction

[Install]
WantedBy=multi-user.target

Linux file descriptors are:

STDIN = 0
STDOUT = 1
STDERR = 2

Launching the service as sudo systemctl start daphne@{0..2}.service (which is the same as launching 3 daphne processes with --fd 0, --fd 1 and --fd 2), I get the errors:

2018-04-27 08:31:33,166 INFO     Configuring endpoint fd:fileno=0
2018-04-27 08:31:33,177 CRITICAL Listen failure: [Errno 88] Socket operation on non-socket

2018-04-27 08:31:33,471 INFO     Configuring endpoint fd:fileno=1
2018-04-27 08:31:33,481 CRITICAL Listen failure: index out of range

2018-04-27 08:36:31,439 INFO     Configuring endpoint fd:fileno=2
2018-04-27 08:36:31,455 CRITICAL Listen failure: index out of range

Launching with fds 3 to 5:

2018-04-27 08:38:42,867 INFO     Configuring endpoint fd:fileno=3
2018-04-27 08:38:42,876 CRITICAL Listen failure: [Errno 88] Socket operation on non-socket

2018-04-27 08:38:42,645 INFO     Configuring endpoint fd:fileno=4
2018-04-27 08:38:42,657 CRITICAL Listen failure: [Errno 88] Socket operation on non-socket

2018-04-27 08:38:42,692 INFO     Configuring endpoint fd:fileno=5
2018-04-27 08:38:42,724 CRITICAL Listen failure: index out of range

Any idea what I'm doing wrong?
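
One likely explanation: fds 0-2 are always the standard streams, so --fd 0..2 can never point at a socket, and systemd hands socket-activated services their sockets starting at fd 3 (SD_LISTEN_FDS_START), but only when the service is actually started through its .socket unit. A minimal check of what a service really received (assuming Python 3; the unit above listens on a Unix socket):

    import os
    import socket

    # systemd sets LISTEN_FDS when it socket-activates a service; the
    # passed sockets start at fd 3 (SD_LISTEN_FDS_START). Without
    # activation, fd 3 is not a socket, which matches
    # "[Errno 88] Socket operation on non-socket" above.
    if "LISTEN_FDS" in os.environ:
        sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM, fileno=3)
        print(sock.getsockname())
    else:
        print("not started via socket activation")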

@ricleal

ricleal commented Apr 27, 2018

I have something running... I tried a script that uses the same principle: https://gist.github.com/ricleal/7db44dc626c01c25f1b461ae5ad7d5b1

For the script, it works, but not with daphne... Maybe because it only accepts an int as the file descriptor. Is that a valid assumption?

Here are my scripts:

$ cat /usr/lib/systemd/system/daphne@.service

[Unit]
Description=Daphne Service For Django %I
After=syslog.target
After=network.target
After=postgresql-9.5.service
After=nginx.service

Requires=daphne.socket

[Service]
Type=simple
RuntimeDirectory=reduction

PIDFile=/run/daphne.pid

WorkingDirectory=/usr/local/reduction/src

ExecStart=/usr/local/reduction/venv/bin/daphne --fd %i  \
    server.asgi:application

[Install]
WantedBy=multi-user.target
$ cat /usr/lib/systemd/system/daphne.socket 
[Unit]
Description=Daphne Socket

[Socket]
ListenStream=/usr/local/reduction/dist/daphne.sock
Accept=yes

[Install]
WantedBy=sockets.target

The service starts as: sudo systemctl restart daphne.socket

When I refresh the browser, this is what journalctl shows:

usage: daphne [-h] [-p PORT] [-b HOST] [--websocket_timeout WEBSOCKET_TIMEOUT]
 [--websocket_connect_timeout WEBSOCKET_CONNECT_TIMEOUT]
 [-u UNIX_SOCKET] [--fd FILE_DESCRIPTOR] [-e SOCKET_STRINGS]
 [-v VERBOSITY] [-t HTTP_TIMEOUT] [--access-log ACCESS_LOG]
 [--ping-interval PING_INTERVAL] [--ping-timeout PING_TIMEOUT]
 [--application-close-timeout APPLICATION_CLOSE_TIMEOUT]
 [--ws-protocol [WS_PROTOCOLS [WS_PROTOCOLS ...]]]
 [--root-path ROOT_PATH] [--proxy-headers]
 application
daphne: error: argument --fd: invalid int value: '11-11470-991'

The name of the file descriptor is 11-11470-991, which is the fd number followed by the user and group ids.
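
That instance name comes from Accept=yes: systemd then spawns one service instance per incoming connection and derives the instance name (%i) from the connection, so it is no longer an integer fd. Keeping the default Accept=no hands the listening socket itself to a single long-lived service instance, e.g.:

    [Socket]
    ListenStream=/usr/local/reduction/dist/daphne.sock
    # Accept=no (the default): the service receives the listening socket
    # itself, instead of one spawned instance per accepted connection.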

@agronick
Contributor

agronick commented May 3, 2018 via email

@ricleal

ricleal commented May 4, 2018

Until someone finds a way of doing it, I have played with NGINX. Since my websockets start with /ws, I have one instance of daphne serving them, and multiple uWSGI workers serving the rest of the requests.

@karolyi
Contributor

karolyi commented May 8, 2018

@agronick care to share your solution's details? I'm about to deploy daphne 2 and I'm looking for a solution too.

@ricleal

ricleal commented May 8, 2018

It would be nice to have a solution. I think so far no one has managed to run daphne with multiple processes...

@ricleal

ricleal commented May 8, 2018

This only works on Ubuntu 16.04 (systemd v229). I can't get it working on RHEL 7 (systemd v219).
Also, Unix sockets don't appear to work; we need to use hosts and ports.

/usr/lib/systemd/system/daphne@.service

[Unit]
Description=Daphne Worker %i
After=syslog.target
After=network.target
After=nginx.service

Requires=daphne@%i.socket

[Service]
Type=simple
PIDFile=/run/daphne.pid
WorkingDirectory=/usr/local/reduction/src

ExecStart=/usr/local/reduction/venv/bin/daphne \
    -e systemd:domain=INET:index=0 \
    server.asgi:application

NonBlocking=true

[Install]
WantedBy=multi-user.target

/usr/lib/systemd/system/daphne@.socket

[Unit]
Description = Daphne Socket for worker %i

[Socket]
ListenStream = 8888
Service = daphne@%i.service
ReusePort=true

[Install]
WantedBy = sockets.target

Launch as: sudo systemctl daemon-reload && sudo systemctl start daphne@{1..4}.socket for 4 workers.

In your nginx you should have something like this:

    upstream app_server {
        server 0.0.0.0:8888;
    }
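
To complete the picture, a sketch of the server block that usually goes with that upstream (the Upgrade/Connection headers are what let websockets through; names are placeholders):

    server {
        listen 80;

        location / {
            proxy_pass http://app_server;
            proxy_http_version 1.1;
            proxy_set_header Upgrade $http_upgrade;
            proxy_set_header Connection "upgrade";
            proxy_set_header Host $host;
        }
    }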

@karolyi
Contributor

karolyi commented May 8, 2018

I'd vote for a non-vendor-locked solution; besides, the systemd variant lacks draining of the switched-off service.

I'll probably have to implement an extra message telling websocket clients to close their connections gracefully and then reconnect. Other than that, haproxy can keep new connections away from a draining Daphne instance, so with some extra internal implementation it can shut down gracefully once all the websocket connections are closed.
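
As a sketch of that haproxy idea (backend names, ports, and the admin socket path are assumed, not from this thread): run the old and new Daphne instances as servers in one backend, and flip the old one into drain state via the runtime API so it stops receiving new connections while existing websockets finish:

    backend daphne
        server daphne_a 127.0.0.1:8001 check
        server daphne_b 127.0.0.1:8002 check

    # At deploy time, over the haproxy admin socket (requires a
    # "stats socket ... level admin" in the haproxy config):
    #   echo "set server daphne/daphne_a state drain" | socat stdio /run/haproxy.sock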

@agronick
Contributor

The way I did it is to have Ansible set it all up. I can tune the number of workers with simple variables. Ansible creates the systemd service files and the Nginx config, so they always match up. If I want 20 workers tomorrow I can just run it and know the Nginx config is right, the sockets are there, and the service files are there.

I have two worker groups A and B. When A is running and I want to deploy new code I spin up B and put A into a drain state. That way all new connections get the new code and anyone who is still connected can take their time disconnecting.

I have Ansible poll for 4 hours, waiting for there to be 0 connections on the inactive worker group. If people haven't disconnected by then they get kicked off and usually automatically reconnect with Channels' websocket bridge.

When I want to deploy again I take A from offline, make it active, and put B into drain, constantly switching between the two. This limits the number of deployments you can do: usually one a day without interrupting anyone who is in the middle of something.

I would post the code but I work for a big company that likes to fire people over stupid things.
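
Since the original isn't shared, here is a rough shell sketch of just the "wait for drain" step (the port and timeout are assumptions for illustration):

    # Poll for up to 4 hours until the draining group's listener
    # (assumed here to be :8888) has zero established connections.
    deadline=$(( $(date +%s) + 4*3600 ))
    while [ "$(ss -Hn state established '( sport = :8888 )' | wc -l)" -gt 0 ]; do
        [ "$(date +%s)" -ge "$deadline" ] && break
        sleep 30
    done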

@karolyi
Contributor

karolyi commented May 12, 2018

@agronick this all seems fine, but what happens when you deploy often and your 'other' daphne instances aren't drained off yet?

Leaving this here for information: I talked to the guys on IRC, and they said it's better to make the clients and the implementation robust enough that they will reconnect and continue without data loss. In my case, this basically happens by using transaction IDs for the payments I control through websockets.

@agronick
Contributor

@karolyi Nothing stops you from having worker groups C, D, E, F, and G if you have the memory to run them. I find that if I'm doing two deployments in one day it is probably because the first one broke something.

I use the Channels websocket bridge to do reconnecting. But a big challenge is that you need to be able to get your consumer into the state it was in before it disconnected. If message 3 comes in on a new connection and the state wasn't set up with messages 1 and 2, you need to simulate processing messages 1 and 2 before you work with message 3.

@jxrossel
Author

Just a heads up: it actually works with Circus. The Windows issue comes from Twisted (on which Daphne is built): see circus-tent/circus#1058 (comment)
