Feature Request/Question: Running this in Docker #23
Hi @kaysond. Thanks for the suggestion. We have considered this, as it would be neat to be able to use docker services to deploy DIND. As you suggest, it's possible to gain most of the required privileges by launching the container using […]. However, without also launching the DIND container with […]. Further, it seems the DIND container is not permitted to use […]. And so while one can't launch a DIND service directly, one can launch a DIND container. Here's a Debian POC (which assumes […]):
As you suggest, it is also possible to bootstrap this DIND container using a wrapper service. Here's a POC of that:
As to your remark about "grabbing the ingress network IPs", that step only needs to be done once, before DIND is launched, and in general it is not necessary or desirable to grab all ingress IPs, only those of the nodes one wishes to use as load-balancer(s), as it is this subset of IPs that one passes to […]. And so finally, subject to these limitations, it seems it is possible to arrange for DIND to be launched in this way via a wrapper service. Two potentially interesting ways to extend DIND, following this exercise, are: […]
Please let us know whether (1) and/or (2) sound worth progressing.
You can bind mount […].
I'm less familiar with this one, but […]. In any case, it seems that you're only using […].
I'm assuming by ingress gateways you mean the host IPs of the nodes you want to apply the iptables rules on. In that case, it would still be nice to have this happen automatically, because you could then leverage node labels to dynamically assign/unassign load-balancer nodes to the daemon. With the above, I think you can mirror my approach, which is a service that launches services. To me, that's preferable to a service that launches containers. This is closer to #2, but I think you still want an official image a la #1. In my case, I used the same Docker Hub image both for the parent service (one per swarm) and the child service (global; one per node), using an env var to distinguish between the two.
Hi @kaysond. I've been considering your approach. The problem I see with it is that while bind-mounting […]. Otherwise, I believe it may be necessary - and indeed still satisfactory - to have the wrapper global service launch privileged containers. On this, you said you preferred a wrapper service that launches services: I still don't understand the benefits of this over a wrapper service that launches containers. Could you elaborate on this?
Let me dig into this a little bit this weekend and get back to you. If I remember correctly from my previous investigations, all of the network namespaces are available via […].
It's minor, but doing it the way I suggested means you have a parent swarm service with one replica running on a manager node, and then another child service running globally or on selected nodes (or based on label, etc.). Your suggestion would have a global parent service container on every node, and also the child container on every selected node.
Given your show of confidence - thanks for that - I've done some digging myself, and I believe I've worked it out. For a container […].
This should allow much of DIND to operate without […].
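For illustration, one common way to reach a container's network namespace from the host without granting the container itself extra privileges is via the container's PID (a sketch only; the task container name is a hypothetical placeholder, and this assumes root on the host):

```shell
# Find the container's init PID, then enter its network namespace.
# "myservice.1.abc123" is a made-up task container name.
PID=$(docker inspect --format '{{.State.Pid}}' myservice.1.abc123)

# Run a command (here: list interfaces) inside that namespace.
nsenter --net=/proc/"$PID"/ns/net ip addr show
```

The same `nsenter --net` handle can be used to run `iptables`, `ip rule`, or `sysctl` inside the target namespace.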
We now actually have a POC implementation of a DIND service that works this way, which I hope to commit to a branch soon.

P.S. Regarding automatic detection of node ingress network IPs, by node labels: I can see some value in this, and the idea of scraping them from service logs is a good one. But we've experienced issues with […].
The POC implementation of a DIND service can now be found at https://github.com/newsnowlabs/docker-ingress-routing-daemon/tree/service-v1.

P.S. On further consideration of the automatic detection of node ingress network IPs: any process calling […].
Nice! That's basically what I was doing, except I was accessing the network's namespace via the network ID. Am I correct in understanding that you need the container's namespace because you have to add rules there for the outbound packets?
Interesting. I never ran into that issue in all my (admittedly limited) testing. Your workaround seems like a good idea.
Agreed
IMO the usefulness of being able to have node IPs picked up automatically and dynamically outweighs the loss of automatic sysctl tweaks. Also, I would say sysctl tweaks are outside the scope of a container orchestration system, but determining the selected gateway nodes and their IPs automatically based on node labels seems to be more in the realm of a container routing daemon!
I'll try to give this a shot over the weekend!
Hi @kaysond. I've mostly implemented the outer-wrapper service with autodetection of node IPs, and cross-checking of these IPs against node labels, with no loss of sysctl functionality. I expect to push what I have to the branch later today. To your questions:
Yes, for mapping the TOS byte (identifying the load-balancer node) on incoming packets to a connection mark, then mapping the connection mark on outgoing packets to a firewall mark, then matching the firewall mark to a routing table back to the correct load-balancer node.
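As a rough sketch of that TOS -> connmark -> fwmark -> routing-table chain (the values, variable names and rule details here are illustrative assumptions, not the exact rules the daemon installs), run inside a service container's network namespace:

```shell
# Illustrative only: NODE_ID stands for the TOS value the load-balancer
# node writes onto incoming packets; INGRESS_IP is that node's ingress IP.
NODE_ID=2
INGRESS_IP=10.0.0.2

# 1. Incoming: map the TOS byte to a connection mark.
iptables -t mangle -A PREROUTING -m tos --tos "$NODE_ID" \
  -j CONNMARK --set-xmark "$NODE_ID"

# 2. Outgoing: restore the connection mark onto the packet as a firewall mark.
iptables -t mangle -A OUTPUT -j CONNMARK --restore-mark

# 3. Route packets carrying that mark back via the originating load balancer.
ip rule add fwmark "$NODE_ID" lookup "$NODE_ID"
ip route add default via "$INGRESS_IP" table "$NODE_ID"
```

The key property is step 2/3: reply packets for a connection are steered back to whichever load-balancer node originally marked it, rather than out of the default ingress gateway.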
Regarding the sysctl tweaks, you might be correct that these would not be within scope of DIND, except that they proved necessary to make the performance of the system viable (and prevent issues on some kernels), and DIND is a convenient tool for applying them (particularly the service containers). Should we remove that functionality, the user would have to use docker's […].
Hi @kaysond. I'm pleased to let you know that I've now pushed a more complete implementation to the branch, and a ready-built image to Docker Hub. I'd appreciate you trying it out and commenting on the code. There are several ways to launch it. To launch as a manager container, with connected terminal (the easiest way to demo it and see what's going on):
To launch as a single-replica manager service:
(To simplify launching using these commands, please also see the https://github.com/newsnowlabs/docker-ingress-routing-daemon/blob/service-v1/docker/create-service.sh launch script.)

N.B. In the default mode (which is currently hard-coded) the manager service will assume all nodes to be load balancers, unless the […].

I've tested this out in a vanilla Docker Swarm launched on Google Cloud:
Looking forward to hearing from you, once you've had a chance to look at this.
I haven't forgotten about this! Just haven't had time. Should be able to take a look this weekend.
Thanks so much @kaysond. Given the changes in this branch do not remotely touch the core code, and that the documentation for the new service added to […].

P.S. See https://github.com/newsnowlabs/docker-ingress-routing-daemon/tree/service-v1 for the updated documentation.
Following further testing, I have applied several fixes and improvements to the branch:
Still on my radar, sorry! I am very excited to play around with this when I get a chance.
@struanb - Alright I finally had a chance to test this out! It worked just fine with a dummy traefik and whoami service. Here are my thoughts:
This gets back to the […]. It may be worth a little more research (or maybe another GitHub issue on moby) to see if there is a workaround for setting host sysctls in a network namespace. Example for the bug (the same thing happens if I use the host's local IP instead of 127.0.0.1): […]
Hi @kaysond. Thanks ever so much for looking at the branch, and for your thoughtful feedback.
This hole is very real; we experience it too. At the time we created DIND, we decided it was safer to work around it by allowing access to the relevant services via the Docker bridge IP range. I would agree this is not the best approach. According to our records, the alternative was to install one of these example iptables rules: […]. I think it may be appropriate to extend DIND (at least provide an option) to add a firewall rule of this sort to each host. What do you think, and what form of rule should it take?
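For discussion, one possible shape for such a rule, following the pattern Docker's own documentation suggests for restricting access to published ports via the `DOCKER-USER` chain (the interface name, port and source address below are placeholder assumptions, not a vetted rule):

```shell
# Drop traffic for a published port (8080 here) arriving on the external
# interface, unless it comes from a designated load-balancer node.
# eth0 and 192.0.2.10 are illustrative placeholders.
iptables -I DOCKER-USER -i eth0 -p tcp \
  -m conntrack --ctorigdstport 8080 ! -s 192.0.2.10 -j DROP
```

The `--ctorigdstport` match is needed because by the time forwarded traffic reaches `DOCKER-USER` it has already been DNATed, so the original destination port must be recovered from conntrack state.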
I agree […]. We had better at least rename […].
I believe you're right, judging by my earlier comment (#23 (comment)). Thanks for highlighting. We will look at this again.
It was convenient. I agree an alpine-based image would be leaner. Thanks for suggesting. We can certainly try it.
I agree this is unnecessary, though I found it a very useful way to run the manager container when debugging.
Good idea. Would you like to contribute some?
This would be a simplification, but it might be an over-simplification. After all, if you don't need node load-balancer autodiscovery, you don't need the manager service at all. I suspect it could be useful to retain this bit of explanation for people whose load-balancer node(s) are fixed and who prefer to avoid the risk of any mis-configuration leading to an inadvertent restart of the global service. (NewsNow's systems fall into this category.)
I agree it would be neater not to need the subcontainer, but I'm still not sure I see an objective concern here. A subcontainer is a bit like a subprocess, and you can't observe all your swarm service container's subprocesses from manager nodes either. So this preference seems subjective.
It's correct that a global service container can check whether it is running on a manager node. However, we need the manager process to run on one and only one manager node, and I can't see how that can be done from within the global service. This is why we need the manager-constrained 1-replica service.
So I agree with this.
I'm not convinced that we can rely on correct values being set by the operating system. These defaults may differ between distributions, and between distribution versions, and are also subject to change over time. We can't possibly test them all. And because of the early performance issues we experienced, I remain convinced that setting these sysctl values is really important. Although I'm sensitive to security concerns, right now I'm still not sure I see an objective concern here. The original mode of operation of the daemon is to run it as root on the host anyway, and so running it in a privileged container does not seem to grant any additional privileges to the daemon, and avoiding […].
If we can discover a more restricted capability that can be granted to allow the subcontainer to still run […]. The only way I see to avoid needing the sysctl now would be to require the administrator to (re)launch all their services with […].
Agreed. If you want to keep things simple, I would suggest using the interface-based rule. I think docker internally rotates the subnets it uses, and once it runs through some set of the […]. If you wanted to get fancy, since you already have access to the docker socket, you could actually scan for all the subnets being used, or even the interface names. Not sure it's worth the effort, though.
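The "scan for all the subnets" idea can be sketched with docker's built-in Go templating over the socket (a sketch only; assumes the docker CLI is available on the node):

```shell
# List the IPAM subnets of every docker network visible on this node.
for net in $(docker network ls --format '{{.Name}}'); do
  docker network inspect "$net" \
    --format '{{.Name}}: {{range .IPAM.Config}}{{.Subnet}} {{end}}'
done
```

The same inspection could feed generated iptables rules, so the rules track whatever subnets docker has actually allocated rather than a hard-coded range.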
Happy to. Once things are a little more finalized I can whip them up.
If you combined the manager service functionality into the global service, presumably it would only perform autodiscovery when the load-balancer IPs are not specified.
I may not be able to see all of the subprocesses, but I can see all of the containers across the swarm, and the swarm has visibility into every container as well, because the nodes report back. The problem with manually deploying a container is that now you're responsible for handling orchestration yourself, which to me is bad practice. What happens if the container crashes or gets killed? Or updates to a new version that doesn't work? What about healthchecks to make sure it's still running? Sure, you can probably try to address these things from within the global service, but the whole point of docker swarm is to do all this for you.
I think I mixed myself up a bit here, because I was trying to extend my methodology from trafficjam. I also had a single 1-replica, manager-constrained service that performed the autodiscovery, but it could just launch a global service with the necessary capabilities to create firewall rules on each node (since I didn't need […]).
That's fair.
I don't doubt it; I'm just not sure it's appropriate for a firewall/routing daemon to configure kernel settings, and I still don't think it's worth the complexity/confusion and lack of orchestration I mentioned above. Most (all?) of the enterprise software I've worked with recommends the necessary kernel settings, and even warns when they're not set, but doesn't actually set them. Though, to be fair, since this is only being set in the ingress network namespace... I wouldn't really be opposed to that happening automatically if it had less of a cost.
I actually have no problem with running […].
I created moby/moby#43769. We can see if there are any useful responses.
Does that actually work? Isn't that setting the sysctl inside each service's namespace, where you are setting the sysctl in the […]?
Ok, so I think I found a good "workaround" that addresses some of my concern about the extra child container. (Maybe you're doing this or something similar already, but from what I remember of the code this is not the case.) Take a look at moby/swarmkit#1030 (comment). Rather than running a global service of your own image which then manages the child container, you use the global service as a dummy that simply launches the child container as its command. So the manager service would execute something like: […]
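A sketch of that dummy-service pattern (image names and flags here are illustrative assumptions; the docker CLI image reaches the host's daemon through the bind-mounted socket):

```shell
# Global "dummy" service: each task just runs `docker run` against the
# node's own daemon, launching the real workload as a plain (privileged)
# container on that node. "example/daemon-image" is a placeholder.
docker service create --name daemon-launcher --mode global \
  --mount type=bind,source=/var/run/docker.sock,target=/var/run/docker.sock \
  docker:cli \
  docker run --rm --privileged --network host --pid host \
    example/daemon-image
```

Because the service task's lifetime is tied to the `docker run` command, swarm still restarts the workload if it exits, even though the privileged container itself is not a service task.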
Ok so the conclusion of moby/moby#43769 is basically that we're not going to see privileged services anytime soon, but the workaround in the above comment is the generally accepted solution. So I was thinking a good way to do this all is:
This is more secure, because only the privileged docker container is used for the sysctl settings, while the persistent service has limited capabilities. It's clean, because the privileged container runs once on startup and then exits. And it's robust, because you get the docker swarm orchestration monitoring the whole thing! Let me know what you think.
Hi @kaysond. Apologies for the long delay in resuming work on this issue. We have been affected by illness in our team. For now, we are happy enough with the service model implemented in the […]. We do still want to bugfix the […]. On this basis, we will keep this issue open for now.
I'm curious if you've thought about/tried installing or deploying this from within docker itself. It would be really convenient to use docker's own orchestration to get this running on every node. I took a quick look through the code, and I think you could do this using a docker-dind image with host networking (and some CAP_ADDs).
I've done something similar in https://github.com/kaysond/trafficjam
I think the one tricky bit is grabbing the ingress network IPs without having to run the daemon manually on each node. I had a similar issue, where I needed to find the internal load-balancer IPs. I ended up with a very hacky but totally functional/reliable method: trafficjam runs as a service on a manager node, which itself deploys "child" services on all nodes. The child services are able to detect and report their load-balancer IPs via the docker logs. The parent service monitors those logs, and when it finds new load-balancer IPs, it restarts all the child services.
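The log-scraping step can be sketched like this (the `LB_IP=` line format is invented for illustration; trafficjam's real log format differs, and in practice the input would come from `docker service logs` rather than `printf`):

```shell
# Each child prints its detected load-balancer IP to stdout; the parent
# extracts the unique IPs from the aggregated service logs.
# Sample log lines stand in for: docker service logs child
printf 'child.1.xyz | LB_IP=10.0.0.2\nchild.2.abc | LB_IP=10.0.0.3\n' |
  sed -n 's/.*LB_IP=\([0-9.]*\).*/\1/p' | sort -u
# → 10.0.0.2
#   10.0.0.3
```

The parent can diff this output against the previously seen set of IPs and restart the child services only when it changes.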