[Question] What does this do about churn (& related madness)? #6
Comments
We agree - and we have been looking at some different techniques to keep the info blorbs synchronized (both purposefully executed saves, diffs, etc., as well as continuously updating CPU/memory state deltas for shadow nodes), including something like syncthing, but maybe mechanized via the externalized cnvm tools (or another appropriate mechanism) so that all necessary states and bits are tracked and flagged for push as appropriate - really, it's all about state maintenance. So I think using something like syncthing as a transport isn't a bad idea, but we may want/need a way to do things like effective QoS or prioritization of different data types across it. One of those data types could certainly be node/cluster state (for the "re-animation" post-shutdown, as you describe above). Have you tried just shipping the filesystem structures across with it as Docker stands? Does it work?
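To make the QoS/prioritization idea concrete, here is a minimal sketch of how pending pushes could be ordered by data type before being handed to whatever transport (syncthing or otherwise) ships them. The priority values, data-type names, and paths are hypothetical, not anything from the cnvm tooling itself; the point is only that small, frequently-changing state deltas jump ahead of bulk image data.

```python
import heapq
import itertools

# Hypothetical priority levels for the data types discussed above: state
# deltas ship before bulk image/layer data when both are queued.
PRIORITY = {
    "node_state_delta": 0,   # cpu/mem state deltas for shadow nodes
    "cluster_state": 1,      # metadata needed for post-shutdown "re-animation"
    "container_diff": 2,
    "image_layer": 3,        # bulk data, lowest priority
}

class ReplicationQueue:
    """Orders pending pushes so higher-priority data types ship first."""

    def __init__(self):
        self._heap = []
        self._counter = itertools.count()  # tie-breaker keeps FIFO order per priority

    def enqueue(self, data_type, path):
        prio = PRIORITY.get(data_type, max(PRIORITY.values()) + 1)
        heapq.heappush(self._heap, (prio, next(self._counter), data_type, path))

    def drain(self, push):
        """Hand each queued item to `push` (e.g. copy it into a syncthing folder)."""
        while self._heap:
            _, _, data_type, path = heapq.heappop(self._heap)
            push(data_type, path)

# Example: the state delta is pushed before the image layer even though it
# was queued later.
q = ReplicationQueue()
q.enqueue("image_layer", "/var/lib/cnvm/layers/abc123")
q.enqueue("node_state_delta", "/var/lib/cnvm/state/node7.delta")
q.drain(lambda t, p: print(f"pushing {t}: {p}"))
```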
Actually, I am doing that right now. Here's the arch: GlusterFS (surprisingly easy!) running on super-peers, which we're assuming the highest level of reliability from - not 100%, mind you, but high. The state bit does get tough, doesn't it? Where is state stored in a typical deploy? I've seen CRIU in action before, but I have no idea about its structure. Anyway, our leaf nodes will use the glusterfs volume driver to store their Docker stuff. The super-peer is a health-check runner, and if it finds that a node/docker is missing, it will respawn that node's Docker containers. I'll let you know how the test goes when it's finished :).
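For illustration, here is a rough sketch of what that super-peer health-check/respawn loop might look like, assuming docker-py and leaf engines reachable over the TCP API on port 2375. The `DESIRED` manifest, hostnames, and fallback node are hypothetical placeholders, not part of the setup described above; in that setup the manifest would presumably live on the GlusterFS volume so any super-peer can read it.

```python
import docker  # docker-py; assumes the leaf nodes' Docker engines expose their TCP API

# Hypothetical manifest of what should be running on which leaf node.
DESIRED = {
    "leaf-01": [{"name": "web", "image": "nginx:latest"}],
    "leaf-02": [{"name": "worker", "image": "busybox", "command": "sleep 3600"}],
}

def node_client(host):
    """Connect to a leaf node's Docker engine; return None if it is unreachable."""
    try:
        client = docker.DockerClient(base_url=f"tcp://{host}:2375", timeout=5)
        client.ping()
        return client
    except Exception:
        return None

def reconcile(fallback_host="leaf-02"):
    """One health-check pass: respawn any container that should exist but doesn't."""
    for host, containers in DESIRED.items():
        client = node_client(host)
        # If the node itself is gone, respawn its containers on a surviving node.
        target = client or node_client(fallback_host)
        if target is None:
            continue
        running = {c.name for c in target.containers.list()}
        for spec in containers:
            if spec["name"] not in running:
                target.containers.run(
                    spec["image"],
                    command=spec.get("command"),
                    name=spec["name"],
                    detach=True,
                )

if __name__ == "__main__":
    reconcile()
```

Data survives the respawn only because the containers' volumes sit on the shared GlusterFS volume rather than on the dead node's local disk.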
Looks like we are onto a ZFS pool strategy. I will let you know how it goes.
churn (as it pertains to p2p networks): a node entering or leaving the network.
The only piece that I don't see here is that one. I'm going to describe the solution we have planned for dealing with it below, and I would love to hear your thoughts on the extent to which this solution
A) Scales
B) Works alongside what you've got now overall
My initial thought was to centralize storage, but centralization (blah blah blah)
My new thought is to use something like SyncThing to ensure that all of the nodes always have the same registry of images & container diffs. This way, when a node goes down and something (might want to have a look at the unfinished github.com/superordinate/kdaemon) brings the containers back up, things are exactly as they were. For the kinds of use cases I'm pursuing to work out, I've got to create a situation where containers are relaunched when their host goes down, and I also must account for the fact that we will not be able to predict nodes' departure from the network.
(I should mention that I do not consider this a great or full solution)
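As a rough sketch of the "same registry of images on every node" half of that idea: each node could import any image tarballs that SyncThing has replicated into a shared folder but that the local Docker engine doesn't yet have. The `SYNC_DIR` path, the `docker save`-style tarball naming convention, and the use of docker-py are all assumptions for illustration, not part of the plan described above.

```python
import os
import docker  # docker-py talking to the local Docker engine

SYNC_DIR = "/var/syncthing/image-registry"  # hypothetical syncthing-shared folder

def local_tags(client):
    """Set of image tags already present on this node."""
    return {tag for image in client.images.list() for tag in image.tags}

def import_new_images(client):
    """Load any replicated image tarballs this node hasn't imported yet.

    Assumes peers export images with `docker save -o <tag>.tar <tag>` into
    SYNC_DIR, so every node converges on the same image set regardless of
    which peer originally built or pulled the image.
    """
    have = local_tags(client)
    for entry in os.listdir(SYNC_DIR):
        if not entry.endswith(".tar"):
            continue
        # Hypothetical naming convention: "library__nginx:latest.tar" -> "library/nginx:latest"
        tag = entry[: -len(".tar")].replace("__", "/")
        if tag in have:
            continue
        with open(os.path.join(SYNC_DIR, entry), "rb") as f:
            client.images.load(f.read())

if __name__ == "__main__":
    import_new_images(docker.from_env())
```

With the image set identical everywhere, whatever relaunches the containers after a node departs (kdaemon or otherwise) can start them on any surviving node without pulling from a central registry first.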