
[Question] What does this do about churn (& related madness)? #6

Open
faddat opened this issue Feb 1, 2016 · 3 comments

Comments


faddat commented Feb 1, 2016

churn (as pertains to p2p networks) - a node entering or leaving the network.

The only piece that I don't see here is that one. I'm going to describe the solution we have planned for dealing with it below, and I'd love to hear your thoughts on the extent to which this solution:

A) Scales
B) Works alongside what you've got now overall

My initial thought was to centralize storage, but centralization (blah blah blah)

My new thought is to use something like Syncthing to ensure that all of the nodes always have the same registry of images and container diffs. That way, when a node goes down and something (might want to have a look at the unfinished github.com/superordinate/kdaemon) brings its containers back up, things are exactly as they were. For the use cases I'm pursuing to work out, I have to create a situation where containers are relaunched when their host goes down, and I also have to account for the fact that we won't be able to predict nodes' departure from the network.
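The relaunch-after-churn step described above can be sketched as a reconciliation between the shared registry and what is actually running. This is a minimal, hypothetical sketch; the `Container` type and `plan_relaunches` function are illustrative names, not part of any project mentioned here.

```python
# Hypothetical sketch: diff a synced container registry against what is
# actually running, to decide what a churn-recovery daemon should relaunch.

from dataclasses import dataclass

@dataclass(frozen=True)
class Container:
    name: str
    image: str

def plan_relaunches(registry, running_names):
    """Return containers listed in the shared registry but not currently running."""
    return [c for c in registry if c.name not in running_names]

# Example: the synced registry says two containers should exist; only one is up
# (as if its host just left the network).
registry = [Container("web", "nginx:1.9"), Container("db", "postgres:9.4")]
running = {"web"}
missing = plan_relaunches(registry, running)
print([c.name for c in missing])  # -> ['db']
```

A real version would feed `running` from the container runtime and hand `missing` to whatever respawns containers on a surviving host.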

(I should mention that I do not consider this a great or full solution)

@faddat faddat changed the title [Question] What does this do about churn & health checks? [Question] What does this do about churn (& related madness)? Feb 1, 2016

stlalpha commented Feb 3, 2016

We agree, and we've been looking at different techniques to keep the info blorbs synchronized (both purposefully executed saves and diffs, as well as continuously updating CPU/mem state deltas for shadow nodes), including something like Syncthing, but possibly mechanized (or via another appropriate mechanism) through the externalized cnvm tools, so that all necessary states and bits are tracked and flagged for push as appropriate. Really, it's all about state maintenance.

So I think using something like Syncthing as a transport isn't a bad idea, but we may want/need a way to do effective QoS or prioritization of the different data types across it. One of those data types could certainly be node/cluster state (for the "re-animation" post shut-down that you describe above).
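The prioritization idea above can be sketched as a simple priority queue in front of the transport: node/cluster state ships before container diffs, which ship before bulk image layers. The priority ordering and type names here are illustrative assumptions, not anything defined in this thread.

```python
# Hypothetical sketch: prioritize data types queued for a Syncthing-like
# transport. Smaller priority number = ships first; a counter keeps FIFO
# order within the same priority.

import heapq
import itertools

PRIORITY = {"node-state": 0, "container-diff": 1, "image-layer": 2}
_counter = itertools.count()
queue = []

def enqueue(kind, payload):
    heapq.heappush(queue, (PRIORITY[kind], next(_counter), kind, payload))

def drain():
    """Yield queued items in priority order, emptying the queue."""
    while queue:
        _, _, kind, payload = heapq.heappop(queue)
        yield kind, payload

# Items arrive in arbitrary order...
enqueue("image-layer", "layer-abc")
enqueue("node-state", "node-7 heartbeat")
enqueue("container-diff", "web rw-layer delta")

# ...but cluster state jumps the line.
print([kind for kind, _ in drain()])
# -> ['node-state', 'container-diff', 'image-layer']
```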

Have you tried just shipping the filesystem structures across with it as docker stands? Does it work?


faddat commented Feb 3, 2016

Actually, I am doing that right now. Here's the arch:

GlusterFS (surprisingly easy!) running on super-peers, from which we're assuming the highest level of reliability; not 100%, mind you, but high. The state bit does get tough, doesn't it? Where is state stored in a typical deploy? I've seen CRIU in action before, but I have no idea of its internals. Anyway, our leaf nodes will use the GlusterFS volume driver to store their Docker data. The super-peer is a health-check runner, and if it finds that a node or container is missing, it respawns that node's Docker containers. I'll let you know how the test goes when it's finished :).
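The super-peer health-check runner described above can be sketched as: probe each leaf node, and collect the containers of unreachable nodes for respawning elsewhere. Everything here is a hypothetical illustration; `find_orphans` and the fake `probe` are assumed names, and a real probe might be a TCP connect, a ping, or a Docker API call.

```python
# Hypothetical sketch of a super-peer health-check pass. probe(node) -> bool
# answers "is this leaf node alive?"; here it is stubbed with a set lookup.

def find_orphans(nodes, probe):
    """nodes: {node_name: [container names]}. Returns the containers whose
    host failed the health check, in deterministic (sorted-node) order."""
    orphans = []
    for node, containers in sorted(nodes.items()):
        if not probe(node):
            orphans.extend(containers)
    return orphans

# Example with a fake probe: leaf-2 has dropped out of the network.
nodes = {"leaf-1": ["web"], "leaf-2": ["db", "cache"]}
alive = {"leaf-1"}
orphans = find_orphans(nodes, lambda n: n in alive)
print(orphans)  # -> ['db', 'cache']
```

In the architecture above, the respawn step would then recreate each orphaned container from the image and diffs stored on the shared GlusterFS volume.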


faddat commented Feb 18, 2016

Looks like we are on to a ZFS pool strategy. I'll let you know how it pans out in a global deployment.
