This repository has been archived by the owner on Mar 25, 2019. It is now read-only.

Add Velum-Salt integration RFC #3

Open

wants to merge 2 commits into base: master
Conversation

ereslibre

@ereslibre ereslibre commented Jun 28, 2018

Rendered

Salt sometimes cannot insert some events into the database because they are too
big to fit in a single transaction. This leads to Velum missing these events
and never reacting to them (never marking nodes as successfully bootstrapped, for
example).

Debugging is also cumbersome because of the large number of events present in
the database, most of which are of no interest for debugging purposes.

@inercia

inercia commented Jun 28, 2018

I've never been fond of this kind of solution, where we add Velum to the equation and end up with more dependencies and more coupling. With your proposal, events would go through haproxy and Velum, and I can imagine how many headaches things like updates and certificates would cause us. I think it would be easier to implement a custom returner in Python that could save all the events we are interested in (it doesn't need to be specific to orchestration), or our own engine (the reactor is an engine).
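For reference, a rough sketch of what such a custom returner could look like. This is not actual CaaSP code: Salt calls a configured returner module's `returner(ret)` function for each job return, but here `sqlite3` stands in for the real MySQL database just to keep the sketch self-contained, and the table name, field names, and function filter are all illustrative assumptions.

```python
import json
import sqlite3

# sqlite3 stands in for the MySQL database Velum reads from, purely to
# keep the sketch self-contained; table and column names are illustrative.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE salt_returns (jid TEXT, minion TEXT, fun TEXT, data TEXT)")

# Only returns from functions Velum cares about get persisted; everything
# else on the bus is ignored instead of bloating the database.
INTERESTING_FUNCTIONS = {"state.highstate", "state.orchestrate"}

def returner(ret):
    """Entry point Salt would call on the master for each job return."""
    if ret.get("fun") not in INTERESTING_FUNCTIONS:
        return
    db.execute(
        "INSERT INTO salt_returns VALUES (?, ?, ?, ?)",
        (ret.get("jid"), ret.get("id"), ret.get("fun"),
         json.dumps(ret.get("return"))),
    )
    db.commit()

# A return Velum cares about, and one it does not.
returner({"jid": "20180628", "id": "minion-1", "fun": "state.highstate",
          "return": {"result": True}})
returner({"jid": "20180629", "id": "minion-1", "fun": "test.ping",
          "return": True})

rows = db.execute("SELECT jid, minion, fun FROM salt_returns").fetchall()
print(rows)  # [('20180628', 'minion-1', 'state.highstate')]
```

The point of the sketch is that filtering happens before anything touches the database, so uninteresting events never become oversized transactions in the first place.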

@ereslibre
Author

ereslibre commented Jun 28, 2018

> With your proposal, events would go through haproxy and Velum, and I can imagine how many headaches things like updates and certificates would cause us.

I don't understand this sentence.

> I think it would be easier to implement a custom returner in Python that could save all the events we are interested in (it doesn't need to be specific to orchestration), or our own engine (the reactor is an engine).

We could write our own engine or write a custom returner. The problem still stands: we save the events (where?), and Velum reads them (how?).

@inercia

inercia commented Jun 28, 2018

>> With your proposal, events would go through haproxy and Velum, and I can imagine how many headaches things like updates and certificates would cause us.
>
> I don't understand this sentence.

I mean that your proposal significantly increases the number of moving parts in the architecture, and things like certificates and updates are declared enemies of moving parts.

>> I think it would be easier to implement a custom returner in Python that could save all the events we are interested in (it doesn't need to be specific to orchestration), or our own engine (the reactor is an engine).
>
> We could write our own engine or write a custom returner. The problem still stands: we save the events (where?), and Velum reads them (how?).

The problem always stands, but the solution can be different. In this case, instead of sending the flow of information through 1) the reactor, 2) which serializes it and converts it to an HTTP request that 3) goes to haproxy, which 4) sends it to a web server that 5) passes it to a Ruby program that 6) deserializes it, 7) serializes it in a different way and 8) persists it in a database, I propose to 1) serialize the right events and 2) save them to the same database (there's the where). And Velum could continue to read the data in the same way (there's the how).

@ereslibre
Author

> I mean that your proposal significantly increases the number of moving parts in the architecture, and things like certificates and updates are declared enemies of moving parts.

The moving part is already there with the pillar API, and the pillar API is also served using valid certificates. It's not as if this solution introduces a new moving part.

I don't quite get what the difference would be. The admin node is updated first, and with it, all the configurations. I agree there could be a timeframe in which the certificate for Velum could have expired and Salt could not connect to the API (because it requires a valid certificate), but that's nothing we cannot solve, I'd say.

> The problem always stands, but the solution can be different. In this case, instead of sending the flow of information through 1) the reactor, 2) which serializes it and converts it to an HTTP request that 3) goes to haproxy, which 4) sends it to a web server that 5) passes it to a Ruby program that 6) deserializes it, 7) serializes it in a different way and 8) persists it in a database, I propose to 1) serialize the right events and 2) save them to the same database (there's the where). And Velum could continue to read the data in the same way (there's the how).

I have to say I find interesting the false sensation created by the "8 steps vs. 2 steps" comparison: one solution looks incredibly complex and the other incredibly simple.

Different components in a system need to talk to each other through "gates" built for that purpose. A well-maintained and documented API, even an internal one, is a good thing to have, in my opinion. It brings stability. Shortcuts sometimes lead to bad coupling, as in our case: two different components writing to and reading from the same storage, a relational database.

For me, it's time to leave behind the days of a shared database between two components that just need to talk to each other. The same principle by which we don't trigger events in Salt by writing actions to the database or a text file, but use the salt-api instead, should also apply in the other direction.

But of course, this has been a discussion since the pre-1.0 days, one on which we could never reach an agreement.

@flavio
Member

flavio commented Jun 28, 2018

Sorry, maybe it's just me, but I fail to see how @ereslibre's or @inercia's proposals would allow us to split the big events (the ones that are choking MySQL right now).

Could you elaborate a bit on that?

@ereslibre
Author

ereslibre commented Jun 28, 2018

> Sorry, maybe it's just me, but I fail to see how @ereslibre's or @inercia's proposals would allow us to split the big events (the ones that are choking MySQL right now).
>
> Could you elaborate a bit on that?

My idea (my fault for not specifying it explicitly in the RFC) was to have Salt notify Velum with a minimal amount of information in the HTTP callback (e.g. the jid, the result, and whatever other data is important for Velum). In the case of an orchestration, the whole JSON output could be saved as a plain text file (jid.txt) by the salt-master on the filesystem, which we could compress along with the rest of the data in a supportconfig file.

The Velum database would contain just the basic information, like all the historical orchestrations with their jids, and we would read the jid.txt file we are interested in directly when debugging an issue.
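A minimal sketch of that split, under stated assumptions: the function name, the payload fields, and the output directory are all illustrative, not the actual Velum/Salt implementation.

```python
import json
import tempfile
from pathlib import Path

def handle_orchestration_result(event, output_dir):
    """Split a bulky orchestration return into the two pieces proposed above.

    1) A tiny payload for the Velum HTTP callback (enough to mark nodes
       as bootstrapped, etc.).
    2) The full JSON output dumped to <jid>.txt on the salt-master
       filesystem, for later inclusion in a supportconfig archive.
    All names and paths here are assumptions for illustration only.
    """
    jid = event["jid"]
    callback_payload = {"jid": jid, "success": event["success"],
                        "fun": event.get("fun")}
    full_dump = Path(output_dir) / f"{jid}.txt"
    full_dump.write_text(json.dumps(event, indent=2))
    return callback_payload

with tempfile.TemporaryDirectory() as tmp:
    payload = handle_orchestration_result(
        {"jid": "20180628130000000000", "success": True,
         "fun": "runner.state.orchestrate",
         "return": {"lots": "of data that would choke a single transaction"}},
        tmp,
    )
    print(payload)
    dumped = (Path(tmp) / "20180628130000000000.txt").exists()
    print(dumped)  # True
```

The database then only ever sees the small, fixed-size payload, while the unbounded orchestration output stays on disk.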

@inercia

inercia commented Jun 28, 2018

> Sorry, maybe it's just me, but I fail to see how @ereslibre's or @inercia's proposals would allow us to split the big events (the ones that are choking MySQL right now).

> My idea (my fault for not specifying it explicitly in the RFC) was to have Salt notify Velum with a minimal amount of information in the HTTP callback (e.g. the jid, the result, and whatever other data is important for Velum). In the case of an orchestration, the whole JSON output could be saved as a plain text file (jid.txt) by the salt-master on the filesystem, which we could compress along with the rest of the data in a supportconfig file.

And my proposal would be to save just that minimum amount of information to the database, straight from a returner or engine.
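A sketch of the engine variant of this proposal. A real Salt engine is a module with a `start()` function that the master runs in its own process, reading the event bus via `salt.utils.event`; here the bus is simulated with a plain iterable so the sketch stays self-contained, and every name and tag pattern is illustrative.

```python
import json

def start(event_bus, store):
    """Consume bus events and persist only the minimal fields Velum needs.

    In a real engine, `event_bus` would be the master event stream and
    `store` the database; here both are simulated for illustration.
    """
    for event in event_bus:
        tag = event.get("tag", "")
        # React only to job returns; ignore the rest of the firehose.
        if not tag.startswith("salt/job/"):
            continue
        data = event["data"]
        store.append(json.dumps({"jid": data.get("jid"),
                                 "success": data.get("success")}))

store = []
simulated_bus = [
    {"tag": "minion_start", "data": {}},
    {"tag": "salt/job/20180628/ret/minion-1",
     "data": {"jid": "20180628", "success": True,
              "return": {"big": "blob that never reaches the database"}}},
]
start(simulated_bus, store)
print(store)  # ['{"jid": "20180628", "success": true}']
```

Compared with the returner approach, an engine sees the whole event bus rather than only job returns, so it could also pick up orchestration-level events with the same minimal-fields policy.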

@flavio
Member

flavio commented Aug 9, 2018

Is this RFC still relevant after we announced the new containerized and self-hosted architecture? Maybe we should just drop this RFC, given that we will significantly reduce the amount of Salt we use.

3 participants