BUG: Recsync server crashes when CF server is down #11

Open
asoderq opened this issue Aug 2, 2016 · 10 comments

@asoderq

asoderq commented Aug 2, 2016

I am running ChannelFinder on GlassFish. I start both GlassFish and recsync-server with systemd, on the same machine.
The recsync-server service is configured to start after the GlassFish service. However, at that point the ChannelFinder service does not yet seem to be up, which causes a crash in recsync-server. It would be nice if this were handled and recsync-server tried to connect again a while later.

It works fine after restarting the recsync-server service.

recsync-server-log.txt

@mdavidsaver mdavidsaver added the bug label Aug 2, 2016
@mdavidsaver mdavidsaver self-assigned this Aug 2, 2016
@mdavidsaver
Collaborator

Some more robustness would certainly be desirable.

Right now the handling of exceptions from plugins during commit() is strict. As recceiver works with deltas, dropping a single delta would leave a plugin out of sync, which would defeat the whole purpose.

Through the use of Deferred(), a plugin's commit() is allowed to take an arbitrarily long time to complete (although it should not do so by blocking). This should allow a delay-and-retry mechanism to be added to cfstore.

@shroffk fyi
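The point about deltas can be illustrated with a small retry loop that re-sends the *same* delta after a non-blocking wait instead of dropping it. This is a minimal sketch that uses asyncio as a stand-in for Twisted's Deferred (a swapped-in technique, not recceiver's code); `commit_with_retry`, `push` and `TransientError` are hypothetical names.

```python
import asyncio
from typing import Awaitable, Callable


class TransientError(Exception):
    """Hypothetical: raised by push() when ChannelFinder cannot be reached."""


async def commit_with_retry(
    push: Callable[[dict], Awaitable[None]],
    delta: dict,
    delay: float = 1.0,
    max_delay: float = 60.0,
) -> int:
    """Retry pushing one delta until it succeeds; return the attempt count.

    The delta is never skipped, since a store that missed one delta would
    stay out of sync. asyncio.sleep() yields to the event loop, so the wait
    does not block other work (the role a Deferred plays in Twisted).
    """
    attempts = 0
    while True:
        attempts += 1
        try:
            await push(delta)
            return attempts
        except TransientError:
            await asyncio.sleep(delay)
            delay = min(delay * 2, max_delay)  # wait longer after each failure
```

Because the coroutine only returns once the delta has been applied, the next delta can simply be awaited after it, which keeps deltas strictly ordered.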

@asoderq
Author

asoderq commented Aug 2, 2016

It is also worth mentioning that it crashes in a different way when the HTTP server is not up yet, i.e. when GlassFish is not running at all.

@shroffk
Collaborator

shroffk commented Aug 2, 2016

Just to clarify: the recsync server drops cfstore support, but the twistd server is still running, right?

I did not know about the Deferred() method. I had considered a mechanism in which the client would keep trying to create a connection with exponential backoff, but was thwarted by my lack of knowledge of multi-threaded programming in Python. I guess this feature would be a good excuse to finally learn it.
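Exponential backoff itself does not require threads; it is only a schedule of growing delays. A minimal sketch, assuming a doubling factor and a cap (the hypothetical `backoff_delays` and its defaults are illustrative, not cfstore settings). In Twisted, each wait could then be scheduled on the reactor, for example with `reactor.callLater`, rather than by sleeping in a thread.

```python
from typing import Iterator


def backoff_delays(base: float = 1.0, factor: float = 2.0,
                   cap: float = 300.0) -> Iterator[float]:
    """Yield an endless sequence of exponentially growing delays.

    Delay n is base * factor**n, clamped to `cap` so that a long outage
    does not push reconnection attempts arbitrarily far apart.
    """
    delay = min(base, cap)
    while True:
        yield delay
        delay = min(delay * factor, cap)
```

For example, `list(islice(backoff_delays(1.0, 2.0, 10.0), 6))` gives `[1.0, 2.0, 4.0, 8.0, 10.0, 10.0]`. Adding random jitter is a common refinement when many clients reconnect at once.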

@mdavidsaver
Collaborator

> I did not know about the Deferred() method

In this case, all that is needed is to wrap the blocking calls with deferToThread() so that they are made on a worker thread.

http://twistedmatrix.com/documents/current/api/twisted.internet.threads.html#deferToThread
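In Twisted, `threads.deferToThread(f, *args)` runs `f` in the reactor's thread pool and returns a Deferred that fires with the result. To keep the example runnable without Twisted, this sketch uses the standard-library analogue `asyncio.to_thread` (Python 3.9+) instead; `blocking_lookup` is a hypothetical stand-in for a blocking ChannelFinder HTTP call.

```python
import asyncio
import threading
import time
from typing import Tuple


def blocking_lookup(name: str) -> str:
    """Hypothetical blocking call; reports which thread it ran on."""
    time.sleep(0.01)  # simulate waiting on the network
    return f"{name}@{threading.current_thread().name}"


async def main() -> Tuple[str, str]:
    # The blocking call runs on a worker thread, so the event-loop thread
    # stays free to handle other IOCs meanwhile.
    result = await asyncio.to_thread(blocking_lookup, "PV:1")
    return result, threading.current_thread().name
```

Running `asyncio.run(main())` returns the lookup result tagged with a worker thread's name, which differs from the event-loop thread's name.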

@shroffk
Collaborator

shroffk commented Aug 9, 2016

So, the 3 scenarios we want to handle are:

  • The ChannelFinder service cannot be reached initially.
    CFStore should keep trying to establish a connection with the ChannelFinder service. The connection attempts can use exponential backoff, and the commit messages can be delayed.
  • The ChannelFinder service connection is lost between updates.
    CFStore will try to re-establish the connection, clean up any incomplete updates, and then complete the pending incoming commit messages.
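Both scenarios come down to holding commits while disconnected and replaying them in order afterwards. A minimal sketch, assuming a hypothetical `PendingCommits` buffer and a `push` callable that raises `ConnectionError` on failure; the "clean up incomplete updates" step is deliberately left out.

```python
from collections import deque
from typing import Callable


class PendingCommits:
    """Hypothetical buffer that holds deltas while ChannelFinder is down.

    Deltas are replayed strictly in arrival order once a connection is
    available, so a later delta is never applied before an earlier one.
    """

    def __init__(self, push: Callable[[dict], None]) -> None:
        self._push = push  # sends one delta; raises ConnectionError on failure
        self._queue: deque = deque()
        self.connected = False

    def commit(self, delta: dict) -> None:
        self._queue.append(delta)
        if self.connected:
            self.flush()

    def flush(self) -> int:
        """Push queued deltas in order, stopping at the first failure.

        Returns the number of deltas sent. A failed delta stays at the
        head of the queue and is retried after reconnecting.
        """
        sent = 0
        while self._queue:
            try:
                self._push(self._queue[0])
            except ConnectionError:
                self.connected = False
                break
            self._queue.popleft()
            sent += 1
        return sent

    def on_connected(self) -> int:
        """Call when a (re)connection succeeds; replays the backlog."""
        self.connected = True
        return self.flush()
```

The delta is only removed from the queue after a successful push, which is what keeps a mid-stream disconnect from silently losing an update.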

@mskinner5278
Copy link
Contributor

  • Two IOCs provide the same channel, or one IOC restarts quickly before it has disconnected from its channels. Only one IOC can be named in the channel's properties. Cf-store can keep track of the collisions.
  • An IOC disconnects from recsync. All channels associated with the IOC should be marked as inactive. If another IOC also provides any of those channels, those channels should instead be updated with the other IOC's iocname/hostname.
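Tracking collisions and disconnects can be sketched with a map from channel name to the IOCs currently providing it. This assumes a hypothetical `ChannelOwners` class and assumes the most recently connected IOC is the one named in the channel's properties (the thread does not say which IOC should win).

```python
from collections import defaultdict
from typing import Dict, List, Optional


class ChannelOwners:
    """Hypothetical tracker of which IOCs currently provide each channel.

    Providers are kept in connection order; the last one is treated as the
    IOC named in the channel's properties (an assumption).
    """

    def __init__(self) -> None:
        self._owners: Dict[str, List[str]] = defaultdict(list)

    def add(self, ioc: str, channels: List[str]) -> None:
        for ch in channels:
            owners = self._owners[ch]
            if ioc in owners:  # a reconnecting IOC moves to the end
                owners.remove(ioc)
            owners.append(ioc)

    def owner(self, channel: str) -> Optional[str]:
        owners = self._owners.get(channel)
        return owners[-1] if owners else None

    def remove(self, ioc: str) -> Dict[str, Optional[str]]:
        """Drop an IOC; return {channel: remaining owner, or None if inactive}."""
        changes: Dict[str, Optional[str]] = {}
        for ch, owners in self._owners.items():
            if ioc in owners:
                owners.remove(ioc)
                changes[ch] = owners[-1] if owners else None
        return changes
```

A `None` entry in the result marks a channel to set inactive; any other value names the IOC whose iocname/hostname the channel should be updated with.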

@mskinner5278
Copy link
Contributor

[Image: diagram1] Diagram for new cf-store design.

@mskinner5278
Copy link
Contributor

  • A "contested" property to note when a channel is currently being served by more than one IOC.
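One way to derive such a property is from the list of IOCs currently providing a channel. A minimal sketch with a hypothetical `channel_properties` helper; apart from "contested", the property names and the "true"/"false" string encoding are assumptions for illustration.

```python
from typing import Dict, List


def channel_properties(owners: List[str]) -> Dict[str, str]:
    """Hypothetical: derive channel properties from its current providers.

    `owners` lists IOC names in connection order; the last one is assumed
    to be the IOC named on the channel.
    """
    if not owners:
        # No provider left: the channel is inactive and no longer contested.
        return {"pvStatus": "Inactive", "contested": "false"}
    return {
        "pvStatus": "Active",
        "iocName": owners[-1],
        "contested": "true" if len(owners) > 1 else "false",
    }
```

The flag clears automatically once all but one provider disconnects, so it never needs to be reset by hand.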

@mdavidsaver
Copy link
Collaborator

@alex-soderqvist FYI, work is in progress to address this situation.

mskinner5278 added a commit to mskinner5278/recsync that referenced this issue Aug 22, 2016
@shroffk
Copy link
Collaborator

shroffk commented Jun 21, 2018

@mdavidsaver @alex-soderqvist can we close this?
