Add blacklisting #18

Open
edwardcapriolo opened this issue Jul 19, 2014 · 2 comments

Comments

@edwardcapriolo
Owner

We recently had a plan that was misconfigured and could not start. It correctly kept firing off WorkerStart exceptions, but that gets somewhat spammy and may turn into a fork bomb. We should track failed startup attempts, potentially inside a new path in ZK. If a process attempts to start N times and fails, we should blacklist it for some period of time. Also, the code that determines job start, TeknekDeamon.considerStarting(), is getting fairly beefy and a touch hard to test. This would be a nice time to refactor it in a way that would ease testability. @sinemetu1

@edwardcapriolo
Owner Author

To break down the scenario more clearly: we had a plan that was designed to read from Kafka and write to Cassandra. The setProperties method of the operator was attempting to establish an Astyanax connection pool, which was failing because of a misconfiguration. On each scan cycle a worker attempted to start the operator, and it failed with a RuntimeException. These failures are probably being logged at info, which should be raised to warn. It would be nice if, once the cluster has failed to start a given operator a certain number of times, it created an entry in ZK that would put the plan to sleep for a while without permanently disabling it. Other workers could notice this in the considerStarting phase and return quickly.
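To make the proposal concrete, here is a rough sketch of the bookkeeping, not the actual Teknek code. The class name `StartFailureTracker`, its methods, and its parameters are all hypothetical. An in-memory map stands in for the ZK path so the logic can be unit tested in isolation; a real version would need to persist the counters and the expiry time in ZooKeeper so that other workers can see them.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical sketch: counts failed start attempts per plan and blacklists a
 * plan for a fixed period once it fails maxFailures times. State is kept in
 * memory here; the issue proposes storing it under a path in ZK instead.
 */
final class StartFailureTracker {
  private final int maxFailures;
  private final long blacklistMillis;
  private final Map<String, Integer> failures = new HashMap<>();
  private final Map<String, Long> blacklistedUntil = new HashMap<>();

  StartFailureTracker(int maxFailures, long blacklistMillis) {
    this.maxFailures = maxFailures;
    this.blacklistMillis = blacklistMillis;
  }

  /** Record one failed start; blacklist the plan when the limit is reached. */
  synchronized void recordFailure(String plan, long nowMillis) {
    int count = failures.merge(plan, 1, Integer::sum);
    if (count >= maxFailures) {
      blacklistedUntil.put(plan, nowMillis + blacklistMillis);
      failures.remove(plan); // start counting afresh once the blacklist expires
    }
  }

  /** A successful start clears any recorded failures and any blacklist entry. */
  synchronized void recordSuccess(String plan) {
    failures.remove(plan);
    blacklistedUntil.remove(plan);
  }

  /** True while the plan is sleeping; expired entries are dropped lazily. */
  synchronized boolean isBlacklisted(String plan, long nowMillis) {
    Long until = blacklistedUntil.get(plan);
    if (until == null) {
      return false;
    }
    if (nowMillis >= until) {
      blacklistedUntil.remove(plan);
      return false;
    }
    return true;
  }
}
```

With something like this, considerStarting could check `isBlacklisted(planName, System.currentTimeMillis())` first and return quickly, and the code that catches the start-up exception could call `recordFailure`. Passing the clock value in as a parameter keeps the expiry logic testable without sleeping.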

@edwardcapriolo
Owner Author

As a first pass, I cleaned up the logging, made it more consistent, and switched to the proper log levels. This would probably have helped us locate the misconfiguration sooner. edwardcapriolo/teknek-core#9
