We recently had a plan that was misconfigured and could not start for some reason. It correctly kept firing off WorkerStart exceptions, but that gets somewhat spammy and may turn into a fork bomb. We should track failed startup attempts, potentially inside a new path in ZK. If a process attempts to start N times and fails, we should blacklist it for some period of time. Also, the code that determines job start, TeknekDeamon.considerStarting(), is getting fairly beefy and a touch hard to test. This would be a nice time to refactor it in a way that would ease testability. @sinemetu1
So, to break down the scenario more clearly: we had a plan that was designed to read from Kafka and write to Cassandra. The setProperties method of the operator was attempting to establish an Astyanax connection pool, which was failing because of a misconfiguration. Each scan cycle, a worker attempted to start the operator and failed with a RuntimeException. These failures are probably being logged at info, which should be raised to warn. It would be nice if, after the cluster failed to start a given operator a certain number of times, it created an entry in ZK that would put the plan to sleep for a while without permanently disabling it. Other workers could notice this in the considerStarting phase and return quickly.
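To make the idea concrete, here is a rough sketch of the bookkeeping. Everything in it is hypothetical: StartFailureTracker, recordFailure, recordSuccess and isSleeping are names made up for illustration and do not exist in Teknek. It also keeps its state in an in-memory map rather than in ZK, so it only shows the counting and sleep logic, not how the entry would be shared between workers. The current time is passed in as a parameter so the logic can be tested without waiting on a real clock.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical helper, not part of Teknek. Counts failed start attempts per
 * plan and puts a plan to sleep for a fixed period once it fails too often.
 * State lives in memory here; the issue proposes storing it in ZK instead.
 */
class StartFailureTracker {
    private final int maxFailures;
    private final long sleepMillis;
    private final Map<String, Integer> failures = new HashMap<>();
    private final Map<String, Long> sleepUntil = new HashMap<>();

    StartFailureTracker(int maxFailures, long sleepMillis) {
        this.maxFailures = maxFailures;
        this.sleepMillis = sleepMillis;
    }

    /** Record one failed start; after maxFailures the plan sleeps. */
    synchronized void recordFailure(String planName, long nowMillis) {
        int count = failures.merge(planName, 1, Integer::sum);
        if (count >= maxFailures) {
            sleepUntil.put(planName, nowMillis + sleepMillis);
            failures.remove(planName); // start counting afresh after the sleep
        }
    }

    /** A successful start clears any recorded failures and sleep entry. */
    synchronized void recordSuccess(String planName) {
        failures.remove(planName);
        sleepUntil.remove(planName);
    }

    /** True while the plan is sleeping; the entry expires on its own. */
    synchronized boolean isSleeping(String planName, long nowMillis) {
        Long until = sleepUntil.get(planName);
        if (until == null) {
            return false;
        }
        if (nowMillis >= until) {
            sleepUntil.remove(planName); // sleep is over, plan is eligible again
            return false;
        }
        return true;
    }
}
```

With something like this, considerStarting could check isSleeping first and return quickly, call recordFailure when a start attempt throws, and call recordSuccess when the operator comes up. Because the sleep entry expires on its own, the plan is never permanently disabled.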
As a first pass, I cleaned up the logging, made it more consistent, and used the proper log levels. This probably would have helped us locate the misconfiguration sooner. edwardcapriolo/teknek-core#9