
snap-kubestate gets recycled on a medium sized cluster #46

Open
romankor opened this issue Oct 17, 2017 · 3 comments


romankor commented Oct 17, 2017

We have an issue where the kubestate pod gets recycled every couple of minutes and cluster metrics are not being sent, on a cluster of roughly 30 machines and ~1000 pods.

This is what I see in the log file:

time="2017-10-16T23:20:10Z" level=warning msg="This plugin is using a deprecated RPC protocol. Find more information here: https://github.com/intelsdi-x/snap/issues/1289 " _block=newAvailablePlugin _module=control-aplugin plugin_name=df
time="2017-10-16T23:20:10Z" level=warning msg="This plugin is using a deprecated RPC protocol. Find more information here: https://github.com/intelsdi-x/snap/issues/1289 " _block=newAvailablePlugin _module=control-aplugin plugin_name=iostat
time="2017-10-16T23:20:12Z" level=warning msg="This plugin is using a deprecated RPC protocol. Find more information here: https://github.com/intelsdi-x/snap/issues/1289 " _block=newAvailablePlugin _module=control-aplugin plugin_name=load
time="2017-10-16T23:20:12Z" level=warning msg="Ignoring JSON/Yaml file: core.json" _block=start _module=control autodiscoverpath="/opt/snap/tasks_startup"
time="2017-10-16T23:20:14Z" level=error msg="collector run error" _module=scheduler-job block=run error="rpc error: code = ResourceExhausted desc = grpc: received message larger than max (4657834 vs. 4194304)" job-type=collector
time="2017-10-16T23:20:14Z" level=warning msg="Task failed" _block=spin _module=scheduler-task consecutive failure limit=10 consecutive failures=1 error="rpc error: code = ResourceExhausted desc = grpc: received message larger than max (4657834 vs. 4194304)" task-id=c27715d5-6964-43ca-8eb4-4b656373e38c task-name=Task-c27715d5-6964-43ca-8eb4-4b656373e38c
time="2017-10-16T23:20:24Z" level=error msg="collector run error" _module=scheduler-job block=run error="rpc error: code = ResourceExhausted desc = grpc: received message larger than max (4753318 vs. 4194304)" job-type=collector
time="2017-10-16T23:20:24Z" level=warning msg="Task failed" _block=spin _module=scheduler-task consecutive failure limit=10 consecutive failures=2 error="rpc error: code = ResourceExhausted desc = grpc: received message larger than max (4753318 vs. 4194304)" task-id=c27715d5-6964-43ca-8eb4-4b656373e38c task-name=Task-c27715d5-6964-43ca-8eb4-4b656373e38c
time="2017-10-16T23:20:34Z" level=error msg="collector run error" _module=scheduler-job block=run error="rpc error: code = ResourceExhausted desc = grpc: received message larger than max (4755429 vs. 4194304)" job-type=collector
time="2017-10-16T23:20:34Z" level=warning msg="Task failed" _block=spin _module=scheduler-task consecutive failure limit=10 consecutive failures=3 error="rpc error: code = ResourceExhausted desc = grpc: received message larger than max (4755429 vs. 4194304)" task-id=c27715d5-6964-43ca-8eb4-4b656373e38c task-name=Task-c27715d5-6964-43ca-8eb4-4b656373e38c
time="2017-10-16T23:20:44Z" level=error msg="collector run error" _module=scheduler-job block=run error="rpc error: code = ResourceExhausted desc = grpc: received message larger than max (4756272 vs. 4194304)" job-type=collector
time="2017-10-16T23:20:44Z" level=warning msg="Task failed" _block=spin _module=scheduler-task consecutive failure limit=10 consecutive failures=4 error="rpc error: code = ResourceExhausted desc = grpc: received message larger than max (4756272 vs. 4194304)" task-id=c27715d5-6964-43ca-8eb4-4b656373e38c task-name=Task-c27715d5-6964-43ca-8eb4-4b656373e38c
time="2017-10-16T23:20:54Z" level=error msg="collector run error" _module=scheduler-job block=run error="rpc error: code = ResourceExhausted desc = grpc: received message larger than max (4754734 vs. 4194304)" job-type=collector
time="2017-10-16T23:20:54Z" level=warning msg="Task failed" _block=spin _module=scheduler-task consecutive failure limit=10 consecutive failures=5 error="rpc error: code = ResourceExhausted desc = grpc: received message larger than max (4754734 vs. 4194304)" task-id=c27715d5-6964-43ca-8eb4-4b656373e38c task-name=Task-c27715d5-6964-43ca-8eb4-4b656373e38c

I can't figure out a way to configure the maximum message size. Maybe you can shed some light on that?
Thanks

kubectl exec -it snap-kubestate-deployment-3536784749-k0q9s -- /opt/snap/bin/snaptel task list
ID 					 NAME 						 STATE 		 HIT 	 MISS 	 FAIL 	 CREATED 		 LAST FAILURE
6b6dacb3-8b53-458c-9cba-629ade4e7a65 	 Task-6b6dacb3-8b53-458c-9cba-629ade4e7a65 	 Running 	 6 	 0 	 6 	 4:57PM 10-17-2017 	 rpc error: code = ResourceExhausted desc = grpc: received message larger than max (4283236 vs. 4194304)

We have it running on our dev/qa cluster, which is much smaller, and it works there without any problem.
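For context, the 4194304 figure in those errors is gRPC's default maximum message size of 4 MiB. A quick stdlib-only check of the payload sizes from the log against that default:

```go
package main

import "fmt"

func main() {
	// gRPC's default maximum receive message size is 4 MiB.
	const defaultMax = 4 * 1024 * 1024 // 4194304 bytes

	// Message sizes reported in the errors above.
	observed := []int{4657834, 4753318, 4755429, 4756272, 4754734, 4283236}

	for _, n := range observed {
		fmt.Printf("%d bytes exceeds the %d-byte limit by %d bytes\n",
			n, defaultMax, n-defaultMax)
	}
}
```

So every collection run is roughly 100 KB to 560 KB over the default cap, which matches the pattern of the task failing on each interval rather than intermittently.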

@daniellee
Contributor

There is no way to change the limit, unfortunately. We forked Snap to get around this and just hacked in a higher limit.

The proper way to fix it would be to send a PR that fixes this issue: intelsdi-x/snap-plugin-lib-go#43
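For reference, grpc-go does expose options to raise the 4 MiB default (`grpc.MaxRecvMsgSize`/`grpc.MaxSendMsgSize` on the server side, `grpc.WithDefaultCallOptions(grpc.MaxCallRecvMsgSize(...))` on the client side); the difficulty is that snap-plugin-lib-go does not surface them, which is presumably what the fork hard-codes. A minimal stdlib-only sketch of the arithmetic such a patch would use (the exact integration point inside the plugin lib is an assumption, not taken from this thread):

```go
package main

import "fmt"

// A patched snap-plugin-lib-go would pass a larger value than gRPC's
// 4 MiB default when constructing its gRPC server/client, e.g. via
// grpc.MaxRecvMsgSize(newMax). This sketch only computes a candidate value.
func main() {
	const defaultMax = 4 * 1024 * 1024 // gRPC default: 4194304 bytes

	// The failing payloads here are ~4.3-4.8 MB; 16 MiB leaves headroom
	// for further cluster growth without being unbounded.
	newMax := 4 * defaultMax

	fmt.Printf("raising gRPC message limit from %d to %d bytes\n",
		defaultMax, newMax)
}
```

The 16 MiB figure is an illustrative choice, not something the fork or the linked PRs are known to use.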

@DanCech
Contributor

DanCech commented Oct 19, 2017

There is a PR for this: intelsdi-x/snap-plugin-lib-go#89

@romankor
Author

@daniellee Can you point me to the forked repository that you hacked? Or is it private?
