
snap-kubestate gets recycled on a medium sized cluster #46

Open
romankor opened this issue Oct 17, 2017 · 3 comments


romankor commented Oct 17, 2017

We have an issue where the kubestate pod gets recycled every couple of minutes and cluster metrics are not being sent, on a cluster of roughly 30 machines and ~1000 pods.

This is what I see in the log file:

time="2017-10-16T23:20:10Z" level=warning msg="This plugin is using a deprecated RPC protocol. Find more information here: https://github.com/intelsdi-x/snap/issues/1289 " _block=newAvailablePlugin _module=control-aplugin plugin_name=df
time="2017-10-16T23:20:10Z" level=warning msg="This plugin is using a deprecated RPC protocol. Find more information here: https://github.com/intelsdi-x/snap/issues/1289 " _block=newAvailablePlugin _module=control-aplugin plugin_name=iostat
time="2017-10-16T23:20:12Z" level=warning msg="This plugin is using a deprecated RPC protocol. Find more information here: https://github.com/intelsdi-x/snap/issues/1289 " _block=newAvailablePlugin _module=control-aplugin plugin_name=load
time="2017-10-16T23:20:12Z" level=warning msg="Ignoring JSON/Yaml file: core.json" _block=start _module=control autodiscoverpath="/opt/snap/tasks_startup"
time="2017-10-16T23:20:14Z" level=error msg="collector run error" _module=scheduler-job block=run error="rpc error: code = ResourceExhausted desc = grpc: received message larger than max (4657834 vs. 4194304)" job-type=collector
time="2017-10-16T23:20:14Z" level=warning msg="Task failed" _block=spin _module=scheduler-task consecutive failure limit=10 consecutive failures=1 error="rpc error: code = ResourceExhausted desc = grpc: received message larger than max (4657834 vs. 4194304)" task-id=c27715d5-6964-43ca-8eb4-4b656373e38c task-name=Task-c27715d5-6964-43ca-8eb4-4b656373e38c
time="2017-10-16T23:20:24Z" level=error msg="collector run error" _module=scheduler-job block=run error="rpc error: code = ResourceExhausted desc = grpc: received message larger than max (4753318 vs. 4194304)" job-type=collector
time="2017-10-16T23:20:24Z" level=warning msg="Task failed" _block=spin _module=scheduler-task consecutive failure limit=10 consecutive failures=2 error="rpc error: code = ResourceExhausted desc = grpc: received message larger than max (4753318 vs. 4194304)" task-id=c27715d5-6964-43ca-8eb4-4b656373e38c task-name=Task-c27715d5-6964-43ca-8eb4-4b656373e38c
time="2017-10-16T23:20:34Z" level=error msg="collector run error" _module=scheduler-job block=run error="rpc error: code = ResourceExhausted desc = grpc: received message larger than max (4755429 vs. 4194304)" job-type=collector
time="2017-10-16T23:20:34Z" level=warning msg="Task failed" _block=spin _module=scheduler-task consecutive failure limit=10 consecutive failures=3 error="rpc error: code = ResourceExhausted desc = grpc: received message larger than max (4755429 vs. 4194304)" task-id=c27715d5-6964-43ca-8eb4-4b656373e38c task-name=Task-c27715d5-6964-43ca-8eb4-4b656373e38c
time="2017-10-16T23:20:44Z" level=error msg="collector run error" _module=scheduler-job block=run error="rpc error: code = ResourceExhausted desc = grpc: received message larger than max (4756272 vs. 4194304)" job-type=collector
time="2017-10-16T23:20:44Z" level=warning msg="Task failed" _block=spin _module=scheduler-task consecutive failure limit=10 consecutive failures=4 error="rpc error: code = ResourceExhausted desc = grpc: received message larger than max (4756272 vs. 4194304)" task-id=c27715d5-6964-43ca-8eb4-4b656373e38c task-name=Task-c27715d5-6964-43ca-8eb4-4b656373e38c
time="2017-10-16T23:20:54Z" level=error msg="collector run error" _module=scheduler-job block=run error="rpc error: code = ResourceExhausted desc = grpc: received message larger than max (4754734 vs. 4194304)" job-type=collector
time="2017-10-16T23:20:54Z" level=warning msg="Task failed" _block=spin _module=scheduler-task consecutive failure limit=10 consecutive failures=5 error="rpc error: code = ResourceExhausted desc = grpc: received message larger than max (4754734 vs. 4194304)" task-id=c27715d5-6964-43ca-8eb4-4b656373e38c task-name=Task-c27715d5-6964-43ca-8eb4-4b656373e38c

I can't figure out a way to configure the maximum message size. Maybe you can shed some light on that?
Thanks

kubectl exec -it snap-kubestate-deployment-3536784749-k0q9s -- /opt/snap/bin/snaptel task list
ID 					 NAME 						 STATE 		 HIT 	 MISS 	 FAIL 	 CREATED 		 LAST FAILURE
6b6dacb3-8b53-458c-9cba-629ade4e7a65 	 Task-6b6dacb3-8b53-458c-9cba-629ade4e7a65 	 Running 	 6 	 0 	 6 	 4:57PM 10-17-2017 	 rpc error: code = ResourceExhausted desc = grpc: received message larger than max (4283236 vs. 4194304)

We have it running on our dev/qa cluster, which is much smaller, and it works there without any problem.
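For context, the 4194304 figure in those errors is gRPC's default maximum message size of 4 MiB. A quick stdlib-only check of the payload sizes from the log against that default:

```go
package main

import "fmt"

func main() {
	// gRPC's default maximum receive message size is 4 MiB.
	const defaultMax = 4 * 1024 * 1024 // 4194304 bytes

	// Message sizes reported in the errors above.
	observed := []int{4657834, 4753318, 4755429, 4756272, 4754734, 4283236}

	for _, n := range observed {
		fmt.Printf("%d bytes exceeds the %d-byte limit by %d bytes\n",
			n, defaultMax, n-defaultMax)
	}
}
```

So every collection run is roughly 100 KB to 560 KB over the default cap, which matches the pattern of the task failing on each interval rather than intermittently.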

@daniellee
Contributor

There is no way to change the limit, unfortunately. We forked Snap to get around this and just hacked in a higher limit.

The proper way to fix it would be to send a PR that fixes this issue: intelsdi-x/snap-plugin-lib-go#43
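For reference, grpc-go does expose options to raise the 4 MiB default (`grpc.MaxRecvMsgSize`/`grpc.MaxSendMsgSize` on the server side, `grpc.WithDefaultCallOptions(grpc.MaxCallRecvMsgSize(...))` on the client side); the difficulty is that snap-plugin-lib-go does not surface them, which is presumably what the fork hard-codes. A minimal stdlib-only sketch of the arithmetic such a patch would use (the exact integration point inside the plugin lib is an assumption, not taken from this thread):

```go
package main

import "fmt"

// A patched snap-plugin-lib-go would pass a larger value than gRPC's
// 4 MiB default when constructing its gRPC server/client, e.g. via
// grpc.MaxRecvMsgSize(newMax). This sketch only computes a candidate value.
func main() {
	const defaultMax = 4 * 1024 * 1024 // gRPC default: 4194304 bytes

	// The failing payloads here are ~4.3-4.8 MB; 16 MiB leaves headroom
	// for further cluster growth without being unbounded.
	newMax := 4 * defaultMax

	fmt.Printf("raising gRPC message limit from %d to %d bytes\n",
		defaultMax, newMax)
}
```

The 16 MiB figure is an illustrative choice, not something the fork or the linked PRs are known to use.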

@DanCech
Contributor

DanCech commented Oct 19, 2017

There is a PR for this: intelsdi-x/snap-plugin-lib-go#89

@romankor
Author

@daniellee Can you point me to the forked repository that you hacked? Or is it private?
