fleet-server "failed to fetch elasticsearch version" - ECK install on OpenShift isn't working #8111
I think this issue is more appropriate for the ECK repo rather than the Elasticsearch repo, so I'll move it there. Let me know if there's an underlying issue with Elasticsearch here.
I didn't manage to reproduce the problem using the provided Elasticsearch and Kibana manifests and the following versions:
I would first try to understand the connectivity issue between Kibana and Elasticsearch. Could you check that the ES cluster is healthy, that all the Pods are running and ready, and whether there is anything suspicious in the ES logs? FWIW, here is the Agent manifest I used:

```yaml
apiVersion: agent.k8s.elastic.co/v1alpha1
kind: Agent
metadata:
  name: fleet-server-sample
  namespace: elastic
spec:
  version: 8.15.2
  kibanaRef:
    name: kibana-sample
  elasticsearchRefs:
    - name: elasticsearch-sample
  mode: fleet
  fleetServerEnabled: true
  policyID: eck-fleet-server
  deployment:
    replicas: 1
    podTemplate:
      spec:
        serviceAccountName: fleet-server
        automountServiceAccountToken: true
        securityContext:
          runAsUser: 0
        containers:
          - name: agent
            securityContext:
              privileged: true
```

along with the following command to add the service account to the privileged SCC:

```shell
oc adm policy add-scc-to-user privileged -z fleet-server -n elastic
```
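The health checks suggested above can be sketched with `oc`. This is only a sketch: it assumes the cluster name `elasticsearch-sample` and namespace `elastic` from the manifests in this thread, and relies on ECK's `<cluster>-es-elastic-user` secret and `<cluster>-es-http` service naming conventions.

```shell
# Check that all Elasticsearch pods are running and ready:
oc get pods -n elastic -l elasticsearch.k8s.elastic.co/cluster-name=elasticsearch-sample

# Retrieve the elastic user's password from the secret ECK creates:
PASSWORD=$(oc get secret elasticsearch-sample-es-elastic-user -n elastic \
  -o go-template='{{.data.elastic | base64decode}}')

# Query cluster health through the internal HTTP service:
oc run -n elastic -it --rm curl-check --image=curlimages/curl --restart=Never -- \
  curl -sk -u "elastic:$PASSWORD" \
  "https://elasticsearch-sample-es-http:9200/_cluster/health?pretty"
```

A `"status": "green"` (or at least `"yellow"`) response confirms the cluster is reachable and healthy from inside the namespace.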
Hi @barkbay,

Thanks for your reply. I was basically using the same fleet-server definition except for the privileged security context. I've now tried with:
Kibana pod error logs:
One of these two log entries seems to be the root cause of this error. Is it possible that the first one is causing the "failed to get elasticsearch version" error?
Elasticsearch error logs:
For the first log, according to https://discuss.elastic.co/t/not-all-primary-shards-of-geoip-databases-index-are-active/324401, the issue should go away on its own, but it doesn't. Other posts suggest it may be related to limited resources on the servers, but the OpenShift servers have more than enough available CPU and memory. Finally, the latest fleet-server error logs:
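As an aside, if the `.geoip_databases` shard warning turns out to be a red herring, the GeoIP downloader can be disabled entirely via the node config. This is a sketch only, assuming the ECK `Elasticsearch` resource is named `elasticsearch-sample` in namespace `elastic` (the `ingest.geoip.downloader.enabled` setting is standard Elasticsearch configuration):

```yaml
# Sketch: disables the GeoIP database downloader so the
# .geoip_databases system index is never created.
apiVersion: elasticsearch.k8s.elastic.co/v1
kind: Elasticsearch
metadata:
  name: elasticsearch-sample
  namespace: elastic
spec:
  version: 8.15.2
  nodeSets:
    - name: default
      count: 1
      config:
        ingest.geoip.downloader.enabled: false
```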
Many of the other fleet-server errors appear to have gone, but "failed to get elasticsearch version" and similar errors persist, and the fleet-server pod is still in CrashLoopBackOff state. The elasticsearch pod is running and the elasticsearch service exists, so I'm not sure why fleet-server cannot reach the elasticsearch cluster. This feels more like an ECK issue than an OpenShift/infrastructure issue, as I don't have this problem with any other apps. Can you please nudge me as to what the actual problem could be here?
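One way to narrow down whether this is a network problem or an agent configuration problem is to probe the Elasticsearch service directly from inside the agent pod. A sketch, assuming the service name `elasticsearch-sample-es-http` in namespace `elastic`; the pod name below is a placeholder:

```shell
# Hypothetical pod name; substitute the real one from `oc get pods -n elastic`.
oc exec -n elastic -it fleet-server-sample-agent-xxxxx -- \
  curl -sk https://elasticsearch-sample-es-http.elastic.svc:9200

# A 401 authentication error still proves network connectivity works;
# a connection refused or timeout instead points at DNS, a NetworkPolicy,
# or the SCC/securityContext configuration.
```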
I have the same problem, but running ECK on Rancher. I basically followed https://www.elastic.co/guide/en/cloud-on-k8s/current/k8s-stack-helm-chart.html#k8s-install-fleet-agent-elasticsearch-kibana-helm. I can't figure out where the problem with the connection between fleet-server and elasticsearch is.
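For reference, the Helm-based install from that guide is roughly the following. This is a sketch: the release name, namespace, and values file are placeholders to be taken from the linked guide's Fleet Server example, not verified here:

```shell
helm repo add elastic https://helm.elastic.co
helm repo update

# fleet-agents.yaml stands in for the Fleet Server / Agent example
# values file referenced in the linked guide.
helm install eck-stack elastic/eck-stack \
  -n elastic-stack --create-namespace \
  -f fleet-agents.yaml
```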
@manas-suleman @barkbay @michbeck100 I'm facing the same problem. Did you manage to find a solution?
Could you get the shard status using the following API calls:
Also, your cluster is
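The specific API calls referenced above were not captured in this thread. The standard Elasticsearch cat/cluster APIs for this kind of check look like the following sketch (host and credentials assumed to match the ECK setup discussed earlier):

```shell
# Shard allocation state for every index, including unassigned shards:
curl -sk -u "elastic:$PASSWORD" \
  "https://elasticsearch-sample-es-http:9200/_cat/shards?v"

# Overall cluster health (status, unassigned shard counts):
curl -sk -u "elastic:$PASSWORD" \
  "https://elasticsearch-sample-es-http:9200/_cluster/health?pretty"

# Only indices that are currently red:
curl -sk -u "elastic:$PASSWORD" \
  "https://elasticsearch-sample-es-http:9200/_cat/indices?v&health=red"
```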
Elasticsearch Version
Version: 8.15.2, Build: docker/98adf7bf6bb69b66ab95b761c9e5aadb0bb059a3/2024-09-19T10:06:03.564235954Z, JVM: 22.0.1
Installed Plugins
No response
Java Version
bundled
OS Version
OpenShift BareMetal
Problem Description
I have deployed ECK on OpenShift bare-metal servers for a POC. While I can get the Kibana dashboard, I cannot get fleet-server to start and work. I'm using the default configuration (from https://www.elastic.co/guide/en/cloud-on-k8s/current/k8s-openshift-deploy-the-operator.html and https://www.elastic.co/guide/en/cloud-on-k8s/current/k8s-elastic-agent-fleet-quickstart.html) for the most part, with small modifications where needed.
These are my manifests:
Agent state:
```shell
oc get agents
oc describe agent fleet-server-sample
```
Fleet-server pod error logs (the pod is in CrashLoopBackOff):
From the logs, it appears that the fleet-server pod is looking for the elasticsearch cluster at localhost instead of sending requests to the elasticsearch service. There are other errors as well, but I think this one needs to be resolved first.
Errors in kibana pod:
Steps to Reproduce
Deploy the ECK cluster using the manifests mentioned above, which are default for the most part with some changes.
Logs (if relevant)
No response