An error occurred when redeploying OpenSearch. #668

Open
shimoune opened this issue Aug 2, 2024 · 4 comments

Comments

@shimoune

shimoune commented Aug 2, 2024

The following error occurred when redeploying OpenSearch.
Please let me know how to resolve it.

【Error Description】
INFO Loading Content into OpenSearch
Warning: Use tokens from the TokenRequest API or manually created secret-based tokens instead of auto-generated secret-based tokens.
ERROR The OpenSearch REST endpoint has NOT become accessible in the expected time; exiting.
ERROR Review the OpenSearch pod's events and log to identify the issue and resolve it before trying again.
ERROR Exiting script [deploy_logging.sh] due to an error executing the command [logging/bin/deploy_opensearch_content.sh].

INFO User directory: /*****************************
Flag --short has been deprecated, and will be removed in the future. The --short output will become the default.
Warning: Use tokens from the TokenRequest API or manually created secret-based tokens instead of auto-generated secret-based tokens.
Flag --short has been deprecated, and will be removed in the future. The --short output will become the default.
Warning: Use tokens from the TokenRequest API or manually created secret-based tokens instead of auto-generated secret-based tokens.
INFO Kubernetes client version: v1.27.12-eks-aaaaaa
INFO Kubernetes server version: v1.28.9-eks-bbbbbb

【Procedure】
The procedure for redeploying OpenSearch is as follows.
Note that we are using OpenSearch version 1.3.5.

・Remove the log-monitoring components
Set the USER_DIR environment variable
Run logging/bin/remove_logging.sh

・Deploy the log-monitoring components
Set the USER_DIR environment variable
Run logging/bin/deploy_logging.sh
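
For reference, the two steps above could be scripted roughly as follows. This is a minimal sketch, not part of the project: the function name `redeploy_logging` and both of its arguments (the path to the checked-out project and the path to the USER_DIR directory) are assumptions made for illustration. Only the `USER_DIR` variable and the two script paths come from the procedure itself.

```shell
# Sketch of the remove-then-redeploy sequence described above.
# redeploy_logging is a hypothetical helper, not a script shipped with the project.
redeploy_logging() (
  repo_dir="$1"   # assumption: path to the checked-out project
  user_dir="$2"   # assumption: path to your USER_DIR customizations

  # Both scripts read USER_DIR from the environment.
  export USER_DIR="$user_dir"

  cd "$repo_dir" || exit 1

  # Stop if the removal fails, so a deploy is not attempted on a half-removed stack.
  logging/bin/remove_logging.sh || exit 1
  logging/bin/deploy_logging.sh
)

# Example call (both paths are placeholders):
#   redeploy_logging /path/to/project /path/to/user_dir
```

The function body runs in a subshell, so the `cd` and the exported `USER_DIR` do not leak into the calling shell.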

@gsmith-sas
Member

@shimoune You mentioned that you are deploying OpenSearch 1.3.5. We moved to OpenSearch 2.x with our 1.2.10 release (14FEB2023), so it's been quite a while since we deployed that version. Is there a specific reason you are trying to deploy with OpenSearch 1.3.5? What version of our project (branch or tag) are you using? We generally recommend users deploy from our "stable" branch and the code on that branch is geared towards OpenSearch 2.x. We also recommend users upgrade once or twice a year so they can benefit from new features and security fixes.

As the ERROR message suggests, the first place to look when diagnosing this problem is the OpenSearch pods. Are the OpenSearch pods up and running? If not, can you determine why? They could be down for a wide range of reasons: insufficient resources (CPU, memory, etc.), waiting for the PVCs to be ready, or being unable to access the container image. If the pods are up, review the pod logs and see if there are any helpful messages (WARNING, ERROR, etc.) that might explain why OpenSearch isn't ready.
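
As a rough aid for the triage described above, the sketch below maps a value from the STATUS column of `kubectl get pods` to the most likely next step. The status names are standard Kubernetes ones, but the mapping is only a heuristic written for this thread, and the function name `explain_pod_status` is hypothetical. It is not project guidance.

```shell
# Heuristic sketch: translate a pod STATUS value into a suggested next check.
# explain_pod_status is a hypothetical helper; the hints are assumptions, not project rules.
explain_pod_status() {
  case "$1" in
    Running)
      echo "Pod is up; review its logs for WARNING or ERROR messages." ;;
    Pending)
      echo "Pod has not started; check for insufficient CPU/memory or a PVC that is not ready." ;;
    ImagePullBackOff|ErrImagePull)
      echo "The container image could not be pulled; check registry access and the image name." ;;
    CrashLoopBackOff)
      echo "The container keeps restarting; review the pod logs and events." ;;
    *)
      echo "Unrecognized status '$1'; describe the pod and review its events." ;;
  esac
}
```

For example, `explain_pod_status Pending` points at resources and PVCs, which are two of the causes listed above.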

But my recommendation would be to move to the newest version of our project (i.e. deploy from the stable branch) if you can. I would also encourage you to review what you have included in your USER_DIR directories. There are often changes, sometimes significant ones, in the configuration settings defined in those files from release to release. I would expect that to be especially true if you have not upgraded since we deployed OpenSearch 1.3.5. For many users, our default values are acceptable/appropriate and only a minimal amount of customization needs to be provided in the USER_DIR files.

@shimoune
Author

shimoune commented Aug 6, 2024

@gsmith-sas Thank you for your response.

I am separately considering migrating to the latest version of OpenSearch.
Apart from that, I would like to know the cause of this issue.
The error occurred during the redeployment of OpenSearch by executing "logging/bin/deploy_logging.sh" after removing OpenSearch by executing "logging/bin/remove_logging.sh".

After checking the error message, I found that the error occurred when "logging/bin/deploy_logging.sh" executed "logging/bin/deploy_opensearch_content.sh".
In addition, I see that "logging/bin/deploy_opensearch_content.sh" outputs the error message on lines 66-70.


    if [ "$esready" != "TRUE" ]; then
       log_error "The OpenSearch REST endpoint has NOT become accessible in the expected time; exiting."
       log_error "Review the OpenSearch pod's events and log to identify the issue and resolve it before trying again."
       exit 1
    fi
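
The snippet above only shows the final check. The code that sets `esready` is not quoted in this thread, so the following is a generic sketch of how such a flag is commonly produced: poll a check command a fixed number of times and record whether it ever succeeded. It is not the project's implementation. The function name `wait_until_ready`, the attempt count, the delay, and the stand-in check command `true` are all assumptions.

```shell
# Generic polling sketch, NOT the project's code.
# wait_until_ready MAX_ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Exits 0 as soon as COMMAND succeeds, or 1 after MAX_ATTEMPTS failures.
wait_until_ready() (
  max_attempts="$1"
  delay="$2"
  shift 2

  attempt=1
  while [ "$attempt" -le "$max_attempts" ]; do
    # Run the check quietly; only its exit status matters.
    if "$@" >/dev/null 2>&1; then
      exit 0
    fi
    sleep "$delay"
    attempt=$((attempt + 1))
  done
  exit 1
)

# "true" is a placeholder for a real endpoint check (assumption).
esready="FALSE"
if wait_until_ready 3 0 true; then
  esready="TRUE"
fi
```

In a real run, the placeholder would be replaced by any command that exits 0 once the OpenSearch REST endpoint responds. If the pods never start, every attempt fails and `esready` stays `FALSE`, which produces exactly the two ERROR lines reported above.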


@shimoune
Author

shimoune commented Aug 6, 2024

@gsmith-sas Please allow me to ask an additional question.
OpenSearch is not available, but there has been a failure in the Viya environment and I would like to investigate the logs.
The OpenSearch deployment failed partway through, so Fluent Bit has not been deployed.
Are the logs of the Viya environment still being accumulated in this state?
If they are, could you please tell me where they are stored?

@gsmith-sas
Member

@shimoune I apologize for the delayed response but I was out of the office.

  • I suspect the deployment problem is that the version of the OpenSearch Helm chart used in the current (and other recent) versions of this project is designed for OpenSearch 2.x and is not compatible with the OpenSearch 1.x version you were trying to deploy. You may need to fall back to an earlier version of this project if you wish to continue deploying OpenSearch 1.x. I believe version 1.2.9 of this project was the last version that deployed OpenSearch 1.x. To use that version of this project, use the command git checkout 1.2.9.
  • I notice that OpenSearch released OpenSearch 1.3.18 in July. If you must stay on the 1.3.x branch of OpenSearch, you should probably try to deploy that version or something newer than 1.3.5. This will ensure you have the latest bug fixes and security updates. However, we have not tested this project with any of the newer versions on the OpenSearch 1.3.x branch since we moved to the OpenSearch 2.x branch 18 months ago. And, as I said in my initial response, we would strongly encourage you to move to a newer version of this project and OpenSearch 2.x.
  • As you have seen in the code, the specific messages you are seeing indicate that OpenSearch was not ready, i.e. was not accepting incoming API calls, when the deploy_opensearch_content.sh script was executing. I suspect that the OpenSearch pods were not up and running (due to the Helm chart problem mentioned above) when this script ran. You could run a command like kubectl -n logging get pods (from another terminal session) to see which pods are running while the deployment script is running. If the OpenSearch pods are running, use a command like kubectl logs -n logging v4m-search-0 to review the pod logs.
  • The Fluent Bit pods handle the log collection process, i.e. they gather the log messages from all of the pods and send the collected logs on to OpenSearch (which handles the storage and search). Therefore, unfortunately, if the Fluent Bit pods have not been deployed, none of the log messages have been collected/captured.
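
The checks suggested in the third point can be bundled into one helper to run from a second terminal while the deployment script is running. This is a sketch under assumptions: the function name `collect_opensearch_diagnostics` and the `NS`/`POD` override variables are hypothetical. The `logging` namespace and the `v4m-search-0` pod name are the ones mentioned above, and `kubectl describe pod` is added because it prints the pod's events, which the ERROR message asks you to review.

```shell
# Sketch only: run the pod checks mentioned above in one pass.
# collect_opensearch_diagnostics, NS and POD are hypothetical names for illustration.
collect_opensearch_diagnostics() (
  ns="${NS:-logging}"          # namespace named in the reply above
  pod="${POD:-v4m-search-0}"   # pod name named in the reply above

  echo "== pods in namespace $ns =="
  kubectl -n "$ns" get pods

  echo "== description and events for $pod =="
  kubectl -n "$ns" describe pod "$pod"

  echo "== logs for $pod =="
  kubectl -n "$ns" logs "$pod"
)

# Example: collect_opensearch_diagnostics > opensearch-diagnostics.txt
```

If the pod never reached the Running state, the `logs` call may return nothing useful, and the events section of the `describe` output is then the more informative part.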
