
Add smoke test framework for opensearch bundle #5185

Open · zelinh wants to merge 8 commits into main

Conversation

@zelinh (Member) commented Nov 14, 2024

Description

Add smoke test framework for opensearch bundle

The smoke test workflow can be started with either local bundle artifacts or artifacts from CI.
Example command:
./test.sh smoke-test manifests/2.18.0/opensearch-2.18.0-test.yml --paths opensearch=/test/tar
or
./test.sh smoke-test manifests/2.18.0/opensearch-2.18.0-test.yml --paths opensearch=https://ci.opensearch.org/ci/dbc/distribution-build-opensearch/2.18.0/10479/linux/x64/tar/

The essential functionality of this smoke test workflow is to deploy a staging bundle cluster using the distribution classes from the integ test workflow, run some basic API requests, and validate the response schemas against the opensearch-api-specification.
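To make the validation step concrete, here is a minimal sketch of checking a response against the spec with openapi-core; the spec filename and credentials are illustrative assumptions, not the PR's actual values:

```python
# Hedged sketch of spec-based response validation, assuming openapi-core
# and a locally downloaded copy of the opensearch-api-specification.
import requests
from openapi_core import Spec, validate_response
from openapi_core.contrib.requests import (
    RequestsOpenAPIRequest,
    RequestsOpenAPIResponse,
)

spec = Spec.from_file_path("opensearch-openapi.yaml")  # hypothetical spec path

# Hit a basic endpoint on the local test cluster (credentials are placeholders).
response = requests.get(
    "https://localhost:9200/_cat/indices",
    verify=False,
    auth=("admin", "myStrongPassword123!"),
)

# Validate the live response against the schema in the spec;
# raises a validation error on mismatch.
validate_response(
    response=RequestsOpenAPIResponse(response),
    request=RequestsOpenAPIRequest(response.request),
    spec=spec,
)
```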

Issues Resolved

#5164

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

codecov bot commented Nov 14, 2024

Codecov Report

Attention: Patch coverage is 81.81818% with 38 lines in your changes missing coverage. Please review.

Project coverage is 91.81%. Comparing base (c56153c) to head (f568b7a).
Report is 13 commits behind head on main.

Files with missing lines Patch % Lines
src/run_smoke_test.py 0.00% 16 Missing ⚠️
src/test_workflow/smoke_test/smoke_test_runners.py 0.00% 9 Missing ⚠️
...orkflow/smoke_test/smoke_test_runner_opensearch.py 89.83% 6 Missing ⚠️
...rkflow/smoke_test/smoke_test_cluster_opensearch.py 94.11% 4 Missing ⚠️
src/test_workflow/smoke_test/smoke_test_runner.py 94.73% 3 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #5185      +/-   ##
==========================================
- Coverage   92.12%   91.81%   -0.31%     
==========================================
  Files         197      202       +5     
  Lines        6817     7026     +209     
==========================================
+ Hits         6280     6451     +171     
- Misses        537      575      +38     


@zelinh (Member Author) commented Nov 14, 2024

Looking into these check failures.

zelinh and others added 3 commits November 13, 2024 18:36
@peterzhuamazon (Member) commented:
[Codecov report quoted above]

Seems like a bunch of test cases are missing.

@zelinh (Member Author) commented Nov 14, 2024

[Codecov report and comment quoted above]

Not sure why it's not updating. I added the respective tests after these comments, but it still shows 0%.

@peterzhuamazon (Member) commented:
Pending #5187

@peterzhuamazon (Member) commented:
Ready for review.

@zelinh (Member Author) commented Nov 15, 2024

Ready for review.

Thanks for the help fixing the container.

@zelinh (Member Author) commented Nov 15, 2024

Tested on the Docker image opensearchstaging/ci-runner:ci-runner-al2-opensearch-build-v1 with the command ./test.sh smoke-test manifests/2.18.0/opensearch-2.18.0-test.yml --paths opensearch=https://ci.opensearch.org/ci/dbc/distribution-build-opensearch/2.18.0/10479/linux/x64/tar/
Confirmed it's running as expected.

2024-11-15 22:41:27 INFO     | opensearch           | / GET                | PASS  |
2024-11-15 22:41:27 INFO     | opensearch           | /_bulk POST          | PASS  |
2024-11-15 22:41:27 INFO     | opensearch           | /_cat/indices GET    | PASS  |
2024-11-15 22:41:27 INFO     | opensearch           | /_cat/plugins GET    | PASS  |

@gaiksaya (Member) left a comment:

A few questions and suggestions:

  1. How about reusing the current integration test cluster setup instead of installing a new cluster for smoke testing? Let me know if the setup is different.
  2. Would we have a new test-report.yml for smoke testing, or will the data be appended to the current report.yml?

Adding @dblock to take a look from an open-api perspective.

Thanks!

@@ -0,0 +1,99 @@
# Copyright OpenSearch Contributors
Member:

I believe we are duplicating the work of setting up clusters here. Why not just use the integ-test cluster setup instead of creating a new framework for smoke testing? Check the setups at https://github.com/opensearch-project/opensearch-build/tree/main/src/test_workflow/integ_test and see if you can use the same classes to set up a cluster for smoke testing as well. At a high level I do not see a difference in the setup.

Member Author:

Reusing the entire integ-test cluster might be overly complicated for smoke tests, as we don't want to accept any customized configuration for the smoke test cluster, in order to keep it lightweight. I also want to make the smoke test workflow less dependent on integ tests, so it stands alone as a test workflow.

Member:

For APIs to work at the component level we might need additional-config to be added. For example: you cannot check the system templates plugin if these configs are not enabled.
The current setup for integ-test is lightweight as well. Reusing the existing codebase might be the way to go instead of reinventing the wheel for a different use case. Also, per distribution it will keep growing, the way it has today for integ-test.

Member Author:

Is there an example API I can test for the system templates plugin you are referring to? In any case, I think that if a component requires specific configuration for smoke tests, it's not ready to be onboarded to the smoke test framework, because we only deploy the cluster once and run all the API request checks against it, and we can't be sure whether that specific configuration would affect the others.
We are reusing the distribution classes from the integ test workflow to install and start the cluster, so any future distribution type can easily be adopted by this smoke test workflow.

@rishabh6788 (Collaborator) commented Nov 20, 2024:

Agree with reusing the cluster setup from integ test; the code is modular enough to be reused here. To be able to even run an API of system-templates you need those configurations added to opensearch.yml.
Think about adding smoke tests for features that need to be enabled explicitly.

Member Author:

I still think we shouldn't allow any customized configurations, as they might grow much bigger. By design we provide a generic framework with all default configurations for the cluster. If a component needs specific configuration to be operational, it shouldn't be added to the smoke tests and can go into the integ tests instead.

@@ -0,0 +1,30 @@
# yamllint disable
Member:

How about storing these files under the manifests folder, per version?

manifests/2.18.0/smoke_test/opensearch.yml

Member Author:

At this point, I don't feel it's necessary to keep these files distinct by version. So far the API requests in this file are generic and not version specific. In addition, any valid API path can be filtered in the API-spec file by the first version it was introduced in.

Member:

For new APIs that are being added, how do you propose to manage that?

Member Author:

The version an API was introduced in will be specified in the API-spec file, and we can compare it with the current version.

Member:

Sorry for not being clear; what I meant was, if we introduce new APIs in, say, 2.19.0, would we be using the same spec file that we used in 2.18.0?
Wondering how the api-spec repo is managing that today.

Member:

So for a new x-version-add (2.9, 2.10, 2.11), what would the spec look like?

Member Author:

This component spec will still be the same; we check on the test workflow side.
The workflow extracts the API path from this component spec, checks the current cluster version against the x-version-add from the api-spec, and runs the check only if the current version is at or later than the version the API was added in.
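A hedged sketch of that version gate, assuming the spec's extension key is x-version-added (the discussion says "x-version-add") and that the packaging library is available; the helper name is hypothetical:

```python
# Hypothetical helper illustrating the version gate described above.
from packaging.version import Version

def should_run_check(api_details: dict, cluster_version: str) -> bool:
    # The api-spec marks when an API was introduced (e.g. "x-version-added": "2.11").
    version_added = api_details.get("x-version-added")
    if version_added is None:
        return True  # no annotation: assume the API exists in all versions
    return Version(cluster_version) >= Version(version_added)

# e.g. should_run_check({"x-version-added": "2.11"}, "2.18.0") -> True
```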

Collaborator:

If not differentiated on version, it will become a problem when 3.0 is being prepared for release and has breaking changes in existing APIs.
You don't have to separate at each minor version, but at least at 2.x and 3.x. Minor versions are backward compatible, so it is safe to assume there will be no breaking changes in the APIs.
We don't have to worry about introducing new APIs in minor versions, as smoke tests are expected to run for versions to be released, not already-released versions.

Collaborator:

I'm okay if openapi handles the version check and returns an appropriate response.
Say version 3.0 made a change to the _cat/indices API request/response and you run this workflow for both 2.x and 3.x for a new release; how will that be handled?

Member Author:

Those changes should also be handled in the openapi spec files.

@zelinh (Member Author) commented Nov 18, 2024

[quoting @gaiksaya's two questions above]

  1. The cluster we deploy for smoke tests is intended to be universal and won't accept any customized configuration. Using the integ test cluster might be overly complicated here and a little difficult to maintain in the future.
  2. Generating test-report.yml is a separate step from this workflow; we will want to trigger the report workflow to create test-report.yml. I think we can accommodate that once we onboard the smoke test.

url = "https://localhost:9200/"
logging.info(f"Pinging {url}")
try:
    request = requests.get(url, verify=False, auth=("admin", "myStrongPassword123!"))
Member:

Would it always be with-security?

Member Author:

Yes.

Member:

Would recommend creating an issue for without-security, since security is an optional plugin and smoke tests, just like integ-test, need to run smoothly irrespective of the security plugin.

logging.info("Initiating smoke tests.")
test_cluster = SmokeTestClusterOpenSearch(self.args, os.path.join(work_dir.path), self.test_recorder)
test_cluster.__start_cluster__(os.path.join(work_dir.path))
for i in range(10):
Collaborator:

So it iterates 10 times to check if the cluster is up and running.
What happens when the cluster is not up after 10 retries?

Member Author:

It would hard-fail the workflow anyway, as the cluster is not available.

Collaborator:

Why hard-fail when you can put a check and gracefully terminate the workflow?
Please add a graceful termination check.

Member Author:

Added another check before running the API checks, so if the cluster is not ready after 10 attempts, the workflow will exit gracefully.

}
# self.openapi = openapi_core.OpenAPI.from_file_path(spec_file)

def validate_request_swagger(self, request: Any) -> None:
Collaborator:

Where is this being used?

Member Author:

It's not used yet, as there are currently some issues validating requests. I'm thinking we may be able to use it eventually; I'm okay with removing it for now.

logging.info("Request is validated.")

def validate_response_swagger(self, response: Any) -> None:
request = RequestsOpenAPIRequest(response.request)
Collaborator:

Why not validate the request as well?

Member Author:

There seems to be an issue in the tool with request validation, as the string types are different.

Collaborator:

Can you add more details on it? If there is an issue on the openapi repo, maybe reference that.

Member Author:

I created an issue here: opensearch-project/opensearch-api-specification#656
The openapi tool that we are using assumes the requestBodies type is a JSON array, but what we get from the response is a String. The Array return type specified in the openapi spec is also used to generate client code, so it's complicated to modify.
Besides, the validation tool seems to use byte strings while some of our responses return plain strings, which causes errors as well.

def validate_response_swagger(self, response: Any) -> None:
    request = RequestsOpenAPIRequest(response.request)
    response = RequestsOpenAPIResponse(response)
    validate_response(response=response, spec=self.spec_, request=request)
Collaborator:

try/catch here and return.

header = api_details.get(method).get("header", self.mimetype)
logging.info(f"Parameter is {parameters_data} and type is {type(parameters_data)}")
logging.info(f"header is {header}")
status = 0
Collaborator:

Instead of this, wouldn't it be better to have a try/catch in the called method, return True/False, and take appropriate action?

Member Author:

This status is an integer and is used by the TestResult class to classify PASS/FAIL.

Member:

Agree with the recommendation to use boolean instead of integer.

Member Author:

The TestResult class marks PASS only when status == 0, so we may have to use an integer here.

def __init__(self, component: str, config: dict, status: int) -> None:
    self.component = component
    self.config = config
    self.status = status

@property
def __test_result(self) -> str:
    return "PASS" if self.status == 0 else "FAIL"

    logging.info(f"Response is {response.text}")
    self.validate_response_swagger(response)
except:
    status = 1
Collaborator:

Maybe log the exception for debugging purposes.
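A minimal sketch of that suggestion against the excerpt above; the names api_requests, method, and status are taken from the surrounding PR code, and the broad Exception catch stands in for the original bare except:

```python
try:
    logging.info(f"Response is {response.text}")
    self.validate_response_swagger(response)
except Exception as e:
    # Record the cause so failed checks can be debugged from the workflow logs.
    logging.error(f"Validation failed for {api_requests} {method}: {e}")
    status = 1
```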

except:
    status = 1
finally:
    test_result = TestResult(component.name, ' '.join([api_requests, method]), status)  # type: ignore
Collaborator:

What will happen here if the cluster failed to start and no tests were run?

Member Author:

Added another check to prevent this from happening. Thanks.

validate_request(request=request, spec=self.spec_)
logging.info("Request is validated.")

def validate_response_swagger(self, response: Any) -> None:
Collaborator:

These can be moved into the base class since they will be the same for each component, right?
You are passing the spec, request, and response for each component spec that is running, or is my understanding wrong?
Given that all the validation is being done by openapi, would this method signature change when it is running for OpenSearch and SQL?

Member Author:

This class method is used for all OpenSearch components. I've only verified that it works here, and I'm not sure whether it will still work as the scope grows, so I'm keeping it in this runner_opensearch class.
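For reference, a sketch of the refactor the reviewer suggests, hoisting the shared validation into the base runner; the class names follow the PR's file names (smoke_test_runner.py, smoke_test_runner_opensearch.py), but the exact shapes here are assumptions:

```python
from typing import Any

from openapi_core import validate_response
from openapi_core.contrib.requests import RequestsOpenAPIRequest, RequestsOpenAPIResponse

class SmokeTestRunner:
    def __init__(self, spec: Any) -> None:
        self.spec_ = spec

    def validate_response_swagger(self, response: Any) -> None:
        # Identical for every component: wrap the live request/response
        # and validate against the loaded spec.
        request = RequestsOpenAPIRequest(response.request)
        wrapped = RequestsOpenAPIResponse(response)
        validate_response(response=wrapped, spec=self.spec_, request=request)

class SmokeTestRunnerOpenSearch(SmokeTestRunner):
    # Component-specific request building stays here; validation is inherited.
    pass
```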

@zelinh (Member Author) commented Nov 26, 2024

Feel free to take a look at this smoke test workflow. @reta @dblock Thanks!

Comment on lines 9 to 11
- name: opensearch
  smoke-test:
    test-spec: opensearch.yml
Member:

Maybe move to 2.19 now?

Member Author:

We haven't started the release cycle for 2.19, so there is no complete distribution bundle for testing purposes yet; I used 2.18 to demonstrate the correct command to start the workflow. I'm OK with changing this to 2.19, and we should also update the manifest workflow to attach the smoke tests entry during test manifest creation.

Member Author:

Moved to 2.19 test manifest.

Member Author:

Confirmed the new command ./test.sh smoke-test manifests/2.19.0/opensearch-2.19.0-test.yml --paths opensearch=https://ci.opensearch.org/ci/dbc/distribution-build-opensearch/2.19.0/10545/linux/x64/tar/ is running correctly.

@reta (Contributor) commented Nov 27, 2024

Feel free to take a look at this smoke test workflow. @reta @dblock Thanks!

Thanks a lot @zelinh !

else:
time.sleep(10)
try:
if test_cluster.__check_cluster_ready__():
Collaborator:

This is redundant given you are already checking above if the cluster is ready or not.

Member Author:

Updated in the latest commit.

for i in range(10):
    logging.info(f"Attempt {i} of 10 to check cluster.")
    if test_cluster.__check_cluster_ready__():
        break
Collaborator:

A cleaner approach would be to add a method in this file called is_cluster_ready(), move the test_cluster.__check_cluster_ready__() logic into that method, and return a boolean. Add an elif after if test_cluster.__check_cluster_ready__(): to check whether i == 9 (the 10th attempt) and, if so, return False.

Catch that bool value and take action accordingly.

Member Author:

I updated the logic here to use a boolean variable, so we don't have to do another redundant cluster-readiness check.
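A hedged sketch of the resulting readiness loop; test_cluster and __check_cluster_ready__ come from the excerpts above, while the surrounding function and its return convention are assumptions:

```python
cluster_ready = False
for i in range(10):
    logging.info(f"Attempt {i} of 10 to check cluster.")
    if test_cluster.__check_cluster_ready__():
        cluster_ready = True
        break
    time.sleep(10)

if not cluster_ready:
    # Exit gracefully instead of hard-failing when the cluster never comes up.
    logging.error("Cluster is not ready after 10 attempts; terminating the workflow.")
    return 1  # hypothetical: assumes this runs inside the workflow's run() method
```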
