test: performance testing for sdk core #865

Open · wants to merge 3 commits into main
Conversation

@OmarAlJarrah (Contributor) commented Nov 26, 2024

Situation

The sdk-core package is well-covered by unit tests that validate its functionality. However, we currently lack performance tests/benchmarks to evaluate and ensure that our SDKs meet modern performance standards, in addition to being functionally correct.

Task

Benchmarking Tool

  • Design and implement a benchmarking tool focused on validating and measuring the SDK's performance against realistic workloads.
  • The tool should expose an API that is easy to extend & use.

Real-World Scenario Simulation

  • Leverage a mock server to simulate real-world scenarios for the SDK.
  • Utilize Specmatic stubs to mimic API services, ensuring compatibility with expected service behaviors.

Focus Areas

  • Prioritize scenarios that involve:
    • I/O-Intensive Operations: Simulate high-throughput data exchanges.
    • CPU-Intensive Operations: Test computationally demanding tasks.

Comprehensive Reporting

  • Ensure the tool generates detailed performance reports.

Load and Stress Testing

  • Implement load testing to simulate real-life usage patterns and assess how the SDK performs under typical and peak conditions.
  • Conduct stress tests to push the SDK to its operational limits, observing behavior such as degradation, failure recovery, and scalability.

Action

We have engineered a solution for benchmarking the sdk-core, designed to perform comprehensive load and stress testing while closely replicating real-world environments to ensure accurate and meaningful performance insights.

Initial Setup

We have created a new module in the repository root named performance-test. New GitHub Actions workflows will be proposed to consume this module and run benchmarks against the sdk-core package.

Mock Server

To mimic real-world scenarios, we depend on a mock server powered by Specmatic, which consumes the openapi.yaml spec file. The mock server will be set up in future GitHub Actions workflows.

To run the mock server locally, follow these steps:

  1. Download the latest Specmatic JAR:

     curl -s https://api.github.com/repos/znsio/specmatic/releases/latest \
       | grep "browser_download_url.*specmatic.jar" \
       | cut -d : -f 2,3 \
       | tr -d \" \
       | xargs curl -L -o specmatic.jar

     specmatic="$(pwd)/specmatic.jar"

  2. From the repository root, change directory into the main resources of the performance-test module:

     cd performance-test/src/main/resources

  3. Run the Specmatic executable with the openapi.yaml file as the contract and the specmatic-data directory as request-response data, optionally on port 8080:

     java -jar $specmatic stub openapi.yaml --data specmatic-data --port 8080

The contents of the specmatic-data directory define the responses the Specmatic mock server will return. In this case, we have defined all JSON responses to have a length of ~1e7 characters. Any file returned by the Specmatic mock server (in the case of the download operation) is an image of roughly 0.5 MB.

Sample SDK

We introduced a custom OpenAPI spec file that includes a few endpoints and operations based on the types of operations our SDK core currently supports. The spec includes the following endpoints:

  • GET /text: GET endpoint that returns a text response with application/json as the content type.
  • POST /text: POST endpoint that accepts a text body (Message) and returns a text response with application/json as the content type.
  • GET /files: GET endpoint that can be used to download a file with image/png as the content type.

The spec file was also used to generate a sample, lightly tweaked SDK, which is then used in the benchmarking process. The SDK has multiple clients, differentiated by authentication scheme, each with its own configuration and builder. Currently, we have two SDK clients:

  • BasicAuthBasedClient
  • EanAuthBasedClient
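
For illustration, here is a minimal sketch of how one of these clients might exercise the endpoints above. The operation and model names (GetTextOperation, PostTextOperation, GetFilesOperation, Message) are assumptions for illustration, not the actual generated code:

fun exerciseEndpoints(client: BasicAuthBasedClient) {
    // GET /text: returns a JSON-encoded text response
    val message = client.execute(GetTextOperation())

    // POST /text: sends a Message body, returns a JSON-encoded text response
    val echoed = client.execute(PostTextOperation(Message(body = "hello")))

    // GET /files: downloads a ~0.5 MB PNG image from the mock server
    val file = client.execute(GetFilesOperation())
}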

Benchmarks

We have evaluated established performance testing frameworks such as Apache JMeter and Gatling. While these frameworks offer robust features, they also come with inherent limitations and complexities that did not fully align with our specific requirements. Consequently, we opted for a streamlined, custom in-house approach tailored to our current needs.

Benchmark API

We have established a standardized contract in the form of an interface, PerformanceTestClient, for SDK clients to be benchmarked.

interface PerformanceTestClient {
    // Executes an operation synchronously and returns its response.
    fun execute(operation: SomeOperation): Response<Something>

    // Executes an operation asynchronously, returning a future of the response.
    fun executeAsync(operation: SomeOperation): CompletableFuture<Response<Something>>
}

class BasicAuthBasedClient(
    clientConfiguration: XapClientConfiguration
) : PerformanceTestClient, BaseXapClient(
    // Code...
)

class EanAuthBasedClient(
    clientConfiguration: RapidClientConfiguration
) : PerformanceTestClient, BaseRapidClient(
    // Code...
)

The contract is closely aligned with, and centered around, the openapi.yaml specification file, ensuring consistency and acting as a bridge between the SDK clients and the benchmark API.

The API also defines contracts that make benchmark targets easy to extend and that differentiate between the sync and async benchmarking models:

  • Benchmarked: A representation of a scenario or operation to be benchmarked synchronously.

    abstract class Benchmarked<T>(
        // stuff...
    ) {
        abstract fun benchmark(): Response<T>
    }

  • AsyncBenchmarked: A representation of a scenario or operation to be benchmarked asynchronously.

    abstract class AsyncBenchmarked<T>(
        // stuff...
    ) {
        abstract fun benchmarkAsync(): CompletableFuture<Response<T>>
    }

Both of the aforementioned contracts also carry metadata about each individual benchmark unit (a concrete sketch follows the list below), such as:

  • name: Name of the benchmark.
  • description: Optional description if needed.
  • repeat: How many times to run the executable/target and collect execution results.
  • timeout and timeoutUnit: Together these define the maximum time an execution may take before the benchmark is failed.
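
For illustration, a concrete synchronous benchmark might look like the following sketch. The constructor shape and the operation name (GetTextOperation) are assumptions, not the actual implementation:

class SyncGetMessageOperationBenchmark(
    private val client: PerformanceTestClient
) : Benchmarked<Message>(
    // Metadata fields mirror the list above.
    name = "SyncGetMessageOperationBenchmark",
    description = "Sync Get Message Operation Executed 10 times",
    repeat = 10,
    timeout = 5,
    timeoutUnit = TimeUnit.SECONDS,
) {
    // Runs a single execution of the GET /text operation; the engine
    // repeats this `repeat` times and aggregates the timings.
    override fun benchmark(): Response<Message> = client.execute(GetTextOperation())
}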

Benchmark Engine

We have developed a robust and extensible benchmarking engine designed to evaluate SDK operations and scenarios based on the pre-defined sdk-core contracts. The engine is both flexible and simple, supporting both synchronous (sync) and asynchronous (async) benchmarking models. It is capable of managing and tracking a virtually unlimited number of requests, regardless of the concurrency model, while ensuring precise and accurate performance metrics.

fun <T> benchmarkAsync(
    name: String,
    block: () -> CompletableFuture<Response<T>>,
    repeat: Int,
    timeout: Long,
    timeoutUnit: TimeUnit,
    description: String = "---",
): Benchmark
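
As a usage sketch (the surrounding client and operation names are assumptions based on the contracts above):

// Hypothetical invocation: run the async POST /text operation 10 times
// with a 5-second per-execution timeout.
val benchmark = benchmarkAsync(
    name = "AsyncPostMessageOperationBenchmark",
    block = { client.executeAsync(PostTextOperation(Message(body = "hello"))) },
    repeat = 10,
    timeout = 5L,
    timeoutUnit = TimeUnit.SECONDS,
    description = "Async Post Message Operation Executed 10 times",
)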

Each successful benchmark operation produces a report which, in its simplest form, includes information about the benchmark's final results, such as:

  • name: Name of the benchmark.
  • description: Optional description.
  • startTime / endTime: Zoned date and time of the start and end of the benchmark.
  • executionsCount: The number of times the benchmark target was executed.
  • duration: Total time consumed by the benchmark.
  • averageExecutionTime: Average time consumed per single execution.
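
The returned Benchmark result could be modeled roughly as follows. The field names mirror the list above; the class shape itself is an assumption:

import java.time.ZonedDateTime
import kotlin.time.Duration

// Hypothetical shape of a single benchmark result.
data class Benchmark(
    val name: String,
    val description: String?,
    val startTime: ZonedDateTime,
    val endTime: ZonedDateTime,
    val executionsCount: Int,
    val duration: Duration,             // total time spent benchmarking
    val averageExecutionTime: Duration, // duration / executionsCount
)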

Benchmark Reports

At the end of all benchmarks, results are collected and a report is generated in the form of a table. We used picnic to render the data into a table-like structure. Sample:

┌────────────────────────────────────┬────────────────────────┬────────────────────────────┬────────────────────────────┬──────────────────┬──────────────────────┬────────────────────────────────────────────────┐
│ Benchmark                          │ Average Execution Time │ Start Time                 │ End Time                   │ Total Executions │ Total Execution Time │ Description                                    │
├────────────────────────────────────┼────────────────────────┼────────────────────────────┼────────────────────────────┼──────────────────┼──────────────────────┼────────────────────────────────────────────────┤
│ SyncPostMessageOperationBenchmark  │ 215.9 (ms)             │ 2024-11-26T07:43:09.150943 │ 2024-11-26T07:43:11.310037 │ 10               │ 2159 (ms)            │ Sync Post Message Operation Executed 10 times  │
├────────────────────────────────────┼────────────────────────┼────────────────────────────┼────────────────────────────┼──────────────────┼──────────────────────┼────────────────────────────────────────────────┤
│ SyncFileDownloadOperationBenchmark │ 102.5 (ms)             │ 2024-11-26T07:43:11.310847 │ 2024-11-26T07:43:12.336322 │ 10               │ 1025 (ms)            │ Sync File Download Operation Executed 10 times │
├────────────────────────────────────┼────────────────────────┼────────────────────────────┼────────────────────────────┼──────────────────┼──────────────────────┼────────────────────────────────────────────────┤
│ SyncGetMessageOperationBenchmark   │ 103.1 (ms)             │ 2024-11-26T07:43:12.336362 │ 2024-11-26T07:43:13.367391 │ 10               │ 1031 (ms)            │ Sync Get Message Operation Executed 10 times   │
└────────────────────────────────────┴────────────────────────┴────────────────────────────┴────────────────────────────┴──────────────────┴──────────────────────┴────────────────────────────────────────────────┘

Once the report is rendered, it is persisted to disk.

Command-Line-Interface (CLI)

A CLI is in place for ease of use; it triggers the execution of all benchmarks and produces a report that is written to disk. The CLI can be invoked as follows:

mvn clean install exec:java -Dexec.args="--output report.txt"
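
Under the hood, the entry point might look like this minimal sketch. The argument handling and the runAllBenchmarks() helper are assumptions, not the actual implementation:

import java.io.File

fun main(args: Array<String>) {
    // Parse the --output flag, defaulting to report.txt.
    val outputIndex = args.indexOf("--output")
    val outputPath =
        if (outputIndex in 0 until args.lastIndex) args[outputIndex + 1] else "report.txt"

    // runAllBenchmarks() stands in for the engine entry point that runs
    // every registered benchmark and returns the rendered report table.
    val report = runAllBenchmarks()
    File(outputPath).writeText(report)
}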

Testing

TBA

Results

We now have a comprehensive set of utilities to benchmark our sdk-core effectively, enabling us to simulate and evaluate performance under real-world scenarios with precision and reliability.

Notes

  • We do not yet support file uploads, multipart requests, or other HTTP-related operations; we therefore only benchmark what our sdk-core supports at the moment.

  • A future idea, mentioned by @mohnoor94, suggests that we run performance tests/benchmarks per pull request and add a comment on the pull request with the final report.

  • This was not planned, but we might want to include more information in the final report, such as:

    • Thread and memory metrics before, during, and after each benchmark execution (thread and memory dumps).
    • Available resources (CPU cores, memory, etc.).

  • We might want to render the report in other formats, such as Markdown, JSON, or YAML.

@OmarAlJarrah OmarAlJarrah requested a review from a team as a code owner November 26, 2024 05:00