[Java BQ] Storage API streaming load test #28264

Merged (19 commits) on Oct 3, 2023

@ahmedabu98 ahmedabu98 commented Aug 31, 2023

Adding a streaming load test for writing via the Storage API sink. It covers both exactly-once and at-least-once semantics.

The test first writes rows in batch FILE_LOADS mode to a "source of truth" table. It then writes the same rows in streaming mode with the Storage API to a second table, and finally queries both tables to check that they are identical. Alternatively, an existing table holding the expected data can be provided, in which case the test skips the first step.
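
To make the "identical" check concrete, here is a minimal plain-Java sketch of one plausible reading of it. The real test runs the comparison as a query against BigQuery; this in-memory model, the class name `TableComparisonSketch`, and the rule used for at-least-once (duplicates allowed, no missing or unexpected rows) are assumptions for illustration, not the code in this PR.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory model of comparing the "source of truth" table
// with the table written by the streaming pipeline.
public class TableComparisonSketch {

  // Counts how many times each row (serialized as a string) occurs.
  public static Map<String, Integer> countRows(List<String> rows) {
    Map<String, Integer> counts = new HashMap<>();
    for (String row : rows) {
      counts.merge(row, 1, Integer::sum);
    }
    return counts;
  }

  // exactlyOnce = true:  every row must occur the same number of times.
  // exactlyOnce = false: duplicates are tolerated, but no row may be
  //                      missing and no unexpected row may appear.
  public static boolean matches(
      List<String> expected, List<String> actual, boolean exactlyOnce) {
    Map<String, Integer> want = countRows(expected);
    Map<String, Integer> got = countRows(actual);
    if (!want.keySet().equals(got.keySet())) {
      return false; // a row is missing or an unexpected row was written
    }
    for (Map.Entry<String, Integer> entry : want.entrySet()) {
      int written = got.get(entry.getKey());
      int expectedCount = entry.getValue();
      if (exactlyOnce ? written != expectedCount : written < expectedCount) {
        return false;
      }
    }
    return true;
  }

  public static void main(String[] args) {
    List<String> truth = Arrays.asList("a", "b", "c");
    List<String> withDuplicate = Arrays.asList("a", "b", "b", "c");
    System.out.println(matches(truth, withDuplicate, true));  // prints "false"
    System.out.println(matches(truth, withDuplicate, false)); // prints "true"
  }
}
```

Under this reading, a duplicated row fails the exactly-once check but passes the at-least-once check, while a missing row fails both.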

The throughput, length of test (in minutes), and data shape can be changed by adding a new configuration line.

Also including a small addition: we can set an interval at which the sink intentionally crashes, to test retry resilience. The sink will sometimes throw an exception to simulate a work item failure, and other times exit the JVM to simulate a worker failure. Either way, we expect the pipeline to pick up where it left off and deliver the data correctly.
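
Roughly, the crash logic described above could look like the sketch below. This is not the code in the PR: the class name `CrashingSinkSketch`, the injected clock, the injected "throw or kill" chooser, and the `killWorker` hook are all hypothetical, added so that the behaviour can be exercised without actually exiting the JVM.

```java
import java.util.function.BooleanSupplier;
import java.util.function.LongSupplier;

// Hypothetical crash injector: once per interval it either throws
// (simulated work item failure) or runs a kill hook (simulated worker failure).
public class CrashingSinkSketch {
  private final long intervalMillis;
  private final LongSupplier clockMillis;     // assumption: injectable clock
  private final BooleanSupplier throwInstead; // true -> throw, false -> kill worker
  private final Runnable killWorker;          // e.g. () -> System.exit(1)
  private long nextCrashAtMillis;

  public CrashingSinkSketch(
      long intervalSeconds,
      LongSupplier clockMillis,
      BooleanSupplier throwInstead,
      Runnable killWorker) {
    this.intervalMillis = intervalSeconds * 1000L;
    this.clockMillis = clockMillis;
    this.throwInstead = throwInstead;
    this.killWorker = killWorker;
    this.nextCrashAtMillis = clockMillis.getAsLong() + intervalMillis;
  }

  // Intended to be called from the write path, e.g. once per element or bundle.
  public void maybeCrash() {
    long now = clockMillis.getAsLong();
    if (now < nextCrashAtMillis) {
      return; // the interval has not elapsed yet
    }
    nextCrashAtMillis = now + intervalMillis;
    if (throwInstead.getAsBoolean()) {
      throw new RuntimeException("Simulated work item failure");
    }
    killWorker.run();
  }

  public static void main(String[] args) {
    long[] now = {0L};
    CrashingSinkSketch sink =
        new CrashingSinkSketch(
            10,
            () -> now[0],
            () -> true, // always pick the exception path in this demo
            () -> System.out.println("worker would exit here"));
    now[0] = 5_000L;
    sink.maybeCrash(); // no crash, only 5 seconds have elapsed
    now[0] = 10_000L;
    try {
      sink.maybeCrash();
    } catch (RuntimeException e) {
      System.out.println(e.getMessage());
    }
  }
}
```

In a real pipeline the chooser could be something like `new java.util.Random()::nextBoolean` and the kill hook `() -> System.exit(1)`. Each worker would keep its own timer, so the effective crash rate would scale with the number of workers.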

@ahmedabu98 ahmedabu98 mentioned this pull request Sep 13, 2023
@ahmedabu98
Contributor Author

R: @johnjcasey

@github-actions
Contributor

Stopping reviewer notifications for this pull request: review requested by someone other than the bot, ceding control

@johnjcasey
Contributor

We should have two different test configurations. One should publish performance metrics to the table, and should be the "healthy" scenario with no deliberate crashes. The other should not do this publication, and should include the intermittent failures.

@github-actions github-actions bot added the build label Sep 19, 2023
@ahmedabu98
Contributor Author

Got it, I'll remove the crashSink option from TestProperties so that it's not exposed to the performance testing framework.
I'll still include the crashing logic. We can later create a test that reuses this same class by just passing the `crashStorageApiSinkEverySeconds` option in the pipeline options. If this option is set, the test will run the pipelines normally without publishing any metrics.
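
As a rough illustration of that gating, here is a plain-Java sketch. The actual test would read the option through Beam's pipeline options rather than parse arguments by hand; the class `CrashOptionSketch` and both method names are hypothetical, and only the option name comes from this PR.

```java
import java.util.OptionalInt;

// Hypothetical helper: metrics are published only when the crash option is absent.
public class CrashOptionSketch {
  private static final String FLAG = "--crashStorageApiSinkEverySeconds=";

  // Returns the crash interval if the flag is present, otherwise empty.
  public static OptionalInt crashIntervalSeconds(String[] args) {
    for (String arg : args) {
      if (arg.startsWith(FLAG)) {
        return OptionalInt.of(Integer.parseInt(arg.substring(FLAG.length())));
      }
    }
    return OptionalInt.empty();
  }

  // The "healthy" run publishes metrics; the crashing run does not.
  public static boolean shouldPublishMetrics(String[] args) {
    return !crashIntervalSeconds(args).isPresent();
  }

  public static void main(String[] args) {
    String[] healthy = {"--runner=DataflowRunner"};
    String[] crashing = {"--runner=DataflowRunner", "--crashStorageApiSinkEverySeconds=30"};
    System.out.println(shouldPublishMetrics(healthy));  // prints "true"
    System.out.println(shouldPublishMetrics(crashing)); // prints "false"
  }
}
```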

@ahmedabu98 ahmedabu98 marked this pull request as ready for review September 21, 2023 17:50
@ahmedabu98
Contributor Author

R: @johnjcasey
R: @reuvenlax

PTAL

@ahmedabu98
Contributor Author

Run PostCommit_Java_Dataflow

@ahmedabu98
Contributor Author

Run PostCommit_Java_DataflowV2

@ahmedabu98 ahmedabu98 merged commit 2e05211 into apache:master Oct 3, 2023
21 checks passed