
windsock: get stable results #1274

Open

rukai opened this issue Aug 4, 2023 · 1 comment

rukai commented Aug 4, 2023

Running the same bench twice does not give stable results, either locally or on AWS.
But I'm focusing on AWS at the moment because that feels more important.

Network and disk IO do not go above 5MB, so it seems unlikely that we are hitting limits there.

I've noticed that shotover benches will be off by a certain % for the entire bench,
while non-shotover benches will hover around 0.0% and then go up or down from there before returning to 0.0%.

So it looks like shotover is introducing a second kind of noise.
We should therefore first address the noise that occurs without shotover.

The cassandra benches have the bencher set up to use only 1 thread.
Using 2 threads on an m6a.large instance seems to make the noise worse.
But maybe using more threads on a larger instance would help?
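As a point of reference, here is a minimal sketch of pinning the bencher to an explicit worker thread count, assuming a tokio multi-thread runtime (the actual setup in windsock may differ):

```rust
// Hypothetical sketch, not windsock's actual code: build the bencher's tokio
// runtime with an explicit worker thread count so it can be varied per bench.
let runtime = tokio::runtime::Builder::new_multi_thread()
    .worker_threads(1) // 1 thread was less noisy than 2 on m6a.large
    .enable_all()
    .build()
    .unwrap();
runtime.block_on(run_bench()); // `run_bench` is a placeholder for the bench body
```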

Maybe I need a better idea of what people have historically found to be stable.

cassandra,compression=none,driver=scylla,operation=read_i64,protocol=v4,shotover=none,topology=single observations

I've now observed that latte is more consistent than windsock.
Windsock seemed to consistently drop in performance 42-44s into the benchmark. (resolved)
I'm not sure if there were other differences in consistency.

I attempted to rewrite windsock's bencher to be more like latte, but it did not help.
I either need to profile the bencher to find out what's going on, or blindly try copying more logic from latte.

cassandra,compression=none,driver=scylla,operation=read_i64,protocol=v4,shotover=standard,topology=cluster3 observations

latte gets roughly 10x the throughput of windsock in its default configuration: ~60000 OPS for latte vs ~5000 OPS for windsock.
Increasing latte's thread count drops its performance.
Wow, I can get numbers similar to latte by setting --operations-per-second 50000, but as soon as I set --operations-per-second 55000 the actual OPS drops to 5000.

If I set the bencher OPS to 50000, shotover will meet exactly 50000 OPS.
However, if I set the bencher OPS to 55000 then, depending on the run, shotover may reach 55000, or it may get stuck at a much lower OPS; I've seen as low as 5000.
If I then set OPS to unlimited, it pretty much always runs at 5000 OPS.
Latte doesn't seem to experience this same cliff; it does seem to max out at about the same point that shotover can reach (60000) in its default configuration.
But if I increase the number of concurrent messages to 500 it can hit 80000.
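For context on how a fixed OPS target shapes the load, here is a rough open-loop driver sketch; this is only an illustration of the scheduling idea, not windsock's or latte's actual logic, and `issue_query` is a placeholder:

```rust
use std::time::Duration;

// Hypothetical open-loop driver: fire one operation per tick without waiting
// for earlier operations to finish, so slow responses do not reduce the
// offered load.
async fn run_at_fixed_rate(ops_per_second: u64) {
    let mut interval = tokio::time::interval(Duration::from_nanos(1_000_000_000 / ops_per_second));
    loop {
        interval.tick().await;
        tokio::spawn(issue_query());
    }
}
```

If the driver instead waits for each response before issuing the next operation (closed loop), a latency spike collapses the achieved OPS, which might explain a cliff like the one described above.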

shotover=none gives similar throughput for latte and windsock, but latte is still a bit higher.
Here, increasing the thread count does actually improve latte's performance.

Things to try:

  • profile the bencher
  • tokio-console on the bencher (see the sketch after this list)
  • try updating latte's dependencies
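A minimal sketch of hooking tokio-console into the bencher via the console-subscriber crate; the integration point in windsock is an assumption on my part, and the binary also needs to be built with RUSTFLAGS="--cfg tokio_unstable":

```rust
fn main() {
    // Register the tokio-console instrumentation layer before starting the
    // runtime; the bench then runs as usual and can be inspected live with
    // the `tokio-console` CLI.
    console_subscriber::init();

    // ... build the tokio runtime and run the benches as normal ...
}
```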

rukai commented Feb 1, 2024

This PR has shown promise: #1360

However, I think the next step is to add functionality to windsock to allow reusing EC2 instances.
This will eliminate the noise caused by differences in EC2 instances.
I am thinking of an API like this:

> # Create the resources required to run the benches specified in FILTER and then store the information required to access those instances to disk
> cargo windsock --store-cloud-resources-to-disk FILTER
Creating AWS resources: CloudResourcesRequired {
    shotover_instance_count: 1,
    docker_instance_count: 3,
    include_shotover_in_docker_instance: false,
}
> # Run the benches once, using the instances created in the previous command.
> cargo windsock --use-cloud-resources-from-disk FILTER
Running "kafka,shotover=standard,size=100KB,topology=cluster3"
...
> # Run the benches a second time reusing the same instances
> cargo windsock --use-cloud-resources-from-disk FILTER
Running "kafka,shotover=standard,size=100KB,topology=cluster3"
...
> # Clean up resources. Also remove the resources-to-disk file so that a later `--use-cloud-resources-from-disk` command fails early.
> cargo windsock --cleanup-cloud-resources
All AWS throwaway resources have been deleted
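As a rough sketch of the store/reuse half, the created resource handles could be serialized to a file and reloaded on the next run. The struct fields and file name below are assumptions for illustration, not a decided format:

```rust
use std::error::Error;

// Hypothetical persisted state for --store-cloud-resources-to-disk /
// --use-cloud-resources-from-disk; the real fields would be whatever is
// needed to reconnect to the existing EC2 instances.
#[derive(serde::Serialize, serde::Deserialize)]
struct PersistedCloudResources {
    shotover_instance_ips: Vec<String>,
    docker_instance_ips: Vec<String>,
}

fn store(resources: &PersistedCloudResources) -> Result<(), Box<dyn Error>> {
    std::fs::write(
        "windsock-cloud-resources.json",
        serde_json::to_string_pretty(resources)?,
    )?;
    Ok(())
}

fn load() -> Result<PersistedCloudResources, Box<dyn Error>> {
    Ok(serde_json::from_str(&std::fs::read_to_string(
        "windsock-cloud-resources.json",
    )?)?)
}
```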

After that is implemented it should be easier to evaluate #1360

rukai added the Performance label and removed the bug label on May 30, 2024