Running the same bench twice does not give stable results, neither locally nor on AWS.
But I'm focusing on AWS at the moment because that feels more important.
Network and disk IO do not go above 5MB, so it feels unlikely that we are hitting limits there.
I've noticed that shotover benches will be off by a certain % for the entire bench, while non-shotover benches will sit around 0.0% and then go up or down from there before returning to 0.0%.
So it looks like shotover is introducing a second kind of noise.
So we should first address the noise without shotover.
The cassandra benches have the bencher set up to only use 1 thread.
Using 2 threads on an m6a.large instance seems to make the noise worse, but maybe using more threads on a larger instance will help?
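For context, here is a minimal sketch of what pinning the bench driver's thread count looks like, assuming the bencher builds its own tokio runtime (this is not windsock's actual setup code, just the general shape):

```rust
// Hypothetical sketch: build the bencher's runtime with an explicit worker
// thread count, so the cassandra benches can stay pinned to 1 thread
// (or be bumped to 2+ when testing larger instances).
fn build_bench_runtime(worker_threads: usize) -> tokio::runtime::Runtime {
    tokio::runtime::Builder::new_multi_thread()
        .worker_threads(worker_threads)
        .enable_all()
        .build()
        .expect("failed to build bench runtime")
}
```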
Maybe I need a better idea of what people have historically found to be stable.
cassandra,compression=none,driver=scylla,operation=read_i64,protocol=v4,shotover=none,topology=single observations

I've now observed that latte is more consistent than windsock; windsock seems to consistently drop performance at 42-44s into the benchmark (resolved).
Not sure if there were other differences in consistency observed.
I attempted to rewrite windsock's bencher to be more like latte, but it did not help.
Either I need to profile the bencher to find out what's going on, or I need to blindly try copying more logic from latte.
cassandra,compression=none,driver=scylla,operation=read_i64,protocol=v4,shotover=standard,topology=cluster3 observations

latte has 10x more throughput than shotover in its default configuration.
latte gets ~60000 OPS while windsock gets ~5000 OPS.
Increasing latte's thread count drops latte's performance.
Wow, I can get numbers similar to latte by setting `--operations-per-second 50000`; as soon as I set `--operations-per-second 55000`, actual OPS drops to 5000.
If I set the bencher OPS to 50000, shotover will meet exactly 50000 OPS.
However, if I set the bencher OPS to 55000, then depending on the run shotover may reach 55000, or it may get stuck at a much lower OPS; I've seen as low as 5000.
If I then set OPS to unlimited it pretty much always runs at 5000 OPS.
Latte doesn't seem to experience this same cliff; it does seem to max out at about the same point that shotover can reach (60000) in its default configuration.
But if I increase the number of concurrent messages to 500 it can hit 80000.
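To make the two knobs being compared concrete, here is a minimal sketch of a fixed-rate load generator with bounded in-flight requests. This is not windsock's or latte's actual code, just the general shape of `--operations-per-second` style throttling plus a concurrent-messages limit:

```rust
use std::sync::Arc;
use std::time::Duration;
use tokio::sync::Semaphore;

// Hypothetical sketch of the two knobs: a target rate (ops per second)
// and a cap on concurrent in-flight operations.
async fn run_fixed_rate(ops_per_second: u64, max_concurrent: usize) {
    let limiter = Arc::new(Semaphore::new(max_concurrent));
    let mut ticker =
        tokio::time::interval(Duration::from_nanos(1_000_000_000 / ops_per_second));
    loop {
        ticker.tick().await;
        // Back-pressure: if all permits are in flight the generator stalls
        // here instead of queueing unbounded work.
        let permit = limiter.clone().acquire_owned().await.unwrap();
        tokio::spawn(async move {
            // issue one query here (placeholder)
            drop(permit);
        });
    }
}
```

Whether the cliff comes from the rate limiter, the concurrency cap, or the driver itself is exactly what profiling should tell us.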
shotover=none gives similar throughputs for latte and windsock, but latte is still a bit higher.
Here, increasing the thread count does actually improve latte performance.
Things to try:

- profile the bencher
- run tokio-console on the bencher (see the sketch after this list)
- try updating deps on latte
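For the tokio-console item, enabling it on the bencher should only require installing the console-subscriber layer before the runtime starts, roughly like this (a sketch; it assumes the bencher has a plain `main` and is built with `RUSTFLAGS="--cfg tokio_unstable"`):

```rust
fn main() {
    // Registers the tokio-console instrumentation layer; the bencher must be
    // compiled with RUSTFLAGS="--cfg tokio_unstable" for task data to show up.
    console_subscriber::init();

    // ... build the bench runtime and run the workload as usual
}
```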
However, I think the next step is to add functionality to windsock to allow reusing EC2 instances.
This will eliminate the noise caused by differences in EC2 instances.
I am thinking of an API like this:
```
> # Create the resources required to run the benches specified in FILTER and then store the information required to access those instances to disk
> cargo windsock --store-cloud-resources-to-disk FILTER
Creating AWS resources: CloudResourcesRequired {
    shotover_instance_count: 1,
    docker_instance_count: 3,
    include_shotover_in_docker_instance: false,
}

> # Run the benches once, using the instances created in the previous command.
> cargo windsock --use-cloud-resources-from-disk FILTER
Running "kafka,shotover=standard,size=100KB,topology=cluster3"
...

> # Run the benches a second time, reusing the same instances.
> cargo windsock --use-cloud-resources-from-disk FILTER
Running "kafka,shotover=standard,size=100KB,topology=cluster3"
...

> # Cleanup resources, and also remove the resources-to-disk file to ensure that a --use-cloud-resources-from-disk command would fail early.
> cargo windsock --cleanup-cloud-resources
All AWS throwaway resources have been deleted
```
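As a rough idea of the implementation, the created resources could be serialized to disk by the `--store-cloud-resources-to-disk` run and loaded back by `--use-cloud-resources-from-disk`. The struct fields and file path below are assumptions for illustration, not windsock's actual types:

```rust
use serde::{Deserialize, Serialize};

// Hypothetical shape of the persisted state; field names are assumptions.
#[derive(Serialize, Deserialize)]
struct PersistedCloudResources {
    shotover_instance_ips: Vec<String>,
    docker_instance_ips: Vec<String>,
    include_shotover_in_docker_instance: bool,
}

// Assumed location of the on-disk file.
const RESOURCES_PATH: &str = "target/windsock_cloud_resources.json";

fn store_to_disk(resources: &PersistedCloudResources) -> std::io::Result<()> {
    let json = serde_json::to_string_pretty(resources).expect("serialize resources");
    std::fs::write(RESOURCES_PATH, json)
}

fn load_from_disk() -> std::io::Result<PersistedCloudResources> {
    let json = std::fs::read_to_string(RESOURCES_PATH)?;
    Ok(serde_json::from_str(&json).expect("corrupt resources file"))
}
```

Deleting the file during `--cleanup-cloud-resources` would then make a later `--use-cloud-resources-from-disk` run fail early when the load returns NotFound.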
After that is implemented, it should be easier to evaluate #1360.