what am i missing? workers immediately disconnect #2945
-
it's a little bit hard to know, but a couple of things jump right out at me. This is at best meaningless:
every FastHttpUser gets its own connection pool by default, so there's probably no advantage to creating a shared one, and it's potentially harmful. This also looks a little weird, and potentially bad for performance:
Maybe just use indices instead of an iterator? And is your data really big or something? If that doesn't help, try simplifying your example further and I can have another look.
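A minimal sketch of what that suggestion could look like (the host, endpoint and data names here are hypothetical, and rows is assumed to be filled once per worker process before users spawn, e.g. by the init-listener sketch further down):

import random
from locust import task, constant_throughput
from locust.contrib.fasthttp import FastHttpUser

rows = []  # assumed to be populated once per worker process, not once per user

class ApiUser(FastHttpUser):
    # each FastHttpUser instance already keeps its own connection pool,
    # so no shared or custom pool is created here
    host = "http://localhost:8080"  # hypothetical target
    wait_time = constant_throughput(4)

    def on_start(self):
        # start each user at a random offset instead of giving it its own iterator
        self.idx = random.randrange(len(rows))

    @task
    def post_row(self):
        row = rows[self.idx % len(rows)]
        self.idx += 1
        self.client.post("/api/endpoint", json=row)  # hypothetical path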
-
as a quick response (i'll add more in a bit): i have the iterator randomly start somewhere for each user. i plan to test with 250/500/1250 users at a constant throughput of 4, targeting 1k/2k/5k qps. the data set is about 18m rows for it to burn through, although i suppose right now each user has its own 18m. i originally had the parquet read and the creation of the iterators and distributors in the master init, and that script runs fine all within one box with the --processes 6 command above. however i see 6 processes running, and then a 7th, which i assume is simply the master node, pegged at 100%. this was also the case when i separated the workers onto another machine; my conclusion was that the master node distributing the data to the workers was taking up too much cpu and i needed to move it into the workers. i don't have a way to create the values at random, at least for some of the elements, which is why i'm using an input data set in the first place.
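A rough sketch of "putting it into the workers" without giving every user its own 18m-row copy: each worker process loads only its own slice of the parquet file in an init listener. This assumes a recent Locust that exposes worker_index on the worker runner; the file name and worker count below are hypothetical.

import pandas as pd
from locust import events
from locust.runners import WorkerRunner

NUM_WORKERS = 6  # hypothetical: keep in sync with --processes
rows = []

@events.init.add_listener
def load_slice(environment, **kwargs):
    global rows
    runner = environment.runner
    # only worker processes load data; the master just coordinates
    if isinstance(runner, WorkerRunner):
        df = pd.read_parquet("input.parquet")  # hypothetical file name
        # take every NUM_WORKERS-th row, offset by this worker's index,
        # so each worker holds roughly 18m / NUM_WORKERS rows in memory
        rows = df.iloc[runner.worker_index::NUM_WORKERS].to_dict("records")

Users on that worker can then index into rows (as in the earlier sketch) instead of each building an iterator over all 18m rows.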
-
i have a script where i load data and hit an api. i was running it with a command like
locust -f fs_test.py --headless --processes 6 -u 40 -r 1 -t 3m
and it worked well up until i wanted to add more users and throughput; it appeared to be single-cpu bound. this was the same on another machine, and the same when i made one machine a master and another a worker. so my theory became that the distributor/iterator was the bottleneck, and that i needed each user to load the data themselves.
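For reference, the explicit master/worker variant of that command looks roughly like this (the master address and worker count are placeholders):

locust -f fs_test.py --master --headless -u 40 -r 1 -t 3m --expect-workers 6
locust -f fs_test.py --worker --processes 6 --master-host 10.0.0.1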
i have since moved that code into the users, but my new script bails as soon as it is expected to do anything. it appears to get to the self.client.post but never actually sends it; i see nothing for a user after that point and no requests accrue. the output says workers have been lost. strangely, the high-cpu (90%) warning also triggers even if i run just a couple of users on a big box. my code is below.
if i run the master and worker on the same machine, here is the output from my worker. eventually i kill the master.
as you can see, sending the post immediately results in the 90% cpu warning. this is the user's first transaction and i'm on a 16-cpu c7i box.
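One thing that can produce exactly this combination of symptoms: a long cpu-bound block of work inside each user's on_start (for example a big pandas/parquet read) never yields to gevent, so the worker's heartbeat greenlet may not get a chance to run, the master eventually reports the worker as missing, and the 90% cpu warning fires before the first post goes out. To get more detail from the worker side, the standard log-level flag can be raised, e.g.:

locust -f master.py --headless --processes 1 -u 2 -r 1 -t 3m --loglevel DEBUG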
what if i just run the script as is?
locust -f master.py --headless --processes 1 -u 2 -r 1 -t 3m
and get this output.
missed heartbeats, the 90% cpu warning, nothing sent, then just hanging around doing nothing. what am i missing? i feel like the fix is going to be one line or something, but i really don't have any idea what that one line is.
the script does indeed work if i forgo --processes and just use
locust -f master.py --headless -u 2 -r 1 -t 3
this doesn't seem like the way to go, though, if i'm trying to scale to 1000 users and 5000 requests/sec. i'm also unclear about the big 9200 in the stats, as if it is counting my user's on_start time or something?
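For completeness, once the per-user data loading is sorted out, scaling back up should just be the same invocation with more processes and users, e.g. something like the following (user count, spawn rate and duration are placeholders; --processes -1 launches one worker per logical core on recent Locust versions):

locust -f master.py --headless --processes -1 -u 1000 -r 50 -t 3m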