

new operational mode - percent with CPU #3351

Open
wants to merge 7 commits into master

Conversation

spacetourist
Contributor

Summary
Implements a further load_balancer module strategy for distributing calls more evenly when dealing with high request volumes.

Details/Solution
The change caches the heartbeat data in the module and performs the following calculation for each request:
(100 - (100 * (current_sessions + sessions_since_last_heartbeat) / max_sessions)) * CPU idle factor

This disregards the dialog profile counts and allocates simply based on the last known call stats plus any changes that have been made locally. The intention is to distribute calls to the last known least-loaded server whilst not overloading a single system, given the latency of the heartbeat data (AFAIK the minimum interval on both sides is 1s). For a system handling hundreds of calls per second to shared destinations, this aims to balance the individual routing decisions more evenly.
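
As a rough illustration of the shape of that calculation (this is not the actual module code; the function and parameter names below are invented):

```c
/* All identifiers here are invented for illustration only. */
static float lb_percent_cpu_score(unsigned int sess,          /* sessions at the last heartbeat */
                                  unsigned int sess_since_hb, /* sessions allocated locally since then */
                                  unsigned int max_sess,      /* FreeSWITCH Max-Sessions */
                                  float cpu_idle)             /* CPU idle factor from the heartbeat, 0.0-1.0 */
{
	if (max_sess == 0)
		return 0; /* guard against divide-by-zero */

	return (100.0f - 100.0f * (sess + sess_since_hb) / max_sess) * cpu_idle;
}
```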

Compatibility
This should not impact the other module features.

Closing issues
Closes #3297

Member

@liviuchircu left a comment


@spacetourist, I finally took some time to review both the initial issue, as well as this PR. Thanks for the rich description, as well as the data tables, they were really useful in understanding what's wrong here.

As I see it, the bigger issue here is around the "r" flag, freeswitch-enabled destinations set aside. There is something intuitively off about the 100 - 100 / MAX * 100 formula which I cannot find a solid explanation for (we should plot it!), but any relative disparity in the inputs seems to be softened or normalized by it, effectively bringing the outputs a lot closer to each other. For example, putting all your data in a single table:

| Final | Float | Transf-2 | Max-Load | Transf-1 | Sessions |
|-------|-------|----------|----------|----------|----------|
| 91 | 91.11 | 100 - 100 / 1125 * 100 | 1125 | .75 * (2500 - (1100 - 100)) | 1100 |
| 91 | 91.66 | 100 - 100 / 1200 * 100 | 1200 | .75 * (2500 - (1000 - 100)) | 1000 |
| 92 | 92.15 | 100 - 100 / 1275 * 100 | 1275 | .75 * (2500 - (900 - 100)) | 900 |
| 92 | 92.59 | 100 - 100 / 1350 * 100 | 1350 | .75 * (2500 - (800 - 100)) | 800 |
| 92 | 92.98 | 100 - 100 / 1425 * 100 | 1425 | .75 * (2500 - (700 - 100)) | 700 |

Like, how on earth did we obtain a 1% difference between least-loaded/most-loaded in the output, coming from a 57% difference between least-loaded/most-loaded in the inputs? The reduction was done in two steps: from 57% -> 26.6% -> 1%. What is intrinsically wrong with this formula and can we mathematically change it in order to obtain better weights?
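
(For reference, a quick throwaway sketch that reproduces both steps of the table above, using the same assumed inputs: cpu_idle = 0.75, Max-Sessions = 2500, dialog profile size = 100.)

```c
#include <stdio.h>

int main(void)
{
	int sessions[] = { 1100, 1000, 900, 800, 700 };
	int psz = 100, max_sess = 2500;
	float cpu_idle = 0.75f;

	for (int i = 0; i < 5; i++) {
		/* step 1: the "pseudo max_load" computed from the heartbeat data */
		float max_load = cpu_idle * (max_sess - (sessions[i] - psz));
		/* step 2: the relative ("r") transformation, 100 - 100 / max_load * 100 */
		float score = 100 - 100.0f / max_load * 100;
		printf("sessions=%4d  max_load=%6.0f  score=%.2f\n",
		       sessions[i], max_load, score);
	}
	return 0;
}
```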

Now, you also sensed this problem based on your empirical evidence (why are my calls going to the more loaded FS?!) and the PERCENT_WITH_CPU approach is a two-fold improvement:

  • first, you change the computation from 2 steps into 1. There is no more of that "pseudo max_load" intermediary value, which helps preserve more of the original ratios.
  • second, you perform the .cpu_idle multiplication in the last step, after the 100 - 100 * X formula, which also helps reflect more of the FS instance data in the final weight.

While I am 100% for merging this new "c" (CPU) flag / exclusive with "r" right away, I will leave you a fun question about the "r" mode in general and whether we should enable it in the first place: if I give you two FS instances, one running on a Raspberry Pi at 1/2 calls (50%) and another one on a super-server at 500/1000 calls (50%), would you "relatively" balance a call to either of them? Or is the situation not so relative, after all? :)

modules/load_balancer/load_balancer.c (review thread resolved)
@spacetourist
Contributor Author

Morning @liviuchircu - I'm now happy with the state of this PR with no immediate plans for further changes. Having said that, I have the following ideas for future improvements to this module:

  • Synchronising event_heartbeat_interval/fetch_freeswitch_stats (details above, not sure how that could be achieved across the separate modules)
  • CPU weighting - at present the CPU idle factor is heavily weighted; it is a snapshot provided by the FS heartbeat and may not be representative of the actual system load throughout the interval. A module parameter with a value of 0-100 (default 100) which reduces the weighting of this factor could provide a useful tunable (see the sketch below)
  • Presenting the sessions since the last heartbeat in the MI lb_list function
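
Something along these lines, purely to illustrate the 0-100 weighting idea (the cpu_weight parameter name is invented here, not an existing modparam):

```c
/* Blend the reported CPU idle factor towards "no influence" (1.0) according
 * to a hypothetical 0-100 cpu_weight modparam: 100 = full effect, 0 = ignored. */
static float weighted_cpu_idle(float cpu_idle, int cpu_weight)
{
	return 1.0f + (cpu_idle - 1.0f) * cpu_weight / 100.0f;
}
```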

There are also a number of algorithm changes which might be worth a look in time, in particular addressing issues such as your fun question from above. To accommodate systems of wildly different capacities we ought to be looking at the impact of the allocation as well as the state preallocation. Using your example, the Pi may have the lowest pre-allocation load, but with that call occupying a whopping 50% of its overall capacity we'd be able to re-evaluate the decision. Obviously anyone actually running that wild mix of instance sizes would be asking for trouble anyway, but there is clearly a lot more that can be done to increase the module's flexibility.

At this time I'm reasonably confident that these changes will solve my problem, so I'm keen to get your feedback. Cheers

Member

@liviuchircu left a comment


Hello @spacetourist,

We've discussed this internally and while the new "Integrated Estimation" mode seems to solve your concrete problem, the module will still lack flexibility when it comes to different ratios of sessions to max-sessions. For example, with a sufficiently high max_sessions (e.g. in the order of thousands), and current_sessions in the order of hundreds, the new mode's formula will still output relatively similar weights, without giving the user any control to change it.

So we backtracked a bit and concluded that the problem can be alleviated while the max_load is being computed, during lb_update_max_loads(). In order to give full control to the user over their FreeSWITCH sessions scaling (some want 100 max sessions, others 500 or even 2500!), we could add a new freeswitch_sessions_exponent modparam (default: 1, no change) that would be applied as a power to the current Sessions value. Here is how such an exponent would modify the output max_load:

[figure: lb-sessions-exponent]

The picture shows 4 possible exponent settings: 1, 1.01, 1.05 and 1.1, which already create a dramatic change in the relative difference between the output max_load values.

In your case, probably a "1.1" value of the modparam would suffice, and it would fix your scenario with typical Max-Sessions values of 2500. The exponentiation would be added to this code section:

            if (psz < dst->fs_sock->stats.max_sess) {
                dst->rmap[ri].max_load =
                    (dst->fs_sock->stats.id_cpu / (float)100) *
                    (dst->fs_sock->stats.max_sess -
                     (powf(dst->fs_sock->stats.sess, new_modparam) - psz));
            }

This is just a working example as we get closer to the final solution, but the idea remains: the relative mode should be made to work as-is, rather than inventing new, obscure flags. And there is no need to leak all kinds of random information (CPU load? current_sessions? etc.) into get_dst_load(), which is ultimately meant to provide a couple of algorithms for interpreting the max_load of a destination, nothing more.
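
To get a feel for the numbers, here is a throwaway sketch applying the proposed exponent to the figures discussed earlier (assuming sess = 1100, max_sess = 2500, id_cpu = 75 and psz = 100; only the modparam name comes from the suggestion above):

```c
#include <math.h>
#include <stdio.h>

int main(void)
{
	float exponents[] = { 1.0f, 1.01f, 1.05f, 1.1f };
	float id_cpu = 75, max_sess = 2500, sess = 1100, psz = 100;

	for (int i = 0; i < 4; i++) {
		/* same expression as the snippet above, with the exponent applied */
		float max_load = (id_cpu / 100) *
			(max_sess - (powf(sess, exponents[i]) - psz));
		printf("freeswitch_sessions_exponent=%.2f  max_load=%.0f\n",
		       exponents[i], max_load);
	}
	return 0;
}
```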

@spacetourist
Contributor Author

Hi @liviuchircu - that's an interesting idea but I'm not sure it solves some key aspects of my issue.

The main issue is the sheer volume of calls I'm dealing with - I exceed 200 CPS on a couple of instances, so I must take that into account between heartbeats to avoid allocating all of those calls to the same instance until the next execution of lb_update_max_loads().

I also have several OpenSIPS instances feeding into the same bank of FreeSWITCH servers, meaning that the profile size is not really relevant to the calculation. In some ways, having the max_load scores in a close contest isn't a bad thing here, provided the "s" flag is also enabled, as calls will be distributed randomly to those instances until the next heartbeat clarifies the real active session counts.

I'll give this some more thought as I agree there are further improvements we could make to:

  • handle instances of different sizes
  • modify the impact of the CPU factor (likely similar to your proposal here)
  • respect the FreeSWITCH setting for Session-Per-Sec (also present in the heartbeat data)

It may be that what I'm looking for falls too far outside the scope of the module authors' intentions for something general to be implemented here, but I'm keen to work towards a solution with the flexibility needed to be both tunable and applicable to the wider community.

@spacetourist
Contributor Author

Having given this some more thought, I'm actually feeling more confident that the PR is on the right track as a general solution. The calculations manage hosts of varying sizes by reflecting the overall percentage of capacity used, and between heartbeats we allocate proportionally to the capacity available on each instance.

Excluding the CPU factor for now, starting with all systems at 10% of max sessions used we'd have an even score and the system would pick one at random (provided we supply the "s" flag):
[screenshot]

If we allocate 20 sessions and by chance one is added to each instance, we'd see the first instance's score drop far enough for it not to be included in subsequent operations:
[screenshot]

Continuing to deliver calls based on the score, we'd start to see it prefer the larger instances until all instances are again equally scored:
[screenshot]

This proceeds nicely as more sessions are allocated:
[screenshot]

Assume the heartbeat data is updated and all instances have reached 50% load; at this point we get the same allocation pattern:
[screenshot]

This is all quite simple but hinges on having the score re-calculated for each allocation; without that, the scores would remain fixed for too long and would most likely overload the smaller instances.
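
To make the walk-through concrete, here is a rough simulation under made-up capacities (three destinations, CPU factor excluded, and ties going to the first destination rather than the random "s"-flag pick):

```c
#include <stdio.h>

int main(void)
{
	int max_sess[] = { 100, 500, 1000 };  /* illustrative capacities */
	int sess[]     = { 10, 50, 100 };     /* 10% used at the last heartbeat */
	int since_hb[] = { 0, 0, 0 };         /* allocated locally since then */

	for (int call = 0; call < 20; call++) {
		int best = 0;
		float best_score = -1;
		for (int i = 0; i < 3; i++) {
			/* score recomputed on every allocation */
			float score = 100 - 100.0f * (sess[i] + since_hb[i]) / max_sess[i];
			if (score > best_score) {
				best_score = score;
				best = i;
			}
		}
		since_hb[best]++;
	}
	for (int i = 0; i < 3; i++)
		printf("dst %d: +%d calls since heartbeat\n", i, since_hb[i]);
	return 0;
}
```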

Am I missing something or does that make sense?

Regarding tunability, I think it would be appropriate to expose options for tuning how the CPU factor is applied. In practical use, CPU running at 20% to 40% utilisation seems very normal, and I imagine the figures in the heartbeats vary quite a bit according to the time they were captured; personally I don't see the need to always factor that in, as it will skew the score quite dramatically. I suggest we might wish to provide params for the following (a rough sketch follows the list):

  • a setting to eliminate the CPU factor from these calculations
  • a threshold value over which we include it as a factor, i.e. if CPU is >75% utilisation then it kicks in to reduce the chance of that instance being selected
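
For example (parameter names and shape invented here, purely to illustrate the two options):

```c
/* Hypothetical tunables: cpu_factor_enabled switches the CPU factor off
 * entirely; cpu_threshold only lets it kick in above a utilisation level. */
static float apply_cpu_factor(float score, float cpu_idle,
                              int cpu_factor_enabled, float cpu_threshold)
{
	float cpu_used = 100 * (1 - cpu_idle);

	if (!cpu_factor_enabled || cpu_used < cpu_threshold)
		return score;        /* leave the percentage score untouched */

	return score * cpu_idle; /* e.g. only penalise above 75% utilisation */
}
```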

spacetourist and others added 2 commits June 10, 2024 11:35
* ✨ new operational mode - percent with CPU

* 🐛 syntax errors

* 🐛 cherry pick duplicate

* 🐛 prevent divide by zero

* 📝 improve log message

* 📝 improve log message

* 🐛 incorrect type

* 📝 improve log message

* 📝 improve log message

* 📝 improve log message

* 📝 improve log message

* Dev cpufactor (#1)

* new operational mode - percent with CPU

* 🐛 fix print of str type

* remove comment

* modify character choice for new flag

* document new integrated estimation flag usage

* ✨ create CPU factor option flag

* document new integrated estimation flag usage

* 📝 CPU factor is optional, improve description

* capture docs in template rather than README directly
@spacetourist
Contributor Author

I wanted to follow up on this PR now that I have finally moved forward and have this running in production. My results will of course be anecdotal; however, I am pleased to report that the modifications have had the desired effect, and even with four distinct OpenSIPS instances (no data sharing) in front of a bank of 13 media servers I'm seeing call loads balance to within 50 calls (4k-8k concurrent).

I'm currently only pushing ~25% of the traffic through the load balancer, so I expect the lines to converge further in the coming weeks. The smoothing effect of having even this proportion of my load allocated this way is really beneficial, and outside of peak moments I see very flat load across the media servers.

Thanks for all the assistance in getting this patch working properly, I'll provide another update and some charts once I have all the traffic using this system. 🚀

@spacetourist
Contributor Author

As mentioned above, here are the charts that illustrate the performance of this revised algorithm. The charts show before and after the load balancer changes on the same list of 13 servers:

Before (using manual allocations by service relationship):
[chart]

After (using the load balancer in the new mode):
[chart]

Note that the extra blue line on the second chart is simply an external tool allocating outbound calls to one specific instance on top of the LB call load.

Successfully merging this pull request may close these issues.

[BUG] load_balancer algorithm weaknesses