new operational mode - percent with CPU #3351

Open
wants to merge 7 commits into
base: master
162 changes: 139 additions & 23 deletions modules/load_balancer/doc/load_balancer_admin.xml
Expand Up @@ -91,23 +91,26 @@
<itemizedlist>
<listitem>
<para>
<emphasis>Dialog</emphasis> - Dialog module
<emphasis>dialog</emphasis> - Dialog module
</para>
</listitem>
<listitem>
<para>
<emphasis>freeswitch</emphasis>. - only if
"fetch_freeswitch_stats" is enabled.
<emphasis>freeswitch</emphasis> - only if
"fetch_freeswitch_stats" is enabled
(required for integrated estimation mode)
</para>
</listitem>
<listitem>
<para>
<emphasis>dialog</emphasis> - TM module (only if probing is
<emphasis>tm</emphasis> - TM module (only if probing is
enabled)
</para>
</listitem>
<listitem>
<para>
<emphasis>clusterer</emphasis> - only if "cluster_id"
option is enabled.
option is enabled
</para>
</listitem>
<listitem>
Expand Down Expand Up @@ -332,12 +335,7 @@ modparam("load_balancer", "lb_define_blacklist", "blist2= 2,10,6")
using statistics pushed by the FreeSWITCH box.
</para>
<para>
The max value of a resource is updated every <emphasis>event_heartbeat_interval</emphasis>
seconds (see the "freeswitch" OpenSIPS module for more details
regarding this setting), as the stats arrive from FreeSWITCH.
</para>
<para>
Given the following format for FreeSWITCH heartbeat messages:
FreeSWITCH heartbeat messages provide the following statistics:
<programlisting format="linespecific">
{
...
Expand All @@ -349,16 +347,83 @@ modparam("load_balancer", "lb_define_blacklist", "blist2= 2,10,6")
...
}
</programlisting>
, the load balancer uses the following formula in order to periodically
update its "max_load" values for each FreeSWITCH box (FreeSWITCH data
is highlighted in bold):
</para>
<para>
The current/maximum sessions and CPU idle data for each instance
are updated as the stats arrive from FreeSWITCH every
event_heartbeat_interval seconds (see the "freeswitch" OpenSIPS
module for more details regarding this setting).
</para>
<para>
These are used according to the operational mode used in the
load balancing function calls.
</para>
<para>
<emphasis role='bold'>
Relative mode
</emphasis>
</para>
<para>
The max load score for each instance is updated every
fetch_freeswitch_stats seconds. In relative mode, the load balancer
uses the following formula in order to periodically update its
"max_load" values for each FreeSWITCH box (FreeSWITCH data is
highlighted in bold):
</para>
<para>
<emphasis>max_load = (<emphasis role='bold'>Idle-CPU</emphasis> / 100)
* (<emphasis role='bold'>Max-Sessions</emphasis> -
(<emphasis role='bold'>Session-Count</emphasis> -
current_load))</emphasis>
</para>

<para>
<emphasis role='bold'>
Integrated estimation mode
</emphasis>
</para>
<para>
This mode is intended for high-throughput environments where
not all inbound and outbound sessions are tracked on the local
OpenSIPS instance. The heartbeat data is used as the primary source of
truth for server load.
</para>
<para>
In addition to the data collected in the most recent heartbeat, the module
counts the sessions allocated to each instance and uses this count in
each subsequent calculation to track sessions and distribute the load.
At each fetch_freeswitch_stats interval, the sessions-since-last-heartbeat
counters are reset, because up-to-date load data has been provided. It is
advisable to set event_heartbeat_interval and fetch_freeswitch_stats low to
improve session data synchronisation.
</para>
<para>
In integrated estimation mode, the load balancer collects the session
data for each FreeSWITCH box every fetch_freeswitch_stats seconds. Rather than
maintaining a max load score, this mode performs the following calculation at
the time a call is selecting a destination (FreeSWITCH data is highlighted in bold):
</para>
<para>
<emphasis>load_score = (100 - (100 * (<emphasis role='bold'>Session-Count</emphasis>
+ sessions_since_last_heartbeat) / <emphasis role='bold'>Max-Sessions</emphasis>))
* (<emphasis role='bold'>Idle-CPU</emphasis>/100)</emphasis>
</para>
<para>
<programlisting format="linespecific">
<emphasis role='bold'>Warning - heartbeat processing is asynchronous to this module</emphasis>

Heartbeat data is collected in the freeswitch module upon arrival
from each FreeSWITCH instance as controlled by both the minimum interval setting
on the instance and the event_heartbeat_interval module setting. This module
will refresh its internal calculations at intervals defined by
fetch_freeswitch_stats.

When using integrated estimation mode the sessions
since last heartbeat counter will be reset every fetch_freeswitch_stats
seconds. Keeping these values low and the same is advised for more accurate
load estimations according to your throughput requirements.
</programlisting>
</para>
<para>
<emphasis>
Default value is <quote>0</quote> (disabled).
Expand Down Expand Up @@ -405,7 +470,7 @@ modparam("load_balancer", "initial_freeswitch_load", 200)
of the destinations and for controlling the pinging to destinations.
</para>
<para>
If clustering enbled, the module will automatically share changes
If clustering is enabled, the module will automatically share changes
over the status of the destinations with the other
OpenSIPS instances that are part of a cluster. Whenever such a status
changes (following an MI command, a probing result, a script command),
Expand Down Expand Up @@ -474,6 +539,35 @@ modparam("load_balancer", "cluster_sharing_tag", "vip")
</example>
</section>

<section id="param_use_cpu_factor" xreflabel="use_cpu_factor">
<title><varname>use_cpu_factor</varname> (integer)</title>
<para>
This is only relevant for "integrated estimation" mode.
</para>
<para>
If enabled, the CPU factor collected in the most recent heartbeat
will be used to reduce the capacity of each FreeSWITCH instance.
</para>
<para>
When disabled, no CPU factor will be applied in the calculation.
</para>
<para>
<emphasis>
Default value is <quote>empty (disabled)</quote>.
</emphasis>
</para>
<example>
<title>Set <varname>use_cpu_factor</varname> parameter</title>
<programlisting format="linespecific">
...
modparam("load_balancer", "use_cpu_factor", 1)
...
</programlisting>
</example>
</section>

</section>


Expand Down Expand Up @@ -510,7 +604,7 @@ modparam("load_balancer", "cluster_sharing_tag", "vip")
</para>
<itemizedlist>
<listitem>
<para><emphasis>n</emphasis> - Negative availability - use
<para><emphasis>n</emphasis> - Negative availability - use
destinations with negative availability (exceeded capacity);
do not ignore resources with negative availability, and thus
able to select for load balancing destinations with exceeded
Expand All @@ -519,13 +613,30 @@ modparam("load_balancer", "cluster_sharing_tag", "vip")
important/high-priority calls.
</para>
</listitem>
<listitem>
<para><emphasis>i</emphasis> - Integrated estimation -
intended for use in deployments
where many separate SIP proxies are feeding calls into
a pool of FreeSWITCH servers. Load calculations are
performed using the most recent heartbeat data and a
counter of all sessions allocated since the last heartbeat.
Profile counting is unused in the calculation. The reported
CPU load value is optionally used to reduce session load on systems
with high CPU utilisation. Mutually exclusive with flag "r".

This is well suited to high performance systems where many calls
may arrive within the heartbeat period (which should be set to the
minimum value 1s when used with this algorithm).
</para>
</listitem>
<listitem>
<para><emphasis>r</emphasis> - Relative value - the relative
available load (how many percentages are free) is used in
computing the load of each pear/resource; Without this flag,
the Absolute value is assumed - the effective
available load ( maximum_load - current_load) is used in
computing the load of each pear/resource.
computing the load of each pear/resource. Mutually exclusive
with flag "i".
</para>
</listitem>
<listitem>
Expand Down Expand Up @@ -574,6 +685,11 @@ modparam("load_balancer", "cluster_sharing_tag", "vip")
(requested resources do not exist)
</para>
</listitem>
<listitem>
<para><emphasis>-5 (false)</emphasis> - mutually exclusive flags
"i" and "r" were both set
</para>
</listitem>
</itemizedlist>
<para>
This function can be used from REQUEST_ROUTE, BRANCH_ROUTE and
Expand All @@ -583,7 +699,7 @@ modparam("load_balancer", "cluster_sharing_tag", "vip")
<title><function>lb_start</function> usage</title>
<programlisting format="linespecific">
...
if (lb_start(1,"trascoding;conference")) {
if (lb_start(1,"transcoding;conference")) {
# dst URI points to the new destination
xlog("sending call to $du\n");
t_relay();
Expand Down Expand Up @@ -630,8 +746,8 @@ if (lb_start(1,"trascoding;conference")) {
</listitem>
<listitem>
<para><emphasis>-2 (false)</emphasis> - no capacity available
(detinations are up and available, but they do not have any
availabe channels)</para>
(destinations are up and available, but they do not have any
available channels)</para>
</listitem>
<listitem>
<para><emphasis>-3 (false)</emphasis> - no more destinations
Expand Down Expand Up @@ -695,7 +811,7 @@ if (t_check_status("(408)|(5[0-9][0-9])")) {
<para>
Function to stop and flush a current LB session. To be used in
failure route, if you want to stop the current LB session (not to try
any other destinations from this session) and to start a completly new
any other destinations from this session) and to start a completely new
one.
</para>
<para>
Expand Down Expand Up @@ -882,7 +998,7 @@ if (lb_is_destination($si,$sp) ) {
<title><function>lb_count_call</function> usage</title>
<programlisting format="linespecific">
...
# count as load also the calls orgininated by lb destinations
# count as load also the calls originated by lb destinations
if (lb_is_destination($si,$sp) ) {
# inbound call from destination
lb_count_call($si,$sp,-1,"conference");
Expand Down Expand Up @@ -911,7 +1027,7 @@ if (lb_is_destination($si,$sp) ) {
<section id="mi_lb_reload" xreflabel="lb_reload">
<title><function moreinfo="none">lb_reload</function></title>
<para>
Trigers the reload of the load balancing data from the DB.
Triggers the reload of the load balancing data from the DB.
</para>
<para>
MI FIFO Command Format:
Expand Down
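As a rough illustration of the integrated estimation formula documented in load_balancer_admin.xml above, the calculation can be sketched as a standalone function. This is only a sketch under stated assumptions: `estimate_availability` and its parameter names are hypothetical and not part of the module, Idle-CPU is treated as a percentage and divided by 100 as in the documented formula, and returning 0 when no heartbeat data is available is an arbitrary choice made for the example.

```c
#include <assert.h>

/* Hypothetical helper, not part of the load_balancer module.
 * Returns the percentage of free sessions on a FreeSWITCH box,
 * optionally reduced by the idle-CPU percentage (0..100). */
static int estimate_availability(unsigned int current_sessions,
                                 unsigned int sessions_since_last_heartbeat,
                                 unsigned int max_sessions,
                                 float cpu_idle_percent,
                                 int use_cpu_factor)
{
    long used;
    long free_percent;

    assert(cpu_idle_percent >= 0.0f && cpu_idle_percent <= 100.0f);

    /* Assumption for this sketch: no heartbeat yet means no known capacity */
    if (max_sessions == 0)
        return 0;

    /* Signed arithmetic, so an overloaded box yields a negative score */
    used = (long)current_sessions + (long)sessions_since_last_heartbeat;
    free_percent = 100 - (100 * used) / (long)max_sessions;

    if (!use_cpu_factor)
        return (int)free_percent;

    /* Scale the free percentage by the fraction of idle CPU */
    return (int)((float)free_percent * (cpu_idle_percent / 100.0f));
}
```

For example, with 400 sessions reported in the last heartbeat, 100 sessions allocated since then, a maximum of 1000 sessions and 50% idle CPU, the box is 50% free and the sketch returns 25 with the CPU factor enabled, or 50 with it disabled.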
40 changes: 37 additions & 3 deletions modules/load_balancer/lb_data.c
Expand Up @@ -38,6 +38,7 @@
/* dialog stuff */
extern struct dlg_binds lb_dlg_binds;

extern int use_cpu_factor;
extern int fetch_freeswitch_stats;
extern int initial_fs_load;
extern struct fs_binds fs_api;
Expand Down Expand Up @@ -308,6 +309,10 @@ int add_lb_dsturi( struct lb_data *data, int id, int group, char *uri,
fs_url = r->fs_url;
dst->rmap[i].max_load = initial_fs_load;
dst->rmap[i].fs_enabled = 1;

dst->rmap[i].current_sessions = 0;
dst->rmap[i].max_sessions = 0;
dst->rmap[i].cpu_idle = 100;
} else {
dst->rmap[i].max_load = r->val;
}
Expand Down Expand Up @@ -424,6 +429,18 @@ static int get_dst_load(struct lb_resource **res, unsigned int res_no,
if( flags & LB_FLAGS_RELATIVE ) {
if( dst->rmap[l].max_load )
av = 100 - (100 * lb_dlg_binds.get_profile_size(res[k]->profile, &dst->profile_id) / dst->rmap[l].max_load);
} else if( flags & LB_FLAGS_PERCENT_WITH_CPU ) {
if( dst->rmap[l].max_sessions ) {
if(use_cpu_factor) {
/* generate score based on the percentage of channels occupied, reduced by CPU idle factor */
av = ( 100 - ( 100 * ( dst->rmap[l].current_sessions + dst->rmap[l].sessions_since_last_heartbeat ) / dst->rmap[l].max_sessions ) ) * dst->rmap[l].cpu_idle;
LM_DBG("destination %d <%.*s> availability score %d (sessions=%d since_last_hb=%d max_sess=%d cpu_idle=%.2f)\n", dst->id, dst->uri.len, dst->uri.s, av, dst->rmap[l].current_sessions, dst->rmap[l].sessions_since_last_heartbeat, dst->rmap[l].max_sessions, dst->rmap[l].cpu_idle);
} else {
/* generate score based on the percentage of channels occupied */
av = 100 - ( 100 * ( dst->rmap[l].current_sessions + dst->rmap[l].sessions_since_last_heartbeat ) / dst->rmap[l].max_sessions );
LM_DBG("destination %d <%.*s> availability score %d (sessions=%d since_last_hb=%d max_sess=%d)\n", dst->id, dst->uri.len, dst->uri.s, av, dst->rmap[l].current_sessions, dst->rmap[l].sessions_since_last_heartbeat, dst->rmap[l].max_sessions);
}
}
} else {
av = dst->rmap[l].max_load - lb_dlg_binds.get_profile_size(res[k]->profile, &dst->profile_id);
}
Expand Down Expand Up @@ -490,7 +507,7 @@ int lb_route(struct sip_msg *req, int group, struct lb_res_str_list *rl,
struct lb_resource *it_r;
int load, it_l;
int i, j, cond, cnt_aval_dst;

unsigned int k, l;

/* init control vars state */
res_cur = NULL;
Expand Down Expand Up @@ -756,8 +773,7 @@ int lb_route(struct sip_msg *req, int group, struct lb_res_str_list *rl,
cnt_aval_dst = 0;
for( it_d=data->dsts,i=0,j=0 ; it_d ; it_d=it_d->next ) {
if( it_d->group == group ) {
if( (dst_bitmap_cur[i] & (1 << j)) &&
((it_d->flags & LB_DST_STAT_DSBL_FLAG) == 0) ) {
if( (dst_bitmap_cur[i] & (1 << j)) && ((it_d->flags & LB_DST_STAT_DSBL_FLAG) == 0) ) {
/* valid destination (group & resources & status) */
cnt_aval_dst++;
if( get_dst_load(res_cur, res_cur_n, it_d, flags, &it_l) ) {
Expand Down Expand Up @@ -818,11 +834,29 @@ int lb_route(struct sip_msg *req, int group, struct lb_res_str_list *rl,


if( dst != NULL ) {

LM_DBG("%s call of LB - winning destination %d <%.*s> selected "
"for LB set with free=%d\n",
(reuse ? "sequential" : "initial"),
dst->id, dst->uri.len, dst->uri.s, load );

if ( flags & LB_FLAGS_PERCENT_WITH_CPU ) {

// find all resources used by this call, increment on each
for( k=0 ; k<res_cur_n ; k++ ) {
for (l=0 ; l<dst->rmap_no ; l++ ) {
if( res_cur[k] == dst->rmap[l].resource ) {
dst->rmap[l].sessions_since_last_heartbeat++;

LM_DBG("incrementing sess since last HB for winning destination %d <%.*s> (sessions_since_last_heartbeat=%d)\n",
dst->id, dst->uri.len, dst->uri.s, dst->rmap[l].sessions_since_last_heartbeat );

break; // exit the loop
}
}
}
}

/* add to the profiles */
for( i=0 ; i<res_cur_n ; i++ ) {
if( lb_dlg_binds.set_profile(dlg, &dst->profile_id,
Expand Down
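The lifecycle of the `sessions_since_last_heartbeat` counter, as described in the documentation changes, could look roughly like the sketch below: the counter is incremented each time a destination wins the selection and is reset when fresh heartbeat statistics are applied. The reset code is not visible in this part of the diff, so `struct fs_load_sketch`, `note_call_routed`, `apply_heartbeat` and `estimated_sessions` are hypothetical names used purely for illustration, and locking is left out.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the fields added to struct lb_resource_map */
struct fs_load_sketch {
    unsigned int max_sessions;
    unsigned int current_sessions;
    float cpu_idle;
    unsigned int sessions_since_last_heartbeat;
};

/* A call was routed to this destination: count it locally until the
 * next heartbeat reports it as part of Session-Count. */
static void note_call_routed(struct fs_load_sketch *s)
{
    assert(s != NULL);
    s->sessions_since_last_heartbeat++;
}

/* Fresh heartbeat data arrived: take it as the source of truth and
 * drop the local estimate accumulated since the previous heartbeat. */
static void apply_heartbeat(struct fs_load_sketch *s,
                            unsigned int session_count,
                            unsigned int max_sessions,
                            float cpu_idle)
{
    assert(s != NULL);
    s->current_sessions = session_count;
    s->max_sessions = max_sessions;
    s->cpu_idle = cpu_idle;
    s->sessions_since_last_heartbeat = 0;
}

/* Best current estimate of the sessions running on the destination */
static unsigned int estimated_sessions(const struct fs_load_sketch *s)
{
    assert(s != NULL);
    return s->current_sessions + s->sessions_since_last_heartbeat;
}
```

Between two heartbeats the estimate grows with every routed call. Once the next heartbeat arrives, the reported Session-Count is expected to already include those calls, which is why the local counter is cleared rather than carried over.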
17 changes: 13 additions & 4 deletions modules/load_balancer/lb_data.h
Expand Up @@ -33,10 +33,11 @@
#include "../freeswitch/fs_api.h"
#include "lb_parser.h"

#define LB_FLAGS_RELATIVE (1<<0) /* do relative versus absolute estimation. default is absolute */
#define LB_FLAGS_NEGATIVE (1<<1) /* do not skip negative loads. default to skip */
#define LB_FLAGS_RANDOM (1<<2) /* pick a random destination among all selected dsts with equal load */
#define LB_FLAGS_DEFAULT 0
#define LB_FLAGS_RELATIVE (1<<0) /* do relative versus absolute estimation. default is absolute */
#define LB_FLAGS_NEGATIVE (1<<1) /* do not skip negative loads. default to skip */
#define LB_FLAGS_RANDOM (1<<2) /* pick a random destination among all selected dsts with equal load */
#define LB_FLAGS_PERCENT_WITH_CPU (1<<3) /* score as percentage of max sessions used + CPU util factor */
#define LB_FLAGS_DEFAULT 0

#define LB_DST_PING_DSBL_FLAG (1<<0)
#define LB_DST_PING_PERM_FLAG (1<<1)
Expand All @@ -62,6 +63,14 @@ struct lb_resource_map {
struct lb_resource *resource;
unsigned int max_load;

/* data received in last heartbeat */
unsigned int max_sessions;
unsigned int current_sessions;
float cpu_idle;

/* count of sessions allocated since last FS heartbeat */
unsigned int sessions_since_last_heartbeat;

int fs_enabled;
};

Expand Down
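The documentation in this change states that flags "i" and "r" are mutually exclusive and that the function returns -5 when both are set. A minimal sketch of such a check, using the flag bit values from lb_data.h, might look as follows. `check_mode_flags` is a hypothetical name; the actual check in the module is not shown in this diff and may be implemented differently.

```c
#include <assert.h>

/* Bit values as defined in lb_data.h in this change */
#define LB_FLAGS_RELATIVE         (1<<0)
#define LB_FLAGS_PERCENT_WITH_CPU (1<<3)

/* Hypothetical validation helper: returns -5 when the two estimation
 * modes are requested together, 0 otherwise. */
static int check_mode_flags(unsigned int flags)
{
    const unsigned int exclusive =
        LB_FLAGS_RELATIVE | LB_FLAGS_PERCENT_WITH_CPU;

    /* The two modes must occupy distinct bits for the test below to work */
    assert((LB_FLAGS_RELATIVE & LB_FLAGS_PERCENT_WITH_CPU) == 0);

    if ((flags & exclusive) == exclusive)
        return -5;
    return 0;
}
```

Other flags, such as the negative or random bits, do not affect the result: only the combination of the relative and integrated estimation bits is rejected.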