-
Notifications
You must be signed in to change notification settings - Fork 23
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add traffic_rule label to router metrics #280
Conversation
|
||
// select primary route and fallbacks, based on the results of testing | ||
// given request against traffic-splitting rules | ||
for i := 0; i < len(results); i++ { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I might have some misconception here,
I would had thought because of the orthogonality checks on the rules, there should always be at most one rule that can be matched given a request, and that should be the reason of the removal (and probably previously), here instead of
we may not have enough time to (sequentially) execute multiple matches
didnt catch the timing part since to me there should only be 1 rule matched for any request.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ahh you're right about your assumption but it's missing one detail - the idea of overlapping rules at different priorities (kind of similar to XP experiments). This is the relevant validation.
So, if there were 2 rules:
- {"a": [1]}
- {"a": [1], "b": [2]}
A request that matches # 2, will also match # 1. In this case, we will just go with the order in which the rules are defined, to pick the appropriate rule (per this example, # 2 will never be applied because it comes after the broader rule).
Left a comment, the rest LGTM! |
Thanks for the review, @leonlnj. @caraml-dev/turing-dev, if there are no other comments, could one of you hit the "approve" button to satisfy the CI ? :) |
LGTM! Thanks for this! 🚀 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Left a clarifying question, the rest LGTM, thanks @krithika369 !
// Given request hasn't satisfied any of the rules configured on this routing strategy; | ||
// check if default route exists. | ||
if defaultRoute, exist := routes[s.DefaultRouteID]; exist { | ||
return defaultRoute, []fiber.Component{}, labels.WithLabel(TrafficRuleLabel, defaultRoute.ID()), nil |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Originally when we return orderedRoutes[1:]
, that value was never used right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes that's correct.
Thanks for the reviews, all! Merging. |
This PR introduces a new label
traffic_rule
to the following Prometheus metrics produced by the Turing router:RouteRequestDurationMs
TuringComponentRequestDurationMs
, where the component isroute
Eg:
Changes
engines/router/missionctl/fiberapi/traffic_splitting_strategy.go
- Settraffic-rule
label inSelectRoute
, to hold the matched traffic rule. Further, this function is refactored to only pick the first traffic rule matched as realistically, we may not have enough time to (sequentially) execute multiple matches. This also simplifies the matched rule metrics logging (otherwise, when there are multiple matched rules, we need more intricate wiring within Fiber to know which rule was ultimately executed).engines/router/missionctl/fiberapi/interceptors.go
- Associatetraffic_rule
label to theRouteRequestDurationMs
metricengines/router/missionctl/mission_control.go
,engines/router/missionctl/mission_control_upi.go
- Associatetraffic_rule
label to theTuringComponentRequestDurationMs
metric for theroute
step.Known issues
There is a limitation as of the current implementation in this PR (and the relevant Fiber PR): Issue #281
This will be addressed separately, with other improvements to timeout scenarios.
TODO