-
Notifications
You must be signed in to change notification settings - Fork 444
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
rfc: Opinionated OpenTelemetry Operator Sampling CR #3297
base: main
Are you sure you want to change the base?
Conversation
edaf054
to
2e2acce
Compare
Signed-off-by: Benedikt Bongartz <[email protected]>
2e2acce
to
abd2c76
Compare
components: | ||
loadbalancer: | ||
# Defines if the component is managed by the operator | ||
managementState: managed |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is it needed to set this field at the component level instead of just doing it at the full object level?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually Ive no strong opinion on that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A few thoughts. Overall, I think i'm unclear how the CR solves the goals / use cases. It feels like this is very cluster admin focused but doesn't give users many options for customization. What if we used some annotations on the pods to determine something about the sampling configuration? Also, how would a user opt into using the created sampler architecture?
I think I would prefer a design that's a bit more composable for cluster admins / automatic for users. I.e.
Cluster admins:
- determine which services should be instrumented via a label selector (maybe this is in its own CR)
- define a
SamplerPolicy
CR- I like the components section, but I want this CR to be more flexible in terms of architecture based on sampler type
- it would also be nice of users could bring their own collectors in the mix.
- I also want support for sampling at the SDK level in here, as IMO that's the biggest gains in terms of efficiency
- I like the components section, but I want this CR to be more flexible in terms of architecture based on sampler type
Users:
- Optionally add an otel-instrumentation specific label to get instrumentation if the cluster admin requires on
- Adds annotations to override the instrumentation policy within their pod template. The cluster admin should be able to define the limits for these (something like a valid range for sample % or max cardinality, etc.)
LMK if you want to talk through this more :) I like where this is going though.
## Goals and non-goals | ||
|
||
**Goals** | ||
- Provide an opinionated CR to simply sampling configuration in distributed environments |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- Provide an opinionated CR to simply sampling configuration in distributed environments | |
- Provide an opinionated CR to simplify sampling configuration in distributed environments |
|
||
### CASE 3 | ||
|
||
As a user I want to be able to filter relevant data without much specific open telemetry knowledge. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As a user I want to be able to filter relevant data without much specific open telemetry knowledge. | |
As a user I want to be able to filter relevant data without much specific OpenTelemetry knowledge. |
|
||
![sampling arch](./images/arch_sampling_crd.png) | ||
|
||
This custom resource creates an environment that allows us to apply e.g. tailbased sampling in a distributed environment. The operator takes care of creating an optional otel LB service and sampling instances similar to the figure shown above. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a reason we prefer tail sampling in the collector versus a global sampling policy that we apply to instrumentations? I guess not everything will be otel instrumented...
|
||
LB instances will be pre-configured to distribute traces based on a given routing key like the traceID to the sampler instances. | ||
|
||
A policy is used to define which telemetry data should be sampled. Available policies can be found in the [tailbased sampling description](https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/processor/tailsamplingprocessor). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i also want us to be able to support the probabilistic sampler which requires no sticky load balancing configuration https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/processor/probabilisticsamplerprocessor. How would this design work with that?
1. Introduction of the CRD in v1alpha1. | ||
2. First controller implementation. | ||
3. Implementation of e2e tests. | ||
4. CRD becomes part of the operator bundle. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's adjust this to include some more iteration and design changes prior to being in the main bundle.
|
||
**Goals** | ||
- Provide an opinionated CR to simply sampling configuration in distributed environments | ||
- Allow managing access using RBAC to different parts of the collector configuration |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not clear to me how the design solves this goal.
Description:
Link to tracking Issue(s):
Testing:
Documentation:
https://github.com/frzifus/opentelemetry-operator/blob/rfc/sampling/docs/rfcs/opinionated_sampling_cr.md