Log-Receiver: Investigation #3567
Comments
Hey @giantswarm/team-atlas, I've been investigating this topic for a while as I was getting into OpenTelemetry, and I think we could do it in multiple ways:
I definitely would be in favor of solution 2 because I think it's the most useful one future-wise, but it will most likely take longer. Now, to my point about the API keys, that is something we could start thinking about today. Do we think it makes sense to move to some kind of PKI for this? |
I would also support the second solution because it's the most secure and future-safe approach. I don't want to make the platform less secure and put in a legacy agent we wanted to get rid of for such a niche feature. |
I need to draw something yes so we can discuss it as a team tomorrow and find the end state we all want, I will try to do it later tonight to explain where I think our observability platform should go to be able to support more features and ideally otel OOTB. I wanted to do it today but life got in the way. Once we have this, we can agree on steps we want to do in the implementation phase :) |
@giantswarm/team-atlas
Maybe you can think of an easier way to make this work for now? The main idea here is to make sure @QuantumEnigmaa and @TheoBrigitte can work on the implementation phase if we think this is legit :) |
Can we shift the perspective slightly and look at it from a customer journey perspective as well? |
In that journey, they could create the CR with whatever name they want and the operators would generate a secret they would need to get to configure their log shipper. It's the best we can do without any UI integration |
So they need a logshipper that sends the data to alloy which only receives but doesn't scrape? :) For example: customer A wants to get the logs of a Cloud Service Database that is connected to their app in the cluster. So they set up fluentbit or whatever tool they like with access to the DB app, then they create the cloudDB-CR which generates a secret. Now where do they access that secret? Once they accessed it and have the secret they add it to fluentbit, with the target to send it to (where do they get that target?) and finally babam, logs in Grafana? |
So yes, they create a CR on the MC and check the status of that CR on the MC to get the name of the secret, then get that secret value on the MC as well. Maybe it's not the best user journey, but I'm not sure any other would be approved by security. Once they have the secret they should send data to our Alloy on the MC, which is one of the main reasons why a single ingress for observability would be helpful |
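To make the journey above concrete, here is a minimal sketch of what the operator side could do when it sees such a CR: generate a key, store it in a Secret, and publish the Secret name in the CR status. Everything named here is an assumption for illustration (the `Source` kind, the `status.secretRef` field, the `-api-key` suffix, the `apiKey` data key); none of it is an agreed design.

```python
import base64
import secrets


def reconcile_source(cr: dict) -> tuple[dict, dict]:
    """Return (Secret manifest, updated CR) for a hypothetical source CR.

    The CR is not mutated; a copy carrying a status field is returned.
    """
    name = cr["metadata"]["name"]
    namespace = cr["metadata"]["namespace"]
    secret_name = f"{name}-api-key"  # hypothetical naming convention

    # Random, URL-safe key; 32 bytes of entropy.
    api_key = secrets.token_urlsafe(32)

    secret = {
        "apiVersion": "v1",
        "kind": "Secret",
        "metadata": {"name": secret_name, "namespace": namespace},
        "type": "Opaque",
        # Kubernetes Secret data values are base64-encoded strings.
        "data": {"apiKey": base64.b64encode(api_key.encode()).decode()},
    }

    # Hypothetical status layout the customer would read to find the Secret.
    updated = {**cr, "status": {"secretRef": {"name": secret_name}}}
    return secret, updated
```

The customer would then read the status on the MC to find the Secret name, fetch the Secret, and put the key into their log shipper configuration.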
I like the idea of the gateways: observability-gateway (Alloy for now IIUC) in the MC and o11y-data-gateway in the WC. I'm fine with the Source CRD to allow the customer adding new data sources and get credentials to send their data to us. I like the way you propose to configure the observability bundle. The tenant/organization configuration is still not clear to me. The topology type is a nice idea, but not sure it's the priority. |
So coming back to use cases for @Rotfuks because I'm on my laptop today and I can explain better :D I'll call the CRD
How to send logs to our Managed Loki
1. Generate an API Key
Remarks
2. Configure the application:
Remarks
3. Go to grafana and see your logs :)
Remarks
It could be nice to be able to have a view of that pipeline in some kind of blocks in grafana I guess like
@Rotfuks maybe something for you:
@Rotfuks this is definitely something for you: |
Interesting idea that came from a discussion with honeybadger: we should probably use the External Secrets Operator to push the secret back to customers (https://external-secrets.io/latest/api/pushsecret/) and most likely to create the API key as well https://external-secrets.io/latest/api/generator/password/ (less code for us) |
Also, maybe a better idea: let's not use API keys but OIDC in front of the gateway, so tokens are rotated? |
Let's wait for feedback from @giantswarm/team-bigmac https://gigantic.slack.com/archives/C053JHJC99Q/p1722015045075429 |
Alright, BigMac is sadly completely overloaded with topics already. Let's talk once you're back about what exactly we need from BigMac and how we can reduce the dependency on them. Maybe we can boil it down to a kickoff workshop so we can do the PoC on our own. I'll discuss it further. |
I like the idea of using Alloy as our OpenTelemetry gateway; it supports a wide variety of receivers (OpenTelemetry, Datadog, Jaeger, Kafka, etc.). But I would also like to know what the use cases are and which receivers we should support. I would also be interested in defining a high-level user journey with this new solution.
I would rather have the user create a new Omega CR to get a new API key than have them delete the secret attached to the current Omega CR; that makes things complicated IMO.
We first need to figure out how we implement Authentication and how we add support for the different protocols (http, grpc, thrift_http, and the like)
What do you mean by this?
I think it would be good to point user on where and how they can visualize their data in Grafana.
This would be a view for us for debugging purposes right ? |
So originally, at least for receiving logs, I wanted to reduce the surface by only opening the default OTLP ports (HTTP and gRPC), as they are usually supported pretty well, and we see with time if we need to open more things.
We can discuss that at the end of the week for sure. We are currently having discussions to see if we can use OIDC instead of API keys to actually make it more secure, so I did not spend too much time investigating this. The recreation part is also because our operators use it, so we cannot really create a new secret so easily
I linked this page in the early discussions https://grafana.com/docs/alloy/latest/reference/components/otelcol/otelcol.receiver.otlp/ that explains how to enable the OTLP receiver, and there are also extra components with auth in them https://grafana.com/docs/alloy/latest/reference/components/otelcol/otelcol.auth.bearer/ but ongoing discussions are going towards OIDC and either Dex or SPIFFE/SPIRE, and we need to see if that is achievable in a possible timeline, which I highly doubt.
This was me explaining how our operators will also use the Omega CRD
Yes and I hope this gets built into backstage
For us and customers yes, kinda like the alloy ui but in grafana :) |
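For the OTLP-over-HTTP path with bearer authentication discussed in this comment, the sender side could look roughly like the sketch below. It only builds the request and does not send it. The gateway hostname and the token are placeholders, and whether our gateway would accept JSON-encoded OTLP with a bearer token is an assumption; the `/v1/logs` path and port 4318 are the OTLP/HTTP defaults.

```python
import json
import time
import urllib.request


def build_otlp_log_request(
    endpoint: str, token: str, message: str, service: str
) -> urllib.request.Request:
    """Build (but do not send) an OTLP/HTTP JSON request carrying one log record."""
    payload = {
        "resourceLogs": [
            {
                "resource": {
                    "attributes": [
                        {"key": "service.name", "value": {"stringValue": service}}
                    ]
                },
                "scopeLogs": [
                    {
                        "logRecords": [
                            {
                                # OTLP JSON encodes 64-bit integers as strings.
                                "timeUnixNano": str(time.time_ns()),
                                "severityText": "INFO",
                                "body": {"stringValue": message},
                            }
                        ]
                    }
                ],
            }
        ]
    }
    return urllib.request.Request(
        url=endpoint.rstrip("/") + "/v1/logs",  # default OTLP/HTTP logs path
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",  # placeholder credential
        },
        method="POST",
    )


# Hypothetical gateway address; 4318 is the default OTLP/HTTP port.
request = build_otlp_log_request(
    "https://observability-gateway.example.invalid:4318",
    "my-api-key",
    "hello from outside the cluster",
    "cloud-db",
)
```

In practice customers would more likely configure an existing shipper's OTLP output with the same endpoint and header than hand-roll requests.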
Just to echo my comments in the internal threads, I think a homegrown API key mechanism is the wrong way to go here. The tools exist in the ecosystem to use an identity-based authn/z scheme. Aside from being more secure, we already have other use cases for it, it is a great platform feature, and it ends up being less work in the long run anyway |
I totally agree here @stone-z, but you know I'm a bit skeptical when it comes to a possible timeline for say Spire :D |
Currently moving the grafana-multi-tenant-proxy in front of loki to be used with the alloy gateway in multiple PRs: |
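As a rough illustration of what a tenant-mapping proxy in front of Loki does, the sketch below turns basic-auth credentials into the `X-Scope-OrgID` header Loki uses for multi-tenancy. This is not the grafana-multi-tenant-proxy code; the user table, the function name, and the choice to return `None` on rejection are all hypothetical.

```python
import base64
import hmac
from typing import Optional

# Hypothetical credential table: username -> (password, Loki tenant ID).
USERS = {"customer-a": ("s3cret", "tenant-a")}


def tenant_headers(authorization: str, users: dict = USERS) -> Optional[dict]:
    """Return headers to forward to Loki, or None if the credentials are rejected."""
    scheme, _, encoded = authorization.partition(" ")
    if scheme.lower() != "basic":
        return None
    try:
        decoded = base64.b64decode(encoded).decode("utf-8")
    except ValueError:  # covers malformed base64 and invalid UTF-8
        return None
    username, _, password = decoded.partition(":")

    entry = users.get(username)
    if entry is None:
        return None
    expected_password, tenant = entry
    # Constant-time comparison to avoid leaking password prefixes via timing.
    if not hmac.compare_digest(expected_password.encode(), password.encode()):
        return None
    return {"X-Scope-OrgID": tenant}
```

With this shape, the gateway only needs per-customer credentials, and the tenant is decided by the proxy rather than by whatever header the sender supplies.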
If we cannot have spire running, we could theoretically use this https://kubernetes.github.io/ingress-nginx/examples/auth/client-certs/ |
@Rotfuks I think the investigation is done if we think about security as a next step, right?
And we can figure out security in the implementation phase? I'm asking because I don't know where to go from here |
Do you have a list of security risks already that we have to figure out/keep in mind when we implement it? |
I think the main thing is we need workload identity or some kind of cert authentication in the MVP, but we can start with the implementation: deploy Alloy as a gateway, replace our current pipeline first with OTLP and maybe the Loki protocol, and check the difference in resource usage |
Good, would love if you could update the implementation ticket with your findings from here and what our steps to implement the log receiver would be (maybe also a smol architecture graphic would help to drive what we want to achieve here?) Then we can close this. Thanks! |
See write up here #3568 (comment), I'm closing in favor of implementation specific questions |
Motivation
We need to make sure customers can receive logs from outside the installations. For this we first need to find out how exactly we can achieve this. Do we need a new component, can we reuse an already existing one, or do we have to create our own thing?
Investigation
Outcome