
Flux (and apps) in MC logs visibility to customers #948

Closed
gianfranco-l opened this issue Mar 24, 2022 · 57 comments
Labels: component/dex, component/flux, feature-request, needs/refinement (Needs refinement in order to be actionable), needs/spec (A specification is needed in order to proceed), team/atlas (Team Atlas), topic/monitoring, topic/release-engineering

Comments


gianfranco-l commented Mar 24, 2022

From a customer request:

In the context of management cluster auth via Dex against Okta, the customer requested:

They need log visibility for Flux (as an admin), e.g.:

  • kubectl logs -n flux-system kustomize-controller-5bd7f98f56-frkd
  • TODO: check whether this is possible (a label-selector variant of the command is sketched below)
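For reference, the pod name in that command includes a generated suffix that changes on every rollout, so if log access is granted, selecting by label is the more stable form of the same command. A minimal sketch, assuming the standard app: kustomize-controller label that the Flux manifests set on the controller pods:

kubectl logs -n flux-system -l app=kustomize-controller --tail=100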
@gianfranco-l

@marians @pipo02mix if some context is missing, please add it :)
cc @giantswarm/team-honeybadger @MarcelMue

@kubasobon

I'm quite strongly against allowing (unfiltered) log access. It's a managed solution - customers should not have access to anything in flux-system as a matter of principle.
I'd rather we expose the Flux-related Prometheus metrics, preferably through an exporter that lets us filter them, than give direct access to logs.


marians commented Mar 25, 2022

(1) and (2) are very much separate requests. I suggest splitting them. (2) is quite customer specific, so potentially worth discussing in a non-public repo.

@gianfranco-l gianfranco-l changed the title Management cluster auth via Dex against Okta in GitOps context Management cluster auth via Dex against Okta in GitOps context: flux logs visibility Mar 25, 2022
@pipo02mix

IMO it makes sense to give them view access to the Flux that manages the apps/clusters they install (not the one that manages our components). If we put that in a dashboard, even better.


piontec commented Mar 25, 2022

The problem is that even if we figure out the permissions problem reasonably, there's no way we can do org separation in logs. I really don't see much value here, as Flux's CR statuses are pretty verbose. Do you have a specific use case where logs were really needed to debug a problem?


marians commented Mar 25, 2022

To my understanding, org separation is not an issue here. This is about access for customer admins, who are currently bound to the cluster-admin ClusterRole via the customer's default admin group (in this case customer:rg_okta_giantswarm-admin).
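For illustration, a minimal sketch of the binding described above, i.e. the customer's default admin group bound to the built-in cluster-admin ClusterRole (the binding name here is hypothetical; the group name is the one quoted in the comment):

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: customer-admin-group        # hypothetical name
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: cluster-admin
subjects:
  - apiGroup: rbac.authorization.k8s.io
    kind: Group
    name: customer:rg_okta_giantswarm-admin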


gianfranco-l commented Mar 28, 2022

We will bring this to SIG Product to discuss whether we should provide customers with access to logs for apps on the MC.

@gianfranco-l gianfranco-l changed the title Management cluster auth via Dex against Okta in GitOps context: flux logs visibility Flux (and managed apps) logs visibility Mar 28, 2022
@gianfranco-l gianfranco-l changed the title Flux (and managed apps) logs visibility Flux (and managed apps) logs visibility to customers Mar 28, 2022
@gianfranco-l

What do you think of this @JosephSalisbury @weatherhog @puja108 ?

@JosephSalisbury

I reckon this can be done with #311, sure

up to @weatherhog for priority, but iirc it's soon


puja108 commented Mar 31, 2022

From reading the above it is still unclear to me what the customer requested (what logs do they want to see exactly? which Flux? what resources are they interested in?) and, more importantly, what their use case for seeing those logs is, i.e. what they are trying to achieve, how it is failing, and why they need log access to debug that. Only when the use case is clear can we realistically evaluate what makes sense going forward, whether log access in any form would really solve it, and/or what alternative we'd rather offer.

@pipo02mix

the use case would be

As a cluster admin, I would like to debug why some (managed) apps are not installed in my workload cluster

I am not sure whether they need logs or whether the status is enough. But IMO giving them the Flux logs is not something that will hurt us.


marians commented Apr 1, 2022

Regarding which logs the customer wanted to see, the request is quite specific: those of the kustomize-controller-* pod in the flux-system namespace.

@JosephSalisbury JosephSalisbury added the team/honeybadger Team Honey Badger label Apr 4, 2022
@gianfranco-l gianfranco-l changed the title Flux (and managed apps) logs visibility to customers Flux (and apps) in MC logs visibility to customers Apr 4, 2022
@JosephSalisbury

Discussion in SIG Product Sync:

  • Customers should be able to access logs for components in MC
  • Pinging @weatherhog & @JosephSalisbury, this is dependent on Atlas' larger logging story
  • We want to be aware that access to logging (as with metrics) is only available to an admin group (not all users)
  • @gianfranco-l / @PrashantR31 it might be most straightforward to work out who should give the customer access, and then create an appropriate Role / RoleBinding to get the customer issue sorted (see the sketch after this list); it's not a product solution, but it solves things for the customer until the product solution is sorted
  • currently, we don't store these logs
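A minimal sketch of the interim Role / RoleBinding suggested above, assuming the goal is read-only access to pods and their logs in flux-system for the customer's admin group (all names are illustrative, not the actual manifests used):

apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: read-flux-pod-logs          # illustrative name
  namespace: flux-system
rules:
  - apiGroups: [""]
    resources: ["pods", "pods/log"]
    verbs: ["get", "list"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: read-flux-pod-logs          # illustrative name
  namespace: flux-system
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: read-flux-pod-logs
subjects:
  - apiGroup: rbac.authorization.k8s.io
    kind: Group
    name: customer:rg_okta_giantswarm-admin   # placeholder for the customer's admin group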


teemow commented Apr 4, 2022

From a product point of view:

Admins on the customer side should have access to all logs on the management cluster. We want to be transparent. Admins should be able to understand what is going on. They should also be able to debug their own actions or a misbehaving management cluster.

So an admin role should have the permission to access logs of all pods.
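As a sketch of what the log part of such an admin role could look like, a read-only ClusterRole covering pods and their log subresource (the name is hypothetical; how broadly to bind it is exactly the open question in this thread):

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: read-all-pod-logs           # hypothetical name
rules:
  - apiGroups: [""]
    resources: ["pods", "pods/log"]
    verbs: ["get", "list"]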


ghost commented Apr 4, 2022

@giantswarm/team-atlas, a tangential question: does our Loki support SSO?

@weatherhog

@PrashantR31 Loki itself does not provide any authentication mechanisms, so in the end that's a no to SSO support. If authorization is needed for Loki, it mainly depends on the customer's preferred way. It could be a combination of Loki and oauth2-proxy, or it could be Loki SaaS.
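A rough sketch of the oauth2-proxy option mentioned above, assuming Loki is reachable in-cluster on its default port 3100 and an OIDC issuer (e.g. Dex) is available; all values are placeholders, not a tested setup:

oauth2-proxy \
  --provider=oidc \
  --oidc-issuer-url=https://dex.example.com \
  --client-id=loki-proxy \
  --client-secret=<secret> \
  --email-domain='*' \
  --http-address=0.0.0.0:4180 \
  --upstream=http://loki.loki.svc:3100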


ghost commented Apr 4, 2022

Thanks @weatherhog, it's just too bad Loki does not have SSO support.

But what I was thinking is that:

  1. Loki could be installed on the MCs where Flux is installed, and all the logs would go there for admin access only.
  2. If Grafana is on the MC, we could provide just a few basic dashboard views for the customer, or even use Grafana Cloud (if that is possible).
  3. Extending that, if we add alerting on logs, that could give the customer admin users a complete observability solution.
  4. Grafana has SSO capabilities, so we could allow only the admin group access to certain dashboards.

These are some thoughts; I don't know how much of this is feasible or in line with what we are providing.


teemow commented Apr 4, 2022

FYI @PrashantR31, Atlas already has a story about what you described: #311

@ghost ghost added the needs/refinement Needs refinement in order to be actionable label Apr 4, 2022

ghost commented Apr 8, 2022

Action from refinement in Rainbow:

  • Rainbow creates a detailed spec to help Honey Badger create the Role and RoleBinding to the default admin group in the Flux app via config.
  • Hand over to @giantswarm/team-honeybadger for implementation

@paurosello

Yeah, that's it, a new ClusterRole. Not sure we can limit it to specific pods inside a namespace, though.

Not sure how we can evaluate what logs every operator emits, but I would assume no credentials are leaked at all; maybe account IDs and other infra resource names, but I think those are also available in the CRs. In any case, I will try to find out whether we can limit access to specific pods, and then we can add more as needed.
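On limiting access to specific pods: RBAC resourceNames can restrict the log subresource, but they must match exact pod names, and controller pods carry generated suffixes that change on every rollout, so in practice the restriction ends up being per namespace rather than per pod. Illustrative sketch only (the pod name below is a placeholder):

apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: read-named-pod-logs         # illustrative name
  namespace: flux-system
rules:
  - apiGroups: [""]
    resources: ["pods/log"]
    verbs: ["get"]
    # resourceNames must be exact pod names; the generated suffix changes on every rollout,
    # which is why per-pod restriction is impractical here.
    resourceNames: ["kustomize-controller-5bd7f98f56-xxxxx"]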


piontec commented Sep 22, 2022

If we leak no credentials, I think we're good - that was my only concern 👍


teemow commented Sep 24, 2022

IMO your last comment, @piontec, should be our only concern. The "why" isn't the problem. I think @paurosello described it well, and IMO customers should have access to all logs on the management cluster. There will also be different roles in the future. But alongside central logging we need native logging (kubectl logs) as well.

At the moment I see four use cases, and we might already separate them:

  • Flux logs
  • CAPI controllers logs
  • Organization namespace (eg customer controllers)
  • All MC logs

Going component by component is a first option. Then we need a way to become confident enough to create a role that opens up all logs.

@piontec @gianfranco-l depending on the amount of work that is necessary, can you check if you can enable flux logs so this story becomes unblocked?

@TheoBrigitte @weatherhog can you take the general story around access to MC logs into Atlas and think about it more holistically? Afaik you already have a story about letting customers monitor their own controllers on the management clusters too.

@QuentinBisson

@teemow We will provide customers with access to the logs once we have a logging solution up and running (one central Loki).

The main issues will arise from how we want to do multi-tenancy, but this is in refinement.


teemow commented Sep 26, 2022

@QuentinBisson what is the reason to wait for central logging? As I mentioned I think customers should be able to use a simple kubectl logs for all components in the MC too.


QuentinBisson commented Sep 26, 2022

I was answering about the log story regarding Atlas, and after rereading, I understand how it was misinterpreted 😅.
I agree customers should be able to access the logs using kubectl logs on the MC anyway.

One interesting thing for us would be to ask customers how they are using MC logs, as it could be useful for us once we have Loki up and running.


teemow commented Sep 26, 2022

I agree customers should be able to access the logs using kubectl logs on the MC anyway.

My question was if Atlas can take this too. Somebody needs to think about the general concept of how we structure access to logs on the management cluster.


piontec commented Sep 26, 2022

Flux has a strict set of pods, so targeting them with RBAC shouldn't be that hard. @paurosello, have you already started? Do you have something you can share?

@gianfranco-l , as we need to prioritize this, I'm putting it back into our Inbox.

@teemow Generally speaking, we can "just" grant the (Cluster)Role access to pods/log everywhere, but I really don't feel comfortable doing that - personally, I have no idea what we might leak in the logs. Is there a reasonable (time, effort) way to check this? Or are we sure that operators we run (ours, external, upstream) never log anything like access tokens?


QuentinBisson commented Sep 26, 2022

I agree customers should be able to access the logs using kubectl logs on the MC anyway.

My question was if Atlas can take this too. Somebody needs to think about the general concept of how we structure access to logs on the management cluster.

Yes, we can create the cluster role, with potential restrictions if we think they're needed, but the binding part from customer to role should happen elsewhere (i.e. be done by another team).

@giantswarm/team-honeybadger how do you currently create cluster roles at the MC level so we could do the same?

@gianfranco-l gianfranco-l moved this from Future (> 6 months) to Near Term (1-3 months) in Roadmap Sep 26, 2022

teemow commented Sep 26, 2022

@QuentinBisson my point was that Atlas should think about whether and which roles are needed, and take full ownership of how we allow customers to access logs on management clusters 🙏

@piontec as mentioned above (#948 (comment)) we need to assess and build up confidence. A simple role to access all pods/log is not what I want. But I would leave the assessment and the structure of the roles to Atlas. I only asked if Honey Badger can unblock this by adding a role for flux.

@gianfranco-l gianfranco-l assigned mproffitt and unassigned marians Sep 27, 2022
Repository owner moved this from Near Term (1-3 months) to Released in Roadmap Oct 5, 2022
@kubasobon kubasobon reopened this Oct 5, 2022
@mproffitt

A new role has been introduced in the flux-system namespace, with a RoleBinding to default:automation and the customer cluster admin group.

This has been tested by Adam from THG and provides him with the log access he requires for Flux.
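If useful, this kind of access can be verified from the MC by impersonating the group (a sketch; the user name is a placeholder and the group name is the one quoted earlier in the thread):

kubectl auth can-i get pods --subresource=log -n flux-system \
  --as=customer-admin-user --as-group=customer:rg_okta_giantswarm-admin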


gianfranco-l commented Oct 5, 2022

Added the Atlas label and put this on the Atlas board so that they can work on their part of the scope as outlined here.


teemow commented Oct 13, 2022

@gianfranco-l @mproffitt we have another request to give access to app-operator logs to debug the installation of an app (ArgoCD) on the management cluster. Can @mproffitt enable this similarly to how he did it for Flux?

@TheoBrigitte as you can see, this is getting more and more important. IMO customers would also like to inspect logs of CAPI controllers soon. So a more general set of permissions that allows customers to read logs on the management cluster would be good. And I don't mean via Loki yet, just via kubectl logs.

@mproffitt

@teemow Exposing the logs from pods within the giantswarm namespace isn't a problem, and I already had an internal ticket for this at https://github.com/giantswarm/giantswarm/issues/24033. However, it is not possible to restrict access to just the affected pods ((app | chart)-operator-*): customers will gain access to all logs inside that namespace, and we have to question whether that is something we want to allow, or whether there are valid reasons we shouldn't be allowing access to all logs inside that namespace.

IMHO requests seem to be leaning towards granting more and more privileges to things we own on the management clusters, and this is something I consider potentially dangerous. We need to consider how we can best protect our resources whilst enabling the customer to achieve their own product vision.


teemow commented Oct 14, 2022

I hear your worries. But for me, the benefits and our general philosophy around shared responsibility, knowledge sharing, and providing a transparent platform rather than a black box to customers outweigh this, imo. It enables customers to solve their own problems much more quickly instead of having to wait for our support to respond. This is good for the customers and good for us.


piontec commented Oct 18, 2022

But to be able to do that, we have to rework and fix part of our deployment configs, namely secrets shared across all the installations. If we don't do that, we risk one customer discovering secrets belonging to other installations - I still find this unacceptable.

On the other hand: we talked within the team, and we think keeping customers from accessing everything on the MC doesn't make sense anymore. It's too hard to check whether every possible escalation path is secured. It's super time-consuming to check every custom operator customers want to run against RBAC policies and what they can access. I think we should separate access credentials between installations "soon" and then throw in the towel: just give customers MC admin permissions, with them accepting the associated risk. WDYT?


teemow commented Oct 18, 2022

Why does giving access to logs give access to secrets? Can you explain this a bit better?

I disagree that it is a "throw in the towel" situation. We need to think about what is important. UX and security need to go hand in hand. This is a challenge, and yes, it needs thorough investigation. Just simplifying the customer's role to admin permissions is not what we should do, imo. This would leave all the work and mistakes to the customer. Yes, we may want to give admin permissions in general. But we also need to make sure we have well-defined roles so it is easy for the customer to follow a least-privilege principle for both users and applications.


piontec commented Oct 19, 2022

I agree about well-defined roles. What I mean is that it is hard, or maybe even impossible, for us to judge whether our MCs are still secure, given the following facts:

  • we allow customers to directly use our components, like app-operator and chart-operator, which run with admin permissions
  • we're introducing more and more permissions for customers:
    • log access
    • custom operator roles
  • Crossplane is incoming
  • plus everything we already have on the MC.

What I want to say is that it is already super hard to tell whether the sum of all of the above is secure or not. With each custom role, we make it even more complex and increase the probability that an attack vector exists that effectively lets a customer get cluster-admin anyway.

As such, maybe we shouldn't protect our Secrets at all costs, but instead make sure that the secrets we have there are safe for a customer to read. Then we can tell customers "it's OK to run this or that with cluster-admin, but please be aware of the consequences; still, if you really want to, you can". And for us, since the customer already has potential admin access, we don't have to make sure that they can't read something from the giantswarm namespace, only that it's not "wildly" possible for no reason.


teemow commented Oct 20, 2022

💯. We aren't there yet. The first step is to work on our secret management, and the second is to write more documentation about the management cluster structure, etc. Customers need to fully understand how the management cluster works before they do something with admin permissions. And our goal should still be to define proper roles so that admin permissions for the customers are only necessary in very specific (and currently unsupported) use cases.

This would enable customers so they aren't blocked by us. Less pressure on our teams. But it also adds responsibility for the customer, as they can break more things. We need to make this clear and also build a safety net.

But I like this as a goal to work towards. @puja108 wdyt? I'll add a note to our board so that we can create a proper epic about it in horizon first.


puja108 commented Oct 20, 2022

Yeah, I'm also evaluating and talking to Upbound and Red Hat about kcp for such use cases. It could enable better interfaces and tighter security within our MCs. Some of these topics also point towards a possible MC team within KaaS to improve the architecture of MCs. There are definitely a lot of sub-stories to it, but let's start with the bigger picture in Horizon for now.


piontec commented Apr 11, 2023

Closing, as it's done.

@piontec piontec closed this as completed Apr 11, 2023