
Flux (and apps) in MC logs visibility to customers #948

Closed
gianfranco-l opened this issue Mar 24, 2022 · 57 comments
Labels: component/dex, component/flux, feature-request, needs/refinement (Needs refinement in order to be actionable), needs/spec (A specification is needed in order to proceed), team/atlas (Team Atlas), topic/monitoring, topic/release-engineering

Comments


gianfranco-l commented Mar 24, 2022

From a customer request:

In the context of management cluster auth via Dex against Okta, the customer requested:

They need log visibility for Flux (as an admin), e.g.:

  • kubectl logs -n flux-system kustomize-controller-5bd7f98f56-frkd
  • TODO: check whether this is possible (a label-selector variant of the command is sketched below)
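For reference, the pod name in that command includes a generated suffix that changes on every rollout, so if log access is granted, selecting by label is the more stable form of the same command. A minimal sketch, assuming the standard app: kustomize-controller label that the Flux manifests set on the controller pods:

kubectl logs -n flux-system -l app=kustomize-controller --tail=100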
@gianfranco-l

@marians @pipo02mix if some context is missing, please add it :)
cc @giantswarm/team-honeybadger @MarcelMue

@kubasobon

I'm quite strongly against allowing (unfiltered) log access. It's a managed solution - customers should not have access to anything in flux-system as a matter of principle.
I'd rather we expose the Flux-related Prometheus metrics, preferably through an exporter that lets us filter them, than give direct access to logs.


marians commented Mar 25, 2022

(1) and (2) are very much separate requests. I suggest splitting them. (2) is quite customer specific, so potentially worth discussing in a non-public repo.

@gianfranco-l gianfranco-l changed the title Management cluster auth via Dex against Okta in GitOps context Management cluster auth via Dex against Okta in GitOps context: flux logs visibility Mar 25, 2022
@pipo02mix

IMO it makes sense to give them view access to the Flux that manages the apps/clusters they install (not the one that manages our components). If we put that in a dashboard, even better.


piontec commented Mar 25, 2022

The problem is that even if we figure out the permissions problem reasonably, there's no way we can do org separation in logs. I really don't see much value here, as Flux's CR statuses are pretty verbose. Do you have a specific use case where logs were really needed to debug a problem?


marians commented Mar 25, 2022

To my understanding, org separation is not an issue here. This is about access for customer admins, who are currently bound to the cluster-admin ClusterRole via the customer's default admin group (in this case customer:rg_okta_giantswarm-admin).
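For illustration, a minimal sketch of the binding described above, i.e. the customer's default admin group bound to the built-in cluster-admin ClusterRole (the binding name here is hypothetical; the group name is the one quoted in the comment):

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: customer-admin-group        # hypothetical name
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: cluster-admin
subjects:
  - apiGroup: rbac.authorization.k8s.io
    kind: Group
    name: customer:rg_okta_giantswarm-admin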


gianfranco-l commented Mar 28, 2022

We will bring this to SIG Product to discuss whether we should provide customers with access to logs for apps on the MC.

@gianfranco-l gianfranco-l changed the title Management cluster auth via Dex against Okta in GitOps context: flux logs visibility Flux (and managed apps) logs visibility Mar 28, 2022
@gianfranco-l gianfranco-l changed the title Flux (and managed apps) logs visibility Flux (and managed apps) logs visibility to customers Mar 28, 2022
@gianfranco-l

What do you think of this @JosephSalisbury @weatherhog @puja108 ?

@JosephSalisbury

I reckon this can be done with #311, sure

up to @weatherhog for priority, but iirc it's soon


puja108 commented Mar 31, 2022

From reading the above it is still unclear to me what the customer requested (what logs do they want to see exactly? which Flux? what resources are they interested in?) and, more importantly, what their use case for seeing those logs is, i.e. what they are trying to achieve, how it is failing, and why they need log access to debug that. Only when the use case is clear can we realistically evaluate what makes sense going forward, whether log access in any form would really solve it, and/or what alternative we'd rather offer.

@pipo02mix

the use case would be

As a cluster admin, I would like to debug why some (managed) apps are not installed in my workload cluster

I am not sure whether they need logs or whether the status is enough. But IMO giving them the Flux logs is not something that will hurt us.


marians commented Apr 1, 2022

Regarding which logs the customer wanted to see, the request is quite specific: those of the kustomize-controller-* pod in the flux-system namespace.

@JosephSalisbury JosephSalisbury added the team/honeybadger Team Honey Badger label Apr 4, 2022
@gianfranco-l gianfranco-l changed the title Flux (and managed apps) logs visibility to customers Flux (and apps) in MC logs visibility to customers Apr 4, 2022
@JosephSalisbury

Discussion in SIG Product Sync:

  • Customers should be able to access logs for components in MC
  • Pinging @weatherhog & @JosephSalisbury, this is dependent on Atlas' larger logging story
  • We want to be aware that access to logging (as with metrics) is only available to an admin group (not all users)
  • @gianfranco-l / @PrashantR31 it might be most straightforward to work out who should give the customer access, and then create an appropriate Role / RoleBinding to get the customer issue sorted (see the sketch after this list); it's not a product solution, but it solves things for the customer until the product solution is sorted
  • currently, we don't store these logs
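A minimal sketch of the interim Role / RoleBinding suggested above, assuming the goal is read-only access to pods and their logs in flux-system for the customer's admin group (all names are illustrative, not the actual manifests used):

apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: read-flux-pod-logs          # illustrative name
  namespace: flux-system
rules:
  - apiGroups: [""]
    resources: ["pods", "pods/log"]
    verbs: ["get", "list"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: read-flux-pod-logs          # illustrative name
  namespace: flux-system
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: read-flux-pod-logs
subjects:
  - apiGroup: rbac.authorization.k8s.io
    kind: Group
    name: customer:rg_okta_giantswarm-admin   # placeholder for the customer's admin group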


teemow commented Apr 4, 2022

From a product point of view:

Admins on the customer side should have access to all logs on the management cluster. We want to be transparent. Admins should be able to understand what is going on. They should also be able to debug their own actions or a misbehaving management cluster.

So an admin role should have the permission to access logs of all pods.
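As a sketch of what the log part of such an admin role could look like, a read-only ClusterRole covering pods and their log subresource (the name is hypothetical; how broadly to bind it is exactly the open question in this thread):

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: read-all-pod-logs           # hypothetical name
rules:
  - apiGroups: [""]
    resources: ["pods", "pods/log"]
    verbs: ["get", "list"]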


ghost commented Apr 4, 2022

@giantswarm/team-atlas, a tangential question: does our Loki support SSO?

@weatherhog

@PrashantR31 Loki itself does not provide any authentication mechanisms, so in the end that's a no to SSO support. If authorization is needed for Loki, it mainly depends on the customer's preferred way. It could be a combination of Loki and oauth2-proxy, or it could be Loki SaaS.
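A rough sketch of the oauth2-proxy option mentioned above, assuming Loki is reachable in-cluster on its default port 3100 and an OIDC issuer (e.g. Dex) is available; all values are placeholders, not a tested setup:

oauth2-proxy \
  --provider=oidc \
  --oidc-issuer-url=https://dex.example.com \
  --client-id=loki-proxy \
  --client-secret=<secret> \
  --email-domain='*' \
  --http-address=0.0.0.0:4180 \
  --upstream=http://loki.loki.svc:3100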


ghost commented Apr 4, 2022

Thanks @weatherhog, it's just too bad Loki does not have SSO support.

But what I was thinking is that:

  1. Loki could be installed on the MCs where Flux is installed, and all the logs would go there for admin access only.
  2. If Grafana is on the MC, we could provide just a few basic dashboard views for the customer, or even use Grafana Cloud (if that is possible).
  3. Extending that, if we add alerting on logs, that could give the customer admin users a complete observability solution.
  4. Grafana has SSO capabilities, so we could allow only the admin group access to certain dashboards.

These are some thoughts; I don't know how much of this is feasible or in line with what we are providing.


teemow commented Apr 4, 2022

FYI @PrashantR31, Atlas already has a story about what you described: #311

@ghost ghost added the needs/refinement Needs refinement in order to be actionable label Apr 4, 2022

ghost commented Apr 8, 2022

Action from refinement in Rainbow:

  • Rainbow creates a detailed spec to help Honey Badger create the Role and RoleBinding to the default admin group in the Flux app via config.
  • Hand over to @giantswarm/team-honeybadger for implementation

@paurosello

Yeah, that's it, a new ClusterRole. Not sure we can limit it to specific pods inside a namespace, though.

Not sure how we can evaluate what logs every operator emits, but I would assume no credentials are leaked at all; maybe account IDs and other infra resource names, but I think those are also available in the CRs. In any case, I will try to find out whether we can limit access to specific pods, and then we can add more as needed.
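On limiting access to specific pods: RBAC resourceNames can restrict the log subresource, but they must match exact pod names, and controller pods carry generated suffixes that change on every rollout, so in practice the restriction ends up being per namespace rather than per pod. Illustrative sketch only (the pod name below is a placeholder):

apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: read-named-pod-logs         # illustrative name
  namespace: flux-system
rules:
  - apiGroups: [""]
    resources: ["pods/log"]
    verbs: ["get"]
    # resourceNames must be exact pod names; the generated suffix changes on every rollout,
    # which is why per-pod restriction is impractical here.
    resourceNames: ["kustomize-controller-5bd7f98f56-xxxxx"]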


piontec commented Sep 22, 2022

If we leak no credentials, I think we're good - that was my only concern 👍


teemow commented Sep 24, 2022

IMO your last comment, @piontec, should be our only concern. The "why" isn't the problem. I think @paurosello described it well, and IMO customers should have access to all logs on the management cluster. There will also be different roles in the future. But alongside central logging we need native logging (kubectl logs) as well.

At the moment I see four use cases, and we might already separate them:

  • Flux logs
  • CAPI controllers logs
  • Organization namespace (eg customer controllers)
  • All MC logs

Going component by component is a first option. Then we need a way to become confident enough to create a role that opens up all logs.

@piontec @gianfranco-l depending on the amount of work that is necessary, can you check if you can enable flux logs so this story becomes unblocked?

@TheoBrigitte @weatherhog can you take the general story around access to MC logs into Atlas and think about it more holistically? Afaik you already have a story about letting customers monitor their own controllers on the management clusters too.

@QuentinBisson

@teemow We will provide customers with access to the logs once we have a logging solution up and running (one central Loki).

The main issues will arise from how we want to do multi-tenancy, but this is in refinement.


teemow commented Sep 26, 2022

@QuentinBisson what is the reason to wait for central logging? As I mentioned I think customers should be able to use a simple kubectl logs for all components in the MC too.


QuentinBisson commented Sep 26, 2022

I was answering about the log story regarding Atlas, and after rereading, I understand how it was misinterpreted 😅.
I agree customers should be able to access the logs using kubectl logs on the MC anyway.

One interesting thing for us would be to ask customers how they are using MC logs, as it could be useful for us once we have Loki up and running.


teemow commented Sep 26, 2022

I agree customers should be able to access the logs using kubectl logs on the MC anyway.

My question was if Atlas can take this too. Somebody needs to think about the general concept of how we structure access to logs on the management cluster.


piontec commented Sep 26, 2022

Flux has a strict set of pods, so targeting them with RBAC shouldn't be that hard. @paurosello, have you already started? Do you have something you can share?

@gianfranco-l , as we need to prioritize this, I'm putting it back into our Inbox.

@teemow Generally speaking, we can "just" grant the (Cluster)Role access to pods/log everywhere, but I really don't feel comfortable doing that - personally, I have no idea what we might leak in the logs. Is there a reasonable (time, effort) way to check this? Or are we sure that operators we run (ours, external, upstream) never log anything like access tokens?


QuentinBisson commented Sep 26, 2022

I agree customers should be able to access the logs using kubectl logs on the MC anyway.

My question was if Atlas can take this too. Somebody needs to think about the general concept of how we structure access to logs on the management cluster.

Yes, we can create the cluster role, with potential restrictions if we think they're needed, but the binding part from customer to role should happen elsewhere (i.e. be done by another team).

@giantswarm/team-honeybadger how do you currently create cluster roles at the MC level so we could do the same?

@gianfranco-l gianfranco-l moved this from Future (> 6 months) to Near Term (1-3 months) in Roadmap Sep 26, 2022

teemow commented Sep 26, 2022

@QuentinBisson my point was that Atlas should think about whether and which roles are needed, and take full ownership of how we allow customers to access logs on management clusters 🙏

@piontec as mentioned above (#948 (comment)) we need to assess and build up confidence. A simple role to access all pods/log is not what I want. But I would leave the assessment and the structure of the roles to Atlas. I only asked if Honey Badger can unblock this by adding a role for flux.

@gianfranco-l gianfranco-l assigned mproffitt and unassigned marians Sep 27, 2022
Repository owner moved this from Near Term (1-3 months) to Released in Roadmap Oct 5, 2022
@kubasobon kubasobon reopened this Oct 5, 2022
@mproffitt

A new role has been introduced in the flux-system namespace, with a RoleBinding to default:automation and the customer cluster admin group.

This has been tested by Adam from THG and provides him with the log access he requires for Flux.
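If useful, this kind of access can be verified from the MC by impersonating the group (a sketch; the user name is a placeholder and the group name is the one quoted earlier in the thread):

kubectl auth can-i get pods --subresource=log -n flux-system \
  --as=customer-admin-user --as-group=customer:rg_okta_giantswarm-admin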


gianfranco-l commented Oct 5, 2022

Added the Atlas label and put this on the Atlas board so that they can work on their part of the scope as outlined here.


teemow commented Oct 13, 2022

@gianfranco-l @mproffitt we have another request to give access to app-operator logs to debug the installation of an app (ArgoCD) on the management cluster. Can @mproffitt enable this similarly to how he did it for Flux?

@TheoBrigitte as you can see, this is getting more and more important. IMO customers would also like to inspect logs of CAPI controllers soon. So a more general set of permissions that allows customers to read logs on the management cluster would be good. And I don't mean via Loki yet, just via kubectl logs.

@mproffitt

@teemow Exposing the logs from pods within the giantswarm namespace isn't a problem, and I already had an internal ticket for this at https://github.com/giantswarm/giantswarm/issues/24033. However, it is not possible to restrict access to just the affected pods ((app | chart)-operator-*): customers will gain access to all logs inside that namespace, and we have to question whether that is something we want to allow, or whether there are valid reasons we shouldn't be allowing access to all logs inside that namespace.

IMHO requests seem to be leaning towards granting more and more privileges to things we own on the management clusters, and this is something I consider potentially dangerous. We need to consider how we can best protect our resources whilst enabling the customer to achieve their own product vision.


teemow commented Oct 14, 2022

I hear your worries. But for me, the benefits and our general philosophy around shared responsibility, knowledge sharing, and providing a transparent platform rather than a black box to customers outweigh this, imo. It enables customers to solve their own problems much more quickly instead of having to wait for our support to respond. This is good for the customers and good for us.


piontec commented Oct 18, 2022

But to be able to do that, we have to rework and fix part of our deployment configs, namely secrets shared across all the installations. If we don't do that, we risk one customer discovering secrets belonging to other installations - I still find this unacceptable.

On the other hand: we talked within the team, and we think keeping customers from accessing everything on the MC doesn't make sense anymore. It's too hard to check whether every possible escalation path is secured. It's super time-consuming to check every custom operator customers want to run against RBAC policies and what they can access. I think we should separate access credentials between installations "soon" and then throw in the towel: just give customers MC admin permissions, with them accepting the associated risk. WDYT?


teemow commented Oct 18, 2022

Why does giving access to logs give access to secrets? Can you explain this a bit better?

I disagree that it is a "throw in the towel" situation. We need to think about what is important. UX and security need to go hand in hand. This is a challenge, and yes, it needs thorough investigation. Just simplifying the customer's role to admin permissions is not what we should do, imo. This would leave all the work and mistakes to the customer. Yes, we may want to give admin permissions in general. But we also need to make sure we have well-defined roles so it is easy for the customer to follow a least-privilege principle for both users and applications.


piontec commented Oct 19, 2022

I agree about well-defined roles. What I mean is that it is hard, or maybe even impossible, for us to judge whether our MCs are still secure, given the following facts:

  • we allow customers to directly use our components, like app-operator and chart-operator, which run with admin permissions
  • we're introducing more and more permissions for customers:
    • log access
    • custom operator roles
  • Crossplane is incoming
  • plus everything we already have on the MC.

What I want to say is that it is already super hard to tell whether the sum of all of the above is secure or not. With each custom role, we make it even more complex and increase the probability that an attack vector exists that effectively lets a customer get cluster-admin anyway.

As such, maybe we shouldn't protect our Secrets at all costs, but instead make sure that the secrets we have there are safe for a customer to read. Then we can tell customers "it's OK to run this or that with cluster-admin, but please be aware of the consequences; still, if you really want to, you can". And for us, since the customer already has potential admin access, we don't have to make sure that they can't read something from the giantswarm namespace, only that it's not "wildly" possible for no reason.


teemow commented Oct 20, 2022

💯. We aren't there yet. The first step is to work on our secret management, and the second is to write more documentation about the management cluster structure, etc. Customers need to fully understand how the management cluster works before they do something with admin permissions. And our goal should still be to define proper roles so that admin permissions for the customers are only necessary in very specific (and currently unsupported) use cases.

This would enable customers so they aren't blocked by us. Less pressure on our teams. But it also adds responsibility for the customer, as they can break more things. We need to make this clear and also build a safety net.

But I like this as a goal to work towards. @puja108 wdyt? I'll add a note to our board so that we can create a proper epic about it in horizon first.


puja108 commented Oct 20, 2022

Yeah, I'm also evaluating and talking to Upbound and Red Hat about kcp for such use cases. It could enable better interfaces and tighter security within our MCs. Some of these topics also point towards a possible MC team within KaaS to improve the architecture of MCs. There are definitely a lot of sub-stories to it, but let's start with the bigger picture in Horizon for now.


piontec commented Apr 11, 2023

Closing, as it's done.

@piontec piontec closed this as completed Apr 11, 2023