-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix organization order when requeuing #197
Conversation
There's something I don't understand : the way it's done currently should be the following : when the operator reconciles a grafana organization CR it also checks whether it has an orgID and if it does and this ID do not match any existing org in Grafana it just creates the org and updates the CR with the newly created org's ID right ? This happens whenever grafana org CRs are present and grafana itself is restarted. So I don't see the importance of reconciling those in order. |
So the issue is not when it creates but when it updates the orgs. Imagine you have 2 orgs, giantswarm and testorg If you create an org, say giantswarm, you create it in grafana and get org_id = 2, then grafana restarts (because user values is updated and org mapping is set). Thé operator sees this and tries to recreate orgs in wathever order it wants so it could create testorg. Grafana retiens org_id=2 because it's the first org after a restarts. Then the operator reconciles the giantswarm org, but this org has an org id of 2 in the status so it actually updates testorg and rename it giantswarm and the both have org_id 2 in the status |
I share Zirko's confusion 😅
So, the problem is only in the status? Maybe we don't need to show the org ID in the status then? |
how do you expect to push dashboards to the correct org without the org id? |
Maybe we should consider we can't rely on orgids, and always FindByName to retrieve the org's current ID? |
So we cannot rely on it because we do not use persistence :D The issue with relying on FindByName is that you have no way to know if a user did not rename the org in Grafana by hand or updated the display name in the CR |
Whether we rely on the orgid, the CR's name or the display name, I think these will still be an issue. |
With persistence, no because then org_id will be static |
The current issue is that we have to restart grafana and recreate the orgs. We need the display name for the org mapping so we need it to be accurate and we need org id for dashboards and so on |
All right, then we accept some "strangeness" in the operator's behaviour and plan on future persistence to stabilize it all. This also means we accept users may change org's names on the Web UI, so we can't rely on org names? |
I would rather we do not have any strangeness to be honest but there's no better way imo or we well, we could force a restart of all our grafanas at say midnight and recreate all orgs but is that not stranger? Maybe we can discuss it live today The thing is, you cannot prevent users from changing the org name in the UI unless they are not admin, but then they cannot create datasources |
|
What happens when an org gets deleted? I guess they are "defragmented"? ie we have these orgs:
And we delete org-b, when grafana restarts this results in:
...I guess that's ok since no org uses an already-existing org ID. |
That's something worth trying out but yes we do not expect ids to remain the same until persistence is enabled |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fine, let's do it this way! Then we'll see if we get some more strange corner cases!
From what I understand the Oauth2 settings are provisioned via a configmap to Grafana which does requires a restart each time we need to update them. I just found out that there is an API in Grafana which allows just for that https://grafana.com/docs/grafana/latest/developers/http_api/sso-settings/#update-sso-settings, so we could configure the Oauth2 org mapping directly via the API. |
Oh it should bé really recent because a month ago it only supported saml. If that works that would bé awesome. We would still need the ordering fix in case a of pod/node rollout but it's less likely to happen |
What this PR does / why we need it
This PR ensures we requeue organizations in the correct order to avoid having multiple organizations with the same org-id
What I think happens is that we create some organizations and they are created in grafana like so:
giantswarm with orgId 2
testorg with orgId 3
This restarts grafana of course so the organizations are enqueued and reconciled one by one.
The issue here is that this order is undeterministic as it rely on the controller-runtime cache that can return things in whatever order.
The best solution imo is to make sure we enqueue organizations in the proper order of their status orgID.
Another possible solution is to remove the org id from all orgs status when we know we restart grafana but that looks a it too cumbersome.
The only downside I see to this is that we will have to fix existing issues manually but this should only be the case on glean and golem so I think we should be fine.
Checklist