Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix organization order when requeuing #197

Merged
merged 2 commits into from
Dec 12, 2024

Conversation

QuentinBisson
Copy link
Contributor

What this PR does / why we need it

This PR ensures we requeue organizations in the correct order to avoid having multiple organizations with the same org-id

What I think happens is that we create some organizations and they are created in grafana like so:

giantswarm with orgId 2
testorg with orgId 3

This restarts grafana of course so the organizations are enqueued and reconciled one by one.

The issue here is that this order is undeterministic as it rely on the controller-runtime cache that can return things in whatever order.
The best solution imo is to make sure we enqueue organizations in the proper order of their status orgID.

Another possible solution is to remove the org id from all orgs status when we know we restart grafana but that looks a it too cumbersome.

The only downside I see to this is that we will have to fix existing issues manually but this should only be the case on glean and golem so I think we should be fine.

Checklist

  • Update changelog in CHANGELOG.md.
  • Follow deployment test procedure in the tests/manual_e2e directory and have a working branch.

@QuentinBisson QuentinBisson self-assigned this Dec 11, 2024
@QuentinBisson QuentinBisson requested a review from a team as a code owner December 11, 2024 14:34
@QuentinBisson QuentinBisson changed the title fix organization reordering fix organization order when requeuing Dec 11, 2024
@QuantumEnigmaa
Copy link
Contributor

There's something I don't understand : the way it's done currently should be the following : when the operator reconciles a grafana organization CR it also checks whether it has an orgID and if it does and this ID do not match any existing org in Grafana it just creates the org and updates the CR with the newly created org's ID right ? This happens whenever grafana org CRs are present and grafana itself is restarted.

So I don't see the importance of reconciling those in order.

@QuentinBisson
Copy link
Contributor Author

So the issue is not when it creates but when it updates the orgs.

Imagine you have 2 orgs, giantswarm and testorg

If you create an org, say giantswarm, you create it in grafana and get org_id = 2, then grafana restarts (because user values is updated and org mapping is set).

Thé operator sees this and tries to recreate orgs in wathever order it wants so it could create testorg. Grafana retiens org_id=2 because it's the first org after a restarts.

Then the operator reconciles the giantswarm org, but this org has an org id of 2 in the status so it actually updates testorg and rename it giantswarm and the both have org_id 2 in the status

@hervenicol
Copy link
Contributor

I share Zirko's confusion 😅

Then the operator reconciles the giantswarm org, but this org has an org id of 2 in the status so it actually updates testorg and rename it giantswarm and the both have org_id 2 in the status

So, the problem is only in the status? Maybe we don't need to show the org ID in the status then?
What do we need the OrgID for?

@QuentinBisson
Copy link
Contributor Author

how do you expect to push dashboards to the correct org without the org id?

@hervenicol
Copy link
Contributor

Maybe we should consider we can't rely on orgids, and always FindByName to retrieve the org's current ID?

@QuentinBisson
Copy link
Contributor Author

So we cannot rely on it because we do not use persistence :D

The issue with relying on FindByName is that you have no way to know if a user did not rename the org in Grafana by hand or updated the display name in the CR

@hervenicol
Copy link
Contributor

Whether we rely on the orgid, the CR's name or the display name, I think these will still be an issue.

@QuentinBisson
Copy link
Contributor Author

With persistence, no because then org_id will be static

@QuentinBisson
Copy link
Contributor Author

The current issue is that we have to restart grafana and recreate the orgs.

We need the display name for the org mapping so we need it to be accurate and we need org id for dashboards and so on

@hervenicol
Copy link
Contributor

All right, then we accept some "strangeness" in the operator's behaviour and plan on future persistence to stabilize it all.

This also means we accept users may change org's names on the Web UI, so we can't rely on org names?
Which means our annotations to load datasources and dashboards in a specific org should mention the CR's name?
Which then means we don't support loading them in web-ui-created orgs?

@QuentinBisson
Copy link
Contributor Author

I would rather we do not have any strangeness to be honest but there's no better way imo or we well, we could force a restart of all our grafanas at say midnight and recreate all orgs but is that not stranger? Maybe we can discuss it live today

The thing is, you cannot prevent users from changing the org name in the UI unless they are not admin, but then they cannot create datasources

@QuentinBisson
Copy link
Contributor Author

Which means our annotations to load datasources and dashboards in a specific org should mention the CR's name?
Yes this makes more sense anyway

Which then means we don't support loading them in web-ui-created orgs?
We do not allow anyone that feature

@hervenicol
Copy link
Contributor

What happens when an org gets deleted? I guess they are "defragmented"?

ie we have these orgs:

  • org-a: ID 2
  • org-b: ID 3
  • org-c: ID 4

And we delete org-b, when grafana restarts this results in:

  • org-a: ID 2
  • org-c: ID 3

...I guess that's ok since no org uses an already-existing org ID.
But we must remember we don't expect IDs to remain the same!

@QuentinBisson
Copy link
Contributor Author

That's something worth trying out but yes we do not expect ids to remain the same until persistence is enabled

Copy link
Contributor

@hervenicol hervenicol left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fine, let's do it this way! Then we'll see if we get some more strange corner cases!

@QuentinBisson QuentinBisson merged commit b094eea into main Dec 12, 2024
9 checks passed
@QuentinBisson QuentinBisson deleted the fix-organization-reordering branch December 12, 2024 12:37
@TheoBrigitte
Copy link
Member

From what I understand the Oauth2 settings are provisioned via a configmap to Grafana which does requires a restart each time we need to update them.

I just found out that there is an API in Grafana which allows just for that https://grafana.com/docs/grafana/latest/developers/http_api/sso-settings/#update-sso-settings, so we could configure the Oauth2 org mapping directly via the API.

@QuentinBisson
Copy link
Contributor Author

Oh it should bé really recent because a month ago it only supported saml. If that works that would bé awesome. We would still need the ordering fix in case a of pod/node rollout but it's less likely to happen

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants