After upgrading the operator, the default auto-instrumentation image in Instrumentation is not upgraded
Steps to Reproduce
Operator v0.107.0 was deployed using the Helm chart. The goal was to migrate it to an OLM deployment and upgrade it to v0.113.0.
The default-auto-instrumentation-java-image version should be upgraded as part of this.
When v0.113.0 was deployed after removing v0.107.0, the mutating webhook call failed during the upgrade of Instrumentation because the webhook service was not ready.
As a result, auto-instrumentation-java-image was not upgraded.
I don't think controller-runtime lets us specify dependencies between runnables, so it won't help us in this case. It does sound like we have a race condition at startup, where the upgraders depend on webhooks to run, but we don't guarantee that the webhooks are started first.
Maybe the solution is to just run the upgraders periodically forever? That they're only run at startup doesn't make that much sense to me. @pavolloffay @jaronoff97 wdyt?
I thought we ran the upgraders in the reconcile loop, which should be called every 15 minutes?
On second look, we don't do this in the webhook or anywhere else, which seems wrong, so yeah, I think we should most likely just run them in the reconcile loop.
Upgraders are just runnables added to the controller-runtime manager, and right now they run once at operator startup. When exactly they should run isn't entirely clear to me. I don't like the idea of running them during reconciliation, because reconciliation already does a lot, and the fact that it doesn't modify the CR (other than the status) makes reasoning about it much simpler. Running them in the webhook is safe, but arguably not that useful. It makes the most sense to me to upgrade any eligible CR as soon as possible, and that's most in line with how the feature works right now, but it is also quite invasive.
Component(s)
auto-instrumentation
What happened?
Description
Expected Result
Actual Result
Kubernetes Version
v1.29.2
Operator version
v0.113.0
Collector version
v0.108.1
Environment information
No response
Log output
Additional context
Should I add retry logic to upgrade Instrumentation?