-
Notifications
You must be signed in to change notification settings - Fork 243
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Cluster: Remove all the lease from cluster #3886
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -497,13 +497,16 @@ func DeleteMCOLeaderLease(ctx context.Context, ocConfig oc.Config) error { | |
if err := WaitForOpenshiftResource(ctx, ocConfig, "configmap"); err != nil { | ||
return err | ||
} | ||
if _, _, err := ocConfig.RunOcCommand("delete", "-n", "openshift-machine-config-operator", "configmap", "machine-config-controller"); err != nil { | ||
return err | ||
|
||
if _, stderr, err := ocConfig.RunOcCommand("delete", "-n", "openshift-machine-config-operator", "configmap", "machine-config-controller"); err != nil { | ||
if !strings.Contains(stderr, "\"machine-config-controller\" not found") { | ||
return err | ||
} | ||
} | ||
// https://issues.redhat.com/browse/OCPBUGS-7583 as workaround | ||
if err := WaitForOpenshiftResource(ctx, ocConfig, "lease"); err != nil { | ||
return err | ||
} | ||
_, _, err := ocConfig.RunOcCommand("delete", "-n", "openshift-machine-config-operator", "lease", "--all") | ||
_, _, err := ocConfig.RunOcCommand("delete", "-A", "lease", "--all") | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
were you able to find out what extra lease(s) have to be removed? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. It remove all the leases in all the namespaces, I didn't count but lease creation is not time consuming. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. But we can't know that removing all the leases is 100% harmless now, and/or won't cause issues in the future, can we? Shouldn't we be more specific in what we remove? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
kubernetes lease object lock shared resources and coordinate activity between members of a set (https://kubernetes.io/docs/concepts/architecture/leases/), I am hoping the lease objects would be created as soon it deleted due to cluster reconcile.
I tried to remove mco specific leases but that didn't work at all. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
sounds like we don't know for sure this won't cause any problems?
Since removing all the leases help, there must be a set of leases which fix the problem when you remove them? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. What are the leases which are being removed by the There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
We don't know for sure if it is harmless, and we are not fully sure what fixes/avoid the problem we are seeing. I don't feel really confident that this won't be causing unrelated issues, which will cost more time to debug/fix than spending a bit more time now to at least only remove the necessary leases. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Yes, we are not sure but we also don't know from where to debug and if we want to go with OCP-4.14.x in this release then I don't think any alternative for time being. Option is either allocate more time to this and again have some conversation with internal team (if they willing to help us to debug) or have separate issue to track and put effort when have time without blocking current release. @crc-org/crc-team wdyt? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. It should at least be doable without impacting our release schedule too much to not use |
||
return err | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does
oc
make a difference between "not found" and "network error" (for example) through the error code it returns?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is the same error code