https://issues.redhat.com/browse/ACM-15319 CIM regression KI #7296
=== Nodes shut down after removing `BareMetalHost` resource
//2.12:ACM-15319

If you remove the `BareMetalHost` resource from a managed cluster, the nodes shut down.
This should be the BMH in the hub cluster, right?
cc @trewest
That is correct. This issue occurs when you remove the relevant spoke BMH from the hub cluster.
Thanks for catching that, updated!
@carbonin @trewest @oafischer -- for a simple known issue like this, we still need to ask ourselves what the user needs to know and when so that they don't have to stop and think: What did I miss?
- There is no information about how to get out of this.
- If there is no way out, we really need to say at some point in the doc: "Do not remove the `BareMetalHost` resource."
If you already say that in the main documentation and they are still doing it, we need to make it clearer there and not here in the known issues.
- Let's say only a few users would do this and get caught up, but that most are not impacted, and that is why we choose to doc it here. I think we still need to say something more:
You must reinstall the resource to get your nodes to run....
You can manually restart the nodes by....
Something to tell them how to get out of it or what the next step is.
This is a bug, so ideally we would allow the user to remove that BMH without the node shutting down. I don't think it makes sense for us to tell them not to remove the BMH in the main doc.
The user shouldn't need to reinstall anything to get the node back; just powering it back on will do the job. However, we don't know anything about how they do power management outside of the BMO integration. If you think it's valuable to say "power the node back on", then I think it's fine to put that here.
The documentation should also help prevent a bug report or a customer call. We also don't want to make the user stop and think about what they did wrong or how to start over. @oafischer let me know what you decide to add here.
As discussed in the meeting, I'll add the recovery step of powering the node back on, so that we don't have to remember to remove any notes in the main doc once this issue is fixed.
[APPROVALNOTIFIER] This PR is NOT APPROVED. This pull-request has been approved by: carbonin, oafischer, swopebe.
Co-authored-by: swope <[email protected]>
New changes are detected. LGTM label has been removed.
2.12 only