Added checks for HCO webhook components and improve status handling #74
geetikakay
commented
Dec 2, 2024
- Introduced retries and checks for the hyperconverged-cluster-webhook pod to ensure it is running before proceeding.
- Added a check for the hco-webhook-service to confirm its existence and to avoid webhook-related errors.
- Improved handling of HyperConverged status by adding checks for resource presence and defined status field.
I have tested the changes on one version, so they can be reviewed. I still need to try them on different CNV versions to make sure they work everywhere. The checks are meant to ensure that we never fail partway through.
6b3a5bc to dc2cca0
@newkit @sean-freeman Hello, could you please help review this patch?
register: hco_webhook_pod
retries: 5
delay: 60
until: hco_webhook_pod.resources | selectattr('status.phase', 'equalto', 'Running')
Will this wait for all pods to be Running? IIUC this will already return once the first pod is running?
By default, only one pod is created with the label hyperconverged-cluster-webhook.
Still, it looks unclean to me because it is a list and there might be more Pods to wait for.
@0xFelix if we consider multiple pods, we may hit this bug: ansible-collections/kubernetes.core#697
I'd rewrite it to not use wait_condition but to respect all potential list items then.
ack
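The rewrite discussed in this thread could look roughly like the sketch below. It is not the code that was merged: the label selector `name=hyperconverged-cluster-webhook` and the reuse of the `sap_hypervisor_node_preconfigure_ocpv_namespace` variable for this task are assumptions. The condition passes only when at least one pod is returned and no pod in the list is in a phase other than Running; piping through `list | length` makes the test an explicit count instead of relying on the truthiness of the filter result.

```yaml
# Sketch only: wait until every matching pod is Running, not just the first one.
# Assumptions: the label selector and the namespace variable used here.
- name: Wait for all hyperconverged-cluster-webhook pods to be Running
  kubernetes.core.k8s_info:
    kind: Pod
    namespace: "{{ sap_hypervisor_node_preconfigure_ocpv_namespace }}"
    label_selectors:
      - name=hyperconverged-cluster-webhook
  register: hco_webhook_pod
  retries: 5
  delay: 60
  # Succeed only if the list is non-empty and contains no pod outside the Running phase.
  until: >-
    (hco_webhook_pod.resources | length > 0) and
    (hco_webhook_pod.resources
     | rejectattr('status.phase', 'equalto', 'Running')
     | list | length == 0)
```

Because the check counts the pods that are not Running, it behaves the same whether one pod or several carry the label.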
@@ -91,12 +112,12 @@
namespace: "{{ sap_hypervisor_node_preconfigure_ocpv_namespace }}"
register: hyperconverged_status
until: >
Can wait_condition be used here?
I'll test it and change
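For reference, a `wait_condition` variant of this task might look like the sketch below. It is untested and rests on several assumptions that do not appear in this conversation: the API version `hco.kubevirt.io/v1beta1`, the resource name `kubevirt-hyperconverged`, the condition type `Available`, and the timeout values. It uses the `wait`, `wait_condition`, `wait_sleep`, and `wait_timeout` options of `kubernetes.core.k8s_info` in place of a `retries`/`until` loop.

```yaml
# Sketch only: let k8s_info poll for a status condition instead of using until.
# Assumptions: api_version, name, condition type, and the timing values.
- name: Wait for the HyperConverged resource to report Available
  kubernetes.core.k8s_info:
    api_version: hco.kubevirt.io/v1beta1
    kind: HyperConverged
    name: kubevirt-hyperconverged
    namespace: "{{ sap_hypervisor_node_preconfigure_ocpv_namespace }}"
    wait: true
    wait_condition:
      type: Available
      status: "True"
    wait_sleep: 10      # seconds between checks
    wait_timeout: 600   # overall limit in seconds
  register: hyperconverged_status
```

With this shape the module itself handles the polling, so the task does not need a separate check that the `status` field is defined.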
93b4a49 to d1ef4ae
- Introduced retries and checks for the hyperconverged-cluster-webhook pod to ensure it is running before proceeding.
- Added check for the hco-webhook-service to confirm its existence and to avoid webhook related errors.
- Improved handling of HyperConverged status by adding checks for resource presence and defined status field.
Signed-off-by: Geetika Kapoor <[email protected]>
d1ef4ae to a6dc20d
Thanks!
/lgtm
lgtm