
Reverts deprecated ClusterClaims for odf-info #2686

Merged (1 commit, Aug 9, 2024)

Conversation

@raaizik (Contributor) commented Jul 3, 2024

Changes

TODO

  • Simplify by moving ClusterClaim to be reconciled in OCSInitialization
  • Delete the cluster claim once the odf-info CM is removed

@openshift-ci openshift-ci bot added the do-not-merge/work-in-progress label Jul 3, 2024
openshift-ci bot commented Jul 3, 2024

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@raaizik raaizik force-pushed the fix_odfinfo branch 2 times, most recently from 3f248ba to b29856a Compare July 4, 2024 10:16
@raaizik raaizik marked this pull request as ready for review July 4, 2024 10:17
@openshift-ci openshift-ci bot removed the do-not-merge/work-in-progress label Jul 4, 2024
@raaizik (Contributor, Author) commented Jul 4, 2024

/cc @rewantsoni @umangachapagain

Comment on lines 87 to 91
client, err := clusterclientv1alpha1.NewForConfig(kubeconfig)
if err != nil {
	creator.Logger.Error(err, "Failed to create ClusterClaim client.")
	return reconcile.Result{}, err
}
Contributor:

Is there really a need to create a new non-cached client? Why can't we use the client which is already installed on the reconciler object?

As a general note, we should only use non-cached clients in cases where there is an underlying reason to do so.
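
For illustration, a minimal sketch of what the suggestion could look like: reuse the controller-runtime client the reconciler already holds (passed in here as c) instead of building a new clientset from the kubeconfig. The function name and claim name are assumptions for illustration, not the merged code.

package ocsinitialization

import (
	"context"

	clusterv1alpha1 "open-cluster-management.io/api/cluster/v1alpha1"
	"sigs.k8s.io/controller-runtime/pkg/client"
	"sigs.k8s.io/controller-runtime/pkg/controller/controllerutil"
)

// ensureOdfInfoClusterClaim creates or updates the odf-info ClusterClaim using the
// reconciler's cached client rather than a dedicated, non-cached clientset.
func ensureOdfInfoClusterClaim(ctx context.Context, c client.Client, value string) error {
	claim := &clusterv1alpha1.ClusterClaim{}
	claim.Name = "odfinfo.odf.openshift.io" // assumed claim name, for illustration only
	_, err := controllerutil.CreateOrUpdate(ctx, c, claim, func() error {
		claim.Spec.Value = value
		return nil
	})
	return err
}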

controllers/ocsinitialization/clusterclaims.go (outdated, resolved)
@vbnrh (Member) left a comment:

@raaizik Why does the claim need to be removed manually? Can we not set it to be removed alongside StorageCluster?

@vbnrh (Member) left a comment:

Can we add another claim which will indicate which clusters are client and provider for the UI?

@raaizik (Contributor, Author) commented Jul 9, 2024

> Can we add another claim which will indicate which clusters are client and provider for the UI?

@vbnrh This would break the concept of the odf-info CM, which is supposed to be a replacement for cluster claims (other than the one added here, which I'd classify as the CM's metadata). An option is to add this to the CM itself.

@raaizik (Contributor, Author) commented Jul 9, 2024

> @raaizik Why does the claim need to be removed manually? Can we not set it to be removed alongside StorageCluster?

@vbnrh The cluster claims used to be reconciled via the storage cluster because they contained information about it. This is no longer the case, and the cluster claim needs to be coupled with the operator's lifecycle, where it is created once and removed manually (since there's no uninstall for OCSInit).

@vbnrh (Member) commented Jul 10, 2024

> @raaizik Why does the claim need to be removed manually? Can we not set it to be removed alongside StorageCluster?
>
> @vbnrh The cluster claims used to be reconciled via the storage cluster because they contained information about it. This is no longer the case, and the cluster claim needs to be coupled with the operator's lifecycle, where it is created once and removed manually (since there's no uninstall for OCSInit).

If the StorageCluster is uninstalled at a later point in time, the CM will still be present on the hub. How are we going to indicate that the cluster is not available for RDR on the hub? Are we planning to add, or have we already added, the StorageCluster's status to the CM?

@raaizik (Contributor, Author) commented Jul 10, 2024

> @raaizik Why does the claim need to be removed manually? Can we not set it to be removed alongside StorageCluster?
>
> @vbnrh The cluster claims used to be reconciled via the storage cluster because they contained information about it. This is no longer the case, and the cluster claim needs to be coupled with the operator's lifecycle, where it is created once and removed manually (since there's no uninstall for OCSInit).
>
> If the StorageCluster is uninstalled at a later point in time, the CM will still be present on the hub. How are we going to indicate that the cluster is not available for RDR on the hub? Are we planning to add, or have we already added, the StorageCluster's status to the CM?

@vbnrh Whenever a StorageCluster is uninstalled, its respective key in the CM gets removed. If that key was the last one remaining, the whole CM gets removed as well.
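
For illustration, a minimal sketch of that cleanup logic, assuming the odf-info ConfigMap keys are derived from the StorageCluster's namespaced name; the package placement, helper name, and key format are assumptions, not the actual ocs-operator code.

package odfinfo

import (
	"context"
	"fmt"

	corev1 "k8s.io/api/core/v1"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

// removeStorageClusterEntry drops the uninstalled StorageCluster's key from the
// odf-info ConfigMap and deletes the ConfigMap once no entries remain.
func removeStorageClusterEntry(ctx context.Context, c client.Client, cm *corev1.ConfigMap, scNamespace, scName string) error {
	key := fmt.Sprintf("%s_%s.config.yaml", scNamespace, scName) // assumed key format
	delete(cm.Data, key)
	if len(cm.Data) == 0 {
		// The last StorageCluster entry is gone: remove the whole ConfigMap.
		return c.Delete(ctx, cm)
	}
	return c.Update(ctx, cm)
}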

@nb-ohad (Contributor) commented Jul 11, 2024

@vbnrh The responsibility of the claim is to broadcast (to ACM) the existence of the configmap, not the existence of a specific storage cluster. The existence, state, and info of the various storage clusters can be found as part of the config map data (which will be available on the ACM hub).

As @raaizik already mentioned, when a storage cluster is deleted, the config map is updated to remove all information related to that storage cluster, and that change will be synced back to the hub.

Also, as you know, clients are not standalone entities, and the information about them is specified under the storage cluster that supports them. We don't need a separate claim for them.

@raaizik raaizik requested a review from nb-ohad July 11, 2024 13:34
@raaizik (Contributor, Author) commented Jul 11, 2024

/retest

controllers/ocsinitialization/clusterclaims.go (outdated, resolved)
Comment on lines 61 to 68
creator := ClusterClaimCreator{
	Logger:  r.Log,
	Context: ctx,
	Client:  r.Client,
	OcsInit: instance,
}

operatorNamespace, err := creator.getOperatorNamespace()
if err != nil {
	r.Log.Error(err, "failed to get operator's namespace. retrying again")
	return reconcile.Result{}, err
}
Contributor:

There's no need for this. Simply fetch the operator namespace and proceed. This pattern was used earlier, when there were a lot of things to fetch and create.
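
For illustration, a sketch of the simpler shape being suggested: fetch the namespace directly and continue, with no intermediate creator struct. The env var name is an assumption, not necessarily how the operator resolves its namespace.

package ocsinitialization

import (
	"fmt"
	"os"
)

// getOperatorNamespace reads the operator's namespace from a downward-API env var.
func getOperatorNamespace() (string, error) {
	ns := os.Getenv("OPERATOR_NAMESPACE") // assumed env var name
	if ns == "" {
		return "", fmt.Errorf("OPERATOR_NAMESPACE is not set")
	}
	return ns, nil
}

The reconciler would then call this directly and return the error on failure.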

Comment on lines 267 to 282
clusterClaimCrdPredicate := predicate.Funcs{
	CreateFunc: func(e event.TypedCreateEvent[client.Object]) bool {
		panic("ClusterClaim CRD was found. Restarting pod to initiate creation")
	},
	DeleteFunc: func(e event.TypedDeleteEvent[client.Object]) bool {
		panic("ClusterClaim CRD was found. Restarting pod to initiate deletion")
	},
}
Contributor:

I'd like to avoid using panic to restart the process as much as possible. There are ways to set watches dynamically; we should try those first.

Contributor:

@umangachapagain They don't work because they do not re-populate the cache. The only reliable way we found was to panic out.

@raaizik I would not use panic; I would use os.Exit with a specific code. That way we can wrap the operator invocation with a small bash command that re-invokes the operator process if it sees that particular exit code.
Doing it that way will prevent unnecessary pod restarts.
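
For illustration, a sketch of that alternative: exit from the predicate with a dedicated code instead of panicking, so a thin wrapper around the operator binary can re-exec it when it sees that code. The exit code value and the predicate wiring are assumptions, not the merged implementation.

package ocsinitialization

import (
	"os"

	"sigs.k8s.io/controller-runtime/pkg/client"
	"sigs.k8s.io/controller-runtime/pkg/event"
	"sigs.k8s.io/controller-runtime/pkg/predicate"
)

// restartExitCode is an assumed sentinel exit code understood by the wrapper script.
const restartExitCode = 42

var clusterClaimCrdPredicate = predicate.Funcs{
	CreateFunc: func(e event.TypedCreateEvent[client.Object]) bool {
		// The ClusterClaim CRD appeared: exit so the wrapper restarts the operator
		// and the manager can set up the watch against a freshly populated cache.
		os.Exit(restartExitCode)
		return false
	},
	DeleteFunc: func(e event.TypedDeleteEvent[client.Object]) bool {
		// The ClusterClaim CRD was removed: same restart path.
		os.Exit(restartExitCode)
		return false
	},
}

The container entrypoint would then loop, re-running the operator binary whenever it exits with that specific code and propagating any other exit status, which avoids a full pod restart.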

@@ -74,6 +77,7 @@ type OCSInitializationReconciler struct {
// +kubebuilder:rbac:groups="monitoring.coreos.com",resources={alertmanagers,prometheuses},verbs=get;list;watch;create;update;patch;delete
// +kubebuilder:rbac:groups="monitoring.coreos.com",resources=servicemonitors,verbs=get;list;watch;update;patch;create;delete
// +kubebuilder:rbac:groups=operators.coreos.com,resources=clusterserviceversions,verbs=get;list;watch;delete;update;patch
// +kubebuilder:rbac:groups=cluster.open-cluster-management.io,resources=clusterclaims,verbs=get;list;watch;create;update;delete
Contributor:

Since we are not deleting clusterclaims, can we remove the rbac for it?
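
If the delete verb is indeed unused, the marker could shrink to something like the following (a sketch of the suggestion, not necessarily what was merged):

// +kubebuilder:rbac:groups=cluster.open-cluster-management.io,resources=clusterclaims,verbs=get;list;watch;create;update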

Comment on lines 176 to 188
clusterClaim := &ocsClusterClaim{}
reconcileResult, err := clusterClaim.ensureCreated(r, instance)
if err != nil {
	r.Log.Error(err, "Failed to ensure odf-info namespacedname ClusterClaim")
	return reconcileResult, err
}
Contributor:

We don't need to use this pattern. We can just create an ensureClusterClaimExists function and call it directly.

Contributor:

If the intent is to make sure it exists, we can have an IsClusterClaimExist function that returns a bool.
Using an error to indicate a valid result (it does not exist) is a semantic mistake.

Contributor:

IsClusterClaimExist is a check that returns true or false. What we want is to take an action, EnsureClusterClaimExists, and return an error if it fails. That is semantically correct.

func ensureClusterClaimExists() error {
	if IsClusterClaimExist() {
		return nil
	}
	return createClusterClaim()
}

@raaizik raaizik force-pushed the fix_odfinfo branch 4 times, most recently from 1ea2463 to b6bd338 Compare July 15, 2024 09:23
@raaizik raaizik requested a review from umangachapagain July 15, 2024 09:34
@raaizik (Contributor, Author) commented Jul 15, 2024

/retest

@raaizik (Contributor, Author) commented Aug 1, 2024

@nb-ohad @umangachapagain QE tests for MDR and RDR are getting blocked as this PR is not in 4.17.

Please take a look ASAP

It depends on #2712 @vbnrh

@openshift-merge-robot openshift-merge-robot added the needs-rebase label Aug 1, 2024
@openshift-merge-robot openshift-merge-robot removed the needs-rebase label Aug 1, 2024
@raaizik (Contributor, Author) commented Aug 1, 2024

/test ocs-operator-bundle-e2e-aws

@raaizik raaizik force-pushed the fix_odfinfo branch 3 times, most recently from 8ca0b49 to 8a0dc2f Compare August 8, 2024 12:13
@openshift-merge-robot openshift-merge-robot added the needs-rebase label Aug 8, 2024
@openshift-merge-robot openshift-merge-robot removed the needs-rebase label Aug 8, 2024
@raaizik raaizik force-pushed the fix_odfinfo branch 3 times, most recently from 2181505 to a118a4a Compare August 8, 2024 12:48
@raaizik raaizik changed the title from "[WIP] Reverts deprecated ClusterClaims for odf-info" to "Reverts deprecated ClusterClaims for odf-info" Aug 8, 2024
@openshift-ci openshift-ci bot removed the do-not-merge/work-in-progress label Aug 8, 2024
@nb-ohad (Contributor) commented Aug 8, 2024

LGTM
@iamniting @umangachapagain Do you want to take a look before merging?

@openshift-ci openshift-ci bot added the lgtm label Aug 9, 2024
openshift-ci bot commented Aug 9, 2024

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: raaizik, umangachapagain

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved label Aug 9, 2024
@openshift-merge-bot openshift-merge-bot bot merged commit bedc58d into red-hat-storage:main Aug 9, 2024
11 checks passed
@rewantsoni (Member):

/cherry-pick release-4.17

@openshift-cherrypick-robot

@rewantsoni: new pull request created: #2742

In response to this:

> /cherry-pick release-4.17

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

Labels
approved: Indicates a PR has been approved by an approver from all required OWNERS files.
lgtm: Indicates that a PR is ready to be merged.

7 participants