
Reverts deprecated ClusterClaims for odf-info #2686

Merged (1 commit, Aug 9, 2024)

Conversation

@raaizik (Contributor) commented Jul 3, 2024

Changes

TODO

  • Simplify by moving ClusterClaim to be reconciled in OCSInitialization
  • Delete the cluster claim once the odf-info CM is removed

@openshift-ci openshift-ci bot added the do-not-merge/work-in-progress label Jul 3, 2024
openshift-ci bot commented Jul 3, 2024

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@raaizik raaizik force-pushed the fix_odfinfo branch 2 times, most recently from 3f248ba to b29856a Compare July 4, 2024 10:16
@raaizik raaizik marked this pull request as ready for review July 4, 2024 10:17
@openshift-ci openshift-ci bot removed the do-not-merge/work-in-progress label Jul 4, 2024
@raaizik (Contributor, Author) commented Jul 4, 2024

/cc @rewantsoni @umangachapagain

Comment on lines 87 to 91
client, err := clusterclientv1alpha1.NewForConfig(kubeconfig)
if err != nil {
	creator.Logger.Error(err, "Failed to create ClusterClaim client.")
	return reconcile.Result{}, err
}
Contributor:

Is there really a need to create a new non-cached client? Why can't we use the client which is already installed on the reconciler object?

As a general note, we should only use non-cached clients in cases where there is an underlying reason to do so.
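
For illustration, a minimal sketch of what the suggestion could look like: reuse the controller-runtime client the reconciler already holds (passed in here as c) instead of building a new clientset from the kubeconfig. The function name and claim name are assumptions for illustration, not the merged code.

package ocsinitialization

import (
	"context"

	clusterv1alpha1 "open-cluster-management.io/api/cluster/v1alpha1"
	"sigs.k8s.io/controller-runtime/pkg/client"
	"sigs.k8s.io/controller-runtime/pkg/controller/controllerutil"
)

// ensureOdfInfoClusterClaim creates or updates the odf-info ClusterClaim using the
// reconciler's cached client rather than a dedicated, non-cached clientset.
func ensureOdfInfoClusterClaim(ctx context.Context, c client.Client, value string) error {
	claim := &clusterv1alpha1.ClusterClaim{}
	claim.Name = "odfinfo.odf.openshift.io" // assumed claim name, for illustration only
	_, err := controllerutil.CreateOrUpdate(ctx, c, claim, func() error {
		claim.Spec.Value = value
		return nil
	})
	return err
}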

controllers/ocsinitialization/clusterclaims.go (outdated, resolved)
@vbnrh (Member) left a comment:

@raaizik Why does the claim need to be removed manually? Can we not set it to be removed alongside StorageCluster?

@vbnrh (Member) left a comment:

Can we add another claim which will indicate which clusters are client and provider for the UI?

@raaizik (Contributor, Author) commented Jul 9, 2024

> Can we add another claim which will indicate which clusters are client and provider for the UI?

@vbnrh This would break the concept of the odf-info CM, which is supposed to be a replacement for cluster claims (other than the one added here, which I'd classify as the CM's metadata). An option is to add this to the CM itself.

@raaizik (Contributor, Author) commented Jul 9, 2024

> @raaizik Why does the claim need to be removed manually? Can we not set it to be removed alongside StorageCluster?

@vbnrh The cluster claims used to be reconciled via the storage cluster because they contained information about it. This is no longer the case, and the cluster claim needs to be coupled with the operator's lifecycle, where it is created once and removed manually (since there's no uninstall for OCSInit).

@vbnrh (Member) commented Jul 10, 2024

> @raaizik Why does the claim need to be removed manually? Can we not set it to be removed alongside StorageCluster?
>
> @vbnrh The cluster claims used to be reconciled via the storage cluster because they contained information about it. This is no longer the case, and the cluster claim needs to be coupled with the operator's lifecycle, where it is created once and removed manually (since there's no uninstall for OCSInit).

If the StorageCluster is uninstalled at a later point in time, the CM will still be present on the hub. How are we going to indicate that the cluster is not available for RDR on the hub? Are we planning to add, or have we already added, the StorageCluster's status to the CM?

@raaizik (Contributor, Author) commented Jul 10, 2024

> @raaizik Why does the claim need to be removed manually? Can we not set it to be removed alongside StorageCluster?
>
> @vbnrh The cluster claims used to be reconciled via the storage cluster because they contained information about it. This is no longer the case, and the cluster claim needs to be coupled with the operator's lifecycle, where it is created once and removed manually (since there's no uninstall for OCSInit).
>
> If the StorageCluster is uninstalled at a later point in time, the CM will still be present on the hub. How are we going to indicate that the cluster is not available for RDR on the hub? Are we planning to add, or have we already added, the StorageCluster's status to the CM?

@vbnrh Whenever a StorageCluster is uninstalled, its respective key in the CM gets removed. If that key was the last one remaining, the whole CM gets removed as well.
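
For illustration, a minimal sketch of that cleanup logic, assuming the odf-info ConfigMap keys are derived from the StorageCluster's namespaced name; the package placement, helper name, and key format are assumptions, not the actual ocs-operator code.

package odfinfo

import (
	"context"
	"fmt"

	corev1 "k8s.io/api/core/v1"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

// removeStorageClusterEntry drops the uninstalled StorageCluster's key from the
// odf-info ConfigMap and deletes the ConfigMap once no entries remain.
func removeStorageClusterEntry(ctx context.Context, c client.Client, cm *corev1.ConfigMap, scNamespace, scName string) error {
	key := fmt.Sprintf("%s_%s.config.yaml", scNamespace, scName) // assumed key format
	delete(cm.Data, key)
	if len(cm.Data) == 0 {
		// The last StorageCluster entry is gone: remove the whole ConfigMap.
		return c.Delete(ctx, cm)
	}
	return c.Update(ctx, cm)
}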

@nb-ohad (Contributor) commented Jul 11, 2024

@vbnrh The responsibility of the claim is to broadcast (to ACM) the existence of the configmap, not the existence of a specific storage cluster. The existence, state, and info of the various storage clusters can be found as part of the config map data (which will be available on the ACM hub).

As @raaizik already mentioned, when a storage cluster is deleted, the config map is updated to remove all information related to that storage cluster, and that change will be synced back to the hub.

Also, as you know, clients are not standalone entities, and the information about them is specified under the storage cluster that supports them. We don't need a separate claim for them.

@raaizik raaizik requested a review from nb-ohad July 11, 2024 13:34
@raaizik (Contributor, Author) commented Jul 11, 2024

/retest

controllers/ocsinitialization/clusterclaims.go (outdated, resolved)
Comment on lines 61 to 68
creator := ClusterClaimCreator{
	Logger:  r.Log,
	Context: ctx,
	Client:  r.Client,
	OcsInit: instance,
}

operatorNamespace, err := creator.getOperatorNamespace()
if err != nil {
	r.Log.Error(err, "failed to get operator's namespace. retrying again")
	return reconcile.Result{}, err
}
Contributor:

There's no need for this. Simply fetch the operator namespace and proceed. This pattern was used earlier, when there were a lot of things to fetch and create.
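
For illustration, a sketch of the simpler shape being suggested: fetch the namespace directly and continue, with no intermediate creator struct. The env var name is an assumption, not necessarily how the operator resolves its namespace.

package ocsinitialization

import (
	"fmt"
	"os"
)

// getOperatorNamespace reads the operator's namespace from a downward-API env var.
func getOperatorNamespace() (string, error) {
	ns := os.Getenv("OPERATOR_NAMESPACE") // assumed env var name
	if ns == "" {
		return "", fmt.Errorf("OPERATOR_NAMESPACE is not set")
	}
	return ns, nil
}

The reconciler would then call this directly and return the error on failure.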

Comment on lines 267 to 282
clusterClaimCrdPredicate := predicate.Funcs{
	CreateFunc: func(e event.TypedCreateEvent[client.Object]) bool {
		panic("ClusterClaim CRD was found. Restarting pod to initiate creation")
	},
	DeleteFunc: func(e event.TypedDeleteEvent[client.Object]) bool {
		panic("ClusterClaim CRD was found. Restarting pod to initiate deletion")
	},
}
Contributor:

I'd like to avoid using panic to restart the process as much as possible. There are ways to set watches dynamically; we should try those first.

Contributor:

@umangachapagain They don't work because they do not re-populate the cache. The only reliable way we found was to panic out.

@raaizik I would not use panic; I would use os.Exit with a specific code. That way we can wrap the operator invocation with a small bash command that re-invokes the operator process if it sees that particular exit code.
Doing it that way will prevent unnecessary pod restarts.
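
For illustration, a sketch of that alternative: exit from the predicate with a dedicated code instead of panicking, so a thin wrapper around the operator binary can re-exec it when it sees that code. The exit code value and the predicate wiring are assumptions, not the merged implementation.

package ocsinitialization

import (
	"os"

	"sigs.k8s.io/controller-runtime/pkg/client"
	"sigs.k8s.io/controller-runtime/pkg/event"
	"sigs.k8s.io/controller-runtime/pkg/predicate"
)

// restartExitCode is an assumed sentinel exit code understood by the wrapper script.
const restartExitCode = 42

var clusterClaimCrdPredicate = predicate.Funcs{
	CreateFunc: func(e event.TypedCreateEvent[client.Object]) bool {
		// The ClusterClaim CRD appeared: exit so the wrapper restarts the operator
		// and the manager can set up the watch against a freshly populated cache.
		os.Exit(restartExitCode)
		return false
	},
	DeleteFunc: func(e event.TypedDeleteEvent[client.Object]) bool {
		// The ClusterClaim CRD was removed: same restart path.
		os.Exit(restartExitCode)
		return false
	},
}

The container entrypoint would then loop, re-running the operator binary whenever it exits with that specific code and propagating any other exit status, which avoids a full pod restart.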

@@ -74,6 +77,7 @@ type OCSInitializationReconciler struct {
// +kubebuilder:rbac:groups="monitoring.coreos.com",resources={alertmanagers,prometheuses},verbs=get;list;watch;create;update;patch;delete
// +kubebuilder:rbac:groups="monitoring.coreos.com",resources=servicemonitors,verbs=get;list;watch;update;patch;create;delete
// +kubebuilder:rbac:groups=operators.coreos.com,resources=clusterserviceversions,verbs=get;list;watch;delete;update;patch
// +kubebuilder:rbac:groups=cluster.open-cluster-management.io,resources=clusterclaims,verbs=get;list;watch;create;update;delete
Contributor:

Since we are not deleting clusterclaims, can we remove the rbac for it?
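
If the delete verb is indeed unused, the marker could shrink to something like the following (a sketch of the suggestion, not necessarily what was merged):

// +kubebuilder:rbac:groups=cluster.open-cluster-management.io,resources=clusterclaims,verbs=get;list;watch;create;update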

Comment on lines 176 to 188
clusterClaim := &ocsClusterClaim{}
reconcileResult, err := clusterClaim.ensureCreated(r, instance)
if err != nil {
	r.Log.Error(err, "Failed to ensure odf-info namespacedname ClusterClaim")
	return reconcileResult, err
}
Contributor:

We don't need to use this pattern. We can just create an ensureClusterClaimExists function and call it directly.

Contributor:

If the intent is to make sure it exists, we can have an IsClusterClaimExist function that returns a bool.
Using an error to indicate a valid result (it does not exist) is a semantic mistake.

Contributor:

IsClusterClaimExist is a check that returns true or false. What we want is to take an action, EnsureClusterClaimExists, and return an error if it fails. That is semantically correct.

func ensureClusterClaimExists() error {
	if IsClusterClaimExist() {
		return nil
	}
	return createClusterClaim()
}

@raaizik raaizik force-pushed the fix_odfinfo branch 4 times, most recently from 1ea2463 to b6bd338 Compare July 15, 2024 09:23
@raaizik raaizik requested a review from umangachapagain July 15, 2024 09:34
@raaizik (Contributor, Author) commented Jul 15, 2024

/retest

@raaizik (Contributor, Author) commented Aug 1, 2024

@nb-ohad @umangachapagain QE tests for MDR and RDR are getting blocked as this PR is not in 4.17.

Please take a look ASAP

It depends on #2712 @vbnrh

@openshift-merge-robot openshift-merge-robot added the needs-rebase label Aug 1, 2024
@openshift-merge-robot openshift-merge-robot removed the needs-rebase label Aug 1, 2024
@raaizik (Contributor, Author) commented Aug 1, 2024

/test ocs-operator-bundle-e2e-aws

@raaizik raaizik force-pushed the fix_odfinfo branch 3 times, most recently from 8ca0b49 to 8a0dc2f Compare August 8, 2024 12:13
@openshift-merge-robot openshift-merge-robot added the needs-rebase label Aug 8, 2024
@openshift-merge-robot openshift-merge-robot removed the needs-rebase label Aug 8, 2024
@raaizik raaizik force-pushed the fix_odfinfo branch 3 times, most recently from 2181505 to a118a4a Compare August 8, 2024 12:48
@raaizik raaizik changed the title from "[WIP] Reverts deprecated ClusterClaims for odf-info" to "Reverts deprecated ClusterClaims for odf-info" Aug 8, 2024
@openshift-ci openshift-ci bot removed the do-not-merge/work-in-progress label Aug 8, 2024
@nb-ohad (Contributor) commented Aug 8, 2024

LGTM
@iamniting @umangachapagain Do you want to take a look before merging?

@openshift-ci openshift-ci bot added the lgtm label Aug 9, 2024
openshift-ci bot commented Aug 9, 2024

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: raaizik, umangachapagain

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved label Aug 9, 2024
@openshift-merge-bot openshift-merge-bot bot merged commit bedc58d into red-hat-storage:main Aug 9, 2024
11 checks passed
@rewantsoni (Member):

/cherry-pick release-4.17

@openshift-cherrypick-robot

@rewantsoni: new pull request created: #2742

In response to this:

> /cherry-pick release-4.17

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

Labels
approved: Indicates a PR has been approved by an approver from all required OWNERS files.
lgtm: Indicates that a PR is ready to be merged.

7 participants