Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: report ArgoCD errors via .status.conditions field #1608

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

jparsai
Copy link
Collaborator

@jparsai jparsai commented Nov 26, 2024

What type of PR is this?
/kind enhancement

What does this PR do / why we need it:
This PR is to add a new Conditions field in Status of ArgoCD CR which would contain a single condition of 'Type: Reconciled' for any error that occurs.

Which issue(s) this PR fixes:
Red Hat issue tracker :- https://issues.redhat.com/browse/GITOPS-5876

How to test changes / Special notes to the reviewer:

  • Create ArgoCD CR.
  • After reconciliation .status.conditions field should be updated based on error or success.

@jparsai jparsai marked this pull request as ready for review November 27, 2024 06:12
Copy link
Member

@jgwest jgwest left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great @jparsai, one additional request in addition to what is described below.

Part of this story is ensuring that the error messages that we are setting are user friendly.

AFAICT if you search for the following strings in the code, you will find cases where we are creating new errors:
errors.New
fmt.Errorf

I did that, and one thing I noticed is there are cases where there are error log statement are very user-friendly, and then below them is an error which is very user unfriendly:

  • reqLogger.Info(fmt.Sprintf("the ArgoCD instance '%s' does not match the label selector '%s' and skipping for reconciliation", request.NamespacedName, r.LabelSelector))
  • log.Error(err, fmt.Sprintf("Arg %s is already part of the default command arguments", arg))

    In these cases, I think we should make the error returned just as 'friendly' as the log statement logging the error case. AFAICT there is no reason to not have the error message be the same as the logged error message, as the logger error message provided much more datail.

It would also be beneficial to look at the error messages in general, by searching for the strings mentioned above. See if you can decide if they can be improved to provide better context for what the Argo CD operator was doing when the error occurred. (where possible)

api/v1alpha1/argocd_types.go Show resolved Hide resolved
@@ -117,11 +117,26 @@ func (r *ReconcileArgoCD) Reconcile(ctx context.Context, request ctrl.Request) (
labelSelector, err := labels.Parse(r.LabelSelector)
if err != nil {
reqLogger.Info(fmt.Sprintf("error parsing the labelSelector '%s'.", labelSelector))

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we reduce the amount of code duplication and/or places in the code where we need to update the .status.conditions fields, in this file? reconcileRolloutsManager function in Rollouts operator would be an example of this.

For example:

func (r *ReconcileArgoCD) Reconcile(ctx context.Context, request ctrl.Request) (ctrl.Result, error) {
    // (...)
    result, err := internalReconcile(...)

    if err := updateStatusConditionOfArgoCD(ctx, createCondition(err.Error()), argocd, r.Client, log); err != nil {
        log.Error(error, "unable to update status of ArgoCD")
        return reconcile.Result{}, error
    }  
    return result, err
}

func (r *ReconcileArgoCD) internalReconcile(ctx context.Context, request ctrl.Request) (ctrl.Result, error) {
    // the contents of the existing Reconcile function
}

Or you could do it how we do it in Argo Rollouts operator, in reconcileRolloutsManager, which is basically:

func (r *ReconcileArgoCD) internalReconcile(ctx context.Context, request ctrl.Request) (metav1.Condition, error) {
    // the contents of the existing Reconcile function
}

@@ -10,6 +10,11 @@ metadata:
name: example-argocd
status:
phase: Available
conditions:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are there a few failing cases that we can trigger and check, as well?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried it before, but in existing logic it is not possible to intentionally fail the reconciliation. At few places we can make invalid configs in CR, which will log error, but operator doesn't return it, since in next cycle it will fail again until user changes the CR configs. I checked existing test cases as well, but there are no negative test cases.

assert.Equal(t, a.Status.Conditions[0].Type, argoproj.ArgoCDConditionType)
assert.Equal(t, a.Status.Conditions[0].Reason, argoproj.ArgoCDConditionReasonErrorOccurred)
assert.Equal(t, a.Status.Conditions[0].Message, "the ArgoCD instance 'argocd/argo-test-2' does not match the label selector 'foo=bar' and skipping for reconciliation")
assert.Equal(t, a.Status.Conditions[0].Status, metav1.ConditionFalse)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we verify that fixing the error, and then calling reconcile again, will remove the condition?

}

// insertOrUpdateConditionsInSlice is a generic function for inserting/updating metav1.Condition into a slice of []metav1.Condition
func insertOrUpdateConditionsInSlice(newCondition metav1.Condition, existingConditions []metav1.Condition) (bool, []metav1.Condition) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Need unit tests for this, you can likely use the ones from argo-rollouts-manager, and port them here.

@jparsai jparsai force-pushed the conditions branch 3 times, most recently from 401cc52 to e1224cb Compare December 16, 2024 15:10
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants