Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

NO-JIRA: chore(nbcs): update tooling in controller's Makefiles - kubernetes and kustomize #432

Merged
merged 2 commits into from
Nov 8, 2024

Conversation

jstourac
Copy link
Member

@jstourac jstourac commented Oct 25, 2024

Some tooling updates and fix. I bumped into this during my work on https://issues.redhat.com/browse/RHOAIENG-14687.

This fix is part of https://issues.redhat.com/browse/RHOAIENG-15141.

Description

The update of the kubernetes version isn't mandatory and I can remove it from here if you want to. But I think it's reasonable.
The fix for the kustomize package is necessary to make our e2e tests pass (at least partially) again. This change is a followup of the #391.

How Has This Been Tested?

Try to run in components/odh-notebook-controller:

  • make test - all should pass.
  • make run-ci-e2e-tests - 15 pass, 2 fail - you need to be logged into some cluster to make this work

Merge criteria:

  • The commits are squashed in a cohesive manner and have meaningful messages.
  • Testing instructions have been added in the PR body (for PRs involving changes that are not immediately obvious).
  • The developer has manually tested the changes and verified that the changes work

@jstourac jstourac self-assigned this Oct 25, 2024
@jstourac jstourac requested a review from jiridanek October 25, 2024 14:11
@openshift-ci openshift-ci bot requested review from caponetto and harshad16 October 25, 2024 14:11
@jiridanek
Copy link
Member

/lgtm

What's that we need to be running in ci and we aren't? Because, both GitHub actions and ocp-ci are passing now, right? Bit the customize line is clearly wrong...

@jstourac
Copy link
Member Author

/lgtm

What's that we need to be running in ci and we aren't? Because, both GitHub actions and ocp-ci are passing now, right? Bit the customize line is clearly wrong...

Hm, but we're not running e2e tests anywhere in our CI here in GitHub, are we? I didn't investigate the failure very much, I can do so later maybe.

@jiridanek
Copy link
Member

jiridanek commented Oct 25, 2024

The e2e that need you to bring your own openshift? Right, we don't run that in gha ci. We should, eventually. Thought it runs in ocp či, but apparently really not.

@jstourac
Copy link
Member Author

ci/prow/odh-notebook-controller-e2e

Hm, @jiridanek you are right, we're truly running them: https://github.com/openshift/release/blob/master/ci-operator/config/opendatahub-io/kubeflow/opendatahub-io-kubeflow-main.yaml#L107.

And it is green in your PR #430 (this run) even though they failed, see the results here:

/go/src/github.com/opendatahub-io/kubeflow/components/odh-notebook-controller/bin/controller-gen rbac:roleName=manager-role crd webhook paths="./..." output:crd:artifacts:config=config/crd/bases
GOBIN=/go/src/github.com/opendatahub-io/kubeflow/components/odh-notebook-controller/bin go install sigs.k8s.io/kustomize/v3/cmd/[email protected]
go: sigs.k8s.io/kustomize/v3/cmd/[email protected]: sigs.k8s.io/kustomize/v3/cmd/[email protected]: invalid version: unknown revision v3/cmd/kustomize/v5.0.2
make[1]: *** [kustomize] Error 1
make[1]: Leaving directory `/go/src/github.com/opendatahub-io/kubeflow/components/odh-notebook-controller'
make[1]: Entering directory `/go/src/github.com/opendatahub-io/kubeflow/components/odh-notebook-controller'
go test ./e2e/ -run ^TestE2ENotebookController -v \
	--nb-namespace=odh-notebook-controller-system \
	--deploymentMode=oauth \
	"--skip-deletion=false" 
go: downloading github.com/stretchr/testify v1.8.4
go: downloading github.com/pmezard/go-difflib v1.0.0
=== RUN   TestE2ENotebookController
=== RUN   TestE2ENotebookController/validate_controllers
=== RUN   TestE2ENotebookController/validate_controllers/Validate_Kubeflow_notebook_controller
    notebook_controller_test.go:32: 
        	Error Trace:	/go/src/github.com/opendatahub-io/kubeflow/components/odh-notebook-controller/e2e/notebook_controller_test.go:32
        	Error:      	Not equal: 
        	            	expected: true
        	            	actual  : false
        	Test:       	TestE2ENotebookController/validate_controllers/Validate_Kubeflow_notebook_controller
        	Messages:   	error getting notebook-controller configmap
=== RUN   TestE2ENotebookController/validate_controllers/Validate_ODH_notebook_controller
    notebook_controller_test.go:45: 
        	Error Trace:	/go/src/github.com/opendatahub-io/kubeflow/components/odh-notebook-controller/e2e/notebook_controller_test.go:45
        	Error:      	Received unexpected error:
        	            	timed out waiting for the condition
        	Test:       	TestE2ENotebookController/validate_controllers/Validate_ODH_notebook_controller
        	Messages:   	error in validating odh notebook controller
--- FAIL: TestE2ENotebookController (60.29s)
    --- FAIL: TestE2ENotebookController/validate_controllers (60.29s)
        --- FAIL: TestE2ENotebookController/validate_controllers/Validate_Kubeflow_notebook_controller (0.20s)
        --- FAIL: TestE2ENotebookController/validate_controllers/Validate_ODH_notebook_controller (60.08s)
FAIL
FAIL	github.com/opendatahub-io/kubeflow/components/odh-notebook-controller/e2e	60.321s
FAIL
make[1]: *** [e2e-test-oauth] Error 1

We probably don't propagate the return code in that place properly or don't fail as we should.

My run here failed even before the e2e tests were triggered, so I'll try to re-run it to see whether I'll see what I saw during my local testing:

/retest

@jstourac
Copy link
Member Author

We probably don't propagate the return code in that place properly or don't fail as we should.

I raised the #433 to address this issue. Also created https://issues.redhat.com/browse/RHOAIENG-15141 to track these E2E test failures so we can plan on it.

My run here failed even before the e2e tests were triggered, so I'll try to re-run it to see whether I'll see what I saw during my local testing:

So, retest didn't work - the failure was the very same and again - it even didn't get to the point of the tests execution --- no e2e directory in here. This is the failure, something related to the cluster preparation:

{"component":"entrypoint","error":"wrapped process failed: exit status 6","file":"sigs.k8s.io/prow/pkg/entrypoint/run.go:84","func":"sigs.k8s.io/prow/pkg/entrypoint.Options.internalRun","level":"error","msg":"Error executing test process","severity":"error","time":"2024-10-25T18:47:20Z"}
error: failed to execute wrapped command: exit status 6

@jstourac
Copy link
Member Author

Let’s try again

/retest

@jstourac
Copy link
Member Author

/retest

@jstourac
Copy link
Member Author

Okay, so all the failures of the odh-notebook-controller-e2e job are the same (during the preparation of the testing env). I don't understand how, but maybe these changes are the culprit in the end 🤷 will investigate tomorrow in my worktime and let's see.

@jiridanek
Copy link
Member

jiridanek commented Oct 28, 2024

Got similar problems with #430, I'm guessing infra causes are possible, since it's affecting both. Also, Diamond's CVE fix for 2.10 has ocp-ci troubles.

Maybe bad day, we need Czech engineering to get back to work and fix it ;p

@jstourac
Copy link
Member Author

Okay, Jiri, so one last time 😀

/retest

@jstourac
Copy link
Member Author

FTR - the CI failures are caused by some infra issue due to a recent GCP configuration change, see this.

The openshift/release#58312 is an attempt to workaround it by using unaffected OCP version (4.15+).

@jiridanek
Copy link
Member

jiridanek commented Oct 31, 2024

Screenshot 2024-10-31 at 10 49 33 AM

No joy so far

@jstourac
Copy link
Member Author

jstourac commented Nov 5, 2024

I just realized we haven't progress with the openshift-ci failures since last week, did we? Let's check now:

/retest

@jiridanek
Copy link
Member

Imo not; we have it as task for next sprint ;P

The version 1.23 is no longer supported [1] and only releases from
1.28 - 1.31 versions are supported. Based on the [2], the
OpenShift 4.13 (oldest we care about right now) is based on Kubernetes
in version 1.26, so let's allign to that version.

* [1] https://kubernetes.io/releases/
* [2] https://access.redhat.com/solutions/4870701
This is a followup of opendatahub-io#391 (1b2dd4f).

Without this, we can see the following failure during the e2e tests
execution:
```
make run-ci-e2e-tests
...
GOBIN=/home/jstourac/workspace/rhosai/odh/kubeflow/components/odh-notebook-controller/bin go install sigs.k8s.io/kustomize/v3/cmd/[email protected]
go: sigs.k8s.io/kustomize/v3/cmd/[email protected]: sigs.k8s.io/kustomize/v3/cmd/[email protected]: invalid version: unknown revision v3/cmd/kustomize/v5.0.2
make[1]: *** [Makefile:243: kustomize] Error 1
...
```
This caused that all e2e tests failed. After this fix, some e2e tests
pass. The other seem to fail on a problem unrelated to this one.
@openshift-ci openshift-ci bot removed the lgtm label Nov 8, 2024
@jiridanek
Copy link
Member

/lgtm
ocp-ci issue has fixed itself!

@openshift-ci openshift-ci bot added the lgtm label Nov 8, 2024
@jstourac
Copy link
Member Author

jstourac commented Nov 8, 2024

Thank you for your review, Jiri!

Let's get it in - it affects tests only (and enables them to run), so it shouldn't keep anyone extremely unhappy, I guess 🙂

/approve

Copy link

openshift-ci bot commented Nov 8, 2024

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: jstourac

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved label Nov 8, 2024
@openshift-merge-bot openshift-merge-bot bot merged commit 1104a11 into opendatahub-io:main Nov 8, 2024
15 checks passed
@jstourac jstourac deleted the k8sToolTestUpdate branch November 8, 2024 19:36
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants