RHOAIENG-8390: feat(odh-nbc): search for imagestreams only in the namespace the controller runs in #375
Hi @shalberd. Thanks for your PR. I'm waiting for an opendatahub-io member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test. Once the patch is verified, the new status will be reflected by the ok-to-test label. I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
This does not even compile, but I think it's reasonable to give it ok-to-test now, so that the images build once the issues are fixed.
Working on it. One reason is that my colleague from Red Hat Zurich, @bartoszmajsak, also declared this in his networkpolicy and istio work, albeit suboptimally in my opinion. Will fix these issues.
@jiridanek not sure what to do about the unit test failure regarding this part of getControllerNamespace.
What I prefer not to do is have a fallback namespace like redhat-ods-applications ...
components/odh-notebook-controller/controllers/notebook_webhook.go
Force-pushed from ff5c01a to 8065cdb.
I need to take a closer look at why the unit tests are failing. It's odd that the test suite constant does not help in the unit test, as it seemingly should, also according to the preexisting comment.
Any idea what's up with the unit tests? Also, why are they passing on GitHub Actions (https://github.com/opendatahub-io/kubeflow/actions/runs/10218934134/job/28276148238?pr=375#step:4:211) but failing on openshift-ci? I can take a look too, if necessary. I never properly understood what openshift-ci is doing; maybe it's time for me to change that.
@harshad16 can you please also have a look at why the CI test in openshift-ci failed in the past? It suddenly seems OK now. https://prow.ci.openshift.org/pr-history/?org=opendatahub-io&repo=kubeflow&pr=375 I don't see that I did anything conceptually wrong in my work here, anyway. The images also seem to have been building since August 2: https://quay.io/repository/opendatahub/odh-notebook-controller?tab=tags&tag=pr-375 https://quay.io/repository/opendatahub/kubeflow-notebook-controller?tab=tags&tag=pr-375
/cherrypick v1.9-branch
@jiridanek: once the present PR merges, I will cherry-pick it on top of v1.9-branch in a new PR and assign it to you.
/cherrypick stable
@jiridanek: once the present PR merges, I will cherry-pick it on top of stable in a new PR and assign it to you.
Same here: shall I change this to the v1.9 branch?
Same answer.
/retest
@harshad16
Weird indeed; the tests should not be this flaky. Also, they seem to be flaky only for you ;P
/test odh-notebook-controller-unit
Let's see what happens now.
The e2e job seems to be failing here in the same way as it does on https://prow.ci.openshift.org/view/gs/test-platform-results/pr-logs/pull/opendatahub-io_kubeflow/469/pull-ci-opendatahub-io-kubeflow-main-odh-notebook-controller-e2e/1861352298442133504 So that's fine.
/retest
/override ci/prow/odh-notebook-controller-e2e
The e2e tests won't pass, since the two test failures are expected, as discussed above.
@jstourac: Overrode contexts on behalf of jstourac: ci/prow/odh-notebook-controller-e2e
LGTM in general, though for the full context I still don't think I know enough about the whole codebase here 🙂
/lgtm
It implements my suggestions from #375 (comment), so I have to be happy, right? And one of the ideas of Go was that code should be unsurprising and that one should be able to reason about parts of the codebase in isolation: https://go.dev/doc/faq#overloading Guess that's all well and good, until you start building on top of Kubernetes ;p
/override ci/prow/odh-notebook-controller-e2e
@jiridanek: Overrode contexts on behalf of jiridanek: ci/prow/odh-notebook-controller-e2e
Thank you, Sven, for your patience waiting for a review! 🙂
Folks, feel free to use this patch on the operator to check the new behavior.
All good, just one small comment regarding the notebook_controller_test unit test: there is a comment there that is not correct in its current form. I suggest changing the lines
to
see and I mean, the unit test namespace is always
/override ci/prow/odh-notebook-controller-e2e
Same as before.
@jiridanek: Overrode contexts on behalf of jiridanek: ci/prow/odh-notebook-controller-e2e
All changes look good after my inspection across several scenarios. 💯 I also tested with the internal registry enabled and disabled to ensure full functionality, and everything worked as expected. However, I did notice a slight latency when switching the internal registry to Removed. Specifically, when I triggered the creation of a new workbench just a few seconds after status.dockerImageRepository was updated to empty, the logs indicated that the controller hit the if condition, resulting in an ImagePullBackOff state. When I retried shortly after, it correctly handled the case without a registry.
/lgtm
That would then be a latency in https://github.com/opendatahub-io/odh-dashboard, which had not yet noticed that the registry disappeared and still used
Thanks for the reviews and, of course, for the work! +💯
/approve
[APPROVALNOTIFIER] This PR is APPROVED. This pull request has been approved by: jiridanek
Let's test more once the Jira issue moves to testing, then!
Merged commit bb2dd26 into opendatahub-io:main.
@jiridanek: new pull request created: #474
…espace the controller runs in (opendatahub-io#375)
* [Fix] search for imagestreams only in the namespace the controller runs in.
  - Take namespace that the controller runs in
  - Log that dynamically-determined namespace once at startup.
  - Use hard-coded namespace for unit tests.
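The three bullet points of this commit message fit a common dependency-injection pattern. The Go sketch below is hypothetical and not the controller's actual code: notebookWebhook, newNotebookWebhook, imageStreamRef, findImageStream, test-namespace and jupyter-datascience are all illustrative names, and the real controller works with ImageStream objects through a Kubernetes client rather than a slice. The namespace is resolved once, logged once, stored on the webhook, and a unit test simply passes a fixed value.

```go
package main

import (
	"fmt"
	"log"
)

// imageStreamRef is a hypothetical, simplified stand-in for an ImageStream:
// only its namespace and name matter for this illustration.
type imageStreamRef struct {
	Namespace string
	Name      string
}

// notebookWebhook is a hypothetical webhook type. The controller namespace is
// resolved once at startup and stored here instead of being hard-coded.
type notebookWebhook struct {
	Namespace string
}

// newNotebookWebhook logs the dynamically determined namespace exactly once.
func newNotebookWebhook(namespace string) *notebookWebhook {
	log.Printf("looking up ImageStreams in controller namespace %q", namespace)
	return &notebookWebhook{Namespace: namespace}
}

// findImageStream only considers ImageStreams in the controller's own namespace.
func (w *notebookWebhook) findImageStream(all []imageStreamRef, name string) (imageStreamRef, bool) {
	for _, is := range all {
		if is.Namespace == w.Namespace && is.Name == name {
			return is, true
		}
	}
	return imageStreamRef{}, false
}

func main() {
	// A unit test passes a hard-coded namespace instead of reading the
	// service account mount of a running pod.
	w := newNotebookWebhook("test-namespace")
	streams := []imageStreamRef{
		{Namespace: "opendatahub", Name: "jupyter-datascience"},
		{Namespace: "test-namespace", Name: "jupyter-datascience"},
	}
	is, ok := w.findImageStream(streams, "jupyter-datascience")
	fmt.Println(ok, is.Namespace) // true test-namespace
}
```

Storing the namespace as a field, rather than re-reading it in every request handler, is what lets a test supply a hard-coded value without touching the filesystem.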
@jiridanek: #375 failed to apply on top of branch "stable":
We don't care about ^^^; that was set up when we had a different process. We don't cherry-pick anymore.
https://issues.redhat.com/browse/RHOAIENG-8390
We would like to consider whether we can improve the way we search for ImageStreams when the cluster does not have the internal image registry enabled. Not all cluster deployments use the standard namespace names opendatahub or redhat-ods-applications.
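The lookup that matters when the internal registry is disabled can be sketched in Go as follows. This is a minimal, hypothetical sketch, not the controller's actual code: the struct types are trimmed-down local mirrors of the OpenShift ImageStream status fields status.dockerImageRepository and status.tags[].items[].dockerImageReference, and resolveImage, the tag 2024.1 and the image references are made-up names. The assumption is that an empty dockerImageRepository signals that no internal registry is available, in which case the most recent reference recorded for the tag is used instead.

```go
package main

import "fmt"

// Hypothetical, trimmed-down mirrors of the ImageStream status fields
// status.dockerImageRepository and status.tags[].items[].dockerImageReference.
type tagEvent struct {
	DockerImageReference string
}

type namedTagEventList struct {
	Tag   string
	Items []tagEvent // assumed to be ordered most recent first
}

type imageStreamStatus struct {
	DockerImageRepository string
	Tags                  []namedTagEventList
}

// resolveImage keeps the current image when an internal registry is available
// (non-empty dockerImageRepository). Otherwise it returns the most recent
// reference recorded for the requested tag.
func resolveImage(current string, status imageStreamStatus, tag string) (string, error) {
	if status.DockerImageRepository != "" {
		return current, nil
	}
	for _, t := range status.Tags {
		if t.Tag == tag && len(t.Items) > 0 {
			return t.Items[0].DockerImageReference, nil
		}
	}
	return "", fmt.Errorf("tag %q not found in ImageStream status", tag)
}

func main() {
	// Empty dockerImageRepository, e.g. after the internal registry was set to Removed.
	status := imageStreamStatus{
		Tags: []namedTagEventList{
			{Tag: "2024.1", Items: []tagEvent{{DockerImageReference: "quay.io/example/notebook@sha256:abc"}}},
		},
	}
	img, err := resolveImage("image-registry.example.svc:5000/ns/notebook:2024.1", status, "2024.1")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(img) // quay.io/example/notebook@sha256:abc
}
```

In a design like this, the decision keys on status.dockerImageRepository, so a status that has not been updated yet still takes the internal-registry branch until the field becomes empty.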
Description
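Instead of hardcoding the central namespace in which to look for ImageStreams, use the namespace the controller itself runs in: all central components run in the main namespace, which is also where the ImageStreams are located by default. That namespace is available inside every container, as documented at: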
https://kubernetes.io/docs/tasks/run-application/access-api-from-pod/#directly-accessing-the-rest-api
"the default namespace to be used for namespaced API operations is placed in a file at /var/run/secrets/kubernetes.io/serviceaccount/namespace in each container."
How Has This Been Tested?
Not tested yet. I propose testing with the image built from this PR.
Merge criteria: