Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: Resolve errors in e2e tests for cypress in Katib UI #2384

Merged
merged 1 commit into from
Dec 3, 2024

Conversation

tariq-hasan
Copy link
Contributor

@tariq-hasan tariq-hasan commented Jul 15, 2024

What this PR does / why we need it: This PR fixed the errors in e2e tests for cypress.

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Fixes #2381

Checklist:

  • Docs included if any changes are user facing

@tariq-hasan tariq-hasan force-pushed the fix-katib-ui-tests branch 3 times, most recently from 4fc9949 to 620e7b8 Compare July 15, 2024 08:08
@tariq-hasan
Copy link
Contributor Author

tariq-hasan commented Jul 15, 2024

This PR relies on a commit in my forked repository: kubeflow/kubeflow@master...tariq-hasan:kubeflow:fix-katib-ui-tests.

The commit needs to be merged into the kubeflow repository and the COMMIT file in this PR needs to be updated with the latest commit tag after the merge.

Also the addition of the process package in components/crud-web-apps/common/frontend/kubeflow-common-lib/package.json and the update to the associated package-lock.json may have repercussions for crud-web-apps. This needs to be assessed further.

@tariq-hasan
Copy link
Contributor Author

tariq-hasan commented Jul 15, 2024

The e2e tests involving Cypress have passed: https://github.com/kubeflow/katib/actions/runs/9935880934?pr=2384.

There is a failed test for Katib UI: https://github.com/kubeflow/katib/actions/runs/9935880964/job/27442941694?pr=2384.

This is coming from the npm ci commands in https://github.com/kubeflow/katib/blob/master/cmd/ui/v1beta1/Dockerfile.

I am investigating the issue and will have it fixed.

@andreyvelich
Copy link
Member

andreyvelich commented Jul 15, 2024

Thank you so much for investigating this and for this fix @tariq-hasan!
Please can you submit PR into kubeflow/kubeflow to add this library so we can take the appropriate commit for Katib repo?

@tariq-hasan
Copy link
Contributor Author

Thank you so much for investigating this and for this fix @tariq-hasan!
Please can you submit PR into kubeflow/kubeflow to add this library so we can take the appropriate commit for Katib repo?

Sounds good. I'll try to fix the failed Katib UI test so that I can identify if any other code changes are required for kubeflow/kubeflow. I am then creating a PR to merge the code changes in kubeflow/kubeflow and will then update this PR for review.

Failed Katib UI test: https://github.com/kubeflow/katib/actions/runs/9935880964/job/27442941694?pr=2384

@tariq-hasan tariq-hasan force-pushed the fix-katib-ui-tests branch 3 times, most recently from d0d4b5f to 734af5d Compare July 16, 2024 04:26
@andreyvelich
Copy link
Member

/rerun-workflow "Publish Katib Core Images"

@andreyvelich
Copy link
Member

/rerun-all

@andreyvelich
Copy link
Member

@tariq-hasan @tenzen-y It looks like /rerun-all is working only right now.
We might need to wait until this one will be merged to rerun specific workflow: estroz/rerun-actions#3

@tariq-hasan
Copy link
Contributor Author

tariq-hasan commented Jul 16, 2024

Thank you for the clarification.

I did some further investigation on the Publish Katib Core Images workflow.

I found that the workflow fails within the final stage of the Dockerfile due to the results coming from npm ci.

If I replace npm ci with npm install --no-package-lock the workflow succeeds.

That said, npm ci uses package-lock.json instead of package.json and is therefore a more stringent validation of the configuration in the frontend folder. In addition, npm ci is less resource-intensive compared to npm install. As such, I suppose that npm install --no-package-lock is not a feasible option.

As npm install --no-package-lock succeeds and npm ci fails the issue could be due to package-lock.json not working in the node:14 base image as it was generated in a different environment on my local.

I will try to do some more investigation to determine why the build is failing and find an appropriate fix.

@andreyvelich
Copy link
Member

Thank you for this deep investigation @tariq-hasan!

@kimwnasptd @elenzio9 @orfeas-k @kubeflow/wg-data-leads @ederign @alexcreasy
cc some UI folks on problems with Katib and Kubeflow UIs.
Please can you help us to resolve problems with npm and node ?

@orfeas-k
Copy link
Contributor

orfeas-k commented Jul 17, 2024

@tariq-hasan @andreyvelich I think the answer is already highlighted in that previous commnet

package-lock.json not working in the node:14 base image as it was generated in a different environment on my local.

If the build succeeds with npm install --no-package-lock, we 'd need to update package-lock.json file (and commit it) from the same node and npm version with the one being used in the CI.

@andreyvelich
Copy link
Member

What changes do we need to make for Kubeflow UI library ?

@ederign
Copy link
Member

ederign commented Jul 18, 2024

@andreyvelich I believe what @orfeas-k meant is that @tariq-hasan generated package-lock.json file in the PR needs to match the node (and npm) version of the node where the 'Publish Katib Core Images' is being executed.

@tariq-hasan, could you please double-check that?

Also, I'm new to the project, but is there a reason why we are upgrading to node 14 instead of node 16 like kubeflow/kubeflow is using?

@andreyvelich
Copy link
Member

Also, I'm new to the project, but is there a reason why we are upgrading to node 14 instead of node 16 like kubeflow/kubeflow is using?

I think, the common Kubeflow frontend component that we use doesn't use Node 16 today: https://github.com/kubeflow/kubeflow/blob/master/components/crud-web-apps/common/frontend/kubeflow-common-lib/package.json#L62

@tariq-hasan Did you check if cypress tests can be succeeded on Node 16 ? If yes, we should just update it.

@tariq-hasan
Copy link
Contributor Author

I am upgrading kubeflow-common-library in crud-webapps as well as katib-ui to node 16.

That said, the following components are still using node 12:

I will also try to reproduce the build from a Dockerfile on my local to ensure that the package-lock.json is correctly generated.

@tariq-hasan tariq-hasan force-pushed the fix-katib-ui-tests branch 2 times, most recently from db5c4f1 to c660b3c Compare July 29, 2024 22:46
@tariq-hasan
Copy link
Contributor Author

/rerun-all

1 similar comment
@tariq-hasan
Copy link
Contributor Author

/rerun-all

@tariq-hasan
Copy link
Contributor Author

tariq-hasan commented Jul 30, 2024

Using package-lock.json from a build of the Dockerfile did not fix the errors.

However, upgrades of both crud-webapps/common/frontend/kubeflow-common-library and katib-ui to node 16 fixed the errors.

Please let me know if there are any feedback.

I will then create a PR to move kubeflow-common-library to node 16.

@andreyvelich
Copy link
Member

Yes, @tariq-hasan please create PR into kubeflow/kubeflow to update Node version for common library.

@tariq-hasan
Copy link
Contributor Author

tariq-hasan commented Aug 14, 2024

I apologize for the delay. I've updated the configuration for tensorboards. I am working through jupyter, volumes and centraldashboard-angular, and will create a PR in kubeflow/kubeflow shortly.

@andreyvelich
Copy link
Member

Thank you @tariq-hasan, appreciate your time for it!

@tariq-hasan
Copy link
Contributor Author

/rerun-all

@tariq-hasan
Copy link
Contributor Author

I have created a PR for kubeflow/kubeflow: kubeflow/kubeflow#7637.

This PR upgrades the node version from 12 to 16 for kubeflow-common-lib and its dependent components: jupyter, tensorboards, volumes and centraldashboard-angular.

Copy link

This pull request has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@andreyvelich
Copy link
Member

/remove-lifecycle stale

@tariq-hasan tariq-hasan force-pushed the fix-katib-ui-tests branch 3 times, most recently from 706d6cb to c4de1b8 Compare November 29, 2024 03:18
@tariq-hasan
Copy link
Contributor Author

@andreyvelich @tenzen-y I was wondering if you have any thoughts on why the E2E Test with enas-cnn-cifar10 could be failing.

I have re-run it multiple times but to no success.

I have not seen this test fail before so this is quite odd.

@tariq-hasan
Copy link
Contributor Author

/rerun-all

Copy link
Member

@andreyvelich andreyvelich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for this great contribution to fix the Katib UI E2Es @tariq-hasan 🎉
We can finally unblock the PRs for Katib UI.

@andreyvelich
Copy link
Member

/lgtm
/approve

Copy link

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: andreyvelich

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@google-oss-prow google-oss-prow bot merged commit 5212949 into kubeflow:master Dec 3, 2024
66 checks passed
@tariq-hasan tariq-hasan deleted the fix-katib-ui-tests branch December 4, 2024 00:30
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Fix Katib UI Tests
5 participants