
Added test for create-pytorchjob.ipynb python notebook #2274

Merged

Conversation

@saileshd1402 (Contributor) commented Sep 29, 2024

What this PR does / why we need it:
This PR addresses issue #2246.

Changes:

  • Added a test for examples/pytorch/image-classification/create-pytorchjob.ipynb using papermill.
  • Made the setup extensible so that new notebooks can be added easily.
  • New notebook tests can be added as jobs to the test-example-notebooks.yaml workflow.
  • Notebook tests should accept the kubeflow_python_sdk parameter so that CI can install the local Python SDK for testing.
  • Updated the gcr.io/kubeflow-ci/pytorch-dist-mnist-test:v1.0 image to kubeflow/pytorch-dist-mnist:latest.

Checklist:

  • Docs included if any changes are user-facing

ReviewNB: Check out this pull request on ReviewNB to see visual diffs and provide feedback on the Jupyter Notebooks. (Powered by ReviewNB)

@coveralls commented Sep 29, 2024

Pull Request Test Coverage Report for Build 12236369721

Warning: This coverage report may be inaccurate.

This pull request's base commit is no longer the HEAD commit of its target branch. This means it includes changes from outside the original pull request, including, potentially, unrelated coverage changes.

Details

  • 0 of 0 changed or added relevant lines in 0 files are covered.
  • No unchanged relevant lines lost coverage.
  • Overall coverage remained the same at 100.0%

Totals Coverage Status
Change from base Build 12088918617: 0.0%
Covered Lines: 77
Relevant Lines: 77

💛 - Coveralls

@saileshd1402 force-pushed the pytorch-job-notebook-test branch 2 times, most recently from 06c3b4b to 67a6b83, on September 29, 2024 23:00
@saileshd1402 changed the title from "[WIP] Added test for create-pytorchjob.ipynb" to "Added test for create-pytorchjob.ipynb" on Sep 29, 2024
},
"outputs": [],
"source": [
"kubeflow_python_sdk=\"git+https://github.com/kubeflow/training-operator.git#subdirectory=sdk/python\"\n",
@saileshd1402 (Contributor, author) commented Sep 30, 2024

For papermill, all the parameters should be in one cell, but this seems to hurt readability for users. How should we improve this?
cc: @andreyvelich @tenzen-y @Electronic-Waste

Member:

@akshaychitneni @shravan-achar Any thoughts on papermill usage?
Can we pass parameters via parameters_yaml to only a single Notebook cell?

Member:

@andreyvelich We can pass parameters to multiple Notebook cells, but each of those cells needs the "parameters" metadata tag declared and should contain only parameter definition statements, to avoid mis-injections. :)
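
A minimal sketch of that injection from the CLI side (the output path and the namespace value are illustrative): papermill overrides the defaults defined in the tagged cell(s) with whatever is passed via -p.

# Values passed with -p replace the defaults declared in the notebook cell(s)
# tagged "parameters".
papermill ./examples/pytorch/image-classification/create-pytorchjob.ipynb \
  /tmp/create-pytorchjob-output.ipynb \
  -p kubeflow_python_sdk "./sdk/python" \
  -p namespace "default"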

@saileshd1402 changed the title from "Added test for create-pytorchjob.ipynb" to "Added test for create-pytorchjob.ipynb python notebook" on Sep 30, 2024
@Electronic-Waste (Member) left a comment:

Basically LGTM. Thanks for your great contributions! @saileshd1402

I will leave it for @kubeflow/wg-training-leads.

/lgtm


@andreyvelich (Member) left a comment:

Thank you for this effort @saileshd1402, and sorry for the late reply!
Can you update the Notebook so it can be rendered in the GitHub view?

Comment on lines 67 to 70
      - name: Run Jupyter Notebook with Papermill
        shell: bash
        run: |
          papermill ${{ inputs.notebook-input }} ${{ inputs.notebook-output }} -p kubeflow_python_sdk "./sdk/python" --parameters_yaml "${{ inputs.papermill-args-yaml }}"
Member:

Should we create a bash script that executes our Notebooks? In that case, you wouldn't need this template action, but rather just another GitHub action that runs the bash script.
I think that would be useful for folks who want to test it locally in their environment.
I was thinking that for V2, we can put a similar script to execute example Notebooks under /test/e2e/notebooks.


Member:

SGTM

Member:

LGTM!

@saileshd1402 (Contributor, author) commented Dec 2, 2024

I wanted to check which steps we should run in the bash script and which ones in GitHub Actions. For example, creating the kind cluster, building, and deploying the training operator are specific to CI. I think it would be a good idea to keep all the setup steps in GitHub Actions itself, while the script installs papermill and runs the notebooks.

We could also make a "Setup E2E" GitHub action template with steps from integration-tests.yaml, so that we can use it for notebooks as well as other e2e tests in the future.

Member:

@saileshd1402 I think what @andreyvelich meant is that we can replace the papermill <arg1> <arg2> command in the "Run Jupyter Notebook with Papermill" step with a bash script, e.g. ./test-notebook.sh <arg1> <arg2>. That way, users can run the script directly with just a few necessary args passed in.

@saileshd1402 (Contributor, author):

Okay yes, just wanted to make sure. Thanks!

@andreyvelich (Member) commented Dec 2, 2024

> creating the kind cluster, building, and deploying the training operator are specific to CI. I think it would be a good idea to keep all the setup steps in GitHub Actions itself, while the script installs papermill and runs the notebooks.

Yes, I agree with you @saileshd1402. Let's use the GitHub action to configure the cluster and deploy the Training Operator.
We should re-use the same template we use for our e2e tests, as you said.

@google-oss-prow google-oss-prow bot removed the lgtm label Dec 2, 2024
@Electronic-Waste (Member) left a comment:

Basically LGTM! I left some initial comments for you, @saileshd1402.

Comment on lines 54 to 55
# Install Python dependencies to run Jupyter Notebooks with papermill
pip install jupyter ipykernel papermill==2.6.0
Member:

I guess it might be better to move these dependencies to test-example-notebooks.yaml, because users might already have them installed, and it would be better to maintain the versions of these packages in CI. WDYT @saileshd1402 @kubeflow/wg-training-leads?

@saileshd1402 (Contributor, author) commented Dec 2, 2024

Makes sense, I'll pin the versions and pass the requirements as arguments.

@Electronic-Waste (Member) commented Dec 2, 2024

I think installing them in test-example-notebooks.yaml is better, since we assume users already have Jupyter installed if they want to run the example. As for papermill, we can document the dependency in a README file in the future, just FYR.
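
A minimal sketch of the resulting install step in test-example-notebooks.yaml (the jupyter and ipykernel versions are illustrative assumptions; only papermill==2.6.0 appears in the diff above):

# Pin the notebook-execution dependencies in the CI workflow so their
# versions are controlled in one place.
pip install papermill==2.6.0 jupyter==1.0.0 ipykernel==6.29.0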

Signed-off-by: sailesh duddupudi <[email protected]>
Signed-off-by: sailesh duddupudi <[email protected]>
Signed-off-by: sailesh duddupudi <[email protected]>
Signed-off-by: sailesh duddupudi <[email protected]>
Signed-off-by: sailesh duddupudi <[email protected]>
Signed-off-by: sailesh duddupudi <[email protected]>
Signed-off-by: sailesh duddupudi <[email protected]>
Signed-off-by: sailesh duddupudi <[email protected]>
Signed-off-by: sailesh duddupudi <[email protected]>
Signed-off-by: sailesh duddupudi <[email protected]>
@saileshd1402 force-pushed the pytorch-job-notebook-test branch from 4c135d5 to caeffab on December 2, 2024 14:43
@google-oss-prow google-oss-prow bot added size/XXL and removed size/L labels Dec 2, 2024
Signed-off-by: sailesh duddupudi <[email protected]>
@google-oss-prow google-oss-prow bot added size/L and removed size/XXL labels Dec 2, 2024
Signed-off-by: sailesh duddupudi <[email protected]>
Signed-off-by: sailesh duddupudi <[email protected]>
@andreyvelich (Member) left a comment:

Thanks for the update @saileshd1402!

    strategy:
      fail-fast: false
      matrix:
        kubernetes-version: ["v1.28.7"]
Member:

@kubeflow/wg-training-leads @Electronic-Waste @saileshd1402 Do we want to run our Notebooks on all supported Kubernetes versions, similar to the E2E tests?

Member:

Yeah, I agree with you. We should expand the tests to all supported Kubernetes versions.

        run: |
          ./scripts/run-notebook.sh \
            -i ./examples/pytorch/image-classification/create-pytorchjob.ipynb \
            -o ./examples/pytorch/image-classification/create-pytorchjob-output.ipynb \
Member:

How are we going to use the output notebook in the tests?

@saileshd1402 (Contributor, author):

Yes, the output notebook is not being used; I won't set it here.

case "$opt" in
i) NOTEBOOK_INPUT="$OPTARG" ;; # -i for notebook input path
o) NOTEBOOK_OUTPUT="$OPTARG" ;; # -o for notebook output path
p) PAPERMILL_PARAMS+=("$OPTARG") ;; # -p for papermill parameters
Member:

Since you name the other papermill parameter -k, should we name the namespace parameter -n?

Member:

@saileshd1402 Can you please check it, so we can merge the PR?

@saileshd1402 (Contributor, author) commented Dec 9, 2024

I think I added it already. Can you please check the latest commits?

@andreyvelich (Member) commented Dec 9, 2024

I think you should remove the -p parameter from this script's flags, since it is no longer needed.

Member:

E.g. I mean this part:

for param in "${PAPERMILL_PARAMS[@]}"; do
  papermill_cmd="$papermill_cmd -p $param"
done

@saileshd1402 (Contributor, author):

Oh, understood, you're saying we should remove the custom papermill parameters from this script. I think we may use them in the future, but we can add them if and when necessary. I'll remove them for this PR.

Member:

Yeah, let's add them in the future once we need them.

echo " -o Output notebook (required)"
echo " -p Papermill parameters (optional), pass param name and value pair (in quotes whitespace separated)"
echo " -y Papermill parameters YAML file (optional)"
echo " -k Kubeflow Training Operator Python SDK (optional)"
Member:

Do you want to name it -sdk to make it clearer?

@saileshd1402 (Contributor, author):

The current implementation uses getopts, which accepts only single-character option names; I used it so the argument parsing stays short and clean. I can switch to longer names as well, but should I then update the other args to have longer names too?
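
For context, a minimal sketch of the getopts pattern being discussed (flag and variable names follow the snippets above; the usage message is illustrative):

#!/bin/bash
# getopts only supports single-character flags; long options such as -sdk
# would require getopt(1) or manual argument parsing instead.
while getopts "i:o:k:" opt; do
  case "$opt" in
    i) NOTEBOOK_INPUT="$OPTARG" ;;      # -i input notebook path
    o) NOTEBOOK_OUTPUT="$OPTARG" ;;     # -o output notebook path
    k) TRAINING_PYTHON_SDK="$OPTARG" ;; # -k Training Operator Python SDK source
    *) echo "Usage: $0 -i <input> -o <output> [-k <sdk>]" >&2; exit 1 ;;
  esac
done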

Member:

Oh, I see. I think it's fine to keep it as -k in that case.

echo " -i Input notebook (required)"
echo " -o Output notebook (required)"
echo " -p Papermill parameters (optional), pass param name and value pair (in quotes whitespace separated)"
echo " -y Papermill parameters YAML file (optional)"
Member:

Why do we need this?

@saileshd1402 (Contributor, author):

We can also pass parameters to papermill through a YAML file. We're not using it right now, but we might in the future.
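
For reference, a hedged sketch of passing several parameters at once as a YAML string (papermill can also read them from a file via --parameters_file / -f; the values shown are illustrative):

# Multiple parameters supplied in one shot instead of repeated -p flags.
papermill ./examples/pytorch/image-classification/create-pytorchjob.ipynb \
  /tmp/create-pytorchjob-output.ipynb \
  --parameters_yaml "
kubeflow_python_sdk: ./sdk/python
namespace: default
"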

Member:

Should we introduce this parameter once we get feedback from users/developers that we need it?

@saileshd1402 (Contributor, author):

Sure, okay, I will remove it for now.

NOTEBOOK_OUTPUT=""
PAPERMILL_PARAMS=()
PAPERMILL_PARAM_YAML=""
TRAINING_PYTHON_SDK="git+https://github.com/kubeflow/training-operator.git#subdirectory=sdk/python"
Member:

Should we install the Python SDK directly from the repo by default, so developers can quickly test their SDK changes on one of the Notebooks?
Similar to how you run this script in the tests.
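
For context, a minimal sketch (inferred from the snippets above) of how the selected SDK source flows into the notebook, whatever the default ends up being:

# -k overrides TRAINING_PYTHON_SDK, so developers can point it at the local
# checkout (e.g. ./sdk/python) to pick up SDK changes immediately; the value
# is forwarded to the notebook's kubeflow_python_sdk parameter.
papermill "${NOTEBOOK_INPUT}" "${NOTEBOOK_OUTPUT}" \
  -p kubeflow_python_sdk "${TRAINING_PYTHON_SDK}"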

@@ -0,0 +1,82 @@
#!/bin/bash

# Copyright 2021 The Kubernetes Authors.
Member:

Please change the license header.

@andreyvelich (Member) commented:

/assign @akshaychitneni

google-oss-prow bot:

@andreyvelich: GitHub didn't allow me to assign the following users: akshaychitneni.

Note that only kubeflow members with read permissions, repo collaborators and people who have commented on this issue/PR can be assigned. Additionally, issues/PRs can only have 10 assignees at the same time.
For more information please see the contributor guide

In response to this:

/assign @akshaychitneni

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@saileshd1402 force-pushed the pytorch-job-notebook-test branch from 1ee4d16 to 4ea4bde on December 2, 2024 16:17
Signed-off-by: sailesh duddupudi <[email protected]>
Signed-off-by: sailesh duddupudi <[email protected]>
@saileshd1402 (Contributor, author) commented:

@andreyvelich @Electronic-Waste, thank you for the review comments. I've addressed them; please have a look and let me know if they look good. Thanks!

Comment on lines 4 to 10
inputs:
  kubernetes-version:
    required: true
    description: kubernetes version
  python-version:
    required: true
    description: Python version
Member:

Should we set the matrix with Kubernetes and Python versions as part of our setup-e2e-test template? Right now, we set it in integration-tests.yaml. That way we keep it consistent for our E2E + Notebook tests.
WDYT @tenzen-y @Electronic-Waste @saileshd1402?

@saileshd1402 (Contributor, author):

One small thing: if other steps need some of these versions, for example gang-scheduler-name here, we will need to export them as environment variables via GITHUB_ENV to access them in subsequent steps. Is there any other way, or is this fine?
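
A minimal sketch of that mechanism inside a composite-action step (assuming gang-scheduler-name is also defined as an action input):

# Anything appended to $GITHUB_ENV becomes an environment variable for the
# subsequent steps of the same job.
echo "GANG_SCHEDULER_NAME=${{ inputs.gang-scheduler-name }}" >> "$GITHUB_ENV"
echo "KUBERNETES_VERSION=${{ inputs.kubernetes-version }}" >> "$GITHUB_ENV"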

Member:

Oh, I see. I guess we only use scheduler plugins for the integration tests.
@kubeflow/wg-training-leads @saileshd1402 Do we want to test our Notebooks with various scheduling plugins as well? Or do we want to limit the tests that we run with gang-scheduling?

Member:

> Should we set the matrix with Kubernetes and Python versions as part of our setup-e2e-test template? Right now, we set it in integration-tests.yaml. That way we keep it consistent for our E2E + Notebook tests.
> WDYT @tenzen-y @Electronic-Waste @saileshd1402?

I think so. It would be better if we could execute the e2e tests with multiple Kubernetes and Python versions.

Member:

I guess gang-scheduling could be limited to integration-tests.yaml only, since it would be a bit redundant to test it again in the notebook tests.

Member:

@saileshd1402 Maybe to unblock this PR, we can just use GITHUB_ENV for now and set the gang-scheduler only for integration tests.
For the V2 tests, we can come back to this discussion.

@saileshd1402 (Contributor, author) commented Dec 8, 2024

I found out that we can't use a matrix inside a single composite action; it can only be used in job/workflow files. This is because a composite action avoids duplication of steps but can't be used to create more jobs the way a workflow file can. There are also Reusable Workflows, but they can't be used here since they spawn a separate workflow to run the template, which means we can't use them to set up the environment of the current job. Related docs: matrix strategies and composite actions.

Member:

I see, thanks for checking!

@Electronic-Waste (Member) left a comment:

Basically LGTM!

/lgtm

@google-oss-prow google-oss-prow bot added the lgtm label Dec 5, 2024
Signed-off-by: sailesh duddupudi <[email protected]>
@google-oss-prow google-oss-prow bot removed the lgtm label Dec 9, 2024
@andreyvelich (Member) left a comment:

Thank you for this effort @saileshd1402!
/lgtm
/approve

google-oss-prow bot:

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: andreyvelich

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@google-oss-prow google-oss-prow bot merged commit 56cbe60 into kubeflow:master Dec 9, 2024
51 checks passed