-
Notifications
You must be signed in to change notification settings - Fork 705
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Added test for create-pytorchjob.ipynb python notebook #2274
Added test for create-pytorchjob.ipynb python notebook #2274
Conversation
Check out this pull request on See visual diffs & provide feedback on Jupyter Notebooks. Powered by ReviewNB |
Pull Request Test Coverage Report for Build 12236369721Warning: This coverage report may be inaccurate.This pull request's base commit is no longer the HEAD commit of its target branch. This means it includes changes from outside the original pull request, including, potentially, unrelated coverage changes.
Details
💛 - Coveralls |
06c3b4b
to
67a6b83
Compare
}, | ||
"outputs": [], | ||
"source": [ | ||
"kubeflow_python_sdk=\"git+https://github.com/kubeflow/training-operator.git#subdirectory=sdk/python\"\n", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For papermill, all the parameters should be in one cell. But this seems to be affecting the readability of the user. How should we improve this?
cc: @andreyvelich @tenzen-y @Electronic-Waste
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@akshaychitneni @shravan-achar Any thoughts on papermill usage ?
We can pass parameters using parameters_yaml
only to the single Notebook cell ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@andreyvelich We can pass parameters to multiple Notebook cells. But they should have metadata parameters
declared and only have parameter definition statements in case of miss injections:)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Basically LGTM. Thanks for your great contributions! @saileshd1402
I will leave it for @kubeflow/wg-training-leads.
/lgtm
}, | ||
"outputs": [], | ||
"source": [ | ||
"kubeflow_python_sdk=\"git+https://github.com/kubeflow/training-operator.git#subdirectory=sdk/python\"\n", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@andreyvelich We can pass parameters to multiple Notebook cells. But they should have metadata parameters
declared and only have parameter definition statements in case of miss injections:)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for this effort @saileshd1402 and sorry for the late reply!
Can you update the Notebook so it can be rendered in the GitHub view ?
- name: Run Jupyter Notebook with Papermill | ||
shell: bash | ||
run: | | ||
papermill ${{ inputs.notebook-input }} ${{ inputs.notebook-output }} -p kubeflow_python_sdk "./sdk/python" --parameters_yaml "${{ inputs.papermill-args-yaml }}" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we create a bash script that execute our Notebooks ? In that case, you don't need to have this template action, but rather just have another GitHub action that runs this bash script.
I think that would be useful for folks, who want to tests it locally in their environment.
I was thinking that for V2, we can put similar script to execute example Notebooks under /test/e2e/notebooks
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
WDYT @saileshd1402 @Electronic-Waste @kubeflow/wg-training-leads @saileshd1402 @seanlaii @varshaprasad96?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
SGTM
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wanted to check which steps we should run in the bash script and which ones in github actions. For example: creating kind cluster, building, deploying training operator, etc is specific to CI. I think it would be a good idea to keep all the setup steps in github actions itself and the script can install papermill and run the notebooks.
We can also make a "Setup E2E" github action template with steps from integration-tests.yaml so that we can use it for notebooks as well and other e2e tests in the future
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@saileshd1402 I think, what @andreyvelich meant is that we can replace the papermill <arg1> <arg2>
command in "Run Jupyter Notebook with Papermill" step with a bash script like ./test-notebook.sh <arg1> <arg2>
. In this way, users can run this script directly with just a few necessary args passed into the script.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay yes, just wanted to make sure. thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
creating kind cluster, building, deploying training operator, etc is specific to CI. I think it would be a good idea to keep all the setup steps in github actions itself and the script can install papermill and run the notebooks.
Yes, I agree with you @saileshd1402. Let's use the GitHub action to configure cluster and deploy Training Operator.
We should re-use the same template as we use for our e2e tests as you said.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Basically LGTM! I left some initial comments for you! @saileshd1402
scripts/run-notebook.sh
Outdated
# Install Python dependencies to run Jupyter Notebooks with papermill | ||
pip install jupyter ipykernel papermill==2.6.0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess, it might be better if we could move these dependencies to test-example-notebooks.yaml
because users might have installed these dependencies and it would be better if we could maintain the version of these packages in CI. WDYT @saileshd1402 @kubeflow/wg-training-leads ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes sense, I'll pin the versions. And set requirements as arguments
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think maybe installing them in the test-example-notebooks.yaml
is better, since we assume that users already have jupyter
installed if they want to run the example. As for the papermill, I think we can notify them of the dependency on papermill
with README file in the future, just FYR.
Signed-off-by: sailesh duddupudi <[email protected]>
Signed-off-by: sailesh duddupudi <[email protected]>
Signed-off-by: sailesh duddupudi <[email protected]>
Signed-off-by: sailesh duddupudi <[email protected]>
Signed-off-by: sailesh duddupudi <[email protected]>
Signed-off-by: sailesh duddupudi <[email protected]>
Signed-off-by: sailesh duddupudi <[email protected]>
Signed-off-by: sailesh duddupudi <[email protected]>
Signed-off-by: sailesh duddupudi <[email protected]>
Signed-off-by: sailesh duddupudi <[email protected]>
Signed-off-by: sailesh duddupudi <[email protected]>
Signed-off-by: sailesh duddupudi <[email protected]>
4c135d5
to
caeffab
Compare
Signed-off-by: sailesh duddupudi <[email protected]>
Signed-off-by: sailesh duddupudi <[email protected]>
Signed-off-by: sailesh duddupudi <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the update @saileshd1402!
strategy: | ||
fail-fast: false | ||
matrix: | ||
kubernetes-version: ["v1.28.7"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@kubeflow/wg-training-leads @Electronic-Waste @saileshd1402 Do we want to run our Notebooks on all supported Kubernetes version similar to E2E tests:
include: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I agree with you. We should expand the tests to all supported Kubernetes versions.
run: | | ||
./scripts/run-notebook.sh \ | ||
-i ./examples/pytorch/image-classification/create-pytorchjob.ipynb \ | ||
-o ./examples/pytorch/image-classification/create-pytorchjob-output.ipynb \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How are we going to use output in the tests ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, the output notebook is not being used, I'll not set it here
scripts/run-notebook.sh
Outdated
case "$opt" in | ||
i) NOTEBOOK_INPUT="$OPTARG" ;; # -i for notebook input path | ||
o) NOTEBOOK_OUTPUT="$OPTARG" ;; # -o for notebook output path | ||
p) PAPERMILL_PARAMS+=("$OPTARG") ;; # -p for papermill parameters |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since you name other papermill parameter as -k
, should we name the namespace parameter as -n
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@saileshd1402 Please can you check it, so we can merge the PR ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added it already I think. Can you please check the latest commits once?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think, you should remove -p
parameter from the flags of this script since it is no longer needed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
E.g. I mean this part:
training-operator/scripts/run-notebook.sh
Lines 61 to 63 in d3e9031
for param in "${PAPERMILL_PARAMS[@]}"; do | |
papermill_cmd="$papermill_cmd -p $param" | |
done |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh understood, you are saying let's remove custom parameters of papermill in this script. I think we may use it in the future but I guess we can add if and when necessary. I'll remove it for this PR
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, let's add them in the future once we need them.
echo " -o Output notebook (required)" | ||
echo " -p Papermill parameters (optional), pass param name and value pair (in quotes whitespace separated)" | ||
echo " -y Papermill parameters YAML file (optional)" | ||
echo " -k Kubeflow Training Operator Python SDK (optional)" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you want to name it as -sdk
to make it clearer ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The current implementation using "getopts" accepts only one char argument names. I used this so the args parsing is short and clean. I can do longer names as well, but should I update the other args to have longer names?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, I see. I think, it's fine to keep it as -k in that case.
scripts/run-notebook.sh
Outdated
echo " -i Input notebook (required)" | ||
echo " -o Output notebook (required)" | ||
echo " -p Papermill parameters (optional), pass param name and value pair (in quotes whitespace separated)" | ||
echo " -y Papermill parameters YAML file (optional)" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we need this ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can pass the parameters to papermill through yaml file as well, we are not using right now, but might use in the future
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we introduce this parameter once we get feedback from users/developers that we need it ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure okay, I will remove it for now
scripts/run-notebook.sh
Outdated
NOTEBOOK_OUTPUT="" | ||
PAPERMILL_PARAMS=() | ||
PAPERMILL_PARAM_YAML="" | ||
TRAINING_PYTHON_SDK="git+https://github.com/kubeflow/training-operator.git#subdirectory=sdk/python" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we install the Python SDK directly from the repo by default, so developers can quickly test their SDK changes on one of the Notebooks?
Similar to how you run this script in the tests.
scripts/run-notebook.sh
Outdated
@@ -0,0 +1,82 @@ | |||
#!/bin/bash | |||
|
|||
# Copyright 2021 The Kubernetes Authors. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please change the licence
/assign @akshaychitneni |
@andreyvelich: GitHub didn't allow me to assign the following users: akshaychitneni. Note that only kubeflow members with read permissions, repo collaborators and people who have commented on this issue/PR can be assigned. Additionally, issues/PRs can only have 10 assignees at the same time. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
Signed-off-by: sailesh duddupudi <[email protected]>
1ee4d16
to
4ea4bde
Compare
Signed-off-by: sailesh duddupudi <[email protected]>
Signed-off-by: sailesh duddupudi <[email protected]>
@andreyvelich @Electronic-Waste, Thank you for the review comments. I've addressed them, please have a look and let me know if they look good. Thanks! |
inputs: | ||
kubernetes-version: | ||
required: true | ||
description: kubernetes version | ||
python-version: | ||
required: true | ||
description: Python version |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we set the matrix with Kubernetes and Python versions as part of our setup-e2e-test
template ?
Right now, we set it in the integration-tests.yaml.
So we can keep it consistent for our E2Es + Notebooks tests.
WDYT @tenzen-y @Electronic-Waste @saileshd1402 ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One small thing is that if there are other steps that need some of these versions, for example gang-scheduler-name here, we will need to export them as environments variables to access in subsequent steps using GITHUB_ENV. Is there any other way or is this fine?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, I see. I guess, we only use scheduler plugins for integration tests.
@kubeflow/wg-training-leads @saileshd1402 do we want to tests our Notebooks with various scheduling plugins as well ?
Or we want to limit the tests that we run with gang-scheduling ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we set the matrix with Kubernetes and Python versions as part of our setup-e2e-test template ?
Right now, we set it in the integration-tests.yaml.
So we can keep it consistent for our E2Es + Notebooks tests.
WDYT @tenzen-y @Electronic-Waste @saileshd1402 ?
I think so. It will be better if we could execute e2e tests with multiple Kubernetes and Python versions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess, maybe gang-scheduling could be limited to only intergration-test.yaml
since it is a bit redundant to test them again in notebook tests.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@saileshd1402 Maybe to unblock this PR, we can just use GITHUB_ENV for now and set the gang-scheduler only for integration tests.
For the V2 tests, we can come back to this discussion.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I found out that we can't use matrix inside a single composite action. It can only be used in jobs/workflows files. This is because composite action avoids duplication of steps but can't be used to create more jobs like a workflow file. There is also Reusable Workflows but that can't be used in this case since it's spawns separate workflow to run the template, which means that we can't use it to setup environment of current job. Related docs: matrix strategies and composite actions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see, thanks for checking!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Basically LGTM!
/lgtm
Signed-off-by: sailesh duddupudi <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for this effort @saileshd1402!
/lgtm
/approve
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: andreyvelich The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
What this PR does / why we need it:
This PR addresses the issue: #2246
Changes:
kubeflow_python_sdk
so that CI will install local Python SDK for testinggcr.io/kubeflow-ci/pytorch-dist-mnist-test:v1.0
tokubeflow/pytorch-dist-mnist:latest
Checklist: