Added test for create-pytorchjob.ipynb python notebook #2274
Merged: google-oss-prow merged 66 commits into kubeflow:master from saileshd1402:pytorch-job-notebook-test on Dec 9, 2024 (+204 −51)
Commits
c90bbaf
Added test for create-pytorchjob.ipynb
saileshd1402 f8fd24c
fix yaml syntax
saileshd1402 89023ce
Fix uses path
saileshd1402 62be575
Add actions/checkout
saileshd1402 9ea7155
Add bash to action.yaml
saileshd1402 da99ec8
Install pip dependencies step
saileshd1402 4595f32
Add quotes for args
saileshd1402 8b744b1
Add jupyter
saileshd1402 c6d1925
Add nbformat_minor: 5 to fix invalid format error
saileshd1402 1124ee8
Fix job name
saileshd1402 f882cf3
test papermill-args-yaml
saileshd1402 5494fb1
testing multi line args
saileshd1402 eb7c4be
testing multi line args1
saileshd1402 93b6c66
testing multi line args2
saileshd1402 e5aca68
testing multi line args3
saileshd1402 c8b1aff
Parameterize sdk install
saileshd1402 9145412
Remove unnecessary output
saileshd1402 e704b7f
nbformat normailze
saileshd1402 dc6a517
[SDK] Training Client Conditions related unit tests (#2253)
Bobbins228 c0b64e0
[SDK] test: add unit test for list_jobs method of the training_client…
seanlaii 2e7d3c2
KEP-2170: Generate clientset, openapi spec for the V2 APIs (#2273)
varshaprasad96 040ba8f
[SDK] Use torchrun to create PyTorchJob from function (#2276)
andreyvelich f20969b
[SDK] test: add unit test for get_job_logs method of the training_cli…
seanlaii 4ff5052
[v2alpha] Move GV related codebase (#2281)
varshaprasad96 24cea1b
KEP-2170: Implement runtime framework (#2248)
tenzen-y 936620d
Add DeepSpeed Example with Pytorch Operator (#2235)
Syulin7 cdbc22e
KEP-2170: Rename TrainingRuntimeRef to RuntimeRef API (#2283)
andreyvelich 5692b53
KEP-2170: Adding CEL validations on v2 TrainJob CRD (#2260)
akshaychitneni e6954eb
Upgrade Deepspeed demo dependencies (#2294)
Syulin7 009f207
KEP-2170: Add manifests for Kubeflow Training V2 (#2289)
andreyvelich 7793706
FSDP Example for T5 Fine-Tuning and PyTorchJob (#2286)
andreyvelich 7f61c50
KEP-2170: Implement TrainJob Reconciler to manage objects (#2295)
tenzen-y 13dcb6b
Remove Prometheus Monitoring doc (#2301)
sophie0730 b4c0d40
KEP-2170: Decouple JobSet from TrainJob (#2296)
tenzen-y d315aa2
KEP-2170: Strictly verify the CRD marker validation and defaulting in…
tenzen-y 4d4d2c8
KEP-2170: Initialize runtimes before the manager starts (#2306)
tenzen-y 82d0535
KEP-2170: Generate Python SDK for Kubeflow Training V2 (#2310)
andreyvelich 32854c0
KEP-2170: Create model and dataset initializers (#2303)
andreyvelich 6df87f9
KEP-2170: Implement JobSet, PlainML, and Torch Plugins (#2308)
andreyvelich ce2febf
KEP-2170: Implement Initializer builders in the JobSet plugin (#2316)
andreyvelich e1505ac
KEP-2170: Add the TrainJob state transition design (#2298)
tenzen-y ec176e3
Update tf job examples to tf v2 (#2270)
YosiElias cc0ef4d
KEP-2170: Add TrainJob conditions (#2322)
tenzen-y 3f5c458
Pin Gloo repository in JAX Dockerfile to a specific commit (#2329)
sandipanpanda 94b8414
[fix] Resolve v2alpha API exceptions (#2317)
varshaprasad96 ceb4369
Upgrade Kubernetes to v1.30.7 (#2332)
astefanutti 0c4a8d2
Ignore cache exporting errors in the image building workflows (#2336)
tenzen-y 83da2af
KEP-2170: Add Torch Distributed Runtime (#2328)
andreyvelich b5a8a72
Refine the server-side apply installation args (#2337)
tenzen-y 05baf72
Add openapi-generator CLI option to skip SDK v2 test generation (#2338)
astefanutti 618bf6e
Upgrade kustomization files to Kustomize v5 (#2326)
oksanabaza 1bb35da
Pin accelerate package version in trainer (#2340)
gavrissh 745c445
Replace papermill command with bash script
saileshd1402 0cd3791
Typo fix
saileshd1402 651672d
Move Checkout step outside action.yaml file
saileshd1402 e607e6d
Add newline EOF in script
saileshd1402 0540b90
Pass python dependencies as args and pin versions
saileshd1402 8c7f517
Update Usage
saileshd1402 caeffab
Install dependencies in yaml
saileshd1402 b545c80
merge conflit fix
saileshd1402 87999f1
fix ipynb
saileshd1402 0ee9ca5
set bash flags
saileshd1402 4ea4bde
Update script args and add more kubernetes versions for tests
saileshd1402 72dd617
add gang-scheduler-name to template
saileshd1402 d3e9031
move go setup to template
saileshd1402 21a6129
remove -p parameter from script
saileshd1402
@@ -0,0 +1,57 @@
name: Setup E2E test template
description: A composite action to setup e2e tests

inputs:
  kubernetes-version:
    required: true
    description: Kubernetes version
  python-version:
    required: true
    description: Python version
  gang-scheduler-name:
    required: false
    default: "none"
    description: Gang scheduler name

runs:
  using: composite
  steps:
    - name: Free-Up Disk Space
      uses: ./.github/workflows/free-up-disk-space

    - name: Setup Python
      uses: actions/setup-python@v5
      with:
        python-version: ${{ inputs.python-version }}

    - name: Setup Go
      uses: actions/setup-go@v5
      with:
        go-version-file: go.mod

    - name: Create k8s Kind Cluster
      uses: helm/kind-action@9fdad0686e6f19fcd572f62516f5e0436f562ee7
      with:
        node_image: kindest/node:${{ inputs.kubernetes-version }}
        cluster_name: training-operator-cluster
        kubectl_version: ${{ inputs.kubernetes-version }}

    - name: Build training-operator
      shell: bash
      run: |
        ./scripts/gha/build-image.sh
      env:
        TRAINING_CI_IMAGE: kubeflowtraining/training-operator:test

    - name: Deploy training operator
      shell: bash
      run: |
        ./scripts/gha/setup-training-operator.sh
        docker system prune -a -f
        docker system df
        df -h
      env:
        KIND_CLUSTER: training-operator-cluster
        TRAINING_CI_IMAGE: kubeflowtraining/training-operator:test
        GANG_SCHEDULER_NAME: ${{ inputs.gang-scheduler-name }}
        KUBERNETES_VERSION: ${{ inputs.kubernetes-version }}
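For anyone wanting to approximate the "Create k8s Kind Cluster" step outside CI, here is a rough sketch. It assumes the `kind` CLI is installed and that its `--name` and `--image` flags play the role of the action's `cluster_name` and `node_image` inputs; the helper `kind_create_command` is hypothetical and only prints the command rather than running it.

```bash
#!/usr/bin/env bash
# Sketch: print (not run) a kind command that roughly corresponds to the
# "Create k8s Kind Cluster" step. The mapping to the action's inputs is an assumption.
set -o errexit -o nounset -o pipefail

# kind_create_command is a hypothetical helper, not part of the PR.
# $1: Kubernetes version (e.g. v1.30.6); $2: optional cluster name.
kind_create_command() {
  local kubernetes_version="$1"
  local cluster_name="${2:-training-operator-cluster}"
  printf 'kind create cluster --name %s --image kindest/node:%s\n' \
    "$cluster_name" "$kubernetes_version"
}

kind_create_command "v1.30.6"
```

Printing the command first makes it easy to check the version and cluster name before creating anything.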
@@ -0,0 +1,39 @@
name: Test example notebooks

on:
  - pull_request

concurrency:
  group: ${{ github.workflow }}-${{ github.ref }}
  cancel-in-progress: true

jobs:
  create-pytorchjob-notebook-test:
    runs-on: ubuntu-latest
    timeout-minutes: 30
    strategy:
      fail-fast: false
      matrix:
        kubernetes-version: ["v1.28.7", "v1.29.2", "v1.30.6"]
        python-version: ["3.9", "3.10", "3.11"]
    steps:
      - name: Checkout
        uses: actions/checkout@v4

      - name: Setup E2E Tests
        uses: ./.github/workflows/setup-e2e-test
        with:
          kubernetes-version: ${{ matrix.kubernetes-version }}
          python-version: ${{ matrix.python-version }}

      - name: Install Python Dependencies
        run: |
          pip install papermill==2.6.0 jupyter==1.1.1 ipykernel==6.29.5

      - name: Run Jupyter Notebook with Papermill
        shell: bash
        run: |
          ./scripts/run-notebook.sh \
            -i ./examples/pytorch/image-classification/create-pytorchjob.ipynb \
            -n default \
            -k ./sdk/python
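The `strategy.matrix` in this workflow is a cross product: three Kubernetes versions times three Python versions gives nine jobs, and `fail-fast: false` lets the remaining jobs continue when one fails. A minimal sketch that enumerates the same combinations, for example to pick one to reproduce locally; the version lists are copied from the workflow, while the helper `list_matrix_jobs` is hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: enumerate the combinations that the workflow's matrix expands to.
set -o errexit -o nounset -o pipefail

kubernetes_versions=("v1.28.7" "v1.29.2" "v1.30.6")
python_versions=("3.9" "3.10" "3.11")

# list_matrix_jobs is a hypothetical helper, not part of the PR.
# It prints one line per (Kubernetes version, Python version) pair.
list_matrix_jobs() {
  local k8s py
  for k8s in "${kubernetes_versions[@]}"; do
    for py in "${python_versions[@]}"; do
      printf 'kubernetes=%s python=%s\n' "$k8s" "$py"
    done
  done
}

list_matrix_jobs
```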
@@ -0,0 +1,71 @@
#!/bin/bash

# Copyright 2024 The Kubeflow Authors.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# This bash script is used to run the example notebooks

set -o errexit
set -o nounset
set -o pipefail

NOTEBOOK_INPUT=""
NOTEBOOK_OUTPUT="-" # outputs to console
NAMESPACE="default"
TRAINING_PYTHON_SDK="./sdk/python"

usage() {
  echo "Usage: $0 -i <input_notebook> [-o <output_notebook>] [-k <sdk_path>] [-n <namespace>]"
  echo "Options:"
  echo "  -i Input notebook (required)"
  echo "  -o Output notebook (optional, defaults to \"-\", which prints to the console)"
  echo "  -k Kubeflow Training Operator Python SDK (optional)"
  echo "  -n Kubernetes namespace used by tests (optional)"
  echo "  -h Show this help message"
  echo "NOTE: papermill, jupyter and ipykernel are required Python dependencies to run Notebooks"
  exit 1
}

while getopts "i:o:k:n:h" opt; do
  case "$opt" in
    i) NOTEBOOK_INPUT="$OPTARG" ;;      # -i for notebook input path
    o) NOTEBOOK_OUTPUT="$OPTARG" ;;     # -o for notebook output path
    k) TRAINING_PYTHON_SDK="$OPTARG" ;; # -k for training operator python sdk
    n) NAMESPACE="$OPTARG" ;;           # -n for kubernetes namespace used by tests
    h) usage ;;                         # -h for help (usage); takes no argument
    *) usage ;;                         # unknown option; usage exits with status 1
  esac
done

if [ -z "$NOTEBOOK_INPUT" ]; then
  echo "Error: -i notebook input path is required."
  exit 1
fi

# Build the command as an array so that paths containing spaces stay single arguments
papermill_cmd=(papermill "$NOTEBOOK_INPUT" "$NOTEBOOK_OUTPUT" -p training_python_sdk "$TRAINING_PYTHON_SDK" -p namespace "$NAMESPACE")

if ! command -v papermill &> /dev/null; then
  echo "Error: papermill is not installed. Please install papermill to proceed."
  exit 1
fi

echo "Running command: ${papermill_cmd[*]}"
# With errexit set, a later "$?" check would never run on failure, so test the command directly
if ! "${papermill_cmd[@]}"; then
  echo "Error: papermill execution failed." >&2
  exit 1
fi

echo "Notebook execution completed successfully"
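To illustrate why quoting matters when a command line such as the papermill invocation is assembled from variables, here is a small sketch comparing a command held in a string with one held in a Bash array. The notebook path with a space and the helper `count_args` are hypothetical and exist only for the demonstration; nothing here calls papermill.

```bash
#!/usr/bin/env bash
# Sketch: word splitting of a command held in a string versus in an array.
set -o errexit -o nounset -o pipefail

# count_args is a hypothetical helper that prints how many arguments it received.
count_args() {
  echo "$#"
}

notebook_input="./my notebooks/example.ipynb"  # hypothetical path containing a space
notebook_output="-"                            # the script uses "-" to print to the console

# String form: the unquoted expansion is split at every space, including the one in the path.
cmd_string="papermill $notebook_input $notebook_output"
# shellcheck disable=SC2086
string_count="$(count_args $cmd_string)"

# Array form: each element stays exactly one argument.
cmd_array=(papermill "$notebook_input" "$notebook_output")
array_count="$(count_args "${cmd_array[@]}")"

echo "string form: $string_count arguments"  # 4: the path was split in two
echo "array form: $array_count arguments"    # 3: the path stayed intact
```

The paths used in the workflow contain no spaces, so both forms behave the same there; the difference only appears for inputs like the hypothetical one above.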
Do you want to name it -sdk to make it clearer?
The current implementation uses "getopts", which accepts only single-character option names. I used it to keep the argument parsing short and clean. I can use longer names as well, but should I then update the other arguments to have longer names too?
Oh, I see. I think it's fine to keep it as -k in that case.
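The constraint discussed in this thread comes from Bash's `getopts` builtin, which parses single-character options only, so a flag written as `-sdk` is read as the option `-s` followed by further clustered letters. A minimal sketch, mirroring the script's `-i`, `-k` and `-n` flags; the helper `parse_args` is hypothetical and prints the parsed values instead of running anything.

```bash
#!/usr/bin/env bash
# Sketch: single-character option parsing with getopts.
set -o errexit -o nounset -o pipefail

# parse_args is a hypothetical helper, not part of the PR.
parse_args() {
  local OPTIND=1 opt   # reset OPTIND so the function can be called repeatedly
  local input="" sdk="./sdk/python" namespace="default"
  while getopts "i:k:n:" opt; do
    case "$opt" in
      i) input="$OPTARG" ;;
      k) sdk="$OPTARG" ;;
      n) namespace="$OPTARG" ;;
      *) return 1 ;;   # unknown option, such as the "s" in "-sdk"
    esac
  done
  printf 'input=%s sdk=%s namespace=%s\n' "$input" "$sdk" "$namespace"
}

parse_args -i notebook.ipynb -k ./sdk/python -n default

# "-sdk" starts with "-s", which is not in the option string, so parsing fails.
if ! parse_args -sdk ./sdk/python 2>/dev/null; then
  echo "getopts rejected -sdk"
fi
```

Supporting a long name such as `--sdk` would mean replacing `getopts` with a manual `while`/`case` loop over the arguments, which is the trade-off weighed in the thread.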