Mix training and inference infra and manifests #1487
… gemma deployment manifest
Can we remove the src folder?
Done.
gcloud artifacts repositories add-iam-policy-binding fine-tuning \
    --role=roles/artifactregistry.reader \
We don't need this anymore, correct?
Yes, it is no longer needed. Corrected.
#internalCertManagement:
# enable: false
# webhookServiceName: ""
# webhookSecretName: ""
Question: Some lines are commented out throughout this file. Is this intentional?
Yes, the team felt it was important to keep the context of the comments.
containers:
- name: gpu-job
  imagePullPolicy: Always
  image: us-docker.pkg.dev/google-samples/containers/gke/gemma-fine-tuning:v1.0.0
Just a note for future developers (any passerby): The source code for this image exists at https://github.com/GoogleCloudPlatform/accelerated-platforms/tree/974f2eff748d00d2566024d6ec4dd7f309f641c5/use-cases/model-fine-tuning-pipeline/fine-tuning/pytorch/src.
cd gke-platform
sed -ie 's/"deletion_protection": true/"deletion_protection": false/g' terraform.tfstate
Nitpick (feel free to ignore): Consider setting deletion_protection to false from the very start (in the google_container_cluster resources), since all Terraform resources in this repo are meant to be cleaned up at the end of the tutorial.
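For illustration, a minimal sketch of what that suggestion could look like. The resource name, cluster name, and location below are hypothetical placeholders, not values taken from this PR; only the deletion_protection argument is the point.

```hcl
# Hypothetical resource; names and values are placeholders for illustration.
resource "google_container_cluster" "example" {
  name     = "example-cluster" # placeholder, not from this PR
  location = "us-central1"     # placeholder, not from this PR

  initial_node_count = 1

  # Recent versions of the Google provider default this to true.
  # Setting it to false up front lets `terraform destroy` remove the
  # cluster at the end of the tutorial without editing terraform.tfstate.
  deletion_protection = false
}
```

With the argument set in the resource itself, the sed edit of terraform.tfstate in the cleanup step would likely no longer be necessary.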
Just a thought (no immediate action needed). This is out-of-scope for this pull-request, but we should consider modularizing the Terraform in this git repo. These gke_standard and gke_autopilot folders look similar to existing gke_standard and existing gke_autopilot. I'll add a comment in #861.
Awesome work on these samples! 👏
Looks good to me.
I trust you've tested these samples appropriately for functionality.
Left a few comments, but nothing major.
Judging from internal discussions, I'm guessing this is ready for merge (even though this PR is still in draft mode). Merging...
Description
This PR adds samples for a tutorial related to mixed training and inference in a single cluster.
Tasks