Grantham/add-attach_shm-template #3020
base: master
Codecov Report
Attention: Patch coverage is
Additional details and impacted files

@@            Coverage Diff             @@
##           master    #3020       +/-   ##
===========================================
+ Coverage   51.35%   75.72%   +24.37%
===========================================
  Files         204      203        -1
  Lines       21446    21280      -166
  Branches     2729     2733        +4
===========================================
+ Hits        11014    16115     +5101
+ Misses       9834     4361     -5473
- Partials      598      804      +206
Code Review Agent Run #dd1f99
Actionable Suggestions - 3
Changelist by Bito
This pull request implements the following key changes.
from flytekit.core.pod_template import PodTemplate


def attach_shm(name: str, size: str) -> PodTemplate:
The attach_shm function does not validate its parameters. Consider checking that size follows the Kubernetes resource quantity format (e.g., '1Gi', '500Mi') so that a malformed value fails early instead of at pod creation time.
Code suggestion
Check the AI-generated fix before applying
import re

# Kubernetes quantity: an integer or decimal with an optional binary (Ki, Mi, Gi, Ti, Pi, Ei)
# or decimal (k, M, G, T, P, E) suffix. The decimal kilo suffix is lowercase "k", not "K".
_SIZE_PATTERN = re.compile(r"^[0-9]+(\.[0-9]+)?(Ki|Mi|Gi|Ti|Pi|Ei|k|M|G|T|P|E)?$")


def attach_shm(name: str, size: str) -> PodTemplate:
    if not _SIZE_PATTERN.match(size):
        raise ValueError(f"Invalid size format: {size}. Expected format like '1Gi', '500Mi'")
Code Review Run #dd1f99
    return PodTemplate(
        primary_container_name=name,
        pod_spec=V1PodSpec(
            containers=[V1Container(name=name, volume_mounts=[V1VolumeMount(mount_path="/dev/shm", name="dshm")])],
The hardcoded path '/dev/shm' could potentially be insecure. Consider making this path configurable or validating it before use.
Code suggestion
Check the AI-generated fix before applying
@@ -4,6 +4,6 @@
def attach_shm(name: str, size: str) -> PodTemplate:
- containers=[V1Container(name=name, volume_mounts=[V1VolumeMount(mount_path="/dev/shm", name="dshm")])],
+ containers=[V1Container(name=name, volume_mounts=[V1VolumeMount(mount_path="/dev/shm", name="dshm", read_only=True)])],
Code Review Run #dd1f99
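For context on the read-only suggestion, the excerpt above shows only the container half of the pod spec. The following is a minimal sketch of the whole shape, written as plain Kubernetes manifest dictionaries rather than kubernetes client objects so it runs without flytekit. It assumes (the excerpt does not show it) that attach_shm also declares a memory-backed emptyDir volume named "dshm" whose sizeLimit is the size argument; shm_pod_spec is a hypothetical name. The mount is deliberately left writable: processes create shared-memory segments as files under /dev/shm, so a read-only mount would block exactly what the helper is meant to enable.

```python
from typing import Any, Dict


def shm_pod_spec(name: str, size: str) -> Dict[str, Any]:
    """Hypothetical dict equivalent of the pod spec attach_shm appears to build.

    Assumption: the /dev/shm mount is paired with a memory-backed emptyDir
    volume named "dshm"; only the mount is visible in the diff excerpt.
    """
    return {
        "containers": [
            {
                "name": name,
                # Writable mount (no readOnly): shared-memory segments are created here.
                "volumeMounts": [{"name": "dshm", "mountPath": "/dev/shm"}],
            }
        ],
        "volumes": [
            {
                "name": "dshm",
                # medium "Memory" backs the volume with tmpfs (RAM);
                # sizeLimit caps how much of it the pod may use.
                "emptyDir": {"medium": "Memory", "sizeLimit": size},
            }
        ],
    }


spec = shm_pod_spec("primary", "5Gi")
print(spec["volumes"][0]["emptyDir"])  # {'medium': 'Memory', 'sizeLimit': '5Gi'}
```

The volume name is the only link between the two halves, so the mount's "name" must match the volume's "name" for the pod to be admitted.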
Co-authored-by: Flyte Bot <[email protected]>
Should this be an API on PodTemplate itself, to allow for mutating an existing pod template?
from flytekit.core.pod_template import PodTemplate

pod_template = (
    PodTemplate()
    .with_container(...)
    .with_shm(...)
)
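One way to picture the chained API proposed above is a small builder that adds SHM to an existing spec instead of generating a fresh one. The sketch below is hypothetical: PodSpecBuilder, with_container, and with_shm are not flytekit APIs, and it works on plain manifest dictionaries rather than PodTemplate/V1PodSpec objects. It also takes the container name and the volume name as separate parameters, which sidesteps the name ambiguity raised later in the review.

```python
import copy
from typing import Any, Dict, Optional


class PodSpecBuilder:
    """Hypothetical fluent builder over a plain-dict pod spec (not a flytekit API)."""

    def __init__(self, pod_spec: Optional[Dict[str, Any]] = None) -> None:
        # Deep-copy so a template shared by several tasks is not changed in place.
        self.spec: Dict[str, Any] = copy.deepcopy(pod_spec) if pod_spec else {}
        self.spec.setdefault("containers", [])
        self.spec.setdefault("volumes", [])

    def with_container(self, name: str, **fields: Any) -> "PodSpecBuilder":
        self.spec["containers"].append({"name": name, **fields})
        return self

    def with_shm(self, size: str, container_name: str, volume_name: str = "dshm") -> "PodSpecBuilder":
        # Locate the target container first so a bad name leaves the spec untouched.
        container = next((c for c in self.spec["containers"] if c["name"] == container_name), None)
        if container is None:
            raise ValueError(f"No container named {container_name!r} in pod spec")
        # Declare the memory-backed volume once, even if with_shm is called repeatedly.
        if not any(v["name"] == volume_name for v in self.spec["volumes"]):
            self.spec["volumes"].append(
                {"name": volume_name, "emptyDir": {"medium": "Memory", "sizeLimit": size}}
            )
        container.setdefault("volumeMounts", []).append({"name": volume_name, "mountPath": "/dev/shm"})
        return self

    def build(self) -> Dict[str, Any]:
        return self.spec


# Extend an existing spec...
existing = {"containers": [{"name": "primary", "image": "example-image"}]}
spec = PodSpecBuilder(existing).with_shm("5Gi", container_name="primary").build()

# ...or build one from scratch with the chained style.
fresh = PodSpecBuilder().with_container("primary").with_shm("1Gi", container_name="primary").build()
```

Copying the input is a design choice; if true in-place mutation of the caller's template is wanted, as the comment suggests, the deepcopy can simply be dropped.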
Code Review Agent Run #8c9989
Actionable Suggestions - 1
pass

# Verify pod template is attached to task
assert my_task.pod_template == shm
Consider adding a decorator to attach the pod template to my_task
before asserting. Currently, the test may fail as pod_template
might not be properly attached.
Code suggestion
Check the AI-generated fix before applying
@@ -10,5 +10,6 @@
def my_task():
pass
+ my_task = task(pod_template=shm)(my_task)
# Verify pod template is attached to task
assert my_task.pod_template == shm
Code Review Run #8c9989
I had put this together as an MVP to add SHM to a pod, but I would prefer this more extensible solution. I think it could also allow a more intuitive way to define SHM within flytekit.Resources.
Adding SHM to Resources sounds like a good idea, but it's a bigger change (I think I'd want to implement it on the backend), so I think a helper function in the SDK is good until we have that.
from flytekit.core.pod_template import PodTemplate


def attach_shm(name: str, size: str) -> PodTemplate:
Should we rename name -> primary_container_name? I thought it was the name of the mount.
from flytekit.core.pod_template import PodTemplate


def attach_shm(name: str, size: str) -> PodTemplate:
Re Thomas's comment, I was also thinking you were mutating a pod template. If it's just generating a default pod template with SHM, should we rename the function?
Why are the changes needed?
Adding SHM is a necessity for multi-GPU ML training workloads. It is currently not immediately obvious how to do this.
What changes were proposed in this pull request?
This PR adds a convenience function that generates a PodTemplate configured to attach SHM to a task. It also adds a directory for future contributions of similar PodTemplate wrappers.
How was this patch tested?
I have used this function for my workflows to attach SHM.
More tests to be added soon.
Summary by Bito
This PR implements and validates shared memory (SHM) attachment for multi-GPU ML training workloads. It introduces a pod_templates package with an attach_shm utility function for configuring shared memory in tasks. The change includes the core functionality and tests that verify the SHM pod template's properties, including name, size (5Gi), and attachment to the task.
Unit tests added: True
Estimated effort to review (1-5, lower is better): 1