Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Grantham/add-attach_shm-template #3020

Open
wants to merge 5 commits into
base: master
Choose a base branch
from

Conversation

granthamtaylor
Copy link
Contributor

@granthamtaylor granthamtaylor commented Dec 26, 2024

Why are the changes needed?

Adding SHM is a necessity for multi-GPU ML training workloads. It is currently not immediately obvious how to do this.

What changes were proposed in this pull request?

This PR simply adds a convenience function to generate a PodTemplate configured to attach SHM to a task.

Additionally, this PR adds a directory for future contributions around similar PodTemplate wrappers in the future.

How was this patch tested?

I have used this function for my workflows to attach SHM.

More tests to be added soon.

Check all the applicable boxes

  • I updated the documentation accordingly.
  • All new and existing tests passed.
  • All commits are signed-off.

Summary by Bito

This PR implements and validates shared memory (SHM) attachment functionality for multi-GPU ML training workloads. It introduces a pod_templates package with attach_shm utility function for configuring shared memory in tasks. The implementation includes core functionality and comprehensive testing suite that verifies SHM pod template properties including name, size (5Gi), and proper template attachment.

Unit tests added: True

Estimated effort to review (1-5, lower is better): 1

Copy link

codecov bot commented Dec 30, 2024

Codecov Report

Attention: Patch coverage is 0% with 6 lines in your changes missing coverage. Please review.

Project coverage is 75.72%. Comparing base (bc0e8c0) to head (87aa4a7).
Report is 3 commits behind head on master.

Files with missing lines Patch % Lines
flytekit/extras/pod_templates/attach_shm.py 0.00% 4 Missing ⚠️
flytekit/extras/pod_templates/__init__.py 0.00% 2 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff             @@
##           master    #3020       +/-   ##
===========================================
+ Coverage   51.35%   75.72%   +24.37%     
===========================================
  Files         204      203        -1     
  Lines       21446    21280      -166     
  Branches     2729     2733        +4     
===========================================
+ Hits        11014    16115     +5101     
+ Misses       9834     4361     -5473     
- Partials      598      804      +206     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@flyte-bot
Copy link
Contributor

flyte-bot commented Dec 30, 2024

Code Review Agent Run #dd1f99

Actionable Suggestions - 3
  • flytekit/extras/pod_templates/attach_shm.py - 2
    • Consider adding size parameter validation · Line 4-4
    • Potential insecure temp file usage · Line 16-16
  • tests/flytekit/unit/extras/templates/test_pod_templates.py - 1
    • Test lacks assertions for functionality verification · Line 4-10
Review Details
  • Files reviewed - 3 · Commit Range: a77aab8..44e2ea3
    • flytekit/extras/pod_templates/__init__.py
    • flytekit/extras/pod_templates/attach_shm.py
    • tests/flytekit/unit/extras/templates/test_pod_templates.py
  • Files skipped - 0
  • Tools
    • Whispers (Secret Scanner) - ✔︎ Successful
    • Detect-secrets (Secret Scanner) - ✔︎ Successful
    • MyPy (Static Code Analysis) - ✔︎ Successful
    • Astral Ruff (Static Code Analysis) - ✔︎ Successful

AI Code Review powered by Bito Logo

@flyte-bot
Copy link
Contributor

flyte-bot commented Dec 30, 2024

Changelist by Bito

This pull request implements the following key changes.

Key Change Files Impacted
New Feature - Shared Memory Pod Template Implementation

__init__.py - Initialize pod_templates package with attach_shm functionality

attach_shm.py - Implement attach_shm function to create PodTemplate with shared memory configuration

test_pod_templates.py - Add unit tests for attach_shm functionality

from flytekit.core.pod_template import PodTemplate


def attach_shm(name: str, size: str) -> PodTemplate:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider adding size parameter validation

The attach_shm function parameters lack type validation. Consider validating that size follows Kubernetes resource quantity format (e.g., '1Gi', '500Mi') to prevent runtime errors.

Code suggestion
Check the AI-generated fix before applying
Suggested change
def attach_shm(name: str, size: str) -> PodTemplate:
def attach_shm(name: str, size: str) -> PodTemplate:
import re
size_pattern = r'^[0-9]+(Gi|Mi|Ki|G|M|K)?$'
if not re.match(size_pattern, size):
raise ValueError(f"Invalid size format: {size}. Expected format like '1Gi', '500Mi'")

Code Review Run #dd1f99


Is this a valid issue, or was it incorrectly flagged by the Agent?

  • it was incorrectly flagged

return PodTemplate(
primary_container_name=name,
pod_spec=V1PodSpec(
containers=[V1Container(name=name, volume_mounts=[V1VolumeMount(mount_path="/dev/shm", name="dshm")])],
Copy link
Contributor

@flyte-bot flyte-bot Dec 30, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Potential insecure temp file usage

The hardcoded path '/dev/shm' could potentially be insecure. Consider making this path configurable or validating it before use.

Code suggestion
Check the AI-generated fix before applying
 @@ -4,6 +4,6 @@
 def attach_shm(name: str, size: str) -> PodTemplate:
 -            containers=[V1Container(name=name, volume_mounts=[V1VolumeMount(mount_path="/dev/shm", name="dshm")])],
 +            containers=[V1Container(name=name, volume_mounts=[V1VolumeMount(mount_path="/dev/shm", name="dshm", read_only=True)])],

Code Review Run #dd1f99


Is this a valid issue, or was it incorrectly flagged by the Agent?

  • it was incorrectly flagged

Copy link
Member

@thomasjpfan thomasjpfan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be API for PodTemplate itself, to allow for mutating an existing pod template?

from flytekit.core.pod_template import PodTemplate

pod_template = (
    PodTemplate()
        .with_container(...)
        .with_shim(...)
)

@flyte-bot
Copy link
Contributor

flyte-bot commented Dec 30, 2024

Code Review Agent Run #8c9989

Actionable Suggestions - 1
  • tests/flytekit/unit/extras/templates/test_pod_templates.py - 1
    • Missing task decorator for pod template · Line 14-14
Review Details
  • Files reviewed - 1 · Commit Range: 44e2ea3..067c902
    • tests/flytekit/unit/extras/templates/test_pod_templates.py
  • Files skipped - 0
  • Tools
    • Whispers (Secret Scanner) - ✔︎ Successful
    • Detect-secrets (Secret Scanner) - ✔︎ Successful
    • MyPy (Static Code Analysis) - ✔︎ Successful
    • Astral Ruff (Static Code Analysis) - ✔︎ Successful

AI Code Review powered by Bito Logo

pass

# Verify pod template is attached to task
assert my_task.pod_template == shm
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing task decorator for pod template

Consider adding a decorator to attach the pod template to my_task before asserting. Currently, the test may fail as pod_template might not be properly attached.

Code suggestion
Check the AI-generated fix before applying
 @@ -10,5 +10,6 @@
      def my_task():
          pass
 
 +    my_task = task(pod_template=shm)(my_task)
      # Verify pod template is attached to task
      assert my_task.pod_template == shm

Code Review Run #8c9989


Is this a valid issue, or was it incorrectly flagged by the Agent?

  • it was incorrectly flagged

@granthamtaylor
Copy link
Contributor Author

Should this be API for PodTemplate itself, to allow for mutating an existing pod template?

from flytekit.core.pod_template import PodTemplate



pod_template = (

    PodTemplate()

        .with_container(...)

        .with_shim(...)

)

I had put this together as a MVP to add SHM to a pod, but I would like this more extensible solution. I think that it could allow for the more intuitive ability to define SHM within flytekit.Resources

Copy link
Contributor

@wild-endeavor wild-endeavor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

adding shm to resources sounds like a good idea but it's a bigger change (i think i'd want to implement it on the backend) so a helper function in the sdk i think is good until we have that.

from flytekit.core.pod_template import PodTemplate


def attach_shm(name: str, size: str) -> PodTemplate:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we make name -> primary_container_name? i thought it was the name of the mount

from flytekit.core.pod_template import PodTemplate


def attach_shm(name: str, size: str) -> PodTemplate:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

re thomas's comment, i was also thinking you were mutating a pod template. if it's just generating a default pod template with an shm, should we rename the function?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants