
Change gpu_device_plugin_yaml default value #107

Merged
tatiana merged 6 commits into main from gpu-device-plugin-yaml-default-value
Dec 3, 2024


Collaborator

@tatiana tatiana commented Dec 2, 2024

This is a breaking change. Since we are still below 1.0, this feels like the right moment to introduce it.

Previously, by default, the Ray Provider would force the creation of a GPU device plugin using https://raw.githubusercontent.com/NVIDIA/k8s-device-plugin/v0.9.0/nvidia-device-plugin.yml whenever setting up a RayCluster. Most users do not need this plugin, and forcing it raised errors for many people trying out the provider.

For example, users without an NVIDIA device available would hit the following errors:

[2024-11-29, 15:24:16 UTC] {hooks.py:630} WARNING - DaemonSet not found: (404)
Reason: Not Found
HTTP response headers: HTTPHeaderDict({'Audit-Id': '5397225a-4ce2-4f65-81ba-2677b315fedb', 'Cache-Control': 'no-cache, private', 'Content-Type': 'application/json', 'X-Kubernetes-Pf-Flowschema-Uid': '955e8bb0-08b1-4d45-a768-e49387a9767c', 'X-Kubernetes-Pf-Prioritylevel-Uid': 'd5240328-288d-4366-b094-d8fd793c7431', 'Date': 'Fri, 29 Nov 2024 15:24:16 GMT', 'Content-Length': '260'})
HTTP response body: {"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"daemonsets.apps \"nvidia-device-plugin-daemonset\" not found","reason":"NotFound","details":{"name":"nvidia-device-plugin-daemonset","group":"apps","kind":"daemonsets"},"code":404}
[2024-11-29, 15:24:16 UTC] {hooks.py:427} INFO - Creating DaemonSet for NVIDIA device plugin...
[2024-11-29, 15:24:16 UTC] {hooks.py:653} ERROR - Exception when creating DaemonSet: (400)
Reason: Bad Request
HTTP response headers: HTTPHeaderDict({'Audit-Id': 'b8360148-5f7c-4060-ae2c-424d9ac13a8c', 'Cache-Control': 'no-cache, private', 'Content-Type': 'application/json', 'X-Kubernetes-Pf-Flowschema-Uid': '955e8bb0-08b1-4d45-a768-e49387a9767c', 'X-Kubernetes-Pf-Prioritylevel-Uid': 'd5240328-288d-4366-b094-d8fd793c7431', 'Date': 'Fri, 29 Nov 2024 15:24:16 GMT', 'Content-Length': '200'})
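
A minimal sketch of the behaviour this change aims for, under stated assumptions: the description above does not say what the new default of `gpu_device_plugin_yaml` is, so the sketch assumes it is unset (`None` or an empty string) and that the DaemonSet is only created when a manifest is explicitly configured. The helper name `setup_gpu_plugin` and the `create_daemonset` callback are hypothetical and are not the provider's actual API.

```python
from typing import Callable, Optional

# Manifest URL quoted in the PR description (the previous default value).
NVIDIA_PLUGIN_YAML = (
    "https://raw.githubusercontent.com/NVIDIA/k8s-device-plugin/"
    "v0.9.0/nvidia-device-plugin.yml"
)


def setup_gpu_plugin(
    gpu_device_plugin_yaml: Optional[str],
    create_daemonset: Callable[[str], None],
) -> bool:
    """Hypothetical helper: create the GPU device plugin only on request.

    Returns True if the DaemonSet creation was attempted, False if it was
    skipped because no manifest was configured (the assumed new default).
    """
    if not gpu_device_plugin_yaml:
        # Assumed new default: no manifest, so no DaemonSet call is made and
        # clusters without NVIDIA devices never reach the failing request.
        return False
    create_daemonset(gpu_device_plugin_yaml)
    return True


if __name__ == "__main__":
    created = []
    # Default (unset): nothing is created.
    print(setup_gpu_plugin(None, created.append), created)
    # Explicit opt-in: the manifest is passed to the creation callback.
    print(setup_gpu_plugin(NVIDIA_PLUGIN_YAML, created.append), len(created))
```

Under this assumption, users who still want the plugin would pass the manifest URL explicitly, which is what makes the change breaking for anyone relying on the old default.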

@codecov-commenter

codecov-commenter commented Dec 2, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 98.53%. Comparing base (c52202a) to head (7d2e3af).

Additional details and impacted files
@@           Coverage Diff           @@
##             main     #107   +/-   ##
=======================================
  Coverage   98.52%   98.53%           
=======================================
  Files           7        7           
  Lines         610      613    +3     
=======================================
+ Hits          601      604    +3     
  Misses          9        9           


@tatiana tatiana force-pushed the gpu-device-plugin-yaml-default-value branch from d955e34 to 9bf426a Compare December 2, 2024 14:07
Collaborator

@pankajastro pankajastro left a comment


It looks great! Should we add the breaking-change entry to the changelog in this PR, or wait until the release to add it?

@tatiana tatiana merged commit 18182c1 into main Dec 3, 2024
22 checks passed
@pankajastro pankajastro deleted the gpu-device-plugin-yaml-default-value branch December 3, 2024 19:27
3 participants