Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(api): Change default GPU Request value and prevent null values from being accepted #533

Merged
merged 2 commits into from
Feb 14, 2024

Conversation

deadlycoconuts
Copy link
Contributor

Description

As of present, users are allowed to submit null or 0 values as a GPU request value even when a GPU is configured (when the GPU name field is set) for Merlin models. This causes the Merlin model to be deployed on a regular default node without the proper node selectors or tolerations applied which is an unexpected behaviour from users. In no scenario would a user intend to have 0 GPUs selected for a specific GPU type when he/she intends to use a GPU. However, this is not prevented by the API server nor the UI.

A user using the UI to change his/her GPU configuration might thus potentially end up redeploying his/her model in a regular node despite thinking that it has been configured to work with a GPU.

This PR addresses this issue with two approaches, 1) to prevent the 'None' GPU request value from being shown in the UI as a default value when a GPU is selected and to show the first available GPU request option instead and, 2) to make the API server throw an error if it receives any deployment request not having a GPU request value set when a GPU is configured.

The snippet below shows how the UI changes would look like:

Screen.Recording.2024-02-08.at.2.55.07.PM.mov

Modifications

  • ui/src/pages/version/components/forms/components/ResourcesPanel.js - Display first available GPU request value (set in the Helm chart values of the Merlin API server deployment) when a new GPU name is selected
  • api/cluster/resource/templater.go - Add an additional check to ensure that the GPU request value is not 0 when a GPU is set

Checklist

  • Added PR label
  • Added unit test, integration, and/or e2e tests
  • Tested locally
  • Updated documentation
  • Update Swagger spec if the PR introduce API changes
  • Regenerated Golang and Python client if the PR introduces API changes

Release Notes

Users are now no longer able to select a GPU without setting a non-zero GPU request value.

@deadlycoconuts deadlycoconuts added the bug Something isn't working label Feb 8, 2024
@deadlycoconuts deadlycoconuts self-assigned this Feb 8, 2024
@ghost
Copy link

ghost commented Feb 8, 2024

👇 Click on the image for a new way to code review

Review these changes using an interactive CodeSee Map

Legend

CodeSee Map legend

Copy link
Contributor

@leonlnj leonlnj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm thanks

@deadlycoconuts deadlycoconuts force-pushed the change_default_gpu_request_value branch from d28b59c to 3252d76 Compare February 14, 2024 06:46
@deadlycoconuts deadlycoconuts merged commit a999a9c into main Feb 14, 2024
32 checks passed
@deadlycoconuts deadlycoconuts deleted the change_default_gpu_request_value branch February 14, 2024 09:54
leonlnj pushed a commit that referenced this pull request Feb 20, 2024
…om being accepted (#533)

# Description
As of present, users are allowed to submit null or `0` values as a GPU
request value even when a GPU is configured (when the GPU name field is
set) for Merlin models. This causes the Merlin model to be deployed on a
regular default node without the proper node selectors or tolerations
applied which is an unexpected behaviour from users. In no scenario
would a user intend to have 0 GPUs selected for a specific GPU type when
he/she intends to use a GPU. However, this is not prevented by the API
server nor the UI.

A user using the UI to change his/her GPU configuration might thus
potentially end up redeploying his/her model in a regular node despite
thinking that it has been configured to work with a GPU.

This PR addresses this issue with two approaches, 1) to prevent the
'None' GPU request value from being shown in the UI as a default value
when a GPU is selected and to show the first available GPU request
option instead and, 2) to make the API server throw an error if it
receives any deployment request not having a GPU request value set when
a GPU is configured.

The snippet below shows how the UI changes would look like:


https://github.com/caraml-dev/merlin/assets/36802364/e599b373-888c-45b8-9424-85282e035465

# Modifications
- `ui/src/pages/version/components/forms/components/ResourcesPanel.js` -
Display first available GPU request value (set in the Helm chart values
of the Merlin API server deployment) when a new GPU name is selected
- `api/cluster/resource/templater.go` - Add an additional check to
ensure that the GPU request value is not 0 when a GPU is set

# Checklist
- [x] Added PR label
- [x] Added unit test, integration, and/or e2e tests
- [x] Tested locally
- [ ] Updated documentation
- [ ] Update Swagger spec if the PR introduce API changes
- [ ] Regenerated Golang and Python client if the PR introduces API
changes

# Release Notes
```release-note
Users are now no longer able to select a GPU without setting a non-zero GPU request value.
```
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants