Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SP-4263 Implement compute quotas #719

Open
wants to merge 67 commits into
base: main
Choose a base branch
from

Conversation

anujachaitanya
Copy link
Contributor

@anujachaitanya anujachaitanya commented Oct 28, 2024

Changes

  1. Created separate helm chart for queues creation
  2. Added variable for localqueue in launch files for dynamic selection
  3. Localqueue selection based on user groups
  4. Introduced KubectlCommandBuilder
  5. Updated kueue version to 0.9.1

CC: @abhishekghoshhh

anujachaitanya and others added 29 commits November 11, 2024 11:15
Copy link

codecov bot commented Nov 11, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 21.00%. Comparing base (5bcd4b2) to head (74d33ee).

Additional details and impacted files
@@             Coverage Diff              @@
##               main     #719      +/-   ##
============================================
+ Coverage     18.61%   21.00%   +2.39%     
- Complexity      102      129      +27     
============================================
  Files            22       24       +2     
  Lines          1961     2061     +100     
  Branches        270      278       +8     
============================================
+ Hits            365      433      +68     
- Misses         1544     1574      +30     
- Partials         52       54       +2     
Flag Coverage Δ
skaha-unittests-coverage 21.00% <ø> (+2.39%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@anujachaitanya anujachaitanya force-pushed the main branch 4 times, most recently from 61416bd to 592b50d Compare November 25, 2024 10:25
@anujachaitanya
Copy link
Contributor Author

Any updates on this @at88mph ?

@shinybrar
Copy link
Collaborator

shinybrar commented Dec 13, 2024

@anujachaitanya @abhishekghoshhh

Can you shed light on the reason to use transfer the main Kueue Helm Chart into this repository? We can potentially use the upstream Kueue Helm Chart as a dependency in the base chart, and appropriately configure it.

@anujachaitanya
Copy link
Contributor Author

@anujachaitanya @abhishekghoshhh

Can you shed light on the reason to use transfer the main Kueue Helm Chart into this repository? We can potentially use the upstream Kueue Helm Chart as a dependency in the base chart, and appropriately configure it.

Hello
@shinybrar, Kueue needs kueue-system as the namespace to run. We tried adding kueue as a dependency to the base helm chart but helm does not provide the functionality to configure namespaces for dependencies. Hence we could not add kueue as dependency.

@shinybrar
Copy link
Collaborator

shinybrar commented Dec 13, 2024

@shinybrar, Kueue needs kueue-system as the namespace to run. We tried adding kueue as a dependency to the base helm chart but helm does not provide the functionality to configure namespaces for dependencies. Hence we could not add kueue as dependency.

We can require to have a separate installation command in install documentation for kueue, e.g.

helm install kueue/chart/location/ --create-namespace --namespace kueue-system my-override-values.yaml

Ideally, we want to have just one amended values.yaml, rather than merging the entire upstream kueue helm chart into mainline, because currently, if kubernetes-sigs/kueue releases a new feature, we have to modify/sync the entire chart for compatibility, rather than enable a config or feature flag in our values.yaml.

It is not strictly required to install kueue in a namespace named kueue-system, even though it is the default and recommended setup and all official manifests and documentation refer back to it, so we can also install it in skaha-system namespace if needed.

Finally, queue-config should not be a helm chart, but rather just sample templates under the Base Helm Chart which are created only when the helm installer detects a Kueue CRD installed on the system, and similarly there should be a feature flag set in skaha via helm install as a environment variable, which enables kueue.

SKAHA_KUEUE_ENABLES=TRUE

Since every cluster, will have a different hardware topology, ResourceFlavors and ClusterQueues we cannot make a hard requirement. Every SRCnet Node administrator, should amend the the Base Helm Chart, with their respective deployments environments.

Finally, we are adding a significantly large system, and we need a documentation effort for this feature.

@anujachaitanya
Copy link
Contributor Author

We can require to have a separate installation command in install documentation for kueue, e.g.

When we first developed the Kueue feature for CANFAR, Kueue didn't have a Helm repository for installation, so we had to manually add the Helm repo. However, since the latest version of Kueue now includes a Helm repo, we can use a separate installation process.

Finally, queue-config should not be a helm chart

This Helm chart has been added for administrators to create the clusterqueue and localqueues. To introduce multiple groups, the configuration will need to be modified. If the configuration is included in the base, the base will need to be redeployed each time a queue is created. Therefore, the responsibility has been separated to avoid unnecessary redeployments.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants