[SP-4263 | Abhishek/Anuja]updated the kueue version to 0.9.1
anujachaitanya committed Nov 22, 2024
1 parent 08ab56c commit fc8bc76
Showing 23 changed files with 1,061 additions and 591 deletions.
2 changes: 1 addition & 1 deletion deployment/helm/kueue/Chart.yaml
@@ -18,4 +18,4 @@ version: 0.1.0
# incremented each time you make changes to the application. Versions are not expected to
# follow Semantic Versioning. They should reflect the version the application is using.
# It is recommended to use it with quotes.
appVersion: "v0.7.1"
appVersion: "v0.9.1"
41 changes: 39 additions & 2 deletions deployment/helm/kueue/README.md
@@ -8,20 +8,25 @@
- [Installing the chart](#installing-the-chart)
- [Install chart using Helm v3.0+](#install-chart-using-helm-v30)
- [Verify that controller pods are running properly.](#verify-that-controller-pods-are-running-properly)
- [Configuration](#configuration)
<!-- /toc -->


### Installation

Quick start instructions for the setup and configuration of kueue using Helm.

#### Prerequisites

- [Helm](https://helm.sh/docs/intro/quickstart/#install-helm)
- (Optional) [Cert-manager](https://cert-manager.io/docs/installation/)

#### Installing the chart

##### Install chart using Helm v3.0+

```
Either clone the kueue repository:

```bash
$ git clone https://github.com/opencadc/science-platform.git
$ cd science-platform/deployment/helm
$ helm install --create-namespace --namespace kueue-system --values ./kueue/values.yaml <name> ./kueue
@@ -34,3 +39,35 @@ $ kubectl get deploy -n kueue-system
NAME READY UP-TO-DATE AVAILABLE AGE
kueue-controller-manager 1/1 1 1 7s
```

### Configuration

The following table lists the configurable parameters of the kueue chart and their default values.

| Parameter | Description | Default |
|--------------------------------------------------------|--------------------------------------------------------|---------------------------------------------|
| `nameOverride` | override the resource name | `` |
| `fullnameOverride` | override the resource name | `` |
| `enablePrometheus` | enable Prometheus | `false` |
| `enableCertManager` | enable CertManager | `false` |
| `enableVisibilityAPF` | enable APF for the visibility API | `false` |
| `controllerManager.kubeRbacProxy.image` | controllerManager.kubeRbacProxy's image | `gcr.io/kubebuilder/kube-rbac-proxy:v0.8.0` |
| `controllerManager.manager.image.repository` | controllerManager.manager's repository and image | `us-central1-docker.pkg.dev/k8s-staging-images/kueue/kueue` |
| `controllerManager.manager.image.tag` | controllerManager.manager's tag | `main` |
| `controllerManager.manager.resources` | controllerManager.manager's resources | abbr. |
| `controllerManager.replicas` | ControllerManager's replicaCount | `1` |
| `controllerManager.imagePullSecrets` | ControllerManager's imagePullSecrets | `[]` |
| `controllerManager.readinessProbe.initialDelaySeconds` | ControllerManager's readinessProbe initialDelaySeconds | `5` |
| `controllerManager.readinessProbe.periodSeconds` | ControllerManager's readinessProbe periodSeconds | `10` |
| `controllerManager.readinessProbe.timeoutSeconds` | ControllerManager's readinessProbe timeoutSeconds | `1` |
| `controllerManager.readinessProbe.failureThreshold` | ControllerManager's readinessProbe failureThreshold | `3` |
| `controllerManager.readinessProbe.successThreshold` | ControllerManager's readinessProbe successThreshold | `1` |
| `controllerManager.livenessProbe.initialDelaySeconds` | ControllerManager's livenessProbe initialDelaySeconds | `15` |
| `controllerManager.livenessProbe.periodSeconds` | ControllerManager's livenessProbe periodSeconds | `20` |
| `controllerManager.livenessProbe.timeoutSeconds` | ControllerManager's livenessProbe timeoutSeconds | `1` |
| `controllerManager.livenessProbe.failureThreshold` | ControllerManager's livenessProbe failureThreshold | `3` |
| `controllerManager.livenessProbe.successThreshold` | ControllerManager's livenessProbe successThreshold | `1` |
| `kubernetesClusterDomain` | kubernetesCluster's Domain | `cluster.local` |
| `managerConfig.controllerManagerConfigYaml` | controllerManagerConfigYaml | abbr. |
| `metricsService` | metricsService's ports | abbr. |
| `webhookService` | webhookService's ports | abbr. |
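
As a rough illustration of how the parameters in the table above might be overridden, the sketch below shows a small values file. The file name `my-values.yaml` and the chosen values are assumptions made for illustration only; the keys themselves are taken from the table.

```yaml
# my-values.yaml (hypothetical file name; values are illustrative, not recommendations)
enableCertManager: true   # assumes cert-manager is already installed in the cluster
enablePrometheus: false   # chart default, shown only for completeness
controllerManager:
  replicas: 2             # chart default is 1
  readinessProbe:
    initialDelaySeconds: 10   # chart default is 5
```

A file like this could be passed as an additional `--values` argument to the `helm install` command shown in the installation section. When several values files are given, Helm gives precedence to the last one specified.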

@@ -8,7 +8,7 @@ metadata:
{{- if .Values.enableCertManager }}
cert-manager.io/inject-ca-from: {{ .Release.Namespace }}/{{ include "kueue.fullname" . }}-serving-cert
{{- end }}
controller-gen.kubebuilder.io/version: v0.15.0
controller-gen.kubebuilder.io/version: v0.16.5
name: admissionchecks.kueue.x-k8s.io
spec:
conversion:
@@ -90,10 +90,10 @@ spec:
retryDelayMinutes:
default: 15
description: |-
RetryDelayMinutes **deprecated** specifies how long to keep the workload suspended after
RetryDelayMinutes specifies how long to keep the workload suspended after
a failed check (after it transitioned to False). When the delay period has passed, the check
state goes to "Unknown". The default is 15 min.
The default is 15 min.
Deprecated: retryDelayMinutes has already been deprecated since v0.8 and will be removed in v1beta2.
format: int64
type: integer
required:
@@ -107,16 +107,8 @@ spec:
conditions hold the latest available observations of the AdmissionCheck
current state.
items:
description: "Condition contains details for one aspect of the current
state of this API Resource.\n---\nThis struct is intended for
direct use as an array at the field path .status.conditions. For
example,\n\n\n\ttype FooStatus struct{\n\t // Represents the
observations of a foo's current state.\n\t // Known .status.conditions.type
are: \"Available\", \"Progressing\", and \"Degraded\"\n\t //
+patchMergeKey=type\n\t // +patchStrategy=merge\n\t // +listType=map\n\t
\ // +listMapKey=type\n\t Conditions []metav1.Condition `json:\"conditions,omitempty\"
patchStrategy:\"merge\" patchMergeKey:\"type\" protobuf:\"bytes,1,rep,name=conditions\"`\n\n\n\t
\ // other fields\n\t}"
description: Condition contains details for one aspect of the current
state of this API Resource.
properties:
lastTransitionTime:
description: |-
@@ -157,12 +149,7 @@ spec:
- Unknown
type: string
type:
description: |-
type of condition in CamelCase or in foo.example.com/CamelCase.
---
Many .condition.type values are consistent across resources like Available, but because arbitrary conditions can be
useful (see .node.status.conditions), the ability to deconflict is important.
The regex it matches is (dns1123SubdomainFmt/)?(qualifiedNameFmt)
description: type of condition in CamelCase or in foo.example.com/CamelCase.
maxLength: 316
pattern: ^([a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*/)?(([A-Za-z0-9][-A-Za-z0-9_.]*)?[A-Za-z0-9])$
type: string

@@ -8,7 +8,7 @@ metadata:
{{- if .Values.enableCertManager }}
cert-manager.io/inject-ca-from: {{ .Release.Namespace }}/{{ include "kueue.fullname" . }}-serving-cert
{{- end }}
controller-gen.kubebuilder.io/version: v0.15.0
controller-gen.kubebuilder.io/version: v0.16.5
name: clusterqueues.kueue.x-k8s.io
spec:
conversion:
@@ -117,19 +117,16 @@ spec:
cohort that this ClusterQueue belongs to. CQs that belong to the
same cohort can borrow unused resources from each other.
A CQ can be a member of a single borrowing cohort. A workload submitted
to a queue referencing this CQ can borrow quota from any CQ in the cohort.
Only quota for the [resource, flavor] pairs listed in the CQ can be
borrowed.
If empty, this ClusterQueue cannot borrow from any other ClusterQueue and
vice versa.
A cohort is a name that links CQs together, but it doesn't reference any
object.
Validation of a cohort name is equivalent to that of object names:
subdomain in DNS (RFC 1123).
maxLength: 253
@@ -169,7 +166,6 @@ spec:
whenCanBorrow determines whether a workload should try the next flavor
before borrowing in current flavor. The possible values are:
- `Borrow` (default): allocate in current flavor if borrowing
is possible.
- `TryNextFlavor`: try next flavor even if the current
@@ -184,7 +180,6 @@ spec:
whenCanPreempt determines whether a workload should try the next flavor
before borrowing in current flavor. The possible values are:
- `Preempt`: allocate in current flavor if it's possible to preempt some workloads.
- `TryNextFlavor` (default): try next flavor even if there are enough
candidates for preemption in the current flavor.
@@ -250,18 +245,15 @@ spec:
preemption describes policies to preempt Workloads from this ClusterQueue
or the ClusterQueue's cohort.
Preemption can happen in two scenarios:
- When a Workload fits within the nominal quota of the ClusterQueue, but
the quota is currently borrowed by other ClusterQueues in the cohort.
Preempting Workloads in other ClusterQueues allows this ClusterQueue to
reclaim its nominal quota.
- When a Workload doesn't fit within the nominal quota of the ClusterQueue
and there are admitted Workloads in the ClusterQueue with lower priority.
The preemption algorithm tries to find a minimal set of Workloads to
preempt to accomomdate the pending Workload, preempting Workloads with
lower priority first.
@@ -303,14 +295,17 @@ spec:
Workloads from other ClusterQueues in the cohort that are using more than
their nominal quota. The possible values are:
- `Never` (default): do not preempt Workloads in the cohort.
- `LowerPriority`: if the pending Workload fits within the nominal
quota of its ClusterQueue, only preempt Workloads in the cohort that have
lower priority than the pending Workload.
- `Any`: if the pending Workload fits within the nominal quota of its
ClusterQueue, preempt any Workload in the cohort, irrespective of
priority.
- `LowerPriority`: **Classic Preemption** if the pending Workload
fits within the nominal quota of its ClusterQueue, only preempt
Workloads in the cohort that have lower priority than the pending
Workload. **Fair Sharing** only preempt Workloads in the cohort that
have lower priority than the pending Workload and that satisfy the
fair sharing preemptionStategies.
- `Any`: **Classic Preemption** if the pending Workload fits within
the nominal quota of its ClusterQueue, preempt any Workload in the
cohort, irrespective of priority. **Fair Sharing** preempt Workloads
in the cohort that satisfy the fair sharing preemptionStrategies.
enum:
- Never
- LowerPriority
@@ -323,7 +318,6 @@ spec:
within the nominal quota for its ClusterQueue, can preempt active Workloads in
the ClusterQueue. The possible values are:
- `Never` (default): do not preempt Workloads in the ClusterQueue.
- `LowerPriority`: only preempt Workloads in the ClusterQueue that have
lower priority than the pending Workload.
@@ -347,7 +341,6 @@ spec:
across the queues in this ClusterQueue.
Current Supported Strategies:
- StrictFIFO: workloads are ordered strictly by creation time.
Older workloads that can't be admitted will block admitting newer
workloads even if they fit available quota.
@@ -434,8 +427,7 @@ spec:
all the nominalQuota can be borrowed by other clusterQueues in the cohort.
If not null, it must be non-negative.
lendingLimit must be null if spec.cohort is empty.
This field is in alpha stage. To be able to use this field,
enable the feature gate LendingLimit, which is disabled by default.
This field is in beta stage and is enabled by default.
pattern: ^(\+|-)?(([0-9]+(\.[0-9]*)?)|(\.[0-9]+))(([KMGTPE]i)|[numkMGTPE]|([eE](\+|-)?(([0-9]+(\.[0-9]*)?)|(\.[0-9]+))))?$
x-kubernetes-int-or-string: true
name:
@@ -455,7 +447,6 @@ spec:
should account for resources that can be provided by a component such as
Kubernetes cluster-autoscaler.
If the ClusterQueue belongs to a cohort, the sum of the quotas for each
(flavor, resource) combination defines the maximum quantity that can be
allocated by a ClusterQueue in the cohort.
@@ -498,10 +489,8 @@ spec:
stopPolicy - if set to a value different from None, the ClusterQueue is considered Inactive, no new reservation being
made.
Depending on its value, its associated workloads will:
- None - Workloads are admitted
- HoldAndDrain - Admitted workloads are evicted and Reserving workloads will cancel the reservation.
- Hold - Admitted workloads will run to completion and Reserving workloads will cancel the reservation.
@@ -529,16 +518,8 @@ spec:
conditions hold the latest available observations of the ClusterQueue
current state.
items:
description: "Condition contains details for one aspect of the current
state of this API Resource.\n---\nThis struct is intended for
direct use as an array at the field path .status.conditions. For
example,\n\n\n\ttype FooStatus struct{\n\t // Represents the
observations of a foo's current state.\n\t // Known .status.conditions.type
are: \"Available\", \"Progressing\", and \"Degraded\"\n\t //
+patchMergeKey=type\n\t // +patchStrategy=merge\n\t // +listType=map\n\t
\ // +listMapKey=type\n\t Conditions []metav1.Condition `json:\"conditions,omitempty\"
patchStrategy:\"merge\" patchMergeKey:\"type\" protobuf:\"bytes,1,rep,name=conditions\"`\n\n\n\t
\ // other fields\n\t}"
description: Condition contains details for one aspect of the current
state of this API Resource.
properties:
lastTransitionTime:
description: |-
@@ -579,12 +560,7 @@ spec:
- Unknown
type: string
type:
description: |-
type of condition in CamelCase or in foo.example.com/CamelCase.
---
Many .condition.type values are consistent across resources like Available, but because arbitrary conditions can be
useful (see .node.status.conditions), the ability to deconflict is important.
The regex it matches is (dns1123SubdomainFmt/)?(qualifiedNameFmt)
description: type of condition in CamelCase or in foo.example.com/CamelCase.
maxLength: 316
pattern: ^([a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*/)?(([A-Za-z0-9][-A-Za-z0-9_.]*)?[A-Za-z0-9])$
type: string
@@ -734,6 +710,9 @@ spec:
description: |-
PendingWorkloadsStatus contains the information exposed about the current
status of the pending workloads in the cluster queue.
Deprecated: This field will be removed on v1beta2, use VisibilityOnDemand
(https://kueue.sigs.k8s.io/docs/tasks/manage/monitor_pending_workloads/pending_workloads_on_demand/)
instead.
properties:
clusterQueuePendingWorkload:
description: Head contains the list of top pending workloads.