# GPU Quota Management with Kueue
Kueue provides job queuing and GPU quota management for OpenShift AI training workloads. It controls which jobs can run, how many GPUs they consume, and supports preemption for priority-based scheduling. Use Kueue when you need to share GPU resources across multiple teams or workloads with fair scheduling.
## Dependencies
| Requirement | Type | Path |
|---|---|---|
| cert-manager Operator | Operator | components/operators/cert-manager/ |
| Kueue Operator | Operator | components/operators/kueue-operator/ |
| Kueue Instance | Instance | components/instances/kueue-instance/ |
| Kueue Config (Flavors + Queue) | Instance | components/instances/kueue-config/ |
| GPU Infrastructure | Operator + Instance | See gpu-infrastructure.md |
> **cert-manager is required.** The official RHOAI 3.3 documentation lists cert-manager as a dependency for Kueue-based workloads. Install it before deploying the Kueue Operator.
## Why Standalone?

RHOAI 3.3 includes a Kueue component in the DSC, but the standalone Red Hat Build of Kueue Operator provides newer features and independent lifecycle management. Setting `kueue: Unmanaged` in the DSC tells RHOAI not to install its own Kueue -- the standalone operator takes over.
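As a rough sketch, that setting lives under the DataScienceCluster's component spec (the resource name `default-dsc` is an assumption; adjust to your cluster's DSC):

```yaml
apiVersion: datasciencecluster.opendatahub.io/v1
kind: DataScienceCluster
metadata:
  name: default-dsc   # name is an assumption
spec:
  components:
    kueue:
      # Unmanaged: RHOAI does not install or reconcile its own Kueue,
      # leaving the standalone Kueue Operator in control
      managementState: Unmanaged
```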
## Deploy

Kueue is deployed automatically via ApplicationSet-discovered Applications:

- `operator-kueue-operator` -- Kueue subscription
- `instance-kueue-instance` -- Kueue operator instance
- `instance-kueue-config` -- ResourceFlavors + ClusterQueue
## Verify

```shell
# Kueue controller running
oc get pods -n openshift-kueue-operator

# ClusterQueue created
oc get clusterqueue training-cluster-queue

# ResourceFlavors created
oc get resourceflavors
```
## How It Works

### ResourceFlavors

The default configuration uses a generic `default-gpu` ResourceFlavor that works with any NVIDIA GPU:
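A minimal sketch of such a flavor (a ResourceFlavor with no `nodeLabels` matches any node, so any GPU node can satisfy it):

```yaml
apiVersion: kueue.x-k8s.io/v1beta1
kind: ResourceFlavor
metadata:
  # No spec.nodeLabels: this flavor places workloads on any node,
  # so any NVIDIA GPU counts against the same quota
  name: default-gpu
```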
For GPU-type-specific scheduling, see `components/instances/kueue-config/examples/resource-flavors-nvidia.yaml` for flavors that target specific GPU products (L4, L40S) via node labels.
### ClusterQueue
The ClusterQueue defines resource quotas:
| Resource | Quota |
|---|---|
| CPU | 48 |
| Memory | 192Gi |
| nvidia.com/gpu | 8 |
Preemption is enabled: `LowerPriority` within the queue, `Any` within the cohort.
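Putting the quotas and preemption policy together, the ClusterQueue looks roughly like this (a sketch matching the table above; check `components/instances/kueue-config/cluster-queue.yaml` for the authoritative version):

```yaml
apiVersion: kueue.x-k8s.io/v1beta1
kind: ClusterQueue
metadata:
  name: training-cluster-queue
spec:
  namespaceSelector: {}   # admit workloads from any namespace with a LocalQueue
  resourceGroups:
  - coveredResources: ["cpu", "memory", "nvidia.com/gpu"]
    flavors:
    - name: default-gpu
      resources:
      - name: "cpu"
        nominalQuota: 48
      - name: "memory"
        nominalQuota: 192Gi
      - name: "nvidia.com/gpu"
        nominalQuota: 8
  preemption:
    withinClusterQueue: LowerPriority   # preempt lower-priority jobs in this queue
    reclaimWithinCohort: Any            # reclaim borrowed quota from any cohort member
```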
### LocalQueue
Workloads submit to a namespaced LocalQueue that points to the ClusterQueue.
The ToolOrchestra use case creates one automatically:
```yaml
apiVersion: kueue.x-k8s.io/v1beta1
kind: LocalQueue
metadata:
  name: training-queue
  namespace: orchestrator-rhoai
spec:
  clusterQueue: training-cluster-queue
```
## Submitting a Job Through Kueue

Add the `kueue.x-k8s.io/queue-name` label to your job:
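For example, a batch Job targeting the LocalQueue above (job name, image, and resource requests are illustrative):

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: example-training-job   # name is illustrative
  namespace: orchestrator-rhoai
  labels:
    # Routes the Job through Kueue via the namespace's LocalQueue
    kueue.x-k8s.io/queue-name: training-queue
spec:
  suspend: true   # Kueue unsuspends the Job once quota is admitted
  template:
    spec:
      restartPolicy: Never
      containers:
      - name: trainer
        image: quay.io/example/trainer:latest   # image is illustrative
        resources:
          requests:
            nvidia.com/gpu: 1
          limits:
            nvidia.com/gpu: 1
```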
Supported workload types (configured in the Kueue instance):

- `batch/v1.Job`
- `ray.io/v1.RayJob`
- `ray.io/v1.RayCluster`
- `jobset.x-k8s.io/v1alpha2.JobSet`
- `kubeflow.org/v1.PyTorchJob`
- `trainer.kubeflow.org/v1alpha1.TrainJob`
## Customizing Quotas

Edit `components/instances/kueue-config/cluster-queue.yaml` to change quota limits, add new flavors, or adjust preemption policies.
To use GPU-type-specific flavors instead of the generic `default-gpu`:

1. Replace `resource-flavor.yaml` with per-GPU-type flavors (see `examples/resource-flavors-nvidia.yaml`).
2. Add the flavor to the `ClusterQueue` in `cluster-queue.yaml`.
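As a sketch of step 1, a flavor that only matches L40S nodes might look like this (the flavor name and label value are assumptions; the NVIDIA GPU Operator typically labels nodes with `nvidia.com/gpu.product`):

```yaml
apiVersion: kueue.x-k8s.io/v1beta1
kind: ResourceFlavor
metadata:
  name: nvidia-l40s   # name is an assumption
spec:
  nodeLabels:
    # Only nodes carrying this label satisfy the flavor
    nvidia.com/gpu.product: NVIDIA-L40S
```

For step 2, reference the new flavor from the `flavors` list of the ClusterQueue's resource group in `cluster-queue.yaml`, giving it its own `nominalQuota` entries.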