Deploying OpenShift AI¶
Deploy Red Hat OpenShift AI (RHOAI) 3.3 on OpenShift -- from a full GitOps-managed platform to individual capabilities applied manually.
What This Project Does¶
This repository provides production-ready Kustomize manifests for deploying Red Hat OpenShift AI and AI use cases on OpenShift. The manifests are composable -- start with a minimal dashboard, add model serving, training, or the full stack -- and work with two deployment methods:
- GitOps (ArgoCD): Two commands bootstrap a self-managing app-of-apps. Push to Git, everything syncs automatically.
- Manual (Kustomize): Apply manifests directly with `oc apply -k`. No ArgoCD needed. Full control over what gets deployed and when.
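Both paths resolve to the same Kustomize bases. A minimal sketch of each, assuming a logged-in `oc` session (the GitOps paths below are illustrative placeholders, not exact repository paths; see Quick Start for the real commands):

```shell
# Manual path: render and apply one capability directly.
oc apply -k components/instances/rhoai-instance/overlays/minimal/

# GitOps path: install the GitOps operator, then apply the app-of-apps
# bootstrap; ArgoCD reconciles everything else from Git.
oc apply -k components/operators/openshift-gitops/   # hypothetical path
oc apply -k bootstrap/                               # hypothetical path
```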
Target audience: Platform engineers deploying RHOAI, ML engineers who need a reproducible AI platform, and teams evaluating OpenShift AI capabilities.
What gets deployed:
- 7 operators (cert-manager, ServiceMesh, NFD, GPU Operator, Kueue, JobSet, RHOAI)
- GPU infrastructure (cloud-specific examples provided for AWS)
- A composable DataScienceCluster (DSC) with 10+ AI capabilities
- 3 models (orchestrator-8b, qwen-math-7b, gpt-oss-120b) independently deployable via GitOps
- 4 services (ToolOrchestra, LlamaStack, GenAI Toolbox, Red Hat OKP) auto-discovered by ArgoCD
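Each model is independently deployable because it ships as its own ArgoCD Application. A hedged sketch of what one such Application might look like (the repo URL, source path, and namespace are assumptions for illustration, not taken from this repository):

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: orchestrator-8b                 # one Application per model
  namespace: openshift-gitops
spec:
  project: default
  source:
    repoURL: https://example.com/your-fork.git   # placeholder URL
    targetRevision: main
    path: models/orchestrator-8b                 # hypothetical path
  destination:
    server: https://kubernetes.default.svc
    namespace: ai-models                         # hypothetical namespace
  syncPolicy:
    automated:
      prune: true      # delete resources removed from Git
      selfHeal: true   # revert manual drift
```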
What's Inside¶
- Architecture: Layered Kustomize structure (operators, instances, overlays), ArgoCD app-of-apps pattern, and dependency chain.
- Quick Start: Deploy the full stack or just what you need. GitOps and manual paths side by side.
- Capabilities: Pick what you need: model serving, training, pipelines, workbenches, and more. Each has its own guide with composable overlays.
- Use Cases: Pre-built AI applications: NVIDIA ToolOrchestra, Meta LlamaStack, and GenAI Toolbox.
Prerequisites¶
Review before installing
These requirements come from the official RHOAI 3.3 Installation Guide. Verify them before deploying.
- OpenShift Container Platform 4.19 or 4.20 (other versions are not supported)
- Minimum 2 worker nodes with 8 CPUs and 32 GiB RAM each
- Default storage class with dynamic provisioning configured
- Identity provider configured -- `kubeadmin` is not sufficient for RHOAI
- `oc` CLI authenticated as cluster-admin
- Open Data Hub must NOT be installed -- RHOAI and ODH cannot coexist on the same cluster
- No upgrade path from RHOAI 2.x (as of 3.3) -- 3.0 requires a fresh installation; upgrade support from 2.25 to a stable 3.x is planned for a later release (see Known Issues #4)
- Internet access to `cdn.redhat.com`, `registry.redhat.io`, `quay.io`, and related Red Hat domains (or a disconnected mirror)
- GPU nodes available (NVIDIA L4, L40S, A100, or H100) -- required for model serving and training workloads
- At least 50Gi storage per model in the GPU node availability zone
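Most of these requirements can be spot-checked from the CLI before deploying. A rough sketch, assuming a logged-in `oc` session with cluster-admin (all commands are read-only):

```shell
# OpenShift version (must be 4.19 or 4.20)
oc get clusterversion

# Worker count and capacity (need >= 2 workers with 8 CPUs / 32 GiB each)
oc get nodes -l node-role.kubernetes.io/worker \
  -o custom-columns=NAME:.metadata.name,CPU:.status.capacity.cpu,MEM:.status.capacity.memory

# Default storage class with dynamic provisioning
oc get storageclass

# Confirm Open Data Hub is NOT installed
oc get subscriptions.operators.coreos.com -A | grep -i opendatahub \
  || echo "no ODH subscription found"
```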
DSC Overlays -- Pick Your Profile¶
The base DataScienceCluster starts minimal (Dashboard only). Pick an overlay for your needs:
| Overlay | Components | Command |
|---|---|---|
| `minimal` | Dashboard | `oc apply -k components/instances/rhoai-instance/overlays/minimal/` |
| `serving` | Dashboard, KServe, ModelMesh | `oc apply -k components/instances/rhoai-instance/overlays/serving/` |
| `training` | Dashboard, Ray, Training Operator | `oc apply -k components/instances/rhoai-instance/overlays/training/` |
| `full` | All 10 DSC components | `oc apply -k components/instances/rhoai-instance/overlays/full/` |
| `dev` | All 10 DSC components (default) | `oc apply -k components/instances/rhoai-instance/overlays/dev/` |
See Composing a Custom Profile for building your own overlay.
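A custom profile is just another Kustomize overlay layered on the minimal base. A minimal sketch, assuming the overlay layout shown in the table above (the patch file name and its contents are hypothetical -- the actual DSC component fields are in Composing a Custom Profile):

```yaml
# components/instances/rhoai-instance/overlays/my-profile/kustomization.yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - ../minimal                   # start from the Dashboard-only base
patches:
  - path: enable-kserve.yaml     # hypothetical patch enabling one component
    target:
      kind: DataScienceCluster
```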
References¶
- RHOAI 3.3 Install Docs
- RHOAI 3.3 Distributed Workloads
- redhat-cop/gitops-catalog -- Kustomize bases for operators
- ToolOrchestra Paper -- NVIDIA's multi-model orchestration approach
- verl Framework -- Reinforcement learning training framework