Deploying OpenShift AI¶
Deploy Red Hat OpenShift AI (RHOAI) 3.3 on OpenShift -- from a full GitOps-managed platform to individual capabilities applied manually.
What This Project Does¶
This repository provides production-ready Kustomize manifests for deploying Red Hat OpenShift AI and AI use cases on OpenShift. The manifests are composable -- start with a minimal dashboard, add model serving, training, or the full stack -- and work with two deployment methods:
- GitOps (ArgoCD): Two commands bootstrap a self-managing app-of-apps. Push to Git, everything syncs automatically.
- Manual (Kustomize): Apply manifests directly with `oc apply -k`. No ArgoCD needed. Full control over what gets deployed and when.
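Both paths resolve to the same Kustomize bases. A minimal sketch of each, assuming a logged-in `oc` session (the GitOps paths below are illustrative placeholders, not exact repository paths; see Quick Start for the real commands):

```shell
# Manual path: render and apply one capability directly.
oc apply -k components/instances/rhoai-instance/overlays/minimal/

# GitOps path: install the GitOps operator, then apply the app-of-apps
# bootstrap; ArgoCD reconciles everything else from Git.
oc apply -k components/operators/openshift-gitops/   # hypothetical path
oc apply -k bootstrap/                               # hypothetical path
```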
Target audience: Platform engineers deploying RHOAI, ML engineers who need a reproducible AI platform, and teams evaluating OpenShift AI capabilities.
What gets deployed:
- 7 operators (cert-manager, ServiceMesh, NFD, GPU Operator, Kueue, JobSet, RHOAI)
- GPU infrastructure (cloud-specific examples provided for AWS)
- A composable DataScienceCluster (DSC) with 10+ AI capabilities
- 3 models (orchestrator-8b, qwen-math-7b, gpt-oss-120b) independently deployable via GitOps
- 4 services (ToolOrchestra, LlamaStack, GenAI Toolbox, Red Hat OKP) auto-discovered by ArgoCD
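Each model is independently deployable because it ships as its own ArgoCD Application. A hedged sketch of what one such Application might look like (the repo URL, source path, and namespace are assumptions for illustration, not taken from this repository):

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: orchestrator-8b                 # one Application per model
  namespace: openshift-gitops
spec:
  project: default
  source:
    repoURL: https://example.com/your-fork.git   # placeholder URL
    targetRevision: main
    path: models/orchestrator-8b                 # hypothetical path
  destination:
    server: https://kubernetes.default.svc
    namespace: ai-models                         # hypothetical namespace
  syncPolicy:
    automated:
      prune: true      # delete resources removed from Git
      selfHeal: true   # revert manual drift
```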
What's Inside¶
- Architecture: Layered Kustomize structure (operators, instances, overlays), ArgoCD app-of-apps pattern, and dependency chain.
- Quick Start: Deploy the full stack or just what you need. GitOps and manual paths side by side.
- Capabilities: Pick what you need: model serving, training, pipelines, workbenches, and more. Each has its own guide with composable overlays.
- Use Cases: Pre-built AI applications: NVIDIA ToolOrchestra, Meta LlamaStack, and GenAI Toolbox.
Prerequisites¶
Review before installing
These requirements come from the official RHOAI 3.3 Installation Guide. Verify them before deploying.
- OpenShift Container Platform 4.19 or 4.20 (other versions are not supported)
- Minimum 2 worker nodes with 8 CPUs and 32 GiB RAM each
- Default storage class with dynamic provisioning configured
- Identity provider configured -- `kubeadmin` is not sufficient for RHOAI
- `oc` CLI authenticated as cluster-admin
- Open Data Hub must NOT be installed -- RHOAI and ODH cannot coexist on the same cluster
- No upgrade path from RHOAI 2.x (as of 3.3) -- 3.0 requires a fresh installation; upgrade support from 2.25 to a stable 3.x is planned for a later release (see Known Issues #4)
- Internet access to `cdn.redhat.com`, `registry.redhat.io`, `quay.io`, and related Red Hat domains (or a disconnected mirror)
- GPU nodes available (NVIDIA L4, L40S, A100, or H100) -- required for model serving and training workloads
- At least 50Gi storage per model in the GPU node availability zone
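Most of these requirements can be spot-checked from the CLI before deploying. A rough sketch, assuming a logged-in `oc` session with cluster-admin (all commands are read-only):

```shell
# OpenShift version (must be 4.19 or 4.20)
oc get clusterversion

# Worker count and capacity (need >= 2 workers with 8 CPUs / 32 GiB each)
oc get nodes -l node-role.kubernetes.io/worker \
  -o custom-columns=NAME:.metadata.name,CPU:.status.capacity.cpu,MEM:.status.capacity.memory

# Default storage class with dynamic provisioning
oc get storageclass

# Confirm Open Data Hub is NOT installed
oc get subscriptions.operators.coreos.com -A | grep -i opendatahub \
  || echo "no ODH subscription found"
```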
DSC Overlays -- Pick Your Profile¶
The base DataScienceCluster starts minimal (Dashboard only). Pick an overlay for your needs:
| Overlay | Components | Command |
|---|---|---|
| `minimal` | Dashboard | `oc apply -k components/instances/rhoai-instance/overlays/minimal/` |
| `serving` | Dashboard, KServe, ModelMesh | `oc apply -k components/instances/rhoai-instance/overlays/serving/` |
| `training` | Dashboard, Ray, Training Operator | `oc apply -k components/instances/rhoai-instance/overlays/training/` |
| `full` | All 10 DSC components | `oc apply -k components/instances/rhoai-instance/overlays/full/` |
| `dev` | All 10 DSC components (default) | `oc apply -k components/instances/rhoai-instance/overlays/dev/` |
See Composing a Custom Profile for building your own overlay.
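A custom profile is just another Kustomize overlay layered on the minimal base. A minimal sketch, assuming the overlay layout shown in the table above (the patch file name and its contents are hypothetical -- the actual DSC component fields are in Composing a Custom Profile):

```yaml
# components/instances/rhoai-instance/overlays/my-profile/kustomization.yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - ../minimal                   # start from the Dashboard-only base
patches:
  - path: enable-kserve.yaml     # hypothetical patch enabling one component
    target:
      kind: DataScienceCluster
```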
References¶
- RHOAI 3.3 Install Docs
- RHOAI 3.3 Distributed Workloads
- redhat-cop/gitops-catalog -- Kustomize bases for operators
- ToolOrchestra Paper -- NVIDIA's multi-model orchestration approach
- verl Framework -- Reinforcement learning training framework