ArgoCD Applications
After bootstrap, ArgoCD manages 25 Applications across four layers.
Application Table
| Application | Source | Sync Policy | Purpose |
|---|---|---|---|
| cluster-bootstrap | clusters/overlays/dev/ | Auto (selfHeal) | Self-manages the dev overlay: AppSets, explicit Apps |
| operator-cert-manager | components/operators/cert-manager/ | Auto (selfHeal) | cert-manager operator subscription |
| operator-nfd | components/operators/nfd/ | Auto (selfHeal) | Node Feature Discovery operator |
| operator-gpu-operator | components/operators/gpu-operator/ | Auto (selfHeal) | NVIDIA GPU Operator |
| operator-kueue-operator | components/operators/kueue-operator/ | Auto (selfHeal) | Red Hat Build of Kueue |
| operator-jobset-operator | components/operators/jobset-operator/ | Auto (selfHeal) | JobSet Operator |
| operator-rhoai-operator | components/operators/rhoai-operator/ | Auto (selfHeal) | Red Hat OpenShift AI operator |
| operator-servicemesh | components/operators/servicemesh/ | Auto (selfHeal) | Red Hat OpenShift Service Mesh 3 operator (required for LlamaStack) |
| instance-nfd-instance | components/instances/nfd-instance/ | Auto (selfHeal) | NFD NodeFeatureDiscovery CR |
| instance-gpu-instance | components/instances/gpu-instance/ | Auto (selfHeal) | GPU ClusterPolicy CR |
| instance-kueue-instance | components/instances/kueue-instance/ | Auto (selfHeal) | Kueue operator instance |
| instance-cluster-autoscaler | components/instances/cluster-autoscaler/ | Auto (selfHeal) | ClusterAutoscaler for GPU node auto-scaling |
| instance-kueue-config | components/instances/kueue-config/ | Auto (selfHeal) | GPU ResourceFlavors + ClusterQueue |
| instance-jobset-instance | components/instances/jobset-instance/ | Auto (selfHeal) | JobSet operator instance |
| instance-rhoai | components/instances/rhoai-instance/overlays/dev/ | Auto (selfHeal, no prune) | DataScienceCluster with ignoreDifferences |
| instance-dashboard-config | components/instances/dashboard-config/ | Auto (selfHeal) | Enables genAiStudio in the RHOAI dashboard |
| instance-mcp-servers | components/instances/mcp-servers/ | Auto (selfHeal) | Registers MCP servers (GenAI Toolbox, OKP) in the RHOAI dashboard |
| model-orchestrator-8b | usecases/models/orchestrator-8b/profiles/tier1-minimal/ | Auto (selfHeal, prune) | Nemotron-Orchestrator-8B model serving |
| model-qwen-math-7b | usecases/models/qwen-math-7b/profiles/tier1-minimal/ | Auto (selfHeal, prune) | Qwen2.5-Math-7B-Instruct model serving |
| model-gpt-oss-120b | usecases/models/gpt-oss-120b/profiles/tier1-minimal/ | Auto (selfHeal, prune) | GPT-OSS-120B model serving (ModelCar) |
| service-toolorchestra-app | usecases/services/toolorchestra-app/profiles/tier1-minimal/ | Auto (selfHeal, prune) | ToolOrchestra UI + training infra |
| service-llamastack | usecases/services/llamastack/profiles/tier1-minimal/ | Auto (selfHeal, prune) | LlamaStack Distribution + PostgreSQL |
| service-genai-toolbox | usecases/services/genai-toolbox/profiles/tier1-minimal/ | Auto (selfHeal, prune) | GenAI Toolbox MCP Server |
| service-rhokp | usecases/services/rhokp/profiles/tier1-minimal/ | Auto (selfHeal, prune) | Red Hat OKP MCP Server (Solr + MCP) |
| usecase-toolorchestra-training | usecases/services/toolorchestra-app/manifests/training/workloads/ | Manual only | Download jobs + RayJob (on-demand) |
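As a rough illustration of how the Sync Policy column maps onto an Application spec, the sketch below shows what an explicit Application such as instance-rhoai might look like. The repository URL, target revision, ArgoCD namespace, project, and the ignored JSON pointer are placeholders, not values taken from this repository.

```yaml
# Hypothetical sketch: only the name and path come from the table above.
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: instance-rhoai
  namespace: openshift-gitops          # assumption: ArgoCD control-plane namespace
spec:
  project: default                     # assumption
  source:
    repoURL: https://example.com/org/repo.git   # placeholder
    targetRevision: main                          # placeholder
    path: components/instances/rhoai-instance/overlays/dev
  destination:
    server: https://kubernetes.default.svc
  syncPolicy:
    automated:
      selfHeal: true                   # "Auto (selfHeal ...)": revert manual drift
      prune: false                     # "no prune": never delete resources removed from Git
  ignoreDifferences:
    - group: datasciencecluster.opendatahub.io
      kind: DataScienceCluster
      jsonPointers:
        - /spec/components             # placeholder: the real ignored fields are not shown here
```

Under the same assumptions, "Auto (selfHeal, prune)" would correspond to `prune: true`, and "Manual only" to omitting the `syncPolicy.automated` block so the Application syncs only when triggered.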
Sync Wave Ordering
Within the service-toolorchestra-app and model-* apps, sync waves ensure correct resource ordering:
| Wave | Resources | Purpose |
|---|---|---|
| -1 (default) | Namespace, RBAC, ConfigMaps, PVCs, ServingRuntimes, Service, Route, NetworkPolicy, LocalQueue | Infrastructure created first |
| 0 | download-orchestrator-8b, download-qwen-math-7b Jobs | Model download jobs run and complete before predictors start |
| 1 | orchestrator-8b, qwen-math-7b InferenceServices | Predictors created only after models are downloaded to PVCs |
Download jobs are idempotent (each checks for a .download_complete marker and skips the download if it exists) and have no TTL, so completed Jobs persist as Synced/Healthy in ArgoCD without being recreated.
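The wave ordering and the idempotent download check could be expressed roughly as follows. This is a minimal sketch: the image, mount path, PVC name, model format, and the shell logic are assumptions for illustration, and the actual download step is not shown because the source does not describe it.

```yaml
# Wave 0: download Job. No ttlSecondsAfterFinished is set, so the completed Job persists.
apiVersion: batch/v1
kind: Job
metadata:
  name: download-orchestrator-8b
  annotations:
    argocd.argoproj.io/sync-wave: "0"
spec:
  backoffLimit: 3                                  # assumption
  template:
    spec:
      restartPolicy: OnFailure
      containers:
        - name: download
          image: registry.example.com/model-downloader:latest   # placeholder image
          command: ["/bin/sh", "-c"]
          args:
            - |
              # Idempotency check: exit early if a previous run already finished.
              if [ -f /models/.download_complete ]; then
                echo "model already present, skipping download"
                exit 0
              fi
              # (download step omitted; not described in the source)
              touch /models/.download_complete
          volumeMounts:
            - name: models
              mountPath: /models                   # assumption
      volumes:
        - name: models
          persistentVolumeClaim:
            claimName: orchestrator-8b-models      # hypothetical PVC name
---
# Wave 1: the predictor is applied only after wave 0 resources are healthy.
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: orchestrator-8b
  annotations:
    argocd.argoproj.io/sync-wave: "1"
spec:
  predictor:
    model:
      modelFormat:
        name: vLLM                                 # assumption
      storageUri: pvc://orchestrator-8b-models/    # hypothetical PVC reference
```

ArgoCD treats a Job as healthy once it completes, which is what lets the wave 1 InferenceService wait for the download to finish.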
App-of-Apps Bootstrap
The cluster-bootstrap Application watches clusters/overlays/dev/ and auto-syncs any changes. This means:
- Adding a new Application YAML to clusters/overlays/dev/ and pushing to Git automatically creates the new ArgoCD Application
- Adding a new operator directory to components/operators/ automatically creates a new operator Application via the cluster-operators ApplicationSet
- Same for components/instances/*, usecases/models/*/profiles/tier1-minimal, and usecases/services/*/profiles/tier1-minimal
The only manual oc apply ever needed is the initial bootstrap.
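The directory auto-discovery described above is the behavior of an ApplicationSet Git directory generator. A minimal sketch of what the cluster-operators ApplicationSet might look like follows; the repository URL, revision, namespace, and project are placeholders, and the name template is inferred from the operator-* names in the Application table rather than copied from the repository.

```yaml
# Hypothetical sketch of a Git directory generator for operator Applications.
apiVersion: argoproj.io/v1alpha1
kind: ApplicationSet
metadata:
  name: cluster-operators
  namespace: openshift-gitops                     # assumption
spec:
  generators:
    - git:
        repoURL: https://example.com/org/repo.git   # placeholder
        revision: main                               # placeholder
        directories:
          - path: components/operators/*          # one Application per subdirectory
  template:
    metadata:
      # e.g. components/operators/cert-manager -> operator-cert-manager
      name: 'operator-{{path.basename}}'
    spec:
      project: default                            # assumption
      source:
        repoURL: https://example.com/org/repo.git   # placeholder
        targetRevision: main
        path: '{{path}}'
      destination:
        server: https://kubernetes.default.svc
      syncPolicy:
        automated:
          selfHeal: true
```

With a generator like this, adding a directory under components/operators/ yields a new generated Application on the next reconcile, with no change to the ApplicationSet itself.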
Adding a New Operator
- Create components/operators/my-operator/kustomization.yaml with a Subscription resource
- Create components/instances/my-instance/kustomization.yaml with the instance CR
- Push to Git: cluster-bootstrap auto-syncs the AppSets, which auto-discover the new directories and create ArgoCD Applications
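For the first step, the operator directory might contain something like the two files below. This is a sketch only: the file name subscription.yaml, the channel, the catalog source, and the target namespace are assumptions that depend on the operator being installed.

```yaml
# components/operators/my-operator/kustomization.yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - subscription.yaml                  # hypothetical file name
---
# components/operators/my-operator/subscription.yaml
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: my-operator
  namespace: openshift-operators       # assumption: all-namespaces install mode
spec:
  name: my-operator                    # package name in the catalog (placeholder)
  channel: stable                      # placeholder: check the operator's available channels
  source: redhat-operators             # placeholder catalog source
  sourceNamespace: openshift-marketplace
  installPlanApproval: Automatic
```

The instance directory would follow the same pattern, with a kustomization.yaml listing the operator's custom resource instead of the Subscription.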