Skip to content

Use Cases

Each use case is a self-contained AI application deployed on top of the Red Hat OpenShift AI (RHOAI) platform.

Structure

The repository separates models (individual model deployments) from services (applications that consume models):

usecases/
├── models/                 # One directory per model
│   └── <model-name>/
│       ├── manifests/      # ServingRuntime, InferenceService, PVC, download Job
│       └── profiles/
│           └── tier1-minimal/  # Kustomize overlay (auto-discovered by cluster-models AppSet)
└── services/               # Application services
    └── <service-name>/
        ├── manifests/
        │   ├── base/       # Namespace, RBAC, config, network
        │   ├── services/   # Deployments, Routes
        │   └── training/   # Training infrastructure + workloads
        └── profiles/
            └── tier1-minimal/  # Kustomize overlay (auto-discovered by cluster-services AppSet)

Current Models

Model Description Deployed by Default
gpt-oss-120b OpenAI GPT-OSS 120B MoE (MXFP4, 4x L40S tensor-parallel, Red Hat AI validated ModelCar) Yes
orchestrator-8b NVIDIA Nemotron-Orchestrator-8B for multi-tool coordination No (excluded)
qwen-math-7b Qwen2.5-Math-7B-Instruct math specialist No (excluded)

Re-enabling excluded models

Models marked "excluded" have their manifests in Git but are excluded from ArgoCD discovery via exclude entries in cluster-models-appset.yaml. To re-enable a model, remove its exclude entry and push to Git.

Current Services

Service Description Model Dependencies Deployed by Default Guide
llamastack Meta's LlamaStack Distribution with agents, RAG, and tool use gpt-oss-120b (remote by default) Yes LlamaStack
genai-toolbox GenAI Toolbox MCP Server for database tools None (uses llamastack's PostgreSQL) Yes GenAI Toolbox
rhokp Red Hat OKP MCP Server for RHEL documentation, CVEs, errata None (self-contained with OKP Solr) Yes Red Hat OKP
toolorchestra-app NVIDIA ToolOrchestra UI for multi-model orchestration orchestrator-8b, qwen-math-7b No (excluded) ToolOrchestra

Re-enabling excluded services

Services marked "excluded" have their manifests in Git but are excluded from ArgoCD discovery via exclude entries in cluster-services-appset.yaml. To re-enable a service, remove its exclude entry (and re-enable its model dependencies) and push to Git.

Deploy models before services

Services depend on model endpoints being reachable. When deploying manually, deploy all required models and wait for them to become Ready before deploying services. In GitOps mode, both cluster-models and cluster-services ApplicationSets deploy in parallel, so models typically become ready before services finish initializing.

Adding a New Model

  1. Create a directory under usecases/models/:

    usecases/models/my-model/
      manifests/
        kustomization.yaml
        serving-runtime.yaml
        inference-service.yaml
      profiles/
        tier1-minimal/
          kustomization.yaml
    
  2. The cluster-models ApplicationSet auto-discovers usecases/models/*/profiles/tier1-minimal directories. Push to Git and a new model-<name> Application is created automatically.

Adding a New Service

  1. Create a directory under usecases/services/:

    usecases/services/my-service/
      manifests/
        base/
          kustomization.yaml
          namespace.yaml
        services/
          my-service/
      profiles/
        tier1-minimal/
          kustomization.yaml
    
  2. The cluster-services ApplicationSet auto-discovers usecases/services/*/profiles/tier1-minimal directories. Push to Git and a new service-<name> Application is created automatically.

Model download jobs

For model download jobs, always:

  • Add argocd.argoproj.io/sync-wave: "-1" to PVCs so they bind before download Jobs
  • Add argocd.argoproj.io/sync-wave: "0" so downloads run before InferenceService (wave 1)
  • Omit ttlSecondsAfterFinished so completed jobs persist and ArgoCD doesn't recreate them