Welcome to NVIDIA Run:ai Documentation
Cross-section page — Deployment and Scaling angle. See primary page for the full summary.
Deployment and Scaling Angle
NVIDIA Run:ai addresses a core challenge in production AI: keeping GPU clusters fully utilised as training and inference workloads compete for resources. From a deployment and scaling perspective, its key contributions are:
Dynamic Workload Scheduling
Run:ai continuously monitors GPU utilisation across the cluster and reallocates idle capacity to queued jobs without manual intervention. This reduces the idle time between training runs and inference bursts that typically leaves expensive GPU clusters underutilised.
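To make the idea concrete, the sketch below models one pass of such a scheduler in plain Python: queued jobs are started whenever enough idle GPUs exist, with no operator involvement. This is a minimal illustration of the scheduling concept, not Run:ai's actual algorithm; all class and function names are hypothetical.

```python
from dataclasses import dataclass, field
from collections import deque

@dataclass
class Job:
    name: str
    gpus_requested: int

@dataclass
class Cluster:
    total_gpus: int
    allocations: dict = field(default_factory=dict)  # job name -> GPUs held

    @property
    def idle_gpus(self) -> int:
        return self.total_gpus - sum(self.allocations.values())

def schedule_tick(cluster: Cluster, queue: deque) -> None:
    """One pass of a simplified scheduler: hand idle GPUs to queued jobs."""
    while queue and cluster.idle_gpus >= queue[0].gpus_requested:
        job = queue.popleft()
        cluster.allocations[job.name] = job.gpus_requested
        print(f"started {job.name} on {job.gpus_requested} GPUs "
              f"({cluster.idle_gpus} idle remain)")

if __name__ == "__main__":
    cluster = Cluster(total_gpus=8, allocations={"train-llm": 4})
    queue = deque([Job("inference-burst", 2), Job("finetune", 4)])

    schedule_tick(cluster, queue)         # inference-burst fits into idle capacity
    del cluster.allocations["train-llm"]  # training run finishes, GPUs go idle
    schedule_tick(cluster, queue)         # finetune starts without manual action
```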
Scaling Across Heterogeneous Infrastructure
Run:ai supports scaling AI workloads across on-premises GPU clusters, public cloud GPUs, and hybrid combinations under a single control plane. Teams do not need separate tools for on-prem scheduling and cloud bursting; Run:ai presents a unified resource pool. This is directly relevant to the NCP-AAI deployment-and-scaling exam topic of “operationalising and scaling agentic systems.”
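As an illustration of the unified-pool idea, the sketch below prefers local capacity and bursts to cloud only when it is exhausted. The pool names, cost ordering, and placement policy are assumptions made for illustration; in a real deployment, node pools and burst policies are configured in the platform rather than in application code.

```python
from __future__ import annotations
from dataclasses import dataclass

@dataclass
class GpuPool:
    name: str
    free_gpus: int
    cost_rank: int  # lower = preferred (e.g. on-prem before cloud burst)

def place(gpus_needed: int, pools: list[GpuPool]) -> str | None:
    """Pick the cheapest pool with enough free GPUs; burst to cloud only if needed."""
    for pool in sorted(pools, key=lambda p: p.cost_rank):
        if pool.free_gpus >= gpus_needed:
            pool.free_gpus -= gpus_needed
            return pool.name
    return None  # nothing fits anywhere; the job stays queued

pools = [
    GpuPool("on-prem-dgx", free_gpus=2, cost_rank=0),
    GpuPool("cloud-burst", free_gpus=16, cost_rank=1),
]

print(place(2, pools))   # -> on-prem-dgx (local capacity is preferred)
print(place(8, pools))   # -> cloud-burst (local pool exhausted, burst to cloud)
```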
Integration With Existing MLOps Stacks
An API-first architecture means Run:ai integrates with existing Kubernetes deployments, CI/CD pipelines, and framework tools (PyTorch, TensorFlow, JAX, NVIDIA NeMo) without replacing them. Agents and training workloads submitted via the existing toolchain are scheduled by Run:ai transparently.
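Because the scheduler sits behind the standard Kubernetes API, an existing pipeline can keep submitting ordinary Job manifests. The sketch below uses the official kubernetes Python client to submit a GPU training Job; the scheduler name, queue label, container image, and entrypoint are assumptions that vary by installation and are shown only to illustrate the integration point.

```python
from kubernetes import client, config

config.load_kube_config()  # or config.load_incluster_config() inside the cluster

# A plain Kubernetes Job; only scheduler_name and the project/queue label are
# assumed Run:ai-specific values here, and both may differ per installation.
job = client.V1Job(
    api_version="batch/v1",
    kind="Job",
    metadata=client.V1ObjectMeta(
        name="finetune-bert",
        labels={"runai/queue": "team-nlp"},        # assumed project/queue label
    ),
    spec=client.V1JobSpec(
        template=client.V1PodTemplateSpec(
            spec=client.V1PodSpec(
                scheduler_name="runai-scheduler",  # hand scheduling to Run:ai
                restart_policy="Never",
                containers=[
                    client.V1Container(
                        name="trainer",
                        image="nvcr.io/nvidia/pytorch:24.05-py3",  # placeholder image
                        command=["python", "train.py"],            # placeholder entrypoint
                        resources=client.V1ResourceRequirements(
                            limits={"nvidia.com/gpu": "2"},
                        ),
                    )
                ],
            )
        )
    ),
)

client.BatchV1Api().create_namespaced_job(namespace="team-nlp", body=job)
```

The existing CI/CD pipeline or training framework keeps producing standard Kubernetes objects; pointing them at the Run:ai scheduler is the only change, which is what makes the integration transparent to the submitting toolchain.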
Connections
- Scaling LLMs with Triton and TensorRT-LLM Using Kubernetes — the Kubernetes-based inference stack that Run:ai can orchestrate at the cluster level
- DGX Cloud Benchmarking — benchmarking the workload throughput that Run:ai's dynamic scheduling helps maximise