Cross-section page — full summary at NVIDIA Nsight Systems.
NVIDIA Platform Angle
Nsight Systems is part of the NVIDIA Nsight Developer Tools suite, which provides a layered profiling strategy for applications running on NVIDIA hardware. This page covers how Nsight Systems fits into the NVIDIA platform toolchain and how it integrates with TensorRT-LLM and Triton Inference Server workflows.
Nsight Systems in the NVIDIA Toolchain Hierarchy
NVIDIA’s profiling stack operates at three levels of granularity:
- Nsight Systems (system-wide) — timeline-level visibility across CPUs, GPUs, NICs, and OS; the first diagnostic stop for any performance investigation.
- Nsight Compute (kernel-level) — deep per-CUDA-kernel metrics; used after Nsight Systems identifies a suspect kernel.
- Application-level tools — Triton’s perf_analyzer and TensorRT-LLM’s trtllm-bench — measure end-to-end throughput/latency without full profiling overhead.
In production agentic AI systems, this hierarchy means: benchmark with trtllm-bench/trtllm-serve first, then use Nsight Systems to pinpoint where time is lost, then optionally drill into specific kernels with Nsight Compute.
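As a sketch of that three-step workflow, the snippet below shells out to each tool in turn. The model name, benchmark arguments, and kernel-name filter are hypothetical placeholders; only the tool names (trtllm-bench, nsys, ncu) come from the hierarchy above.

```python
import subprocess

# Hypothetical model and workload; substitute your own deployment.
MODEL = "meta-llama/Llama-3.1-8B"                          # placeholder
BENCH = ["trtllm-bench", "--model", MODEL, "throughput"]   # placeholder args

# Level 3: application-level benchmark, no profiler attached.
subprocess.run(BENCH, check=True)

# Level 1: system-wide timeline with Nsight Systems (CUDA + NVTX traces).
subprocess.run(
    ["nsys", "profile", "-o", "bench_trace", "--trace=cuda,nvtx", *BENCH],
    check=True,
)

# Level 2: after inspecting the timeline, drill into one suspect kernel
# with Nsight Compute. The kernel-name regex is a placeholder.
subprocess.run(
    ["ncu", "-k", "regex:attention", "-o", "kernel_report", *BENCH],
    check=True,
)
```

In practice, the nsys and ncu passes are usually run on a reduced workload, since profiling adds overhead that the level-3 baseline run is meant to avoid.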
TensorRT-LLM Integration
TensorRT-LLM (see Performance Analysis — TensorRT LLM) has built-in Nsight Systems integration via:
- CUDA profiler runtime API on/off (
TLLM_PROFILE_START_STOP): limits capture to specific iterations, keeping profile files small and focused. - NVTX markers (
TLLM_LLMAPI_ENABLE_NVTX=1,TLLM_PROFILE_RECORD_GC=1): labels regions in the Nsight Systems timeline for iteration-level and GC-level annotation. - PyTorch profiler export (
TLLM_TORCH_PROFILE_TRACE): parallel trace for PyTorch workflow analysis.
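A minimal sketch of wiring these hooks together, assuming a trtllm-bench-style launch. The iteration-range value format and the benchmark arguments are assumptions; check the TensorRT-LLM performance guide for the exact syntax before relying on them.

```python
import os
import subprocess

env = dict(os.environ)
# Capture only a narrow iteration window; the "100-150" range format is
# an assumption, so verify it against the TensorRT-LLM docs.
env["TLLM_PROFILE_START_STOP"] = "100-150"
env["TLLM_LLMAPI_ENABLE_NVTX"] = "1"            # NVTX labels on the timeline
env["TLLM_PROFILE_RECORD_GC"] = "1"             # annotate Python GC pauses
env["TLLM_TORCH_PROFILE_TRACE"] = "trace.json"  # parallel PyTorch trace

# --capture-range=cudaProfilerApi makes nsys honor the CUDA profiler
# start/stop runtime calls that TLLM_PROFILE_START_STOP drives, so the
# report covers only the selected iterations.
subprocess.run(
    ["nsys", "profile", "--trace=cuda,nvtx",
     "--capture-range=cudaProfilerApi", "--capture-range-end=stop",
     "-o", "tllm_iters",
     "trtllm-bench", "--model", "...", "throughput"],  # placeholder args
    env=env, check=True,
)
```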
Supported NVIDIA Platform Range
Nsight Systems profiles across the full NVIDIA hardware family:
- DGX/HGX: Data center training and inference servers
- RTX workstations: Developer workstations for model development
- DRIVE: Automotive inference platforms
- Jetson: Edge AI and robotics deployments
This breadth means the same profiling workflow applies whether optimizing a cloud-based LLM deployment or an edge agentic system.
Connections to NVIDIA Platform Stack
- Performance Analysis — TensorRT LLM (NVIDIA Platform) — the TensorRT-LLM guide that describes how to use Nsight Systems features from within the NVIDIA inference stack.
- Scaling LLMs with Triton and TensorRT-LLM (NVIDIA Platform) — Triton metrics (port 8002, Prometheus) provide the application-level observability layer above Nsight Systems, as sketched below.
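As a quick illustration of that observability layer, the sketch below scrapes Triton's Prometheus-format endpoint. The localhost host and the nv_inference metric prefix are assumptions based on Triton defaults; adjust both for your deployment.

```python
import urllib.request

# Triton exposes Prometheus-format metrics on port 8002 by default.
URL = "http://localhost:8002/metrics"  # adjust host for your deployment

with urllib.request.urlopen(URL) as resp:
    text = resp.read().decode()

# Print inference-related counters; exact metric names vary by Triton
# version, but nv_inference_* is the documented family prefix.
for line in text.splitlines():
    if line.startswith("nv_inference"):
        print(line)
```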