Cross-section page — full summary at NVIDIA Nsight Systems.

NVIDIA Platform Angle

Nsight Systems is part of the NVIDIA Nsight Developer Tools suite, which provides a layered profiling strategy for applications running on NVIDIA hardware. This page covers how Nsight Systems fits into the NVIDIA platform toolchain and its specific integration with TensorRT-LLM and Triton Inference Server workflows.

Nsight Systems in the NVIDIA Toolchain Hierarchy

NVIDIA’s profiling stack operates at three levels of granularity:

  1. Nsight Systems (system-wide) — timeline-level visibility across CPUs, GPUs, NICs, and OS; the first diagnostic stop for any performance investigation.
  2. Nsight Compute (kernel-level) — deep per-CUDA-kernel metrics; used after Nsight Systems identifies a suspect kernel.
  3. Application-level tools (e.g., Triton’s perf_analyzer, TensorRT-LLM’s trtllm-bench) — measure end-to-end throughput/latency without full profiling overhead.

In production agentic AI systems, this hierarchy means: benchmark with trtllm-bench/trtllm-serve first, then use Nsight Systems to pinpoint where time is lost, then optionally drill into specific kernels with Nsight Compute.
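As a rough sketch, that three-step workflow might look like the following shell commands. Flag spellings are illustrative and should be checked against your installed tool versions; $MODEL, $DATASET, and $SUSPECT_KERNEL are placeholder values, not names from this page.

```shell
# 1. Application-level baseline: end-to-end throughput/latency,
#    no profiler attached.
trtllm-bench --model "$MODEL" throughput --dataset "$DATASET"

# 2. System-wide timeline with Nsight Systems: see where time is
#    lost across CPU, GPU, and OS runtime calls.
nsys profile --trace=cuda,nvtx,osrt -o timeline \
    trtllm-bench --model "$MODEL" throughput --dataset "$DATASET"

# 3. Kernel-level drill-down with Nsight Compute, once the timeline
#    has flagged a suspect kernel.
ncu --set full --kernel-name "$SUSPECT_KERNEL" \
    trtllm-bench --model "$MODEL" throughput --dataset "$DATASET"
```

The ordering matters: steps 1 and 2 are cheap enough to run routinely, while step 3 replays kernels and is only worth the overhead once a specific kernel is implicated.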

TensorRT-LLM Integration

TensorRT-LLM (see Performance Analysis — TensorRT LLM) has built-in Nsight Systems integration via:

  • Profiler start/stop via the CUDA profiler runtime API (TLLM_PROFILE_START_STOP): limits capture to a specified iteration range, keeping profile files small and focused.
  • NVTX markers (TLLM_LLMAPI_ENABLE_NVTX=1, TLLM_PROFILE_RECORD_GC=1): labels regions in the Nsight Systems timeline for iteration-level and GC-level annotation.
  • PyTorch profiler export (TLLM_TORCH_PROFILE_TRACE): produces a PyTorch profiler trace alongside the Nsight capture, for analyzing the PyTorch workflow.
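Combining these hooks with an Nsight Systems capture might look like the sketch below. The iteration range, output name, and benchmark arguments are hypothetical, and the exact TLLM_PROFILE_START_STOP syntax should be confirmed against the TensorRT-LLM documentation for your version.

```shell
# Scope the capture: TensorRT-LLM brackets the requested iterations
# with cudaProfilerStart/Stop, so nsys records only that window.
export TLLM_PROFILE_START_STOP=100-150   # hypothetical iteration range
export TLLM_LLMAPI_ENABLE_NVTX=1         # NVTX region labels in the timeline

nsys profile \
    --capture-range=cudaProfilerApi \
    --capture-range-end=stop \
    --trace=cuda,nvtx,osrt \
    -o trtllm_iters_100_150 \
    trtllm-bench --model "$MODEL" throughput --dataset "$DATASET"
```

Using --capture-range=cudaProfilerApi is what makes TLLM_PROFILE_START_STOP effective: without it, nsys records the entire run and the iteration-scoping env var has nothing to gate.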

Supported NVIDIA Platform Range

Nsight Systems profiles across the full NVIDIA hardware family:

  • DGX/HGX: Data center training and inference servers
  • RTX workstations: Developer workstations for model development
  • DRIVE: Automotive inference platforms
  • Jetson: Edge AI and robotics deployments

This breadth means the same profiling workflow applies whether optimizing a cloud-based LLM deployment or an edge agentic system.

Connections to NVIDIA Platform Stack