Cross-section page — full summary at NVIDIA Nsight Systems.
NVIDIA Platform Angle
Nsight Systems is part of the NVIDIA Nsight Developer Tools suite, which provides a layered profiling strategy for applications running on NVIDIA hardware. This page covers how Nsight Systems fits into the NVIDIA platform toolchain and how it integrates with TensorRT-LLM and Triton Inference Server workflows.
Nsight Systems in the NVIDIA Toolchain Hierarchy
NVIDIA’s profiling stack operates at three levels of granularity:
- Nsight Systems (system-wide) — timeline-level visibility across CPUs, GPUs, NICs, and OS; the first diagnostic stop for any performance investigation.
- Nsight Compute (kernel-level) — deep per-CUDA-kernel metrics; used after Nsight Systems identifies a suspect kernel.
- Application-level tools — Triton’s perf_analyzer and TensorRT-LLM’s trtllm-bench — measure end-to-end throughput/latency without full profiling overhead.
In production agentic AI systems, this hierarchy means: benchmark with trtllm-bench/trtllm-serve first, then use Nsight Systems to pinpoint where time is lost, then optionally drill into specific kernels with Nsight Compute.
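As a sketch of that three-step workflow, the snippet below shells out to each tool in turn. The model name, benchmark arguments, and kernel-name filter are hypothetical placeholders; only the tool names (trtllm-bench, nsys, ncu) come from the hierarchy above.

```python
import subprocess

# Hypothetical model and workload; substitute your own deployment.
MODEL = "meta-llama/Llama-3.1-8B"                          # placeholder
BENCH = ["trtllm-bench", "--model", MODEL, "throughput"]   # placeholder args

# Level 3: application-level benchmark, no profiler attached.
subprocess.run(BENCH, check=True)

# Level 1: system-wide timeline with Nsight Systems (CUDA + NVTX traces).
subprocess.run(
    ["nsys", "profile", "-o", "bench_trace", "--trace=cuda,nvtx", *BENCH],
    check=True,
)

# Level 2: after inspecting the timeline, drill into one suspect kernel
# with Nsight Compute. The kernel-name regex is a placeholder.
subprocess.run(
    ["ncu", "-k", "regex:attention", "-o", "kernel_report", *BENCH],
    check=True,
)
```

In practice, the nsys and ncu passes are usually run on a reduced workload, since profiling adds overhead that the level-3 baseline run is meant to avoid.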
TensorRT-LLM Integration
TensorRT-LLM (see Performance Analysis — TensorRT LLM) has built-in Nsight Systems integration via:
- CUDA profiler runtime API on/off (
TLLM_PROFILE_START_STOP): limits capture to specific iterations, keeping profile files small and focused. - NVTX markers (
TLLM_LLMAPI_ENABLE_NVTX=1,TLLM_PROFILE_RECORD_GC=1): labels regions in the Nsight Systems timeline for iteration-level and GC-level annotation. - PyTorch profiler export (
TLLM_TORCH_PROFILE_TRACE): parallel trace for PyTorch workflow analysis.
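A minimal sketch of wiring these hooks together, assuming a trtllm-bench-style launch. The iteration-range value format and the benchmark arguments are assumptions; check the TensorRT-LLM performance guide for the exact syntax before relying on them.

```python
import os
import subprocess

env = dict(os.environ)
# Capture only a narrow iteration window; the "100-150" range format is
# an assumption, so verify it against the TensorRT-LLM docs.
env["TLLM_PROFILE_START_STOP"] = "100-150"
env["TLLM_LLMAPI_ENABLE_NVTX"] = "1"            # NVTX labels on the timeline
env["TLLM_PROFILE_RECORD_GC"] = "1"             # annotate Python GC pauses
env["TLLM_TORCH_PROFILE_TRACE"] = "trace.json"  # parallel PyTorch trace

# --capture-range=cudaProfilerApi makes nsys honor the CUDA profiler
# start/stop runtime calls that TLLM_PROFILE_START_STOP drives, so the
# report covers only the selected iterations.
subprocess.run(
    ["nsys", "profile", "--trace=cuda,nvtx",
     "--capture-range=cudaProfilerApi", "--capture-range-end=stop",
     "-o", "tllm_iters",
     "trtllm-bench", "--model", "...", "throughput"],  # placeholder args
    env=env, check=True,
)
```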
Supported NVIDIA Platform Range
Nsight Systems profiles across the full NVIDIA hardware family:
- DGX/HGX: Data center training and inference servers
- RTX workstations: Developer workstations for model development
- DRIVE: Automotive inference platforms
- Jetson: Edge AI and robotics deployments
This breadth means the same profiling workflow applies whether optimizing a cloud-based LLM deployment or an edge agentic system.
Connections to NVIDIA Platform Stack
- Performance Analysis — TensorRT LLM (NVIDIA Platform) — the TensorRT-LLM guide that describes how to use Nsight Systems features from within the NVIDIA inference stack.
- Scaling LLMs with Triton and TensorRT-LLM (NVIDIA Platform) — Triton metrics (port 8002, Prometheus) provide the application-level observability layer above Nsight Systems, as sketched below.
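As a quick illustration of that observability layer, the sketch below scrapes Triton's Prometheus-format endpoint. The localhost host and the nv_inference metric prefix are assumptions based on Triton defaults; adjust both for your deployment.

```python
import urllib.request

# Triton exposes Prometheus-format metrics on port 8002 by default.
URL = "http://localhost:8002/metrics"  # adjust host for your deployment

with urllib.request.urlopen(URL) as resp:
    text = resp.read().decode()

# Print inference-related counters; exact metric names vary by Triton
# version, but nv_inference_* is the documented family prefix.
for line in text.splitlines():
    if line.startswith("nv_inference"):
        print(line)
```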