NVIDIA NeMo Agent Toolkit
NVIDIA Developer product page — developer.nvidia.com/nemo-agent-toolkit
Abstract
NVIDIA NeMo Agent Toolkit is an open-source AI library that provides a unified platform for developing, evaluating, optimising, deploying, and observing AI agent systems across any framework. It works across LangChain, CrewAI, Google ADK, LangGraph, and custom frameworks through a common YAML-based workflow specification and a plugin architecture. Key pillars are: a built-in evaluation harness for iterative agent quality assurance; an Agent Hyperparameter Optimizer for automatic LLM and prompt tuning; NVIDIA Dynamo integration for telemetry-guided request routing at runtime; plugin-based OpenTelemetry-compatible observability; and safety/security red-teaming tooling for proactive vulnerability assessment. The toolkit ships as the nvidia-nat pip package with the nat CLI and is part of the broader NVIDIA Agent Toolkit ecosystem.
Key Concepts
- YAML configuration: universal descriptors for agents, tools, and workflows — enables reusability across projects and replatform-free migration between frameworks
- Evaluation harness: the nat eval command tests agents against datasets, scores outputs with configurable metrics, and generates reports; evaluation is built into the development loop, not bolted on at the end
- Agent Hyperparameter Optimizer: automatically selects the optimal LLM type, temperature, max_tokens, and system prompt for a given workflow, optimising for accuracy, latency, cost, or custom metrics
- NVIDIA Dynamo integration: uses agent telemetry hints to route inference requests intelligently at runtime, reducing latency under load
- Plugin-based observability: event-driven architecture that traces every step of agent workflows and exports telemetry to Phoenix, Langfuse, Weave, or any OpenTelemetry-compatible service; multiple exporters can be configured simultaneously
- Red-teaming tools: middleware to probe agentic workflows for prompt injection, jailbreaks, tool poisoning, and custom attacks; results are visualised on a dashboard with risk analysis; pluggable defence layers reduce identified risks
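To make the YAML-based workflow specification concrete, here is a minimal sketch of what such a config might look like. The section and field names (llms, functions, workflow, eval, the _type values, the model name, and the dataset path) are illustrative assumptions, not the toolkit's documented schema:

```yaml
# Illustrative sketch only — section and field names are assumptions,
# not the toolkit's exact schema.
llms:
  default_llm:
    model_name: meta/llama-3.1-70b-instruct   # hypothetical model id
    temperature: 0.2
    max_tokens: 1024

functions:
  web_search:
    _type: tavily_search        # hypothetical tool plugin

workflow:
  _type: react_agent            # hypothetical agent type
  llm_name: default_llm
  tool_names: [web_search]

eval:
  dataset: data/qa_pairs.jsonl  # hypothetical dataset path
  metrics: [accuracy, latency]
```

Because the agent, its tools, and its evaluation setup live in one declarative file, the same workflow can be re-run, shared through the registry, or retargeted at a different framework without code changes.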
Key Capabilities
| Capability | Benefit |
|---|---|
| Framework-agnostic | Continue building with LangGraph, CrewAI, LangChain — no replatforming |
| Common specification | Reusable, portable workflow configs shareable through registry |
| Evaluation harness | Rapid dev iteration: define expected outputs, run nat eval, compare across models and configs |
| Built-in deployment | nat serve launches workflows as stateless REST microservices |
| Hyperparameter optimizer | Automated tuning saves manual trial-and-error on model and prompt selection |
| Observability | Full trace of agent behaviours, token usage, and execution paths in dev and prod |
| Safety middleware | Proactive red-teaming before deployment; pluggable defence layers reduce attack surface |
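The Dynamo integration row above hinges on one idea: telemetry hints emitted during agent execution let the router estimate each backend's pending work before dispatching. The toy Python sketch below illustrates that idea only; it is not NeMo Agent Toolkit or Dynamo code, and all names in it are invented for illustration:

```python
# Toy illustration of telemetry-guided request routing.
# NOT NeMo Agent Toolkit or Dynamo code — names are invented.
from dataclasses import dataclass


@dataclass
class Worker:
    """An inference backend with a running telemetry counter."""
    name: str
    queued_tokens: int = 0  # telemetry hint: estimated pending decode work


def route(workers: list[Worker], hint_tokens: int) -> Worker:
    """Send the request to the least-loaded worker, then charge that
    worker with the request's estimated token cost."""
    target = min(workers, key=lambda w: w.queued_tokens)
    target.queued_tokens += hint_tokens
    return target


workers = [Worker("gpu-0"), Worker("gpu-1")]
first = route(workers, hint_tokens=500)   # both idle, so the first worker wins
second = route(workers, hint_tokens=200)  # gpu-0 now has 500 queued, so gpu-1 wins
print(first.name, second.name)  # gpu-0 gpu-1
```

A load-aware policy like this avoids piling long requests onto an already-busy backend, which is the latency-under-load benefit the integration targets.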
Installation
pip install nvidia-nat
nat --help
nat --version

Terminology
- nvidia-nat: the pip package name for NeMo Agent Toolkit
- nat eval: CLI command for running evaluation datasets against a configured workflow
- nat serve: CLI command for launching a workflow as a microservice
- Telemetry hint: runtime metadata from agent execution used by NVIDIA Dynamo for request routing decisions
- Registry system: mechanism for sharing workflow specifications across projects and teams
Connections to Existing Wiki Pages
- AI-Q Blueprint — end-to-end enterprise RAG deployment using the Agent Toolkit as orchestration layer
- Improve AI Code Generation — tutorial building a test-driven coding agent with the toolkit
- NeMo Agent Toolkit: Evaluation — deep dive into the nat eval evaluation harness and Ragas NV metrics
- Scaling LLMs with Triton and TensorRT-LLM — deployment-scaling article complementing Dynamo-based request routing
- Building Autonomous AI with NVIDIA Agentic NeMo — architectural overview of the NeMo agentic stack this toolkit sits within