appears_in:
- ai-ml/nvidia-certs/NCP-AAI_Part_1_Exam_Prep_FULL
- ai-ml/nvidia-certs/NCP-AAI_Part3_GraphBased_Orchestration_Study_Guide
- ai-ml/NIPS-2017-attention-is-all-you-need-Paper
- ai-ml/Building_Agentic_AI_Applications_with_LLMs
- ai-ml/nvidia-certs/NCP-AAI_Part2_Exam_Prep_Full
- ai-ml/nvidia-certs/Generative AI LLM Exam Study Guide
- ai-ml/nvidia-certs/NCA-GENM Softerware development
- ai-ml/nvidia-certs/NCA-GENM Core Machine Learning and AI Knowledge
- ai-ml/nvidia-certs/NCA-GENM Experimentation
- ai-ml/nvidia-certs/NCA-GENM Performance Optimization
- ai-ml/nvidia-certs/NCP-AAI_Part0_Exam_Prep_FULL
- ai-ml/DeepSeek-R1 Incentivizing Reasoning Capability in LLMs via Reinforcement Learing
- ai-ml/nvidia-certs/NCP-AAI_Part4_Building_Retriever_Nodes_Study_Guide
- ai-ml/nvidia-certs/ncp-aai/agent-architecture-and-design/building-autonomous-ai-nvidia-agentic-nemo
- ai-ml/nvidia-certs/ncp-aai/agent-architecture-and-design/three-building-blocks-ai-virtual-assistants-nvidia-blueprint
- ai-ml/nvidia-certs/ncp-aai/agent-architecture-and-design/what-are-multi-agent-systems
- ai-ml/nvidia-certs/ncp-aai/agent-development/Optimization-NVIDIA-Triton-Inference-Server
- ai-ml/nvidia-certs/ncp-aai/agent-development/An-Introduction-to-Large-Language-Models-Prompt-Engineering-and-P-Tuning
- ai-ml/nvidia-certs/ncp-aai/deployment-and-scaling/Optimization-NVIDIA-Triton-Inference-Server
- ai-ml/nvidia-certs/ncp-aai/nvidia-platform-implementation/Optimization-NVIDIA-Triton-Inference-Server
- ai-ml/nvidia-certs/ncp-aai/agent-architecture-and-design/what-are-ai-agents
- ai-ml/nvidia-certs/ncp-aai/evaluation-and-tuning/data-flywheel-what-it-is-and-how-it-works
- ai-ml/nvidia-certs/ncp-aai/evaluation-and-tuning/nvidia-nemo-agent-toolkit-evaluation
- ai-ml/nvidia-certs/ncp-aai/nvidia-platform-implementation/nvidia-nemo-agent-toolkit-evaluation
- ai-ml/nvidia-certs/ncp-aai/deployment-and-scaling/measure-and-improve-ai-workload-performance-with-nvidia-dgx-cloud-benchmarking
- ai-ml/nvidia-certs/ncp-aai/nvidia-platform-implementation/measure-and-improve-ai-workload-performance-with-nvidia-dgx-cloud-benchmarking
- ai-ml/nvidia-certs/ncp-aai/deployment-and-scaling/nvidia-nsight-systems
- ai-ml/nvidia-certs/ncp-aai/nvidia-platform-implementation/nvidia-nsight-systems
- ai-ml/nvidia-certs/ncp-aai/deployment-and-scaling/performance-analysis-tensorrt-llm
- ai-ml/nvidia-certs/ncp-aai/nvidia-platform-implementation/performance-analysis-tensorrt-llm
- ai-ml/nvidia-certs/ncp-aai/deployment-and-scaling/scaling-llms-with-nvidia-triton-and-tensorrt-llm-using-kubernetes
- ai-ml/nvidia-certs/ncp-aai/nvidia-platform-implementation/scaling-llms-with-nvidia-triton-and-tensorrt-llm-using-kubernetes
- ai-ml/nvidia-certs/ncp-aai/deployment-and-scaling/what-is-kubernetes
- ai-ml/nvidia-certs/ncp-aai/nvidia-platform-implementation/Chat-With-Your-Enterprise-Data-Through-Open-Source-AI-Q-NVIDIA-Blueprint
- ai-ml/nvidia-certs/ncp-aai/knowledge-integration-and-data-handling/Chat-With-Your-Enterprise-Data-Through-Open-Source-AI-Q-NVIDIA-Blueprint
- ai-ml/nvidia-certs/ncp-aai/nvidia-platform-implementation/Improve-AI-Code-Generation-Using-NVIDIA-NeMo-Agent-Toolkit
- ai-ml/nvidia-certs/ncp-aai/agent-development/Improve-AI-Code-Generation-Using-NVIDIA-NeMo-Agent-Toolkit
- ai-ml/nvidia-certs/ncp-aai/nvidia-platform-implementation/NVIDIA-NeMo-Agent-Toolkit
- ai-ml/nvidia-certs/ncp-aai/nvidia-platform-implementation/Welcome-to-NVIDIA-RunAI-Documentation
- ai-ml/nvidia-certs/ncp-aai/deployment-and-scaling/Welcome-to-NVIDIA-RunAI-Documentation
- ai-ml/nvidia-certs/ncp-aai/nvidia-platform-implementation/Performance-Tuning-Guide
- ai-ml/nvidia-certs/ncp-aai/deployment-and-scaling/Performance-Tuning-Guide
- ai-ml/nvidia-certs/ncp-aai/nvidia-platform-implementation/NVIDIA-NeMo-Guardrails
- ai-ml/nvidia-certs/ncp-aai/safety-ethics-and-compliance/NVIDIA-NeMo-Guardrails
- ai-ml/nvidia-certs/ncp-aai/nvidia-platform-implementation/Triton-Inference-Server-Backend
- ai-ml/nvidia-certs/ncp-aai/deployment-and-scaling/Triton-Inference-Server-Backend
- ai-ml/nvidia-certs/ncp-aai/nvidia-platform-implementation/Batchers-NVIDIA-Triton-Inference-Server
- ai-ml/nvidia-certs/ncp-aai/deployment-and-scaling/Batchers-NVIDIA-Triton-Inference-Server
- ai-ml/nvidia-certs/ncp-aai/safety-ethics-and-compliance/building-safer-llm-apps-with-langchain-templates-and-nvidia-nemo-guardrails
- ai-ml/nvidia-certs/ncp-aai/safety-ethics-and-compliance/securing-generative-ai-deployments-with-nvidia-nim-and-nvidia-nemo-guardrails
- ai-ml/ai-accelerator-architectures/low-latency-llm-inference/Optimizing Inference for Long Context and Large Batch Sizes with NVFP4 KV Cache
entity_type: institution
last_updated: '2026-05-27'
sources: 52
status: stub
title: NVIDIA
NVIDIA
Institution.
Appearances in this wiki
- NCP-AAI_Part_1_Exam_Prep_FULL — The organization issuing the NVIDIA Certified Professional - Agentic AI certification discussed in the document.
- NCP-AAI_Part3_GraphBased_Orchestration_Study_Guide — Provider of the DLI course and NeMo Agent Toolkit referenced in the document.
- NIPS-2017-attention-is-all-you-need-Paper — Company providing the GPU infrastructure explicitly referenced for training the Transformer model.
- Building_Agentic_AI_Applications_with_LLMs — Referenced for hardware infrastructure or frameworks, such as GPU acceleration, relevant to deploying LLM-based agentic systems.
- NCP-AAI_Part2_Exam_Prep_Full — Parent company and creator of the Deep Learning Institute (DLI) course central to this document.
- Generative AI LLM Exam Study Guide — Central reference for hardware and software infrastructure (NeMo, SteerLM, TensorRT, CUDA, RAPIDS) throughout the generative AI lifecycle covered in the guide.
- NCA-GENM Softerware development — Central provider of the GPU-accelerated deep learning infrastructure, including cuDNN, NGC containers, and ACE microservices, highlighted for optimizing model training and deployment.
- NCA-GENM Core Machine Learning and AI Knowledge — Company providing optimized AI tools, deployment ecosystems, and hardware acceleration referenced for generative model development and inference efficiency.
- NCA-GENM Experimentation — The company responsible for the Riva ASR platform referenced for inference-time domain adaptation and word boosting in automatic speech recognition workflows.
- NCA-GENM Performance Optimization — Company providing GPU infrastructure and DPUs referenced for compute efficiency, and the publisher of the certification curriculum linked in the document.
- NCP-AAI_Part0_Exam_Prep_FULL — The technology corporation that oversees the NCP-AAI certification track and developed the NeMo Guardrails toolkit for runtime safety and governance.
- DeepSeek-R1 Incentivizing Reasoning Capability in LLMs via Reinforcement Learing — The semiconductor manufacturer responsible for supplying the high-performance GPU hardware infrastructure used in the training pipeline.
- NCP-AAI_Part4_Building_Retriever_Nodes_Study_Guide — Provider of the NVIDIARerank reranking model used in the Part 4 assessment retrieval pipeline.
- Building Autonomous AI with NVIDIA Agentic NeMo — Subject of the article: the NeMo framework, Triton Inference Server, TensorRT-LLM, NeMo Guardrails, and Megatron-LM are all NVIDIA products central to the agentic stack described.
- Three Building Blocks for Creating AI Virtual Assistants (NVIDIA Blueprint) — Source and subject of the article: describes the NVIDIA AI Blueprint, NIM microservices (Llama 3.1 70B, NeMo Retriever Embedding, NeMo Retriever Reranking), and NVIDIA AI Enterprise software stack.
- What are Multi-Agent Systems? — Source (NVIDIA glossary page) defining multi-agent systems and their use cases, including references to NVIDIA Nemotron and agentic AI capabilities.
- Optimization — NVIDIA Triton Inference Server — Subject of the article: NVIDIA Triton Inference Server, TensorRT, OpenVINO, perf_analyzer, and Model Analyzer are the primary tools described.
- An Introduction to Large Language Models: Prompt Engineering and P-Tuning — Published by NVIDIA Developer Blog; describes NVIDIA NeMo as the platform for p-tuning large language models.
- Optimization — NVIDIA Triton Inference Server (Deployment) — Cross-section page covering Triton’s deployment-relevant optimisation options.
- Optimization — NVIDIA Triton Inference Server (NVIDIA Platform) — Cross-section page framing Triton as an NVIDIA platform component within the agentic stack.
- What are AI Agents? — Source (NVIDIA glossary); defines agent components, types, and orchestration patterns; references NVIDIA Nemotron, Cosmos, Blueprints, API catalog, and NVIDIA OpenShell as agent development tools.
- Data Flywheel: What It Is and How It Works — Source (NVIDIA glossary); NeMo Curator, Customizer, Evaluator, Guardrails, and Retriever microservices are the primary subject; NIM referenced as co-platform for the AT&T case study and the AI Data Flywheel Blueprint.
- NVIDIA NeMo Agent Toolkit: Agent Evaluation — Source (NVIDIA GitHub); the NeMo Agent Toolkit evaluation harness (nat eval), NVIDIA-native Ragas NV metrics, and Nemotron judge LLM recommendations are the primary subject.
- NVIDIA NeMo Agent Toolkit: Evaluation (NVIDIA Platform) — Cross-section page covering NIM as evaluation backend and judge LLM platform, NVIDIA Ragas NV metrics, and profiler integration.
- Measure and Improve AI Workload Performance with NVIDIA DGX Cloud Benchmarking — Source (NVIDIA Developer Blog); DGX Cloud Benchmarking suite, NeMo framework version optimization, Transformer Engine FP8, and DGX hardware family are the primary subjects.
- DGX Cloud Benchmarking (NVIDIA Platform) — Cross-section page covering DGX platform, NeMo framework, and Hopper/Blackwell Transformer Engine as NVIDIA-specific performance levers.
- NVIDIA Nsight Systems — Source (NVIDIA developer product page); Nsight Systems, Nsight Compute, Nsight Graphics, and Nsight Aftermath SDK are the primary NVIDIA tools described.
- NVIDIA Nsight Systems (NVIDIA Platform) — Cross-section page covering Nsight Systems’ role in the NVIDIA profiling toolchain and its integration with TensorRT-LLM and Triton.
- Performance Analysis — TensorRT LLM — Source (NVIDIA GitHub developer guide); TensorRT-LLM Nsight Systems integration, NVTX markers, CUDA profiler API gating, and ENABLE_PERFECT_ROUTER MoE analysis are the primary subjects.
- Performance Analysis — TensorRT LLM (NVIDIA Platform) — Cross-section page covering TensorRT-LLM’s built-in Nsight Systems integration as a native NVIDIA platform profiling capability.
- Scaling LLMs with NVIDIA Triton and TensorRT-LLM Using Kubernetes — Source (NVIDIA Developer Blog); TensorRT-LLM engine building, NVIDIA Dynamo Triton, DCGM Exporter, NGC containers, and Kubernetes HPA autoscaling are the primary subjects.
- Scaling LLMs with Triton and TensorRT-LLM (NVIDIA Platform) — Cross-section page covering the full NVIDIA production LLM serving stack: TensorRT-LLM + Dynamo Triton + DCGM + NGC.
- What is Kubernetes? — Source (NVIDIA glossary); covers NVIDIA GPU Kubernetes extensions: device plugin, GPU Feature Discovery, DCGM, MIG (A100), and EGX stack; NVIDIA Triton as hardware abstraction within Kubernetes nodes.
- Chat With Your Enterprise Data Through Open-Source AI-Q NVIDIA Blueprint — Source (NVIDIA Developer Blog); AI-Q Blueprint is built entirely on NVIDIA NIM, NeMo Retriever, and NeMo Agent Toolkit; Llama Nemotron reasoning model is the central AI engine.
- AI-Q Blueprint (Knowledge Integration angle) — Cross-section page covering NeMo Retriever, cuVS vector storage, and NVIDIA-accelerated multimodal data ingestion pipeline.
- Improve AI Code Generation Using NVIDIA NeMo Agent Toolkit — Source (NVIDIA Developer Blog); NeMo Agent Toolkit and NVIDIA NIM reasoning microservices are the platform backbone for the coding agent tutorial.
- Improve AI Code Generation (Agent Development angle) — Cross-section page covering agent design patterns enabled by Agent Toolkit.
- NVIDIA NeMo Agent Toolkit — Source (NVIDIA product page); comprehensive overview of the NeMo Agent Toolkit capabilities, architecture, and integration ecosystem.
- Welcome to NVIDIA Run:ai Documentation — Source (NVIDIA Run:ai product documentation); Run:ai is an NVIDIA platform for AI workload orchestration and GPU scheduling across hybrid infrastructure.
- NVIDIA Run:ai (Deployment and Scaling angle) — Cross-section page on scheduling and multi-cloud scaling perspective.
- Performance Tuning Guide — Megatron-Bridge — Source (NVIDIA NeMo docs); Megatron-Bridge FP8 training, distributed parallelism strategies, and MFU/TCO optimisation on NVIDIA GPUs.
- Performance Tuning Guide (Deployment and Scaling angle) — Cross-section covering MFU, TCO, and scale-out parallelism strategy selection.
- NVIDIA NeMo Guardrails — Source (NVIDIA product page); NeMo Guardrails is NVIDIA’s runtime safety enforcement platform for agentic AI (jailbreak, PII, content safety, RAG grounding).
- NVIDIA NeMo Guardrails (Safety angle) — Cross-section page covering safety, ethics, and compliance dimensions.
- Triton Inference Server Backend — Source (NVIDIA Triton docs); comprehensive reference on Triton backend API, supported backends (TRT, ONNX, PyTorch, vLLM, TRT-LLM), and custom backend development.
- Triton Backend (Deployment angle) — Cross-section covering backend selection and deployment implications for production LLM serving.
- Batchers — NVIDIA Triton Inference Server — Source (NVIDIA Triton docs); Dynamic Batcher, Sequence Batcher, and Custom Batcher for server-side request aggregation.
- Batchers (Deployment angle) — Cross-section covering latency-throughput trade-offs and stateful agent batching with Sequence Batcher.
- Building Safer LLM Apps with LangChain Templates and NVIDIA NeMo Guardrails — The primary technology vendor whose NeMo Guardrails platform and enterprise LLM hardening architecture form the technical foundation of this document.
- Securing Generative AI Deployments with NVIDIA NIM and NVIDIA NeMo Guardrails — The central vendor providing the NIM microservices and NeMo Guardrails framework used to secure generative AI deployments.