Evaluation and Tuning
NCP-AAI topic area — exam weight: 13%
Measuring, comparing, and optimizing agent performance.
Ingested Material
- Data Flywheel: What It Is and How It Works
- AI Agents in Production: Observability & Evaluation
- NVIDIA NeMo Agent Toolkit: Agent Evaluation
- Successful Agentic AI: Model Logic, Data Considerations and Manpower
- AI Agent Evaluation — Summary (cross-section)
- Troubleshooting — TensorRT-LLM
- A Guide to Monitoring Machine Learning Models in Production
- Monitoring ML Models in Production: Data Quality and Integrity
- Agentic or Tool use