Chat With Your Enterprise Data Through Open-Source AI-Q NVIDIA Blueprint
By Nicola Sessions — NVIDIA Developer Blog, 2025-06-11
Abstract
The AI-Q NVIDIA Blueprint is an open-source reference implementation for building enterprise AI agents that connect to, reason across, and deliver insights from diverse organisational data sources at petabyte scale. Built on three core building blocks — NVIDIA NeMo Retriever microservices, NIM inference microservices, and the NeMo Agent Toolkit — it demonstrates a full multimodal agentic pipeline covering data ingestion, semantic retrieval, advanced reasoning, and production observability. The blueprint is framework-agnostic and integrates with CrewAI, LangChain, LlamaIndex, Agno, and others through the Agent Toolkit’s plugin system. Real-world deployments at Therapyside and Pangaea Data demonstrate 22 minutes saved per patient per clinical day and 98% accuracy in rare-disease patient data retrieval, respectively.
Key Concepts
- AI-Q Blueprint: open-source NVIDIA Blueprint providing a developer-friendly workflow for building Artificial General Agents (AGA) that query and reason over enterprise data
- NeMo Retriever: NVIDIA microservices for extraction, embedding, and reranking of multimodal enterprise data — processes PDFs, images, tables, and databases up to 15× faster using accelerated computing
- cuVS: CUDA-accelerated vector search library; vectors are stored in a cuVS-managed database and served via Docker Compose
- Llama Nemotron: reasoning model with a unique toggleable reasoning mode — enables dynamic performance-cost trade-off and achieves up to 5× faster inference when reasoning is disabled
- NeMo Agent Toolkit: orchestration layer managing multi-agent REST APIs, observability telemetry, and workflow profiling; acts as the unifying layer across diverse agentic frameworks
- Multimodal PDF extraction: NeMo Retriever extraction microservices ingest structured, semi-structured, and unstructured content from diverse source formats
Key Claims and Findings
- 68% of available enterprise data goes unused (Gartner); AI-Q addresses this by providing always-current semantic search over private data
- NeMo Retriever processes multimodal enterprise data at petabyte scale, up to 15× faster than non-accelerated pipelines
- Llama Nemotron’s toggleable reasoning provides up to 5× faster inference while maintaining high-quality outputs on complex tasks
- Therapyside’s Maia agent (built on Agent Toolkit + NeMo Retriever) saves clinicians 22 minutes per patient per day
- Pangaea Data achieved 98% accuracy in clinical data retrieval for rare-disease detection, reducing configuration time from weeks to days
Architecture
The AI-Q pipeline follows five stages:
- Multimodal ingestion — NeMo Retriever extraction microservices process PDFs, images, tables, and databases
- Embedding and indexing — data is continuously embedded and stored in a cuVS-accelerated vector database
- Retrieval — RAG with NeMo Retriever Reranking surfaces the most relevant context for each query
- Reasoning — Llama Nemotron models decompose problems, plan iteratively, and reflect to produce nuanced answers
- Observability — the Agent Toolkit profiler exports OpenTelemetry-compatible telemetry covering token usage, latency, and cost per agent/tool
Terminology
- AGA (Artificial General Agent): agent capable of autonomous reasoning and action across heterogeneous data sources
- OpenTelemetry: CNCF standard for distributed observability; supported natively by the NeMo Agent Toolkit for integration with monitoring tools
- Data-grounded response: answer generated using retrieved, privately held enterprise data rather than parametric model knowledge
- Framework-agnostic: the Agent Toolkit integrates with any popular agentic framework via modular plugin packages without requiring teams to replatform
Connections to Existing Wiki Pages
- NeMo Agent Toolkit Evaluation — evaluation harness and profiler described in the blueprint are detailed here
- NVIDIA NeMo Agent Toolkit — product overview of the Agent Toolkit that underpins the AI-Q Blueprint
- AI-Q Blueprint (Knowledge Integration angle) — cross-section page covering RAG pipeline and NeMo Retriever in detail
- Three Building Blocks for AI Virtual Assistants — companion article on the NIM + NeMo Retriever + Agent Toolkit stack
- NeMo Agent Toolkit: Agent Evaluation — evaluation capabilities referenced as part of the toolkit used in AI-Q