Chat With Your Enterprise Data Through Open-Source AI-Q NVIDIA Blueprint

By Nicola Sessions — NVIDIA Developer Blog, 2025-06-11

Abstract

The AI-Q NVIDIA Blueprint is an open-source reference implementation for building enterprise AI agents that connect to, reason across, and deliver insights from diverse organizational data sources at petabyte scale. Built on three core building blocks — NVIDIA NeMo Retriever microservices, NIM inference microservices, and the NeMo Agent Toolkit — it demonstrates a full multimodal agentic pipeline covering data ingestion, semantic retrieval, advanced reasoning, and production observability. The blueprint is framework-agnostic and integrates with CrewAI, LangChain, LlamaIndex, Agno, and others through the Agent Toolkit’s plugin system. Real-world deployments show measurable impact: Therapyside saves clinicians 22 minutes per patient per clinical day, and Pangaea Data achieves 98% accuracy in rare-disease patient data retrieval.

Key Concepts

  • AI-Q Blueprint: open-source NVIDIA Blueprint providing a developer-friendly workflow for building Artificial General Agents (AGA) that query and reason over enterprise data
  • NeMo Retriever: NVIDIA microservices for extraction, embedding, and reranking of multimodal enterprise data — processes PDFs, images, tables, and databases up to 15× faster than non-accelerated pipelines
  • cuVS: CUDA-accelerated vector search library; the blueprint stores embeddings in a cuVS-accelerated vector database, deployed via Docker Compose
  • Llama Nemotron: reasoning model with a unique toggleable reasoning mode — enables dynamic performance-cost trade-off and achieves up to 5× faster inference when reasoning is disabled
  • NeMo Agent Toolkit: orchestration layer managing multi-agent REST APIs, observability telemetry, and workflow profiling; acts as the unifying layer across diverse agentic frameworks
  • Multimodal PDF extraction: NeMo Retriever extraction microservices ingest structured, semi-structured, and unstructured content from diverse source formats
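The embedding-and-retrieval idea behind these concepts can be illustrated with a minimal, pure-Python sketch: brute-force cosine-similarity search over toy vectors. This is a stand-in for what NeMo Retriever embedding and cuVS-accelerated indexing do at scale — all names and the 3-dimensional "embeddings" below are illustrative, not the blueprint's actual API:

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def top_k(query_vec, index, k=3):
    """Brute-force nearest-neighbor scan over an in-memory index.
    cuVS replaces this O(n) CPU loop with GPU-accelerated search."""
    scored = [(doc_id, cosine(query_vec, vec)) for doc_id, vec in index.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]

# Toy embeddings standing in for real model outputs.
index = {
    "q2_report.pdf":    [0.9, 0.1, 0.0],
    "hr_policy.docx":   [0.1, 0.8, 0.2],
    "arch_diagram.png": [0.0, 0.2, 0.9],
}
print(top_k([0.85, 0.15, 0.05], index, k=2))
```

In the real pipeline the index is continuously refreshed as source documents change, which is what keeps semantic search "always current" over private data.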

Key Claims and Findings

  • 68% of available enterprise data goes unused (Gartner); AI-Q addresses this by providing always-current semantic search over private data
  • NeMo Retriever processes multimodal enterprise data at petabyte scale, up to 15× faster than non-accelerated pipelines
  • Llama Nemotron’s toggleable reasoning provides up to 5× faster inference while maintaining high-quality outputs on complex tasks
  • Therapyside’s Maia agent (built on Agent Toolkit + NeMo Retriever) saves clinicians 22 minutes per patient per day
  • Pangaea Data achieved 98% accuracy in clinical data retrieval for rare-disease detection, reducing configuration time from weeks to days

Architecture

The AI-Q pipeline follows five stages:

  1. Multimodal ingestion — NeMo Retriever extraction microservices process PDFs, images, tables, and databases
  2. Embedding and indexing — data is continuously embedded and stored in a cuVS-accelerated vector database
  3. Retrieval — RAG with NeMo Retriever Reranking surfaces the most relevant context for each query
  4. Reasoning — Llama Nemotron models decompose problems, plan iteratively, and reflect to produce nuanced answers
  5. Observability — the Agent Toolkit profiler exports OpenTelemetry-compatible telemetry covering token usage, latency, and cost per agent/tool
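The observability stage (step 5) can be sketched with stdlib Python: a context manager that records latency and arbitrary attributes (such as token counts) per agent/tool call, accumulating OpenTelemetry-style span records. This is a toy analogue of what the Agent Toolkit profiler exports, not its real interface:

```python
import time
from contextlib import contextmanager
from dataclasses import dataclass, field

@dataclass
class Span:
    """A minimal OpenTelemetry-style span for one agent/tool call."""
    name: str
    start: float
    duration_ms: float = 0.0
    attributes: dict = field(default_factory=dict)

class Profiler:
    """Collects per-call telemetry the way an agent profiler might."""
    def __init__(self):
        self.spans = []

    @contextmanager
    def trace(self, name, **attributes):
        span = Span(name=name, start=time.monotonic(), attributes=attributes)
        try:
            yield span
        finally:
            span.duration_ms = (time.monotonic() - span.start) * 1000
            self.spans.append(span)

profiler = Profiler()
with profiler.trace("retriever.rerank", tokens=512, model="rerank-v1") as span:
    time.sleep(0.01)  # stand-in for a real reranking call
    span.attributes["results"] = 5

for s in profiler.spans:
    print(f"{s.name}: {s.duration_ms:.1f} ms {s.attributes}")
```

Aggregating spans like these per agent and per tool is what makes token usage, latency, and cost attributable to individual pipeline stages.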

Terminology

  • AGA (Artificial General Agent): agent capable of autonomous reasoning and action across heterogeneous data sources
  • OpenTelemetry: CNCF standard for distributed observability; supported natively by the NeMo Agent Toolkit for integration with monitoring tools
  • Data-grounded response: answer generated using retrieved, privately held enterprise data rather than parametric model knowledge
  • Framework-agnostic: the Agent Toolkit integrates with any popular agentic framework via modular plugin packages without requiring teams to replatform

Connections to Existing Wiki Pages