Observability Concepts (LangSmith)
Abstract
LangSmith’s observability layer structures LLM application telemetry around four nested concepts: Projects (containers grouping all traces for one application/service), Traces (complete execution records for a single operation, bounded by a unique trace ID — analogous to OpenTelemetry traces), Runs (individual units of work within a trace such as one LLM call or retrieval step — analogous to OTel spans), and Threads (sequences of traces linked by a shared metadata key to represent a multi-turn conversation). Traces can be enriched with Feedback (run-level scores with a tag and value for annotation), Tags (strings for UI filtering), and Metadata (arbitrary key-value pairs for contextual information). Trace data is sent either via automatic instrumentation through framework integrations (LangChain, LangGraph, OpenAI, Anthropic, CrewAI — zero code changes) or via manual instrumentation (@traceable decorator, trace context manager, or the low-level RunTree API). SaaS data retention is 400 days; datasets snapshot trace data for indefinite persistence.
Key Concepts
- Project: Top-level container grouping all traces from one application or service; the unit of organization for a deployed LLM system
- Trace: A complete record of one operation’s execution (e.g. one user query handled end-to-end); bounded by a unique trace ID; capped at 25,000 runs per trace. Analogous to an OpenTelemetry trace
- Run: A single unit of work within a trace — one LLM call, one prompt formatting step, one retrieval call, or any other discrete operation. Analogous to an OTel span
- Thread: A sequence of traces linked by a shared `session_id`, `thread_id`, or `conversation_id` metadata key; represents a multi-turn conversation where each turn is its own trace
- Feedback: Run-level quality annotation; each entry has a tag (label) and a score (continuous or discrete); reusable across runs organization-wide
- Automatic Instrumentation: Framework integrations (LangChain, LangGraph, OpenAI, Anthropic, CrewAI) capture inputs, outputs, and metadata without any code changes — equivalent to OTel auto-instrumentation
- Manual Instrumentation: Three mechanisms for tracing arbitrary code: the `@traceable`/`traceable` decorator, the `trace` context manager (Python), and the `RunTree` API (low-level, explicit construction); see the sketch after this list
- Polly: LangSmith’s built-in analysis assistant for querying and interpreting trace data without manual inspection
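To ground the manual-instrumentation entry, here is a minimal sketch using the `@traceable` decorator from the `langsmith` Python SDK. The function names and nesting are illustrative; only the decorator and its `run_type` argument come from the SDK.

```python
from langsmith import traceable

# Assumes LANGSMITH_API_KEY is set and tracing is enabled in the
# environment. Each decorated function is recorded as one run; nested
# calls become child runs within the same trace.

@traceable(run_type="retriever")
def retrieve_docs(query: str) -> list[str]:
    # Stand-in for a real retrieval step (illustrative).
    return [f"doc about {query}"]

@traceable(run_type="chain")
def answer(query: str) -> str:
    docs = retrieve_docs(query)  # recorded as a child run
    return f"Answer grounded in {len(docs)} document(s)."

answer("vector databases")  # one trace containing two runs
```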
Data Structure Hierarchy
Project
├── Trace 1 (one user request, end-to-end)
│ ├── Run A (LLM call)
│ ├── Run B (retrieval step)
│ └── Run C (output parser)
├── Trace 2
│ └── ...
└── Thread (links Trace 1 + Trace 2 as one conversation)
Threads are not automatic — the calling code must propagate a `session_id`/`thread_id`/`conversation_id` metadata key across turns.
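A minimal sketch of that propagation, assuming the `langsmith` Python SDK: the same `session_id` is passed in each turn’s metadata, here via the `langsmith_extra` call-time argument that `traceable`-wrapped functions accept. The conversation loop itself is illustrative.

```python
import uuid

from langsmith import traceable

@traceable(run_type="chain")
def chat_turn(user_message: str) -> str:
    return f"echo: {user_message}"  # stand-in for a real model call

# One id for the whole conversation; each turn produces its own trace,
# and the shared metadata key is what links them into a thread.
session_id = str(uuid.uuid4())

for message in ["hello", "tell me more"]:
    chat_turn(
        message,
        langsmith_extra={"metadata": {"session_id": session_id}},
    )
```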
Key Claims and Findings
- Each trace is hard-capped at 25,000 runs; exceeding this causes LangSmith to reject additional runs for that trace
- Threads require explicit propagation of a shared metadata key — they are not inferred automatically from conversation context
- SaaS trace data retention is 400 days from ingestion; datasets (snapshots of selected runs) persist indefinitely
- Feedback can be continuous or discrete (categorical); tags are reusable across runs within an organization
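As a sketch of the feedback claim above, both score styles can be attached with the SDK’s `Client.create_feedback`; the project name and feedback keys are placeholders.

```python
from langsmith import Client

client = Client()  # reads LANGSMITH_API_KEY from the environment

# Fetch a recent run to annotate; "my-project" is a placeholder name.
run = next(client.list_runs(project_name="my-project", limit=1))

# Continuous feedback: a numeric score under a reusable tag ("key").
client.create_feedback(run.id, key="relevance", score=0.85)

# Discrete (categorical) feedback: a label value under the same scheme.
client.create_feedback(run.id, key="user_sentiment", value="positive")
```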
Sending Traces: Methods Summary
| Method | When to Use |
|---|---|
| Framework integration | Using LangChain, LangGraph, OpenAI, Anthropic, or CrewAI |
| `@traceable` decorator | Tracing individual functions in any framework |
| `trace` context manager (Python) | Wrapping specific code blocks |
| `RunTree` API | Fine-grained, explicit trace construction |
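For the `trace` context manager row, a minimal sketch (Python SDK; the block’s contents are illustrative):

```python
from langsmith import trace

# Wraps an arbitrary code block as a single run: inputs are recorded
# when the block opens, outputs when end() is called.
with trace(name="summarize", run_type="chain", inputs={"text": "..."}) as run:
    summary = "a short summary"  # stand-in for real work
    run.end(outputs={"summary": summary})
```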
Terminology
- Trace ID: Unique identifier binding all runs in one trace together
- `@traceable`/`traceable`: Decorator that traces any function’s inputs, outputs, and errors as a run
- `RunTree` API: Low-level API for explicit construction of trace hierarchies with full control over span attributes (sketch after this list)
- Polly: LangSmith’s integrated analysis assistant for automated trace interpretation and pattern discovery
- Data Retention: How long trace data is stored before permanent deletion; 400 days on LangSmith SaaS
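A sketch of explicit construction with `RunTree` (Python SDK): `create_child` builds the hierarchy, `post()` sends each run to LangSmith, and `patch()` updates an already-posted run. The pipeline contents are illustrative.

```python
from langsmith import RunTree

# Root run: becomes the trace; its id is the trace ID that binds children.
root = RunTree(name="pipeline", run_type="chain", inputs={"query": "hi"})
root.post()

# Child run: one unit of work (an LLM call here) inside the trace.
child = root.create_child(
    name="llm_call", run_type="llm", inputs={"prompt": "hi"}
)
child.end(outputs={"completion": "hello!"})
child.post()

root.end(outputs={"answer": "hello!"})
root.patch()  # update the posted root with its outputs
```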
Connections to Existing Wiki Pages
- Log, Trace, and Monitor Portkey Integrations — Portkey provides analogous trace/logging capabilities (trace IDs, request logs, tagging) for non-LangSmith deployments; the two tools address the same observability need via different approaches — LangSmith as an integrated platform, Portkey as a gateway proxy
- AI Agents in Production: Observability & Evaluation — the traces/spans model described in that article maps directly onto LangSmith’s Projects/Traces/Runs hierarchy; LangSmith Feedback implements the “user feedback scoring” mechanism discussed there
- AI Agent Evaluation — Summary — LangSmith Feedback and Tags are the production collection layer for the evaluation metrics (task completion, LLM-as-a-judge scores) enumerated there