Observability Concepts (LangSmith)

Abstract

LangSmith’s observability layer structures LLM application telemetry around four nested concepts: Projects (containers grouping all traces for one application/service), Traces (complete execution records for a single operation, bounded by a unique trace ID — analogous to OpenTelemetry traces), Runs (individual units of work within a trace such as one LLM call or retrieval step — analogous to OTel spans), and Threads (sequences of traces linked by a shared metadata key to represent a multi-turn conversation). Traces can be enriched with Feedback (run-level annotations, each with a tag and a continuous or discrete score), Tags (strings for UI filtering), and Metadata (arbitrary key-value pairs for contextual information). Trace data is sent either via automatic instrumentation through framework integrations (LangChain, LangGraph, OpenAI, Anthropic, CrewAI — zero code changes) or via manual instrumentation (@traceable decorator, trace context manager, or the low-level RunTree API). SaaS data retention is 400 days; datasets snapshot trace data for indefinite persistence.


Key Concepts

  • Project: Top-level container grouping all traces from one application or service; the unit of organization for a deployed LLM system
  • Trace: A complete record of one operation’s execution (e.g. one user query handled end-to-end); bounded by a unique trace ID; capped at 25,000 runs per trace. Analogous to an OpenTelemetry trace
  • Run: A single unit of work within a trace — one LLM call, one prompt formatting step, one retrieval call, or any other discrete operation. Analogous to an OTel span
  • Thread: A sequence of traces linked by a shared session_id, thread_id, or conversation_id metadata key; represents a multi-turn conversation where each turn is its own trace
  • Feedback: Run-level quality annotation; each entry has a tag (label) and a score (continuous or discrete); reusable across runs organization-wide
  • Automatic Instrumentation: Framework integrations (LangChain, LangGraph, OpenAI, Anthropic, CrewAI) capture inputs, outputs, and metadata without any code changes — equivalent to OTel auto-instrumentation
  • Manual Instrumentation: Three mechanisms for tracing arbitrary code: @traceable/traceable decorator, trace context manager (Python), and RunTree API (low-level, explicit construction)
  • Polly: LangSmith’s built-in analysis assistant for querying and interpreting trace data without manual inspection
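The Project → Trace → Run containment described above can be modeled as a short sketch. This is illustrative pure Python, not the LangSmith SDK — all class names here are assumptions; the only documented specifics it encodes are the unique trace ID binding runs together and the 25,000-run cap per trace:

```python
import uuid
from dataclasses import dataclass, field

MAX_RUNS_PER_TRACE = 25_000  # documented hard cap; LangSmith rejects runs beyond it


@dataclass
class Run:
    """One unit of work (an LLM call, retrieval step, parser); analogous to an OTel span."""
    name: str
    run_type: str  # e.g. "llm", "retriever", "parser"


@dataclass
class Trace:
    """One end-to-end operation; all its runs share one unique trace ID."""
    trace_id: str = field(default_factory=lambda: str(uuid.uuid4()))
    runs: list[Run] = field(default_factory=list)

    def add_run(self, run: Run) -> None:
        if len(self.runs) >= MAX_RUNS_PER_TRACE:
            raise RuntimeError("run cap exceeded; additional runs are rejected")
        self.runs.append(run)


@dataclass
class Project:
    """Top-level container grouping all traces from one application or service."""
    name: str
    traces: list[Trace] = field(default_factory=list)


# One user request handled end-to-end becomes one trace with several runs.
project = Project(name="my-rag-app")
trace = Trace()
trace.add_run(Run("generate", "llm"))
trace.add_run(Run("fetch_docs", "retriever"))
project.traces.append(trace)
```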

Data Structure Hierarchy

Project
├── Trace 1 (one user request, end-to-end)
│   ├── Run A  (LLM call)
│   ├── Run B  (retrieval step)
│   └── Run C  (output parser)
├── Trace 2
│   └── ...
└── Thread (links Trace 1 + Trace 2 as one conversation)

Threads are not automatic — the calling code must propagate a session_id/thread_id/conversation_id metadata key across turns.
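That propagation can be sketched as follows (illustrative pure Python, not the SDK; the trace dicts and `group_into_threads` helper are hypothetical). Each turn's trace carries the same metadata key, and the thread is recovered by grouping on it:

```python
from collections import defaultdict

# Stand-ins for three traces' metadata: two turns of one conversation, one unrelated.
turn_1 = {"trace_id": "t-001", "metadata": {"session_id": "conv-42"}}
turn_2 = {"trace_id": "t-002", "metadata": {"session_id": "conv-42"}}
other = {"trace_id": "t-003", "metadata": {"session_id": "conv-99"}}


def group_into_threads(traces, key="session_id"):
    """Group traces sharing the same metadata key value into one thread."""
    threads = defaultdict(list)
    for t in traces:
        threads[t["metadata"][key]].append(t["trace_id"])
    return dict(threads)


threads = group_into_threads([turn_1, turn_2, other])
# → {'conv-42': ['t-001', 't-002'], 'conv-99': ['t-003']}
```

If the calling code forgets to set the key on one turn, that turn simply falls outside the thread — which is why propagation must be explicit on every request.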


Key Claims and Findings

  • Each trace is hard-capped at 25,000 runs; exceeding this causes LangSmith to reject additional runs for that trace
  • Threads require explicit propagation of a shared metadata key — they are not inferred automatically from conversation context
  • SaaS trace data retention is 400 days from ingestion; datasets (snapshots of selected runs) persist indefinitely
  • Feedback can be continuous or discrete (categorical); tags are reusable across runs within an organization

Sending Traces: Methods Summary

Method                            When to Use
Framework integration             Using LangChain, LangGraph, OpenAI, Anthropic, or CrewAI
@traceable decorator              Tracing individual functions in any framework
trace context manager (Python)    Wrapping specific code blocks
RunTree API                       Fine-grained, explicit trace construction
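As a conceptual model of what the decorator approach captures, the sketch below shows a tracing decorator recording a function's inputs, outputs, and errors as one run. This is pure Python for illustration — `traceable_sketch` and `captured_runs` are hypothetical names, not LangSmith's @traceable, which additionally ships the run to the backend:

```python
import functools

captured_runs = []  # stand-in for runs sent to the tracing backend


def traceable_sketch(fn):
    """Record a function's inputs, outputs, and errors as a run (conceptual only)."""
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        run = {"name": fn.__name__, "inputs": {"args": args, "kwargs": kwargs}}
        try:
            run["outputs"] = fn(*args, **kwargs)
            return run["outputs"]
        except Exception as exc:
            run["error"] = repr(exc)
            raise
        finally:
            captured_runs.append(run)  # the run is recorded whether fn succeeds or fails
    return wrapper


@traceable_sketch
def format_prompt(question: str) -> str:
    return f"Answer concisely: {question}"


format_prompt("What is a trace?")
# captured_runs[0] now holds the name, inputs, and outputs of that call
```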

Terminology

  • Trace ID: Unique identifier binding all runs in one trace together
  • @traceable / traceable: Decorator that traces any function’s inputs, outputs, and errors as a run
  • RunTree API: Low-level API for explicit construction of trace hierarchies with full control over span attributes
  • Polly: LangSmith’s integrated analysis assistant for automated trace interpretation and pattern discovery
  • Data Retention: How long trace data is stored before permanent deletion; 400 days on LangSmith SaaS

Connections to Existing Wiki Pages

  • Log, Trace, and Monitor Portkey Integrations — Portkey provides analogous trace/logging capabilities (trace IDs, request logs, tagging) for non-LangSmith deployments; the two tools address the same observability need via different approaches — LangSmith as an integrated platform, Portkey as a gateway proxy
  • AI Agents in Production: Observability & Evaluation — the traces/spans model described in that article maps directly onto LangSmith’s Projects/Traces/Runs hierarchy; LangSmith Feedback implements the “user feedback scoring” mechanism discussed there
  • AI Agent Evaluation — Summary — LangSmith Feedback and Tags are the production collection layer for the evaluation metrics (task completion, LLM-as-a-judge scores) enumerated there