Observability Concepts (LangSmith)

Abstract

LangSmith’s observability layer structures LLM application telemetry around four nested concepts: Projects (containers grouping all traces for one application/service), Traces (complete execution records for a single operation, bounded by a unique trace ID — analogous to OpenTelemetry traces), Runs (individual units of work within a trace such as one LLM call or retrieval step — analogous to OTel spans), and Threads (sequences of traces linked by a shared metadata key to represent a multi-turn conversation). Traces can be enriched with Feedback (run-level annotations, each with a tag and a continuous or discrete score), Tags (strings for UI filtering), and Metadata (arbitrary key-value pairs for contextual information). Trace data is sent either via automatic instrumentation through framework integrations (LangChain, LangGraph, OpenAI, Anthropic, CrewAI — zero code changes) or via manual instrumentation (@traceable decorator, trace context manager, or the low-level RunTree API). SaaS data retention is 400 days; datasets snapshot trace data for indefinite persistence.


Key Concepts

  • Project: Top-level container grouping all traces from one application or service; the unit of organization for a deployed LLM system
  • Trace: A complete record of one operation’s execution (e.g. one user query handled end-to-end); bounded by a unique trace ID; capped at 25,000 runs per trace. Analogous to an OpenTelemetry trace
  • Run: A single unit of work within a trace — one LLM call, one prompt formatting step, one retrieval call, or any other discrete operation. Analogous to an OTel span
  • Thread: A sequence of traces linked by a shared session_id, thread_id, or conversation_id metadata key; represents a multi-turn conversation where each turn is its own trace
  • Feedback: Run-level quality annotation; each entry has a tag (label) and a score (continuous or discrete); reusable across runs organization-wide
  • Automatic Instrumentation: Framework integrations (LangChain, LangGraph, OpenAI, Anthropic, CrewAI) capture inputs, outputs, and metadata without any code changes — equivalent to OTel auto-instrumentation
  • Manual Instrumentation: Three mechanisms for tracing arbitrary code: @traceable/traceable decorator, trace context manager (Python), and RunTree API (low-level, explicit construction)
  • Polly: LangSmith’s built-in analysis assistant for querying and interpreting trace data without manual inspection
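The Project → Trace → Run containment described above can be modeled as a short sketch. This is illustrative pure Python, not the LangSmith SDK — all class names here are assumptions; the only documented specifics it encodes are the unique trace ID binding runs together and the 25,000-run cap per trace:

```python
import uuid
from dataclasses import dataclass, field

MAX_RUNS_PER_TRACE = 25_000  # documented hard cap; LangSmith rejects runs beyond it


@dataclass
class Run:
    """One unit of work (an LLM call, retrieval step, parser); analogous to an OTel span."""
    name: str
    run_type: str  # e.g. "llm", "retriever", "parser"


@dataclass
class Trace:
    """One end-to-end operation; all its runs share one unique trace ID."""
    trace_id: str = field(default_factory=lambda: str(uuid.uuid4()))
    runs: list[Run] = field(default_factory=list)

    def add_run(self, run: Run) -> None:
        if len(self.runs) >= MAX_RUNS_PER_TRACE:
            raise RuntimeError("run cap exceeded; additional runs are rejected")
        self.runs.append(run)


@dataclass
class Project:
    """Top-level container grouping all traces from one application or service."""
    name: str
    traces: list[Trace] = field(default_factory=list)


# One user request handled end-to-end becomes one trace with several runs.
project = Project(name="my-rag-app")
trace = Trace()
trace.add_run(Run("generate", "llm"))
trace.add_run(Run("fetch_docs", "retriever"))
project.traces.append(trace)
```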

Data Structure Hierarchy

Project
├── Trace 1 (one user request, end-to-end)
│   ├── Run A  (LLM call)
│   ├── Run B  (retrieval step)
│   └── Run C  (output parser)
├── Trace 2
│   └── ...
└── Thread (links Trace 1 + Trace 2 as one conversation)

Threads are not automatic — the calling code must propagate a session_id/thread_id/conversation_id metadata key across turns.
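That propagation can be sketched as follows (illustrative pure Python, not the SDK; the trace dicts and `group_into_threads` helper are hypothetical). Each turn's trace carries the same metadata key, and the thread is recovered by grouping on it:

```python
from collections import defaultdict

# Stand-ins for three traces' metadata: two turns of one conversation, one unrelated.
turn_1 = {"trace_id": "t-001", "metadata": {"session_id": "conv-42"}}
turn_2 = {"trace_id": "t-002", "metadata": {"session_id": "conv-42"}}
other = {"trace_id": "t-003", "metadata": {"session_id": "conv-99"}}


def group_into_threads(traces, key="session_id"):
    """Group traces sharing the same metadata key value into one thread."""
    threads = defaultdict(list)
    for t in traces:
        threads[t["metadata"][key]].append(t["trace_id"])
    return dict(threads)


threads = group_into_threads([turn_1, turn_2, other])
# → {'conv-42': ['t-001', 't-002'], 'conv-99': ['t-003']}
```

If the calling code forgets to set the key on one turn, that turn simply falls outside the thread — which is why propagation must be explicit on every request.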


Key Claims and Findings

  • Each trace is hard-capped at 25,000 runs; exceeding this causes LangSmith to reject additional runs for that trace
  • Threads require explicit propagation of a shared metadata key — they are not inferred automatically from conversation context
  • SaaS trace data retention is 400 days from ingestion; datasets (snapshots of selected runs) persist indefinitely
  • Feedback can be continuous or discrete (categorical); tags are reusable across runs within an organization

Sending Traces: Methods Summary

Method                            When to Use
Framework integration             Using LangChain, LangGraph, OpenAI, Anthropic, or CrewAI
@traceable decorator              Tracing individual functions in any framework
trace context manager (Python)    Wrapping specific code blocks
RunTree API                       Fine-grained, explicit trace construction
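As a conceptual model of what the decorator approach captures, the sketch below shows a tracing decorator recording a function's inputs, outputs, and errors as one run. This is pure Python for illustration — `traceable_sketch` and `captured_runs` are hypothetical names, not LangSmith's @traceable, which additionally ships the run to the backend:

```python
import functools

captured_runs = []  # stand-in for runs sent to the tracing backend


def traceable_sketch(fn):
    """Record a function's inputs, outputs, and errors as a run (conceptual only)."""
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        run = {"name": fn.__name__, "inputs": {"args": args, "kwargs": kwargs}}
        try:
            run["outputs"] = fn(*args, **kwargs)
            return run["outputs"]
        except Exception as exc:
            run["error"] = repr(exc)
            raise
        finally:
            captured_runs.append(run)  # the run is recorded whether fn succeeds or fails
    return wrapper


@traceable_sketch
def format_prompt(question: str) -> str:
    return f"Answer concisely: {question}"


format_prompt("What is a trace?")
# captured_runs[0] now holds the name, inputs, and outputs of that call
```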

Terminology

  • Trace ID: Unique identifier binding all runs in one trace together
  • @traceable / traceable: Decorator that traces any function’s inputs, outputs, and errors as a run
  • RunTree API: Low-level API for explicit construction of trace hierarchies with full control over span attributes
  • Polly: LangSmith’s integrated analysis assistant for automated trace interpretation and pattern discovery
  • Data Retention: How long trace data is stored before permanent deletion; 400 days on LangSmith SaaS

Connections to Existing Wiki Pages

  • Log, Trace, and Monitor Portkey Integrations — Portkey provides analogous trace/logging capabilities (trace IDs, request logs, tagging) for non-LangSmith deployments; the two tools address the same observability need via different approaches — LangSmith as an integrated platform, Portkey as a gateway proxy
  • AI Agents in Production: Observability & Evaluation — the traces/spans model described in that article maps directly onto LangSmith’s Projects/Traces/Runs hierarchy; LangSmith Feedback implements the “user feedback scoring” mechanism discussed there
  • AI Agent Evaluation — Summary — LangSmith Feedback and Tags are the production collection layer for the evaluation metrics (task completion, LLM-as-a-judge scores) enumerated there