OpenTelemetry for AI Agents: How the Strands SDK Instruments Traces, Metrics, and Token Usage
2nd March 2026
I’ve been digging into the Strands Agents SDK and was surprised to find a comprehensive, production-ready OpenTelemetry integration baked right in. If you’re building AI agents and wondering how to get visibility into what’s actually happening at runtime — model calls, tool executions, latencies, token usage — this is worth understanding.
This post covers what Strands exports via OTEL, how to enable it, and a primer on OpenTelemetry itself for anyone unfamiliar.
What Strands Exports via OpenTelemetry
The SDK instruments three categories of telemetry data:
1. Traces (Distributed Tracing)
Every significant operation gets its own span, linked together in a trace tree:
- Agent invocations — start_agent_span/end_agent_span
- Model calls — start_model_invoke_span/end_model_invoke_span
- Tool executions — start_tool_call_span/end_tool_call_span
- Event loop cycles — each iteration of the agent loop
- Multi-agent/swarm workflows — traces span across agent boundaries
- MCP context propagation — distributed tracing works across MCP tool server boundaries
A single agent invocation produces a trace that looks like:
Agent Span
├── Event Loop Cycle 1
│ ├── Model Invoke Span (Bedrock/Claude call)
│ ├── Tool Call Span ("search_database")
│ └── Tool Call Span ("format_response")
├── Event Loop Cycle 2
│ ├── Model Invoke Span
│ └── Tool Call Span ("send_email")
└── Agent Complete
2. Metrics
Numerical measurements exported continuously:
| Type | Metric Name | What It Measures |
|---|---|---|
| Counter | strands.event_loop.cycle_count | Total event loop iterations |
| Counter | strands.tool.call_count | Total tool invocations |
| Counter | strands.tool.success_count | Successful tool calls |
| Counter | strands.tool.error_count | Failed tool calls |
| Histogram | strands.event_loop.latency | Event loop cycle duration |
| Histogram | strands.tool.duration | Per-tool execution time |
| Histogram | strands.model.time_to_first_token | Model response latency |
| Histogram | Token counts | Input, output, and cached token usage |
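To see how these counters compose downstream, here's a tiny stdlib-only sketch that derives a tool success rate from counter readings. The metric names come from the table above; the values are invented for illustration:

```python
# Hypothetical counter readings, as they might arrive from a metrics backend.
readings = {
    "strands.tool.call_count": 120,
    "strands.tool.success_count": 114,
    "strands.tool.error_count": 6,
}

def tool_success_rate(readings: dict) -> float:
    """Success rate = successful tool calls / total tool calls."""
    total = readings["strands.tool.call_count"]
    return readings["strands.tool.success_count"] / total if total else 0.0

rate = tool_success_rate(readings)
print(f"tool success rate: {rate:.1%}")  # 114/120 = 95.0%
```

In practice your backend (Prometheus, Datadog, etc.) would compute this ratio in a query; the point is that the raw counters carry enough information to do it.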
3. Span Attributes (GenAI Semantic Conventions)
Spans are annotated with standardized attributes following the emerging GenAI semantic conventions:
gen_ai.request.model → "anthropic.claude-sonnet-4-20250514"
gen_ai.system → "aws.bedrock"
gen_ai.agent.name → "research-agent"
gen_ai.usage.input_tokens → 1524
gen_ai.usage.output_tokens → 387
gen_ai.tool.name → "search_database"
gen_ai.tool.status → "success"
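Because the attribute keys are standardized, you can aggregate across spans without knowing anything Strands-specific. A minimal sketch, using the gen_ai.* keys listed above on invented span data, that sums token usage per model:

```python
from collections import defaultdict

# Invented span attribute dicts using the standard gen_ai.* keys.
spans = [
    {"gen_ai.request.model": "anthropic.claude-sonnet-4-20250514",
     "gen_ai.usage.input_tokens": 1524, "gen_ai.usage.output_tokens": 387},
    {"gen_ai.request.model": "anthropic.claude-sonnet-4-20250514",
     "gen_ai.usage.input_tokens": 900, "gen_ai.usage.output_tokens": 210},
]

def token_totals(spans: list) -> dict:
    """Sum input/output tokens per model across a batch of spans."""
    totals = defaultdict(lambda: {"input": 0, "output": 0})
    for s in spans:
        model = s["gen_ai.request.model"]
        totals[model]["input"] += s.get("gen_ai.usage.input_tokens", 0)
        totals[model]["output"] += s.get("gen_ai.usage.output_tokens", 0)
    return dict(totals)

print(token_totals(spans))
```

This is exactly the kind of query a cost dashboard runs; any backend that understands GenAI spans can do it natively.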
These conventions mean your agent telemetry is compatible with any observability tool that understands GenAI spans — no custom dashboards required.
How to Enable It
First, install the OTEL extras:
pip install 'strands-agents[otel]'
This pulls in the required dependencies:
opentelemetry-api>=1.30.0
opentelemetry-sdk>=1.30.0
opentelemetry-instrumentation-threading>=0.51b0
Option A: Enable via Code
from strands.telemetry import StrandsTelemetry
# Print traces/metrics to console (for debugging)
StrandsTelemetry().setup_console_exporter()
# Export to an OTLP-compatible backend (for production)
StrandsTelemetry().setup_otlp_exporter()
Option B: Enable via Environment Variables
OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4318
OTEL_SERVICE_NAME=my-agents
That’s it. Once enabled, every agent invocation, model call, and tool execution automatically gets traced and measured. The data exports to any OTLP-compatible backend — Datadog, Jaeger, Langfuse, Grafana, Prometheus, or anything else that speaks OTLP.
What is OpenTelemetry?
If you haven’t encountered OTEL before, here’s the short version: OpenTelemetry is an open-source, vendor-neutral observability framework under the CNCF (Cloud Native Computing Foundation). It was formed by merging two earlier projects — OpenTracing and OpenCensus — into a single standard.
The Three Pillars of Observability
| Pillar | What It Is | Agent Example |
|---|---|---|
| Traces | The journey of a request across services. Made up of spans (individual operations) linked in a tree. | Agent call → model invoke → tool call → response |
| Metrics | Numerical measurements over time — counters, histograms, gauges. | Request latency, token counts, error rates |
| Logs | Timestamped text records of discrete events. | Tool errors, model timeouts, agent state changes |
Core Components
Three pieces make up the OTEL ecosystem:
- SDKs — Language-specific libraries (Python, JS, Java, Go, etc.) that instrument your code and collect telemetry data. The Strands SDK uses the Python SDK internally.
- OTLP (OpenTelemetry Protocol) — The standard wire protocol for transmitting traces, metrics, and logs. A unified, structured protobuf-based format that all OTEL-compatible tools understand.
- Collector — An optional standalone service that receives, processes, and exports telemetry data. Useful for batching, filtering, and routing to multiple backends.
How Data Flows
Your App (Strands SDK)
│
│ instruments agent calls,
│ model invokes, tool executions
▼
OTEL Python SDK
│
│ packages into OTLP format
│ (protobuf spans + metrics)
▼
OTLP Exporter ──────► Backend
(Datadog, Jaeger, Grafana,
Langfuse, Prometheus, etc.)
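The last arrow in that diagram is just an HTTP POST. Here's a stdlib-only sketch of what the exporter builds (the request is constructed but not sent); port 4318 is the default OTLP/HTTP port, and /v1/traces is the standard route for trace data:

```python
import json
import urllib.request

OTLP_ENDPOINT = "http://localhost:4318"  # default OTLP/HTTP port

def build_trace_request(resource_spans: list) -> urllib.request.Request:
    """Package spans into an OTLP/JSON HTTP request (not sent here).

    Traces go to /v1/traces; metrics and logs use /v1/metrics
    and /v1/logs on the same endpoint.
    """
    payload = json.dumps({"resourceSpans": resource_spans}).encode()
    return urllib.request.Request(
        url=f"{OTLP_ENDPOINT}/v1/traces",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_trace_request([])
print(req.full_url)  # http://localhost:4318/v1/traces
```

The real exporter in the OTEL Python SDK adds batching, retries, and (usually) Protobuf encoding, but the shape of the exchange is this simple.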
What the OTEL Format Actually Looks Like
OTEL data is structured as JSON (or Protobuf on the wire). Here are concrete examples of all three signal types so you can see exactly what gets exported.
1. A Trace (Collection of Spans)
A trace is a tree of spans. Each span represents a unit of work. They share a trace_id and link to each other via parent_id:
{
"name": "hello",
"context": {
"trace_id": "5b8aa5a2d2c872e8321cf37308d69df2",
"span_id": "051581bf3cb55c13"
},
"parent_id": null,
"start_time": "2022-04-29T18:52:58.114201Z",
"end_time": "2022-04-29T18:52:58.114687Z",
"status_code": "STATUS_CODE_OK",
"attributes": {
"http.method": "GET",
"http.target": "/v1/sys/health"
},
"events": [
{
"name": "request_complete",
"timestamp": "2022-04-29T18:52:58.114561Z",
"attributes": { "event_attributes": 1 }
}
]
}
In the Strands SDK, a span for a model call would look like:
{
"name": "strands.model.invoke",
"context": {
"trace_id": "abc123...",
"span_id": "def456..."
},
"parent_id": "parent_span_id",
"attributes": {
"gen_ai.operation.name": "invoke_model",
"gen_ai.request.model": "anthropic.claude-sonnet",
"gen_ai.system": "strands-agents",
"gen_ai.agent.name": "my-agent",
"gen_ai.usage.input_tokens": 1500,
"gen_ai.usage.output_tokens": 350,
"gen_ai.server.time_to_first_token": 230
}
}
Spans nest into a tree — this is how you get the full picture of a single agent invocation:
Trace: 5b8aa5a2d2c872e8321cf37308d69df2
│
├── Span: "agent.invoke" (root span)
│ ├── Span: "event_loop.cycle" (child)
│ │ ├── Span: "model.invoke" (grandchild)
│ │ └── Span: "tool.call" (grandchild)
│ │ └── name: "calculator"
│ └── Span: "event_loop.cycle" (child — 2nd cycle)
│ └── Span: "model.invoke"
All spans share the same trace_id, but each has a unique span_id and a parent_id linking it to its parent. This is what lets observability tools render the waterfall view you see in Jaeger or Datadog.
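That parent/child linking is trivially mechanical. A minimal sketch (span records invented; only the linking fields matter) of how a backend rebuilds the waterfall from a flat list of spans:

```python
def render_tree(spans: list, parent_id=None, depth: int = 0) -> list:
    """Recursively render spans as an indented tree using parent_id links."""
    lines = []
    for s in spans:
        if s["parent_id"] == parent_id:
            lines.append("  " * depth + s["name"])
            lines.extend(render_tree(spans, s["span_id"], depth + 1))
    return lines

# Invented span records mirroring the trace tree above.
spans = [
    {"name": "agent.invoke",     "span_id": "a1", "parent_id": None},
    {"name": "event_loop.cycle", "span_id": "b1", "parent_id": "a1"},
    {"name": "model.invoke",     "span_id": "c1", "parent_id": "b1"},
    {"name": "tool.call",        "span_id": "c2", "parent_id": "b1"},
]

print("\n".join(render_tree(spans)))
```

The root span is the one whose parent_id is null; everything else hangs off it by span_id.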
2. A Log Record
{
"timestamp": "2024-08-04T12:34:56.789Z",
"observedTimestamp": "2024-08-04T12:34:56.790Z",
"severityText": "INFO",
"severityNumber": 9,
"body": "User login successful",
"traceId": "5b8aa5a2d2c872e8321cf37308d69df2",
"spanId": "051581bf3cb55c13",
"traceFlags": "01",
"resource": {
"service.name": "user-authentication",
"service.version": "1.0.0"
},
"attributes": {
"user.id": "12345",
"username": "johndoe"
}
}
The key insight: traceId and spanId link this log to the exact span in the trace tree. That’s how OTEL correlates logs with traces automatically — when you see an error log, you can jump straight to the trace that produced it.
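The correlation itself is just a dictionary lookup. A sketch using the IDs from the examples above (the span index is invented):

```python
# Index spans by (trace_id, span_id) so a log record can be resolved
# to the exact operation that emitted it.
spans = {
    ("5b8aa5a2d2c872e8321cf37308d69df2", "051581bf3cb55c13"): {
        "name": "hello", "status_code": "STATUS_CODE_OK",
    },
}

log_record = {
    "body": "User login successful",
    "traceId": "5b8aa5a2d2c872e8321cf37308d69df2",
    "spanId": "051581bf3cb55c13",
}

def span_for_log(log: dict, spans: dict):
    """Jump from a log record to its originating span via the shared IDs."""
    return spans.get((log["traceId"], log["spanId"]))

print(span_for_log(log_record, spans)["name"])  # hello
```

Observability backends do exactly this join, which is why the "view related trace" button works without any extra configuration.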
3. A Metric Data Point
{
"name": "strands.event_loop.latency",
"description": "Event loop latency in milliseconds",
"unit": "ms",
"type": "histogram",
"dataPoints": [
{
"startTimeUnix": 1698000000000000000,
"timeUnix": 1698000060000000000,
"count": 42,
"sum": 8400.5,
"bucketCounts": [5, 10, 15, 8, 3, 1],
"explicitBounds": [10, 50, 100, 250, 500],
"attributes": {
"gen_ai.request.model": "claude-sonnet"
}
}
]
}
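Two invariants make this structure easy to sanity-check: bucketCounts has one more entry than explicitBounds (the last bucket catches everything above the final bound), and the bucket counts sum to count. Only the mean is recoverable exactly; per-bucket bounds give you approximate percentiles. Using the data point above:

```python
dp = {  # the histogram data point from the example above
    "count": 42,
    "sum": 8400.5,
    "bucketCounts": [5, 10, 15, 8, 3, 1],
    "explicitBounds": [10, 50, 100, 250, 500],
}

# One more bucket than bounds: the final bucket is "> 500 ms".
assert len(dp["bucketCounts"]) == len(dp["explicitBounds"]) + 1
# The per-bucket counts must account for every observation.
assert sum(dp["bucketCounts"]) == dp["count"]

mean_ms = dp["sum"] / dp["count"]  # the mean is recoverable exactly
print(f"mean latency: {mean_ms:.2f} ms")  # 8400.5 / 42 = 200.01 ms
```

This is why histograms are exported pre-bucketed: the backend never sees individual observations, just enough aggregate structure to compute means and approximate quantiles.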
OTEL supports four metric types:
| Type | What It Does | Example |
|---|---|---|
| Counter | Monotonically increasing count | strands.tool.call_count |
| UpDownCounter | Count that goes up and down | Active connections |
| Histogram | Distribution of values across buckets | strands.event_loop.latency |
| Gauge | Current snapshot value at a point in time | CPU usage, memory |
What Gets Sent Over the Wire (OTLP)
When the exporter fires, it packages everything into the OTLP format. Here’s what the actual JSON payload looks like (Protobuf binary encoding is more common in production, but JSON is also supported):
{
"resourceSpans": [
{
"resource": {
"attributes": [
{ "key": "service.name",
"value": { "stringValue": "strands-agents" } }
]
},
"scopeSpans": [
{
"scope": { "name": "strands.telemetry" },
"spans": [
{
"traceId": "5b8aa5a...",
"spanId": "051581b...",
"parentSpanId": "",
"name": "agent.invoke",
"kind": 1,
"startTimeUnixNano": "1698000000000000000",
"endTimeUnixNano": "1698000001000000000",
"attributes": [
{ "key": "gen_ai.system",
"value": { "stringValue": "strands-agents" } }
],
"status": { "code": 1 }
}
]
}
]
}
]
}
This is the actual JSON sent over HTTP to an OTLP collector endpoint. Every attribute is a key-value pair with an explicit type (stringValue, intValue, etc.) — verbose but unambiguous.
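One wrinkle worth knowing when you consume this JSON directly: OTLP/JSON encodes 64-bit integers as strings (a proto3 JSON rule), so intValue arrives quoted. A small sketch that flattens the typed key-value list into a plain dict, with example attributes based on the payload above:

```python
def unwrap_attributes(attrs: list) -> dict:
    """Flatten OTLP's typed key-value list into a plain dict."""
    out = {}
    for kv in attrs:
        # Each value object holds exactly one typed field:
        # stringValue, intValue, boolValue, doubleValue, ...
        (value_type, value), = kv["value"].items()
        # OTLP/JSON serializes 64-bit ints as strings; convert back.
        out[kv["key"]] = int(value) if value_type == "intValue" else value
    return out

attrs = [
    {"key": "service.name", "value": {"stringValue": "strands-agents"}},
    {"key": "gen_ai.usage.input_tokens", "value": {"intValue": "1524"}},
]
print(unwrap_attributes(attrs))
```

Collectors and backends do this unwrapping for you; you only hit it when writing your own OTLP consumer or debugging raw payloads.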
Why This Matters for AI Agents
Agents are harder to debug than traditional request/response services. A single user query might trigger multiple model calls, branch into different tool chains, retry on failures, and spawn sub-agents. Without observability, you’re flying blind when things go wrong in production.
What OTEL gives you for agents specifically:
- Vendor-neutral instrumentation — Instrument once, export to any backend. Switch from Jaeger to Datadog without changing application code.
- Correlated signals — Traces, metrics, and logs are automatically linked via context propagation (trace IDs, span IDs). A slow tool call shows up in both the trace waterfall and the latency histogram.
- Token cost tracking — Input/output/cached token counts on every model call, exportable to your metrics backend for cost dashboards.
- Multi-agent visibility — Distributed tracing across agent boundaries and MCP tool servers. You can follow a request through an entire swarm.
The Strands SDK handles all the instrumentation automatically. You enable the exporter, and every agent operation becomes visible. For anyone running agents in production, this is table stakes observability — and it’s good to see it built into the SDK rather than bolted on as an afterthought.