OpenTelemetry for AI Agents: How the Strands SDK Instruments Traces, Metrics, and Token Usage
2nd March 2026
I’ve been digging into the Strands Agents SDK and was surprised to find a comprehensive, production-ready OpenTelemetry integration baked right in. If you’re building AI agents and wondering how to get visibility into what’s actually happening at runtime — model calls, tool executions, latencies, token usage — this is worth understanding.
This post covers what Strands exports via OTEL, how to enable it, and a primer on OpenTelemetry itself for anyone unfamiliar.
What Strands Exports via OpenTelemetry
The SDK instruments three categories of telemetry data:
1. Traces (Distributed Tracing)
Every significant operation gets its own span, linked together in a trace tree:
- Agent invocations — start_agent_span/end_agent_span
- Model calls — start_model_invoke_span/end_model_invoke_span
- Tool executions — start_tool_call_span/end_tool_call_span
- Event loop cycles — each iteration of the agent loop
- Multi-agent/swarm workflows — traces span across agent boundaries
- MCP context propagation — distributed tracing works across MCP tool server boundaries
A single agent invocation produces a trace that looks like:
Agent Span
├── Event Loop Cycle 1
│ ├── Model Invoke Span (Bedrock/Claude call)
│ ├── Tool Call Span ("search_database")
│ └── Tool Call Span ("format_response")
├── Event Loop Cycle 2
│ ├── Model Invoke Span
│ └── Tool Call Span ("send_email")
└── Agent Complete
2. Metrics
Numerical measurements exported continuously:
| Type | Metric Name | What It Measures |
|---|---|---|
| Counter | strands.event_loop.cycle_count | Total event loop iterations |
| Counter | strands.tool.call_count | Total tool invocations |
| Counter | strands.tool.success_count | Successful tool calls |
| Counter | strands.tool.error_count | Failed tool calls |
| Histogram | strands.event_loop.latency | Event loop cycle duration |
| Histogram | strands.tool.duration | Per-tool execution time |
| Histogram | strands.model.time_to_first_token | Model response latency |
| Histogram | Token counts | Input, output, and cached token usage |
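To see how these counters compose downstream, here's a tiny stdlib-only sketch that derives a tool success rate from counter readings. The metric names come from the table above; the values are invented for illustration:

```python
# Hypothetical counter readings, as they might arrive from a metrics backend.
readings = {
    "strands.tool.call_count": 120,
    "strands.tool.success_count": 114,
    "strands.tool.error_count": 6,
}

def tool_success_rate(readings: dict) -> float:
    """Success rate = successful tool calls / total tool calls."""
    total = readings["strands.tool.call_count"]
    return readings["strands.tool.success_count"] / total if total else 0.0

rate = tool_success_rate(readings)
print(f"tool success rate: {rate:.1%}")  # 114/120 = 95.0%
```

In practice your backend (Prometheus, Datadog, etc.) would compute this ratio in a query; the point is that the raw counters carry enough information to do it.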
3. Span Attributes (GenAI Semantic Conventions)
Spans are annotated with standardized attributes following the emerging GenAI semantic conventions:
gen_ai.request.model → "anthropic.claude-sonnet-4-20250514"
gen_ai.system → "aws.bedrock"
gen_ai.agent.name → "research-agent"
gen_ai.usage.input_tokens → 1524
gen_ai.usage.output_tokens → 387
gen_ai.tool.name → "search_database"
gen_ai.tool.status → "success"
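Because the attribute keys are standardized, you can aggregate across spans without knowing anything Strands-specific. A minimal sketch, using the gen_ai.* keys listed above on invented span data, that sums token usage per model:

```python
from collections import defaultdict

# Invented span attribute dicts using the standard gen_ai.* keys.
spans = [
    {"gen_ai.request.model": "anthropic.claude-sonnet-4-20250514",
     "gen_ai.usage.input_tokens": 1524, "gen_ai.usage.output_tokens": 387},
    {"gen_ai.request.model": "anthropic.claude-sonnet-4-20250514",
     "gen_ai.usage.input_tokens": 900, "gen_ai.usage.output_tokens": 210},
]

def token_totals(spans: list) -> dict:
    """Sum input/output tokens per model across a batch of spans."""
    totals = defaultdict(lambda: {"input": 0, "output": 0})
    for s in spans:
        model = s["gen_ai.request.model"]
        totals[model]["input"] += s.get("gen_ai.usage.input_tokens", 0)
        totals[model]["output"] += s.get("gen_ai.usage.output_tokens", 0)
    return dict(totals)

print(token_totals(spans))
```

This is exactly the kind of query a cost dashboard runs; any backend that understands GenAI spans can do it natively.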
These conventions mean your agent telemetry is compatible with any observability tool that understands GenAI spans — no custom dashboards required.
How to Enable It
First, install the OTEL extras:
pip install 'strands-agents[otel]'
This pulls in the required dependencies:
opentelemetry-api>=1.30.0
opentelemetry-sdk>=1.30.0
opentelemetry-instrumentation-threading>=0.51b0
Option A: Enable via Code
from strands.telemetry import StrandsTelemetry
# Print traces/metrics to console (for debugging)
StrandsTelemetry().setup_console_exporter()
# Export to an OTLP-compatible backend (for production)
StrandsTelemetry().setup_otlp_exporter()
Option B: Enable via Environment Variables
OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4318
OTEL_SERVICE_NAME=my-agents
That’s it. Once enabled, every agent invocation, model call, and tool execution automatically gets traced and measured. The data exports to any OTLP-compatible backend — Datadog, Jaeger, Langfuse, Grafana, Prometheus, or anything else that speaks OTLP.
What is OpenTelemetry?
If you haven’t encountered OTEL before, here’s the short version: OpenTelemetry is an open-source, vendor-neutral observability framework under the CNCF (Cloud Native Computing Foundation). It was formed by merging two earlier projects — OpenTracing and OpenCensus — into a single standard.
The Three Pillars of Observability
| Pillar | What It Is | Agent Example |
|---|---|---|
| Traces | The journey of a request across services. Made up of spans (individual operations) linked in a tree. | Agent call → model invoke → tool call → response |
| Metrics | Numerical measurements over time — counters, histograms, gauges. | Request latency, token counts, error rates |
| Logs | Timestamped text records of discrete events. | Tool errors, model timeouts, agent state changes |
Core Components
Three pieces make up the OTEL ecosystem:
- SDKs — Language-specific libraries (Python, JS, Java, Go, etc.) that instrument your code and collect telemetry data. The Strands SDK uses the Python SDK internally.
- OTLP (OpenTelemetry Protocol) — The standard wire protocol for transmitting traces, metrics, and logs. A unified, structured protobuf-based format that all OTEL-compatible tools understand.
- Collector — An optional standalone service that receives, processes, and exports telemetry data. Useful for batching, filtering, and routing to multiple backends.
How Data Flows
Your App (Strands SDK)
│
│ instruments agent calls,
│ model invokes, tool executions
▼
OTEL Python SDK
│
│ packages into OTLP format
│ (protobuf spans + metrics)
▼
OTLP Exporter ──────► Backend
(Datadog, Jaeger, Grafana,
Langfuse, Prometheus, etc.)
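The last arrow in that diagram is just an HTTP POST. Here's a stdlib-only sketch of what the exporter builds (the request is constructed but not sent); port 4318 is the default OTLP/HTTP port, and /v1/traces is the standard route for trace data:

```python
import json
import urllib.request

OTLP_ENDPOINT = "http://localhost:4318"  # default OTLP/HTTP port

def build_trace_request(resource_spans: list) -> urllib.request.Request:
    """Package spans into an OTLP/JSON HTTP request (not sent here).

    Traces go to /v1/traces; metrics and logs use /v1/metrics
    and /v1/logs on the same endpoint.
    """
    payload = json.dumps({"resourceSpans": resource_spans}).encode()
    return urllib.request.Request(
        url=f"{OTLP_ENDPOINT}/v1/traces",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_trace_request([])
print(req.full_url)  # http://localhost:4318/v1/traces
```

The real exporter in the OTEL Python SDK adds batching, retries, and (usually) Protobuf encoding, but the shape of the exchange is this simple.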
What the OTEL Format Actually Looks Like
OTEL data is structured as JSON (or Protobuf on the wire). Here are concrete examples of all three signal types so you can see exactly what gets exported.
1. A Trace (Collection of Spans)
A trace is a tree of spans. Each span represents a unit of work. They share a trace_id and link to each other via parent_id:
{
"name": "hello",
"context": {
"trace_id": "5b8aa5a2d2c872e8321cf37308d69df2",
"span_id": "051581bf3cb55c13"
},
"parent_id": null,
"start_time": "2022-04-29T18:52:58.114201Z",
"end_time": "2022-04-29T18:52:58.114687Z",
"status_code": "STATUS_CODE_OK",
"attributes": {
"http.method": "GET",
"http.target": "/v1/sys/health"
},
"events": [
{
"name": "request_complete",
"timestamp": "2022-04-29T18:52:58.114561Z",
"attributes": { "event_attributes": 1 }
}
]
}
In the Strands SDK, a span for a model call would look like:
{
"name": "strands.model.invoke",
"context": {
"trace_id": "abc123...",
"span_id": "def456..."
},
"parent_id": "parent_span_id",
"attributes": {
"gen_ai.operation.name": "invoke_model",
"gen_ai.request.model": "anthropic.claude-sonnet",
"gen_ai.system": "strands-agents",
"gen_ai.agent.name": "my-agent",
"gen_ai.usage.input_tokens": 1500,
"gen_ai.usage.output_tokens": 350,
"gen_ai.server.time_to_first_token": 230
}
}
Spans nest into a tree — this is how you get the full picture of a single agent invocation:
Trace: 5b8aa5a2d2c872e8321cf37308d69df2
│
├── Span: "agent.invoke" (root span)
│ ├── Span: "event_loop.cycle" (child)
│ │ ├── Span: "model.invoke" (grandchild)
│ │ └── Span: "tool.call" (grandchild)
│ │ └── name: "calculator"
│ └── Span: "event_loop.cycle" (child — 2nd cycle)
│ └── Span: "model.invoke"
All spans share the same trace_id, but each has a unique span_id and a parent_id linking it to its parent. This is what lets observability tools render the waterfall view you see in Jaeger or Datadog.
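That parent/child linking is trivially mechanical. A minimal sketch (span records invented; only the linking fields matter) of how a backend rebuilds the waterfall from a flat list of spans:

```python
def render_tree(spans: list, parent_id=None, depth: int = 0) -> list:
    """Recursively render spans as an indented tree using parent_id links."""
    lines = []
    for s in spans:
        if s["parent_id"] == parent_id:
            lines.append("  " * depth + s["name"])
            lines.extend(render_tree(spans, s["span_id"], depth + 1))
    return lines

# Invented span records mirroring the trace tree above.
spans = [
    {"name": "agent.invoke",     "span_id": "a1", "parent_id": None},
    {"name": "event_loop.cycle", "span_id": "b1", "parent_id": "a1"},
    {"name": "model.invoke",     "span_id": "c1", "parent_id": "b1"},
    {"name": "tool.call",        "span_id": "c2", "parent_id": "b1"},
]

print("\n".join(render_tree(spans)))
```

The root span is the one whose parent_id is null; everything else hangs off it by span_id.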
2. A Log Record
{
"timestamp": "2024-08-04T12:34:56.789Z",
"observedTimestamp": "2024-08-04T12:34:56.790Z",
"severityText": "INFO",
"severityNumber": 9,
"body": "User login successful",
"traceId": "5b8aa5a2d2c872e8321cf37308d69df2",
"spanId": "051581bf3cb55c13",
"traceFlags": "01",
"resource": {
"service.name": "user-authentication",
"service.version": "1.0.0"
},
"attributes": {
"user.id": "12345",
"username": "johndoe"
}
}
The key insight: traceId and spanId link this log to the exact span in the trace tree. That’s how OTEL correlates logs with traces automatically — when you see an error log, you can jump straight to the trace that produced it.
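The correlation itself is just a dictionary lookup. A sketch using the IDs from the examples above (the span index is invented):

```python
# Index spans by (trace_id, span_id) so a log record can be resolved
# to the exact operation that emitted it.
spans = {
    ("5b8aa5a2d2c872e8321cf37308d69df2", "051581bf3cb55c13"): {
        "name": "hello", "status_code": "STATUS_CODE_OK",
    },
}

log_record = {
    "body": "User login successful",
    "traceId": "5b8aa5a2d2c872e8321cf37308d69df2",
    "spanId": "051581bf3cb55c13",
}

def span_for_log(log: dict, spans: dict):
    """Jump from a log record to its originating span via the shared IDs."""
    return spans.get((log["traceId"], log["spanId"]))

print(span_for_log(log_record, spans)["name"])  # hello
```

Observability backends do exactly this join, which is why the "view related trace" button works without any extra configuration.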
3. A Metric Data Point
{
"name": "strands.event_loop.latency",
"description": "Event loop latency in milliseconds",
"unit": "ms",
"type": "histogram",
"dataPoints": [
{
"startTimeUnix": 1698000000000000000,
"timeUnix": 1698000060000000000,
"count": 42,
"sum": 8400.5,
"bucketCounts": [5, 10, 15, 8, 3, 1],
"explicitBounds": [10, 50, 100, 250, 500],
"attributes": {
"gen_ai.request.model": "claude-sonnet"
}
}
]
}
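Two invariants make this structure easy to sanity-check: bucketCounts has one more entry than explicitBounds (the last bucket catches everything above the final bound), and the bucket counts sum to count. Only the mean is recoverable exactly; per-bucket bounds give you approximate percentiles. Using the data point above:

```python
dp = {  # the histogram data point from the example above
    "count": 42,
    "sum": 8400.5,
    "bucketCounts": [5, 10, 15, 8, 3, 1],
    "explicitBounds": [10, 50, 100, 250, 500],
}

# One more bucket than bounds: the final bucket is "> 500 ms".
assert len(dp["bucketCounts"]) == len(dp["explicitBounds"]) + 1
# The per-bucket counts must account for every observation.
assert sum(dp["bucketCounts"]) == dp["count"]

mean_ms = dp["sum"] / dp["count"]  # the mean is recoverable exactly
print(f"mean latency: {mean_ms:.2f} ms")  # 8400.5 / 42 = 200.01 ms
```

This is why histograms are exported pre-bucketed: the backend never sees individual observations, just enough aggregate structure to compute means and approximate quantiles.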
OTEL supports four metric types:
| Type | What It Does | Example |
|---|---|---|
| Counter | Monotonically increasing count | strands.tool.call_count |
| UpDownCounter | Count that goes up and down | Active connections |
| Histogram | Distribution of values across buckets | strands.event_loop.latency |
| Gauge | Current snapshot value at a point in time | CPU usage, memory |
What Gets Sent Over the Wire (OTLP)
When the exporter fires, it packages everything into the OTLP format. Here’s what the actual JSON payload looks like (Protobuf binary encoding is more common in production, but JSON is also supported):
{
"resourceSpans": [
{
"resource": {
"attributes": [
{ "key": "service.name",
"value": { "stringValue": "strands-agents" } }
]
},
"scopeSpans": [
{
"scope": { "name": "strands.telemetry" },
"spans": [
{
"traceId": "5b8aa5a...",
"spanId": "051581b...",
"parentSpanId": "",
"name": "agent.invoke",
"kind": 1,
"startTimeUnixNano": "1698000000000000000",
"endTimeUnixNano": "1698000001000000000",
"attributes": [
{ "key": "gen_ai.system",
"value": { "stringValue": "strands-agents" } }
],
"status": { "code": 1 }
}
]
}
]
}
]
}
This is the actual JSON sent over HTTP to an OTLP collector endpoint. Every attribute is a key-value pair with an explicit type (stringValue, intValue, etc.) — verbose but unambiguous.
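One wrinkle worth knowing when you consume this JSON directly: OTLP/JSON encodes 64-bit integers as strings (a proto3 JSON rule), so intValue arrives quoted. A small sketch that flattens the typed key-value list into a plain dict, with example attributes based on the payload above:

```python
def unwrap_attributes(attrs: list) -> dict:
    """Flatten OTLP's typed key-value list into a plain dict."""
    out = {}
    for kv in attrs:
        # Each value object holds exactly one typed field:
        # stringValue, intValue, boolValue, doubleValue, ...
        (value_type, value), = kv["value"].items()
        # OTLP/JSON serializes 64-bit ints as strings; convert back.
        out[kv["key"]] = int(value) if value_type == "intValue" else value
    return out

attrs = [
    {"key": "service.name", "value": {"stringValue": "strands-agents"}},
    {"key": "gen_ai.usage.input_tokens", "value": {"intValue": "1524"}},
]
print(unwrap_attributes(attrs))
```

Collectors and backends do this unwrapping for you; you only hit it when writing your own OTLP consumer or debugging raw payloads.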
Why This Matters for AI Agents
Agents are harder to debug than traditional request/response services. A single user query might trigger multiple model calls, branch into different tool chains, retry on failures, and spawn sub-agents. Without observability, you’re flying blind when things go wrong in production.
What OTEL gives you for agents specifically:
- Vendor-neutral instrumentation — Instrument once, export to any backend. Switch from Jaeger to Datadog without changing application code.
- Correlated signals — Traces, metrics, and logs are automatically linked via context propagation (trace IDs, span IDs). A slow tool call shows up in both the trace waterfall and the latency histogram.
- Token cost tracking — Input/output/cached token counts on every model call, exportable to your metrics backend for cost dashboards.
- Multi-agent visibility — Distributed tracing across agent boundaries and MCP tool servers. You can follow a request through an entire swarm.
The Strands SDK handles all the instrumentation automatically. You enable the exporter, and every agent operation becomes visible. For anyone running agents in production, this is table stakes observability — and it’s good to see it built into the SDK rather than bolted on as an afterthought.