Akshay Parkhi's Weblog

AG-UI Protocol: A Layer-by-Layer Deep Dive with Real Network Captures

4th April 2026

There’s a common misconception about AG-UI: people treat it as a transport protocol. It isn’t. AG-UI rides on top of HTTP and WebSocket — it doesn’t replace them. Understanding where each layer starts and stops is the key to debugging, optimizing, and building correctly with it.

┌─────────────────────────────────────────────────────┐
│  Application Layer                                  │
│  AG-UI Event Protocol                               │
│  (RUN_STARTED, TEXT_MESSAGE_*, TOOL_CALL_*,         │
│   STATE_SNAPSHOT)                                   │
├─────────────────────────────────────────────────────┤
│  Transport Layer                                    │
│  Option A: HTTP + SSE       Option B: WebSocket     │
│  POST /invocations          wss://.../ws            │
│  Content-Type:              Upgrade: websocket      │
│    text/event-stream                                │
├─────────────────────────────────────────────────────┤
│  Network Layer                                      │
│  TCP + TLS (both use the same thing)                │
└─────────────────────────────────────────────────────┘

AG-UI defines what is sent. HTTP and WebSocket define how it’s sent. Think of JSON vs HTTP — JSON is the data format, HTTP is the transport. You send JSON over HTTP. Similarly, AG-UI is an event protocol; SSE and WebSocket are two different transports that carry it.

To make this concrete: we ran Playwright tests with CDP (Chrome DevTools Protocol) against a live AgentCore deployment to capture the actual headers, frames, and connection details for both transports. The captured values below come from those runs.

Layer 1 — Network Transport

Both SSE and WebSocket use identical Layer 1 infrastructure:

Remote IP:    x.xx.xx.xxx:443   (AgentCore endpoint)
TLS:          TLS 1.3
Cipher:       AES_128_GCM
Certificate:  Amazon RSA 2048 M03
Protocol:     TCP → TLS → HTTP/2 (SSE)
              TCP → TLS → HTTP/1.1+Upgrade (WebSocket)

An observer watching the network sees no difference — both are encrypted TCP streams to port 443. Where they diverge is what happens after the handshake.

SSE connection lifecycle:

TCP SYN → SYN-ACK → ACK                  (3-way handshake)
TLS ClientHello → ServerHello → Finished  (TLS 1.3, 1-RTT)
HTTP/2 SETTINGS frame                     (HTTP/2 negotiation)
── connection ready ──
OPTIONS /invocations                      (CORS preflight)
POST /invocations                         (actual request)
← streaming response chunks               (events arrive)
── connection kept alive ──
POST /invocations                         (next message — NEW request on same TCP)
← streaming response

WebSocket connection lifecycle:

TCP SYN → SYN-ACK → ACK                  (same 3-way handshake)
TLS ClientHello → ServerHello → Finished  (same TLS 1.3)
GET /ws (Upgrade: websocket)              (HTTP upgrade request)
← 101 Switching Protocols                 (protocol switch — HTTP is done here)
── TCP connection is now WebSocket ──
→ frame (message 1)                       (raw WS frames)
← frame ← frame ← frame
→ frame (message 2)                       (same pipe, no setup overhead)
← frame ← frame ← frame
→ close frame
← close frame

The critical Layer 1 difference: after the initial handshake, SSE stays in HTTP mode — each new message is a full HTTP request/response cycle. WebSocket upgrades away from HTTP. The TCP connection becomes a raw frame-based pipe. No HTTP headers, no request/response semantics. Just frames flowing in both directions.

Layer 2 — Transport Framing

The same AG-UI event looks completely different at the wire level depending on which transport carries it.

SSE framing (from captured headers):

Before a single AG-UI event arrives, the browser sends:

POST /runtimes/arn%3Aaws%3A.../invocations?qualifier=DEFAULT HTTP/2
Host: bedrock-agentcore.us-east-1.amazonaws.com
Content-Type: application/json
Accept: text/event-stream, application/json
Authorization: Bearer eyJraWQiOiJCSFwvQjVEOVh...    ← 1,081 bytes
X-Amzn-Bedrock-AgentCore-Runtime-Session-Id: 52ed4489-...
Origin: https://d3rpk5004rsri0.cloudfront.net
Sec-Fetch-Mode: cors
sec-ch-ua: "HeadlessChrome";v="147"
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7)...

{"threadId":"t1","runId":"r1","state":{},"messages":[...]}    ← 430 bytes

Overhead per message before any event comes back: ~2,311 bytes (about 800 bytes of HTTP headers + a 1,081-byte auth token + a 430-byte request body). The CORS preflight, when the browser sends one, comes on top of that.

The response arrives as a text/event-stream, with each event formatted as:

data: {"type":"RUN_STARTED","threadId":"t1","runId":"r1"}\n\n
data: {"type":"TEXT_MESSAGE_START","messageId":"abc"}\n\n
data: {"type":"TEXT_MESSAGE_CONTENT","messageId":"abc","delta":"Hi"}\n\n
data: {"type":"TEXT_MESSAGE_CONTENT","messageId":"abc","delta":" there"}\n\n
data: {"type":"RUN_FINISHED","threadId":"t1","runId":"r1"}\n\n

SSE framing cost per event:

"data: "          = 6 bytes prefix
"{json payload}"  = variable
"\n\n"            = 2 bytes terminator
HTTP/2 DATA frame = 9 bytes header
                    ───────────────
                    17 bytes overhead per AG-UI event
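
The framing arithmetic above can be reproduced with a small encoder. This is a sketch rather than AgentCore's server code: `AguiEvent`, `encodeSseEvent`, and `sseOverheadBytes` are hypothetical names, and it assumes one event per HTTP/2 DATA frame and ASCII-only JSON, so that string length equals byte length.

```typescript
// Hypothetical minimal event shape; real AG-UI events carry more fields.
type AguiEvent = { type: string; [key: string]: unknown };

const HTTP2_FRAME_HEADER_BYTES = 9; // fixed-size HTTP/2 frame header

// Wrap one AG-UI event in SSE framing: "data: " + JSON + blank line.
function encodeSseEvent(event: AguiEvent): string {
  return `data: ${JSON.stringify(event)}\n\n`;
}

// Bytes on the wire that are not AG-UI payload
// (assumes ASCII JSON and one event per DATA frame).
function sseOverheadBytes(event: AguiEvent): number {
  const payload = JSON.stringify(event).length;
  return encodeSseEvent(event).length - payload + HTTP2_FRAME_HEADER_BYTES;
}

const sample: AguiEvent = { type: "TEXT_MESSAGE_CONTENT", messageId: "abc", delta: "Hi" };
const wire = encodeSseEvent(sample);
const overhead = sseOverheadBytes(sample); // 6 (prefix) + 2 (terminator) + 9 (frame header)
```

The overhead is constant per event, so its relative cost is highest for the short delta events that dominate a streamed reply.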

WebSocket framing (from captured frames):

The browser sends one HTTP Upgrade request — this happens once, not per message:

GET /runtimes/arn%3A.../ws HTTP/1.1
Upgrade: websocket
Connection: Upgrade
Sec-WebSocket-Key: qJSR4G+mpEAzrfElKVFhvA==
Sec-WebSocket-Version: 13
Sec-WebSocket-Protocol: base64UrlBearerAuthorization.ZXlKcmFXUWl...[1461 chars]
Sec-WebSocket-Protocol: base64UrlBearerAuthorization

HTTP/1.1 101 Switching Protocols
Upgrade: websocket
Sec-WebSocket-Accept: YP1UCDyzHAuiDOCdM0TANqraFwU=
Sec-WebSocket-Protocol: base64UrlBearerAuthorization
X-Amzn-Bedrock-AgentCore-Runtime-Session-Id: c056eb10-...
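
Browsers cannot set an Authorization header on a WebSocket, which is why the token travels in Sec-WebSocket-Protocol. The captured value appears to be the bearer token, base64url-encoded, behind a fixed prefix: decoding the visible ZXlKcmFXUWl gives eyJraWQi, the start of the token in the SSE request. A client could build the two subprotocol values roughly as below. The helper names are hypothetical, and the encoder assumes an ASCII token (true for JWTs).

```typescript
const B64URL_ALPHABET =
  "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_";

// Unpadded base64url encoding of an ASCII string (char codes are treated as bytes).
function base64UrlEncodeAscii(input: string): string {
  let out = "";
  for (let i = 0; i < input.length; i += 3) {
    const b0 = input.charCodeAt(i);
    const b1 = i + 1 < input.length ? input.charCodeAt(i + 1) : 0;
    const b2 = i + 2 < input.length ? input.charCodeAt(i + 2) : 0;
    const triple = (b0 << 16) | (b1 << 8) | b2;
    out += B64URL_ALPHABET.charAt((triple >> 18) & 63);
    out += B64URL_ALPHABET.charAt((triple >> 12) & 63);
    if (i + 1 < input.length) out += B64URL_ALPHABET.charAt((triple >> 6) & 63);
    if (i + 2 < input.length) out += B64URL_ALPHABET.charAt(triple & 63);
  }
  return out;
}

// Hypothetical helper: the two subprotocol values seen in the captured handshake.
function buildAuthSubprotocols(accessToken: string): string[] {
  const scheme = "base64UrlBearerAuthorization";
  return [`${scheme}.${base64UrlEncodeAscii(accessToken)}`, scheme];
}

// Usage in a browser (wsUrl and accessToken are placeholders):
// const ws = new WebSocket(wsUrl, buildAuthSubprotocols(accessToken));
```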

After 101, HTTP is gone. Subsequent frames captured from the session:

→ FRAME SEND (430 bytes, opcode=1)     RunAgentInput JSON
← FRAME RECV (158 bytes, opcode=1)     RUN_STARTED
← FRAME RECV (73 bytes,  opcode=1)     STATE_SNAPSHOT
← FRAME RECV (146 bytes, opcode=1)     TEXT_MESSAGE_START
← FRAME RECV (130 bytes, opcode=1)     TEXT_MESSAGE_CONTENT: "Hi"
← FRAME RECV (134 bytes, opcode=1)     TEXT_MESSAGE_CONTENT: " there"
← FRAME RECV (133 bytes, opcode=1)     TEXT_MESSAGE_CONTENT: "! How"
← FRAME RECV (132 bytes, opcode=1)     TEXT_MESSAGE_CONTENT: " are"
← FRAME RECV (133 bytes, opcode=1)     TEXT_MESSAGE_CONTENT: " you?"
← FRAME RECV (113 bytes, opcode=1)     TEXT_MESSAGE_END
← FRAME RECV (73 bytes,  opcode=1)     STATE_SNAPSHOT
← FRAME RECV (139 bytes, opcode=1)     RUN_FINISHED

WebSocket frame structure (RFC 6455):

┌─────┬─────┬──────────┬────────────────────────────┐
│ FIN │ RSV │ Opcode   │ Payload length             │
├─────┴─────┴──────────┴────────────────────────────┤
│ Masking key (4 bytes, client→server only)          │
├───────────────────────────────────────────────────┤
│ Payload data (the AG-UI JSON)                     │
└───────────────────────────────────────────────────┘

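Overhead: 2 bytes per event (server→client, payload ≤ 125 bytes)
          4 bytes per event (server→client, payload 126–65,535 bytes)
          +4 bytes masking key on every client→server frame
          (most frames in this capture exceed 125 bytes, so they use the 4-byte header)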
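
For reference, RFC 6455 makes the header size depend on payload length: 2 bytes up to 125 bytes of payload, 4 bytes up to 65,535, and 10 bytes beyond that, plus a 4-byte masking key on client→server frames. The sketch below applies that rule to the frame sizes listed above. It assumes those sizes are payload lengths and that no extension such as permessage-deflate is in use; `wsHeaderBytes` is a hypothetical helper name.

```typescript
// Size in bytes of a WebSocket frame header per RFC 6455, section 5.2.
// masked = true for client→server frames (adds the 4-byte masking key).
function wsHeaderBytes(payloadLength: number, masked: boolean): number {
  let size = 2; // FIN/RSV/opcode byte + mask bit and 7-bit length byte
  if (payloadLength > 65535) size += 8; // 64-bit extended length
  else if (payloadLength > 125) size += 2; // 16-bit extended length
  return masked ? size + 4 : size;
}

// Payload sizes of the server→client frames in the captured session.
const capturedRecv = [158, 73, 146, 130, 134, 133, 132, 133, 113, 73, 139];

// Total header bytes the server spent framing the whole reply.
const recvHeaderTotal = capturedRecv
  .map((n) => wsHeaderBytes(n, false))
  .reduce((a, b) => a + b, 0);

// Header on the single client→server RunAgentInput frame.
const sendHeader = wsHeaderBytes(430, true);
```

Under these assumptions the eleven reply frames carry 38 bytes of WebSocket framing in total, against 11 × 17 = 187 bytes for the SSE framing counted earlier.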

Side-by-side for the same event, {"type":"TEXT_MESSAGE_CONTENT","messageId":"abc","delta":"Hi"}, using the 130-byte payload size from the capture:

SSE on the wire (147 bytes total):
┌─────────────────────────────────────────────┐
│ HTTP/2 DATA frame header       (9 bytes)    │ ← HTTP/2 framing
│ "data: "                       (6 bytes)    │ ← SSE prefix
│ {TEXT_MESSAGE_CONTENT JSON}    (130 bytes)  │ ← AG-UI payload
│ "\n\n"                         (2 bytes)    │ ← SSE terminator
└─────────────────────────────────────────────┘
  Overhead: 17 bytes (13% of the payload)

WebSocket on the wire (134 bytes total):
┌─────────────────────────────────────────────┐
│ WS frame header                (4 bytes)    │ ← WS framing
│ {TEXT_MESSAGE_CONTENT JSON}    (130 bytes)  │ ← AG-UI payload
└─────────────────────────────────────────────┘
  Overhead: 4 bytes (3% of the payload)

WebSocket has roughly 4x less framing overhead per event at this payload size, and about 8x less for payloads of 125 bytes or fewer, where the frame header shrinks to 2 bytes. The bigger difference is at message boundaries: SSE sends 2,311 bytes of setup per message; WebSocket sends 438 bytes (an 8-byte masked frame header plus the 430-byte payload) per message after the initial connection.
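
To see how the message-boundary cost accumulates over a conversation, here is a rough calculation using the captured figures. The WebSocket figure counts the 430-byte payload plus a masked frame header, which is 8 bytes for a payload of that size under RFC 6455. Treat the SSE figure as an upper bound, since HTTP/2 header compression can shrink repeated headers. `uplinkBytes` is a hypothetical helper name.

```typescript
// Figures taken from the captures in this post.
const SSE_BYTES_PER_MESSAGE = 800 + 1081 + 430; // headers + auth token + body
const WS_BYTES_PER_MESSAGE = 8 + 430; // masked frame header + body

// Client→server bytes spent on n user messages, excluding connection setup
// (TCP, TLS, and the one-time WebSocket upgrade are not counted).
function uplinkBytes(messages: number): { sse: number; ws: number } {
  return {
    sse: messages * SSE_BYTES_PER_MESSAGE,
    ws: messages * WS_BYTES_PER_MESSAGE,
  };
}

const tenMessages = uplinkBytes(10);
```

In a real session the request body also grows with every turn, because RunAgentInput carries the full message history on both transports.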

How both transports hand off to the same handler:

// SSE transport — strips "data: " prefix, parses JSON
for (const line of lines) {
  if (line.startsWith("data: ")) {
    const event: AguiEvent = JSON.parse(line.slice(6));  // strip SSE framing
    onEvent(event);  // ← same handler
  }
}

// WebSocket transport — parses JSON directly from frame
ws.onmessage = (ev) => {
  const event: AguiEvent = JSON.parse(ev.data);  // no framing to strip
  onEvent(event);  // ← same handler
};

The frontend’s onEvent function is identical for both transports. Layer 2 strips the framing; Layer 3 sees the same object either way.
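
The SSE loop above assumes `lines` has already been split out of the response. In practice chunks can end mid-event, so the decoder needs a buffer. Below is a minimal sketch of that step; `SseDecoder` is a hypothetical name, and it assumes "\n" line endings and ignores SSE fields other than data.

```typescript
type AguiEvent = { type: string; [key: string]: unknown };

// Buffers raw text chunks and emits one parsed event per complete SSE block.
class SseDecoder {
  private buffer = "";

  push(chunk: string): AguiEvent[] {
    this.buffer += chunk;
    const events: AguiEvent[] = [];
    let end = this.buffer.indexOf("\n\n");
    while (end !== -1) {
      const block = this.buffer.slice(0, end);
      this.buffer = this.buffer.slice(end + 2);
      // An event may span several "data:" lines; join them with "\n".
      const data = block
        .split("\n")
        .filter((line) => line.startsWith("data:"))
        .map((line) => line.slice(5).replace(/^ /, ""))
        .join("\n");
      if (data.length > 0) events.push(JSON.parse(data) as AguiEvent);
      end = this.buffer.indexOf("\n\n");
    }
    return events;
  }
}

// A chunk boundary that falls in the middle of an event:
const decoder = new SseDecoder();
const first = decoder.push('data: {"type":"RUN_STARTED","thre');
const second = decoder.push('adId":"t1","runId":"r1"}\n\ndata: {"type":"RUN_FINISHED"}\n\n');
```

The first call returns nothing because no blank line has arrived yet; the second returns both events. Each returned object can be passed straight to the same onEvent handler.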

Layer 3 — AG-UI Event Protocol

After stripping Layer 2 framing, both transports produce identical JSON objects. From the captured session:

Event #1:  {"type":"RUN_STARTED","threadId":"thread_2_1775335498802","runId":"run_3_..."}
Event #2:  {"type":"STATE_SNAPSHOT","snapshot":{}}
Event #3:  {"type":"TEXT_MESSAGE_START","messageId":"8bfc10b0-027e-...","role":"assistant"}
Event #4:  {"type":"TEXT_MESSAGE_CONTENT","messageId":"8bfc10b0-027e-...","delta":"Hi"}
Event #5:  {"type":"TEXT_MESSAGE_CONTENT","messageId":"8bfc10b0-027e-...","delta":" there"}
Event #6:  {"type":"TEXT_MESSAGE_CONTENT","messageId":"8bfc10b0-027e-...","delta":"! How"}
Event #7:  {"type":"TEXT_MESSAGE_CONTENT","messageId":"8bfc10b0-027e-...","delta":" are"}
Event #8:  {"type":"TEXT_MESSAGE_CONTENT","messageId":"8bfc10b0-027e-...","delta":" you?"}
Event #9:  {"type":"TEXT_MESSAGE_END","messageId":"8bfc10b0-027e-..."}
Event #10: {"type":"STATE_SNAPSHOT","snapshot":{}}
Event #11: {"type":"RUN_FINISHED","threadId":"thread_2_...","runId":"run_3_..."}

The AG-UI state machine:

                  ┌─────────────┐
                  │ RUN_STARTED │
                  └──────┬──────┘
                         │
                  ┌──────▼──────┐
           ┌─────▶│   RUNNING   │◀──────────────────────┐
           │      └──────┬──────┘                        │
           │             │                               │
           │      ┌──────▼──────────────┐                │
           │      │ TEXT_MESSAGE_START  │                │
           │      │ TEXT_MESSAGE_CONTENT│ (0..N times)   │
           │      │ TEXT_MESSAGE_END    │                │
           │      └──────┬──────────────┘                │
           │             │                               │
           │      ┌──────▼──────────────┐                │
           │      │ TOOL_CALL_START     │                │
           │      │ TOOL_CALL_ARGS      │ (0..N times)   │
           │      │ TOOL_CALL_END       │                │
           │      │ TOOL_CALL_RESULT    │                │
           │      └──────┬──────────────┘                │
           │             │                               │
           │      ┌──────▼───────┐                       │
           │      │STATE_SNAPSHOT│ (after state-changing │
           │      └──────┬───────┘ tool calls)           │
           └─────────────┘   (agent loops: think → tool → think)

                  ┌──────────────┐
                  │ RUN_FINISHED │  (or RUN_ERROR)
                  └──────────────┘

Ordering rules:

  1. Every run starts with RUN_STARTED and ends with RUN_FINISHED or RUN_ERROR
  2. TEXT_MESSAGE_CONTENT can only appear between TEXT_MESSAGE_START and TEXT_MESSAGE_END
  3. TOOL_CALL_ARGS can only appear between TOOL_CALL_START and TOOL_CALL_END
  4. STATE_SNAPSHOT can appear at any point — usually after a state-changing tool call
  5. The agent can cycle through think → tool → think → tool multiple times before finishing
  6. All events within a run share the same threadId and runId
  7. messageId ties text events together; toolCallId ties tool events together
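
Rules 1 to 3 are mechanical enough to check in code, which is handy in tests or when debugging a misbehaving agent. The following is a sketch under the rules as stated here, not a validator shipped with AG-UI; `validateRun` is a hypothetical name and it does not check threadId or runId consistency.

```typescript
type AguiEvent = { type: string; messageId?: string; toolCallId?: string };

// Returns a list of ordering violations for one run (empty list = valid).
function validateRun(events: AguiEvent[]): string[] {
  const errors: string[] = [];
  const openMessages = new Set<string>();
  const openToolCalls = new Set<string>();
  const last = events.length - 1;

  events.forEach((ev, i) => {
    const msg = ev.messageId ?? "";
    const tool = ev.toolCallId ?? "";
    switch (ev.type) {
      case "RUN_STARTED":
        if (i !== 0) errors.push(`RUN_STARTED at index ${i}, expected 0`);
        break;
      case "RUN_FINISHED":
      case "RUN_ERROR":
        if (i !== last) errors.push(`${ev.type} before end of run`);
        break;
      case "TEXT_MESSAGE_START":
        openMessages.add(msg);
        break;
      case "TEXT_MESSAGE_CONTENT":
        if (!openMessages.has(msg)) errors.push(`content for unopened message ${msg}`);
        break;
      case "TEXT_MESSAGE_END":
        if (!openMessages.delete(msg)) errors.push(`end for unopened message ${msg}`);
        break;
      case "TOOL_CALL_START":
        openToolCalls.add(tool);
        break;
      case "TOOL_CALL_ARGS":
        if (!openToolCalls.has(tool)) errors.push(`args for unopened tool call ${tool}`);
        break;
      case "TOOL_CALL_END":
        if (!openToolCalls.delete(tool)) errors.push(`end for unopened tool call ${tool}`);
        break;
    }
  });

  if (events[0]?.type !== "RUN_STARTED") errors.push("run does not start with RUN_STARTED");
  const finalType = events[last]?.type;
  if (finalType !== "RUN_FINISHED" && finalType !== "RUN_ERROR") {
    errors.push("run does not end with RUN_FINISHED or RUN_ERROR");
  }
  return errors;
}
```

Events such as STATE_SNAPSHOT pass through unchecked, matching rule 4.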

What each key field means:

RUN_STARTED {
  threadId: "thread_2_1775335498802"  // Conversation (survives across runs)
  runId:    "run_3_1775335498802"     // This single request/response only
}

TEXT_MESSAGE_START {
  messageId: "8bfc10b0-027e-..."      // Groups content deltas together
  role: "assistant"                    // Always "assistant" for agent output
}
TEXT_MESSAGE_CONTENT {
  messageId: "8bfc10b0-027e-..."      // Must match the START event
  delta: "Hi"                         // Incremental — NOT cumulative
}
// Concatenating all deltas: "Hi" + " there" + "! How" + " are" + " you?"
// → "Hi there! How are you?"

TOOL_CALL_START {
  toolCallId:     "tooluse_V0vFkv2N5..."  // Groups tool events together
  toolCallName:   "research_topic"         // Which tool the agent is calling
  parentMessageId: "ebf4d1dd-..."          // Links to the assistant message
}
TOOL_CALL_ARGS {
  toolCallId: "tooluse_V0vFkv2N5..."
  delta: '{"query": "cloud security"}'    // JSON args, may arrive in chunks
}

STATE_SNAPSHOT {
  snapshot: {                             // Complete replacement of shared state
    title: "Cloud Security Guide",        // Application-defined structure
    sections: [...],                      // (not prescribed by AG-UI)
    metadata: { version: 1 }
  }
}
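
Put together, these fields suggest a small reducer on the frontend: append deltas by messageId, append tool argument chunks by toolCallId, and replace state on every snapshot. This is a sketch covering only the event types shown above; `UiModel` and `applyEvent` are hypothetical names.

```typescript
type AguiEvent =
  | { type: "TEXT_MESSAGE_START"; messageId: string; role: string }
  | { type: "TEXT_MESSAGE_CONTENT"; messageId: string; delta: string }
  | { type: "TOOL_CALL_START"; toolCallId: string; toolCallName: string }
  | { type: "TOOL_CALL_ARGS"; toolCallId: string; delta: string }
  | { type: "STATE_SNAPSHOT"; snapshot: Record<string, unknown> };

interface UiModel {
  messages: Map<string, string>; // messageId → text so far
  toolArgs: Map<string, string>; // toolCallId → raw JSON args so far
  state: Record<string, unknown>;
}

function applyEvent(model: UiModel, ev: AguiEvent): UiModel {
  switch (ev.type) {
    case "TEXT_MESSAGE_START":
      model.messages.set(ev.messageId, "");
      break;
    case "TEXT_MESSAGE_CONTENT":
      // Deltas are incremental, so append rather than overwrite.
      model.messages.set(ev.messageId, (model.messages.get(ev.messageId) ?? "") + ev.delta);
      break;
    case "TOOL_CALL_START":
      model.toolArgs.set(ev.toolCallId, "");
      break;
    case "TOOL_CALL_ARGS":
      // Args may arrive in chunks; parse only after the call has ended.
      model.toolArgs.set(ev.toolCallId, (model.toolArgs.get(ev.toolCallId) ?? "") + ev.delta);
      break;
    case "STATE_SNAPSHOT":
      // A snapshot replaces shared state wholesale; it is not merged.
      model.state = ev.snapshot;
      break;
  }
  return model;
}

const model: UiModel = { messages: new Map(), toolArgs: new Map(), state: {} };
const events: AguiEvent[] = [
  { type: "TEXT_MESSAGE_START", messageId: "m1", role: "assistant" },
  ...["Hi", " there", "! How", " are", " you?"].map(
    (delta): AguiEvent => ({ type: "TEXT_MESSAGE_CONTENT", messageId: "m1", delta }),
  ),
  { type: "TOOL_CALL_START", toolCallId: "t1", toolCallName: "research_topic" },
  { type: "TOOL_CALL_ARGS", toolCallId: "t1", delta: '{"query": ' },
  { type: "TOOL_CALL_ARGS", toolCallId: "t1", delta: '"cloud security"}' },
  { type: "STATE_SNAPSHOT", snapshot: { title: "Cloud Security Guide" } },
];
events.forEach((ev) => applyEvent(model, ev));
```

Neither argument chunk is valid JSON on its own, which is why the sketch stores raw text and leaves parsing until the tool call is complete.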

The request contract — what the frontend sends:

RunAgentInput {
  threadId: string     // Identifies the conversation
  runId: string        // Identifies this specific run
  state: any           // Current shared state (sent to agent for context)
  messages: Message[]  // Full conversation history
    // Each: { id, role, content }
    // role: "user" | "assistant" | "tool" | "system"
    // "tool" messages carry results for client-side tools
  tools: Tool[]        // Client-side tool definitions
    // Proxy tools — agent calls them, frontend executes them
    // (e.g., confirmation dialogs, file pickers)
  context: Context[]   // Additional context (RAG results, etc.)
  forwardedProps: any  // Pass-through metadata
}

The state field is what makes bidirectional shared state work. Frontend sends current state → agent sees it → agent modifies it via tools → STATE_SNAPSHOT sends new state back → frontend renders it → next request sends the updated state again. A continuous loop.
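
The loop can be sketched from the frontend's side as follows. The `Session` shape, the id schemes, and the function names are assumptions for illustration only; a real client would also append assistant and tool messages to the history between runs.

```typescript
interface Message {
  id: string;
  role: "user" | "assistant" | "tool" | "system";
  content: string;
}

interface RunAgentInput {
  threadId: string;
  runId: string;
  state: unknown;
  messages: Message[];
  tools: unknown[];
  context: unknown[];
  forwardedProps: unknown;
}

interface Session {
  threadId: string;
  runCount: number;
  state: unknown;
  messages: Message[];
}

// Build the next request: same threadId, fresh runId, latest state, full history.
function nextRunInput(session: Session, userText: string): RunAgentInput {
  session.runCount += 1;
  session.messages.push({
    id: `msg_${session.messages.length + 1}`, // hypothetical id scheme
    role: "user",
    content: userText,
  });
  return {
    threadId: session.threadId,
    runId: `run_${session.runCount}`, // hypothetical id scheme
    state: session.state,
    messages: [...session.messages],
    tools: [],
    context: [],
    forwardedProps: {},
  };
}

// Called by the event handler when a STATE_SNAPSHOT arrives.
function onSnapshot(session: Session, snapshot: unknown): void {
  session.state = snapshot; // replaces, so the next request carries it back
}

const session: Session = { threadId: "t1", runCount: 0, state: {}, messages: [] };
const firstInput = nextRunInput(session, "Write a guide");
onSnapshot(session, { title: "Cloud Security Guide" });
const secondInput = nextRunInput(session, "Add a section");
```

The second request keeps the threadId, gets a new runId, and carries the state the agent produced during the first run.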

The Complete Picture

Here is the end-to-end exchange for a single “Say hi in 5 words” message over SSE (the HTTP/2 SETTINGS frame and the CORS preflight are omitted for brevity):

BROWSER                               AGENTCORE (x.xx.xx.xxx)
  │                                        │
  │──── TCP SYN ─────────────────────────▶│  Layer 1: TCP
  │◀─── TCP SYN-ACK ──────────────────────│
  │──── TCP ACK ─────────────────────────▶│
  │                                        │
  │──── TLS ClientHello (TLS 1.3) ───────▶│  Layer 1: TLS
  │◀─── TLS ServerHello + Cert ───────────│
  │──── TLS Finished ────────────────────▶│
  │                                        │
  │──── POST /invocations ───────────────▶│  Layer 2: HTTP/2 request
  │     Headers: 800 bytes                 │  (auth, content-type, session-id)
  │     Auth: 1081 bytes                   │
  │     Body: 430 bytes                    │  (RunAgentInput JSON)
  │                                        │
  │◀─── 200 text/event-stream ────────────│  Layer 2: HTTP/2 response headers
  │                                        │
  │◀─── "data: {RUN_STARTED}\n\n" ────────│  Layer 2+3: SSE frame + AG-UI event
  │◀─── "data: {STATE_SNAPSHOT}\n\n" ─────│  Layer 2+3
  │◀─── "data: {TEXT_MSG_START}\n\n" ─────│  Layer 2+3
  │◀─── "data: {TEXT_MSG_CONTENT}\n\n" ───│  Layer 2+3 (×5 chunks)
  │◀─── "data: {TEXT_MSG_END}\n\n" ───────│  Layer 2+3
  │◀─── "data: {STATE_SNAPSHOT}\n\n" ─────│  Layer 2+3
  │◀─── "data: {RUN_FINISHED}\n\n" ───────│  Layer 2+3
  │                                        │
  │──── (connection stays open) ──────────│  Layer 1: HTTP/2 keep-alive

The same message over WebSocket:

BROWSER                               AGENTCORE (x.xx.xx.xxx)
  │                                        │
  │──── TCP SYN ─────────────────────────▶│  Layer 1: TCP (same)
  │◀─── TCP SYN-ACK ──────────────────────│
  │──── TCP ACK ─────────────────────────▶│
  │                                        │
  │──── TLS ClientHello (TLS 1.3) ───────▶│  Layer 1: TLS (same)
  │◀─── TLS ServerHello + Cert ───────────│
  │──── TLS Finished ────────────────────▶│
  │                                        │
  │──── GET /ws (Upgrade: websocket) ────▶│  Layer 2: WS handshake
  │     Sec-WebSocket-Protocol: base64...  │  (auth baked into handshake)
  │◀─── 101 Switching Protocols ──────────│  HTTP is DONE here
  │                                        │
  │═══════════════ TCP is now WebSocket ══│
  │                                        │
  │──── [frame: RunAgentInput] ──────────▶│  Layer 2: 4+4+430 bytes
  │                                        │  NO HTTP headers
  │◀─── [frame: RUN_STARTED]    (158B) ───│  Layer 2+3
  │◀─── [frame: STATE_SNAPSHOT] (73B) ────│  Layer 2+3
  │◀─── [frame: TEXT_MSG_START] (146B) ───│  Layer 2+3
  │◀─── [frame: TEXT_MSG_CONTENT] (130B) ─│  Layer 2+3 (×5)
  │◀─── [frame: TEXT_MSG_END]   (113B) ───│  Layer 2+3
  │◀─── [frame: STATE_SNAPSHOT] (73B) ────│  Layer 2+3
  │◀─── [frame: RUN_FINISHED]   (139B) ───│  Layer 2+3
  │                                        │
  │══ connection open for message 2 ══════│  Layer 1: same TCP pipe
  │                                        │
  │──── [frame: RunAgentInput #2] ───────▶│  NO new TCP, TLS, HTTP, or auth
  │◀─── [frames: events...] ──────────────│  Just frames

What the Layers Mean in Practice

Most AG-UI debugging happens at exactly one of these layers. Knowing which layer the problem lives in tells you where to look.

Symptom                                 Layer      Where to look
──────────────────────────────────────  ─────────  ─────────────────────────────────────────────
Connection refused or TLS error         Layer 1    Network config, certificates, port 443 access
WebSocket 401 or auth failure           Layer 2    Sec-WebSocket-Protocol header: are you using access tokens, not ID tokens?
SSE events not arriving / hanging       Layer 2    Missing Accept: text/event-stream header; proxy buffering the response
Frontend crashes on empty state         Layer 3    First STATE_SNAPSHOT is always {}; guard optional fields
Multiple chat bubbles per run           Layer 3    Multiple TEXT_MESSAGE_START events are normal; collapse consecutive assistant messages
422 validation error on second message  Layer 3    Messages missing id field in RunAgentInput
High latency on every message           Layer 1+2  SSE pays a full HTTP request (headers, auth token, possibly a CORS preflight) per message, even on a reused connection; consider WebSocket for interactive sessions
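
Two of the Layer 3 fixes in the table are small enough to sketch directly. The message and state shapes here are hypothetical application types, not part of AG-UI.

```typescript
interface ChatMessage {
  id: string;
  role: string;
  content: string;
}

// Merge consecutive assistant messages so one run renders as one bubble.
function collapseAssistantMessages(messages: ChatMessage[]): ChatMessage[] {
  const out: ChatMessage[] = [];
  for (const msg of messages) {
    const prev = out[out.length - 1];
    if (prev !== undefined && prev.role === "assistant" && msg.role === "assistant") {
      // Keep the first message's id and append the later text.
      out[out.length - 1] = { ...prev, content: prev.content + msg.content };
    } else {
      out.push({ ...msg });
    }
  }
  return out;
}

// Hypothetical application state; every field is optional because the
// first STATE_SNAPSHOT in the capture is {}.
interface GuideState {
  title?: string;
  sections?: string[];
}

function renderTitle(state: GuideState): string {
  return state.title ?? "(untitled)";
}

function sectionCount(state: GuideState): number {
  return state.sections?.length ?? 0;
}
```

Whether to join the merged text with a space or a line break is a UI decision; the sketch concatenates as-is.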

One-liner summary: HTTP/WebSocket is the road. AG-UI is the language everyone speaks on it. Layer 1 is the asphalt. Layer 2 is whether you drive a car or a motorbike. Layer 3 is what you say when you get there.

This is AG-UI Protocol: A Layer-by-Layer Deep Dive with Real Network Captures by Akshay Parkhi, posted on 4th April 2026.
