AG-UI Protocol: A Layer-by-Layer Deep Dive with Real Network Captures

4th April 2026

There’s a common misconception about AG-UI: people treat it as a transport protocol. It isn’t. AG-UI rides on top of HTTP and WebSocket — it doesn’t replace them. Understanding where each layer starts and stops is the key to debugging, optimizing, and building correctly with it.

┌─────────────────────────────────────────────────────┐
│  Application Layer                                  │
│  AG-UI Event Protocol                               │
│  (RUN_STARTED, TEXT_MESSAGE_*, TOOL_CALL_*,         │
│   STATE_SNAPSHOT)                                   │
├─────────────────────────────────────────────────────┤
│  Transport Layer                                    │
│  Option A: HTTP + SSE       Option B: WebSocket     │
│  POST /invocations          wss://.../ws            │
│  Content-Type:              Upgrade: websocket      │
│    text/event-stream                                │
├─────────────────────────────────────────────────────┤
│  Network Layer                                      │
│  TCP + TLS (both use the same thing)                │
└─────────────────────────────────────────────────────┘

AG-UI defines what is sent. HTTP and WebSocket define how it’s sent. Think of JSON vs HTTP — JSON is the data format, HTTP is the transport. You send JSON over HTTP. Similarly, AG-UI is an event protocol; SSE and WebSocket are two different transports that carry it.

To make this concrete: we ran Playwright tests with CDP (Chrome DevTools Protocol) against a live AgentCore deployment to capture actual packet-level data for both transports. Everything below comes from those captures.

Layer 1 — Network Transport

Both SSE and WebSocket use identical Layer 1 infrastructure:

Remote IP:    x.xx.xx.xxx:443   (AgentCore endpoint)
TLS:          TLS 1.3
Cipher:       AES_128_GCM
Certificate:  Amazon RSA 2048 M03
Protocol:     TCP → TLS → HTTP/2 (SSE)
              TCP → TLS → HTTP/1.1+Upgrade (WebSocket)

An observer watching the network sees no difference — both are encrypted TCP streams to port 443. Where they diverge is what happens after the handshake.

SSE connection lifecycle:

TCP SYN → SYN-ACK → ACK                  (3-way handshake)
TLS ClientHello → ServerHello → Finished  (TLS 1.3, 1-RTT)
HTTP/2 SETTINGS frame                     (HTTP/2 negotiation)
── connection ready ──
OPTIONS /invocations                      (CORS preflight)
POST /invocations                         (actual request)
← streaming response chunks               (events arrive)
── connection kept alive ──
POST /invocations                         (next message — NEW request on same TCP)
← streaming response

WebSocket connection lifecycle:

TCP SYN → SYN-ACK → ACK                  (same 3-way handshake)
TLS ClientHello → ServerHello → Finished  (same TLS 1.3)
GET /ws (Upgrade: websocket)              (HTTP upgrade request)
← 101 Switching Protocols                 (protocol switch — HTTP is done here)
── TCP connection is now WebSocket ──
→ frame (message 1)                       (raw WS frames)
← frame ← frame ← frame
→ frame (message 2)                       (same pipe, no setup overhead)
← frame ← frame ← frame
→ close frame
← close frame

The critical Layer 1 difference: after the initial handshake, SSE stays in HTTP mode — each new message is a full HTTP request/response cycle. WebSocket upgrades away from HTTP. The TCP connection becomes a raw frame-based pipe. No HTTP headers, no request/response semantics. Just frames flowing in both directions.

Layer 2 — Transport Framing

The same AG-UI event looks completely different at the wire level depending on which transport carries it.

SSE framing (from captured headers):

Before a single AG-UI event arrives, the browser sends:

POST /runtimes/arn%3Aaws%3A.../invocations?qualifier=DEFAULT HTTP/2
Host: bedrock-agentcore.us-east-1.amazonaws.com
Content-Type: application/json
Accept: text/event-stream, application/json
Authorization: Bearer eyJraWQiOiJCSFwvQjVEOVh...    ← 1,081 bytes
X-Amzn-Bedrock-AgentCore-Runtime-Session-Id: 52ed4489-...
Origin: https://d3rpk5004rsri0.cloudfront.net
Sec-Fetch-Mode: cors
sec-ch-ua: "HeadlessChrome";v="147"
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7)...

{"threadId":"t1","runId":"r1","state":{},"messages":[...]}    ← 430 bytes

Overhead per message before any event comes back: ~2,311 bytes (CORS preflight + HTTP headers + auth token + request body).
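To see where those bytes come from, here is a minimal sketch of assembling the request. The header names and body fields are taken from the capture above; `buildSseRequest`, `SseRequest`, `ChatMessage`, and the parameter names are hypothetical, and the endpoint URL is left to the caller.

```typescript
// Hypothetical shapes for illustration. Only the header names and the body
// fields (threadId, runId, state, messages) come from the captured request.
interface ChatMessage {
  id: string;
  role: string;
  content: string;
}

interface SseRequest {
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

function buildSseRequest(
  accessToken: string,
  sessionId: string,
  threadId: string,
  runId: string,
  messages: ChatMessage[],
  state: unknown = {},
): SseRequest {
  return {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      // Ask for a streamed response; plain JSON is accepted as a fallback.
      "Accept": "text/event-stream, application/json",
      // The bearer token is resent on every request, which accounts for a
      // large share of the per-message bytes.
      "Authorization": `Bearer ${accessToken}`,
      "X-Amzn-Bedrock-AgentCore-Runtime-Session-Id": sessionId,
    },
    body: JSON.stringify({ threadId, runId, state, messages }),
  };
}
```

The result can be handed to `fetch(url, request)`; the browser adds `Origin`, `Sec-Fetch-Mode`, and `User-Agent` on its own.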

The response arrives as a text/event-stream, with each event formatted as:

data: {"type":"RUN_STARTED","threadId":"t1","runId":"r1"}\n\n
data: {"type":"TEXT_MESSAGE_START","messageId":"abc"}\n\n
data: {"type":"TEXT_MESSAGE_CONTENT","messageId":"abc","delta":"Hi"}\n\n
data: {"type":"TEXT_MESSAGE_CONTENT","messageId":"abc","delta":" there"}\n\n
data: {"type":"RUN_FINISHED","threadId":"t1","runId":"r1"}\n\n

SSE framing cost per event:

"data: "          = 6 bytes prefix
"{json payload}"  = variable
"\n\n"            = 2 bytes terminator
HTTP/2 DATA frame = 9 bytes header
                    ───────────────
                    17 bytes overhead per AG-UI event
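
The arithmetic above can be reproduced with a small sketch. `encodeSseEvent` and `sseOverheadBytes` are hypothetical helper names, and the 9-byte figure assumes each event is delivered in its own HTTP/2 DATA frame, which a server or proxy is free not to do.

```typescript
const SSE_PREFIX = "data: ";        // 6 bytes
const SSE_TERMINATOR = "\n\n";      // 2 bytes
const HTTP2_FRAME_HEADER_BYTES = 9; // fixed-size HTTP/2 frame header

// Serialize one event the way it appears in the text/event-stream body.
function encodeSseEvent(event: object): string {
  return SSE_PREFIX + JSON.stringify(event) + SSE_TERMINATOR;
}

// Framing bytes added around the JSON payload, assuming one event per
// HTTP/2 DATA frame (an assumption, not something the protocol guarantees).
function sseOverheadBytes(): number {
  return SSE_PREFIX.length + SSE_TERMINATOR.length + HTTP2_FRAME_HEADER_BYTES;
}
```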

WebSocket framing (from captured frames):

The browser sends one HTTP Upgrade request — this happens once, not per message:

GET /runtimes/arn%3A.../ws HTTP/1.1
Upgrade: websocket
Connection: Upgrade
Sec-WebSocket-Key: qJSR4G+mpEAzrfElKVFhvA==
Sec-WebSocket-Version: 13
Sec-WebSocket-Protocol: base64UrlBearerAuthorization.ZXlKcmFXUWl...[1461 chars]
Sec-WebSocket-Protocol: base64UrlBearerAuthorization

HTTP/1.1 101 Switching Protocols
Upgrade: websocket
Sec-WebSocket-Accept: YP1UCDyzHAuiDOCdM0TANqraFwU=
Sec-WebSocket-Protocol: base64UrlBearerAuthorization
X-Amzn-Bedrock-AgentCore-Runtime-Session-Id: c056eb10-...
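
The browser WebSocket API does not let a page set an `Authorization` header, which is presumably why the token travels in `Sec-WebSocket-Protocol` here. Below is a minimal sketch of building that subprotocol list, assuming the suffix is the unpadded base64url encoding of the token. The captured value is consistent with that, but the exact encoding is an assumption, and `base64UrlEncode` and `authSubprotocols` are hypothetical helpers.

```typescript
const B64URL_ALPHABET =
  "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_";

// Base64url-encode a string without padding. Hand-rolled so the sketch has
// no runtime dependencies; assumes every char code is below 256 (true for JWTs).
function base64UrlEncode(input: string): string {
  let out = "";
  for (let i = 0; i < input.length; i += 3) {
    const b0 = input.charCodeAt(i);
    const b1 = i + 1 < input.length ? input.charCodeAt(i + 1) : 0;
    const b2 = i + 2 < input.length ? input.charCodeAt(i + 2) : 0;
    const chunk = (b0 << 16) | (b1 << 8) | b2;
    out += B64URL_ALPHABET.charAt((chunk >> 18) & 63);
    out += B64URL_ALPHABET.charAt((chunk >> 12) & 63);
    // Emit the 3rd and 4th characters only when their input bytes exist.
    if (i + 1 < input.length) out += B64URL_ALPHABET.charAt((chunk >> 6) & 63);
    if (i + 2 < input.length) out += B64URL_ALPHABET.charAt(chunk & 63);
  }
  return out;
}

// Same order as the captured request: token-bearing entry first, bare name
// second. The server echoed only the bare name in its 101 response.
function authSubprotocols(accessToken: string): string[] {
  const name = "base64UrlBearerAuthorization";
  return [`${name}.${base64UrlEncode(accessToken)}`, name];
}
```

In a browser the list would be passed as the second constructor argument, `new WebSocket(url, authSubprotocols(token))`. For ASCII tokens, `btoa` plus character replacement would do the same job as the hand-rolled encoder.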

After 101, HTTP is gone. Subsequent frames captured from the session:

→ FRAME SEND (430 bytes, opcode=1)     RunAgentInput JSON
← FRAME RECV (158 bytes, opcode=1)     RUN_STARTED
← FRAME RECV (73 bytes,  opcode=1)     STATE_SNAPSHOT
← FRAME RECV (146 bytes, opcode=1)     TEXT_MESSAGE_START
← FRAME RECV (130 bytes, opcode=1)     TEXT_MESSAGE_CONTENT: "Hi"
← FRAME RECV (134 bytes, opcode=1)     TEXT_MESSAGE_CONTENT: " there"
← FRAME RECV (133 bytes, opcode=1)     TEXT_MESSAGE_CONTENT: "! How"
← FRAME RECV (132 bytes, opcode=1)     TEXT_MESSAGE_CONTENT: " are"
← FRAME RECV (133 bytes, opcode=1)     TEXT_MESSAGE_CONTENT: " you?"
← FRAME RECV (113 bytes, opcode=1)     TEXT_MESSAGE_END
← FRAME RECV (73 bytes,  opcode=1)     STATE_SNAPSHOT
← FRAME RECV (139 bytes, opcode=1)     RUN_FINISHED

WebSocket frame structure (RFC 6455):

┌─────┬─────┬──────────┬────────────────────────────┐
│ FIN │ RSV │ Opcode   │ Payload length             │
├─────┴─────┴──────────┴────────────────────────────┤
│ Masking key (4 bytes, client→server only)          │
├───────────────────────────────────────────────────┤
│ Payload data (the AG-UI JSON)                     │
└───────────────────────────────────────────────────┘

Overhead: 2 bytes per event for payloads up to 125 bytes (server→client)
          4 bytes per event for payloads of 126 to 65,535 bytes (16-bit extended length)
          +4 bytes masking key on every client→server frame

Side-by-side for the same event, the captured TEXT_MESSAGE_CONTENT "Hi" frame with its 130-byte payload (abbreviated here as {"type":"TEXT_MESSAGE_CONTENT","messageId":"abc","delta":"Hi"}):

SSE on the wire (147 bytes total):
┌─────────────────────────────────────────────┐
│ HTTP/2 DATA frame header       (9 bytes)    │ ← HTTP/2 framing
│ "data: "                       (6 bytes)    │ ← SSE prefix
│ {"type":"TEXT_MESSAGE_CONTENT",...}(130 bytes)│ ← AG-UI payload
│ "\n\n"                         (2 bytes)    │ ← SSE terminator
└─────────────────────────────────────────────┘
  Overhead: 17 bytes (13%)

WebSocket on the wire (134 bytes total):
┌─────────────────────────────────────────────┐
│ WS frame header                (4 bytes)    │ ← WS framing
│ {"type":"TEXT_MESSAGE_CONTENT",...}(130 bytes)│ ← AG-UI payload
└─────────────────────────────────────────────┘
  Overhead: 4 bytes (3%)

WebSocket has about 4x less framing overhead for this event (17 bytes vs 4), and about 8x less for payloads of 125 bytes or fewer, where the header shrinks to 2 bytes. The bigger difference is at message boundaries: SSE sends 2,311 bytes of headers, auth token, and request body per message; WebSocket sends 438 bytes (an 8-byte frame header plus the 430-byte payload) per message after the initial connection.
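
Under RFC 6455 the WebSocket header is not a fixed size: it is 2 bytes for payloads up to 125 bytes, grows by 2 bytes for payloads up to 65,535 bytes and by 8 bytes beyond that, and client→server frames add a 4-byte masking key. The sketch below applies that rule to the frame sizes listed in the capture, assuming those sizes are payload lengths. `wsHeaderBytes` and the constant names are hypothetical.

```typescript
// Header bytes for one WebSocket frame (RFC 6455, section 5.2).
function wsHeaderBytes(payloadBytes: number, fromClient: boolean): number {
  let header = 2;                           // FIN/RSV/opcode byte + mask bit/7-bit length byte
  if (payloadBytes > 65535) header += 8;    // 64-bit extended length
  else if (payloadBytes > 125) header += 2; // 16-bit extended length
  if (fromClient) header += 4;              // masking key, client→server only
  return header;
}

// SSE framing per event: "data: " + "\n\n" + one HTTP/2 frame header,
// assuming each event travels in its own DATA frame.
const SSE_EVENT_OVERHEAD = 6 + 2 + 9;

// Sizes of the eleven server→client frames in the captured session.
const capturedPayloads = [158, 73, 146, 130, 134, 133, 132, 133, 113, 73, 139];

const wsFramingTotal = capturedPayloads
  .map((size) => wsHeaderBytes(size, false))
  .reduce((sum, n) => sum + n, 0);
const sseFramingTotal = capturedPayloads.length * SSE_EVENT_OVERHEAD;
```

For this run that works out to 38 bytes of WebSocket framing against 187 bytes of SSE framing, with the 430-byte client frame carrying an 8-byte header.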

How both transports hand off to the same handler:

// SSE transport: strips the "data: " prefix, parses JSON
// (`lines` is the decoded response body split on "\n")
for (const line of lines) {
  if (line.startsWith("data: ")) {
    const event: AguiEvent = JSON.parse(line.slice(6));  // strip SSE framing
    onEvent(event);  // ← same handler
  }
}

// WebSocket transport — parses JSON directly from frame
ws.onmessage = (ev) => {
  const event: AguiEvent = JSON.parse(ev.data);  // no framing to strip
  onEvent(event);  // ← same handler
};

The frontend’s onEvent function is identical for both transports. Layer 2 strips the framing; Layer 3 sees the same object either way.
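
The SSE loop above assumes it is handed complete lines. Network chunks can split an event anywhere, so a client usually has to buffer until it sees the blank-line terminator. Here is a minimal sketch, assuming LF line endings as in the capture; `createSseParser` is a hypothetical name and `AguiEvent` is reduced to the one field needed here.

```typescript
type AguiEvent = { type: string; [key: string]: unknown };

// Returns a function that accepts raw text chunks and emits complete events.
function createSseParser(onEvent: (event: AguiEvent) => void) {
  let buffer = "";
  return (chunk: string): void => {
    buffer += chunk;
    // Events are separated by a blank line; the last piece may be partial,
    // so it goes back into the buffer for the next chunk.
    const blocks = buffer.split("\n\n");
    buffer = blocks.pop() ?? "";
    for (const block of blocks) {
      const data = block
        .split("\n")
        .filter((line) => line.startsWith("data:"))
        .map((line) => line.slice(5).replace(/^ /, "")) // space after the colon is optional
        .join("\n");
      if (data !== "") onEvent(JSON.parse(data) as AguiEvent);
    }
  };
}
```

Each chunk read from the response stream is decoded to text and passed to the returned function; `onEvent` is the same handler the WebSocket path calls.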

Layer 3 — AG-UI Event Protocol

After stripping Layer 2 framing, both transports produce identical JSON objects. From the captured session:

Event #1:  {"type":"RUN_STARTED","threadId":"thread_2_1775335498802","runId":"run_3_..."}
Event #2:  {"type":"STATE_SNAPSHOT","snapshot":{}}
Event #3:  {"type":"TEXT_MESSAGE_START","messageId":"8bfc10b0-027e-...","role":"assistant"}
Event #4:  {"type":"TEXT_MESSAGE_CONTENT","messageId":"8bfc10b0-027e-...","delta":"Hi"}
Event #5:  {"type":"TEXT_MESSAGE_CONTENT","messageId":"8bfc10b0-027e-...","delta":" there"}
Event #6:  {"type":"TEXT_MESSAGE_CONTENT","messageId":"8bfc10b0-027e-...","delta":"! How"}
Event #7:  {"type":"TEXT_MESSAGE_CONTENT","messageId":"8bfc10b0-027e-...","delta":" are"}
Event #8:  {"type":"TEXT_MESSAGE_CONTENT","messageId":"8bfc10b0-027e-...","delta":" you?"}
Event #9:  {"type":"TEXT_MESSAGE_END","messageId":"8bfc10b0-027e-..."}
Event #10: {"type":"STATE_SNAPSHOT","snapshot":{}}
Event #11: {"type":"RUN_FINISHED","threadId":"thread_2_...","runId":"run_3_..."}

The AG-UI state machine:

                  ┌─────────────┐
                  │ RUN_STARTED │
                  └──────┬──────┘
                         │
                  ┌──────▼──────┐
           ┌─────▶│   RUNNING   │◀──────────────────────┐
           │      └──────┬──────┘                        │
           │             │                               │
           │      ┌──────▼──────────────┐                │
           │      │ TEXT_MESSAGE_START  │                │
           │      │ TEXT_MESSAGE_CONTENT│ (0..N times)   │
           │      │ TEXT_MESSAGE_END    │                │
           │      └──────┬─────────────┘                 │
           │             │                               │
           │      ┌──────▼──────────────┐                │
           │      │ TOOL_CALL_START     │                │
           │      │ TOOL_CALL_ARGS      │ (0..N times)   │
           │      │ TOOL_CALL_END       │                │
           │      │ TOOL_CALL_RESULT    │                │
           │      └──────┬─────────────┘                 │
           │             │                               │
           │      ┌──────▼──────┐                        │
           │      │STATE_SNAPSHOT│ (after state-changing │
           │      └──────┬──────┘  tool calls)           │
           └─────────────┘   (agent loops: think → tool → think)

                  ┌──────────────┐
                  │ RUN_FINISHED │  (or RUN_ERROR)
                  └──────────────┘

Ordering rules:

Every run starts with RUN_STARTED and ends with RUN_FINISHED or RUN_ERROR
TEXT_MESSAGE_CONTENT can only appear between TEXT_MESSAGE_START and TEXT_MESSAGE_END
TOOL_CALL_ARGS can only appear between TOOL_CALL_START and TOOL_CALL_END
STATE_SNAPSHOT can appear at any point — usually after a state-changing tool call
The agent can cycle through think → tool → think → tool multiple times before finishing
All events within a run share the same threadId and runId
messageId ties text events together; toolCallId ties tool events together
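
The start/end and nesting rules above can be checked mechanically. The following is a minimal sketch rather than a complete validator: `validateRun` is a hypothetical helper, it only covers the first three rules, and it does not check threadId or runId consistency.

```typescript
type AguiEvent = { type: string; messageId?: string; toolCallId?: string };

// Returns a list of rule violations; an empty list means the checks passed.
function validateRun(events: AguiEvent[]): string[] {
  const errors: string[] = [];
  const open = { message: new Set<string>(), tool: new Set<string>() };

  const first = events[0]?.type;
  const last = events[events.length - 1]?.type;
  if (first !== "RUN_STARTED") errors.push("run must start with RUN_STARTED");
  if (last !== "RUN_FINISHED" && last !== "RUN_ERROR") {
    errors.push("run must end with RUN_FINISHED or RUN_ERROR");
  }

  for (const e of events) {
    const msg = e.messageId ?? "";
    const tool = e.toolCallId ?? "";
    switch (e.type) {
      case "TEXT_MESSAGE_START": open.message.add(msg); break;
      case "TEXT_MESSAGE_END": open.message.delete(msg); break;
      case "TEXT_MESSAGE_CONTENT":
        // Content is only legal while its message is open.
        if (!open.message.has(msg)) errors.push(`TEXT_MESSAGE_CONTENT outside message "${msg}"`);
        break;
      case "TOOL_CALL_START": open.tool.add(tool); break;
      case "TOOL_CALL_END": open.tool.delete(tool); break;
      case "TOOL_CALL_ARGS":
        // Args are only legal while their tool call is open.
        if (!open.tool.has(tool)) errors.push(`TOOL_CALL_ARGS outside tool call "${tool}"`);
        break;
    }
  }
  return errors;
}
```

Running something like this in a test suite or a debug build is one way to catch an agent that emits deltas after closing a message.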

What each key field means:

RUN_STARTED {
  threadId: "thread_2_1775335498802"  // Conversation (survives across runs)
  runId:    "run_3_1775335498802"     // This single request/response only
}

TEXT_MESSAGE_START {
  messageId: "8bfc10b0-027e-..."      // Groups content deltas together
  role: "assistant"                    // Always "assistant" for agent output
}
TEXT_MESSAGE_CONTENT {
  messageId: "8bfc10b0-027e-..."      // Must match the START event
  delta: "Hi"                         // Incremental — NOT cumulative
}
// Concatenating all deltas: "Hi" + " there" + "! How" + " are" + " you?"
// → "Hi there! How are you?"

TOOL_CALL_START {
  toolCallId:     "tooluse_V0vFkv2N5..."  // Groups tool events together
  toolCallName:   "research_topic"         // Which tool the agent is calling
  parentMessageId: "ebf4d1dd-..."          // Links to the assistant message
}
TOOL_CALL_ARGS {
  toolCallId: "tooluse_V0vFkv2N5..."
  delta: '{"query": "cloud security"}'    // JSON args, may arrive in chunks
}

STATE_SNAPSHOT {
  snapshot: {                             // Complete replacement of shared state
    title: "Cloud Security Guide",        // Application-defined structure
    sections: [...],                      // (not prescribed by AG-UI)
    metadata: { version: 1 }
  }
}
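
Put together, those fields imply a small reducer on the frontend: append deltas by messageId, append argument chunks by toolCallId, and replace state wholesale on each snapshot. This is a minimal sketch of that idea; `RunView` and `applyEvent` are hypothetical names and the event type is trimmed to the fields used here.

```typescript
type AguiEvent = {
  type: string;
  messageId?: string;
  toolCallId?: string;
  delta?: string;
  snapshot?: Record<string, unknown>;
};

interface RunView {
  messages: Map<string, string>;  // messageId → concatenated text
  toolArgs: Map<string, string>;  // toolCallId → concatenated JSON args
  state: Record<string, unknown>; // latest STATE_SNAPSHOT
}

function applyEvent(view: RunView, e: AguiEvent): RunView {
  switch (e.type) {
    case "TEXT_MESSAGE_START":
      view.messages.set(e.messageId ?? "", "");
      break;
    case "TEXT_MESSAGE_CONTENT": {
      // Deltas are incremental, so they are appended, never assigned.
      const id = e.messageId ?? "";
      view.messages.set(id, (view.messages.get(id) ?? "") + (e.delta ?? ""));
      break;
    }
    case "TOOL_CALL_ARGS": {
      // Args may arrive in chunks; parse only once the call has ended.
      const id = e.toolCallId ?? "";
      view.toolArgs.set(id, (view.toolArgs.get(id) ?? "") + (e.delta ?? ""));
      break;
    }
    case "STATE_SNAPSHOT":
      view.state = e.snapshot ?? {}; // full replacement, not a merge
      break;
  }
  return view;
}
```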

The request contract — what the frontend sends:

RunAgentInput {
  threadId: string     // Identifies the conversation
  runId: string        // Identifies this specific run
  state: any           // Current shared state (sent to agent for context)
  messages: Message[]  // Full conversation history
    // Each: { id, role, content }
    // role: "user" | "assistant" | "tool" | "system"
    // "tool" messages carry results for client-side tools
  tools: Tool[]        // Client-side tool definitions
    // Proxy tools — agent calls them, frontend executes them
    // (e.g., confirmation dialogs, file pickers)
  context: Context[]   // Additional context (RAG results, etc.)
  forwardedProps: any  // Pass-through metadata
}

The state field is what makes bidirectional shared state work. Frontend sends current state → agent sees it → agent modifies it via tools → STATE_SNAPSHOT sends new state back → frontend renders it → next request sends the updated state again. A continuous loop.
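
The loop can be sketched as a function that derives the next request from the previous one. The field names follow the RunAgentInput listing above; `nextRunInput` is a hypothetical helper, and `tools` and `context` are typed as `unknown[]` because their shapes are not spelled out here.

```typescript
interface Message {
  id: string;
  role: "user" | "assistant" | "tool" | "system";
  content: string;
}

interface RunAgentInput {
  threadId: string;
  runId: string;
  state: unknown;
  messages: Message[];
  tools: unknown[];   // placeholder type
  context: unknown[]; // placeholder type
  forwardedProps: unknown;
}

// Build the request for the next turn: same thread, new run, the state from
// the most recent STATE_SNAPSHOT, and the history extended with new messages.
function nextRunInput(
  prev: RunAgentInput,
  runId: string,
  latestState: unknown,
  newMessages: Message[],
): RunAgentInput {
  return {
    ...prev,
    runId,
    state: latestState,
    messages: [...prev.messages, ...newMessages],
  };
}
```

Because the full history and the current state are resent each time, the request body grows with the conversation, which is worth keeping in mind when reading the per-message byte counts.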

The Complete Picture

Here is the end-to-end exchange for a single “Say hi in 5 words” message over SSE (the HTTP/2 SETTINGS exchange and CORS preflight shown earlier are omitted):

BROWSER                               AGENTCORE (x.xx.xx.xxx)
  │                                        │
  │──── TCP SYN ─────────────────────────▶│  Layer 1: TCP
  │◀─── TCP SYN-ACK ──────────────────────│
  │──── TCP ACK ─────────────────────────▶│
  │                                        │
  │──── TLS ClientHello (TLS 1.3) ───────▶│  Layer 1: TLS
  │◀─── TLS ServerHello + Cert ───────────│
  │──── TLS Finished ────────────────────▶│
  │                                        │
  │──── POST /invocations ───────────────▶│  Layer 2: HTTP/2 request
  │     Headers: 800 bytes                 │  (auth, content-type, session-id)
  │     Auth: 1081 bytes                   │
  │     Body: 430 bytes                    │  (RunAgentInput JSON)
  │                                        │
  │◀─── 200 text/event-stream ────────────│  Layer 2: HTTP/2 response headers
  │                                        │
  │◀─── "data: {RUN_STARTED}\n\n" ────────│  Layer 2+3: SSE frame + AG-UI event
  │◀─── "data: {STATE_SNAPSHOT}\n\n" ─────│  Layer 2+3
  │◀─── "data: {TEXT_MSG_START}\n\n" ─────│  Layer 2+3
  │◀─── "data: {TEXT_MSG_CONTENT}\n\n" ───│  Layer 2+3 (×5 chunks)
  │◀─── "data: {TEXT_MSG_END}\n\n" ───────│  Layer 2+3
  │◀─── "data: {STATE_SNAPSHOT}\n\n" ─────│  Layer 2+3
  │◀─── "data: {RUN_FINISHED}\n\n" ───────│  Layer 2+3
  │                                        │
  │──── (connection stays open) ──────────│  Layer 1: HTTP/2 keep-alive

The same message over WebSocket:

BROWSER                               AGENTCORE (x.xx.xx.xxx)
  │                                        │
  │──── TCP SYN ─────────────────────────▶│  Layer 1: TCP (same)
  │◀─── TCP SYN-ACK ──────────────────────│
  │──── TCP ACK ─────────────────────────▶│
  │                                        │
  │──── TLS ClientHello (TLS 1.3) ───────▶│  Layer 1: TLS (same)
  │◀─── TLS ServerHello + Cert ───────────│
  │──── TLS Finished ────────────────────▶│
  │                                        │
  │──── GET /ws (Upgrade: websocket) ────▶│  Layer 2: WS handshake
  │     Sec-WebSocket-Protocol: base64...  │  (auth baked into handshake)
  │◀─── 101 Switching Protocols ──────────│  HTTP is DONE here
  │                                        │
  │═══════════════ TCP is now WebSocket ══│
  │                                        │
  │──── [frame: RunAgentInput] ──────────▶│  Layer 2: 2+2+4+430 bytes
  │                                        │  (header + 16-bit length + mask + payload); NO HTTP headers
  │◀─── [frame: RUN_STARTED]    (158B) ───│  Layer 2+3
  │◀─── [frame: STATE_SNAPSHOT] (73B) ────│  Layer 2+3
  │◀─── [frame: TEXT_MSG_START] (146B) ───│  Layer 2+3
  │◀─── [frame: TEXT_MSG_CONTENT] (130B) ─│  Layer 2+3 (×5)
  │◀─── [frame: TEXT_MSG_END]   (113B) ───│  Layer 2+3
  │◀─── [frame: STATE_SNAPSHOT] (73B) ────│  Layer 2+3
  │◀─── [frame: RUN_FINISHED]   (139B) ───│  Layer 2+3
  │                                        │
  │══ connection open for message 2 ══════│  Layer 1: same TCP pipe
  │                                        │
  │──── [frame: RunAgentInput #2] ───────▶│  NO new TCP, TLS, HTTP, or auth
  │◀─── [frames: events...] ──────────────│  Just frames

What the Layers Mean in Practice

Most AG-UI debugging happens at exactly one of these layers. Knowing which layer the problem lives in tells you where to look.

Symptom	Layer	Where to look
Connection refused or TLS error	Layer 1	Network config, certificates, port 443 access
WebSocket 401 or auth failure	Layer 2	`Sec-WebSocket-Protocol` header: confirm it carries an access token, not an ID token
SSE events not arriving / hanging	Layer 2	Missing `Accept: text/event-stream` header; proxy buffering the response
Frontend crashes on empty state	Layer 3	First `STATE_SNAPSHOT` is always `{}` — guard optional fields
Multiple chat bubbles per run	Layer 3	Multiple `TEXT_MESSAGE_START` events are normal — collapse consecutive assistant messages
422 validation error on second message	Layer 3	Messages missing `id` field in `RunAgentInput`
High latency on every message	Layer 1+2	SSE resends HTTP headers and the auth token on every message (plus TCP+TLS setup whenever the connection is not reused); consider WebSocket for interactive sessions
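
Two of the Layer 3 rows lend themselves to small defensive helpers. This is a minimal sketch: `DocState` mirrors the application-defined snapshot example from earlier, and `normalizeSnapshot`, `withMessageIds`, and the id format are hypothetical.

```typescript
// Application-defined state shape, following the earlier snapshot example.
interface DocState {
  title: string;
  sections: unknown[];
}

// The first snapshot may be {}, so every field gets a default before rendering.
function normalizeSnapshot(snapshot: Partial<DocState> | undefined): DocState {
  return {
    title: snapshot?.title ?? "",
    sections: snapshot?.sections ?? [],
  };
}

interface LooseMessage {
  id?: string;
  role: string;
  content: string;
}

// Fill in any missing message ids before building RunAgentInput, so the
// request does not fail validation on a later turn.
function withMessageIds(
  messages: LooseMessage[],
  makeId: (index: number) => string,
): Array<LooseMessage & { id: string }> {
  return messages.map((m, i) => ({ ...m, id: m.id ?? makeId(i) }));
}
```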

One-liner summary: HTTP/WebSocket is the road. AG-UI is the language everyone speaks on it. Layer 1 is the asphalt. Layer 2 is whether you drive a car or a motorbike. Layer 3 is what you say when you get there.

Posted 4th April 2026 at 5:37 pm

Akshay Parkhi's Weblog