AG-UI Protocol: A Layer-by-Layer Deep Dive with Real Network Captures
4th April 2026
There’s a common misconception about AG-UI: people treat it as a transport protocol. It isn’t. AG-UI rides on top of HTTP and WebSocket — it doesn’t replace them. Understanding where each layer starts and stops is the key to debugging, optimizing, and building correctly with it.
┌─────────────────────────────────────────────────────┐
│ Application Layer │
│ AG-UI Event Protocol │
│ (RUN_STARTED, TEXT_MESSAGE_*, TOOL_CALL_*, │
│ STATE_SNAPSHOT) │
├─────────────────────────────────────────────────────┤
│ Transport Layer │
│ Option A: HTTP + SSE Option B: WebSocket │
│ POST /invocations wss://.../ws │
│ Content-Type: Upgrade: websocket │
│ text/event-stream │
├─────────────────────────────────────────────────────┤
│ Network Layer │
│ TCP + TLS (both use the same thing) │
└─────────────────────────────────────────────────────┘
AG-UI defines what is sent. HTTP and WebSocket define how it’s sent. Think of JSON vs HTTP — JSON is the data format, HTTP is the transport. You send JSON over HTTP. Similarly, AG-UI is an event protocol; SSE and WebSocket are two different transports that carry it.
To make this concrete: we ran Playwright tests with CDP (Chrome DevTools Protocol) against a live AgentCore deployment to capture actual packet-level data for both transports. Everything below comes from those captures.
Layer 1 — Network Transport
Both SSE and WebSocket use identical Layer 1 infrastructure:
Remote IP: x.xx.xx.xxx:443 (AgentCore endpoint)
TLS: TLS 1.3
Cipher: AES_128_GCM
Certificate: Amazon RSA 2048 M03
Protocol: TCP → TLS → HTTP/2 (SSE)
TCP → TLS → HTTP/1.1+Upgrade (WebSocket)
An observer watching the network sees no difference — both are encrypted TCP streams to port 443. Where they diverge is what happens after the handshake.
SSE connection lifecycle:
TCP SYN → SYN-ACK → ACK (3-way handshake)
TLS ClientHello → ServerHello → Finished (TLS 1.3, 1-RTT)
HTTP/2 SETTINGS frame (HTTP/2 negotiation)
── connection ready ──
OPTIONS /invocations (CORS preflight)
POST /invocations (actual request)
← streaming response chunks (events arrive)
── connection kept alive ──
POST /invocations (next message — NEW request on same TCP)
← streaming response
WebSocket connection lifecycle:
TCP SYN → SYN-ACK → ACK (same 3-way handshake)
TLS ClientHello → ServerHello → Finished (same TLS 1.3)
GET /ws (Upgrade: websocket) (HTTP upgrade request)
← 101 Switching Protocols (protocol switch — HTTP is done here)
── TCP connection is now WebSocket ──
→ frame (message 1) (raw WS frames)
← frame ← frame ← frame
→ frame (message 2) (same pipe, no setup overhead)
← frame ← frame ← frame
→ close frame
← close frame
The critical Layer 1 difference: after the initial handshake, SSE stays in HTTP mode — each new message is a full HTTP request/response cycle. WebSocket upgrades away from HTTP. The TCP connection becomes a raw frame-based pipe. No HTTP headers, no request/response semantics. Just frames flowing in both directions.
Layer 2 — Transport Framing
The same AG-UI event looks completely different at the wire level depending on which transport carries it.
SSE framing (from captured headers):
Before a single AG-UI event arrives, the browser sends:
POST /runtimes/arn%3Aaws%3A.../invocations?qualifier=DEFAULT HTTP/2
Host: bedrock-agentcore.us-east-1.amazonaws.com
Content-Type: application/json
Accept: text/event-stream, application/json
Authorization: Bearer eyJraWQiOiJCSFwvQjVEOVh... ← 1,081 bytes
X-Amzn-Bedrock-AgentCore-Runtime-Session-Id: 52ed4489-...
Origin: https://d3rpk5004rsri0.cloudfront.net
Sec-Fetch-Mode: cors
sec-ch-ua: "HeadlessChrome";v="147"
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7)...
{"threadId":"t1","runId":"r1","state":{},"messages":[...]} ← 430 bytes
Overhead per message before any event comes back: ~2,311 bytes (CORS preflight + HTTP headers + auth token + request body).
The response arrives as a text/event-stream, with each event formatted as:
data: {"type":"RUN_STARTED","threadId":"t1","runId":"r1"}\n\n
data: {"type":"TEXT_MESSAGE_START","messageId":"abc"}\n\n
data: {"type":"TEXT_MESSAGE_CONTENT","messageId":"abc","delta":"Hi"}\n\n
data: {"type":"TEXT_MESSAGE_CONTENT","messageId":"abc","delta":" there"}\n\n
data: {"type":"RUN_FINISHED","threadId":"t1","runId":"r1"}\n\n
SSE framing cost per event:
"data: " = 6 bytes prefix
"{json payload}" = variable
"\n\n" = 2 bytes terminator
HTTP/2 DATA frame = 9 bytes header
───────────────
17 bytes overhead per AG-UI event
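The arithmetic above can be reproduced with a small helper. This is only a sketch: `encodeSseEvent` and `sseOverheadBytes` are hypothetical names, not part of AG-UI or any SDK, and the calculation assumes one HTTP/2 DATA frame per event, which HTTP/2 does not guarantee.

```typescript
// Hypothetical helpers; assumes each SSE event travels in its own HTTP/2 DATA frame.
const HTTP2_FRAME_HEADER_BYTES = 9; // fixed size of an HTTP/2 frame header

// Serialize an event in the "data: <json>\n\n" shape shown above.
function encodeSseEvent(event: object): string {
  return "data: " + JSON.stringify(event) + "\n\n";
}

// Bytes added around the JSON payload by SSE and HTTP/2 framing.
function sseOverheadBytes(event: object): number {
  const json = JSON.stringify(event);
  // The prefix and terminator are ASCII, so the difference in characters
  // equals the difference in bytes regardless of the payload's content.
  return encodeSseEvent(event).length - json.length + HTTP2_FRAME_HEADER_BYTES;
}
```

The overhead is constant, so its relative cost shrinks as payloads grow and is largest for small deltas like single tokens.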
WebSocket framing (from captured frames):
The browser sends one HTTP Upgrade request — this happens once, not per message:
GET /runtimes/arn%3A.../ws HTTP/1.1
Upgrade: websocket
Connection: Upgrade
Sec-WebSocket-Key: qJSR4G+mpEAzrfElKVFhvA==
Sec-WebSocket-Version: 13
Sec-WebSocket-Protocol: base64UrlBearerAuthorization.ZXlKcmFXUWl...[1461 chars]
Sec-WebSocket-Protocol: base64UrlBearerAuthorization
HTTP/1.1 101 Switching Protocols
Upgrade: websocket
Sec-WebSocket-Accept: YP1UCDyzHAuiDOCdM0TANqraFwU=
Sec-WebSocket-Protocol: base64UrlBearerAuthorization
X-Amzn-Bedrock-AgentCore-Runtime-Session-Id: c056eb10-...
After 101, HTTP is gone. Subsequent frames captured from the session:
→ FRAME SEND (430 bytes, opcode=1) RunAgentInput JSON
← FRAME RECV (158 bytes, opcode=1) RUN_STARTED
← FRAME RECV (73 bytes, opcode=1) STATE_SNAPSHOT
← FRAME RECV (146 bytes, opcode=1) TEXT_MESSAGE_START
← FRAME RECV (130 bytes, opcode=1) TEXT_MESSAGE_CONTENT: "Hi"
← FRAME RECV (134 bytes, opcode=1) TEXT_MESSAGE_CONTENT: " there"
← FRAME RECV (133 bytes, opcode=1) TEXT_MESSAGE_CONTENT: "! How"
← FRAME RECV (132 bytes, opcode=1) TEXT_MESSAGE_CONTENT: " are"
← FRAME RECV (133 bytes, opcode=1) TEXT_MESSAGE_CONTENT: " you?"
← FRAME RECV (113 bytes, opcode=1) TEXT_MESSAGE_END
← FRAME RECV (73 bytes, opcode=1) STATE_SNAPSHOT
← FRAME RECV (139 bytes, opcode=1) RUN_FINISHED
WebSocket frame structure (RFC 6455):
┌─────┬─────┬──────────┬────────────────────────────┐
│ FIN │ RSV │ Opcode │ Payload length │
├─────┴─────┴──────────┴────────────────────────────┤
│ Masking key (4 bytes, client→server only) │
├───────────────────────────────────────────────────┤
│ Payload data (the AG-UI JSON) │
└───────────────────────────────────────────────────┘
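Overhead: 2 bytes per event (server→client, payload up to 125 bytes)
          4 bytes per event (server→client, payload of 126 to 65,535 bytes)
          +4 bytes for the masking key (client→server)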
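RFC 6455 makes the header size depend on the payload length: the 2-byte minimum covers payloads up to 125 bytes, and longer payloads need an extended length field. The sketch below encodes that rule. `wsHeaderBytes` is a hypothetical helper, and it assumes the byte counts in the capture above are payload sizes, not whole-frame sizes.

```typescript
// Header size of one unfragmented WebSocket frame, following RFC 6455 section 5.2.
function wsHeaderBytes(payloadBytes: number, masked: boolean): number {
  let header = 2; // FIN/RSV/opcode byte + mask bit and 7-bit length byte
  if (payloadBytes > 65535) {
    header += 8; // 64-bit extended payload length
  } else if (payloadBytes > 125) {
    header += 2; // 16-bit extended payload length
  }
  if (masked) {
    header += 4; // masking key, required on client-to-server frames
  }
  return header;
}

// Server-to-client sizes from the capture above (assumed to be payload sizes).
const capturedRecvSizes = [158, 73, 146, 130, 134, 133, 132, 133, 113, 73, 139];

// Total framing bytes across the whole captured response.
const totalRecvHeaderBytes = capturedRecvSizes
  .map((size) => wsHeaderBytes(size, false))
  .reduce((sum, bytes) => sum + bytes, 0);
```

Under that assumption, only the two `STATE_SNAPSHOT` frames and the `TEXT_MESSAGE_END` frame fit in the 2-byte header; the rest carry 4 bytes of framing each.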
Side-by-side for the same event — {"type":"TEXT_MESSAGE_CONTENT","messageId":"abc","delta":"Hi"}:
SSE on the wire (147 bytes total):
┌─────────────────────────────────────────────────┐
│ HTTP/2 DATA frame header            (9 bytes)   │ ← HTTP/2 framing
│ "data: "                            (6 bytes)   │ ← SSE prefix
│ {"type":"TEXT_MESSAGE_CONTENT",...} (130 bytes) │ ← AG-UI payload
│ "\n\n"                              (2 bytes)   │ ← SSE terminator
└─────────────────────────────────────────────────┘
Overhead: 17 bytes (13%)
WebSocket on the wire (134 bytes total):
┌─────────────────────────────────────────────────┐
│ WS frame header                     (4 bytes)   │ ← WS framing (payloads over 125 bytes need the 16-bit extended length)
│ {"type":"TEXT_MESSAGE_CONTENT",...} (130 bytes) │ ← AG-UI payload
└─────────────────────────────────────────────────┘
Overhead: 4 bytes (3%)
WebSocket has roughly 4x less framing overhead per event at this payload size, and about 8x less for payloads of 125 bytes or fewer, where the header shrinks to 2 bytes. The bigger difference is at message boundaries: SSE sends 2,311 bytes of setup per message; WebSocket sends 438 bytes (4-byte header + 4-byte masking key + 430-byte payload) per message after the initial connection.
How both transports hand off to the same handler:
// SSE transport: strip the "data: " prefix, then parse the JSON.
// `lines` is the decoded response text split on "\n" (defined outside this excerpt).
for (const line of lines) {
  if (line.startsWith("data: ")) {
    const event: AguiEvent = JSON.parse(line.slice(6)); // strip SSE framing
    onEvent(event); // ← same handler
  }
}

// WebSocket transport: parse the JSON directly from the frame.
ws.onmessage = (ev) => {
  const event: AguiEvent = JSON.parse(ev.data); // no framing to strip
  onEvent(event); // ← same handler
};
The frontend’s onEvent function is identical for both transports. Layer 2 strips the framing; Layer 3 sees the same object either way.
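The SSE loop above assumes every `data:` line arrives whole. A network chunk can end in the middle of an event, so a client usually buffers text until it sees the blank-line terminator. A minimal sketch of that buffering follows; `createSseParser` is a hypothetical helper, and it handles only single-line `data:` fields, which is all the capture contains.

```typescript
type AguiEvent = { type: string; [key: string]: unknown };

// Incremental SSE parser: feed it decoded text chunks in arrival order.
// Hypothetical helper; multi-line data fields, comments, and "event:" fields are ignored.
function createSseParser(onEvent: (event: AguiEvent) => void): (chunk: string) => void {
  let buffer = "";
  return (chunk: string): void => {
    buffer += chunk;
    let boundary = buffer.indexOf("\n\n");
    while (boundary !== -1) {
      const block = buffer.slice(0, boundary); // one complete event
      buffer = buffer.slice(boundary + 2); // keep any partial event for later
      for (const line of block.split("\n")) {
        if (line.startsWith("data:")) {
          // The SSE format allows one optional space after the colon.
          const json = line.slice(5).replace(/^ /, "");
          onEvent(JSON.parse(json) as AguiEvent);
        }
      }
      boundary = buffer.indexOf("\n\n");
    }
  };
}
```

Because the parser only calls `onEvent` with complete objects, the handler stays identical to the WebSocket path.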
Layer 3 — AG-UI Event Protocol
After stripping Layer 2 framing, both transports produce identical JSON objects. From the captured session:
Event #1: {"type":"RUN_STARTED","threadId":"thread_2_1775335498802","runId":"run_3_..."}
Event #2: {"type":"STATE_SNAPSHOT","snapshot":{}}
Event #3: {"type":"TEXT_MESSAGE_START","messageId":"8bfc10b0-027e-...","role":"assistant"}
Event #4: {"type":"TEXT_MESSAGE_CONTENT","messageId":"8bfc10b0-027e-...","delta":"Hi"}
Event #5: {"type":"TEXT_MESSAGE_CONTENT","messageId":"8bfc10b0-027e-...","delta":" there"}
Event #6: {"type":"TEXT_MESSAGE_CONTENT","messageId":"8bfc10b0-027e-...","delta":"! How"}
Event #7: {"type":"TEXT_MESSAGE_CONTENT","messageId":"8bfc10b0-027e-...","delta":" are"}
Event #8: {"type":"TEXT_MESSAGE_CONTENT","messageId":"8bfc10b0-027e-...","delta":" you?"}
Event #9: {"type":"TEXT_MESSAGE_END","messageId":"8bfc10b0-027e-..."}
Event #10: {"type":"STATE_SNAPSHOT","snapshot":{}}
Event #11: {"type":"RUN_FINISHED","threadId":"thread_2_...","runId":"run_3_..."}
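Turning that event sequence into displayable text is a fold over the deltas. The sketch below is one way to do it; `assembleMessages` and the `TextEvent` type are illustrative names, not taken from an AG-UI SDK.

```typescript
type TextEvent =
  | { type: "TEXT_MESSAGE_START"; messageId: string; role: string }
  | { type: "TEXT_MESSAGE_CONTENT"; messageId: string; delta: string }
  | { type: "TEXT_MESSAGE_END"; messageId: string };

// Fold text events into a map of messageId -> full text.
// Deltas are incremental, so each one is appended to what came before.
function assembleMessages(events: TextEvent[]): Record<string, string> {
  const texts: Record<string, string> = {};
  for (const event of events) {
    if (event.type === "TEXT_MESSAGE_START") {
      texts[event.messageId] = "";
    } else if (event.type === "TEXT_MESSAGE_CONTENT") {
      texts[event.messageId] = (texts[event.messageId] ?? "") + event.delta;
    }
    // TEXT_MESSAGE_END carries no text; the accumulated string is already complete.
  }
  return texts;
}
```

Keying by `messageId` keeps the fold correct even when a run produces more than one assistant message.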
The AG-UI state machine:
          ┌─────────────┐
          │ RUN_STARTED │
          └──────┬──────┘
                 │
          ┌──────▼──────┐
   ┌─────▶│   RUNNING   │
   │      └──────┬──────┘
   │             │
   │      ┌──────▼───────────────┐
   │      │ TEXT_MESSAGE_START   │
   │      │ TEXT_MESSAGE_CONTENT │ (0..N times)
   │      │ TEXT_MESSAGE_END     │
   │      └──────┬───────────────┘
   │             │
   │      ┌──────▼───────────────┐
   │      │ TOOL_CALL_START      │
   │      │ TOOL_CALL_ARGS       │ (0..N times)
   │      │ TOOL_CALL_END        │
   │      │ TOOL_CALL_RESULT     │
   │      └──────┬───────────────┘
   │             │
   │      ┌──────▼─────────┐
   │      │ STATE_SNAPSHOT │ (after state-changing tool calls)
   │      └──────┬─────────┘
   └─────────────┤ (agent loops: think → tool → think)
                 │
          ┌──────▼───────┐
          │ RUN_FINISHED │ (or RUN_ERROR)
          └──────────────┘
Ordering rules:
- Every run starts with RUN_STARTED and ends with RUN_FINISHED or RUN_ERROR
- TEXT_MESSAGE_CONTENT can only appear between TEXT_MESSAGE_START and TEXT_MESSAGE_END
- TOOL_CALL_ARGS can only appear between TOOL_CALL_START and TOOL_CALL_END
- STATE_SNAPSHOT can appear at any point, usually after a state-changing tool call
- The agent can cycle through think → tool → think → tool multiple times before finishing
- All events within a run share the same threadId and runId
- messageId ties text events together; toolCallId ties tool events together
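These rules are mechanical enough to check in a test harness. The sketch below covers the start/end rule and the two "only between START and END" rules; `validateRun` is a hypothetical helper, not part of the protocol or any SDK.

```typescript
type OrderedEvent = { type: string; messageId?: string; toolCallId?: string };

// Returns a list of rule violations; an empty list means the sequence passed these checks.
function validateRun(events: OrderedEvent[]): string[] {
  const errors: string[] = [];
  const openMessages = new Set<string>();
  const openToolCalls = new Set<string>();

  if (events[0]?.type !== "RUN_STARTED") {
    errors.push("run must start with RUN_STARTED");
  }
  const lastType = events[events.length - 1]?.type;
  if (lastType !== "RUN_FINISHED" && lastType !== "RUN_ERROR") {
    errors.push("run must end with RUN_FINISHED or RUN_ERROR");
  }

  for (const event of events) {
    const messageId = event.messageId ?? "";
    const toolCallId = event.toolCallId ?? "";
    switch (event.type) {
      case "TEXT_MESSAGE_START":
        openMessages.add(messageId);
        break;
      case "TEXT_MESSAGE_CONTENT":
        if (!openMessages.has(messageId)) {
          errors.push("TEXT_MESSAGE_CONTENT outside message " + messageId);
        }
        break;
      case "TEXT_MESSAGE_END":
        openMessages.delete(messageId);
        break;
      case "TOOL_CALL_START":
        openToolCalls.add(toolCallId);
        break;
      case "TOOL_CALL_ARGS":
        if (!openToolCalls.has(toolCallId)) {
          errors.push("TOOL_CALL_ARGS outside tool call " + toolCallId);
        }
        break;
      case "TOOL_CALL_END":
        openToolCalls.delete(toolCallId);
        break;
    }
  }
  return errors;
}
```

Tracking open ids in sets, not a single "current message" variable, avoids assuming that text messages and tool calls never interleave.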
What each key field means:
RUN_STARTED {
threadId: "thread_2_1775335498802" // Conversation (survives across runs)
runId: "run_3_1775335498802" // This single request/response only
}
TEXT_MESSAGE_START {
messageId: "8bfc10b0-027e-..." // Groups content deltas together
role: "assistant" // Always "assistant" for agent output
}
TEXT_MESSAGE_CONTENT {
messageId: "8bfc10b0-027e-..." // Must match the START event
delta: "Hi" // Incremental — NOT cumulative
}
// Concatenating all deltas: "Hi" + " there" + "! How" + " are" + " you?"
// → "Hi there! How are you?"
TOOL_CALL_START {
toolCallId: "tooluse_V0vFkv2N5..." // Groups tool events together
toolCallName: "research_topic" // Which tool the agent is calling
parentMessageId: "ebf4d1dd-..." // Links to the assistant message
}
TOOL_CALL_ARGS {
toolCallId: "tooluse_V0vFkv2N5..."
delta: '{"query": "cloud security"}' // JSON args, may arrive in chunks
}
STATE_SNAPSHOT {
snapshot: { // Complete replacement of shared state
title: "Cloud Security Guide", // Application-defined structure
sections: [...], // (not prescribed by AG-UI)
metadata: { version: 1 }
}
}
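Because `TOOL_CALL_ARGS` deltas may split the JSON at any character, a client has to buffer them and parse only once the call has ended. A minimal sketch, with `collectToolCalls` and its types as hypothetical names:

```typescript
type ToolEvent =
  | { type: "TOOL_CALL_START"; toolCallId: string; toolCallName: string }
  | { type: "TOOL_CALL_ARGS"; toolCallId: string; delta: string }
  | { type: "TOOL_CALL_END"; toolCallId: string };

interface CompletedToolCall {
  toolCallId: string;
  toolCallName: string;
  args: unknown;
}

// Buffer argument chunks per toolCallId and parse them when TOOL_CALL_END arrives.
function collectToolCalls(events: ToolEvent[]): CompletedToolCall[] {
  const pending: Record<string, { toolCallName: string; argsText: string }> = {};
  const completed: CompletedToolCall[] = [];
  for (const event of events) {
    if (event.type === "TOOL_CALL_START") {
      pending[event.toolCallId] = { toolCallName: event.toolCallName, argsText: "" };
    } else if (event.type === "TOOL_CALL_ARGS") {
      const call = pending[event.toolCallId];
      if (call) call.argsText += event.delta;
    } else {
      const call = pending[event.toolCallId];
      if (call) {
        // A partial chunk is not valid JSON, so parsing waits until the call ends.
        // A call with no argument chunks is treated as having empty arguments.
        const args: unknown = JSON.parse(call.argsText || "{}");
        completed.push({ toolCallId: event.toolCallId, toolCallName: call.toolCallName, args });
        delete pending[event.toolCallId];
      }
    }
  }
  return completed;
}
```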
The request contract — what the frontend sends:
RunAgentInput {
threadId: string // Identifies the conversation
runId: string // Identifies this specific run
state: any // Current shared state (sent to agent for context)
messages: Message[] // Full conversation history
// Each: { id, role, content }
// role: "user" | "assistant" | "tool" | "system"
// "tool" messages carry results for client-side tools
tools: Tool[] // Client-side tool definitions
// Proxy tools — agent calls them, frontend executes them
// (e.g., confirmation dialogs, file pickers)
context: Context[] // Additional context (RAG results, etc.)
forwardedProps: any // Pass-through metadata
}
The state field is what makes bidirectional shared state work. Frontend sends current state → agent sees it → agent modifies it via tools → STATE_SNAPSHOT sends new state back → frontend renders it → next request sends the updated state again. A continuous loop.
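One turn of that loop can be sketched on the client side as two small functions. `ClientSession`, `applySnapshot`, and `buildRunInput` are hypothetical names; the `RunAgentInput` shape follows the contract listed above, with the loosely typed fields left as `unknown`.

```typescript
interface Message {
  id: string;
  role: "user" | "assistant" | "tool" | "system";
  content: string;
}

interface RunAgentInput {
  threadId: string;
  runId: string;
  state: unknown;
  messages: Message[];
  tools: unknown[];
  context: unknown[];
  forwardedProps: unknown;
}

// What the frontend keeps between runs (hypothetical shape).
interface ClientSession {
  threadId: string;
  state: unknown;
  messages: Message[];
}

// A STATE_SNAPSHOT replaces the shared state wholesale; nothing is merged.
function applySnapshot(session: ClientSession, snapshot: unknown): ClientSession {
  return { ...session, state: snapshot };
}

// Build the next request: it carries the latest state and the full history.
function buildRunInput(session: ClientSession, runId: string, userMessage: Message): RunAgentInput {
  return {
    threadId: session.threadId,
    runId,
    state: session.state,
    messages: [...session.messages, userMessage],
    tools: [],
    context: [],
    forwardedProps: {},
  };
}
```

Both functions return new objects instead of mutating the session, which keeps rendering code free to compare old and new state by reference.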
The Complete Picture
Here is the full exchange, layer by layer, for a single “Say hi in 5 words” message over SSE (byte counts from the capture):
BROWSER AGENTCORE (x.xx.xx.xxx)
│ │
│──── TCP SYN ─────────────────────────▶│ Layer 1: TCP
│◀─── TCP SYN-ACK ──────────────────────│
│──── TCP ACK ─────────────────────────▶│
│ │
│──── TLS ClientHello (TLS 1.3) ───────▶│ Layer 1: TLS
│◀─── TLS ServerHello + Cert ───────────│
│──── TLS Finished ────────────────────▶│
│ │
│──── POST /invocations ───────────────▶│ Layer 2: HTTP/2 request
│ Headers: 800 bytes │ (auth, content-type, session-id)
│ Auth: 1081 bytes │
│ Body: 430 bytes │ (RunAgentInput JSON)
│ │
│◀─── 200 text/event-stream ────────────│ Layer 2: HTTP/2 response headers
│ │
│◀─── "data: {RUN_STARTED}\n\n" ────────│ Layer 2+3: SSE frame + AG-UI event
│◀─── "data: {STATE_SNAPSHOT}\n\n" ─────│ Layer 2+3
│◀─── "data: {TEXT_MSG_START}\n\n" ─────│ Layer 2+3
│◀─── "data: {TEXT_MSG_CONTENT}\n\n" ───│ Layer 2+3 (×5 chunks)
│◀─── "data: {TEXT_MSG_END}\n\n" ───────│ Layer 2+3
│◀─── "data: {STATE_SNAPSHOT}\n\n" ─────│ Layer 2+3
│◀─── "data: {RUN_FINISHED}\n\n" ───────│ Layer 2+3
│ │
│──── (connection stays open) ──────────│ Layer 1: HTTP/2 keep-alive
The same message over WebSocket:
BROWSER AGENTCORE (x.xx.xx.xxx)
│ │
│──── TCP SYN ─────────────────────────▶│ Layer 1: TCP (same)
│◀─── TCP SYN-ACK ──────────────────────│
│──── TCP ACK ─────────────────────────▶│
│ │
│──── TLS ClientHello (TLS 1.3) ───────▶│ Layer 1: TLS (same)
│◀─── TLS ServerHello + Cert ───────────│
│──── TLS Finished ────────────────────▶│
│ │
│──── GET /ws (Upgrade: websocket) ────▶│ Layer 2: WS handshake
│ Sec-WebSocket-Protocol: base64... │ (auth baked into handshake)
│◀─── 101 Switching Protocols ──────────│ HTTP is DONE here
│ │
│═══════════════ TCP is now WebSocket ══│
│ │
│──── [frame: RunAgentInput] ──────────▶│ Layer 2: 4+4+430 bytes (header + mask + payload)
│ │ NO HTTP headers
│◀─── [frame: RUN_STARTED] (158B) ───│ Layer 2+3
│◀─── [frame: STATE_SNAPSHOT] (73B) ────│ Layer 2+3
│◀─── [frame: TEXT_MSG_START] (146B) ───│ Layer 2+3
│◀─── [frame: TEXT_MSG_CONTENT] (130B) ─│ Layer 2+3 (×5)
│◀─── [frame: TEXT_MSG_END] (113B) ───│ Layer 2+3
│◀─── [frame: STATE_SNAPSHOT] (73B) ────│ Layer 2+3
│◀─── [frame: RUN_FINISHED] (139B) ───│ Layer 2+3
│ │
│══ connection open for message 2 ══════│ Layer 1: same TCP pipe
│ │
│──── [frame: RunAgentInput #2] ───────▶│ NO new TCP, TLS, HTTP, or auth
│◀─── [frames: events...] ──────────────│ Just frames
What the Layers Mean in Practice
Most AG-UI debugging happens at exactly one of these layers. Knowing which layer the problem lives in tells you where to look.
| Symptom | Layer | Where to look |
|---|---|---|
| Connection refused or TLS error | Layer 1 | Network config, certificates, port 443 access |
| WebSocket 401 or auth failure | Layer 2 | Sec-WebSocket-Protocol header — are you using access tokens, not ID tokens? |
| SSE events not arriving / hanging | Layer 2 | Missing Accept: text/event-stream header; proxy buffering the response |
| Frontend crashes on empty state | Layer 3 | First STATE_SNAPSHOT is always {} — guard optional fields |
| Multiple chat bubbles per run | Layer 3 | Multiple TEXT_MESSAGE_START events are normal — collapse consecutive assistant messages |
| 422 validation error on second message | Layer 3 | Messages missing id field in RunAgentInput |
| High latency on every message | Layer 1+2 | SSE pays a full HTTP request (headers, auth token, CORS preflight) per message, plus TCP+TLS again if the connection was closed; consider WebSocket for interactive sessions |
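Two of the Layer 3 rows translate directly into small client-side guards. The sketch below is illustrative only: `DocState` mirrors the application-defined snapshot example used earlier in this article, and `readDocState` and `collapseAssistantMessages` are hypothetical helpers.

```typescript
// Hypothetical application state; AG-UI does not prescribe this shape.
interface DocState {
  title: string;
  sections: unknown[];
}

// The first snapshot may be {}, so every field gets a default before rendering.
function readDocState(snapshot: unknown): DocState {
  const raw = (snapshot ?? {}) as Partial<DocState>;
  return {
    title: typeof raw.title === "string" ? raw.title : "",
    sections: Array.isArray(raw.sections) ? raw.sections : [],
  };
}

interface ChatMessage {
  id: string;
  role: string;
  content: string;
}

// Merge runs of consecutive assistant messages into one bubble, keeping the first id.
function collapseAssistantMessages(messages: ChatMessage[]): ChatMessage[] {
  const collapsed: ChatMessage[] = [];
  for (const message of messages) {
    const previous = collapsed[collapsed.length - 1];
    if (previous && previous.role === "assistant" && message.role === "assistant") {
      collapsed[collapsed.length - 1] = { ...previous, content: previous.content + message.content };
    } else {
      collapsed.push({ ...message });
    }
  }
  return collapsed;
}
```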
One-liner summary: HTTP/WebSocket is the road. AG-UI is the language everyone speaks on it. Layer 1 is the asphalt. Layer 2 is whether you drive a car or a motorbike. Layer 3 is what you say when you get there.