AG-UI Protocol: The Missing Standard for AI Agent Interfaces
4th April 2026
If you’ve built applications with AI agents, you’ve hit this wall: every framework has its own way of streaming responses to the UI. LangChain uses callbacks and streaming iterators. CrewAI returns completed results. AutoGen has its own message protocol. Amazon Bedrock Agents uses a proprietary streaming format. OpenAI Assistants has yet another event structure.
Your frontend team writes custom parsing logic for each one. Switch frameworks? Rewrite the UI layer. Want to show tool calls in progress? Build custom event handling. Need the agent and UI to share state? Invent your own protocol.
AG-UI (Agent-User Interface) solves this. It’s an open protocol — think of it as HTTP for AI agent frontends. Any agent framework that speaks AG-UI can plug into any frontend that understands it, without custom glue code.
What is AG-UI?
AG-UI is a standardized event streaming protocol that defines how AI agents communicate with user interfaces in real-time. It was created by CopilotKit and has been adopted by AWS for AgentCore Runtime.
At its core, AG-UI defines:
- A set of typed events that flow from agent to UI
- Two transport mechanisms — SSE (Server-Sent Events) and WebSocket
- Three interaction patterns — streaming text, tool visualization, and shared state
- A request/response contract — RunAgentInput in, a stream of AguiEvent out
The full set of event types:
Lifecycle:
RUN_STARTED → Agent begins processing
RUN_FINISHED → Agent completes
RUN_ERROR → Something went wrong
Text Streaming:
TEXT_MESSAGE_START → New text block begins
TEXT_MESSAGE_CONTENT → Delta text chunk
TEXT_MESSAGE_END → Text block complete
Tool Calls:
TOOL_CALL_START → Agent invokes a tool
TOOL_CALL_ARGS → Streaming tool arguments
TOOL_CALL_END → Tool execution complete
Shared State:
STATE_SNAPSHOT → Full state snapshot
STATE_DELTA → Incremental state patch (JSON)
Every event is a JSON object with a type field. No framework-specific wrappers, no proprietary encoding. Any language, any framework, any transport.
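Because events are plain JSON, serializing one for the SSE transport takes a single line. A minimal server-side encoder — a sketch, not part of any ag-ui library — is just json.dumps plus the SSE framing:

```python
import json

def sse_frame(event: dict) -> str:
    """Serialize one AG-UI event as a Server-Sent Events frame:
    a 'data: ' prefix, the JSON payload, and a blank line to end the frame."""
    return f"data: {json.dumps(event)}\n\n"

frame = sse_frame({"type": "RUN_STARTED", "threadId": "t1", "runId": "r1"})
```

A client recovers the event by splitting on blank lines and stripping the "data: " prefix before calling its JSON parser.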
What We Built: A Collaborative Document Generator
To understand AG-UI deeply, we built a full-stack application on AWS AgentCore Runtime — a collaborative document generator where an AI agent co-authors documents with users in real-time.
┌──────────────────────────┐ ┌──────────────────────────────┐
│ CloudFront + S3 │ │ AgentCore Runtime │
│ React SPA (TypeScript) │◄──────►│ Strands Agent │
│ • Streaming chat │ AG-UI │ • research_topic tool │
│ • Tool cards │ │ • generate_outline tool │
│ • Document preview │ │ • update_document tool │
│ • Confirm dialogs │ │ • Port 8080 (/invocations │
│ │ │ /ws, /ping) │
└──────────┬───────────────┘ └──────────────────────────────┘
│ Auth (OAuth 2.0)
▼
┌──────────────────────────┐
│ Cognito User Pool │
│ Access Token → client_id│
└──────────────────────────┘
| Tool | Purpose | AG-UI Pattern |
|---|---|---|
| research_topic | Gathers information | Tool Call Visualization — UI shows a card with 🔍 icon, args, and progress spinner |
| generate_outline | Creates document structure | Tool Call Visualization — UI shows 📋 card |
| update_document | Writes content sections | Shared State — live document preview updates in real-time |
Pattern 1 — Streaming Text
The simplest pattern: the agent streams text character by character, just like a chat interface.
Wire format:
{"type":"TEXT_MESSAGE_START","messageId":"abc-123","role":"assistant"}
{"type":"TEXT_MESSAGE_CONTENT","messageId":"abc-123","delta":"Hello"}
{"type":"TEXT_MESSAGE_CONTENT","messageId":"abc-123","delta":"! I'm"}
{"type":"TEXT_MESSAGE_CONTENT","messageId":"abc-123","delta":" your assistant."}
{"type":"TEXT_MESSAGE_END","messageId":"abc-123"}
Frontend handler:
case "TEXT_MESSAGE_START":
// Create a new empty message bubble
setMessages(prev => [...prev, { id: msgId, role: "assistant", content: "" }]);
case "TEXT_MESSAGE_CONTENT":
// Append delta — user sees characters appear
currentContent += delta;
updateLastMessage(currentContent);
case "TEXT_MESSAGE_END":
// Message complete — re-enable input
Each TEXT_MESSAGE_CONTENT event carries a few words, arriving every ~40ms. Before AG-UI, you'd parse raw SSE "data:" lines yourself, handle OpenAI's [DONE] sentinel, deal with Bedrock's contentBlockDelta format, or unpack LangChain's callback structure. AG-UI standardizes it — TEXT_MESSAGE_CONTENT with a delta field, always.
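Reassembling the stream on the consumer side is a small fold over the events, keyed by messageId. A minimal sketch, using the wire-format lines from above:

```python
import json

def reassemble(lines):
    """Fold TEXT_MESSAGE_* events into complete messages, keyed by messageId."""
    messages = {}
    for line in lines:
        event = json.loads(line)
        if event["type"] == "TEXT_MESSAGE_START":
            messages[event["messageId"]] = ""  # new empty bubble
        elif event["type"] == "TEXT_MESSAGE_CONTENT":
            messages[event["messageId"]] += event["delta"]  # append delta
    return messages

stream = [
    '{"type":"TEXT_MESSAGE_START","messageId":"abc-123","role":"assistant"}',
    '{"type":"TEXT_MESSAGE_CONTENT","messageId":"abc-123","delta":"Hello"}',
    '{"type":"TEXT_MESSAGE_CONTENT","messageId":"abc-123","delta":"! I\'m"}',
    '{"type":"TEXT_MESSAGE_CONTENT","messageId":"abc-123","delta":" your assistant."}',
    '{"type":"TEXT_MESSAGE_END","messageId":"abc-123"}',
]
```

Keying by messageId matters because, as the production lessons below show, a single run can interleave several text blocks.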
Pattern 2 — Tool Call Visualization
Most chat UIs hide tool calls — you see “thinking...” for 10 seconds, then the response. AG-UI makes tool calls visible and interactive.
Wire format:
{"type":"TOOL_CALL_START","toolCallId":"tc-1","toolCallName":"research_topic","parentMessageId":"msg-2"}
{"type":"TOOL_CALL_ARGS","toolCallId":"tc-1","delta":"{\"query\": \"cloud security\"}"}
{"type":"TOOL_CALL_END","toolCallId":"tc-1"}
{"type":"TOOL_CALL_RESULT","toolCallId":"tc-1","content":"{\"findings\": [...]}"}
What the UI renders:
┌─────────────────────────────────────────┐
│ 🔍 research_topic ✓ done │
│ query: cloud security │
└─────────────────────────────────────────┘
The card appears at TOOL_CALL_START with a spinner. Arguments stream in via TOOL_CALL_ARGS. At TOOL_CALL_END, the spinner becomes a checkmark. Users see exactly what the agent is doing and why a response took 15 seconds. This builds trust and makes the agent feel collaborative rather than opaque.
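The card lifecycle is a tiny state machine over the three tool-call events. A sketch of the bookkeeping (helper and field names are illustrative, not from any SDK):

```python
import json

def track_tool_calls(events):
    """Fold TOOL_CALL_* events into per-call UI card state:
    'running' (spinner) from START until END flips it to 'done' (checkmark)."""
    cards = {}
    for event in events:
        if event["type"] == "TOOL_CALL_START":
            cards[event["toolCallId"]] = {
                "name": event["toolCallName"], "args": "", "status": "running"}
        elif event["type"] == "TOOL_CALL_ARGS":
            cards[event["toolCallId"]]["args"] += event["delta"]  # args stream in
        elif event["type"] == "TOOL_CALL_END":
            cards[event["toolCallId"]]["status"] = "done"
    return cards

cards = track_tool_calls([
    {"type": "TOOL_CALL_START", "toolCallId": "tc-1", "toolCallName": "research_topic"},
    {"type": "TOOL_CALL_ARGS", "toolCallId": "tc-1", "delta": '{"query": "cloud security"}'},
    {"type": "TOOL_CALL_END", "toolCallId": "tc-1"},
])
```

Note that args accumulates as a raw string — it only becomes valid JSON once the final TOOL_CALL_ARGS delta has arrived, so parse it lazily.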
Pattern 3 — Shared State
This is AG-UI’s most powerful and least understood pattern. The agent and UI share a live data structure — in our case, the document being authored.
The flow:
- The frontend sends its current state in RunAgentInput.state
- The agent processes the request and calls update_document(title, sections, version)
- The ag-ui-strands library extracts document state from the tool arguments and emits a STATE_SNAPSHOT event
- The frontend receives the snapshot and renders the document
Wire format:
{
"type": "STATE_SNAPSHOT",
"snapshot": {
"title": "Cloud Security: A Comprehensive Guide",
"sections": [
{
"heading": "Introduction to Cloud Security",
"body": "Cloud computing has revolutionized how organizations..."
},
{
"heading": "Threat Landscape",
"body": "Primary security threats include data breaches..."
}
],
"metadata": {
"last_modified": "2026-04-03T22:33:21Z",
"version": 1
}
}
}
Backend configuration:
from datetime import datetime, timezone

ToolBehavior(
state_from_args=lambda ctx: {
"title": ctx.tool_input.get("title", ""),
"sections": ctx.tool_input.get("sections", []),
"metadata": {
"last_modified": datetime.now(timezone.utc).isoformat(),
"version": ctx.tool_input.get("version", 1),
},
},
skip_messages_snapshot=True, # Don't echo back message history
)
This is fundamentally different from “the agent returns a JSON blob.” The state is bidirectional — the frontend sends current state to the agent, the agent modifies it, the UI renders the update. This enables collaborative workflows where both human and AI contribute to a shared artifact: documents, spreadsheets, design tools, code editors, project plans.
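For incremental updates, STATE_DELTA carries a JSON Patch–style list of operations (RFC 6902) rather than a full snapshot. A minimal applier covering just the add and replace ops — a sketch; production code should use a complete RFC 6902 library — might look like:

```python
import copy

def apply_delta(state: dict, ops: list) -> dict:
    """Apply a minimal subset of JSON Patch (RFC 6902): 'add' and 'replace'.
    Paths look like '/sections/0/body'; '-' appends to a list."""
    state = copy.deepcopy(state)  # never mutate the UI's current state in place
    for op in ops:
        *parents, leaf = op["path"].lstrip("/").split("/")
        target = state
        for key in parents:  # walk down to the parent container
            target = target[int(key)] if isinstance(target, list) else target[key]
        if isinstance(target, list):
            index = len(target) if leaf == "-" else int(leaf)
            if op["op"] == "add":
                target.insert(index, op["value"])
            else:  # replace
                target[index] = op["value"]
        else:
            target[leaf] = op["value"]
    return state

patched = apply_delta(
    {"title": "Doc", "sections": []},
    [{"op": "add", "path": "/sections/-", "value": {"heading": "Intro", "body": "..."}},
     {"op": "replace", "path": "/title", "value": "Cloud Security"}],
)
```

Copy-then-patch keeps the previous state intact, which is exactly what an immutable-state UI framework like React expects.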
SSE vs WebSocket: Measured Results
We deployed with both transports and ran Playwright tests to capture actual network behavior across two sequential messages (“Say hello” then “Say goodbye”).
SSE — 2 messages = 2 HTTP connections:
Total HTTP requests to AgentCore: 2
Total HTTP responses: 2
Request 1: POST /invocations (new TCP+TLS+HTTP connection)
→ Response: text/event-stream, 11 events streamed
→ Connection closes after RUN_FINISHED
Request 2: POST /invocations (new TCP+TLS+HTTP connection)
→ Response: text/event-stream, 13 events streamed
→ Connection closes after RUN_FINISHED
Each request carries ~2–5KB of headers, auth token, and the entire conversation history.
WebSocket — 2 messages = 1 persistent connection:
HTTP requests to AgentCore: 0 ← zero
WebSocket connections opened: 1 ← just one
WebSocket frames sent: 2 ← one per message
WebSocket frames received: 25 ← all events on same connection
Measured latency:
| Metric | SSE | WebSocket |
|---|---|---|
| Message sent → first event received | ~5000ms | 22ms |
| Message 2 sent → first event received | ~5000ms | 21ms |
| Connection overhead per message | ~100–200ms (new TLS) | 0ms (already open) |
The ~5000ms includes AgentCore cold start and Bedrock model inference. But the connection setup overhead is the key difference — SSE pays it every message, WebSocket pays it once.
| Metric | SSE (2 messages) | WebSocket (2 messages) |
|---|---|---|
| TLS handshakes | 2 | 1 |
| Auth tokens sent | 2 × ~800 bytes | 1 × ~800 bytes |
| Payload for message 2 | ~2KB (full history) | 715 bytes (frame only) |
When the difference matters:
- Voice agents — 16kHz audio is typically chunked into frames of ~1,000 samples, one arriving every 62.5ms. SSE's per-request overhead adds unacceptable latency; WebSocket keeps round-trips under 25ms.
- High-frequency interactions — If the agent needs user input mid-run (approvals, choices, corrections), WebSocket handles it on the same connection. SSE requires a new POST for each user response.
- Mobile on poor networks — Each new TLS handshake on 3G adds 300–500ms. WebSocket’s single connection reduces radio wake-ups and battery drain.
- Scale — 1000 concurrent users. SSE: potentially 2000+ in-flight HTTP connections. WebSocket: exactly 1000 persistent connections.
AgentCore’s WebSocket Implementation
Endpoint: wss://bedrock-agentcore.<region>.amazonaws.com/runtimes/<arn>/ws
Auth: OAuth 2.0 Bearer token via Sec-WebSocket-Protocol header
Session: X-Amzn-Bedrock-AgentCore-Runtime-Session-Id (query parameter)
Container: Must implement /ws endpoint on port 8080
The browser WebSocket API doesn’t support custom headers. AgentCore works around this using the subprotocol field:
// Base64url-encode the OAuth token
const base64url = btoa(token)
.replace(/\+/g, "-")
.replace(/\//g, "_")
.replace(/=/g, "");
// Pass as WebSocket subprotocol
const ws = new WebSocket(wsUrl, [
`base64UrlBearerAuthorization.${base64url}`,
"base64UrlBearerAuthorization"
]);
AgentCore extracts the token from the Sec-WebSocket-Protocol header during the handshake and validates it against the configured JWT authorizer.
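The decoding side is the mirror image of the browser snippet: find the subprotocol value carrying the token and reverse the base64url encoding. A sketch of that extraction (illustrative only — AgentCore performs this validation itself):

```python
import base64

def token_from_subprotocols(protocols):
    """Recover a bearer token smuggled through Sec-WebSocket-Protocol values.
    Looks for 'base64UrlBearerAuthorization.<token>' and base64url-decodes it."""
    prefix = "base64UrlBearerAuthorization."
    for value in protocols:
        if value.startswith(prefix):
            raw = value[len(prefix):]
            raw += "=" * (-len(raw) % 4)  # restore the '=' padding btoa stripped
            return base64.urlsafe_b64decode(raw).decode()
    return None  # no token-bearing subprotocol offered

# Round-trip: encode the way the browser snippet above does, then decode.
token = "eyJhbGciOiJSUzI1NiJ9.payload.sig"
encoded = base64.urlsafe_b64encode(token.encode()).decode().rstrip("=")
recovered = token_from_subprotocols(
    [f"base64UrlBearerAuthorization.{encoded}", "base64UrlBearerAuthorization"])
```

The padding restoration step is the easy one to forget: btoa output has its trailing "=" stripped for the subprotocol, but most base64 decoders require it back.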
Production Lessons
1. Empty STATE_SNAPSHOT crashes React
The first STATE_SNAPSHOT event after RUN_STARTED carries an empty snapshot: {"type":"STATE_SNAPSHOT","snapshot":{}}. If your document renderer assumes state.sections is always an array, it crashes on .length of undefined.
if (!state || (!state.title && (!state.sections || state.sections.length === 0))) {
return <EmptyState />;
}
2. Multiple TEXT_MESSAGE_START events per run
A Strands agent that calls tools emits multiple text segments in one run:
TEXT_MESSAGE_START #1 → "I'll research this for you..."
[tool calls happen]
TEXT_MESSAGE_START #2 → "Based on my research..."
[more tool calls]
TEXT_MESSAGE_START #3 → "Here's your completed document..."
If you create a new chat bubble per TEXT_MESSAGE_START, the user sees 3+ separate agent messages. Fix: collapse consecutive assistant message segments into one bubble.
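The collapse itself is a one-pass merge over the message list. A sketch of the fix (field names mirror the chat-bubble shape used earlier in this post):

```python
def collapse_assistant_segments(messages):
    """Merge runs of consecutive assistant messages into a single bubble."""
    collapsed = []
    for msg in messages:
        if collapsed and msg["role"] == "assistant" and collapsed[-1]["role"] == "assistant":
            collapsed[-1]["content"] += "\n\n" + msg["content"]  # extend previous bubble
        else:
            collapsed.append(dict(msg))  # copy so we never mutate the input
    return collapsed

bubbles = collapse_assistant_segments([
    {"role": "assistant", "content": "I'll research this for you..."},
    {"role": "assistant", "content": "Based on my research..."},
    {"role": "assistant", "content": "Here's your completed document..."},
])
```

A user message between two assistant segments breaks the run, so genuine back-and-forth turns still render as separate bubbles.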
3. RunAgentInput requires id on every message
Both UserMessage and AssistantMessage require an id field in the Pydantic model. If your frontend loses IDs during state updates, the second request fails with a 422 validation error.
# Backend safety net
for msg in body.get("messages", []):
if "id" not in msg or not msg["id"]:
msg["id"] = str(uuid.uuid4())
4. Cognito ID tokens vs Access tokens
AgentCore’s customJWTAuthorizer validates the client_id claim. Cognito ID tokens don’t have client_id — they have aud. Cognito Access tokens have client_id. You must use access tokens for AgentCore OAuth.
5. AgentCore session IDs must be ≥33 characters
The X-Amzn-Bedrock-AgentCore-Runtime-Session-Id header requires at least 33 characters. A standard uuid4() (36 chars) works, but shorter IDs fail with a validation error.
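A small guard makes the length requirement explicit instead of relying on uuid4's 36 characters implicitly. A sketch:

```python
import uuid

MIN_SESSION_ID_LENGTH = 33  # the AgentCore minimum described above

def new_session_id() -> str:
    """Generate a session ID that satisfies AgentCore's length requirement."""
    session_id = str(uuid.uuid4())  # 36 characters — clears the 33-char minimum
    if len(session_id) < MIN_SESSION_ID_LENGTH:
        raise ValueError(f"session id too short: {len(session_id)} chars")
    return session_id
```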
Minimal Implementation
Backend (Python + Strands):
from strands import Agent, tool
from strands.models.bedrock import BedrockModel
from ag_ui_strands import StrandsAgent, create_strands_app
@tool
def my_tool(query: str) -> str:
"""Does something useful."""
return f"Result for {query}"
model = BedrockModel(model_id="us.anthropic.claude-sonnet-4-20250514-v1:0")
agent = Agent(model=model, tools=[my_tool])
strands_agent = StrandsAgent(agent=agent, name="my-agent")
app = create_strands_app(strands_agent, path="/invocations", ping_path="/ping")
Frontend (TypeScript):
const response = await fetch("/invocations", {
method: "POST",
headers: { "Content-Type": "application/json" },
body: JSON.stringify({
threadId: "t1", runId: "r1", state: {},
messages: [{ id: "m1", role: "user", content: "Hello" }],
tools: [], context: [], forwardedProps: {}
})
});
const reader = response.body!.getReader();
const decoder = new TextDecoder();
let buffer = "";
while (true) {
const { done, value } = await reader.read();
if (done) break;
buffer += decoder.decode(value, { stream: true });
const lines = buffer.split("\n");
buffer = lines.pop() ?? ""; // keep any partial trailing line for the next chunk
for (const line of lines) {
if (line.startsWith("data: ")) {
const event = JSON.parse(line.slice(6));
switch (event.type) {
case "TEXT_MESSAGE_CONTENT":
appendToChat(event.delta); // Streaming text
break;
case "TOOL_CALL_START":
showToolCard(event.toolCallName); // Tool in progress
break;
case "STATE_SNAPSHOT":
updateSharedState(event.snapshot); // Shared UI state
break;
}
}
}
}
About 40 lines for a complete AG-UI frontend. No SDK required — just parse JSON from an event stream.
The Ecosystem
| Agent Framework | AG-UI Adapter |
|---|---|
| Strands (AWS) | ag-ui-strands |
| LangGraph (LangChain) | ag-ui-langgraph |
| CrewAI | ag-ui-crewai |
| Mastra | ag-ui-mastra |
| AG2 (AutoGen) | ag-ui-ag2 |
| Any HTTP/WebSocket server | Implement the protocol directly |
Frontend toolkits: @copilotkit/react-core provides pre-built hooks and components, @ag-ui/client provides a transport-agnostic JS client. Or parse the JSON events directly — the protocol is simple enough that a custom implementation takes an afternoon.
What Changes
Before AG-UI: your UI code was married to your agent framework. Custom streaming parsing for each one. Can’t swap LangChain for Strands without rewriting the frontend. Users saw “thinking...” with no insight into what the agent was actually doing. Communication was one-way — agent produces output, user reads it.
After AG-UI: any AG-UI agent works with any AG-UI frontend. TEXT_MESSAGE_CONTENT, TOOL_CALL_START, STATE_SNAPSHOT — the same events everywhere. Users see tool calls, progress, and state changes in real-time. Shared state enables human-AI co-creation rather than just Q&A.
We built a complete application — document generation with research, outlining, writing, real-time preview, and user confirmation — deployed on AWS AgentCore with Cognito auth, CloudFront hosting, and both SSE and WebSocket transports. The AG-UI protocol kept the frontend framework-agnostic: switching from Strands to LangGraph tomorrow would not require changing the React app.
Built with: AWS AgentCore Runtime, Strands Agents, ag-ui-strands, Claude Sonnet 4 on Bedrock, React 19, Vite, Cognito, S3, CloudFront. AG-UI Protocol: github.com/CopilotKit/ag-ui