Akshay Parkhi's Weblog


AG-UI Protocol: The Missing Standard for AI Agent Interfaces

4th April 2026

If you’ve built applications with AI agents, you’ve hit this wall: every framework has its own way of streaming responses to the UI. LangChain uses callbacks and streaming iterators. CrewAI returns completed results. AutoGen has its own message protocol. Amazon Bedrock Agents uses a proprietary streaming format. OpenAI Assistants has yet another event structure.

Your frontend team writes custom parsing logic for each one. Switch frameworks? Rewrite the UI layer. Want to show tool calls in progress? Build custom event handling. Need the agent and UI to share state? Invent your own protocol.

AG-UI (Agent-User Interface) solves this. It’s an open protocol — think of it as HTTP for AI agent frontends. Any agent framework that speaks AG-UI can plug into any frontend that understands it, without custom glue code.

What is AG-UI?

AG-UI is a standardized event streaming protocol that defines how AI agents communicate with user interfaces in real-time. It was created by CopilotKit and has been adopted by AWS for AgentCore Runtime.

At its core, AG-UI defines:

  • A set of typed events that flow from agent to UI
  • Two transport mechanisms — SSE (Server-Sent Events) and WebSocket
  • Three interaction patterns — streaming text, tool visualization, and shared state
  • A request/response contract — RunAgentInput → stream of AguiEvent

The core event types:

Lifecycle:
  RUN_STARTED    → Agent begins processing
  RUN_FINISHED   → Agent completes
  RUN_ERROR      → Something went wrong

Text Streaming:
  TEXT_MESSAGE_START    → New text block begins
  TEXT_MESSAGE_CONTENT  → Delta text chunk
  TEXT_MESSAGE_END      → Text block complete

Tool Calls:
  TOOL_CALL_START   → Agent invokes a tool
  TOOL_CALL_ARGS    → Streaming tool arguments
  TOOL_CALL_END     → Tool call arguments complete
  TOOL_CALL_RESULT  → Tool execution result

Shared State:
  STATE_SNAPSHOT  → Full state snapshot
  STATE_DELTA     → Incremental state patch (JSON)

Every event is a JSON object with a type field. No framework-specific wrappers, no proprietary encoding. Any language, any framework, any transport.
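Because each event is self-describing JSON, consuming the protocol needs no SDK. As a sketch, a Python dispatcher keyed on the type field (the return values here are purely illustrative):

```python
import json

def handle_event(raw: str) -> str:
    """Dispatch one AG-UI event by its "type" field."""
    event = json.loads(raw)
    kind = event["type"]
    if kind == "TEXT_MESSAGE_CONTENT":
        return f"text: {event['delta']}"          # streaming text delta
    if kind == "TOOL_CALL_START":
        return f"tool: {event['toolCallName']}"   # tool invocation begins
    if kind == "STATE_SNAPSHOT":
        return f"state keys: {sorted(event['snapshot'])}"  # full shared state
    return f"event: {kind}"                       # lifecycle and other events
```

The same shape works in any language: parse JSON, switch on type, done.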

What We Built: A Collaborative Document Generator

To understand AG-UI deeply, we built a full-stack application on AWS AgentCore Runtime — a collaborative document generator where an AI agent co-authors documents with users in real-time.

┌──────────────────────────┐        ┌──────────────────────────────┐
│  CloudFront + S3         │        │  AgentCore Runtime           │
│  React SPA (TypeScript)  │◄──────►│  Strands Agent               │
│  • Streaming chat        │  AG-UI │  • research_topic tool       │
│  • Tool cards            │        │  • generate_outline tool     │
│  • Document preview      │        │  • update_document tool      │
│  • Confirm dialogs       │        │  • Port 8080 (/invocations   │
│                          │        │    /ws, /ping)               │
└──────────┬───────────────┘        └──────────────────────────────┘
           │ Auth (OAuth 2.0)
           ▼
┌──────────────────────────┐
│  Cognito User Pool       │
│  Access Token → client_id│
└──────────────────────────┘
Tool              Purpose                      AG-UI Pattern
research_topic    Gathers information          Tool Call Visualization — UI shows a card with 🔍 icon, args, and progress spinner
generate_outline  Creates document structure   Tool Call Visualization — UI shows 📋 card
update_document   Writes content sections      Shared State — live document preview updates in real-time

Pattern 1 — Streaming Text

The simplest pattern: the agent streams text character by character, just like a chat interface.

Wire format:

{"type":"TEXT_MESSAGE_START","messageId":"abc-123","role":"assistant"}
{"type":"TEXT_MESSAGE_CONTENT","messageId":"abc-123","delta":"Hello"}
{"type":"TEXT_MESSAGE_CONTENT","messageId":"abc-123","delta":"! I'm"}
{"type":"TEXT_MESSAGE_CONTENT","messageId":"abc-123","delta":" your assistant."}
{"type":"TEXT_MESSAGE_END","messageId":"abc-123"}

Frontend handler:

case "TEXT_MESSAGE_START":
  // Create a new empty message bubble
  setMessages(prev => [...prev, { id: msgId, role: "assistant", content: "" }]);

case "TEXT_MESSAGE_CONTENT":
  // Append delta — user sees characters appear
  currentContent += delta;
  updateLastMessage(currentContent);

case "TEXT_MESSAGE_END":
  // Message complete — re-enable input

Each TEXT_MESSAGE_CONTENT event carries a few words, arriving every ~40ms. Before AG-UI, you’d parse raw SSE "data:" lines, handle OpenAI’s [DONE] sentinel, deal with Bedrock’s contentBlockDelta format, or adapt to LangChain’s callback structure. AG-UI standardizes it — TEXT_MESSAGE_CONTENT with a delta field, always.
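The reassembly logic is the same everywhere, not just in React. A Python sketch that folds the wire-format events above into complete messages, handy for tests or server-side consumers:

```python
def assemble_messages(events: list[dict]) -> dict[str, str]:
    """Fold TEXT_MESSAGE_* events into complete messages keyed by messageId."""
    messages: dict[str, str] = {}
    for ev in events:
        if ev["type"] == "TEXT_MESSAGE_START":
            messages[ev["messageId"]] = ""       # open a new empty message
        elif ev["type"] == "TEXT_MESSAGE_CONTENT":
            messages[ev["messageId"]] += ev["delta"]  # append each delta
        # TEXT_MESSAGE_END needs no action here: the message is already complete
    return messages
```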

Pattern 2 — Tool Call Visualization

Most chat UIs hide tool calls — you see “thinking...” for 10 seconds, then the response. AG-UI makes tool calls visible and interactive.

Wire format:

{"type":"TOOL_CALL_START","toolCallId":"tc-1","toolCallName":"research_topic","parentMessageId":"msg-2"}
{"type":"TOOL_CALL_ARGS","toolCallId":"tc-1","delta":"{\"query\": \"cloud security\"}"}
{"type":"TOOL_CALL_END","toolCallId":"tc-1"}
{"type":"TOOL_CALL_RESULT","toolCallId":"tc-1","content":"{\"findings\": [...]}"}

What the UI renders:

┌─────────────────────────────────────────┐
│ 🔍 research_topic               ✓ done  │
│ query: cloud security                   │
└─────────────────────────────────────────┘

The card appears at TOOL_CALL_START with a spinner. Arguments stream in via TOOL_CALL_ARGS. At TOOL_CALL_END, the spinner becomes a checkmark. Users see exactly what the agent is doing and why a response took 15 seconds. This builds trust and makes the agent feel collaborative rather than opaque.
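The reducer behind such a card can be sketched in a few lines; here in Python for brevity (field names match the wire format above, the status strings are illustrative):

```python
def reduce_tool_cards(events: list[dict]) -> dict[str, dict]:
    """Fold TOOL_CALL_* events into per-call card state: name, args, status."""
    cards: dict[str, dict] = {}
    for ev in events:
        tc = ev.get("toolCallId")
        if ev["type"] == "TOOL_CALL_START":
            # Card appears with a spinner
            cards[tc] = {"name": ev["toolCallName"], "args": "", "status": "running"}
        elif ev["type"] == "TOOL_CALL_ARGS":
            cards[tc]["args"] += ev["delta"]      # arguments stream in
        elif ev["type"] == "TOOL_CALL_END":
            cards[tc]["status"] = "done"          # spinner becomes a checkmark
    return cards
```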

Pattern 3 — Shared State

This is AG-UI’s most powerful and least understood pattern. The agent and UI share a live data structure — in our case, the document being authored.

The flow:

  1. The frontend sends its current state in RunAgentInput.state
  2. The agent processes the request and calls update_document(title, sections, version)
  3. The ag-ui-strands library extracts document state from the tool arguments and emits a STATE_SNAPSHOT event
  4. The frontend receives the snapshot and renders the document

Wire format:

{
  "type": "STATE_SNAPSHOT",
  "snapshot": {
    "title": "Cloud Security: A Comprehensive Guide",
    "sections": [
      {
        "heading": "Introduction to Cloud Security",
        "body": "Cloud computing has revolutionized how organizations..."
      },
      {
        "heading": "Threat Landscape",
        "body": "Primary security threats include data breaches..."
      }
    ],
    "metadata": {
      "last_modified": "2026-04-03T22:33:21Z",
      "version": 1
    }
  }
}

Backend configuration:

from datetime import datetime, timezone

ToolBehavior(
    state_from_args=lambda ctx: {
        "title": ctx.tool_input.get("title", ""),
        "sections": ctx.tool_input.get("sections", []),
        "metadata": {
            "last_modified": datetime.now(timezone.utc).isoformat(),
            "version": ctx.tool_input.get("version", 1),
        },
    },
    skip_messages_snapshot=True,  # Don't echo back message history
)

This is fundamentally different from “the agent returns a JSON blob.” The state is bidirectional — the frontend sends current state to the agent, the agent modifies it, the UI renders the update. This enables collaborative workflows where both human and AI contribute to a shared artifact: documents, spreadsheets, design tools, code editors, project plans.
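A client-side sketch of applying these state events, assuming STATE_DELTA carries JSON Patch-style operations (the protocol describes it as an incremental JSON patch; this sketch handles only object paths and the add/replace/remove ops):

```python
import copy

def apply_state_event(state: dict, event: dict) -> dict:
    """Apply a STATE_SNAPSHOT (full replace) or STATE_DELTA (patch) to UI state."""
    if event["type"] == "STATE_SNAPSHOT":
        return copy.deepcopy(event["snapshot"])   # snapshot replaces everything
    if event["type"] == "STATE_DELTA":
        new_state = copy.deepcopy(state)
        for op in event["delta"]:                 # assumed JSON Patch-style ops
            keys = [k for k in op["path"].split("/") if k]
            target = new_state
            for key in keys[:-1]:                 # walk to the parent object
                target = target[key]
            if op["op"] in ("add", "replace"):
                target[keys[-1]] = op["value"]
            elif op["op"] == "remove":
                del target[keys[-1]]
        return new_state
    return state                                  # other events leave state alone
```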

SSE vs WebSocket: Measured Results

We deployed with both transports and ran Playwright tests to capture actual network behavior across two sequential messages (“Say hello” then “Say goodbye”).

SSE — 2 messages = 2 HTTP connections:

Total HTTP requests to AgentCore: 2
Total HTTP responses: 2

Request 1: POST /invocations (new TCP+TLS+HTTP connection)
  → Response: text/event-stream, 11 events streamed
  → Connection closes after RUN_FINISHED

Request 2: POST /invocations (new TCP+TLS+HTTP connection)
  → Response: text/event-stream, 13 events streamed
  → Connection closes after RUN_FINISHED

Each request carries ~2–5KB of headers, auth token, and the entire conversation history.

WebSocket — 2 messages = 1 persistent connection:

HTTP requests to AgentCore: 0      ← zero
WebSocket connections opened: 1    ← just one
WebSocket frames sent: 2           ← one per message
WebSocket frames received: 25      ← all events on same connection

Measured latency:

Metric                                  SSE                    WebSocket
Message sent → first event received     ~5000ms                22ms
Message 2 sent → first event received   ~5000ms                21ms
Connection overhead per message         ~100–200ms (new TLS)   0ms (already open)

The ~5000ms includes AgentCore cold start and Bedrock model inference. But the connection setup overhead is the key difference — SSE pays it every message, WebSocket pays it once.

Metric                  SSE (2 messages)      WebSocket (2 messages)
TLS handshakes          2                     1
Auth tokens sent        2 × ~800 bytes        1 × ~800 bytes
Payload for message 2   ~2KB (full history)   715 bytes (frame only)

When the difference matters:

  • Voice agents — A 16kHz audio stream delivers frames continuously, one every ~62.5ms. SSE’s per-request overhead adds unacceptable latency; WebSocket keeps round-trips under 25ms.
  • High-frequency interactions — If the agent needs user input mid-run (approvals, choices, corrections), WebSocket handles it on the same connection. SSE requires a new POST for each user response.
  • Mobile on poor networks — Each new TLS handshake on 3G adds 300–500ms. WebSocket’s single connection reduces radio wake-ups and battery drain.
  • Scale — 1000 concurrent users. SSE: potentially 2000+ in-flight HTTP connections. WebSocket: exactly 1000 persistent connections.

AgentCore’s WebSocket Implementation

Endpoint: wss://bedrock-agentcore.<region>.amazonaws.com/runtimes/<arn>/ws
Auth: OAuth 2.0 Bearer token via Sec-WebSocket-Protocol header
Session: X-Amzn-Bedrock-AgentCore-Runtime-Session-Id (query parameter)
Container: Must implement /ws endpoint on port 8080

The browser WebSocket API doesn’t support custom headers. AgentCore works around this using the subprotocol field:

// Base64url-encode the OAuth token
const base64url = btoa(token)
  .replace(/\+/g, "-")
  .replace(/\//g, "_")
  .replace(/=/g, "");

// Pass as WebSocket subprotocol
const ws = new WebSocket(wsUrl, [
  `base64UrlBearerAuthorization.${base64url}`,
  "base64UrlBearerAuthorization"
]);

AgentCore extracts the token from the Sec-WebSocket-Protocol header during the handshake and validates it against the configured JWT authorizer.

Production Lessons

1. Empty STATE_SNAPSHOT crashes React

The first STATE_SNAPSHOT event after RUN_STARTED carries an empty snapshot: {"type":"STATE_SNAPSHOT","snapshot":{}}. If your document renderer assumes state.sections is always an array, it crashes reading .length of undefined. Guard for the empty case before rendering:

if (!state || (!state.title && (!state.sections || state.sections.length === 0))) {
  return <EmptyState />;
}

2. Multiple TEXT_MESSAGE_START events per run

A Strands agent that calls tools emits multiple text segments in one run:

TEXT_MESSAGE_START #1 → "I'll research this for you..."
[tool calls happen]
TEXT_MESSAGE_START #2 → "Based on my research..."
[more tool calls]
TEXT_MESSAGE_START #3 → "Here's your completed document..."

If you create a new chat bubble per TEXT_MESSAGE_START, the user sees 3+ separate agent messages. Fix: collapse consecutive assistant message segments into one bubble.
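One way to implement the fix, sketched in Python (the message shape here is a minimal role/content pair; adapt to your actual state model):

```python
def collapse_assistant_segments(messages: list[dict]) -> list[dict]:
    """Merge runs of consecutive assistant messages into a single chat bubble."""
    collapsed: list[dict] = []
    for msg in messages:
        if (collapsed
                and msg["role"] == "assistant"
                and collapsed[-1]["role"] == "assistant"):
            # Extend the previous bubble instead of opening a new one
            merged = dict(collapsed[-1])
            merged["content"] = merged["content"] + "\n\n" + msg["content"]
            collapsed[-1] = merged
        else:
            collapsed.append(dict(msg))
    return collapsed
```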

3. RunAgentInput requires id on every message

Both UserMessage and AssistantMessage require an id field in the Pydantic model. If your frontend loses IDs during state updates, the second request fails with a 422 validation error.

# Backend safety net
for msg in body.get("messages", []):
    if "id" not in msg or not msg["id"]:
        msg["id"] = str(uuid.uuid4())

4. Cognito ID tokens vs Access tokens

AgentCore’s customJWTAuthorizer validates the client_id claim. Cognito ID tokens don’t have client_id — they have aud. Cognito Access tokens have client_id. You must use access tokens for AgentCore OAuth.
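A quick way to check which kind of token you hold is to decode the JWT's payload segment, which is plain base64url JSON. A Python sketch (the helper names are hypothetical, and this inspects claims only — it does not verify the signature):

```python
import base64
import json

def jwt_claims(token: str) -> dict:
    """Decode a JWT's payload segment without verifying the signature."""
    payload = token.split(".")[1]
    payload += "=" * (-len(payload) % 4)  # restore base64url padding
    return json.loads(base64.urlsafe_b64decode(payload))

def has_client_id_claim(token: str) -> bool:
    # Cognito access tokens carry client_id; ID tokens carry aud instead,
    # which AgentCore's customJWTAuthorizer will reject.
    return "client_id" in jwt_claims(token)
```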

5. AgentCore session IDs must be ≥33 characters

The X-Amzn-Bedrock-AgentCore-Runtime-Session-Id header requires at least 33 characters. A standard uuid4() (36 chars) works, but shorter IDs fail with a validation error.
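A tiny helper that makes the constraint explicit (uuid4 is simply one easy way to satisfy it):

```python
import uuid

def new_session_id() -> str:
    """Generate an X-Amzn-Bedrock-AgentCore-Runtime-Session-Id value (>= 33 chars)."""
    session_id = str(uuid.uuid4())  # 36 chars: 32 hex digits + 4 hyphens
    if len(session_id) < 33:
        raise ValueError("session id too short for AgentCore")
    return session_id
```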

Minimal Implementation

Backend (Python + Strands):

from strands import Agent, tool
from strands.models.bedrock import BedrockModel
from ag_ui_strands import StrandsAgent, create_strands_app

@tool
def my_tool(query: str) -> str:
    """Does something useful."""
    return f"Result for {query}"

model = BedrockModel(model_id="us.anthropic.claude-sonnet-4-20250514-v1:0")
agent = Agent(model=model, tools=[my_tool])

strands_agent = StrandsAgent(agent=agent, name="my-agent")
app = create_strands_app(strands_agent, path="/invocations", ping_path="/ping")

Frontend (TypeScript):

const response = await fetch("/invocations", {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({
    threadId: "t1", runId: "r1", state: {},
    messages: [{ id: "m1", role: "user", content: "Hello" }],
    tools: [], context: [], forwardedProps: {}
  })
});

const reader = response.body.getReader();
const decoder = new TextDecoder();
let buffer = "";

while (true) {
  const { done, value } = await reader.read();
  if (done) break;
  buffer += decoder.decode(value, { stream: true });

  // Keep the trailing partial line in the buffer for the next read,
  // so we never re-parse handled lines or parse incomplete JSON
  const lines = buffer.split("\n");
  buffer = lines.pop() ?? "";

  for (const line of lines) {
    if (line.startsWith("data: ")) {
      const event = JSON.parse(line.slice(6));

      switch (event.type) {
        case "TEXT_MESSAGE_CONTENT":
          appendToChat(event.delta);       // Streaming text
          break;
        case "TOOL_CALL_START":
          showToolCard(event.toolCallName); // Tool in progress
          break;
        case "STATE_SNAPSHOT":
          updateSharedState(event.snapshot); // Shared UI state
          break;
      }
    }
  }
}

About 40 lines for a complete AG-UI frontend. No SDK required — just parse JSON from an event stream.

The Ecosystem

Agent Framework              AG-UI Adapter
Strands (AWS)                ag-ui-strands
LangGraph (LangChain)        ag-ui-langgraph
CrewAI                       ag-ui-crewai
Mastra                       ag-ui-mastra
AG2 (AutoGen)                ag-ui-ag2
Any HTTP/WebSocket server    Implement the protocol directly

Frontend toolkits: @copilotkit/react-core provides pre-built hooks and components, @ag-ui/client provides a transport-agnostic JS client. Or parse the JSON events directly — the protocol is simple enough that a custom implementation takes an afternoon.

What Changes

Before AG-UI: your UI code was married to your agent framework. Custom streaming parsing for each one. Can’t swap LangChain for Strands without rewriting the frontend. Users saw “thinking...” with no insight into what the agent was actually doing. Communication was one-way — agent produces output, user reads it.

After AG-UI: any AG-UI agent works with any AG-UI frontend. TEXT_MESSAGE_CONTENT, TOOL_CALL_START, STATE_SNAPSHOT — the same events everywhere. Users see tool calls, progress, and state changes in real-time. Shared state enables human-AI co-creation rather than just Q&A.

We built a complete application — document generation with research, outlining, writing, real-time preview, and user confirmation — deployed on AWS AgentCore with Cognito auth, CloudFront hosting, and both SSE and WebSocket transports. The AG-UI protocol kept the frontend framework-agnostic: switching from Strands to LangGraph tomorrow would not require changing the React app.

Built with: AWS AgentCore Runtime, Strands Agents, ag-ui-strands, Claude Sonnet 4 on Bedrock, React 19, Vite, Cognito, S3, CloudFront. AG-UI Protocol: github.com/CopilotKit/ag-ui

This is AG-UI Protocol: The Missing Standard for AI Agent Interfaces by Akshay Parkhi, posted on 4th April 2026.
