MCP Apps Explained: How an AI Agent Shows Live Widgets Inside the Chat

23rd April 2026

I built a greeting card generator and got confused. The AI agent showed a real card with buttons inside the chat, and I couldn’t figure out why. Here’s what I learned — explained the way I wish someone had explained it to me.

Start with what you already know

When you ask an AI agent a question, it sends back text. That’s it. Text.

You: “Roll three dice for me.”
Agent: “You rolled 4, 2, and 6.”

Text works fine for simple answers. But what if you wanted the dice to actually tumble? Or a real calendar to pick a date from? Or a chart you could click?

Text can describe these things. It can’t be them.

That’s the gap MCP Apps fill. They let your server send back a small, live webpage — not a description of one — that appears right inside the chat.

The mental model: a tiny webpage inside the chat

Imagine the agent’s chat window has a hole in it. Your MCP server sends back a little webpage that slots into that hole. The webpage has buttons, colors, animations — anything a normal webpage can do. The user can click it. It can talk back to your server. All without leaving the chat.

┌─────────────────────────────────────────────┐
│  AI Agent                                   │
│                                             │
│  You: "Make a greeting card for Sarah"      │
│  Agent: Here you go!                        │
│                                             │
│  ┌─────────────────────────────────────┐    │
│  │  🌙  Dear Sarah,                    │    │  ← your webpage
│  │      Happy Birthday                 │    │    lives here
│  │   [✨ Show available themes]        │    │
│  └─────────────────────────────────────┘    │
│                                             │
└─────────────────────────────────────────────┘

That little box is the MCP App. Your server built the HTML. The agent put it on screen. The user clicks buttons inside it.

Why not just send a link to a webpage?

Fair question. You could tell the user “go to mycardapp.com/sarah” and let them build it there. Why go through all this trouble?

Four reasons:

The user stays put. No new tab. No lost context. The card is right next to the conversation that asked for it.
Your app can talk to the agent. Click a button, and your webpage can ask the agent to call your server for fresh data. You don’t have to stand up a separate web API for it.
Your app can use the agent’s other tools. If the user has connected Gmail and Slack to the agent, your app can ask the agent to send an email or post a message. You didn’t build those integrations. The agent already has them.
It’s sandboxed. Your webpage runs in a locked box: it can’t read the chat page’s cookies, peek at other tabs, or reach into the agent’s own UI. Even if your server is malicious, the box limits what its page can do.

What’s actually different from a regular MCP tool?

A regular MCP tool looks like this:

@mcp.tool()
def create_card(name, message, theme):
    return {"name": name, "message": message, "color": "blue"}

The agent calls it, gets the dictionary back, and writes some text about it.

An MCP App tool looks almost identical. You just add one argument:

@mcp.tool(meta={"ui": {"resourceUri": "ui://my-card/view.html"}})
def create_card(name, message, theme):
    return {"name": name, "message": message, "color": "blue"}

That one extra argument, meta={"ui": {"resourceUri": "..."}}, is the whole trick. It tells the agent: “when you call this tool, don’t just narrate the result. Also load this HTML page and show it to the user.”

The ui://my-card/view.html string isn’t a real URL. It’s just a name — like a filename. It tells the agent which HTML page to grab from your server.

Where does the HTML come from?

From your server, alongside the tool. You register it like this:

@mcp.resource(
    "ui://my-card/view.html",
    mime_type="text/html;profile=mcp-app"   # this tells the agent: it's an App page
)
def view():
    return "...your full webpage..."

So your server now has two things:

A tool that returns data (name, message, colors).
A resource that returns HTML (the page that displays the data).

The tool says “when you call me, also grab the page at this name.” The resource says “here’s the page at that name.” The agent connects them.

How it all flows — step by step

Let’s trace what happens when you ask the agent to make a card:

  1. You type:     "Make a card for Sarah"
                         ↓
  2. Agent's LLM:  Decides to call create_card(name="Sarah",
                   message="Happy Birthday")
                         ↓
  3. Your server:  Runs the function, returns:
                   {name: "Sarah", colors: {...}}
                         ↓
  4. Agent:        Sees the "ui.resourceUri" field in the
                   tool's metadata. Asks your server: "give me
                   the HTML page called ui://my-card/view.html"
                         ↓
  5. Your server:  Returns the full HTML as a string
                         ↓
  6. Agent:        Drops that HTML into a little box in the chat
                         ↓
  7. The HTML:     Loads, reads the data (Sarah, colors),
                   draws the card
                         ↓
  8. You:          See a pretty card appear in the chat

Once the card is on screen, the agent’s job is mostly done. The card is a live webpage now, running on its own; the agent only steps back in when the card asks it for something.

The “talk back” part: buttons that do things

Here’s where it gets powerful. The card has a button: Show available themes. Click it, and somehow the card calls your server and shows “ocean · sunset · forest · midnight.”

How? Through the agent. The card can’t reach your server directly — it’s locked in a box, remember? But it can ask the agent to do things on its behalf.

  1. User clicks the button
                ↓
  2. The card says to the agent:
     "Hey, can you call the list_themes tool for me?"
                ↓
  3. Agent calls list_themes() on your server
                ↓
  4. Server returns: ["ocean", "sunset", "forest", "midnight"]
                ↓
  5. Agent hands the result back to the card
                ↓
  6. The card updates — shows the themes

The agent is the middleman, and that’s the safety part. Your webpage never calls your server directly. It asks the agent, and the agent decides whether to allow it.
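A minimal sketch of that relay in Python. The request mirrors MCP’s tools/call method, but the allowlist, the error code and the `delete_everything` tool are hypothetical stand-ins for whatever policy a real host applies.

```python
import json

# Hypothetical tools on "your server"; only list_themes is exposed to the card.
SERVER_TOOLS = {
    "list_themes": lambda args: ["ocean", "sunset", "forest", "midnight"],
    "delete_everything": lambda args: "boom",
}
# Assumption: a per-app allowlist stands in for the host's real policy.
ALLOWED_FROM_APP = {"list_themes"}


def host_relay(app_request: dict) -> dict:
    """Forward a card's tools/call request to the server, or refuse it."""
    name = app_request["params"]["name"]
    if app_request["method"] != "tools/call" or name not in ALLOWED_FROM_APP:
        # -32000 is in JSON-RPC's implementation-defined server error range.
        return {"jsonrpc": "2.0", "id": app_request["id"],
                "error": {"code": -32000, "message": f"not allowed: {name}"}}
    value = SERVER_TOOLS[name](app_request["params"].get("arguments", {}))
    return {"jsonrpc": "2.0", "id": app_request["id"],
            "result": {"content": [{"type": "text", "text": json.dumps(value)}]}}


click = {"jsonrpc": "2.0", "id": 7, "method": "tools/call",
         "params": {"name": "list_themes", "arguments": {}}}
print(host_relay(click)["result"]["content"][0]["text"])
# ["ocean", "sunset", "forest", "midnight"]
```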

What the code actually looks like

Your server is a normal Python file. About 30 lines for something real:

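from mcp.server.fastmcp import FastMCP

# port=3002 matches the tunnel command further down
mcp = FastMCP("Greeting Card Server", stateless_http=True, port=3002)

THEMES = {
    "ocean":    {"bg": "#0f4c75", "accent": "#1b6ca8", "emoji": "🌊"},
    "sunset":   {"bg": "#c0392b", "accent": "#e74c3c", "emoji": "🌅"},
    "forest":   {"bg": "#1e5631", "accent": "#2e8b57", "emoji": "🌲"},
    "midnight": {"bg": "#1a1a2e", "accent": "#7c3aed", "emoji": "🌙"},
}

HTML_PAGE = "<!doctype html>..."   # the full page from the next section

# Tool that the user triggers; meta= links it to the page below
@mcp.tool(meta={"ui": {"resourceUri": "ui://greeting-card/view.html"}})
def create_card(name: str, message: str, theme: str = "ocean") -> dict:
    if theme not in THEMES:
        raise ValueError(f"Unknown theme {theme!r}; try one of {list(THEMES)}")
    return {"name": name, "message": message, "colors": THEMES[theme]}

# Tool that the UI button calls
@mcp.tool()
def list_themes() -> list[str]:
    return list(THEMES.keys())

# The webpage itself
@mcp.resource("ui://greeting-card/view.html",
              mime_type="text/html;profile=mcp-app")
def view() -> str:
    return HTML_PAGE   # the full HTML string

if __name__ == "__main__":
    # Serves the MCP endpoint at http://localhost:3002/mcp
    mcp.run(transport="streamable-http")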

That’s the entire server. Two tools and one webpage.
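The page’s JSON.parse(content[0].text) only works because the tool’s dict arrives as JSON text. Here is a dependency-free sketch of that hand-off, assuming (as the page code already does) that FastMCP puts the returned dict, serialized to JSON, in the first text content block. `to_tool_result` is a hypothetical stand-in for that step, not a FastMCP function.

```python
import json

THEMES = {"ocean": {"bg": "#0f4c75", "accent": "#1b6ca8", "emoji": "🌊"}}


def create_card(name: str, message: str, theme: str = "ocean") -> dict:
    return {"name": name, "message": message, "colors": THEMES[theme]}


def to_tool_result(value: dict) -> dict:
    """Hypothetical stand-in: wrap the dict as JSON text in a content block."""
    return {"content": [{"type": "text",
                         "text": json.dumps(value, ensure_ascii=False)}]}


result = to_tool_result(create_card("Sarah", "Happy Birthday"))
# Python equivalent of the page's JSON.parse(content[0].text)
data = json.loads(result["content"][0]["text"])
print(data["name"], data["colors"]["bg"])  # Sarah #0f4c75
```

The practical consequence: whatever your tool returns has to survive a trip through JSON, so stick to dicts, lists, strings, numbers and booleans.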

What the webpage looks like

The HTML is just a normal webpage, with one small addition: it loads a tiny SDK that handles talking to the agent for you.

<script type="module">
  import { App } from "https://unpkg.com/@modelcontextprotocol/ext-apps@0.4.0/app-with-deps";

  const app = new App({ name: "Greeting Card", version: "1.0.0" });

  // When the agent hands us the card data, draw the card
  app.ontoolresult = ({ content }) => {
    const data = JSON.parse(content[0].text);
    drawCard(data);          // your own function
  };

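  // When the user clicks the button, ask the agent to call our server
  async function showThemes() {
    const result = await app.callServerTool({ name: "list_themes", arguments: {} });
    // ...update the card with the themes in result.content
  }

  // Module scripts don't create globals, so attach the click handler here
  // (assumes your button has id="themes-btn")
  document.getElementById("themes-btn").addEventListener("click", showThemes);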

  // Say hello to the agent (handshake)
  await app.connect();
</script>

Three things to remember:

What	When
`app.connect()`	Call once, when the page loads. This is the handshake.
`app.ontoolresult`	Runs when the agent pushes fresh data to your page.
`app.callServerTool()`	You call this when the user clicks something.

That’s all of the SDK most apps need: one handshake, one handler, one call.
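Under the hood, the SDK exchanges JSON-RPC 2.0 messages with the host over postMessage. The toy Python model below is not the SDK’s code (`Channel` and its methods are made up, and the exact wire format is an assumption), but it shows what makes an awaited call like callServerTool work: each request carries an id, and the reply with the same id resolves the waiting call.

```python
import asyncio
import itertools
import json


class Channel:
    """Toy model of one side of a postMessage link speaking JSON-RPC 2.0."""

    def __init__(self, post) -> None:
        self._post = post                  # sends a JSON string to the other side
        self._ids = itertools.count(1)
        self._pending: dict[int, asyncio.Future] = {}

    async def request(self, method: str, params: dict):
        msg_id = next(self._ids)
        future = asyncio.get_running_loop().create_future()
        self._pending[msg_id] = future
        self._post(json.dumps({"jsonrpc": "2.0", "id": msg_id,
                               "method": method, "params": params}))
        return await future                # resumes when the matching reply arrives

    def on_message(self, raw: str) -> None:
        msg = json.loads(raw)
        future = self._pending.pop(msg.get("id"), None)
        if future is None:
            return                         # a notification or an unknown id
        if "error" in msg:
            future.set_exception(RuntimeError(msg["error"]["message"]))
        else:
            future.set_result(msg["result"])


async def demo():
    loop = asyncio.get_running_loop()

    def fake_host(raw: str) -> None:
        # Pretend the host called list_themes and replies a moment later.
        req = json.loads(raw)
        reply = {"jsonrpc": "2.0", "id": req["id"],
                 "result": {"content": [{"type": "text", "text": '["ocean", "sunset"]'}]}}
        loop.call_soon(app.on_message, json.dumps(reply))

    app = Channel(fake_host)
    result = await app.request("tools/call", {"name": "list_themes", "arguments": {}})
    return json.loads(result["content"][0]["text"])


print(asyncio.run(demo()))  # ['ocean', 'sunset']
```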

What’s an iframe, really?

The “little box in the chat” I keep mentioning is technically called an iframe. It’s a web feature that’s been around forever — it lets one webpage contain another webpage inside it, like a window into a different house.

In HTML it’s just one tag:

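<iframe sandbox="allow-scripts" srcdoc="...your entire HTML here..."></iframe>

The magic is the sandbox attribute. A sandboxed iframe without allow-same-origin gets its own opaque origin, so the outer page (the agent’s chat UI) can’t reach into your app, and your app can’t read the outer page, its cookies, or its storage. A bare srcdoc iframe with no sandbox would actually share the parent’s origin, which is why hosts sandbox it. The two sides can only talk through a specific messaging channel called postMessage. The SDK above uses that channel for you.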

This isolation is why AI agents can safely run code from strangers. Your server could be run by anyone — the agent doesn’t have to trust you. The box keeps everyone honest.
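For illustration, here is how a host could wrap your page in a sandboxed srcdoc iframe, written in Python to match the rest of this post (real hosts do this in their own front-end code, and some serve the frame from a separate origin instead). `sandboxed_iframe` is a hypothetical helper; the detail worth noticing is that the page has to be HTML-escaped, because it lives inside an attribute value.

```python
import html


def sandboxed_iframe(page: str) -> str:
    """Wrap an app page in a sandboxed srcdoc iframe (illustrative only)."""
    # srcdoc is an attribute value, so quotes, <, > and & must be escaped.
    # sandbox="allow-scripts" without allow-same-origin gives the frame an
    # opaque origin, cut off from the chat page's DOM and cookies.
    escaped = html.escape(page, quote=True)
    return f'<iframe sandbox="allow-scripts" srcdoc="{escaped}"></iframe>'


print(sandboxed_iframe('<p class="card">Dear Sarah & co</p>'))
# <iframe sandbox="allow-scripts" srcdoc="&lt;p class=&quot;card&quot;&gt;Dear Sarah &amp; co&lt;/p&gt;"></iframe>
```

The browser unescapes the attribute before rendering it, so your page arrives intact.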

Testing it with Claude

To let the agent talk to your server on your laptop, you need to make your laptop reachable from the internet. The easiest way is a tunnel:

# Terminal 1: start your server
uv run server.py

# Terminal 2: open a tunnel to it
cloudflared tunnel --url http://localhost:3002
# → gives you a URL like https://abc-xyz.trycloudflare.com

Then in Claude, go to Settings → Connectors → Add custom connector, paste the URL (with /mcp on the end), and save. You’ll need a paid Claude plan for this — custom connectors aren’t on the free tier.

One heads-up: the Python FastMCP library checks the Host header for security and rejects anything that isn’t localhost. Cloudflare’s tunnel changes the header to its own domain, which fails this check. You’ll see a “couldn’t reach server” error. The fix is a short middleware that rewrites the header back to localhost before it reaches the MCP code. Annoying but quick.
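Here is a minimal sketch of that middleware as plain ASGI, assuming the server listens on localhost:3002 as in the tunnel command above. `ForceLocalHost` is a made-up name; it rewrites the Host header on HTTP requests and passes everything else (such as lifespan events) through untouched.

```python
import asyncio


class ForceLocalHost:
    """ASGI middleware: replace the Host header before the MCP app sees it."""

    def __init__(self, app, host: bytes = b"localhost:3002") -> None:
        self.app = app
        self.host = host

    async def __call__(self, scope, receive, send):
        if scope["type"] == "http":
            # ASGI headers are a list of (lowercase name, value) byte pairs.
            headers = [(k, v) for k, v in scope["headers"] if k != b"host"]
            headers.append((b"host", self.host))
            scope = {**scope, "headers": headers}  # copy; don't mutate the original
        await self.app(scope, receive, send)


async def echo_host(scope, receive, send):
    # Stand-in for the MCP app: just report the Host header it received.
    print(dict(scope["headers"])[b"host"])


tunnel_scope = {"type": "http", "headers": [(b"host", b"abc-xyz.trycloudflare.com")]}
asyncio.run(ForceLocalHost(echo_host)(tunnel_scope, None, None))
# b'localhost:3002'
```

To plug it in, you would wrap the ASGI app FastMCP builds for streamable HTTP (mcp.streamable_http_app()) and serve the wrapped app with uvicorn on port 3002 instead of calling mcp.run(); treat that wiring as a sketch. It also switches off a DNS-rebinding safeguard for everything reaching that port, so keep it to local testing.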

Where this actually matters

For a fun side project like a greeting card, MCP Apps are cute. Where they get serious is when text answers genuinely aren’t enough:

If someone asks…	Text can only say…	An MCP App can show…
“Show me sales by region”	A list of numbers	A clickable map you can drill into
“Review this PDF”	A description of the PDF	The actual PDF with zoom and pan
“Help me configure my deploy”	20 back-and-forth questions	A single form with all the options
“Show me the system status”	A snapshot in words	A live dashboard that keeps updating
“Compare these two files”	A wall of + and - lines	A side-by-side diff viewer
“Pick a color”	“How about #3498db?”	An actual color picker
“Generate a QR code”	A description of a QR code	The actual scannable image

The rule of thumb: if the answer is something the user reads, text is fine. If the answer is something the user interacts with, you want an MCP App.

The hidden superpower: letting the agent do your work for you

Here’s the part most people miss on the first pass.

Your app can ask the agent to use other tools the user has connected. Say a user has hooked up Gmail, Slack, and Stripe to their agent. Your simple expense-approval app can put a button on screen that triggers:

User clicks [Approve Expense]
        ↓
Your app tells the agent: "approve this and notify the team"
        ↓
The agent does it all:
  • Charges the card     (via the user's Stripe connection)
  • Emails the requester (via the user's Gmail)
  • Posts to #expenses   (via the user's Slack)
        ↓
You didn't write a single line of integration code.

Your little app just borrowed Gmail, Slack, and Stripe from the user. You didn’t build them. You didn’t store any tokens. The agent orchestrated it all.

A traditional web app would need OAuth flows for each service, token storage, API libraries for each vendor, and a backend to coordinate them. With MCP Apps, you just ask.
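A toy Python model of that difference: your app hands over one plain-language request, and only the agent holds the connections. Everything here is hypothetical (the connector entries, the token strings, and the hard-coded plan); in a real agent the model decides which tools to call, and the user may be asked to confirm.

```python
# Hypothetical connections the user set up in their agent. The tokens live
# here, on the agent's side, never in your app.
CONNECTIONS = {
    "stripe": {"token": "stripe_user_secret", "do": lambda token, arg: f"charged {arg}"},
    "gmail":  {"token": "gmail_user_secret",  "do": lambda token, arg: f"emailed {arg}"},
    "slack":  {"token": "slack_user_secret",  "do": lambda token, arg: f"posted to {arg}"},
}


def agent_handle(request: str) -> list[str]:
    """Stand-in for the model turning one request into several tool calls.

    A real agent would let the LLM choose the tools; this plan is hard-coded.
    """
    if "approve" not in request:
        return []
    plan = [("stripe", "expense #42"), ("gmail", "requester"), ("slack", "#expenses")]
    log = []
    for service, arg in plan:
        conn = CONNECTIONS[service]
        # A real connector would use the token to authenticate the call.
        log.append(conn["do"](conn["token"], arg))
    return log


# All your app sends is a sentence: no tokens, no vendor SDKs.
print(agent_handle("approve this and notify the team"))
# ['charged expense #42', 'emailed requester', 'posted to #expenses']
```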

When not to use MCP Apps

Don’t build an MCP App just because you can. Some questions really are just text questions. “What’s 5 plus 5?” doesn’t need a calculator widget. “What’s the capital of France?” doesn’t need a map.

The complexity is worth it when the answer is something users need to do, not just read. When they need to compare, click, filter, fill in, or watch it update. If none of that applies, plain text wins.

Where it runs today

MCP Apps currently work in Claude (web), Claude Desktop, VS Code’s GitHub Copilot, Goose, Postman, and MCPJam. The official SDK (@modelcontextprotocol/ext-apps) has starter templates for React, Vue, Svelte, Preact, Solid, and plain JavaScript. The Python approach shown here isn’t officially supported yet, but it works — I’ve tested it end to end.

The examples repo on GitHub has working demos for PDFs, 3D globes, budget sliders, QR codes, system monitors, and more. Each one is a good starting point if you want to see what the pattern looks like in practice.

The short version

A regular MCP tool sends the agent some text. The agent reads it out loud to you.

An MCP App sends the agent some text and a small webpage. The agent reads the text out loud, and shows the webpage in a little box inside the chat. The webpage can have buttons. When you click them, the webpage can ask the agent to call tools on your server, or even on other servers you’ve connected. Nothing leaves the chat window.

That’s it. Everything else is just details.

References

MCP Apps overview: https://modelcontextprotocol.io/extensions/apps/overview
Build an MCP App (official guide): https://modelcontextprotocol.io/extensions/apps/build

Posted 23rd April 2026 at 3:48 pm

Akshay Parkhi's Weblog