Akshay Parkhi's Weblog


Beyond Tool Calling: A Practical Tour of Advanced MCP Concepts

9th April 2026

If you’ve used MCP for a few weeks, you already know the basics: a server exposes tools, resources, and prompts, and a client (usually an LLM-driven agent) calls them. That mental model gets you surprisingly far. But it also flattens MCP into “just tool calling,” and you start to wonder what makes the protocol interesting compared to a plain JSON-RPC schema.

The interesting stuff lives in the reverse channel — the things a server can ask the client to do while a tool is running. Once you internalize that MCP is bidirectional, a lot of patterns that felt awkward suddenly become natural: confirmations, summarization, progress bars, sandboxed file access, multi-step wizards.

This post is a tour of the advanced concepts: sampling, elicitation, notifications, roots, and transports.

The Mental Model: MCP Is Bidirectional

The single most important shift in thinking:

An MCP session is not a one-way RPC channel. It’s a long-lived bidirectional connection where the server can pause mid-execution and ask the client for things.

Most introductory material draws MCP like this:

Agent (client) ──tool call──▶ Server
Agent (client) ◀──result──── Server

The actual picture is:

Agent (client) ──tool call────────▶ Server
                                     │
                                     ├──▶ "log this"               (notification)
                                     ├──▶ "20% done"               (progress)
                                     ├──▶ "what dirs can I touch?" (roots)
                                     ├──▶ "ask the user X"         (elicitation)
                                     ├──▶ "ask your LLM Y"         (sampling)
                                     ▼
Agent (client) ◀───── result ────── Server

Each arrow from server back to client is a reverse request the client must be set up to handle. If the client doesn’t register a callback for sampling, a server that needs sampling will fail. If it doesn’t expose roots, a server that needs filesystem boundaries can’t enforce them. The capabilities the client advertises during initialization are a contract.

This is what makes MCP more than “just tool calling”: tools are stateless in plain RPC, but in MCP a tool can drive an entire interactive workflow without ever returning.

Sampling — Let the Server Borrow the Client’s LLM

The problem: A tool needs LLM intelligence to do its job — summarize a document, translate natural language into SQL, classify an input. The naive solution is to give the server its own Anthropic or OpenAI API key and call the model directly.

That’s wrong, for three reasons:

  1. Credentials sprawl. Every server now needs its own keys, billing, and rotation.
  2. Model coupling. The server bakes in a model choice; the user can’t pick.
  3. Trust boundary. The client (the user’s machine) is the one that owns the LLM relationship. The server is a third party.

The fix: Sampling inverts the call. The server says “I need an LLM completion. Here are the messages. Please run them through your model and send me the result.” The client executes the LLM call and sends the answer back. The server never touches a model API.

The server side:

from mcp.server.fastmcp import FastMCP, Context
from mcp.types import SamplingMessage, TextContent

mcp = FastMCP(name="Demo Server")

@mcp.tool()
async def summarize(text_to_summarize: str, ctx: Context):
    prompt = f"""
        Please summarize the following text:
        {text_to_summarize}
    """

    result = await ctx.session.create_message(
        messages=[
            SamplingMessage(
                role="user", content=TextContent(type="text", text=prompt)
            )
        ],
        max_tokens=4000,
        system_prompt="You are a helpful research assistant.",
    )

    if result.content.type == "text":
        return result.content.text
    return "Sampling returned non-text content."

The key line is await ctx.session.create_message(...). That’s the server calling the client, not the other way around. From the server’s perspective it looks like a normal await — but under the hood the client is doing the heavy lifting.

The client side:

from mcp.types import CreateMessageResult, SamplingMessage, TextContent

async def chat(input_messages: list[SamplingMessage], max_tokens=4000):
    messages = [...]  # convert to anthropic format
    response = await anthropic_client.messages.create(
        model=model,
        messages=messages,
        max_tokens=max_tokens,
    )
    return "".join(p.text for p in response.content if p.type == "text")

async def sampling_callback(context, params):
    text = await chat(params.messages)
    return CreateMessageResult(
        role="assistant",
        model=model,
        content=TextContent(type="text", text=text),
    )

async def run():
    async with stdio_client(server_params) as (read, write):
        async with ClientSession(
            read, write, sampling_callback=sampling_callback
        ) as session:
            await session.initialize()
            result = await session.call_tool(
                name="summarize",
                arguments={"text_to_summarize": "lots of text"},
            )

When the client calls summarize, the server’s tool body invokes create_message. That triggers the sampling_callback on the client. The callback runs the actual Anthropic API call and returns the result. Only then does the original call_tool return.

When to reach for sampling:

  • Summarization of bulky tool results (don’t dump 10k rows into the agent’s context)
  • Natural-language to structured-input translation (NL filters → SQL where clauses)
  • Schema inference and design suggestions
  • Error explanation — turn cryptic stack traces into actionable text
  • Anomaly narratives — turn raw metrics into “your table has X small files, recommend compaction”
  • Anywhere your server wants to think without owning a model

Gotchas:

  • The client controls which model is used. Your server can hint (model_preferences) but not force.
  • Sampling adds latency — every sample call is a full LLM round-trip.
  • Recursion is real. A sampling call from inside a tool that the LLM called means: LLM → tool → LLM → back to tool → back to LLM. Token costs add up.
  • Not every client supports sampling. Always check capabilities before relying on it.

Elicitation — Let the Server Ask the User

The problem: Tools are usually one-shot: input → output. But real workflows hit moments where the server realizes it needs more information from the user, not the LLM. Examples:

  • A booking tool discovers it needs a passport number — and you don’t want the LLM to guess one.
  • A destructive operation needs explicit confirmation, and “the LLM said yes” is not consent.
  • An identifier is ambiguous and the server wants the user to pick from a list.
  • A multi-step wizard wants to walk the user through decisions.

The naive answers are awful: fail with an error, hallucinate a value, or stuff every possible field into the tool’s input schema and pray.

The fix: Elicitation is sampling’s twin. Same direction (server → client), different responder. Where sampling says “ask your LLM,” elicitation says “ask your user.” The server sends a JSON Schema describing the form it wants; the client renders it; the user fills it in; the typed values come back to the server.

@mcp.tool()
async def drop_table(table_name: str, ctx: Context):
    # Pause and ask the human directly — bypassing the LLM entirely
    result = await ctx.session.elicit(
        message=f"You are about to permanently drop '{table_name}'. Confirm?",
        requestedSchema={
            "type": "object",
            "properties": {
                "confirm_table_name": {
                    "type": "string",
                    "description": "Re-type the table name to confirm",
                },
                "delete_data_files": {
                    "type": "boolean",
                    "default": False,
                    "description": "Also delete underlying data files from S3?",
                },
                "i_understand": {
                    "type": "boolean",
                    "description": "I understand this is irreversible",
                },
            },
            "required": ["confirm_table_name", "i_understand"],
        },
    )

    if result.action != "accept":
        return "Cancelled by user."

    values = result.content
    if values["confirm_table_name"] != table_name:
        return "Table name mismatch — aborting."
    if not values["i_understand"]:
        return "Confirmation not granted."

    # ... actually drop the table

The crucial property: the LLM cannot fill out this form. Only the human can. The server gets a guarantee that a real user looked at the consequences and typed the table name themselves.

When to use elicitation:

Scenario                           Why elicitation fits
Destructive confirmations          The LLM cannot fake intent
Disambiguating identifiers         The server presents the actual options
Collecting credentials / secrets   The value never goes through the LLM context
Cost gates                         "This will scan 800 GB. Proceed?"
Multi-step wizards                 The server drives the flow, asking per step
Optional advanced params           Don't bloat the tool schema; ask only when relevant

Elicitation + sampling, together. The two primitives compose beautifully. A canonical example for an Iceberg or data tool:

optimize_table(name)
  ├─ read metadata
  ├─ sampling: "given these stats, recommend a compaction strategy"
  ├─ elicitation: show strategy + cost → "run this? [yes/modify/cancel]"
  ├─ if yes: run compaction
  ├─ sampling: "summarize what changed in human terms"
  └─ return summary

One tool, two sampling calls (server borrowing the LLM), one elicitation (server asking the user). The agent driving the session sees a single clean tool call and a tidy result. All the messy interactivity happens inside the tool.

This is the unlock: agentic, multi-turn behavior inside a single tool call, without the LLM having to choreograph it.

Notifications — Logging and Progress

Tools that take real time (downloads, conversions, queries) need to communicate progress. Without it the user sees a hung terminal. MCP gives servers two notification types: logging messages and progress reports.

The server side:

import asyncio

@mcp.tool()
async def add(a: int, b: int, ctx: Context) -> int:
    await ctx.info("Preparing to add...")
    await ctx.report_progress(20, 100)

    await asyncio.sleep(2)

    await ctx.info("OK, adding...")
    await ctx.report_progress(80, 100)

    return a + b

Two flavors:

  • ctx.info(...) (and ctx.debug, ctx.warning, ctx.error) → log notifications, surfaced to a logging callback
  • ctx.report_progress(current, total) → progress notifications, surfaced to a progress callback

The client side:

from mcp.types import LoggingMessageNotificationParams

async def logging_callback(params: LoggingMessageNotificationParams):
    print(params.data)

async def print_progress_callback(progress, total, message):
    if total is not None:
        percentage = (progress / total) * 100
        print(f"Progress: {progress}/{total} ({percentage:.1f}%)")
    else:
        print(f"Progress: {progress}")

async def run():
    async with stdio_client(server_params) as (read, write):
        async with ClientSession(
            read, write, logging_callback=logging_callback
        ) as session:
            await session.initialize()
            await session.call_tool(
                name="add",
                arguments={"a": 1, "b": 3},
                progress_callback=print_progress_callback,
            )

Two callbacks, registered in different places:

  • logging_callback → on the session, because logs can come from any server-side activity
  • progress_callback → on the specific call, because progress is scoped to the in-flight tool invocation

Notifications turn long-running tools from black boxes into observable processes. Even better, they let an agent surface meaningful intermediate state to the user — “downloading file 3 of 12” — without having to invent a polling protocol. For LLM agents specifically, notifications are how a server can leak hints to the client UI (not the model context) about what’s happening. The model sees the final result; the user sees a live stream.

Roots — Sandboxing the Server’s Filesystem

The problem: Filesystem-touching tools are dangerous. A convert_video tool that takes an arbitrary path will happily read ~/.ssh/id_rsa if the LLM says so. You want the server to be physically incapable of touching anything outside an explicit allow-list.

The fix: Roots are directories the client declares as accessible. The server can ask “what roots do I have?” via ctx.session.list_roots() and gate every filesystem operation accordingly.

Server side:

from pathlib import Path
from urllib.parse import urlparse

def file_url_to_path(uri) -> Path:
    # "file:///home/user/videos" -> Path("/home/user/videos")
    return Path(urlparse(str(uri)).path)

async def is_path_allowed(requested_path: Path, ctx: Context) -> bool:
    roots_result = await ctx.session.list_roots()
    client_roots = roots_result.roots

    if not requested_path.exists():
        return False
    if requested_path.is_file():
        requested_path = requested_path.parent

    for root in client_roots:
        root_path = file_url_to_path(root.uri)
        try:
            requested_path.relative_to(root_path)
            return True
        except ValueError:
            continue
    return False

@mcp.tool()
async def convert_video(input_path: str, format: str, *, ctx: Context):
    """Convert an MP4 video file to another format using ffmpeg"""
    input_file = VideoConverter.validate_input(input_path)
    if not await is_path_allowed(input_file, ctx):
        raise ValueError(f"Access to path is not allowed: {input_path}")
    return await VideoConverter.convert(input_path, format)

Every filesystem-touching tool calls is_path_allowed. The LLM has no way around it: even if it passes /etc/passwd, the server refuses.

Client side:

# Root, FileUrl, and ListRootsResult come from mcp.types
def _create_roots(self, root_paths: list[str]) -> list[Root]:
    roots = []
    for path in root_paths:
        p = Path(path).resolve()
        file_url = FileUrl(f"file://{p}")
        roots.append(Root(uri=file_url, name=p.name or "Root"))
    return roots

async def _handle_list_roots(self, context):
    return ListRootsResult(roots=self._roots)

async def connect(self):
    # ...
    self._session = await self._exit_stack.enter_async_context(
        ClientSession(
            _stdio,
            _write,
            list_roots_callback=self._handle_list_roots if self._roots else None,
        )
    )

The client constructs its own list of roots from the user’s config and registers a list_roots_callback. When the server asks, the client answers with whatever the user authorized — not whatever the server requested.

Clean separation of concerns: the server enforces, the client authorizes, the user decides. The LLM doesn’t enter the trust loop at all.

Roots vs. just validating paths server-side: Why not hardcode allowed paths in the server? Two reasons. First, the user shouldn’t need to edit server code to add a directory — roots make it config. Second, different sessions should have different access — roots are per-session; hardcoding isn’t.

Transports — stdio vs HTTP

Transport          Use when
stdio              Local servers, the agent spawns the server process, simplest possible setup. What uv run server.py does.
streamable HTTP    Remote servers, browser clients, multiple concurrent users, network boundaries.

stdio is the default for local development. HTTP is the production deployment story.

The HTTP server:

import asyncio

import uvicorn
from mcp.server.fastmcp import FastMCP, Context
from starlette.middleware.cors import CORSMiddleware

mcp = FastMCP(
    "mcp-server",
    stateless_http=True,
    json_response=True,
)

@mcp.tool()
async def add(a: int, b: int, ctx: Context) -> int:
    await ctx.info("Preparing to add...")
    await asyncio.sleep(2)
    await ctx.report_progress(80, 100)
    return a + b

app = mcp.streamable_http_app()
app.add_middleware(
    CORSMiddleware,
    allow_origins=["*"],
    allow_methods=["*"],
    allow_headers=["*"],
    expose_headers=["mcp-session-id"],
)

uvicorn.run(app, host="127.0.0.1", port=8000)

A few things worth flagging:

  1. stateless_http=True — each request is independent; the server doesn’t keep session state in memory. Good for horizontally scaled deployments.
  2. json_response=True — responses come back as plain JSON instead of an SSE stream. Easier for ad-hoc browser clients; loses streaming.
  3. CORS middleware is mandatory for browser clients. Without it, the browser preflight OPTIONS /mcp/ returns 405 Method Not Allowed and you spend an hour confused. We learned this the hard way.
  4. expose_headers=["mcp-session-id"] — the session id rides in a custom header; the browser can’t read it without an explicit expose.

Once the server speaks HTTP, it stops being a local-only toy. You can host it behind an API gateway, put it on Lambda or Cloud Run, have a web UI talk to it directly, or multiplex many clients onto one server. The flip side: HTTP brings auth, CORS, rate limiting, observability — all the production concerns the stdio model lets you defer. Choose deliberately.

Putting It All Together

A complete example: a Claude-powered CLI chat agent that talks to a document MCP server. It exercises the three core primitives in a single session:

  • Tools: read_doc, edit_doc (model-controlled, called by Claude)
  • Resources: docs://documents, docs://documents/{id} (app-controlled, used for @mention autocomplete and context injection)
  • Prompts: format (user-controlled, triggered with a / slash command)

The decision tree. The cleanest mental model is the “primitive choice” decision tree:

Need                                  Use
Give the model a new capability       Tool
Populate UI or inject context         Resource
Predefined user-triggered workflow    Prompt
Server asks the user something        Elicitation
Server thinks with the user's LLM     Sampling
Report progress on a long task        Notifications
Gate filesystem access                Roots

If you’re unsure which primitive to use, run through this list. Almost every real decision falls cleanly into one slot.

How @ mentions work: @ mentions are resources injected as context. The client extracts mentions, fetches the matching documents via MCP resources, and wraps them in <document> blocks before sending to Claude. Claude never sees the @ syntax doing anything magical — it just sees document content as context.

async def _extract_resources(self, query: str) -> str:
    mentions = [word[1:] for word in query.split() if word.startswith("@")]
    doc_ids = await self.list_docs_ids()  # MCP resource
    mentioned_docs = []
    for doc_id in doc_ids:
        if doc_id in mentions:
            content = await self.get_doc_content(doc_id)
            mentioned_docs.append((doc_id, content))
    return "".join(
        f'\n<document id="{doc_id}">\n{content}\n</document>\n'
        for doc_id, content in mentioned_docs
    )

How / commands work: / commands map to prompts. They run a server-defined message workflow that becomes the next turn in the conversation.

async def _process_command(self, query: str) -> bool:
    if not query.startswith("/"):
        return False
    words = query.split()
    if len(words) < 2:
        return False  # every prompt here expects a doc_id argument
    command = words[0].lstrip("/")
    messages = await self.doc_client.get_prompt(command, {"doc_id": words[1]})
    self.messages += convert_prompt_messages_to_message_params(messages)
    return True

This is the textbook example of why MCP has three primitives instead of one: the same project naturally needs all three, and squashing them into “just tools” would force the LLM to do work the application should do.

A Practical Design Checklist

When you sit down to design an MCP server for a real domain (Iceberg + AWS, GitHub, your internal data platform), walk through this:

  • Granularity. Are your tools shaped like user intents or like API endpoints? Aim for intents. Five intent-shaped tools beat fifty API-shaped ones.
  • Idempotency. Classify each tool: read-only, reversible, destructive. Destructive tools always elicit confirmation.
  • Auth boundary. Where do credentials live? Never in the LLM context. Use elicitation if they need to be collected from the user.
  • Output size. Are any results big enough to blow the agent’s context window? Use sampling to summarize, return resources for the full payload.
  • Error surface. Are errors actionable to the LLM? If not, rewrite them — and consider sampling to translate cryptic infra errors into useful guidance.
  • Notifications. Does the tool take more than a second? Add report_progress. Does it have meaningful intermediate state? Add info logs.
  • Roots. Does the tool touch the filesystem? Gate every path through a list_roots check.
  • Transport. Local-only? stdio. Browser or remote? streamable HTTP, with CORS configured.
  • Description quality. Tool descriptions are prompts. Write them assuming the reader has never heard of your domain.
  • Dry-run. Mutating tools should accept a dry_run flag.
  • Observability. Log every call with inputs, outputs, latency, and (if you can) cost.

The Big Picture

The reason MCP is more than “RPC for LLMs” is that it explicitly models the bidirectional nature of agentic workflows:

  • Tools, resources, prompts = client → server. The agent uses the server.
  • Sampling, elicitation, notifications, roots = server → client. The server uses the agent and the user.

A server that only exposes tools is fine. A server that uses sampling to think, elicitation to ask, notifications to communicate, and roots to enforce safety is agentic in its own right — it can drive multi-step workflows from a single tool call and never lose the human in the loop.

The deeper you go, the more MCP starts to feel less like “an API spec for tools” and more like “a collaboration protocol between a server, an LLM, and a human.” That’s the headline. Once you see it, you stop writing 1:1 wrappers and start designing tools that carry intent — and your agents get dramatically better as a result.
