Akshay Parkhi's Weblog

Inside an AgentCore microVM — Ports, Cold Starts, and the Sidecar Pattern

12th March 2026

When you deploy an agent on Amazon Bedrock AgentCore Runtime, your Docker container runs inside a Firecracker microVM. But what actually happens inside that microVM? Here’s the complete picture — what boots, what listens on which port, why there’s a non-root user, and exactly what determines a cold start vs a warm start.

What’s Inside the microVM — Three HTTP Servers

When AgentCore boots your microVM, three separate processes start listening on three different ports:

┌────────────────────────────────────────────────────────────────────┐
│  INSIDE THE FIRECRACKER microVM                                    │
│                                                                    │
│  PORT 8080 — YOUR APP (Starlette/Uvicorn)                          │
│    ├── POST /invocations  ← your agent handles requests here       │
│    ├── GET  /ping         ← AgentCore health checks                │
│    └── WS   /ws           ← WebSocket support                      │
│                                                                    │
│  PORT 9000 — AGENTCORE SIDECAR (injected by AgentCore)             │
│    ├── Receives requests from AgentCore control plane              │
│    ├── Forwards to your app on :8080                               │
│    ├── Manages session lifecycle                                   │
│    ├── Handles auth tokens (AgentCore Identity)                    │
│    └── Reports health back to control plane                        │
│                                                                    │
│  PORT 8000 — OPENTELEMETRY COLLECTOR (auto-instrumentation)        │
│    ├── Collects spans from your agent's LLM calls                  │
│    ├── Collects tool execution metrics                             │
│    └── Ships to CloudWatch (AgentCore Observability)               │
└────────────────────────────────────────────────────────────────────┘

You write the code that runs on port 8080. The sidecar on 9000 and the OTel collector on 8000 are injected by AgentCore — you don’t write or manage them.
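The contract on port 8080 is small enough to sketch with the standard library alone. To be clear, this is a stand-in, not the real thing — in production you'd use Starlette/Uvicorn via the bedrock-agentcore SDK — but it makes the /ping and /invocations shapes concrete:

```python
# Stdlib-only stand-in for the app on :8080 — production code would use
# Starlette/Uvicorn via the bedrock-agentcore SDK; this just shows the
# /ping and /invocations contract.
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

class AgentHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path == "/ping":
            self._send(200, {"status": "Healthy"})  # sidecar health check
        else:
            self._send(404, {"error": "not found"})

    def do_POST(self):
        if self.path == "/invocations":
            length = int(self.headers.get("Content-Length", 0))
            payload = json.loads(self.rfile.read(length) or b"{}")
            # Your agent logic runs here; echo stands in for LLM + tools
            self._send(200, {"result": f"echo: {payload.get('prompt', '')}"})
        else:
            self._send(404, {"error": "not found"})

    def _send(self, code, body):
        data = json.dumps(body).encode()
        self.send_response(code)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        self.wfile.write(data)

    def log_message(self, *args):
        pass  # keep request logging quiet

# Inside the container you'd run:
# HTTPServer(("0.0.0.0", 8080), AgentHandler).serve_forever()
```

The sidecar only needs those two routes to do its job: GET /ping to decide HEALTHY, POST /invocations to deliver work.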

The Dockerfile — What Gets Deployed

A typical AgentCore Dockerfile looks like this:

FROM python:3.13-slim-bookworm

# Install dependencies
RUN pip install strands-agents bedrock-agentcore boto3
RUN pip install aws-opentelemetry-distro

# Copy the agent code into the image
WORKDIR /app
COPY . .

# Create non-root user
RUN useradd -m -u 1000 bedrock_agentcore
USER bedrock_agentcore

EXPOSE 9000 8000 8080

CMD ["opentelemetry-instrument", "python", "-m", "your_agent_module"]

The CMD line is important — opentelemetry-instrument wraps your Python process and auto-instruments all HTTP requests, boto3 calls, and function calls marked with spans. This is how metrics appear in CloudWatch under the bedrock-agentcore namespace without you writing any instrumentation code.

Why the bedrock_agentcore User? Defense in Depth

The Dockerfile creates a non-root user (uid=1000) and switches to it. This is one layer in AgentCore’s security stack:

┌──────────────────────────────────────────────────────────┐
│  SECURITY: Defense in Depth                               │
│                                                           │
│  Layer 1: Firecracker microVM (hardware isolation via KVM)│
│  Layer 2: Jailer (chroot + cgroups + seccomp filters)     │
│  Layer 3: Non-root user (bedrock_agentcore, uid=1000)     │
│                                                           │
│  As root:                                                 │
│    - Can read /etc/shadow                                 │
│    - Can modify system binaries                           │
│    - Can bind to privileged ports (<1024)                 │
│    - Can access /proc, /sys for host info                 │
│                                                           │
│  As bedrock_agentcore (uid=1000):                         │
│    - Can only read/write /app and /home/bedrock_agentcore │
│    - Cannot modify system files                           │
│    - Cannot bind to port 80/443                           │
│    - Limited /proc access                                 │
│                                                           │
│  That's why ports are 8000, 8080, 9000 — all > 1024      │
│  Non-root users CAN'T bind to ports below 1024           │
└──────────────────────────────────────────────────────────┘

Even if an LLM hallucinates a malicious tool call that manages to execute arbitrary code, that code runs as a non-root user inside a microVM behind seccomp filters. An attacker would have to break through all three layers, one after another.
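You can see the privileged-port restriction from the security table directly in Python. A small demo — the expected results in the comments assume a non-root user like bedrock_agentcore (uid=1000); as root, both calls would succeed:

```python
# Demonstrates the privileged-port rule: binding below 1024 needs root.
# The expected results in the comment below assume a non-root user
# (e.g. bedrock_agentcore, uid=1000).
import socket

def can_bind(port):
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        s.bind(("127.0.0.1", port))
        return True
    except PermissionError:
        return False
    finally:
        s.close()

# As uid 1000: can_bind(80) is False, can_bind(8080) is True —
# which is why all three AgentCore ports sit above 1024.
```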

Request Flow — From Your API Call to Your Agent

You call: invoke_agent_runtime(session_id, payload)
  │
  ▼
AgentCore Control Plane → routes to correct microVM
  │
  ▼
Port 9000 (sidecar inside microVM)
  │  Adds headers: X-Session-Id, X-Request-Id, X-Access-Token
  │
  ▼
Port 8080 (your Starlette app)
  │  POST /invocations with JSON payload
  │
  ▼
@app.entrypoint → your_handler(payload)
  │  agent(prompt) → LLM + tools → response
  │
  ▼
Response streams back: 8080 → 9000 → AgentCore → you

Meanwhile, port 8000 (OTel collector) captures:
  - LLM latency, token counts
  - Tool execution durations
  - gen_ai.client.token.usage metrics
  → Ships to CloudWatch / X-Ray

The sidecar on port 9000 exists so your app doesn’t need to handle session management, auth token injection, or health reporting. It’s the bridge between AgentCore’s control plane and your code.
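The header-injection step in the flow above can be sketched in a few lines. This is purely illustrative — the real sidecar is AWS-internal code; only the header names come from the flow diagram, and generating X-Request-Id as a UUID is my assumption:

```python
# Illustrative sketch of the sidecar's header injection — the real
# sidecar is AWS-internal. Header names are from the request flow;
# the UUID for X-Request-Id is an assumption.
import uuid

def sidecar_headers(session_id, access_token=None):
    headers = {
        "X-Session-Id": session_id,
        "X-Request-Id": str(uuid.uuid4()),
    }
    if access_token is not None:
        # Supplied via AgentCore Identity when the agent needs auth
        headers["X-Access-Token"] = access_token
    return headers
```

Because the sidecar stamps these on every forwarded request, your handler can read the session ID from a header instead of threading it through your own code.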

Cold Start vs Warm Start — The Complete Picture

The rule is simple: does a microVM for this session ID already exist and is it alive?

Scenario                                             Result  Why
First request with session-A                         COLD    No microVM exists, must boot one
Second request with same session-A (within timeout)  WARM    microVM still running, reuse it
Request with new session-B                           COLD    Different session = always new microVM
Request with session-A after timeout expired         COLD    microVM was terminated, boots fresh
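The rule reduces to a few lines of code. This is a toy model, not AgentCore's control plane — in particular, the 15-minute idle timeout is an assumed value for illustration:

```python
# Toy model of the cold/warm routing decision — NOT AgentCore's actual
# control plane. The 15-minute idle timeout is an assumed value.
IDLE_TIMEOUT = 900.0  # seconds; illustrative

vms = {}  # session_id -> timestamp of last invocation

def invoke(session_id, now):
    last = vms.get(session_id)
    start = "WARM" if last is not None and now - last < IDLE_TIMEOUT else "COLD"
    vms[session_id] = now  # every invocation resets the idle timer
    return start
```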

Cold Start — What Actually Happens

invoke_agent_runtime(session_id="new-session")
  │
  ▼
AgentCore: "new-session" not found
  │
  ├── 1. Jailer creates jail + cgroups
  ├── 2. Firecracker process starts
  ├── 3. Linux kernel boots inside microVM
  ├── 4. Container image loaded
  ├── 5. CMD runs:
  │      opentelemetry-instrument python -m your_agent
  │      ├── OTel collector starts on :8000
  │      ├── Sidecar starts on :9000
  │      ├── Python imports strands, boto3
  │      ├── Agent() initializes model connection
  │      └── Uvicorn starts on :8080
  ├── 6. Sidecar pings :8080/ping → HEALTHY
  ├── 7. Sidecar forwards request to :8080/invocations
  └── 8. Agent processes prompt → response streams back

TOTAL: ~3.3s (steps 1-6 are the cold start penalty ~0.8s)
       (steps 7-8 are agent processing ~2.5s)

Warm Start — What Gets Skipped

invoke_agent_runtime(session_id="existing-session")
  │
  ▼
AgentCore: "existing-session" found → route to existing microVM
  │
  ├── Sidecar on :9000 receives request
  ├── Forwards to :8080/invocations
  │   Python already running. Agent already initialized.
  │   No boot. No imports. No init.
  ├── Agent processes prompt (LLM + tools)
  └── Response streams back

TOTAL: ~2.5s (saved ~0.8s of boot + init)
Idle timer RESETS → microVM stays alive

The warm start saves the entire boot sequence — Firecracker, kernel, Python imports, agent initialization. Everything is already in memory from the previous request.
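You can observe the difference yourself by timing two consecutive invocations on the same session. A sketch — `client` is the boto3 bedrock-agentcore client used elsewhere in this post, and any object exposing the same method works:

```python
# Sketch: time two consecutive calls on one session to surface the cold
# start penalty. `client` is the boto3 "bedrock-agentcore" client used
# elsewhere in this post; any object with the same method works.
import json
import time

def timed_invoke(client, agent_arn, session_id, prompt):
    start = time.perf_counter()
    client.invoke_agent_runtime(
        agentRuntimeArn=agent_arn,
        runtimeSessionId=session_id,
        payload=json.dumps({"prompt": prompt}),
    )
    return time.perf_counter() - start

# First call on a fresh session pays the boot cost; the second call on
# the same session should come back noticeably faster.
```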

Session ID Is Everything

The session ID is the key that maps to a microVM. Here’s how it plays out in practice:

import json

import boto3

agentcore_client = boto3.client("bedrock-agentcore")
agent_arn = "arn:aws:bedrock-agentcore:..."  # your agent's runtime ARN

# Request 1: session-A → COLD START (new microVM boots)
agentcore_client.invoke_agent_runtime(
    agentRuntimeArn=agent_arn,
    runtimeSessionId="session-A",
    payload=json.dumps({"prompt": "My name is Anuja"})
)

# Request 2: same session-A → WARM START (same microVM, instant)
# The agent REMEMBERS "Anuja" — state lives in memory
agentcore_client.invoke_agent_runtime(
    agentRuntimeArn=agent_arn,
    runtimeSessionId="session-A",
    payload=json.dumps({"prompt": "What's my name?"})
)
# Response: "Anuja!" — no database lookup, no serialization

# Request 3: session-B → COLD START (completely new microVM)
# This microVM has NO knowledge of session-A
agentcore_client.invoke_agent_runtime(
    agentRuntimeArn=agent_arn,
    runtimeSessionId="session-B",
    payload=json.dumps({"prompt": "What's my name?"})
)
# Response: "I don't know your name" — different microVM, different memory

Each session ID gets its own isolated microVM with its own kernel, memory, filesystem, and Python process. There is no shared state between sessions.

Pre-Warming — Paying Cold Start Cost Early

Since AgentCore has no provisioned concurrency, you can pre-warm by invoking sessions before users arrive:

WITHOUT pre-warming:
  User A arrives → session-A → COLD (microVM boots ~0.8s penalty)
  User A again   → session-A → WARM (same microVM)

WITH pre-warming:
  7:00 AM: invoke(session-001, "ping") → COLD (boots microVM)
           invoke(session-002, "ping") → COLD (boots microVM)
           invoke(session-003, "ping") → COLD (boots microVM)

           Now 3 microVMs are alive and idle.

  9:00 AM: User A arrives
           Assign User A → session-001
           invoke(session-001, prompt)  → WARM (microVM already running)

Pre-warming = paying the cold start cost BEFORE users arrive
so that when users arrive, they get warm starts.

Cost: you pay for idle microVM time (8 GB RAM each)
Benefit: zero cold start penalty for your users
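The 7:00 AM schedule above is easy to script. A sketch, assuming the same invoke_agent_runtime call used throughout this post; the session ID naming scheme is mine:

```python
# Sketch of the pre-warming loop. Assumes the invoke_agent_runtime call
# shown earlier; the session ID naming scheme is illustrative.
import json

def prewarm(client, agent_arn, count=3):
    sessions = []
    for i in range(1, count + 1):
        session_id = f"prewarm-session-{i:03d}"
        # The first invocation on each fresh session pays its cold start now
        client.invoke_agent_runtime(
            agentRuntimeArn=agent_arn,
            runtimeSessionId=session_id,
            payload=json.dumps({"prompt": "ping"}),
        )
        sessions.append(session_id)
    return sessions  # later, assign each arriving user one of these
```

Run it before your traffic window opens, keep the returned session IDs, and hand one to each arriving user so their first real request hits a warm microVM.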

The OpenTelemetry Auto-Instrumentation

The CMD wraps your Python process with opentelemetry-instrument:

CMD ["opentelemetry-instrument", "python", "-m", "your_agent"]
     ^^^^^^^^^^^^^^^^^^^^^^^^^^
     This wrapper auto-instruments:
       - boto3 HTTP requests → Bedrock API latency
       - All function calls marked with spans
       - gen_ai.client.token.usage metrics
       - strands.event_loop.cycle_duration metrics

Your agent code
  │
  │ (auto-instrumented by OTel)
  ▼
localhost:8000 (OTel collector inside microVM)
  │
  │ (exports metrics/traces)
  ▼
CloudWatch / X-Ray

You don’t write any instrumentation code. The metrics and traces appear in CloudWatch automatically because the OTel wrapper intercepts all outgoing HTTP calls and records timing, status codes, and token counts.

Why Three Ports Instead of One?

Separation of concerns:

Port   Owner              Purpose                            You Control It?
8080   Your app           Agent logic, request handling      Yes
9000   AgentCore sidecar  Session management, auth, routing  No
8000   OTel collector     Metrics, traces, observability     No

The sidecar pattern means your agent code stays clean — you write a request handler and return a response. Session lifecycle, authentication, health reporting, and observability are handled by the two processes you didn’t write. All three run inside the same Firecracker microVM, sharing the 2 vCPU and 8 GB RAM allocation.


Next: What Actually Happens When You Call invoke_agent_runtime()

Previous: AgentCore Runtime vs Lambda — Scaling, Warm Pools, and Why Fixed 8 GB Boxes Exist