Inside an AgentCore microVM — Ports, Cold Starts, and the Sidecar Pattern
12th March 2026
When you deploy an agent on Amazon Bedrock AgentCore Runtime, your Docker container runs inside a Firecracker microVM. But what actually happens inside that microVM? Here’s the complete picture — what boots, what listens on which port, why there’s a non-root user, and exactly what determines a cold start vs a warm start.
What’s Inside the microVM — Three HTTP Servers
When AgentCore boots your microVM, three separate processes start listening on three different ports:
┌────────────────────────────────────────────────────────────────────┐
│ INSIDE THE FIRECRACKER microVM │
│ │
│ PORT 8080 — YOUR APP (Starlette/Uvicorn) │
│ ├── POST /invocations ← your agent handles requests here │
│ ├── GET /ping ← AgentCore health checks │
│ └── WS /ws ← WebSocket support │
│ │
│ PORT 9000 — AGENTCORE SIDECAR (injected by AgentCore) │
│ ├── Receives requests from AgentCore control plane │
│ ├── Forwards to your app on :8080 │
│ ├── Manages session lifecycle │
│ ├── Handles auth tokens (AgentCore Identity) │
│ └── Reports health back to control plane │
│ │
│ PORT 8000 — OPENTELEMETRY COLLECTOR (auto-instrumentation) │
│ ├── Collects spans from your agent's LLM calls │
│ ├── Collects tool execution metrics │
│ └── Ships to CloudWatch (AgentCore Observability) │
└────────────────────────────────────────────────────────────────────┘
You write the code that runs on port 8080. The sidecar on 9000 and the OTel collector on 8000 are injected by AgentCore — you don’t write or manage them.
The Dockerfile — What Gets Deployed
A typical AgentCore Dockerfile looks like this:
FROM python:3.13-slim-bookworm
# Install dependencies
RUN pip install strands-agents bedrock-agentcore boto3
RUN pip install aws-opentelemetry-distro
# Create non-root user
RUN useradd -m -u 1000 bedrock_agentcore
USER bedrock_agentcore
EXPOSE 9000 8000 8080
CMD ["opentelemetry-instrument", "python", "-m", "your_agent_module"]
The CMD line is important: opentelemetry-instrument wraps your Python process and auto-instruments HTTP requests, boto3 calls, and any functions you explicitly mark with spans. This is how metrics appear in CloudWatch under the bedrock-agentcore namespace without you writing any instrumentation code.
Why a bedrock_agentcore User? Defense in Depth
The Dockerfile creates a non-root user (uid=1000) and switches to it. This is one layer in AgentCore’s security stack:
┌──────────────────────────────────────────────────────────┐
│ SECURITY: Defense in Depth │
│ │
│ Layer 1: Firecracker microVM (hardware isolation via KVM)│
│ Layer 2: Jailer (chroot + cgroups + seccomp filters) │
│ Layer 3: Non-root user (bedrock_agentcore, uid=1000) │
│ │
│ As root: │
│ - Can read /etc/shadow │
│ - Can modify system binaries │
│ - Can bind to privileged ports (<1024) │
│ - Can access /proc, /sys for host info │
│ │
│ As bedrock_agentcore (uid=1000): │
│ - Can only read/write /app and /home/bedrock_agentcore │
│ - Cannot modify system files │
│ - Cannot bind to port 80/443 │
│ - Limited /proc access │
│ │
│ That's why ports are 8000, 8080, 9000 — all > 1024 │
│ Non-root users CAN'T bind to ports below 1024 │
└──────────────────────────────────────────────────────────┘
Even if a hallucinated or malicious tool call achieves arbitrary code execution, that code runs as a non-root user, behind seccomp filters, inside a microVM. An attacker would need to breach all three layers to reach the host.
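You can observe the privileged-port restriction directly. A stdlib-only sketch (the low-port bind raises PermissionError unless the process runs as root or holds CAP_NET_BIND_SERVICE):

```python
import socket


def can_bind(port: int) -> bool:
    """Try to bind a TCP socket to `port` on localhost."""
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        s.bind(("127.0.0.1", port))
        return True
    except PermissionError:
        return False
    finally:
        s.close()


# As uid=1000: can_bind(80) is False, can_bind(8080) is True.
# (Port 0 asks the kernel for an ephemeral port, which is always allowed.)
```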
Request Flow — From Your API Call to Your Agent
You call: invoke_agent_runtime(session_id, payload)
│
▼
AgentCore Control Plane → routes to correct microVM
│
▼
Port 9000 (sidecar inside microVM)
│ Adds headers: X-Session-Id, X-Request-Id, X-Access-Token
│
▼
Port 8080 (your Starlette app)
│ POST /invocations with JSON payload
│
▼
@app.entrypoint → your_handler(payload)
│ agent(prompt) → LLM + tools → response
│
▼
Response streams back: 8080 → 9000 → AgentCore → you
Meanwhile, port 8000 (OTel collector) captures:
- LLM latency, token counts
- Tool execution durations
- gen_ai.client.token.usage metrics
→ Ships to CloudWatch / X-Ray
The sidecar on port 9000 exists so your app doesn’t need to handle session management, auth token injection, or health reporting. It’s the bridge between AgentCore’s control plane and your code.
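The forwarding step can be sketched conceptually. This is not AgentCore's actual sidecar code (which is not public); it is a stdlib-only illustration of the header injection shown in the flow above, with the header names taken from the diagram and the token value supplied by the caller:

```python
import uuid


def sidecar_forward(session_id: str, payload: dict, access_token: str) -> dict:
    """Build the request the sidecar would forward to :8080/invocations."""
    return {
        "method": "POST",
        "url": "http://127.0.0.1:8080/invocations",
        "headers": {
            # Injected by the sidecar so your app never touches auth plumbing.
            "X-Session-Id": session_id,
            "X-Request-Id": str(uuid.uuid4()),
            "X-Access-Token": access_token,
            "Content-Type": "application/json",
        },
        "body": payload,
    }
```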
Cold Start vs Warm Start — The Complete Picture
The rule is simple: does a microVM for this session ID already exist and is it alive?
| Scenario | Result | Why |
|---|---|---|
| First request with session-A | COLD | No microVM exists, must boot one |
| Second request with same session-A (within timeout) | WARM | microVM still running, reuse it |
| Request with new session-B | COLD | Different session = always new microVM |
| Request with session-A after timeout expired | COLD | microVM was terminated, boots fresh |
Cold Start — What Actually Happens
invoke_agent_runtime(session_id="new-session")
│
▼
AgentCore: "new-session" not found
│
├── 1. Jailer creates jail + cgroups
├── 2. Firecracker process starts
├── 3. Linux kernel boots inside microVM
├── 4. Container image loaded
├── 5. CMD runs:
│ opentelemetry-instrument python -m your_agent
│ ├── OTel collector starts on :8000
│ ├── Sidecar starts on :9000
│ ├── Python imports strands, boto3
│ ├── Agent() initializes model connection
│ └── Uvicorn starts on :8080
├── 6. Sidecar pings :8080/ping → HEALTHY
├── 7. Sidecar forwards request to :8080/invocations
└── 8. Agent processes prompt → response streams back
TOTAL: ~3.4s (steps 1-6 are the cold start penalty ~0.8s)
(steps 7-8 are agent processing ~2.5s)
Warm Start — What Gets Skipped
invoke_agent_runtime(session_id="existing-session")
│
▼
AgentCore: "existing-session" found → route to existing microVM
│
├── Sidecar on :9000 receives request
├── Forwards to :8080/invocations
│ Python already running. Agent already initialized.
│ No boot. No imports. No init.
├── Agent processes prompt (LLM + tools)
└── Response streams back
TOTAL: ~2.5s (saved ~0.9s of boot + init)
Idle timer RESETS → microVM stays alive
The warm start saves the entire boot sequence — Firecracker, kernel, Python imports, agent initialization. Everything is already in memory from the previous request.
Session ID Is Everything
The session ID is the key that maps to a microVM. Here’s how it plays out in practice:
# Request 1: session-A → COLD START (new microVM boots)
agentcore_client.invoke_agent_runtime(
agentRuntimeArn=agent_arn,
runtimeSessionId="session-A",
payload=json.dumps({"prompt": "My name is Anuja"})
)
# Request 2: same session-A → WARM START (same microVM, instant)
# The agent REMEMBERS "Anuja" — state lives in memory
agentcore_client.invoke_agent_runtime(
agentRuntimeArn=agent_arn,
runtimeSessionId="session-A",
payload=json.dumps({"prompt": "What's my name?"})
)
# Response: "Anuja!" — no database lookup, no serialization
# Request 3: session-B → COLD START (completely new microVM)
# This microVM has NO knowledge of session-A
agentcore_client.invoke_agent_runtime(
agentRuntimeArn=agent_arn,
runtimeSessionId="session-B",
payload=json.dumps({"prompt": "What's my name?"})
)
# Response: "I don't know your name" — different microVM, different memory
Each session ID gets its own isolated microVM with its own kernel, memory, filesystem, and Python process. There is no shared state between sessions.
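The isolation above can be modeled as one process, and one in-memory history, per session. A toy sketch (conversation state only; the real microVMs isolate the full kernel, filesystem, and memory, not just a Python object):

```python
class MicroVM:
    """Toy stand-in: each session gets its own process-local memory."""

    def __init__(self):
        self.history = []  # lives only in this "microVM's" RAM

    def invoke(self, prompt: str) -> list:
        self.history.append(prompt)
        return self.history


vms = {}  # control plane's view: session_id -> microVM


def invoke_agent_runtime(session_id: str, prompt: str) -> list:
    vm = vms.setdefault(session_id, MicroVM())  # cold start if missing
    return vm.invoke(prompt)
```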
Pre-Warming — Paying Cold Start Cost Early
Since AgentCore has no provisioned concurrency, you can pre-warm by invoking sessions before users arrive:
WITHOUT pre-warming:
User A arrives → session-A → COLD (microVM boots ~0.8s penalty)
User A again → session-A → WARM (same microVM)
WITH pre-warming:
7:00 AM: invoke(session-001, "ping") → COLD (boots microVM)
invoke(session-002, "ping") → COLD (boots microVM)
invoke(session-003, "ping") → COLD (boots microVM)
Now 3 microVMs are alive and idle.
9:00 AM: User A arrives
Assign User A → session-001
invoke(session-001, prompt) → WARM (microVM already running)
Pre-warming = paying the cold start cost BEFORE users arrive
so that when users arrive, they get warm starts.
Cost: you pay for idle microVM time (8 GB RAM each)
Benefit: zero cold start penalty for your users
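A pre-warming script is just a loop over fresh session IDs before peak traffic. A hedged sketch, parameterized over an `invoke` callable so it is client-agnostic (with boto3 you would pass a small wrapper around `invoke_agent_runtime`; the session-ID format here is illustrative):

```python
def prewarm(invoke, count: int, prefix: str = "prewarm") -> list:
    """Boot `count` microVMs by invoking fresh sessions; return their IDs."""
    session_ids = []
    for i in range(count):
        session_id = f"{prefix}-{i:03d}"
        invoke(session_id, "ping")  # cold start happens now, not at 9 AM
        session_ids.append(session_id)
    return session_ids  # hand these out to users as they arrive
```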
The OpenTelemetry Auto-Instrumentation
The CMD wraps your Python process with opentelemetry-instrument:
CMD ["opentelemetry-instrument", "python", "-m", "your_agent"]
^^^^^^^^^^^^^^^^^^^^^^^^^^
This wrapper auto-instruments:
- boto3 HTTP requests → Bedrock API latency
- All function calls marked with spans
- gen_ai.client.token.usage metrics
- strands.event_loop.cycle_duration metrics
Your agent code
│
│ (auto-instrumented by OTel)
▼
localhost:8000 (OTel collector inside microVM)
│
│ (exports metrics/traces)
▼
CloudWatch / X-Ray
You don’t write any instrumentation code. The metrics and traces appear in CloudWatch automatically because the OTel wrapper intercepts all outgoing HTTP calls and records timing, status codes, and token counts.
Why Three Ports Instead of One?
Separation of concerns:
| Port | Owner | Purpose | You Control It? |
|---|---|---|---|
| 8080 | Your app | Agent logic, request handling | Yes |
| 9000 | AgentCore sidecar | Session management, auth, routing | No |
| 8000 | OTel collector | Metrics, traces, observability | No |
The sidecar pattern means your agent code stays clean — you write a request handler and return a response. Session lifecycle, authentication, health reporting, and observability are handled by the two processes you didn’t write. All three run inside the same Firecracker microVM, sharing the 2 vCPU and 8 GB RAM allocation.