AgentCore Runtime vs Lambda — Scaling, Warm Pools, and Why Fixed 8 GB Boxes Exist
11th March 2026
Amazon Bedrock AgentCore Runtime uses Firecracker microVMs to run AI agents and their tools in isolated environments. If you’ve used Lambda, that pitch sounds familiar: serverless, auto-scaling, pay-per-use. So why does AgentCore exist? Here’s the complete picture: how AgentCore actually scales, what it can and can’t do, and when you’d pick it over Lambda or ECS.
AgentCore Resource Allocation — Fixed, Not Flexible
AgentCore gives every session a fixed allocation. You cannot configure it:
| Session Type | CPU | RAM | Adjustable? |
|---|---|---|---|
| Agent Runtime | 2 vCPU | 8 GB | No |
| Browser sessions | 1 vCPU | 4 GB | No |
| Code Interpreter | 2 vCPU | 8 GB | No |
No API to change this. No parameter to request more. Your agent gets 8 GB. Period. Need 16 GB? Not possible on AgentCore today. While Firecracker supports memory hotplugging at the infrastructure level, AWS does not expose this to you — you get a fixed box.
Cold Starts and Warm Sessions — No Warm Pools
AgentCore has no equivalent to Lambda’s provisioned concurrency:
❌ No "provisioned concurrency" like Lambda
❌ No "warm pool" configuration
❌ No "min instances" setting
❌ No way to pre-warm microVMs
What AgentCore DOES have: idle session timeout
HOW IT WORKS:
Request 1 arrives → new microVM boots (COLD START ~1-3s)
Request 1 completes → microVM stays IDLE
┌──────────────────────────────────────────────────────────┐
│ │
│ ←── idle timeout (default 15 min) ──→ │
│ (active) (waiting) (WARM!) (waiting) (WARM!) │
│ │
└──────────────────────────────────────────────────────────┘
Request 2 arrives within timeout → WARM START (same microVM, instant)
Request 2 arrives after timeout → COLD START (new microVM, ~1-3s)
The only knob you have is idleRuntimeSessionTimeout:
# Increase the idle timeout to keep sessions warm longer
import boto3

# Control-plane client; agent_id comes from your earlier create_agent_runtime call
agentcore_control_client = boto3.client("bedrock-agentcore-control")

agentcore_control_client.update_agent_runtime(
    agentRuntimeId=agent_id,
    lifecycleConfiguration={
        'idleRuntimeSessionTimeout': 3600  # in seconds: 1 hour instead of the 15-minute default
    }
)
But longer timeout = you pay for idle RAM the whole time. That’s the tradeoff.
Simulating Warm Pools With What’s Available
Since AgentCore doesn’t offer warm pools natively, here are workarounds using available features:
Strategy 1: Long Idle Timeout + Periodic Pings
Set timeout to 1 hour.
Send a health check ping every 50 minutes.
Session never goes idle → never terminated.
┌──────────────────────────────────────────────────────────┐
│ Session lifetime (up to 8 hours max) │
│ │
│ ├── real request │
│ ├── 50 min... ping (keep alive) │
│ ├── 50 min... ping (keep alive) │
│ ├── real request (INSTANT — session was warm) │
│ ├── 50 min... ping (keep alive) │
│ └── ...up to 8 hours max lifetime │
└──────────────────────────────────────────────────────────┘
Cost: you pay for 8 GB RAM sitting idle.
Benefit: zero cold starts for your users.
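A minimal sketch of the ping scheduler, assuming you track each session's last-activity timestamp yourself (there is no API to query a session's idle timer); `PING_MARGIN`, the helper names, and the registry shape are all invented for illustration:

```python
import json

PING_MARGIN = 600  # ping 10 minutes before the idle timeout would fire

def sessions_needing_ping(last_activity, idle_timeout, now):
    """Return the session IDs whose idle timer is close to expiring.

    last_activity: dict of session_id -> unix timestamp of the last request
    idle_timeout:  the configured idleRuntimeSessionTimeout, in seconds
    """
    return [
        sid for sid, ts in last_activity.items()
        if now - ts >= idle_timeout - PING_MARGIN
    ]

def keep_alive(client, agent_arn, session_ids):
    """Reset each session's idle timer with a tiny dummy request.

    client is a boto3 "bedrock-agentcore" data-plane client.
    """
    for sid in session_ids:
        client.invoke_agent_runtime(
            agentRuntimeArn=agent_arn,
            runtimeSessionId=sid,
            payload=json.dumps({"prompt": "ping"}),
        )
```

Run `sessions_needing_ping` on a schedule (EventBridge plus a tiny Lambda is one option) and feed the result to `keep_alive`.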
Strategy 2: Pre-Create Sessions for Expected Traffic
You know traffic spikes at 9 AM.
At 8:55 AM, invoke 50 sessions with a dummy request.
Each session boots a microVM → stays warm until idle timeout.
┌──────────────────────────────────────────────────────────┐
│ 8:55 AM: Pre-warm │
│ invoke(session_1, "ping") → microVM 1 booted │
│ invoke(session_2, "ping") → microVM 2 booted │
│ invoke(session_3, "ping") → microVM 3 booted │
│ ... │
│ invoke(session_50, "ping") → microVM 50 booted │
│ │
│ 9:00 AM: Real traffic │
│ user_A → session_1 (WARM!) │
│ user_B → session_2 (WARM!) │
│ user_C → session_3 (WARM!) │
│ │
│ 9:15 AM: Unused sessions auto-terminate │
└──────────────────────────────────────────────────────────┘
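Sketched in Python under the same assumptions as above (a boto3 `bedrock-agentcore` client and your deployed agent's ARN); the ID scheme is invented, with a UUID suffix tacked on because the API requires session IDs of at least 33 characters:

```python
import json
import uuid

def prewarm_session_ids(n, prefix="prewarm"):
    # Predictable prefix so real traffic can be routed to these IDs later;
    # the UUID suffix keeps each ID unique and satisfies the minimum
    # session-ID length.
    return [f"{prefix}-{i:03d}-{uuid.uuid4().hex}" for i in range(n)]

def prewarm(client, agent_arn, session_ids):
    # Each invocation boots a microVM for its session ID; that microVM
    # then stays warm until the idle timeout fires.
    for sid in session_ids:
        client.invoke_agent_runtime(
            agentRuntimeArn=agent_arn,
            runtimeSessionId=sid,
            payload=json.dumps({"prompt": "ping"}),
        )
```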
Strategy 3: Reuse Session IDs (The Intended Model)
Same session_id = same microVM (if still alive)
User A's first request → new microVM (cold start)
User A's second request → SAME microVM (warm!)
import json
import boto3

# Data-plane client for invoking the deployed agent
agentcore_client = boto3.client("bedrock-agentcore")

agentcore_client.invoke_agent_runtime(
    agentRuntimeArn=agent_arn,
    # same ID = same microVM; note session IDs must be at least 33 characters
    runtimeSessionId="user-anuja-session-2026-03-11-0001",
    payload=json.dumps({"prompt": "What is 2+2?"})
)
As long as the user keeps chatting within the idle timeout, every request hits a warm microVM.
Hard Limits From Official Docs
| Limit | Default | Adjustable? |
|---|---|---|
| Active sessions per account (us-east-1) | 1,000 | Yes (support ticket) |
| Active sessions per account (other regions) | 500 | Yes (support ticket) |
| New sessions per minute per endpoint | 100 | Yes |
| Invocations per second per endpoint | 50 | Yes |
| Idle session timeout | 15 minutes | Yes (via API) |
| Max session lifetime | 8 hours | No |
| Total agents per account | 1,000 | Yes |
| CPU per session | 2 vCPU | No |
| RAM per session | 8 GB | No |
| Payload size | 100 MB | No |
Why Lambda Can’t Do What AgentCore Does
For simple agents, Lambda might be enough. AgentCore exists for the things Lambda can’t do:
Problem 1: Time Limit
Lambda: max 15 minutes → function killed
AgentCore: max 8 hours
Agent doing research:
→ calls 20 tools
→ each tool waits for API
→ LLM thinks between each step
→ total time: 45 minutes
Lambda: 💥 KILLED at 15 min (a third of the way through)
AgentCore: ✅ runs to completion
Problem 2: Stateful Sessions
LAMBDA (stateless — every invocation starts fresh):
Request 1: "My name is Anuja" → Lambda boots → responds → DIES
Request 2: "What's my name?" → NEW Lambda → no memory of Request 1
To keep state: save to DynamoDB/S3 between EVERY request,
then load it back on EVERY new request. YOU build all of this.
AGENTCORE (stateful — same microVM stays alive):
Request 1: "My name is Anuja" → microVM boots → responds → STAYS ALIVE
Request 2: "What's my name?" → SAME microVM → "Anuja!" → instant
State lives in memory. No serialization. No DynamoDB. It just works.
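A sketch of what that buys you in code. Everything here is illustrative: in a real deployment the function would be registered with the AgentCore SDK's `@app.entrypoint` decorator, but the point is simply that a plain module-level variable survives between requests because the same microVM handles them all:

```python
# Conversation history lives in the microVM's RAM: no DynamoDB,
# no serialization, no load/save on every request.
conversation = []

def my_agent(payload):
    prompt = payload["prompt"]
    if conversation:
        # earlier turns from THIS session are still in memory
        reply = f"Earlier you said: {conversation[0]!r}"
    else:
        reply = "Noted, 1 message so far"
    conversation.append(prompt)
    return reply
```

On Lambda the same code would answer "Noted, 1 message so far" to every request, because each invocation may land on a fresh execution environment with an empty `conversation`.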
Problem 3: Session Isolation (Security)
LAMBDA (container isolation — shares host OS kernel):
Container A ──┐
Container B ──┼── shared Linux kernel ← container escape = see all
Container C ──┘
If an agent runs malicious code (LLM hallucinated a bad tool call),
a container escape could access other users' data.
AGENTCORE (microVM isolation — each session has its OWN kernel):
microVM A: [own kernel] [own memory] [own filesystem]
microVM B: [own kernel] [own memory] [own filesystem]
Even if code escapes the process, it's still inside a VM.
Hardware-level isolation (KVM), not just software isolation.
Problem 4: Large Payloads
Lambda: max 6 MB request / 6 MB response
AgentCore: max 100 MB request / response
Agent analyzing a PDF:
Lambda: "Upload to S3 first, pass the S3 URL" → extra complexity
AgentCore: send the 50 MB PDF directly in the request → just works
Problem 5: Persistent Local State
Lambda: /tmp is 512 MB by default (configurable up to 10 GB) and survives
only while the execution environment happens to stay warm — no guarantees.
Agent downloads 3 files, processes them across steps.
Between invocations → files may be gone.
AgentCore: local filesystem persists for the session (up to 8 hours)
Agent downloads files → stays on disk → next request uses them
No S3 round-trips. No state management code.
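A tiny sketch of the difference, using an invented working directory; on AgentCore the second function can run in a later request of the same session and still find the file:

```python
from pathlib import Path

# Hypothetical layout: any path on the session filesystem works,
# because the whole filesystem persists for the session's lifetime.
WORKDIR = Path("/tmp/session-data")

def download_step(name, content):
    # Request 1: fetch a file and park it on local disk
    WORKDIR.mkdir(parents=True, exist_ok=True)
    (WORKDIR / name).write_text(content)

def process_step(name):
    # Request 2 (same session, minutes later): the file is still there,
    # so there is no S3 round-trip and no state-management code
    return (WORKDIR / name).read_text().upper()
```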
Problem 6: Streaming
Lambda: streaming support exists but is awkward (response streaming via function URLs, with limited runtime support)
AgentCore: SSE streaming built-in, works with agent.stream_async() directly
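A sketch of the streaming shape, with a stand-in agent so the example is self-contained; in a real deployment `agent` would be a Strands `Agent`, the generator would be registered with `@app.entrypoint`, and AgentCore would turn each yielded chunk into an SSE event:

```python
import asyncio

class FakeAgent:
    """Stand-in for a real Strands Agent, which exposes stream_async()."""
    async def stream_async(self, prompt):
        for token in ["2", "+", "2", " ", "is", " ", "4"]:
            yield {"data": token}

agent = FakeAgent()

async def streaming_agent(payload):
    # An async-generator entrypoint: each yield is one chunk of the response
    async for event in agent.stream_async(payload["prompt"]):
        if "data" in event:
            yield event["data"]
```

On Lambda you would be wiring up function URLs or API Gateway WebSockets to get the same effect.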
Side-by-Side Comparison
| Feature | Lambda | AgentCore Runtime | ECS/Fargate |
|---|---|---|---|
| Max duration | 15 min | 8 hours | Unlimited |
| State between requests | Stateless | Stateful (same microVM) | Stateful |
| Isolation | Container | microVM (hardware-level) | Container |
| Streaming | Awkward | Built-in SSE | DIY |
| Cold start | ~1-2s | ~1-3s | 30-60s |
| Warm pools | Provisioned concurrency | Not available | Min tasks |
| Memory config | 128 MB to 10 GB | Fixed 8 GB | Any size |
| CPU config | Proportional to memory | Fixed 2 vCPU | Any size |
| Scaling control | Full | Fully managed | Full |
| Payload size | 6 MB | 100 MB | Unlimited |
| Identity/Auth | DIY | Built-in (OAuth, IAM) | DIY |
| Session management | DIY (DynamoDB) | Built-in | DIY |
| Agent-specific features | None | Built-in | None |
When to Use What
USE LAMBDA WHEN:
✅ Agent is simple (1-2 tool calls, responds in < 30 seconds)
✅ Stateless is fine (each request is independent)
✅ Small payloads (text only, < 6 MB)
✅ You want full control over scaling
✅ You already have Lambda infrastructure
✅ Cost optimization is #1 priority (Lambda is cheaper for short tasks)
USE AGENTCORE WHEN:
✅ Agent runs long tasks (minutes to hours)
✅ Multi-turn conversations (need state between requests)
✅ Large files (PDFs, images, datasets > 6 MB)
✅ Security-critical (need microVM isolation, not container)
✅ Agent acts on behalf of users (need built-in OAuth identity)
✅ You don't want to build session management, streaming, auth
✅ You want to deploy with 4 lines of code, not manage infrastructure
USE ECS/FARGATE WHEN:
✅ You need full control over everything
✅ Custom memory/CPU per container
✅ Warm pools with min/max task counts
✅ Long-running services (always-on, not session-based)
✅ You have DevOps team to manage it
The Real Reason AgentCore Exists
WITHOUT AgentCore, to build a production agent you need:
┌────────────────────────────────────────────────────────────┐
│ YOU must build: │
│ │
│ State persistence → S3 + serialize/deserialize │
│ Streaming → API Gateway + WebSocket │
│ Auth / Identity → Cognito + custom middleware │
│ Isolation → Container security hardening │
│ Long-running support → Step Functions or ECS │
│ Large payload handling → S3 pre-signed URLs │
│ Health checks → Custom /ping endpoint │
│ Scaling → Auto Scaling policies │
│ Cleanup → Lifecycle hooks │
│ │
│ = 2-4 weeks of infrastructure work before writing │
│ a single line of agent logic │
└────────────────────────────────────────────────────────────┘
WITH AgentCore:
┌────────────────────────────────────────────────────────────┐
│ │
│ @app.entrypoint │
│ def my_agent(payload): │
│ return agent(payload["prompt"]) │
│ │
│ app.run() │
│ │
│ = 4 lines. Deploy. Done. │
│ Sessions, streaming, auth, isolation — all included. │
└────────────────────────────────────────────────────────────┘
Lambda is a general-purpose compute service. You can build agents on it, but you build all the agent infrastructure yourself. AgentCore is an agent-specific compute service — sessions, streaming, isolation, auth, and tool execution are built in. It’s the difference between renting an empty office and signing up for a fully furnished co-working space. Both work. One requires you to buy desks, chairs, internet, and coffee machines first.