Akshay Parkhi's Weblog

AgentCore Runtime vs Lambda — Scaling, Warm Pools, and Why Fixed 8 GB Boxes Exist

11th March 2026

Amazon Bedrock AgentCore Runtime uses Firecracker microVMs to run AI agent tools in isolated environments. But if you’ve used Lambda, it sounds familiar — serverless, auto-scaling, pay-per-use. So why does AgentCore exist? Here’s the complete picture: how AgentCore actually scales, what it can and can’t do, and when you’d pick it over Lambda or ECS.

AgentCore Resource Allocation — Fixed, Not Flexible

AgentCore gives every session a fixed allocation. You cannot configure it:

  Session Type       CPU      RAM    Adjustable?
  -----------------  -------  -----  -----------
  Agent Runtime      2 vCPU   8 GB   No
  Browser sessions   1 vCPU   4 GB   No
  Code Interpreter   2 vCPU   8 GB   No

No API to change this. No parameter to request more. Your agent gets 8 GB. Period. Need 16 GB? Not possible on AgentCore today. While Firecracker supports memory hotplugging at the infrastructure level, AWS does not expose this to you — you get a fixed box.

Cold Starts and Warm Sessions — No Warm Pools

AgentCore has no equivalent to Lambda’s provisioned concurrency:

❌ No "provisioned concurrency" like Lambda
❌ No "warm pool" configuration
❌ No "min instances" setting
❌ No way to pre-warm microVMs

What AgentCore DOES have: idle session timeout

HOW IT WORKS:

  Request 1 arrives → new microVM boots (COLD START ~1-3s)
  Request 1 completes → microVM stays IDLE

  ┌──────────────────────────────────────────────────────────┐
  │                                                          │
  │  ←── idle timeout (default 15 min) ──→                   │
  │  (active)   (waiting)  (WARM!)  (waiting)  (WARM!)       │
  │                                                          │
  └──────────────────────────────────────────────────────────┘

  Request 2 arrives within timeout → WARM START (same microVM, instant)
  Request 2 arrives after timeout  → COLD START (new microVM, ~1-3s)

The only knob you have is idleRuntimeSessionTimeout:

# Increase idle timeout to keep sessions warm longer
import boto3

# Control-plane client for managing agent runtimes
agentcore_control_client = boto3.client("bedrock-agentcore-control")

agentcore_control_client.update_agent_runtime(
    agentRuntimeId=agent_id,
    lifecycleConfiguration={
        'idleRuntimeSessionTimeout': 3600   # 1 hour instead of the 15-minute default
    }
)

But a longer timeout means you pay for 8 GB of idle RAM the whole time. That's the tradeoff.

Simulating Warm Pools With What’s Available

Since AgentCore doesn’t offer warm pools natively, here are workarounds using available features:

Strategy 1: Long Idle Timeout + Periodic Pings

Set timeout to 1 hour.
Send a health check ping every 50 minutes.
Session never goes idle → never terminated.

  ┌──────────────────────────────────────────────────────────┐
  │  Session lifetime (up to 8 hours max)                    │
  │                                                          │
  │  ├── real request                                        │
  │  ├── 50 min... ping (keep alive)                         │
  │  ├── 50 min... ping (keep alive)                         │
  │  ├── real request (INSTANT — session was warm)           │
  │  ├── 50 min... ping (keep alive)                         │
  │  └── ...up to 8 hours max lifetime                       │
  └──────────────────────────────────────────────────────────┘

Cost: you pay for 8 GB RAM sitting idle.
Benefit: zero cold starts for your users.
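A keep-alive loop for Strategy 1 might look like the sketch below. The ping interval, the `{"prompt": "ping"}` payload shape, and the function names are my own assumptions, not a documented pattern; the boto3 client name matches the AgentCore data plane.

```python
import json
import time

PING_INTERVAL_SECONDS = 50 * 60      # ping every 50 min against a 1 h idle timeout
MAX_LIFETIME_SECONDS = 8 * 60 * 60   # hard 8 h session lifetime

def ping_payload():
    """Build the minimal dummy request that keeps the session warm."""
    return json.dumps({"prompt": "ping"}).encode("utf-8")

def keep_warm(agent_arn, session_id):
    import boto3  # imported here so the helper above needs no AWS deps
    client = boto3.client("bedrock-agentcore")
    started = time.time()
    # Stop before the 8-hour max lifetime terminates the session anyway.
    while time.time() - started < MAX_LIFETIME_SECONDS - PING_INTERVAL_SECONDS:
        client.invoke_agent_runtime(
            agentRuntimeArn=agent_arn,
            runtimeSessionId=session_id,
            payload=ping_payload(),
        )
        time.sleep(PING_INTERVAL_SECONDS)
```

In practice you'd run this from a scheduler (EventBridge, cron) rather than a blocking loop, but the idea is the same: one cheap request per interval resets the idle clock.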

Strategy 2: Pre-Create Sessions for Expected Traffic

You know traffic spikes at 9 AM.
At 8:55 AM, invoke 50 sessions with a dummy request.
Each session boots a microVM → stays warm until idle timeout.

  ┌──────────────────────────────────────────────────────────┐
  │  8:55 AM: Pre-warm                                       │
  │    invoke(session_1, "ping")  → microVM 1 booted         │
  │    invoke(session_2, "ping")  → microVM 2 booted         │
  │    invoke(session_3, "ping")  → microVM 3 booted         │
  │    ...                                                   │
  │    invoke(session_50, "ping") → microVM 50 booted        │
  │                                                          │
  │  9:00 AM: Real traffic                                   │
  │    user_A → session_1 (WARM!)                            │
  │    user_B → session_2 (WARM!)                            │
  │    user_C → session_3 (WARM!)                            │
  │                                                          │
  │  9:15 AM: Unused sessions auto-terminate                 │
  └──────────────────────────────────────────────────────────┘
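A pre-warm script for Strategy 2 could be as simple as the sketch below. The session-ID scheme and the "ping" payload are illustrative assumptions; the boto3 client name matches the AgentCore data plane.

```python
import json
import uuid

def prewarm_session_ids(count, prefix="morning-rush"):
    """Mint one unique session ID per microVM we want booted."""
    return [f"{prefix}-{i:03d}-{uuid.uuid4()}" for i in range(count)]

def prewarm(agent_arn, count=50):
    import boto3  # imported here so the ID helper needs no AWS deps
    client = boto3.client("bedrock-agentcore")
    session_ids = prewarm_session_ids(count)
    for session_id in session_ids:      # each invoke boots one microVM
        client.invoke_agent_runtime(
            agentRuntimeArn=agent_arn,
            runtimeSessionId=session_id,
            payload=json.dumps({"prompt": "ping"}).encode("utf-8"),
        )
    return session_ids  # hand these out to real users at 9:00
```

The returned IDs are the important part: your dispatcher has to route real 9 AM traffic to these exact session IDs, or the users land on cold microVMs anyway.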

Strategy 3: Reuse Session IDs (The Intended Model)

Same session_id = same microVM (if still alive)

User A's first request  → new microVM (cold start)
User A's second request → SAME microVM (warm!)

import json
import boto3

agentcore_client = boto3.client("bedrock-agentcore")  # data-plane client

agentcore_client.invoke_agent_runtime(
    agentRuntimeArn=agent_arn,
    runtimeSessionId="user-anuja-session",  # same ID = same microVM
    payload=json.dumps({"prompt": "What is 2+2?"})
)

As long as the user keeps chatting within the idle timeout, every request after the first hits a warm microVM.

Hard Limits From Official Docs

  Limit                                         Default     Adjustable?
  --------------------------------------------  ----------  --------------------
  Active sessions per account (us-east-1)       1,000       Yes (support ticket)
  Active sessions per account (other regions)   500         Yes (support ticket)
  New sessions per minute per endpoint          100         Yes
  Invocations per second per endpoint           50          Yes
  Idle session timeout                          15 minutes  Yes (via API)
  Max session lifetime                          8 hours     No
  Total agents per account                      1,000       Yes
  CPU per session                               2 vCPU      No
  RAM per session                               8 GB        No
  Payload size                                  100 MB      No

Why Lambda Can’t Do What AgentCore Does

For simple agents, Lambda might be enough. AgentCore exists for the things Lambda can’t do:

Problem 1: Time Limit

Lambda:     max 15 minutes → function killed
AgentCore:  max 8 hours

Agent doing research:
  → calls 20 tools
  → each tool waits for API
  → LLM thinks between each step
  → total time: 45 minutes

Lambda:     💥 KILLED at 15 min (halfway through)
AgentCore:  ✅ runs to completion

Problem 2: Stateful Sessions

LAMBDA (stateless — every invocation starts fresh):
  Request 1: "My name is Anuja"  → Lambda boots → responds → DIES
  Request 2: "What's my name?"   → NEW Lambda → no memory of Request 1

  To keep state: save to DynamoDB/S3 between EVERY request,
  then load it back on EVERY new request. YOU build all of this.

AGENTCORE (stateful — same microVM stays alive):
  Request 1: "My name is Anuja"  → microVM boots → responds → STAYS ALIVE
  Request 2: "What's my name?"   → SAME microVM → "Anuja!" → instant

  State lives in memory. No serialization. No DynamoDB. It just works.
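The statefulness above can be shown with a toy handler. This is a deliberately naive sketch, not real agent logic: the handler name, payload shape, and string matching are all made up. The point is that a module-level variable survives between requests because the same microVM serves the whole session.

```python
# Module-level state: lives in the microVM's RAM for the session's lifetime.
conversation_history = []

def handle_request(payload):
    """Stand-in for an agent entrypoint; remembers prior turns in memory."""
    prompt = payload["prompt"]
    conversation_history.append(prompt)   # no DynamoDB, no serialization
    if "what's my name" in prompt.lower():
        # Naive recall over earlier turns in this same session.
        for earlier in conversation_history:
            if earlier.lower().startswith("my name is"):
                return earlier.split("is", 1)[1].strip()
    return f"seen {len(conversation_history)} messages this session"
```

On Lambda, `conversation_history` would be empty on every cold invocation; here it accumulates for up to 8 hours.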

Problem 3: Session Isolation (Security)

LAMBDA (container isolation — shares host OS kernel):
  Container A ──┐
  Container B ──┼── shared Linux kernel ← container escape = see all
  Container C ──┘

  If an agent runs malicious code (LLM hallucinated a bad tool call),
  a container escape could access other users' data.

AGENTCORE (microVM isolation — each session has its OWN kernel):
  microVM A: [own kernel] [own memory] [own filesystem]
  microVM B: [own kernel] [own memory] [own filesystem]

  Even if code escapes the process, it's still inside a VM.
  Hardware-level isolation (KVM), not just software isolation.

Problem 4: Large Payloads

Lambda:     max 6 MB request / 6 MB response
AgentCore:  max 100 MB request / response

Agent analyzing a PDF:
  Lambda:     "Upload to S3 first, pass the S3 URL" → extra complexity
  AgentCore:  send the 50 MB PDF directly in the request → just works
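One way to ride a document inside the request body is base64-in-JSON, sketched below. The field names (`"prompt"`, `"document_b64"`) are my own invention, not an AgentCore contract; note base64 inflates size by about 33%, so roughly 75 MB of raw binary fits under the 100 MB cap.

```python
import base64
import json

def build_payload(prompt, pdf_bytes):
    """Pack a binary document into the JSON request as base64."""
    return json.dumps({
        "prompt": prompt,
        "document_b64": base64.b64encode(pdf_bytes).decode("ascii"),
    }).encode("utf-8")

def extract_document(payload_bytes):
    """Inverse operation, run inside the agent."""
    body = json.loads(payload_bytes)
    return base64.b64decode(body["document_b64"])
```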

Problem 5: Persistent Local State

Lambda:     /tmp is 512 MB, wiped between invocations
            Agent downloads 3 files, processes them across steps.
            Between invocations → files might be gone.

AgentCore:  local filesystem persists for the session (up to 8 hours)
            Agent downloads files → stays on disk → next request uses them
            No S3 round-trips. No state management code.
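A download-once cache becomes trivial when the disk outlives the request. In this sketch, `CACHE_DIR` and the `fetch()` stub are illustrative stand-ins; on AgentCore the cached file from request 1 is still on disk when request 7 arrives in the same session.

```python
import os

CACHE_DIR = "/tmp/agent-cache"

def fetch(url):
    """Stand-in for a real HTTP download."""
    return f"contents of {url}".encode("utf-8")

def get_cached(url):
    """Download a file once per session; serve later reads from local disk."""
    os.makedirs(CACHE_DIR, exist_ok=True)
    path = os.path.join(CACHE_DIR, url.replace("/", "_"))
    if not os.path.exists(path):          # first request pays the download
        with open(path, "wb") as f:
            f.write(fetch(url))
    with open(path, "rb") as f:           # later requests hit local disk
        return f.read()
```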

Problem 6: Streaming

Lambda:     streaming exists, but it's awkward (response streaming via function URLs)
AgentCore:  SSE streaming built-in, works with agent.stream_async() directly
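The streaming shape boils down to an async generator: yielded chunks become SSE events to the caller. In this sketch, `stream_tokens()` is a stand-in for a real `agent.stream_async()` call, and the hard-coded tokens are obviously fake.

```python
import asyncio

async def stream_tokens(prompt):
    """Stand-in for agent.stream_async(): yields tokens as they arrive."""
    for token in ["The", " answer", " is", " 4."]:
        await asyncio.sleep(0)   # pretend to wait on the model
        yield token

async def my_agent(payload):
    # Yielding (instead of returning) is what turns the response into a stream.
    async for token in stream_tokens(payload["prompt"]):
        yield token
```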

Side-by-Side Comparison

  Feature                  Lambda                   AgentCore Runtime         ECS/Fargate
  -----------------------  -----------------------  ------------------------  -----------
  Max duration             15 min                   8 hours                   Unlimited
  State between requests   Stateless                Stateful (same microVM)   Stateful
  Isolation                Container                microVM (hardware-level)  Container
  Streaming                Awkward                  Built-in SSE              DIY
  Cold start               ~1-2 s                   ~1-3 s                    30-60 s
  Warm pools               Provisioned concurrency  Not available             Min tasks
  Memory config            128 MB to 10 GB          Fixed 8 GB                Any size
  CPU config               Proportional to memory   Fixed 2 vCPU              Any size
  Scaling control          Full                     Fully managed             Full control
  Payload size             6 MB                     100 MB                    Unlimited
  Identity/Auth            DIY                      Built-in (OAuth, IAM)     DIY
  Session management       DIY (DynamoDB)           Built-in                  DIY
  Agent-specific features  None                     Built-in                  None

When to Use What

USE LAMBDA WHEN:
  ✅ Agent is simple (1-2 tool calls, responds in < 30 seconds)
  ✅ Stateless is fine (each request is independent)
  ✅ Small payloads (text only, < 6 MB)
  ✅ You want full control over scaling
  ✅ You already have Lambda infrastructure
  ✅ Cost optimization is #1 priority (Lambda is cheaper for short tasks)

USE AGENTCORE WHEN:
  ✅ Agent runs long tasks (minutes to hours)
  ✅ Multi-turn conversations (need state between requests)
  ✅ Large files (PDFs, images, datasets > 6 MB)
  ✅ Security-critical (need microVM isolation, not container)
  ✅ Agent acts on behalf of users (need built-in OAuth identity)
  ✅ You don't want to build session management, streaming, auth
  ✅ You want to deploy with 4 lines of code, not manage infrastructure

USE ECS/FARGATE WHEN:
  ✅ You need full control over everything
  ✅ Custom memory/CPU per container
  ✅ Warm pools with min/max task counts
  ✅ Long-running services (always-on, not session-based)
  ✅ You have DevOps team to manage it

The Real Reason AgentCore Exists

WITHOUT AgentCore, to build a production agent you need:

  ┌────────────────────────────────────────────────────────────┐
  │  YOU must build:                                           │
  │                                                            │
  │  State persistence       → S3 + serialize/deserialize      │
  │  Streaming               → API Gateway + WebSocket         │
  │  Auth / Identity         → Cognito + custom middleware     │
  │  Isolation               → Container security hardening    │
  │  Long-running support    → Step Functions or ECS           │
  │  Large payload handling  → S3 pre-signed URLs              │
  │  Health checks           → Custom /ping endpoint           │
  │  Scaling                 → Auto Scaling policies           │
  │  Cleanup                 → Lifecycle hooks                 │
  │                                                            │
  │  = 2-4 weeks of infrastructure work before writing         │
  │    a single line of agent logic                            │
  └────────────────────────────────────────────────────────────┘

WITH AgentCore:

  ┌────────────────────────────────────────────────────────────┐
  │                                                            │
  │  @app.entrypoint                                           │
  │  def my_agent(payload):                                    │
  │      return agent(payload["prompt"])                       │
  │                                                            │
  │  app.run()                                                 │
  │                                                            │
  │  = 4 lines. Deploy. Done.                                  │
  │    Sessions, streaming, auth, isolation — all included.    │
  └────────────────────────────────────────────────────────────┘

Lambda is a general-purpose compute service. You can build agents on it, but you build all the agent infrastructure yourself. AgentCore is an agent-specific compute service — sessions, streaming, isolation, auth, and tool execution are built in. It’s the difference between renting an empty office and signing up for a fully furnished co-working space. Both work. One requires you to buy desks, chairs, internet, and coffee machines first.

Next: Inside an AgentCore microVM — Ports, Cold Starts, and the Sidecar Pattern

Previous: How Firecracker MicroVMs Power AgentCore Runtime — From 125ms Boot to Auto-Scaling AI Agents