Akshay Parkhi's Weblog

AgentCore Runtime vs Lambda — Scaling, Warm Pools, and Why Fixed 8 GB Boxes Exist

11th March 2026

Amazon Bedrock AgentCore Runtime uses Firecracker microVMs to run AI agent tools in isolated environments. But if you’ve used Lambda, it sounds familiar — serverless, auto-scaling, pay-per-use. So why does AgentCore exist? Here’s the complete picture: how AgentCore actually scales, what it can and can’t do, and when you’d pick it over Lambda or ECS.

AgentCore Resource Allocation — Fixed, Not Flexible

AgentCore gives every session a fixed allocation. You cannot configure it:

  Session Type       CPU      RAM    Adjustable?
  -----------------  -------  -----  -----------
  Agent Runtime      2 vCPU   8 GB   No
  Browser sessions   1 vCPU   4 GB   No
  Code Interpreter   2 vCPU   8 GB   No

No API to change this. No parameter to request more. Your agent gets 8 GB. Period. Need 16 GB? Not possible on AgentCore today. While Firecracker supports memory hotplugging at the infrastructure level, AWS does not expose this to you — you get a fixed box.

Cold Starts and Warm Sessions — No Warm Pools

AgentCore has no equivalent to Lambda’s provisioned concurrency:

❌ No "provisioned concurrency" like Lambda
❌ No "warm pool" configuration
❌ No "min instances" setting
❌ No way to pre-warm microVMs

What AgentCore DOES have: idle session timeout

HOW IT WORKS:

  Request 1 arrives → new microVM boots (COLD START ~1-3s)
  Request 1 completes → microVM stays IDLE

  ┌──────────────────────────────────────────────────────────┐
  │                                                          │
  │  ←── idle timeout (default 15 min) ──→                   │
  │  (active)   (waiting)  (WARM!)  (waiting)  (WARM!)       │
  │                                                          │
  └──────────────────────────────────────────────────────────┘

  Request 2 arrives within timeout → WARM START (same microVM, instant)
  Request 2 arrives after timeout  → COLD START (new microVM, ~1-3s)

The only knob you have is idleRuntimeSessionTimeout:

# Increase idle timeout to keep sessions warm longer
import boto3

# Control-plane client for managing agent runtimes
agentcore_control_client = boto3.client("bedrock-agentcore-control")

agentcore_control_client.update_agent_runtime(
    agentRuntimeId=agent_id,
    lifecycleConfiguration={
        'idleRuntimeSessionTimeout': 3600   # 1 hour instead of the 15-minute default
    }
)

But a longer timeout means you pay for 8 GB of idle RAM the whole time. That's the tradeoff.

Simulating Warm Pools With What’s Available

Since AgentCore doesn’t offer warm pools natively, here are workarounds using available features:

Strategy 1: Long Idle Timeout + Periodic Pings

Set timeout to 1 hour.
Send a health check ping every 50 minutes.
Session never goes idle → never terminated.

  ┌──────────────────────────────────────────────────────────┐
  │  Session lifetime (up to 8 hours max)                    │
  │                                                          │
  │  ├── real request                                        │
  │  ├── 50 min... ping (keep alive)                         │
  │  ├── 50 min... ping (keep alive)                         │
  │  ├── real request (INSTANT — session was warm)           │
  │  ├── 50 min... ping (keep alive)                         │
  │  └── ...up to 8 hours max lifetime                       │
  └──────────────────────────────────────────────────────────┘

Cost: you pay for 8 GB RAM sitting idle.
Benefit: zero cold starts for your users.
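A keep-alive loop for Strategy 1 might look like the sketch below. The ping interval, the `{"prompt": "ping"}` payload shape, and the function names are my own assumptions, not a documented pattern; the boto3 client name matches the AgentCore data plane.

```python
import json
import time

PING_INTERVAL_SECONDS = 50 * 60      # ping every 50 min against a 1 h idle timeout
MAX_LIFETIME_SECONDS = 8 * 60 * 60   # hard 8 h session lifetime

def ping_payload():
    """Build the minimal dummy request that keeps the session warm."""
    return json.dumps({"prompt": "ping"}).encode("utf-8")

def keep_warm(agent_arn, session_id):
    import boto3  # imported here so the helper above needs no AWS deps
    client = boto3.client("bedrock-agentcore")
    started = time.time()
    # Stop before the 8-hour max lifetime terminates the session anyway.
    while time.time() - started < MAX_LIFETIME_SECONDS - PING_INTERVAL_SECONDS:
        client.invoke_agent_runtime(
            agentRuntimeArn=agent_arn,
            runtimeSessionId=session_id,
            payload=ping_payload(),
        )
        time.sleep(PING_INTERVAL_SECONDS)
```

In practice you'd run this from a scheduler (EventBridge, cron) rather than a blocking loop, but the idea is the same: one cheap request per interval resets the idle clock.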

Strategy 2: Pre-Create Sessions for Expected Traffic

You know traffic spikes at 9 AM.
At 8:55 AM, invoke 50 sessions with a dummy request.
Each session boots a microVM → stays warm until idle timeout.

  ┌──────────────────────────────────────────────────────────┐
  │  8:55 AM: Pre-warm                                       │
  │    invoke(session_1, "ping")  → microVM 1 booted         │
  │    invoke(session_2, "ping")  → microVM 2 booted         │
  │    invoke(session_3, "ping")  → microVM 3 booted         │
  │    ...                                                   │
  │    invoke(session_50, "ping") → microVM 50 booted        │
  │                                                          │
  │  9:00 AM: Real traffic                                   │
  │    user_A → session_1 (WARM!)                            │
  │    user_B → session_2 (WARM!)                            │
  │    user_C → session_3 (WARM!)                            │
  │                                                          │
  │  9:15 AM: Unused sessions auto-terminate                 │
  └──────────────────────────────────────────────────────────┘
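A pre-warm script for Strategy 2 could be as simple as the sketch below. The session-ID scheme and the "ping" payload are illustrative assumptions; the boto3 client name matches the AgentCore data plane.

```python
import json
import uuid

def prewarm_session_ids(count, prefix="morning-rush"):
    """Mint one unique session ID per microVM we want booted."""
    return [f"{prefix}-{i:03d}-{uuid.uuid4()}" for i in range(count)]

def prewarm(agent_arn, count=50):
    import boto3  # imported here so the ID helper needs no AWS deps
    client = boto3.client("bedrock-agentcore")
    session_ids = prewarm_session_ids(count)
    for session_id in session_ids:      # each invoke boots one microVM
        client.invoke_agent_runtime(
            agentRuntimeArn=agent_arn,
            runtimeSessionId=session_id,
            payload=json.dumps({"prompt": "ping"}).encode("utf-8"),
        )
    return session_ids  # hand these out to real users at 9:00
```

The returned IDs are the important part: your dispatcher has to route real 9 AM traffic to these exact session IDs, or the users land on cold microVMs anyway.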

Strategy 3: Reuse Session IDs (The Intended Model)

Same session_id = same microVM (if still alive)

User A's first request  → new microVM (cold start)
User A's second request → SAME microVM (warm!)

import json
import boto3

agentcore_client = boto3.client("bedrock-agentcore")  # data-plane client

agentcore_client.invoke_agent_runtime(
    agentRuntimeArn=agent_arn,
    runtimeSessionId="user-anuja-session",  # same ID = same microVM
    payload=json.dumps({"prompt": "What is 2+2?"})
)

As long as the user keeps chatting within the idle timeout, every request after the first hits a warm microVM.

Hard Limits From Official Docs

  Limit                                         Default     Adjustable?
  --------------------------------------------  ----------  --------------------
  Active sessions per account (us-east-1)       1,000       Yes (support ticket)
  Active sessions per account (other regions)   500         Yes (support ticket)
  New sessions per minute per endpoint          100         Yes
  Invocations per second per endpoint           50          Yes
  Idle session timeout                          15 minutes  Yes (via API)
  Max session lifetime                          8 hours     No
  Total agents per account                      1,000       Yes
  CPU per session                               2 vCPU      No
  RAM per session                               8 GB        No
  Payload size                                  100 MB      No

Why Lambda Can’t Do What AgentCore Does

For simple agents, Lambda might be enough. AgentCore exists for the things Lambda can’t do:

Problem 1: Time Limit

Lambda:     max 15 minutes → function killed
AgentCore:  max 8 hours

Agent doing research:
  → calls 20 tools
  → each tool waits for API
  → LLM thinks between each step
  → total time: 45 minutes

Lambda:     💥 KILLED at 15 min (halfway through)
AgentCore:  ✅ runs to completion

Problem 2: Stateful Sessions

LAMBDA (stateless — every invocation starts fresh):
  Request 1: "My name is Anuja"  → Lambda boots → responds → DIES
  Request 2: "What's my name?"   → NEW Lambda → no memory of Request 1

  To keep state: save to DynamoDB/S3 between EVERY request,
  then load it back on EVERY new request. YOU build all of this.

AGENTCORE (stateful — same microVM stays alive):
  Request 1: "My name is Anuja"  → microVM boots → responds → STAYS ALIVE
  Request 2: "What's my name?"   → SAME microVM → "Anuja!" → instant

  State lives in memory. No serialization. No DynamoDB. It just works.
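The statefulness above can be shown with a toy handler. This is a deliberately naive sketch, not real agent logic: the handler name, payload shape, and string matching are all made up. The point is that a module-level variable survives between requests because the same microVM serves the whole session.

```python
# Module-level state: lives in the microVM's RAM for the session's lifetime.
conversation_history = []

def handle_request(payload):
    """Stand-in for an agent entrypoint; remembers prior turns in memory."""
    prompt = payload["prompt"]
    conversation_history.append(prompt)   # no DynamoDB, no serialization
    if "what's my name" in prompt.lower():
        # Naive recall over earlier turns in this same session.
        for earlier in conversation_history:
            if earlier.lower().startswith("my name is"):
                return earlier.split("is", 1)[1].strip()
    return f"seen {len(conversation_history)} messages this session"
```

On Lambda, `conversation_history` would be empty on every cold invocation; here it accumulates for up to 8 hours.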

Problem 3: Session Isolation (Security)

LAMBDA (container isolation — shares host OS kernel):
  Container A ──┐
  Container B ──┼── shared Linux kernel ← container escape = see all
  Container C ──┘

  If an agent runs malicious code (LLM hallucinated a bad tool call),
  a container escape could access other users' data.

AGENTCORE (microVM isolation — each session has its OWN kernel):
  microVM A: [own kernel] [own memory] [own filesystem]
  microVM B: [own kernel] [own memory] [own filesystem]

  Even if code escapes the process, it's still inside a VM.
  Hardware-level isolation (KVM), not just software isolation.

Problem 4: Large Payloads

Lambda:     max 6 MB request / 6 MB response
AgentCore:  max 100 MB request / response

Agent analyzing a PDF:
  Lambda:     "Upload to S3 first, pass the S3 URL" → extra complexity
  AgentCore:  send the 50 MB PDF directly in the request → just works
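One way to ride a document inside the request body is base64-in-JSON, sketched below. The field names (`"prompt"`, `"document_b64"`) are my own invention, not an AgentCore contract; note base64 inflates size by about 33%, so roughly 75 MB of raw binary fits under the 100 MB cap.

```python
import base64
import json

def build_payload(prompt, pdf_bytes):
    """Pack a binary document into the JSON request as base64."""
    return json.dumps({
        "prompt": prompt,
        "document_b64": base64.b64encode(pdf_bytes).decode("ascii"),
    }).encode("utf-8")

def extract_document(payload_bytes):
    """Inverse operation, run inside the agent."""
    body = json.loads(payload_bytes)
    return base64.b64decode(body["document_b64"])
```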

Problem 5: Persistent Local State

Lambda:     /tmp is 512 MB, wiped between invocations
            Agent downloads 3 files, processes them across steps.
            Between invocations → files might be gone.

AgentCore:  local filesystem persists for the session (up to 8 hours)
            Agent downloads files → stays on disk → next request uses them
            No S3 round-trips. No state management code.
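A download-once cache becomes trivial when the disk outlives the request. In this sketch, `CACHE_DIR` and the `fetch()` stub are illustrative stand-ins; on AgentCore the cached file from request 1 is still on disk when request 7 arrives in the same session.

```python
import os

CACHE_DIR = "/tmp/agent-cache"

def fetch(url):
    """Stand-in for a real HTTP download."""
    return f"contents of {url}".encode("utf-8")

def get_cached(url):
    """Download a file once per session; serve later reads from local disk."""
    os.makedirs(CACHE_DIR, exist_ok=True)
    path = os.path.join(CACHE_DIR, url.replace("/", "_"))
    if not os.path.exists(path):          # first request pays the download
        with open(path, "wb") as f:
            f.write(fetch(url))
    with open(path, "rb") as f:           # later requests hit local disk
        return f.read()
```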

Problem 6: Streaming

Lambda:     streaming exists, but it's awkward (response streaming via function URLs)
AgentCore:  SSE streaming built-in, works with agent.stream_async() directly
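The streaming shape boils down to an async generator: yielded chunks become SSE events to the caller. In this sketch, `stream_tokens()` is a stand-in for a real `agent.stream_async()` call, and the hard-coded tokens are obviously fake.

```python
import asyncio

async def stream_tokens(prompt):
    """Stand-in for agent.stream_async(): yields tokens as they arrive."""
    for token in ["The", " answer", " is", " 4."]:
        await asyncio.sleep(0)   # pretend to wait on the model
        yield token

async def my_agent(payload):
    # Yielding (instead of returning) is what turns the response into a stream.
    async for token in stream_tokens(payload["prompt"]):
        yield token
```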

Side-by-Side Comparison

  Feature                  Lambda                   AgentCore Runtime         ECS/Fargate
  -----------------------  -----------------------  ------------------------  -----------
  Max duration             15 min                   8 hours                   Unlimited
  State between requests   Stateless                Stateful (same microVM)   Stateful
  Isolation                Container                microVM (hardware-level)  Container
  Streaming                Awkward                  Built-in SSE              DIY
  Cold start               ~1-2 s                   ~1-3 s                    30-60 s
  Warm pools               Provisioned concurrency  Not available             Min tasks
  Memory config            128 MB to 10 GB          Fixed 8 GB                Any size
  CPU config               Proportional to memory   Fixed 2 vCPU              Any size
  Scaling control          Full                     Fully managed             Full control
  Payload size             6 MB                     100 MB                    Unlimited
  Identity/Auth            DIY                      Built-in (OAuth, IAM)     DIY
  Session management       DIY (DynamoDB)           Built-in                  DIY
  Agent-specific features  None                     Built-in                  None

When to Use What

USE LAMBDA WHEN:
  ✅ Agent is simple (1-2 tool calls, responds in < 30 seconds)
  ✅ Stateless is fine (each request is independent)
  ✅ Small payloads (text only, < 6 MB)
  ✅ You want full control over scaling
  ✅ You already have Lambda infrastructure
  ✅ Cost optimization is #1 priority (Lambda is cheaper for short tasks)

USE AGENTCORE WHEN:
  ✅ Agent runs long tasks (minutes to hours)
  ✅ Multi-turn conversations (need state between requests)
  ✅ Large files (PDFs, images, datasets > 6 MB)
  ✅ Security-critical (need microVM isolation, not container)
  ✅ Agent acts on behalf of users (need built-in OAuth identity)
  ✅ You don't want to build session management, streaming, auth
  ✅ You want to deploy with 4 lines of code, not manage infrastructure

USE ECS/FARGATE WHEN:
  ✅ You need full control over everything
  ✅ Custom memory/CPU per container
  ✅ Warm pools with min/max task counts
  ✅ Long-running services (always-on, not session-based)
  ✅ You have DevOps team to manage it

The Real Reason AgentCore Exists

WITHOUT AgentCore, to build a production agent you need:

  ┌────────────────────────────────────────────────────────────┐
  │  YOU must build:                                           │
  │                                                            │
  │  State persistence       → S3 + serialize/deserialize      │
  │  Streaming               → API Gateway + WebSocket         │
  │  Auth / Identity         → Cognito + custom middleware     │
  │  Isolation               → Container security hardening    │
  │  Long-running support    → Step Functions or ECS           │
  │  Large payload handling  → S3 pre-signed URLs              │
  │  Health checks           → Custom /ping endpoint           │
  │  Scaling                 → Auto Scaling policies           │
  │  Cleanup                 → Lifecycle hooks                 │
  │                                                            │
  │  = 2-4 weeks of infrastructure work before writing         │
  │    a single line of agent logic                            │
  └────────────────────────────────────────────────────────────┘

WITH AgentCore:

  ┌────────────────────────────────────────────────────────────┐
  │                                                            │
  │  @app.entrypoint                                           │
  │  def my_agent(payload):                                    │
  │      return agent(payload["prompt"])                       │
  │                                                            │
  │  app.run()                                                 │
  │                                                            │
  │  = 4 lines. Deploy. Done.                                  │
  │    Sessions, streaming, auth, isolation — all included.    │
  └────────────────────────────────────────────────────────────┘

Lambda is a general-purpose compute service. You can build agents on it, but you build all the agent infrastructure yourself. AgentCore is an agent-specific compute service — sessions, streaming, isolation, auth, and tool execution are built in. It’s the difference between renting an empty office and signing up for a fully furnished co-working space. Both work. One requires you to buy desks, chairs, internet, and coffee machines first.

Next: Inside an AgentCore microVM — Ports, Cold Starts, and the Sidecar Pattern

Previous: How Firecracker MicroVMs Power AgentCore Runtime — From 125ms Boot to Auto-Scaling AI Agents