What Actually Happens When You Call invoke_agent_runtime()
12th March 2026
You call invoke_agent_runtime(). Your agent responds 3 seconds later. But what actually happened in those 3 seconds? There’s an entire orchestration layer — sidecars, health checks, microVM boot sequences — that you never see. Here’s the full picture.
What invoke_agent_runtime() Actually Does
When you run this code:
import json
import boto3

agentcore_client = boto3.client('bedrock-agentcore', region_name=region)
boto3_response = agentcore_client.invoke_agent_runtime(
    agentRuntimeArn=agent_arn,
    qualifier="DEFAULT",
    payload=json.dumps({"prompt": "What is 2+2?"})
)
You’re making ONE HTTPS request to the AgentCore control plane. That’s it. You never call /ping. You never call /invocations. You call invoke_agent_runtime() and everything else happens behind the scenes.
YOUR CODE                          AGENTCORE (internal)

invoke_agent_runtime() ────────►   route to microVM
                                     │
(you never see /ping                 ├── GET /ping (background)
 or /invocations)                    │     (already running)
                                     │
                                     └── POST /invocations
                                           │
                                           ▼
                                     your @app.entrypoint runs
                                           │
     ◄───────────────────────────── response streams back
boto3_response
One API call from you. AgentCore handles everything else internally.
Cold Start vs Warm Start
The experience differs based on whether a microVM already exists for your session:
COLD START (new microVM):
  1. Boot Firecracker microVM (~125ms)
  2. Start your container
  3. CMD runs: opentelemetry-instrument python -m strands_claude
       ├── OTel collector on :8000
       ├── Sidecar on :9000
       └── Your app on :8080
  4. Sidecar polls /ping until 200   ← ping FIRST
  5. Then forwards your request      ← invoke SECOND

Your invoke_agent_runtime() call BLOCKS during steps 1-4.
You don't see this. You just wait ~3.4 seconds.

WARM START (existing microVM):
  1. Sidecar already pinging /ping every few seconds
  2. Control plane knows microVM is Healthy
  3. Forward your request immediately

Your invoke_agent_runtime() gets response in ~2.5 seconds.
The /ping on cold start is the gate — AgentCore won’t send your request until it confirms your agent is alive and ready. That ~0.9s gap between cold (~3.4s) and warm (~2.5s) is partly this ping-wait loop.
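You can see this gap yourself by timing back-to-back calls in the same session. A minimal sketch (`timed_call` and `compare_cold_warm` are illustrative helpers, not part of any AWS SDK):

```python
import time

def timed_call(fn, *args, **kwargs):
    """Run fn and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start

def compare_cold_warm(invoke_runtime):
    """Time two back-to-back invocations against the same session.

    The first call to a fresh session pays the cold-start cost
    (microVM boot + container start + ping-wait); the second hits
    an already-warm microVM.
    """
    _, cold = timed_call(invoke_runtime)
    _, warm = timed_call(invoke_runtime)
    return cold, warm
```

Pass a zero-argument wrapper around your own invoke_agent_runtime() call; exact latencies will vary by region and model.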
The Sidecar: An Invisible Helper You Never Installed
Every AgentCore microVM has a sidecar process. You didn’t write it. You didn’t install it. You don’t control it. AWS injects it at boot time alongside your container.
INSIDE YOUR microVM
┌─────────────────────────┐ ┌─────────────────────────┐
│ YOUR APP (:8080) │ │ SIDECAR (:9000) │
│ ← your Dockerfile │ │ ← AWS injected this │
│ ← strands_claude.py │ │ ← not in your image │
│ ← your agent + tools │ │ ← you don't see it │
│ │ │ │
│ Knows: how to answer │ │ Knows: how to talk to │
│ questions │ │ AgentCore control plane │
└─────────────────────────┘ └─────────────────────────┘
┌─────────────────────────────────────────────────────────┐
│ OTel COLLECTOR (:8000) ← also injected by AWS │
└─────────────────────────────────────────────────────────┘
The name comes from a motorcycle sidecar: the motorcycle (your app) does the real work, the sidecar (attached helper) handles logistics. Your code doesn’t need a single line about AgentCore infrastructure. The sidecar handles all the integration for you.
The 6 Jobs of the Sidecar
Job 1: Receive Requests From Outside
AgentCore’s control plane can’t talk to your app’s :8080 directly. The sidecar on :9000 is the door into your microVM. It receives the request from the control plane and forwards it to your app.
Job 2: Health Checks
Every few seconds, the sidecar pings your app:
Sidecar: GET http://localhost:8080/ping
App: {"status": "Healthy"}
Sidecar → tells control plane: "this VM is alive"
If /ping fails:
Sidecar → tells control plane: "this VM is DEAD"
Control plane → terminates microVM
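That poll-until-healthy gate (the same loop that blocks cold starts in step 4 above) can be sketched generically. This is an illustrative reconstruction, not AWS's actual sidecar code; `probe` stands in for GET http://localhost:8080/ping:

```python
import time

def wait_until_healthy(probe, timeout=30.0, interval=0.1):
    """Poll probe() until it reports healthy, or give up.

    probe: callable returning True when /ping answers 200.
    Returns True once healthy; returns False if the timeout
    elapses (at which point the control plane would mark the
    microVM dead and terminate it).
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if probe():
            return True
        time.sleep(interval)
    return False
```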
Job 3: Inject Request Context
When a request arrives, the sidecar adds headers before forwarding to :8080:
Incoming from control plane:
session_id: "abc-123"
Sidecar ADDS headers:
X-Session-Id: abc-123
X-Request-Id: uuid-456
X-Access-Token: <agent identity token>
Your app reads these via RequestContext:
context.session_id → "abc-123"
You didn’t parse any of this. The sidecar did it for you.
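Conceptually, this job amounts to enriching the forwarded request with context headers. A hedged sketch using the header names from the example above (the request-id generation and token value are placeholders, not AWS internals):

```python
import uuid

def inject_context(headers, session_id, access_token=None):
    """Return a copy of headers with AgentCore-style context added,
    as the sidecar does before forwarding a request to :8080."""
    enriched = dict(headers)
    enriched["X-Session-Id"] = session_id
    enriched["X-Request-Id"] = str(uuid.uuid4())  # placeholder scheme
    if access_token is not None:
        enriched["X-Access-Token"] = access_token
    return enriched
```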
Job 4: Lifecycle Management
The sidecar continuously checks: Has the idle timeout been reached? Has maxLifetime been exceeded? If idle timeout hits, the sidecar triggers graceful shutdown and terminates the microVM. Your app doesn’t have a single line about timeouts.
Job 5: Stream Responses Back
Your app returns an SSE stream from :8080. The sidecar receives the stream, relays it through :9000 back to the AgentCore control plane, which streams it to your boto3 client. The full path:
:8080 → :9000 → AgentCore control plane → boto3 → you
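On your end of that path, the stream arrives as standard server-sent events. A minimal sketch of extracting SSE "data:" payloads from a line stream (generic SSE framing, nothing AgentCore-specific):

```python
def iter_sse_data(lines):
    """Yield the payload of each SSE 'data:' line from an
    iterable of text lines; other lines pass through untouched,
    mirroring the sidecar's pass-through relay."""
    for line in lines:
        line = line.rstrip("\r\n")
        if line.startswith("data:"):
            yield line[len("data:"):].strip()
```

For example, `list(iter_sse_data(["data: hello", "", "data: world"]))` returns `["hello", "world"]`.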
Job 6: Agent Identity (OAuth Tokens)
If your agent needs to access external services (Slack, GitHub, etc.) on behalf of a user, the sidecar injects OAuth tokens into the request. Your app reads them via BedrockAgentCoreContext.get_workload_access_token(). You didn’t implement OAuth. The sidecar brought the token from the AgentCore Identity service.
Where Does the Sidecar Actually Live?
The sidecar sits INSIDE the microVM — on the AgentCore side. Not on your laptop. Not in your code. Not in your Docker image.
YOUR LAPTOP (local):
└── test_warm_pools.py
└── agentcore_client.invoke_agent_runtime()
│
│ HTTPS request over internet
▼
AWS CLOUD:
├── AgentCore Control Plane ← managed by AWS, routes requests
├── ECR ← stores your Docker image
└── Firecracker microVM ← runs your container
├── YOUR APP (:8080) ← from your Docker image
├── SIDECAR (:9000) ← injected by AWS at boot time
└── OTel (:8000) ← injected by AWS at boot time
When Does the Sidecar Get Added?
When you deploy, your Docker image gets pushed to ECR. It contains your Python runtime, your dependencies, and your agent code. It does NOT contain the sidecar.
When AgentCore boots a microVM for a new session:
Step 1: Create Firecracker microVM
Step 2: Load your container image from ECR
Step 3: INJECT sidecar process ← AWS adds this
Step 4: INJECT OTel collector ← AWS adds this
Step 5: Start everything
Step 6: Sidecar starts pinging :8080/ping
Step 7: Ready for requests
It’s the same pattern used everywhere in cloud infrastructure:
Istio: Envoy sidecar → traffic routing, observability, security
Linkerd: proxy sidecar → service mesh traffic, mTLS
AWS App Mesh: Envoy sidecar → service discovery, traffic routing
AgentCore: AWS sidecar → health, auth, routing, lifecycle, streaming
Same principle everywhere: your app stays simple, the sidecar handles infrastructure. Your app doesn’t change when AWS upgrades the sidecar. Your app is portable — it works with or without the sidecar.
With vs Without a Sidecar
Without the sidecar, you’d need to build all of this yourself:
WITHOUT sidecar (you do everything):
your_app.py:
├── agent logic (tools, LLM calls)
├── health check endpoint
├── auth token management
├── session tracking
├── idle timeout logic
├── graceful shutdown
├── metrics collection
├── streaming protocol
= your code
= you maintain it
= breaks when AgentCore changes
WITH sidecar (separation of concerns):
your_app.py:
├── agent logic (tools, LLM calls)
└── @app.entrypoint ← that's it
sidecar (AWS maintains):
├── everything else
= 30 lines of your code
= AWS maintains the rest
= upgrades happen without you changing anything
That’s the sidecar. An invisible helper process that handles all the AgentCore plumbing so your agent code stays clean. And invoke_agent_runtime()? It’s one API call. The entire orchestration — boot, ping, route, stream — happens on AWS’s side, invisible to you.