AgentCore Harness, Inside Out

24th April 2026

What’s actually running when AWS says “declarative agents” — and when it’s the right tool.

The one-line summary

AgentCore Harness is an agentic CLI (Kiro / Claude Code / Codex) as a managed service — a single Strands agent running in a per-session Firecracker microVM, extended by config instead of code.

If that sentence makes sense to you, skip to the architecture section. If not, the rest of this post earns it.

Why I went looking

AWS launched a new thing in preview called the AgentCore Harness. The marketing says “declare your agent in a config file and AWS handles the rest.” That’s both a big claim and a vague one.

So I deployed one in my own account, poked at the live microVM it spun up, read the CLI source, and tried to figure out:

What is it, really?
How is it different from AgentCore Runtime and from Strands?
What’s running under the hood?
Does it support multi-agent patterns?
What are the honest use cases worth building around?

This post is the compressed answer.

The three layers (the confusion starts here)

The Bedrock AgentCore family has three overlapping offerings. If you don’t separate them, nothing makes sense.

	Strands Agents	AgentCore Runtime	AgentCore Harness
What it is	Open-source Python/TS SDK	Managed compute to host an agent	Fully managed agent service
You write	Python — tools, loop, prompt	Agent code in any framework	A JSON config
Who runs it	You, anywhere	AWS — microVM per session	AWS — same microVM + wired-in primitives
Framework support	—	Strands, LangChain, LangGraph, Google ADK, OpenAI Agents	Strands only (pre-wired)
Analogy	The library	EC2 for agents — BYO binary	SaaS agent — BYO prompt

Rule of thumb:

Don’t want to write agent code → Harness.
Already wrote agent code, need AWS to run it at scale → Runtime.
Want maximum control and portability → Strands directly.

Deploying one in ten minutes

Less hand-waving — here’s the actual sequence that stood up a working harness in my account.

# Install the CLI
npm install -g @aws/agentcore@preview

# Scaffold a project
mkdir myresearchagent && cd myresearchagent
agentcore create --name myresearchagent --model-provider bedrock

# Add a deploy target (one-time)
cat > agentcore/aws-targets.json <<'EOF'
[{"name":"default","account":"xxxx","region":"us-east-1"}]
EOF

# Ship it
agentcore deploy -y -v

Six CloudFormation resources later, here is what matters:

Resource	Detail
Harness	`arn:aws:bedrock-agentcore:...:harness/myresearchagent-2YmsTKvYKu`
Runtime (behind it)	`arn:aws:bedrock-agentcore:...:runtime/harness_myresearchagent-4xB9Dy6iHF`
Memory	SEMANTIC + USER_PREFERENCE + SUMMARIZATION + EPISODIC
IAM execution role	least-privilege, auto-generated
CFN stack	`AgentCore-myresearchagent-default`

First invocation:

$ agentcore invoke --harness myresearchagent \
    --session-id "$(uuidgen)$(uuidgen)" \
    "In one sentence: what are you, which model, what year?"

Tool: shell          ← the agent auto-ran `date`
1025 in · 36 out · 1.7s

"I am Claude, an AI assistant made by Anthropic, running as Claude 3.5
 Sonnet, and the current year is 2026."

Look at that Tool: shell line. With zero config, the agent already had a real shell and a real filesystem. It ran date to avoid hallucinating the year. That behavior is only possible because a sandbox was there — and that sandbox is the actual product.
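The `--session-id` in the invocation above is two UUIDs glued together. If you call the harness from code instead of the shell, a minimal Python equivalent could look like the sketch below. `new_session_id` is my own helper name, not part of any SDK.

```python
import uuid


def new_session_id() -> str:
    """Concatenate two random UUIDs, mirroring "$(uuidgen)$(uuidgen)"."""
    return f"{uuid.uuid4()}{uuid.uuid4()}"


session_id = new_session_id()
print(len(session_id))  # two 36-character UUID strings back to back
```

Keep the value around if you want later calls to land in the same session; generate a new one when you want a clean sandbox.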

What’s actually running inside

I used agentcore invoke --exec to poke at the running container:

$ agentcore invoke --harness myresearchagent --exec "uname -a"
Linux localhost 6.1.158-15.288.amzn2023.aarch64 ...

$ agentcore invoke --harness myresearchagent --exec \
    "python3 -c 'import pkg_resources; [print(d) for d in pkg_resources.working_set]'"
bedrock-agentcore==1.4.8
strands-agents==1.35.0
strands-agents-tools==0.4.0
opentelemetry-instrumentation-...

That one result settles the biggest question:

The harness is Strands under the hood.

bedrock-agentcore is a thin AWS wrapper; strands-agents is the actual agent loop; strands-agents-tools supplies shell and file_operations as always-on defaults.
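The one-liner I ran uses `pkg_resources`, which setuptools has deprecated. If you want to repeat the check on a newer image, a sketch with the standard-library `importlib.metadata` does the same job. The helper name `installed_versions` is mine; the package names are the ones from the output above.

```python
from importlib import metadata


def installed_versions(names):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    wanted = ["bedrock-agentcore", "strands-agents", "strands-agents-tools"]
    for name, version in installed_versions(wanted).items():
        print(f"{name}=={version}" if version else f"{name} (not installed)")
```

Saved to a file, this can be run through the same `--exec` path to confirm what a given harness image actually ships.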

End-to-end request flow

YOUR SIDE
  agentcore invoke → boto3 client → HTTP (SigV4 / CUSTOM_JWT)
                    { harnessArn, sessionId, actorId, msg }
                               |
=========================== AWS managed ==============================
                               v
                   AgentCore control plane
                   (auth, routing, quota, sessions)
                         /              \
              existing  /                \  new
              session  /                  \ session
                      v                    v
            resume warm microVM    spin up Firecracker microVM
                              \   /
                               v
  Firecracker microVM (Amazon Linux 2023, Python 3.10, arm64)

    bedrock-agentcore (entrypoint)
      reads:  harness.json, system-prompt.md, skills/*/SKILL.md
      builds: Strands Agent(model, tools, skills, memory, truncation)

    Strands agent loop
        LLM → "call tool X" → dispatch
         ^                             |
         +------- observation ---------+

    Tools available to the loop:
      shell (VM)  ·  files (VM)  ·  browser (remote)
      code interp (remote)  ·  remote MCP

    Always-wired data planes:
      AgentCore Memory (4 strategies, namespaced per user)
      OpenTelemetry → CloudWatch / X-Ray
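The loop in the middle of that diagram (LLM asks for a tool, the tool runs, the observation goes back in) is small enough to sketch. This is a toy, not Strands code: `run_loop`, `scripted_model`, and the `clock` tool are all made-up names, and the "model" is a scripted function standing in for a real LLM call.

```python
from typing import Callable, Dict, List, Tuple

# A "model" is any function that reads the transcript and returns either
# ("tool", tool_name, argument) or ("final", answer, "").
Step = Tuple[str, str, str]


def run_loop(model: Callable[[List[str]], Step],
             tools: Dict[str, Callable[[str], str]],
             prompt: str,
             max_turns: int = 5) -> str:
    transcript = [f"user: {prompt}"]
    for _ in range(max_turns):
        kind, name, arg = model(transcript)
        if kind == "final":
            return name
        # Dispatch the requested tool and feed the observation back in.
        observation = tools[name](arg)
        transcript.append(f"tool {name}({arg!r}) -> {observation}")
    return "stopped: turn limit reached"


def scripted_model(transcript: List[str]) -> Step:
    """Stand-in for an LLM: ask for the year once, then answer."""
    for line in transcript:
        if line.startswith("tool clock"):
            year = line.rsplit("-> ", 1)[1]
            return ("final", f"The current year is {year}.", "")
    return ("tool", "clock", "year")


tools = {"clock": lambda arg: "2026"}  # stub; the real agent ran `date` via shell
print(run_loop(scripted_model, tools, "What year is it?"))
```

The turn limit is there because a loop like this has no natural stopping point if the model keeps asking for tools.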

Tools and skills — the two extension points

Tools (5 types)

From the live schema:

Type	What it is	When you pick it
`agentcore_browser`	Managed Playwright	web scraping, login-walled sites
`agentcore_code_interpreter`	Sandboxed Python/Node	data analysis, safe code exec
`agentcore_gateway`	Your Gateway routing to Lambdas / APIs / MCP	unified tool surface
`remote_mcp`	External MCP server by URL	Slack, GitHub, Notion, your own
`inline_function`	Declare a schema, Gateway dispatches	small custom callables

Add one in a single command:

agentcore add tool --harness myresearchagent \
  --type agentcore_browser --name browser
agentcore deploy -y

The default tools are always on, even with an empty tools: [] list:

shell — bash execution in the microVM
file_operations — view / str_replace / create / insert

I confirmed this by asking the live agent to list its own tools. It reported those two.

Skills — same format as Claude Skills

Skills in the harness use the Claude Skills spec: Markdown files with progressive disclosure. SKILL.md is always loaded; longer references are pulled in only when the agent needs them.

app/myresearchagent/
  harness.json
  system-prompt.md
  skills/
    legal-contract-review/
      SKILL.md          ← always loaded (~200 words)
      playbook.md       ← loaded on demand
      templates.md      ← loaded on demand
    financial-modeling/
      SKILL.md

---
name: legal-contract-review
description: Use when the user asks to review, redline, or summarize a contract.
---

## When to use
- User uploads a contract PDF or DOC
- User mentions redlining, MSA, SOW, NDA

## Procedure
1. Extract party names, term, renewal, liability cap.
2. Flag unusual clauses against playbook.md.
3. Produce summary table + redline memo.

Wire it into harness.json:

{
  "skills": [
    "skills/legal-contract-review/SKILL.md",
    "skills/financial-modeling/SKILL.md"
  ]
}

Run agentcore deploy -y and the skill ships into the container via an AGENT_SKILLS env var.

Tools vs skills, one line: Tools are things the agent calls (verbs). Skills are procedures it reads to decide when and how to call them (playbooks).
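To make the progressive-disclosure idea concrete, here is a rough sketch of how a loader could treat a skill folder: parse the SKILL.md front matter up front, and read reference files only when a step asks for them. I have not seen the harness's loader, so `parse_skill` and `SkillFolder` are hypothetical, the front matter is parsed as naive `key: value` lines rather than full YAML, and the playbook text is invented.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Skill:
    name: str
    description: str
    body: str


def parse_skill(text: str) -> Skill:
    """Split SKILL.md text into front-matter fields and the Markdown body."""
    lines = text.strip().splitlines()
    if not lines or lines[0].strip() != "---":
        raise ValueError("expected a '---' front-matter block at the top")
    end = lines.index("---", 1)  # closing front-matter delimiter
    fields: Dict[str, str] = {}
    for line in lines[1:end]:
        key, _, value = line.partition(":")
        fields[key.strip()] = value.strip()
    body = "\n".join(lines[end + 1:]).strip()
    return Skill(fields["name"], fields["description"], body)


@dataclass
class SkillFolder:
    """Hypothetical in-memory skill folder: filename -> contents."""
    files: Dict[str, str]
    loaded: List[str] = field(default_factory=list)

    def skill(self) -> Skill:
        # SKILL.md is the part that is always in context.
        return parse_skill(self.files["SKILL.md"])

    def reference(self, filename: str) -> str:
        # Reference files are read only when a step asks for them.
        self.loaded.append(filename)
        return self.files[filename]


folder = SkillFolder({
    "SKILL.md": "---\nname: legal-contract-review\n"
                "description: Use when the user asks to review a contract.\n"
                "---\n\n## Procedure\n1. Flag unusual clauses against playbook.md.\n",
    "playbook.md": "Liability cap below 12 months of fees: flag.",  # invented content
})
skill = folder.skill()
print(skill.name, "|", folder.loaded)  # no reference file has been read yet
folder.reference("playbook.md")
print(folder.loaded)
```

The point of the split is context budget: the short SKILL.md costs tokens on every turn, the long playbook only on the turns that need it.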

The hidden value (the bit not in the marketing)

After digging in, here’s what the harness actually gives you that’s hard to replicate.

1. Per-session microVM with a real filesystem

Most agent frameworks hold no state between calls. The harness gives each session a live Linux sandbox where the agent can write files, pip install packages, run shell commands, and keep state for up to 8 hours.

This is the same primitive behind every agentic CLI (Kiro, Claude Code, Codex), except those run on your laptop. The harness puts that sandbox in the cloud: per user, isolated, billable, and inside your AWS account. Running Firecracker microVMs at per-session granularity is serious plumbing that you cannot easily replicate yourself.

2. Direct execution = real token savings

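The shell tool runs in the microVM, not through another model call. For deterministic steps (ls, grep, curl, pandas), the execution itself costs no LLM tokens; the model pays only for the tool call and for whatever output comes back into its context. Over a long session that can add up to a cost reduction on the order of 30–60% versus a naive ReAct loop that pushes the same work through the model.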
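A back-of-the-envelope sketch shows where the saving comes from. Everything here is invented for illustration: the log file, the four-characters-per-token heuristic, and the `app.log` filename. It compares pasting a whole file into the prompt against letting a shell-style filter run first, and it says nothing about the real-world percentage.

```python
def estimate_tokens(text: str) -> int:
    """Crude heuristic (assumption): about four characters per token."""
    return max(1, len(text) // 4)


# Hypothetical log file: 1,000 lines, 10 of which are errors.
log_lines = [
    f"2026-04-24 12:00:{i % 60:02d} {'ERROR' if i % 100 == 0 else 'INFO'} request {i}"
    for i in range(1000)
]
log_text = "\n".join(log_lines)

# Option A: paste the whole file into the prompt and ask the model to find errors.
tokens_in_context = estimate_tokens(log_text)

# Option B: the shell tool filters in the VM; only matching lines reach the model.
matches = [line for line in log_lines if "ERROR" in line]  # what grep would return
tokens_via_tool = (
    estimate_tokens("grep ERROR app.log") + estimate_tokens("\n".join(matches))
)

print(tokens_in_context, tokens_via_tool)
```

The gap depends entirely on how selective the command is; a tool call that dumps the full file back into context saves nothing.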

3. Memory that would take weeks to build

Four strategies wired in — SEMANTIC, USER_PREFERENCE, SUMMARIZATION, EPISODIC — with /{actorId}/{sessionId} namespacing. That namespacing is the multi-tenant story for free.
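Here is a toy model of what that namespacing buys you, assuming nothing about the real Memory API beyond the `/{actorId}/{sessionId}` shape. `ToyMemory` is an in-memory stand-in I made up; it only shows that one actor's records are reachable across their own sessions and invisible to everyone else.

```python
from collections import defaultdict
from typing import Dict, List


def namespace(actor_id: str, session_id: str) -> str:
    """Build a namespace in the /{actorId}/{sessionId} shape."""
    for part in (actor_id, session_id):
        if not part or "/" in part:
            raise ValueError(f"invalid namespace component: {part!r}")
    return f"/{actor_id}/{session_id}"


class ToyMemory:
    """In-memory stand-in; not the AgentCore Memory API."""

    def __init__(self) -> None:
        self._records: Dict[str, List[str]] = defaultdict(list)

    def write(self, actor_id: str, session_id: str, text: str) -> None:
        self._records[namespace(actor_id, session_id)].append(text)

    def read_actor(self, actor_id: str) -> List[str]:
        """Everything one actor stored, across all of their sessions."""
        prefix = f"/{actor_id}/"
        return [text
                for ns, texts in self._records.items()
                if ns.startswith(prefix)
                for text in texts]


memory = ToyMemory()
memory.write("alice", "s1", "prefers tables over prose")
memory.write("alice", "s2", "works on the Q3 forecast")
memory.write("bob", "s1", "prefers bullet points")
print(memory.read_actor("alice"))
```

Rejecting a `/` inside an ID matters more than it looks: without it, a crafted actor ID could read another tenant's prefix.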

4. Isolation boundary is the enterprise story

Per-session microVM means user A’s scratchpad cannot leak into user B’s. Regulated industries (healthcare, finance, government) pay a premium for this property.

5. Config-as-audit-trail

A compliance reviewer sees a 12-line JSON, not 4000 lines of Python. That’s a real procurement unlock.

6. Model swap at invoke time

agentcore invoke --harness myresearchagent \
  --model-id "anthropic.claude-3-5-haiku-20241022" "..."

A/B test Claude vs Gemini vs Nova per request without redeploying.
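If you do A/B test this way, you probably want each session to stay on one model for its whole life. A minimal sketch of deterministic assignment, assuming you pick the arm in your own client code before calling invoke: the model IDs below are placeholders, and `pick_model` is a hypothetical helper.

```python
import hashlib

# Placeholder IDs for illustration; substitute model IDs your account can use.
MODEL_A = "model-a-placeholder"
MODEL_B = "model-b-placeholder"


def pick_model(session_id: str, share_b: float = 0.5) -> str:
    """Deterministically assign a session to arm A or arm B."""
    digest = hashlib.sha256(session_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") / 2**64  # uniform in [0, 1)
    return MODEL_B if bucket < share_b else MODEL_A


print(pick_model("example-session-id"))
```

The returned value is what you would pass as `--model-id`. Hashing the session ID instead of calling a random generator means a retry or a resumed session gets the same arm.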

The value prop, compressed

Without AgentCore Harness	With AgentCore Harness
pick a framework	declare `harness.json`
write agent loop	(Strands is pre-wired)
wire up tools	5 built-in types, add by CLI
build memory (vectors + TTL + namespacing + extraction)	4 strategies, namespaced, managed
build session sandbox	Firecracker microVM per session
build identity (IAM / JWT)	IAM + CUSTOM_JWT built in
build observability	OTel → CloudWatch automatic
build multi-tenant isolation	microVM = hard isolation by default
deploy Docker + Lambda + API GW	`agentcore deploy -y`
~4–8 weeks	~10 minutes

Multi-agent patterns — what works, what doesn’t

Everyone’s first question: “Can I do LangGraph / agent-as-tool / multi-agent with this?”

Honest answer: supervisor-with-sub-agents works great. Graphs with conditional edges and loops don’t — you drop down to Runtime for those.

Why multi-agent works at all in the harness

The runtime supports four protocol modes:

ProtocolMode = 'HTTP' | 'MCP' | 'A2A' | 'AGUI'
                         |       |
                         |       +-> Google's Agent-to-Agent standard
                         +---------> every harness is reachable as MCP

So any harness can be called by any other harness — via MCP or A2A. That’s enough for supervisor topologies.

Pattern: Supervisor + workers (works)

Client
  |
  v
SUPERVISOR harness
  system: "delegate"
  tools:
   · remote_mcp → worker1  — MCP →  RESEARCHER harness
   · remote_mcp → worker2  — MCP →  DRAFTER harness
   · agentcore_gateway      —    →   REVIEWER Lambda

Each worker: own microVM, own memory, own skills.

Wiring is pure config:

agentcore add harness --name supervisor
agentcore add harness --name worker_research
agentcore add harness --name worker_drafter
agentcore deploy -y

# Get each worker's MCP URL from `agentcore status --json`
agentcore add tool --harness supervisor --type remote_mcp --name research \
  --url "<worker_research-mcp-url>"
agentcore add tool --harness supervisor --type remote_mcp --name draft \
  --url "<worker_drafter-mcp-url>"
agentcore deploy -y

Add a skill describing the delegation playbook, and you have a real supervisor-workers system without writing a line of Python.
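Stripped of the infrastructure, the topology is just a dispatcher calling named workers that each keep their own state. The sketch below models that shape in plain Python so the data flow is easy to see. `Worker` and `Supervisor` are illustrative classes, the handlers are stubs, and the delegation order is hard-coded where a real supervisor would let the model choose.

```python
from typing import Callable, Dict, List


class Worker:
    """Stand-in for a worker harness: private memory, one entry point."""

    def __init__(self, name: str, handler: Callable[[str], str]) -> None:
        self.name = name
        self.handler = handler
        self.memory: List[str] = []  # not shared with other workers

    def call(self, task: str) -> str:
        self.memory.append(task)
        return self.handler(task)


class Supervisor:
    def __init__(self, workers: Dict[str, Worker]) -> None:
        self.workers = workers

    def run(self, topic: str) -> str:
        # Fixed delegation order; a real supervisor would let the model decide.
        notes = self.workers["research"].call(f"research: {topic}")
        return self.workers["draft"].call(f"draft from: {notes}")


supervisor = Supervisor({
    "research": Worker("research", lambda task: f"notes({task})"),
    "draft": Worker("draft", lambda task: f"memo({task})"),
})
print(supervisor.run("microVM isolation"))
```

Note that the drafter only ever sees what the supervisor hands it. That is the same property you get from separate microVMs and separate memories: context crosses the boundary only when you pass it explicitly.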

Pattern: Peer-to-peer (A2A)

Agent1  <--A2A-->  Agent2  <--A2A-->  Agent3

Harnesses exposed over the A2A protocol can talk to each other as peers, which suits customer-support simulations, negotiation agents, and debate panels.

What the harness cannot do

Graph / DAG orchestration — conditional edges, cycles, checkpointers. Use LangGraph or Strands Graph on Runtime.
Deterministic workflows with human-in-the-loop — use Step Functions.
Shared state without a store — each harness has its own memory; share via a referenced Memory ARN or an external store.

The decision tree

Shape	Use
One agent with tools?	Harness.
Supervisor + workers (≤ 5)?	Multiple harnesses wired via MCP / Gateway / A2A.
Peer negotiation?	Multiple harnesses on A2A.
True graph with branches+loops?	Runtime + LangGraph/Strands Graph.
Deterministic pipeline?	Step Functions.

The hybrid that real systems converge to

Client
   |
   v
Runtime (LangGraph or Strands Graph)
   state machine / DAG with branches, loops, retries
        |           |           |            |
        v           v           v            v
   call harness  call harness  call Lambda  call API
    (researcher)  (drafter)   (deterministic)

Runtime = the brain, harnesses = the specialists

Runtime runs the graph. Harnesses are the nodes that need isolation + memory + skills. Deterministic steps are plain Lambdas.
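What "runs the graph" means in practice is a state machine with conditional edges and cycles, which is the part a single harness loop does not give you. Here is a minimal sketch in plain Python, with no LangGraph or Strands Graph involved. The node functions are stubs standing in for harness or Lambda calls, and the "approve on the second draft" rule is invented to force one trip around the loop.

```python
from typing import Callable, Dict

State = Dict[str, object]


def research(state: State) -> str:
    state["notes"] = f"notes on {state['topic']}"
    return "draft"


def draft(state: State) -> str:
    state["attempts"] = int(state.get("attempts", 0)) + 1
    state["memo"] = f"memo v{state['attempts']} from {state['notes']}"
    return "review"


def review(state: State) -> str:
    # Conditional edge: send the memo back to draft until the second attempt.
    return "done" if state["attempts"] >= 2 else "draft"


NODES: Dict[str, Callable[[State], str]] = {
    "research": research,
    "draft": draft,
    "review": review,
}


def run_graph(state: State, start: str = "research", max_steps: int = 20) -> State:
    node = start
    for _ in range(max_steps):
        if node == "done":
            return state
        node = NODES[node](state)  # each node returns the name of the next node
    raise RuntimeError("step limit reached; possible infinite loop")


final = run_graph({"topic": "liability caps"})
print(final["memo"])
```

In the hybrid setup, `research` and `draft` would each call out to a harness, while `review` could be a plain function or a Lambda. The step limit plays the role of a guard against a reviewer that never approves.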

Is this basically an agentic CLI (Kiro / Claude Code / Codex)?

Pretty much. The isomorphism across the whole category is striking:

Kiro-cli / Claude Code / Codex (on your laptop)	AgentCore Harness (cloud)
single agent loop	single Strands loop
shell + file editor tools	shell + file_operations tools (same!)
your local FS	per-session microVM FS
you approve tool calls	IAM / policy approves
MCP for external tools	MCP for external tools
SKILL.md (Claude Skills spec)	SKILL.md (same format!)
spawn subagents via Agent / Task	spawn subagents via A2A / MCP / Gateway
runs model against a provider API	runs loop in microVM → Bedrock / OpenAI / Gemini

All three mainstream agentic CLIs — AWS’s Kiro-cli, Anthropic’s Claude Code, OpenAI’s Codex — converge on the same architecture: a single-agent loop with shell + file tools, MCP for extensions, markdown skills for procedures, subagents for delegation. The harness is that architecture packaged as a managed enterprise service: same mental model, same primitives, different operational surface.

If you’ve been productive in any of those CLIs, you’ll be productive in the harness. If you’ve built skills and MCP servers for one of them, they port over with minimal change.

Business use cases that actually earn their keep

Forget “build an AI agent” as a product. Here are the seven wedges where the harness specifically is the unlock, not generic LLMs.

1. Per-tenant AI Data Analyst (SaaS)

Upload CSV/DB → chat with an analyst. Each tenant gets an isolated microVM; the agent runs pandas directly in the VM. Compliance-friendly isolation OpenAI’s API can’t match.
Pricing: $200–$2K/mo/seat.

2. Regulated-Industry Research Copilot

Legal / medical / financial research agent with full audit trail. microVM isolation + CloudWatch traces + IAM + config-as-code = SOC2/HIPAA story pre-built. "We deploy in your AWS account" is a procurement love letter.
Pricing: $10K–$100K/yr/org.

3. Agentic Browser Automation (vertical Zapier)

“Reconcile my Stripe + QuickBooks every morning.” Agent logs in, navigates, files reports. Built-in browser tool + persistent session + credential vault. Competitors rebuilt this infra; you rent it.
Pricing: $50–$500/mo.

4. Support Agent With Cross-Session Memory

Customer support agent that remembers the last six months of tickets. Episodic + summarization memory, per-user actorId namespacing. Intercom/Zendesk AI is amnesiac by comparison.
Pricing: $0.10–$1/conversation or $X/seat.

5. Per-Employee Work Copilot

Every rep / CSM / analyst gets a long-lived agent that learns their style, remembers accounts, writes in their voice. User-preference memory + per-user isolation.
Pricing: $50–$200/seat/month.

6. Sandbox-as-a-Service for Untrusted Code

“Let your LLM run arbitrary generated code safely.” microVM is the sandbox. Competitors: E2B, Modal, Daytona. Harness = AWS-native alternative.
Pricing: per-session compute.

7. Vertical Artifact-Generating Agents

Contract review → redlined PDF. 10-K analyst → DCF memo. Claims → decision brief. Long sessions + filesystem = agent builds intermediate artifacts while it reasons.
Pricing: $500–$5K/seat — premium.

The meta-insight

The product isn’t “an agent.” The product is one of:

Isolation (regulated buyers pay for this)
Memory across time (retention = stickiness = LTV)
Persistent sandbox (agents that do, not just chat)
Config-as-audit (enterprise procurement unlock)

The harness gives you all four for free. Your job is to pick a vertical and wrap it in a UI + data connectors.

When NOT to use the harness

Be honest with yourself:

Stateless Q&A chatbot — you’re paying for a microVM you don’t use. Use Bedrock directly.
Deterministic pipelines — Step Functions + Lambda is 10× cheaper.
You need cloud portability (the harness is AWS-locked, even though the model behind it can be swapped).
You want to own the agent loop — Strands on Runtime gives you that; the harness hides it.
Voice agents with bidirectional streaming — that’s Runtime territory; the harness is request/response-shaped.
Consumer $10/mo product — the per-session microVM cost structure is wrong for that tier.

The playbook

If you’re evaluating this for a real project:

Deploy a hello-world harness (10 min). Understand the deploy loop.
Invoke with --exec to confirm what’s in the microVM. Trust by inspection.
Add one tool — pick agentcore_browser or a remote_mcp — and redeploy. Understand extension.
Write one skill — a real procedure, not a toy. Observe the agent picking it up.
Ask the disqualifier questions — does my topology need graphs? streaming voice? determinism? If yes to any, reach for Runtime.
Pick a vertical wedge — isolation, memory, sandbox, or config-as-audit. Build around the one your market actually pays for.

Closing

The harness is not “yet another agent framework.” It’s an opinionated bundle of the infrastructure you were going to build anyway — microVM, memory, identity, tools, observability — with Strands wired in as the loop and config as your only surface.

For the 60% of use cases that are “a single agent with tools and memory,” it’s the fastest path from zero to production I’ve seen on AWS.

For the complex 20% (graphs, loops, bespoke orchestration), it becomes a building block inside a larger Runtime-driven system.

For the remaining 20% (deterministic, stateless, portable), it’s the wrong tool — and that’s fine.

Pick the wedge. Ship the MVP. Let AWS carry the plumbing.

Posted 24th April 2026 at 5:19 pm

Akshay Parkhi's Weblog