AgentCore Harness, Inside Out
24th April 2026
What’s actually running when AWS says “declarative agents” — and when it’s the right tool.
The one-line summary
AgentCore Harness is an agentic CLI (Kiro / Claude Code / Codex) as a managed service — a single Strands agent running in a per-session Firecracker microVM, extended by config instead of code.
If that sentence makes sense to you, skip to the architecture section. If not, the rest of this post earns it.
Why I went looking
AWS launched a new thing in preview called the AgentCore Harness. The marketing says “declare your agent in a config file and AWS handles the rest.” That’s both a big claim and a vague one.
So I deployed one in my own account, poked at the live microVM it spun up, read the CLI source, and tried to figure out:
- What is it, really?
- How is it different from AgentCore Runtime and from Strands?
- What’s running under the hood?
- Does it support multi-agent patterns?
- What are the honest use cases worth building around?
This post is the compressed answer.
The three layers (the confusion starts here)
The Bedrock AgentCore family has three overlapping offerings. If you don’t separate them, nothing makes sense.
| | Strands Agents | AgentCore Runtime | AgentCore Harness |
|---|---|---|---|
| What it is | Open-source Python/TS SDK | Managed compute to host an agent | Fully managed agent service |
| You write | Python — tools, loop, prompt | Agent code in any framework | A JSON config |
| Who runs it | You, anywhere | AWS — microVM per session | AWS — same microVM + wired-in primitives |
| Framework support | — | Strands, LangChain, LangGraph, Google ADK, OpenAI Agents | Strands only (pre-wired) |
| Analogy | The library | EC2 for agents — BYO binary | SaaS agent — BYO prompt |
Rule of thumb:
- Don’t want to write agent code → Harness.
- Already wrote agent code, need AWS to run it at scale → Runtime.
- Want maximum control and portability → Strands directly.
Deploying one in ten minutes
Less hand-waving — here’s the actual sequence that stood up a working harness in my account.
# Install the CLI
npm install -g @aws/agentcore@preview
# Scaffold a project
mkdir myresearchagent && cd myresearchagent
agentcore create --name myresearchagent --model-provider bedrock
# Add a deploy target (one-time)
cat > agentcore/aws-targets.json <<'EOF'
[{"name":"default","account":"xxxx","region":"us-east-1"}]
EOF
# Ship it
agentcore deploy -y -v
Six CloudFormation resources later:
| Resource | Detail |
|---|---|
| Harness | arn:aws:bedrock-agentcore:...:harness/myresearchagent-2YmsTKvYKu |
| Runtime (behind it) | arn:aws:bedrock-agentcore:...:runtime/harness_myresearchagent-4xB9Dy6iHF |
| Memory | SEMANTIC + USER_PREFERENCE + SUMMARIZATION + EPISODIC |
| IAM execution role | least-priv, auto-generated |
| CFN stack | AgentCore-myresearchagent-default |
First invocation:
$ agentcore invoke --harness myresearchagent \
--session-id "$(uuidgen)$(uuidgen)" \
"In one sentence: what are you, which model, what year?"
Tool: shell ← the agent auto-ran `date`
1025 in · 36 out · 1.7s
"I am Claude, an AI assistant made by Anthropic, running as Claude 3.5
Sonnet, and the current year is 2026."
Look at that `Tool: shell` line. With zero config, the agent already had a real shell and a real filesystem. It ran `date` to avoid hallucinating the year. That behavior is only possible because a sandbox was there, and that sandbox is the actual product.
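The doubled `uuidgen` in the invocation above yields a 72-character session ID, presumably to clear a minimum-length check on session IDs (an assumption; the exact limit is not stated here). A minimal Python equivalent, where `new_session_id` is a hypothetical helper name:

```python
import uuid


def new_session_id() -> str:
    """Concatenate two random UUIDs, mirroring "$(uuidgen)$(uuidgen)"."""
    return f"{uuid.uuid4()}{uuid.uuid4()}"


session_id = new_session_id()
print(len(session_id))  # 72: two 36-character UUID strings
```

Reusing the same ID on later calls is what lets the service route you back to the same session.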
What’s actually running inside
I used agentcore invoke --exec to poke at the running container:
$ agentcore invoke --harness myresearchagent --exec "uname -a"
Linux localhost 6.1.158-15.288.amzn2023.aarch64 ...
$ agentcore invoke --harness myresearchagent --exec \
"python3 -c 'import pkg_resources; [print(d) for d in pkg_resources.working_set]'"
bedrock-agentcore==1.4.8
strands-agents==1.35.0
strands-agents-tools==0.4.0
opentelemetry-instrumentation-...
That one result settles the biggest question:
The harness is Strands under the hood.
bedrock-agentcore is a thin AWS wrapper; strands-agents is the actual agent loop; strands-agents-tools supplies shell and file_operations as always-on defaults.
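`pkg_resources` is deprecated in recent setuptools releases, so the same inspection may eventually need the standard-library `importlib.metadata` instead. A sketch of that variant, where `installed_versions` is a hypothetical helper and the prefixes are simply the package names listed above:

```python
from importlib import metadata


def installed_versions(prefixes: tuple) -> dict:
    """Return {distribution name: version} for names starting with any prefix."""
    found = {}
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        if name and name.lower().startswith(prefixes):
            found[name] = dist.version
    return dict(sorted(found.items()))


if __name__ == "__main__":
    wanted = ("bedrock-agentcore", "strands-agents")
    for name, version in installed_versions(wanted).items():
        print(f"{name}=={version}")
```

Saved as a file inside the microVM, this could be run through the same `--exec` route; on a machine without those packages it prints nothing.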
End-to-end request flow
YOUR SIDE
  agentcore invoke → boto3 client → HTTP (SigV4 / CUSTOM_JWT)
  { harnessArn, sessionId, actorId, msg }
                      |
=========================== AWS managed ==============================
                      v
            AgentCore control plane
        (auth, routing, quota, sessions)
               /              \
    existing  /                \  new
    session  /                  \ session
            v                    v
  resume warm microVM    spin up Firecracker microVM
            \                    /
                      v
Firecracker microVM (Amazon Linux 2023, Python 3.10, arm64)
  bedrock-agentcore (entrypoint)
    reads:  harness.json, system-prompt.md, skills/*/SKILL.md
    builds: Strands Agent(model, tools, skills, memory, truncation)
  Strands agent loop
    LLM → "call tool X" → dispatch
     ^                        |
     +------- observation ----+
  Tools available to the loop:
    shell (VM) · files (VM) · browser (remote)
    code interp (remote) · remote MCP
  Always-wired data planes:
    AgentCore Memory (4 strategies, namespaced per user)
    OpenTelemetry → CloudWatch / X-Ray
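The resume-or-create branch at the top of the diagram can be modelled in a few lines. This is a toy in-memory model for intuition only, not AWS's control-plane code; `SessionRouter`, `MicroVM`, and the lifetime constant (taken from the "up to 8 hours" figure in this post) are illustrative names and assumptions.

```python
from dataclasses import dataclass, field

MAX_SESSION_SECONDS = 8 * 60 * 60  # assumed lifetime, from the post's "up to 8 hours"


@dataclass
class MicroVM:
    session_id: str
    created_at: float
    files: dict = field(default_factory=dict)  # stand-in for the VM filesystem


class SessionRouter:
    """Toy model: one microVM per session ID, reused until it expires."""

    def __init__(self):
        self._vms = {}

    def route(self, session_id: str, now: float):
        vm = self._vms.get(session_id)
        if vm is not None and now - vm.created_at < MAX_SESSION_SECONDS:
            return vm, "resumed"
        vm = MicroVM(session_id=session_id, created_at=now)
        self._vms[session_id] = vm
        return vm, "created"


router = SessionRouter()
vm_a, status_1 = router.route("session-a", now=0.0)
vm_a.files["notes.txt"] = "scratch"
vm_a2, status_2 = router.route("session-a", now=60.0)
vm_b, status_3 = router.route("session-b", now=60.0)
print(status_1, status_2, status_3)  # created resumed created
print(vm_a2.files, vm_b.files)       # {'notes.txt': 'scratch'} {}
```

The second line of output is the point: state written in one session is visible when that session resumes and invisible to any other session.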
Tools and skills — the two extension points
Tools (5 types)
From the live schema:
| Type | What it is | When you pick it |
|---|---|---|
| agentcore_browser | Managed Playwright | web scraping, login-walled sites |
| agentcore_code_interpreter | Sandboxed Python/Node | data analysis, safe code exec |
| agentcore_gateway | Your Gateway routing to Lambdas / APIs / MCP | unified tool surface |
| remote_mcp | External MCP server by URL | Slack, GitHub, Notion, your own |
| inline_function | Declare a schema, Gateway dispatches | small custom callables |
Add one in a single command:
agentcore add tool --harness myresearchagent \
--type agentcore_browser --name browser
agentcore deploy -y
And the default tools are always on, even with an empty tools: []:
- shell: bash execution in the microVM
- file_operations: view / str_replace / create / insert
I confirmed this by asking the live agent to list its own tools. It reported those two.
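One way to picture the always-on behaviour: the effective tool list is the two defaults plus whatever the config declares. The sketch below is a mental model, not the harness's actual merge logic; `effective_tools` is hypothetical, and the `name` key simply mirrors the `--name` flag used above.

```python
DEFAULT_TOOLS = ("shell", "file_operations")  # the two tools the live agent reported


def effective_tools(configured: list) -> list:
    """Defaults first, then configured tool names, without duplicates."""
    names = list(DEFAULT_TOOLS)
    for tool in configured:
        if tool["name"] not in names:
            names.append(tool["name"])
    return names


print(effective_tools([]))
# ['shell', 'file_operations']
print(effective_tools([{"type": "agentcore_browser", "name": "browser"}]))
# ['shell', 'file_operations', 'browser']
```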
Skills — same format as Claude Skills
Skills in the harness use the Claude Skills spec: markdown files with progressive disclosure. SKILL.md is always loaded; longer references are pulled in when the agent needs them.
app/myresearchagent/
  harness.json
  system-prompt.md
  skills/
    legal-contract-review/
      SKILL.md       ← always loaded (~200 words)
      playbook.md    ← loaded on demand
      templates.md   ← loaded on demand
    financial-modeling/
      SKILL.md
---
name: legal-contract-review
description: Use when the user asks to review, redline, or summarize a contract.
---
## When to use
- User uploads a contract PDF or DOC
- User mentions redlining, MSA, SOW, NDA
## Procedure
1. Extract party names, term, renewal, liability cap.
2. Flag unusual clauses against playbook.md.
3. Produce summary table + redline memo.
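Progressive disclosure leans on the front matter: `name` and `description` are cheap to keep in context, and the body only matters once the skill is chosen. A minimal parser sketch for files shaped like the one above, assuming plain `key: value` front matter (richer YAML is not handled, and `parse_skill` is a hypothetical helper, not part of any SDK):

```python
def parse_skill(text: str):
    """Split a SKILL.md string into (front matter dict, markdown body)."""
    lines = text.strip().splitlines()
    if not lines or lines[0].strip() != "---":
        raise ValueError("missing opening '---'")
    try:
        end = next(i for i, line in enumerate(lines[1:], start=1) if line.strip() == "---")
    except StopIteration:
        raise ValueError("missing closing '---'") from None
    meta = {}
    for line in lines[1:end]:
        key, sep, value = line.partition(":")
        if sep:
            meta[key.strip()] = value.strip()
    body = "\n".join(lines[end + 1:]).strip()
    return meta, body


SKILL = """---
name: legal-contract-review
description: Use when the user asks to review, redline, or summarize a contract.
---
## When to use
- User uploads a contract PDF or DOC
"""

meta, body = parse_skill(SKILL)
print(meta["name"])          # legal-contract-review
print(body.splitlines()[0])  # ## When to use
```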
Wire it into harness.json:
{
  "skills": [
    "skills/legal-contract-review/SKILL.md",
    "skills/financial-modeling/SKILL.md"
  ]
}
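Before deploying, it can be worth checking that every path listed under `skills` actually exists. A sketch of such a pre-flight check: `missing_skills` is a hypothetical helper, it assumes the paths are relative to the directory holding `harness.json` (as the tree above suggests), and it looks only at the `skills` key.

```python
import json
import tempfile
from pathlib import Path


def missing_skills(harness_json: Path) -> list:
    """Return the skill paths listed in harness.json that are not files on disk."""
    config = json.loads(harness_json.read_text())
    root = harness_json.parent
    return [p for p in config.get("skills", []) if not (root / p).is_file()]


if __name__ == "__main__":
    # Demo in a throwaway directory: one skill exists, one does not.
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        skill = root / "skills" / "legal-contract-review" / "SKILL.md"
        skill.parent.mkdir(parents=True)
        skill.write_text("---\nname: legal-contract-review\n---\n")
        (root / "harness.json").write_text(json.dumps({
            "skills": [
                "skills/legal-contract-review/SKILL.md",
                "skills/financial-modeling/SKILL.md",
            ]
        }))
        print(missing_skills(root / "harness.json"))
        # ['skills/financial-modeling/SKILL.md']
```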
Run agentcore deploy -y, and the skill ships into the container via an AGENT_SKILLS env var.
Tools vs skills, one line: Tools are things the agent calls (verbs). Skills are procedures it reads to decide when and how to call them (playbooks).
The hidden value (the bit not in the marketing)
After digging in, here’s what the harness actually gives you that’s hard to replicate.
1. Per-session microVM with a real filesystem
Most agent frameworks are stateless. The harness gives each session a live Linux sandbox where the agent can write files, pip install things, run shell commands, and keep state for up to 8 hours. This is “Kiro / Claude Code / Codex as infra”: the same primitive behind every agentic CLI, except that those run on your laptop, while the harness runs it in the cloud, per user, isolated, billable, and in your AWS account.
Running Firecracker microVMs at per-session granularity is serious plumbing that you cannot easily replicate.
2. Direct execution = real token savings
The shell tool runs in the microVM, not through another model call. For deterministic steps (ls, grep, curl, pandas) the agent pays no LLM tokens. Over a long session that’s a 30–60% cost reduction vs a naive ReAct loop.
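To see where a figure in that range could come from, here is a back-of-the-envelope model. Every number in it is hypothetical, and it compares against an imagined baseline in which each deterministic step would otherwise cost a full model round-trip; real savings depend on context growth, caching, and how often the model still has to decide to call the tool.

```python
def token_savings(total_steps: int, deterministic_steps: int, tokens_per_step: int) -> float:
    """Fraction of tokens saved when deterministic steps cost no model tokens."""
    if total_steps <= 0 or not 0 <= deterministic_steps <= total_steps:
        raise ValueError("need total_steps > 0 and 0 <= deterministic_steps <= total_steps")
    baseline = total_steps * tokens_per_step
    with_direct_exec = (total_steps - deterministic_steps) * tokens_per_step
    return 1 - with_direct_exec / baseline


# Hypothetical 40-step session at roughly 1,000 tokens per model call.
print(f"{token_savings(40, 12, 1000):.0%}")  # 30%
print(f"{token_savings(40, 24, 1000):.0%}")  # 60%
```

Under this simple model the saving equals the share of steps that are deterministic, so the range corresponds to sessions where 30% to 60% of the work is plain shell or data wrangling.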
3. Memory that would take weeks to build
Four strategies wired in — SEMANTIC, USER_PREFERENCE, SUMMARIZATION, EPISODIC — with /{actorId}/{sessionId} namespacing. That namespacing is the multi-tenant story for free.
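The isolation argument is easiest to see with the namespace written out. The sketch below uses a plain dict as a stand-in for the memory store; only the `/{actorId}/{sessionId}` shape comes from the harness, and `namespace` and `recall_for_actor` are hypothetical helpers.

```python
def namespace(actor_id: str, session_id: str) -> str:
    """Build a /{actorId}/{sessionId} namespace, rejecting empty IDs or IDs with '/'."""
    for part in (actor_id, session_id):
        if not part or "/" in part:
            raise ValueError(f"invalid id: {part!r}")
    return f"/{actor_id}/{session_id}"


def recall_for_actor(store: dict, actor_id: str) -> list:
    """Collect memories across all sessions belonging to one actor."""
    prefix = f"/{actor_id}/"
    return [m for ns, items in sorted(store.items()) if ns.startswith(prefix) for m in items]


store = {
    namespace("alice", "s1"): ["prefers CSV exports"],
    namespace("alice", "s2"): ["works on Q3 forecast"],
    namespace("bob", "s1"): ["prefers PDF exports"],
}
print(recall_for_actor(store, "alice"))
# ['prefers CSV exports', 'works on Q3 forecast']
```

Because the actor ID is the first path segment, a per-actor lookup is a prefix match that can never pick up another tenant's entries.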
4. Isolation boundary is the enterprise story
Per-session microVM means user A’s scratchpad cannot leak into user B’s. Regulated industries (health, fin, gov) pay a premium for this property.
5. Config-as-audit-trail
A compliance reviewer sees a 12-line JSON, not 4000 lines of Python. That’s a real procurement unlock.
6. Model swap at invoke time
agentcore invoke --harness myresearchagent \
--model-id "anthropic.claude-3-5-haiku-20241022" "..."
A/B test Claude vs Gemini vs Nova per request without redeploying.
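For a repeatable A/B split, the model can be chosen deterministically from the session ID and passed through `--model-id`. A sketch under stated assumptions: the hashing scheme and the helper names are illustrative, and the second model ID is a placeholder to replace with a real one from your account. The command is only built as a string, never executed.

```python
import hashlib
import shlex

MODELS = [
    "anthropic.claude-3-5-haiku-20241022",  # from the example above
    "<second-model-id>",                    # placeholder, not a real ID
]


def pick_model(session_id: str, models: list = MODELS) -> str:
    """Map a session ID to one model, stable across calls and processes."""
    digest = hashlib.sha256(session_id.encode("utf-8")).digest()
    return models[int.from_bytes(digest[:8], "big") % len(models)]


def invoke_command(harness: str, session_id: str, prompt: str) -> str:
    """Build (but do not run) the CLI call with the chosen model."""
    return shlex.join([
        "agentcore", "invoke", "--harness", harness,
        "--session-id", session_id,
        "--model-id", pick_model(session_id),
        prompt,
    ])


print(invoke_command("myresearchagent", "session-a", "Summarize this repo"))
```

Hashing the session ID rather than picking at random keeps every turn of one session on the same model, which makes per-model comparisons cleaner.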
The value prop, compressed
| Without AgentCore Harness | With AgentCore Harness |
|---|---|
| pick a framework | declare harness.json |
| write agent loop | (Strands is pre-wired) |
| wire up tools | 5 built-in types, add by CLI |
| build memory (vectors + TTL + namespacing + extraction) | 4 strategies, namespaced, managed |
| build session sandbox | Firecracker microVM per session |
| build identity (IAM / JWT) | IAM + CUSTOM_JWT built in |
| build observability | OTel → CloudWatch automatic |
| build multi-tenant isolation | microVM = hard isolation by default |
| deploy Docker + Lambda + API GW | agentcore deploy -y |
| ~4–8 weeks | ~10 minutes |
Multi-agent patterns — what works, what doesn’t
Everyone’s first question: “Can I do LangGraph / agent-as-tool / multi-agent with this?”
Honest answer: supervisor-with-sub-agents works great. Graphs with conditional edges and loops don’t — you drop down to Runtime for those.
Why multi-agent works at all in harness
The runtime supports four protocol modes:
ProtocolMode = 'HTTP' | 'MCP' | 'A2A' | 'AGUI'
                          |       |
                          |       +-> Google's Agent-to-Agent standard
                          +---------> every harness is reachable as MCP
So any harness can be called by any other harness — via MCP or A2A. That’s enough for supervisor topologies.
Pattern: Supervisor + workers (works)
Client
|
v
SUPERVISOR harness
system: "delegate"
tools:
· remote_mcp → worker1 — MCP → RESEARCHER harness
· remote_mcp → worker2 — MCP → DRAFTER harness
· agentcore_gateway — → REVIEWER Lambda
Each worker: own microVM, own memory, own skills.
Wiring is pure config:
agentcore add harness --name supervisor
agentcore add harness --name worker_research
agentcore add harness --name worker_drafter
agentcore deploy -y
# Get each worker's MCP URL from `agentcore status --json`
agentcore add tool --harness supervisor --type remote_mcp --name research \
--url "<worker_research-mcp-url>"
agentcore add tool --harness supervisor --type remote_mcp --name draft \
--url "<worker_drafter-mcp-url>"
agentcore deploy -y
Add a skill describing the delegation playbook, and you have a real supervisor-workers system without writing a line of Python.
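With more than a couple of workers, generating the `add tool` commands is less error-prone than typing them. A sketch that only builds the command strings shown above: the worker names and URLs are placeholders, and reading the real URLs out of `agentcore status --json` is left out because its output shape is not shown here.

```python
import shlex


def wiring_commands(supervisor: str, worker_urls: dict) -> list:
    """Return one `agentcore add tool` command per worker, plus a final deploy."""
    commands = [
        shlex.join([
            "agentcore", "add", "tool", "--harness", supervisor,
            "--type", "remote_mcp", "--name", tool_name, "--url", url,
        ])
        for tool_name, url in worker_urls.items()
    ]
    commands.append("agentcore deploy -y")
    return commands


for command in wiring_commands("supervisor", {
    "research": "https://example.invalid/worker_research/mcp",  # placeholder URL
    "draft": "https://example.invalid/worker_drafter/mcp",      # placeholder URL
}):
    print(command)
```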
Pattern: Peer-to-peer (A2A)
Agent1 <--A2A--> Agent2 <--A2A--> Agent3
Harnesses exposed on A2A protocol can negotiate peer-to-peer (customer-support sim, negotiation agents, debate panels).
What the harness cannot do
- Graph / DAG orchestration — conditional edges, cycles, checkpointers. Use LangGraph or Strands Graph on Runtime.
- Deterministic workflows with human-in-the-loop — use Step Functions.
- Shared state without a store — each harness has its own memory; share via a referenced Memory ARN or an external store.
The decision tree
| Shape | Use |
|---|---|
| One agent with tools? | Harness. |
| Supervisor + workers (≤ 5)? | Multiple harnesses wired via MCP / Gateway / A2A. |
| Peer negotiation? | Multiple harnesses on A2A. |
| True graph with branches+loops? | Runtime + LangGraph/Strands Graph. |
| Deterministic pipeline? | Step Functions. |
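The same table as a function, for anyone who wants to drop it into a design-review checklist. The field names are hypothetical, and the order of the checks (deterministic first, then graph, then peers, then worker count) is one reasonable reading of the table rather than an official rule. The table gives no guidance beyond five workers, so that case is reported as such.

```python
from dataclasses import dataclass


@dataclass
class Shape:
    deterministic: bool = False     # fixed pipeline, no model-driven control flow
    needs_graph: bool = False       # branches and loops between agents
    peer_negotiation: bool = False  # agents talk to each other as equals
    workers: int = 0                # sub-agents under one supervisor


def recommend(shape: Shape) -> str:
    if shape.deterministic:
        return "Step Functions"
    if shape.needs_graph:
        return "Runtime + LangGraph/Strands Graph"
    if shape.peer_negotiation:
        return "Multiple harnesses on A2A"
    if shape.workers > 5:
        return "Outside the table: more than 5 workers"
    if shape.workers > 0:
        return "Multiple harnesses wired via MCP / Gateway / A2A"
    return "Harness"


print(recommend(Shape()))                  # Harness
print(recommend(Shape(workers=3)))         # Multiple harnesses wired via MCP / Gateway / A2A
print(recommend(Shape(needs_graph=True)))  # Runtime + LangGraph/Strands Graph
```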
The hybrid that real systems converge to
Client
   |
   v
Runtime (LangGraph or Strands Graph)
state machine / DAG with branches, loops, retries
   |              |              |              |
   v              v              v              v
call harness   call harness   call Lambda    call API
(researcher)   (drafter)      (deterministic)
Runtime = the brain, harnesses = the specialists
Runtime runs the graph. Harnesses are the nodes that need isolation + memory + skills. Deterministic steps are plain Lambdas.
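To make the split concrete, here is a toy version of the graph layer with a conditional edge and a retry loop, which is the part a single harness does not express. The node functions are local stubs standing in for harness and Lambda calls; nothing here uses LangGraph or Strands Graph APIs, and the "approve on the second attempt" rule is invented for the demo.

```python
from typing import Optional


def research(state: dict) -> dict:  # stub for a "researcher" harness call
    return {**state, "notes": f"notes on {state['topic']}"}


def draft(state: dict) -> dict:  # stub for a "drafter" harness call
    attempt = state.get("attempt", 0) + 1
    return {**state, "attempt": attempt, "draft": f"draft v{attempt} from {state['notes']}"}


def review(state: dict) -> dict:  # stub for a deterministic Lambda check
    return {**state, "approved": state["attempt"] >= 2}


NODES = {"research": research, "draft": draft, "review": review}


def next_node(current: str, state: dict) -> Optional[str]:
    """Edges of the graph; the review edge is conditional and can loop back."""
    if current == "research":
        return "draft"
    if current == "draft":
        return "review"
    if current == "review":
        return None if state["approved"] else "draft"
    raise KeyError(current)


def run(topic: str, max_steps: int = 10) -> dict:
    state, node = {"topic": topic}, "research"
    for _ in range(max_steps):
        state = NODES[node](state)
        node = next_node(node, state)
        if node is None:
            return state
    raise RuntimeError("graph did not finish within max_steps")


result = run("vendor contracts")
print(result["draft"])    # draft v2 from notes on vendor contracts
print(result["attempt"])  # 2
```

The `max_steps` guard matters in any looping graph: without it, a reviewer that never approves would run forever.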
Is this basically an agentic CLI (Kiro / Claude Code / Codex)?
Pretty much. The isomorphism across the whole category is striking:
| Kiro-cli / Claude Code / Codex (on your laptop) | AgentCore Harness (cloud) |
|---|---|
| single agent loop | single Strands loop |
| shell + file editor tools | shell + file_operations tools (same!) |
| your local FS | per-session microVM FS |
| you approve tool calls | IAM / policy approves |
| MCP for external tools | MCP for external tools |
| SKILL.md (Claude Skills spec) | SKILL.md (same format!) |
| spawn subagents via Agent / Task | spawn subagents via A2A / MCP / Gateway |
| runs model against a provider API | runs loop in microVM → Bedrock / OpenAI / Gemini |
All three mainstream agentic CLIs — AWS’s Kiro-cli, Anthropic’s Claude Code, OpenAI’s Codex — converge on the same architecture: a single-agent loop with shell + file tools, MCP for extensions, markdown skills for procedures, subagents for delegation. The harness is that architecture packaged as a managed enterprise service: same mental model, same primitives, different operational surface.
If you’ve been productive in any of those CLIs, you’ll be productive in the harness. If you’ve built skills and MCP servers for one of them, they port over with minimal change.
Business use cases that actually earn their keep
Forget “build an AI agent” as a product. Here are the seven wedges where the harness specifically is the unlock, not generic LLMs.
1. Per-tenant AI Data Analyst (SaaS)
Upload CSV/DB → chat with an analyst. Each tenant gets an isolated microVM; the agent runs pandas directly in the VM. Compliance-friendly isolation OpenAI’s API can’t match.
Pricing: $200–$2K/mo/seat.
2. Regulated-Industry Research Copilot
Legal / medical / financial research agent with full audit trail. microVM isolation + CloudWatch traces + IAM + config-as-code = SOC2/HIPAA story pre-built. "We deploy in your AWS account" is a procurement love letter.
Pricing: $10K–$100K/yr/org.
3. Agentic Browser Automation (vertical Zapier)
“Reconcile my Stripe + QuickBooks every morning.” Agent logs in, navigates, files reports. Built-in browser tool + persistent session + credential vault. Competitors rebuilt this infra; you rent it.
Pricing: $50–$500/mo.
4. Support Agent With Cross-Session Memory
Customer support agent that remembers the last six months of tickets. Episodic + summarization memory, per-user actorId namespacing. Intercom/Zendesk AI is amnesiac by comparison.
Pricing: $0.10–$1/conversation or $X/seat.
5. Per-Employee Work Copilot
Every rep / CSM / analyst gets a long-lived agent that learns their style, remembers accounts, writes in their voice. User-preference memory + per-user isolation.
Pricing: $50–$200/seat/month.
6. Sandbox-as-a-Service for Untrusted Code
“Let your LLM run arbitrary generated code safely.” microVM is the sandbox. Competitors: E2B, Modal, Daytona. Harness = AWS-native alternative.
Pricing: per-session compute.
7. Vertical Artifact-Generating Agents
Contract review → redlined PDF. 10-K analyst → DCF memo. Claims → decision brief. Long sessions + filesystem = agent builds intermediate artifacts while it reasons.
Pricing: $500–$5K/seat — premium.
The meta-insight
The product isn’t “an agent.” The product is one of:
- Isolation (regulated buyers pay for this)
- Memory across time (retention = stickiness = LTV)
- Persistent sandbox (agents that do, not just chat)
- Config-as-audit (enterprise procurement unlock)
The harness gives you all four for free. Your job is to pick a vertical and wrap it in a UI + data connectors.
When NOT to use the harness
Be honest with yourself:
- Stateless Q&A chatbot — you’re paying for a microVM you don’t use. Use Bedrock directly.
- Deterministic pipelines — Step Functions + Lambda is 10× cheaper.
- You need model/cloud portability — harness is AWS-locked.
- You want to own the agent loop — Strands on Runtime gives you that; the harness hides it.
- Voice agents with bidirectional streaming — that’s Runtime territory; the harness is request/response-shaped.
- Consumer $10/mo product — the per-session microVM cost structure is wrong for that tier.
The playbook
If you’re evaluating this for a real project:
- Deploy a hello-world harness (10 min). Understand the deploy loop.
- Invoke with --exec to confirm what’s in the microVM. Trust by inspection.
- Add one tool (pick agentcore_browser or a remote_mcp) and redeploy. Understand extension.
- Write one skill: a real procedure, not a toy. Observe the agent picking it up.
- Ask the disqualifier questions — does my topology need graphs? streaming voice? determinism? If yes to any, reach for Runtime.
- Pick a vertical wedge — isolation, memory, sandbox, or config-as-audit. Build around the one your market actually pays for.
Closing
The harness is not “yet another agent framework.” It’s an opinionated bundle of the infrastructure you were going to build anyway — microVM, memory, identity, tools, observability — with Strands wired in as the loop and config as your only surface.
For the 60% of use cases that are “a single agent with tools and memory,” it’s the fastest path from zero to production I’ve seen on AWS.
For the complex 20% (graphs, loops, bespoke orchestration), it becomes a building block inside a larger Runtime-driven system.
For the remaining 20% (deterministic, stateless, portable), it’s the wrong tool — and that’s fine.
Pick the wedge. Ship the MVP. Let AWS carry the plumbing.