7 Mental Models for Building Agent Skills (From Anthropic’s Internal Playbook)
18th March 2026
Anthropic just published their internal playbook for Claude Code Skills — based on hundreds of skills in active use. Buried inside the practical advice are deep mental models for building better agents. Here’s what they’re really telling you.
Mental Model #1: Skills Are Context Engineering, Not Prompts
The biggest misconception: skills are “just markdown files.” They’re not. A skill is a folder — scripts, assets, data, references, config files — that the agent discovers, explores, and manipulates at runtime.
This is progressive disclosure applied to AI. Instead of cramming everything into the system prompt, you structure information across files and let the agent pull what it needs, when it needs it.
```
BAD:  One giant prompt with everything
GOOD: A folder the agent navigates

my-skill/
├── skill.md            <-- entry point, high-level instructions
├── references/
│   ├── api.md          <-- detailed function signatures
│   └── gotchas.md      <-- failure patterns to avoid
├── scripts/
│   ├── fetch_data.py   <-- reusable helper functions
│   └── verify.sh       <-- verification script
├── assets/
│   └── template.md     <-- output template to copy
└── config.json         <-- user-specific settings
```
The insight: the file system IS the context window management strategy. Every file you put in the skill folder is a piece of context the agent can load on demand instead of carrying permanently.
Mental Model #2: Don’t Tell Claude What It Already Knows
Claude knows a lot about coding. Your skill should push it out of its default thinking, not repeat what it already knows. The highest-signal content is always the Gotchas section — common failure points that Claude hits when doing this specific task in your specific codebase.
This is the “bitter lesson” applied to skills: don’t over-engineer instructions for things the model handles well. Focus your engineering budget on the delta — what’s unique to your context.
```
LOW VALUE:
"When writing Python, use descriptive variable names
and follow PEP 8 conventions."

HIGH VALUE:
"GOTCHA: Our billing API returns cents, not dollars.
Every response must be divided by 100 before display.
Claude gets this wrong 80% of the time."
```
Build your gotchas section from real failures. Update it every time Claude makes a new mistake. This is a living document that learns from production.
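A gotcha like the cents-to-dollars one is often worth encoding as a helper in scripts/ rather than prose, so the agent composes the fix instead of re-deriving it. A minimal sketch (the function name and formatting are my own, not from the playbook):

```python
def display_amount(cents: int) -> str:
    # The billing API returns cents (per the gotcha above);
    # divide by 100 exactly once, at the display boundary.
    return f"${cents / 100:.2f}"
```

Now the gotcha can say "always display amounts via display_amount()", which is harder to get wrong than a conversion rule in prose.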
Mental Model #3: Give Code, Not Instructions
The most powerful thing you can give an agent is code it can compose. Scripts and libraries let the agent spend its turns on deciding what to do next rather than reconstructing boilerplate from scratch.
```
WEAK:   "To fetch user events, query the events table
         joining on user_id with a date filter..."

STRONG: Include a helpers/ folder with:
          helpers/fetch_events.py
          helpers/fetch_cohort.py
          helpers/compare_retention.py
```
The agent composes these into novel analysis scripts on the fly. You write the primitives once; it writes the composition every time.
This maps directly to the “Bash is all you need” insight: give agents generic, composable primitives instead of rigid, specialized tools. The agent’s strength is composition and reasoning. Your strength is providing reliable building blocks.
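To make the composition pattern concrete, here is a sketch of what such primitives might look like. Everything here is hypothetical: the function names echo the helpers/ listing above, but the row schema and in-memory data (standing in for real warehouse queries) are my own.

```python
# Hypothetical stand-ins for helpers/fetch_events.py and friends.
# Real versions would query the warehouse; these take rows directly
# so the composition pattern is visible end to end.

def fetch_cohort(rows, signup_day):
    """Users who signed up on a given day."""
    return {r["user_id"] for r in rows
            if r["day"] == signup_day and r["event"] == "signup"}

def fetch_events(rows, user_ids, start, end):
    """Events for a set of users inside a day range."""
    return [r for r in rows
            if r["user_id"] in user_ids and start <= r["day"] <= end]

def retention(rows, cohort, day):
    """Fraction of the cohort active on a given day."""
    active = {r["user_id"] for r in rows if r["day"] == day}
    return len(active & cohort) / len(cohort) if cohort else 0.0
```

The agent never edits these; it writes short scripts that chain them (cohort on day 1, retention on day 7), so each new analysis is a fresh composition of the same primitives.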
Mental Model #4: Skills Need Memory
Stateless skills repeat themselves. Stateful skills get smarter. Store data within or alongside your skill — an append-only log, a JSON file, a SQLite database — so the agent can read its own history.
```
standup-post skill:
  |
  |-- Reads standups.log (its own previous posts)
  |-- Sees what it posted yesterday
  |-- Computes the delta (what changed since then)
  |-- Writes today's standup
  |-- Appends to standups.log
  |
  Next time: even better context
```
Use ${CLAUDE_PLUGIN_DATA} for stable storage that survives skill upgrades. The skill directory itself may get wiped on update.
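A minimal sketch of the append-only pattern, using a JSONL log. The file name follows the standup example above; the entry schema and helper names are my own:

```python
import json
import os
from pathlib import Path

def log_path() -> Path:
    # ${CLAUDE_PLUGIN_DATA} survives skill upgrades (per the note above);
    # fall back to the current directory for local runs.
    return Path(os.environ.get("CLAUDE_PLUGIN_DATA", ".")) / "standups.log"

def append_entry(entry: dict, path: Path) -> None:
    # Append-only: one JSON object per line, never rewritten.
    with open(path, "a") as f:
        f.write(json.dumps(entry) + "\n")

def read_history(path: Path) -> list[dict]:
    if not path.exists():
        return []
    return [json.loads(line) for line in path.read_text().splitlines() if line]
```

Before writing today's standup, the skill reads the last entry, computes the delta, and appends the new one.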
Mental Model #5: The Description Is a Trigger, Not a Summary
When Claude Code starts a session, it scans every skill’s description to decide: “is there a skill for this request?” The description field is not documentation for humans. It’s a trigger pattern for the model.
```
BAD DESCRIPTION:
"A skill for working with our billing system"

GOOD DESCRIPTION:
"Use when: code imports billing-lib, user asks about
invoices/charges/subscriptions, or changes touch the
payments/ directory. DO NOT use for: general API
questions or auth-related billing."
```
Write descriptions like you’re writing routing rules. Tell the model exactly when to activate and when NOT to activate.
Mental Model #6: Don’t Railroad — Inform and Flex
Skills are reusable across many contexts. If your instructions are too rigid, they’ll be wrong half the time. Give Claude the information it needs but let it adapt to the situation.
```
RAILROADING:
"Always run tests in this exact order:
 1. Unit tests  2. Integration  3. E2E
 Fail immediately on any error."

FLEXIBLE:
"Test priority: unit > integration > E2E.
Run what's relevant to the change. If unit tests cover
the change fully, skip heavier tests unless user asks."
```
The agent is better at adapting to context than you are at predicting every context. Trust the reasoning, constrain the boundaries.
Mental Model #7: On-Demand Hooks Are Surgical Guardrails
Skills can register hooks that activate only when the skill is called and last for the duration of the session. This lets you build context-dependent safety.
```
/careful
  Blocks: rm -rf, DROP TABLE, force-push, kubectl delete
  When:   You're touching production
  Why:    Having this always-on would drive you insane

/freeze
  Blocks: Edit/Write outside a specific directory
  When:   Debugging — you want to add logs without
          accidentally "fixing" unrelated code
```
These are permission modes you toggle based on risk. They don’t exist in the system prompt permanently — they appear when the situation demands them.
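Here is a sketch of what the blocking core of a /careful-style PreToolUse hook could look like. I am assuming the hook receives the tool call as JSON on stdin with the shell command under tool_input.command, and that exit code 2 blocks the call; verify both against the Claude Code hooks documentation before relying on them.

```python
import json
import re
import sys

# Patterns mirroring the /careful blocklist above; tune for your stack.
DANGEROUS = [
    r"\brm\s+-rf\b",
    r"\bdrop\s+table\b",
    r"--force(-with-lease)?\b",   # force-push
    r"\bkubectl\s+delete\b",
]

def is_dangerous(command: str) -> bool:
    return any(re.search(p, command, re.IGNORECASE) for p in DANGEROUS)

def main() -> None:
    payload = json.load(sys.stdin)
    command = payload.get("tool_input", {}).get("command", "")
    if is_dangerous(command):
        print(f"/careful blocked: {command!r}", file=sys.stderr)
        sys.exit(2)  # assumed blocking exit code; check the hooks docs

# Installed as a hook script, the file would end with: main()
```

Because the hook is registered by the skill, the blocklist exists only while /careful is active, not in every session.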
The 9 Skill Categories
Anthropic found their hundreds of skills cluster into 9 types. Use this as an audit checklist — which categories are you missing?
| # | Category | What It Does | Example |
|---|---|---|---|
| 1 | Library & API Reference | How to correctly use internal/external libraries | billing-lib gotchas, CLI subcommands |
| 2 | Product Verification | Test that code actually works (Playwright, tmux) | signup-flow-driver, checkout-verifier |
| 3 | Data Fetching & Analysis | Connect to data/monitoring stacks | funnel-query, grafana dashboard lookup |
| 4 | Business Process | Automate repetitive workflows | standup-post, weekly-recap |
| 5 | Code Scaffolding | Generate framework boilerplate | new-migration, create-app |
| 6 | Code Quality & Review | Enforce standards, review code | adversarial-review, testing-practices |
| 7 | CI/CD & Deployment | Fetch, push, deploy code | babysit-pr, deploy-service |
| 8 | Runbooks | Symptom → investigation → finding | oncall-runner, log-correlator |
| 9 | Infrastructure Ops | Maintenance with guardrails | orphan cleanup, cost investigation |
The Distribution Model
Two paths for sharing skills:
- Check into repo (`.claude/skills/`) — good for small teams and few repos, but every checked-in skill adds to model context.
- Plugin marketplace — good at scale. Users choose which skills to install. Organic discovery: sandbox → traction → marketplace PR.
Warning from Anthropic: it’s easy to create bad or redundant skills. Curation before release is essential. Track skill usage with PreToolUse hooks to find what’s popular and what’s undertriggering.
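The tallying side of that tracking can be simple. A sketch, assuming each PreToolUse hook appends one JSON line per invocation with a skill field (the log schema is my own, not from the playbook):

```python
import collections
import json

def skill_usage(log_lines):
    # Count invocations per skill from a JSONL usage log.
    counts = collections.Counter()
    for line in log_lines:
        if not line.strip():
            continue
        counts[json.loads(line).get("skill", "unknown")] += 1
    return counts
```

Skills with zero hits are undertriggering candidates: tighten their descriptions, or cut them before they become marketplace clutter.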
The Bottom Line
A skill is not a prompt. A skill is a workspace the agent walks into. The folder structure is your context engineering. The gotchas section is your highest-ROI writing. The scripts are your composable primitives. The description is your routing rule. The memory is what makes it get smarter. Start with a few lines and one gotcha. Add to it every time Claude fails. That's the whole process.