7 Mental Models for Building Agent Skills (From Anthropic’s Internal Playbook)
18th March 2026
Anthropic just published their internal playbook for Claude Code Skills — based on hundreds of skills in active use. Buried inside the practical advice are deep mental models for building better agents. Here’s what they’re really telling you.
Mental Model #1: Skills Are Context Engineering, Not Prompts
The biggest misconception: skills are “just markdown files.” They’re not. A skill is a folder — scripts, assets, data, references, config files — that the agent discovers, explores, and manipulates at runtime.
This is progressive disclosure applied to AI. Instead of cramming everything into the system prompt, you structure information across files and let the agent pull what it needs, when it needs it.
```
BAD:  One giant prompt with everything
GOOD: A folder the agent navigates

my-skill/
├── skill.md            <-- entry point, high-level instructions
├── references/
│   ├── api.md          <-- detailed function signatures
│   └── gotchas.md      <-- failure patterns to avoid
├── scripts/
│   ├── fetch_data.py   <-- reusable helper functions
│   └── verify.sh       <-- verification script
├── assets/
│   └── template.md     <-- output template to copy
└── config.json         <-- user-specific settings
```
The insight: the file system IS the context window management strategy. Every file you put in the skill folder is a piece of context the agent can load on demand instead of carrying permanently.
Mental Model #2: Don’t Tell Claude What It Already Knows
Claude knows a lot about coding. Your skill should push it out of its default thinking, not repeat what it already knows. The highest-signal content is always the Gotchas section — common failure points that Claude hits when doing this specific task in your specific codebase.
This is the “bitter lesson” applied to skills: don’t over-engineer instructions for things the model handles well. Focus your engineering budget on the delta — what’s unique to your context.
```
LOW VALUE:
"When writing Python, use descriptive variable names
and follow PEP 8 conventions."

HIGH VALUE:
"GOTCHA: Our billing API returns cents, not dollars.
Every response must be divided by 100 before display.
Claude gets this wrong 80% of the time."
```
Build your gotchas section from real failures. Update it every time Claude makes a new mistake. This is a living document that learns from production.
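A gotcha like the cents-to-dollars one is often worth encoding as a helper in scripts/ rather than prose, so the agent composes the fix instead of re-deriving it. A minimal sketch (the function name and formatting are my own, not from the playbook):

```python
def display_amount(cents: int) -> str:
    # The billing API returns cents (per the gotcha above);
    # divide by 100 exactly once, at the display boundary.
    return f"${cents / 100:.2f}"
```

Now the gotcha can say "always display amounts via display_amount()", which is harder to get wrong than a conversion rule in prose.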
Mental Model #3: Give Code, Not Instructions
The most powerful thing you can give an agent is code it can compose. Scripts and libraries let the agent spend its turns on deciding what to do next rather than reconstructing boilerplate from scratch.
```
WEAK:   "To fetch user events, query the events table
         joining on user_id with a date filter..."

STRONG: Include a helpers/ folder with:
          helpers/fetch_events.py
          helpers/fetch_cohort.py
          helpers/compare_retention.py
```
The agent composes these into novel analysis scripts on the fly. You write the primitives once; it writes the composition every time.
This maps directly to the “Bash is all you need” insight: give agents generic, composable primitives instead of rigid, specialized tools. The agent’s strength is composition and reasoning. Your strength is providing reliable building blocks.
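To make the composition pattern concrete, here is a sketch of what such primitives might look like. Everything here is hypothetical: the function names echo the helpers/ listing above, but the row schema and in-memory data (standing in for real warehouse queries) are my own.

```python
# Hypothetical stand-ins for helpers/fetch_events.py and friends.
# Real versions would query the warehouse; these take rows directly
# so the composition pattern is visible end to end.

def fetch_cohort(rows, signup_day):
    """Users who signed up on a given day."""
    return {r["user_id"] for r in rows
            if r["day"] == signup_day and r["event"] == "signup"}

def fetch_events(rows, user_ids, start, end):
    """Events for a set of users inside a day range."""
    return [r for r in rows
            if r["user_id"] in user_ids and start <= r["day"] <= end]

def retention(rows, cohort, day):
    """Fraction of the cohort active on a given day."""
    active = {r["user_id"] for r in rows if r["day"] == day}
    return len(active & cohort) / len(cohort) if cohort else 0.0
```

The agent never edits these; it writes short scripts that chain them (cohort on day 1, retention on day 7), so each new analysis is a fresh composition of the same primitives.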
Mental Model #4: Skills Need Memory
Stateless skills repeat themselves. Stateful skills get smarter. Store data within or alongside your skill — an append-only log, a JSON file, a SQLite database — so the agent can read its own history.
```
standup-post skill:
  |
  |-- Reads standups.log (its own previous posts)
  |-- Sees what it posted yesterday
  |-- Computes the delta (what changed since then)
  |-- Writes today's standup
  |-- Appends to standups.log
  |
  Next time: even better context
```
Use ${CLAUDE_PLUGIN_DATA} for stable storage that survives skill upgrades. The skill directory itself may get wiped on update.
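A minimal sketch of the append-only pattern, using a JSONL log. The file name follows the standup example above; the entry schema and helper names are my own:

```python
import json
import os
from pathlib import Path

def log_path() -> Path:
    # ${CLAUDE_PLUGIN_DATA} survives skill upgrades (per the note above);
    # fall back to the current directory for local runs.
    return Path(os.environ.get("CLAUDE_PLUGIN_DATA", ".")) / "standups.log"

def append_entry(entry: dict, path: Path) -> None:
    # Append-only: one JSON object per line, never rewritten.
    with open(path, "a") as f:
        f.write(json.dumps(entry) + "\n")

def read_history(path: Path) -> list[dict]:
    if not path.exists():
        return []
    return [json.loads(line) for line in path.read_text().splitlines() if line]
```

Before writing today's standup, the skill reads the last entry, computes the delta, and appends the new one.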
Mental Model #5: The Description Is a Trigger, Not a Summary
When Claude Code starts a session, it scans every skill’s description to decide: “is there a skill for this request?” The description field is not documentation for humans. It’s a trigger pattern for the model.
```
BAD DESCRIPTION:
"A skill for working with our billing system"

GOOD DESCRIPTION:
"Use when: code imports billing-lib, user asks about
invoices/charges/subscriptions, or changes touch the
payments/ directory. DO NOT use for: general API
questions or auth-related billing."
```
Write descriptions like you’re writing routing rules. Tell the model exactly when to activate and when NOT to activate.
Mental Model #6: Don’t Railroad — Inform and Flex
Skills are reusable across many contexts. If your instructions are too rigid, they’ll be wrong half the time. Give Claude the information it needs but let it adapt to the situation.
```
RAILROADING:
"Always run tests in this exact order:
 1. Unit tests  2. Integration  3. E2E
 Fail immediately on any error."

FLEXIBLE:
"Test priority: unit > integration > E2E.
Run what's relevant to the change. If unit tests cover
the change fully, skip heavier tests unless user asks."
```
The agent is better at adapting to context than you are at predicting every context. Trust the reasoning, constrain the boundaries.
Mental Model #7: On-Demand Hooks Are Surgical Guardrails
Skills can register hooks that activate only when the skill is called and last for the duration of the session. This lets you build context-dependent safety.
```
/careful
  Blocks: rm -rf, DROP TABLE, force-push, kubectl delete
  When:   You're touching production
  Why:    Having this always-on would drive you insane

/freeze
  Blocks: Edit/Write outside a specific directory
  When:   Debugging — you want to add logs without
          accidentally "fixing" unrelated code
```
These are permission modes you toggle based on risk. They don’t exist in the system prompt permanently — they appear when the situation demands them.
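Here is a sketch of what the blocking core of a /careful-style PreToolUse hook could look like. I am assuming the hook receives the tool call as JSON on stdin with the shell command under tool_input.command, and that exit code 2 blocks the call; verify both against the Claude Code hooks documentation before relying on them.

```python
import json
import re
import sys

# Patterns mirroring the /careful blocklist above; tune for your stack.
DANGEROUS = [
    r"\brm\s+-rf\b",
    r"\bdrop\s+table\b",
    r"--force(-with-lease)?\b",   # force-push
    r"\bkubectl\s+delete\b",
]

def is_dangerous(command: str) -> bool:
    return any(re.search(p, command, re.IGNORECASE) for p in DANGEROUS)

def main() -> None:
    payload = json.load(sys.stdin)
    command = payload.get("tool_input", {}).get("command", "")
    if is_dangerous(command):
        print(f"/careful blocked: {command!r}", file=sys.stderr)
        sys.exit(2)  # assumed blocking exit code; check the hooks docs

# Installed as a hook script, the file would end with: main()
```

Because the hook is registered by the skill, the blocklist exists only while /careful is active, not in every session.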
The 9 Skill Categories
Anthropic found their hundreds of skills cluster into 9 types. Use this as an audit checklist — which categories are you missing?
| # | Category | What It Does | Example |
|---|---|---|---|
| 1 | Library & API Reference | How to correctly use internal/external libraries | billing-lib gotchas, CLI subcommands |
| 2 | Product Verification | Test that code actually works (Playwright, tmux) | signup-flow-driver, checkout-verifier |
| 3 | Data Fetching & Analysis | Connect to data/monitoring stacks | funnel-query, grafana dashboard lookup |
| 4 | Business Process | Automate repetitive workflows | standup-post, weekly-recap |
| 5 | Code Scaffolding | Generate framework boilerplate | new-migration, create-app |
| 6 | Code Quality & Review | Enforce standards, review code | adversarial-review, testing-practices |
| 7 | CI/CD & Deployment | Fetch, push, deploy code | babysit-pr, deploy-service |
| 8 | Runbooks | Symptom → investigation → finding | oncall-runner, log-correlator |
| 9 | Infrastructure Ops | Maintenance with guardrails | orphan cleanup, cost investigation |
The Distribution Model
Two paths for sharing skills:
- Check into repo (`.claude/skills/`) — good for small teams and few repos, but every checked-in skill adds to model context.
- Plugin marketplace — good at scale. Users choose which skills to install. Organic discovery: sandbox → traction → marketplace PR.
Warning from Anthropic: it’s easy to create bad or redundant skills. Curation before release is essential. Track skill usage with PreToolUse hooks to find what’s popular and what’s undertriggering.
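The tallying side of that tracking can be simple. A sketch, assuming each PreToolUse hook appends one JSON line per invocation with a skill field (the log schema is my own, not from the playbook):

```python
import collections
import json

def skill_usage(log_lines):
    # Count invocations per skill from a JSONL usage log.
    counts = collections.Counter()
    for line in log_lines:
        if not line.strip():
            continue
        counts[json.loads(line).get("skill", "unknown")] += 1
    return counts
```

Skills with zero hits are undertriggering candidates: tighten their descriptions, or cut them before they become marketplace clutter.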
The Bottom Line
A skill is not a prompt. A skill is a workspace the agent walks into. The folder structure is your context engineering. The gotchas section is your highest-ROI writing. The scripts are your composable primitives. The description is your routing rule. The memory is what makes it get smarter. Start with a few lines and one gotcha. Add to it every time Claude fails. That's the whole process.