Akshay Parkhi's Weblog


Recent

March 7, 2026

How AWS Strands Hooks Work

Hooks are an event-driven extensibility system — a way to inject custom logic at specific points in the agent lifecycle without modifying core code. Think of them as middleware/interceptors.

[... 713 words]

How AWS Strands Agent Loop Works

Strands uses the ReAct pattern — a recursive loop where the LLM reasons, optionally calls tools, observes results, then reasons again. It is NOT prompt chaining (where you have a fixed sequence of prompts). The loop is open-ended and driven by the model’s decisions.

[... 650 words]

How Claude Team Agents ACTUALLY Connect — No Fluff

Agents are separate processes. They don’t share memory. They don’t call each other’s functions. They communicate through files on disk and CLI commands that Claude Code provides as tools.

[... 2,382 words]

How Pi Builds Its System Prompt at Runtime — And the Innovations That Make It Stand Out

A deep dive into the open-source coding agent that assembles its brain on the fly.

[... 2,918 words]

Scaling Agents: The Definitive Open-Source Guide — From 1 Agent to 100 Agents, 1 Tool to 100 Tools, Managing Context

Tool Wall: Every tool’s JSON schema goes into the system prompt. At 15+ tools, models start selecting wrong tools. At 50+, token usage explodes and accuracy plummets.

[... 4,423 words]

March 6, 2026

Finding the Perfect Prompt: Combining DSPy’s Optimization with Strands Agents for Cost-Effective Multi-Agent Systems

How we used Bayesian optimization to find better prompts automatically — and made cheap models perform like expensive ones.

[... 2,460 words]

March 5, 2026

How to Save 90% on Agent Token Costs with Prompt Caching on AWS Bedrock

How I reduced my AI agent’s input token costs by 90% using prompt caching on AWS Bedrock — with real pricing data and hands-on examples using Strands Agents.

[... 2,661 words]

Complete Guide: Setting Up XRoboToolkit for Robot Teleoperation with Pico 4 Ultra on WSL2

A step-by-step guide to setting up XR-based robot teleoperation using the Pico 4 Ultra headset, XRoboToolkit, and MuJoCo simulation — all running on Windows WSL2.

[... 1,293 words]

March 4, 2026

XR-Robotics with Pico 4 Ultra: VR Teleoperation Setup from Headset to Robot Simulation

I’ve been setting up XR-Robotics with a Pico 4 Ultra headset to teleoperate robot arms in simulation — and eventually collect demonstration data for imitation learning. The setup spans a PC running Ubuntu, a Python teleoperation stack, and a VR headset acting as the human interface. Here’s the complete step-by-step guide.

[... 2,636 words]

March 2, 2026

ROS 2 Humble: Complete Installation Guide with Turtlesim from Zero to First Node

This is a complete walkthrough for installing ROS 2 Humble on Ubuntu 22.04 and getting your first robot simulation running with Turtlesim. I wrote this after going through the process myself — the official docs are thorough but scattered across many pages. This puts everything in one place, from locale setup to writing your first Python node.

[... 2,166 words]

RDF, ROS, and Sim-to-Real: Understanding Robot Description Files

When you start working with robot simulation — whether it’s Isaac Sim, Gazebo, or MoveIt — you immediately run into a file called something.urdf. It’s one of those things that seems simple on the surface but connects to everything in the robotics stack. Here’s a clear breakdown of what URDF is, what it isn’t, and how it fits alongside ROS.

[... 1,082 words]

OpenTelemetry for AI Agents: How the Strands SDK Instruments Traces, Metrics, and Token Usage

I’ve been digging into the Strands Agents SDK and was surprised to find a comprehensive, production-ready OpenTelemetry integration baked right in. If you’re building AI agents and wondering how to get visibility into what’s actually happening at runtime — model calls, tool executions, latencies, token usage — this is worth understanding.

[... 1,384 words]

March 1, 2026

How a VLA Controls a Robot Arm: GR00T N1.5 System Architecture from Camera to Motor

I’ve been building a robot arm system that uses NVIDIA’s GR00T N1.5 — a Vision-Language-Action (VLA) model — to pick up objects from a table using only a camera, natural language instructions, and 50 demonstration episodes. After getting it working end-to-end, I wanted to write down the full system architecture for anyone trying to understand how all the pieces connect.

[... 912 words]

Feb. 28, 2026

Collecting Training Data for VLA Robot Fine-Tuning (The Hard Way)

A Vision-Language-Action model takes camera images and a language instruction as input, and outputs robot joint actions. NVIDIA’s GR00T N1.5 is one such model — pre-trained on millions of robot demonstrations and fine-tunable for your specific robot and task. The catch: even though GR00T is pre-trained, you still need your own demonstrations to teach it your robot’s exact joint calibration, camera angles, and task environment. Without this, the model generates actions that are plausible in general but wrong for your specific setup.

[... 1,771 words]

Feb. 27, 2026

AWS Bedrock AgentCore Async Agents

AWS Bedrock AgentCore lets you deploy AI agents as managed microVMs with built-in health checks, session management, and async task support. The async pattern is the interesting part — your agent responds immediately, runs work in the background, and the client polls for results. Here’s how the architecture works.

[... 1,172 words]

Feb. 26, 2026

Smartphone Photos to Synthetic Training Data: A 3D Reconstruction Pipeline

This pipeline turns smartphone photos of your home into synthetic training data — RGB images, depth maps, and camera parameters from viewpoints that never existed. You capture photos, reconstruct a 3D model, then render unlimited novel views. Here’s how the five-stage pipeline works.

[... 846 words]

Feb. 24, 2026

Teaching a Humanoid Robot to Wave: Custom Motions with GEAR-SONIC

GEAR-SONIC can track arbitrary motions — not just its built-in locomotion styles. You define joint angles in a CSV, preview them with direct replay in MuJoCo, then deploy through the SONIC neural network. Here’s how the three-stage pipeline works.

[... 812 words]

Feb. 22, 2026

VLA → WBC → MuJoCo: Two Ways to Wire Up NVIDIA’s GR00T Humanoid Stack

There are two ways to wire up NVIDIA’s GR00T stack from vision-language all the way down to physics simulation: the official NVIDIA eval pipeline and a custom pipeline using the SONIC C++ binary. I’ve set up both. Here’s how they work and where they differ.

[... 674 words]

From Vision to Torques: How NVIDIA’s GR00T Stack Controls a Humanoid Robot

NVIDIA’s GR00T stack for humanoid robots has three layers: a Vision-Language-Action model that understands what to do, a whole-body controller that figures out how to move, and a physics simulator that validates it all before touching real hardware. Here’s how they connect.

[... 976 words]

Feb. 21, 2026

GEAR-SONIC

GEAR-SONIC (Supersizing Motion Tracking for Natural Humanoid Whole-Body Control) is the big upgrade over the Decoupled WBC approach in the GR00T stack. It’s a completely different approach to humanoid control — unified whole-body, trained on human motion data rather than hand-crafted reward functions.

[... 325 words]

Feb. 20, 2026

NVIDIA’s GR00T Whole-Body Control stack in MuJoCo

I’ve been running NVIDIA’s GR00T Whole-Body Control stack in MuJoCo — the sim-to-real bridge for humanoid robot locomotion. A MuJoCo viewer showing a simulated robot walking might look like a toy, but the neural network policy inside it is the same binary that runs on a real Unitree G1. Here’s what’s actually going on.

[... 759 words]

Understanding LLM-Driven Python Execution: Architecture, Terminology, and Use Cases

This pattern is not just “tool use.” It is a Reasoning → Execution → Observation loop where the LLM can generate Python during runtime and run it inside a sandbox, producing deterministic outputs.

[... 646 words]

Eval methods for Tools, Skills, and Prompts, and how to ensure correctness


1. Evaluating Tools (MCP / Agentic Tools)

Tools have the most structured evaluation surface because they expose defined inputs and outputs.

Metrics to Measure

Metric                 What It Checks
Tool Correctness       Did the agent pick the right tool?
Argument Correctness   Were the arguments passed correctly?
Ordering               Were tools called in the right sequence?
Task Completion        Did the end-to-end trajectory achieve the goal?

Methods

a) Deterministic comparison (fastest, most reliable)

Compare tools_called vs expected_tools — name match, argument match, and order match.

from deepeval.test_case import LLMTestCase, ToolCall
from deepeval.metrics import ToolCorrectnessMetric

test_case = LLMTestCase(
    input="What's the return policy?",
    actual_output="We offer a 30-day refund.",
    tools_called=[ToolCall(name="WebSearch"), ToolCall(name="ToolQuery")],
    expected_tools=[ToolCall(name="WebSearch")],
)

metric = ToolCorrectnessMetric(threshold=0.7)
metric.measure(test_case)

# Score = Correctly Used Tools / Total Tools Called
# Here only 1 of the 2 called tools was expected → score 0.5, below the 0.7 threshold
print(metric.score)

b) Trajectory-level evaluation

Do not just check the final output. Evaluate the full sequence of tool calls to detect missing tools, extra tools, and parameter mismatches.

c) LLM-as-judge fallback

When tool usage is correct but non-obvious, use a judge model to assess whether the chosen tools were optimal given the available tools context.


2. Evaluating Skills

Skills require evaluation of both activation (did the correct skill load?) and output quality (did it improve results?).

a) Skill Activation / Routing Evals

Prompt                                Expected Skill    should_trigger
Review this PR for security issues    security-review   true
Fix the typo on line 3                security-review   false
Check this code for vulnerabilities   security-review   true

Grading is deterministic pass or fail — did the expected skill activate?
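
The table above can be run as a deterministic eval with a simple loop. In this sketch, `route_to_skill` is a placeholder for however your agent selects a skill for a prompt:

```python
# Deterministic skill-activation eval: for each prompt, check whether the
# expected skill activated (or correctly stayed inactive).
CASES = [
    ("Review this PR for security issues", "security-review", True),
    ("Fix the typo on line 3", "security-review", False),
    ("Check this code for vulnerabilities", "security-review", True),
]

def grade_activation(route_to_skill):
    """Return a pass/fail list: did each case match its should_trigger flag?"""
    results = []
    for prompt, skill, should_trigger in CASES:
        activated = route_to_skill(prompt) == skill
        results.append(activated == should_trigger)
    return results
```

Because the grading is exact comparison rather than judgment, these tests are cheap to run on every change to the skill's trigger description.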

b) Skill Output Quality Evals

  • LLM-as-judge with rubric scoring (1 to 5 scale)
  • Exact or string match for structured sections
  • A/B comparison (with skill vs without skill)

c) Progressive Disclosure Check

Measure token usage when multiple skills are available to ensure context does not grow unnecessarily.


3. Evaluating Prompts

a) Code-based grading (preferred)

# Exact match (normalize whitespace and case on both sides)
def eval_exact(output, expected):
    return output.strip().lower() == expected.strip().lower()

# String containment
def eval_contains(output, key_phrase):
    return key_phrase in output

b) LLM-as-judge (nuanced assessment)

def evaluate_likert(model_output, rubric):
    prompt = f"""Rate this response on a scale of 1-5:
    <rubric>{rubric}</rubric>
    <response>{model_output}</response>
    Think step-by-step, then output only the number."""
    return call_judge_model(prompt)

c) Embedding similarity

Use cosine similarity to ensure paraphrased inputs produce semantically consistent outputs.
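
A minimal sketch of that check, assuming an `embed` function (any embedding model that maps text to a vector) supplied by you:

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def outputs_consistent(embed, output_a, output_b, threshold=0.85):
    # embed: text -> vector; the 0.85 threshold is illustrative — tune it
    # against a handful of known-good and known-bad pairs.
    return cosine_similarity(embed(output_a), embed(output_b)) >= threshold
```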

d) ROUGE-L for summarization

Measures overlap between generated and reference summaries.


Universal Best Practices

  1. Volume over perfection — automate at scale.
  2. Include edge cases such as typos, ambiguity, long inputs, and topic shifts.
  3. Use a different model as judge than the one being evaluated.
  4. Ask the judge to reason before scoring.
  5. Automate and version evaluations like tests.
  6. Combine deterministic checks with LLM-based scoring.

Quick Reference

  • TOOLS — deterministic tool and argument match plus trajectory validation
  • SKILLS — activation tests plus rubric-based output quality
  • PROMPTS — exact match where possible plus LLM-judge for qualitative tasks

[... 530 words]

Prompt vs Skill vs Tool

1) Prompt (Runtime System Prompt)

What it is: Instructions passed in the API call for a single request.

  • One-time instruction (per request)
  • Not enforced; the LLM may skip or reorder steps
  • Good for quick control (tone, format, role)

Use when: prototyping, low-risk tasks, or temporary behavior changes.

Avoid when: you need guaranteed step execution or strict sequencing at scale.

2) Skill (Reusable Structured Prompt Module)

What it is: A reusable, structured reasoning template/module that improves consistency across repeated tasks.

  • Reusable and standardized
  • More consistent than ad-hoc prompts
  • Still LLM-driven (probabilistic), not a hard execution engine

Use when: the task repeats often and you want consistent analysis structure, formatting, or output schema.

Avoid when: the workflow must never skip steps or must follow an exact sequence every time.

3) Tool (Deterministic Capability)

What it is: An executable function that performs a real action (API call, database query, file write, etc.).

  • Deterministic execution (given correct code and inputs)
  • Interacts with real systems or data
  • Auditable and testable

Use when: you need real data, guaranteed operations, and repeatable correctness.

Important: Orchestrator (Code) for Strict Multi-Step Workflows

If your process requires a fixed sequence of steps that must always execute in order, the most reliable design is:

  1. Use code (an orchestrator or state machine) to enforce the required steps deterministically.
  2. Then pass the final combined results to the LLM for reasoning, optionally using a Skill for consistent formatting.
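
A minimal sketch of this design, with every function name a placeholder for your own tools and model client:

```python
# Orchestrator pattern: code enforces the step order deterministically;
# the LLM only reasons over the combined results at the end.
def run_workflow(ticket, fetch_data, validate, summarize_with_llm):
    data = fetch_data(ticket)        # step 1: always runs
    issues = validate(data)          # step 2: always runs, always after step 1
    # step 3: only now does the LLM see anything, with all results combined
    return summarize_with_llm({"ticket": ticket, "data": data, "issues": issues})
```

The key property is that no prompt wording can cause a step to be skipped or reordered — the sequence lives in code, not in the model's discretion.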

Quick Decision Rule

  • Need guaranteed execution? Use Tools + Orchestrator.
  • Need consistent repeated reasoning/output? Use a Skill.
  • Need a one-off behavior tweak? Use a Prompt.

[... 273 words]

Feb. 19, 2026

GR00T Architecture: A Systems Engineering Breakdown

GR00T is not just a VLM. It is a Perception → Reasoning → Control generator stack.

[... 762 words]

GR00T N1.6 Fine-Tuning — Full Internal Deep Dive

GR00T N1.6 is NVIDIA’s Vision-Language-Action (VLA) model for humanoid robot control. After spending time digging through the internals, here’s a comprehensive deep dive into exactly how fine-tuning works — from model architecture to gradient flow to the data pipeline.

[... 1,200 words]

PPO vs VLM

Modern humanoid robots combine two fundamentally different kinds of intelligence:

[... 638 words]

Feb. 18, 2026

How GR00T Merges Vision, Chat, and Action

The biggest challenge is that vision models speak “Image-ish” (pixels) while chat models speak “Text-ish” (tokens). GR00T uses a specialized component called a Projector to act as a real-time translator.

[... 377 words]

GR00T N1.6 Architecture and Parameter Distribution

GR00T uses a massive “backbone” to understand its surroundings. It combines SigLIP 2 (for vision) and Qwen 3 (for language). While the eyes are frozen to keep perception stable, the reasoning layers are partially trainable to help the robot learn specific tasks.

[... 362 words]

Feb. 16, 2026

What I Learned Building a Streaming Agent on AWS Bedrock AgentCore Runtime

I spent a week building a conversational agent on AWS Bedrock AgentCore Runtime. It supports hybrid memory (fast in-session + persistent cross-session), real-time token streaming, multi-user isolation, and a 12-test automated suite. Here’s everything I wish someone had told me before I started.

The full source code is available on GitHub: github.com/avparkhi/AWS-BedrockAgentCore-Testing


1. The MicroVM Mental Model

AgentCore doesn’t run your agent in a container you manage. It runs inside microVMs — tiny, isolated virtual machines that spin up per-session. The first request to a new session triggers a cold start (~1.5–10s depending on what you initialize). Subsequent requests to the same session hit a warm microVM where all your Python globals are still in memory (~180ms overhead).

This means you can cache expensive things — model clients, config, database connections — as module-level globals, and they’ll persist across warm invocations. But the moment a user creates a new session, a new microVM spins up and everything resets.

Think of it like AWS Lambda, but with a session-sticky routing layer on top.
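
The caching pattern can be sketched as below. The `factory` argument is an illustration device so the sketch is testable; in a real handler you would call your own client constructor directly:

```python
# Module-level globals survive warm invocations of the same session's
# microVM and reset to their initial values on every cold start.
_model_client = None       # expensive resource, created once per microVM
_invocation_count = 0      # resets to 0 whenever a new microVM cold-starts

def get_model_client(factory):
    """Create the client on the first (cold) invocation; reuse it afterwards."""
    global _model_client
    if _model_client is None:
        _model_client = factory()
    return _model_client

def handler(payload, factory):
    global _invocation_count
    _invocation_count += 1
    client = get_model_client(factory)   # pays the init cost only once
    return {"warm": _invocation_count > 1, "client": client}
```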

2. Memory is a Two-Tier Problem

The session-sticky microVM model creates an interesting memory challenge:

  • Within a session (warm starts): You want fast, in-process memory. A simple Python dictionary works — zero latency, no API calls. This is your RAM layer.
  • Across sessions (cold starts, redeploys, crashes): You need persistent storage. AgentCore provides a Memory API for this — durable, survives everything, but adds ~300ms per read.

The hybrid approach: always save to durable memory (background, on every message). On warm starts, use RAM. On cold starts, load from durable memory. The user never notices the difference.

Observation: Making the model global (reused across invocations) and the agent per-invocation (lightweight, so lifecycle hooks fire correctly) was the right split. Model initialization is expensive; agent creation is cheap.

3. Streaming Changes the User Experience Dramatically

Without streaming, a typical warm chat response takes ~5.4 seconds. The user sees nothing until the entire response is ready. With streaming, the first token appears in ~1–1.5 seconds. The total time is the same, but the perceived latency drops by 75%.

Metric                         Non-Streaming      Streaming
Time to first visible output   ~5.4s              ~1–1.5s
Total response time            ~5.4s              ~5.4s
User perception                “Is it frozen?”    “It’s thinking and typing”

If your agent takes more than 2 seconds to respond, streaming isn’t a nice-to-have — it’s table stakes for user experience.

4. The SDK Has Opinions About Streaming (And They’re Inconvenient)

The AgentCore SDK supports streaming on the server side beautifully — if your entrypoint returns a Python generator, the SDK automatically detects it and wraps the response as Server-Sent Events (SSE). No configuration needed.

The problem is on the client side. The SDK’s built-in invoke method handles streaming by printing chunks directly to the console. It then returns an empty dictionary. There is no programmatic access to the streamed content.

If you want to actually parse streaming responses in your own client — show metadata, measure time-to-first-token, display formatted output — you need to bypass the SDK and call the API directly via boto3. It’s not hard, but it’s undocumented and unexpected.

5. The Five Gotchas

These are the things that cost me the most time and are not documented anywhere I could find.

Gotcha #1: The Payload Isn’t a Real Dict

The payload your entrypoint receives looks like a Python dictionary, but it’s actually a JSONSerializableDict. It behaves differently in subtle ways:

  • The two-argument form of .get(key, default) throws a TypeError. Only the single-argument form works.
  • Bracket access (payload["key"]) throws a TypeError — no subscript support.
  • Assignment (payload["key"] = value) also fails.

The workaround is to use single-argument .get() with an or fallback for defaults. It’s ugly, but it works reliably.

The same limitation applies to agent state objects in the Strands SDK. Don’t try to use them as dicts — store your state elsewhere.
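
The workaround can be wrapped in a small helper (a hypothetical name, not part of the SDK):

```python
# read_field: single-argument .get() plus an `or` fallback, since the payload
# type rejects the two-argument .get(key, default) and bracket access.
def read_field(payload, key, default):
    # Caveat: `or` also replaces falsy values (0, "", False) with the default,
    # which is fine for string fields but wrong for numeric flags.
    return payload.get(key) or default
```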

Gotcha #2: Return Values Get Double-Encoded

The SDK JSON-encodes whatever your entrypoint returns. If you return a JSON string, the client receives a JSON-encoded JSON string — with escaped quotes, escaped newlines, the works. You end up needing to iteratively decode the response (sometimes 3–4 rounds) to get the original data.

I ended up using a custom tag format for metadata instead of JSON, just to avoid the encoding nightmare.

Gotcha #3: Streaming Generators Have the Same Encoding Trap

This was the most frustrating bug. When your streaming generator yields values, the SDK JSON-encodes each one for the SSE wire format. If you pre-encode your dicts to JSON strings before yielding, the SDK encodes them again. The client receives a quoted string instead of a parseable dict.

The symptom: metadata objects appear as raw text inline with the chat response, and all metadata fields show as “unknown.”

The fix: yield raw Python objects (dicts, strings) and let the SDK handle serialization. Never call json.dumps() on values you’re about to yield from a streaming generator.

The rule I wish I’d known from the start: In streaming generators, the SDK is the serializer. You are the data source. Don’t do the SDK’s job — it will do it again on top of yours.
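
The trap and the fix, sketched with plain generators; `sdk_wire_format` is a stand-in for the SDK's per-chunk SSE serialization, not a real API:

```python
import json

def broken_stream(chunks):
    for c in chunks:
        yield json.dumps(c)   # WRONG: the SDK will json.dumps() this again

def correct_stream(chunks):
    for c in chunks:
        yield c               # RIGHT: yield raw dicts/strings; SDK serializes

def sdk_wire_format(gen):
    """Illustrates what the runtime does to every yielded value."""
    return [json.dumps(v) for v in gen]
```

Run both through `sdk_wire_format` and the broken path produces a quoted string the client can't parse back into a dict — exactly the “metadata shows as raw text” symptom.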

Gotcha #4: Non-Streaming Responses Silently Truncate at 1024 Bytes

There is no error. There is no warning. If your non-streaming response exceeds 1024 bytes, it gets cut off mid-JSON. You spend an hour debugging your JSON parser before realizing the data was simply incomplete.

Keep non-streaming responses compact. For my memory query mode, I had to truncate each conversation turn to ~80 characters to stay under the limit.

Gotcha #5: A Generator Function Can’t Conditionally Stream

In Python, a function that contains yield anywhere in its body always returns a generator — you can’t conditionally return a string vs. yield chunks from the same function.

The solution is a dispatch pattern: your entrypoint is a regular function that returns either a generator object (for streaming) or a string (for non-streaming). The SDK inspects the return value’s type, not the function itself. Two separate handler functions, one thin dispatcher.
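
A sketch of the dispatcher, with both handlers as trivial stand-ins for real streaming and non-streaming logic:

```python
# Dispatch pattern: the entrypoint is a plain function that returns either a
# generator object (streaming) or a string (non-streaming). The SDK inspects
# the return value's type, not the function itself.

def _stream_handler(message):
    for token in message.split():   # stand-in for real token streaming
        yield token

def _sync_handler(message):
    return message.upper()          # stand-in for a complete response

def entrypoint(payload):
    # single-argument .get() per Gotcha #1
    message = payload.get("message") or ""
    if payload.get("stream"):
        return _stream_handler(message)   # generator object → SDK streams SSE
    return _sync_handler(message)         # string → SDK returns one response
```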

6. Testing Observations

Test Through the Non-Streaming Path

All 12 of my automated tests use non-streaming mode. Streaming responses are harder to assert against (you’d need to reconstruct the full text from chunks). Non-streaming gives you a clean, complete response string to validate.

Streaming is a transport concern. If your logic works non-streaming, it works streaming — the same code runs either way, just yielded differently.

Memory Persistence Is the Key Test

The most important test: send a math problem on session A, then ask about the result on a completely different session B. Session B has empty RAM (new microVM). If the agent remembers the answer, durable memory is working. If the response attributes the memory source as “durable,” the hybrid architecture is working correctly.

Isolation Tests Are Easy to Get Wrong

Testing that User 2 doesn’t see User 1’s history is tricky because LLMs can coincidentally mention the same numbers. Checking that the response doesn’t contain both “50” and “48” (from User 1’s specific calculation chain) is more robust than checking for either alone. Even so, these tests can be flaky.

Observed Latencies

Operation                               Cold Start   Warm
Ping (no LLM, pure microVM overhead)    ~1.7s        ~180ms
Chat (LLM + memory hooks)               ~8–17s       ~5.4s
Chat streaming (time to first token)                 ~1–1.5s
Memory query (durable read, no LLM)                  ~350ms

The wide range on cold-start chat (8–17s) is due to both microVM provisioning and the first-time model/memory client initialization. Subsequent cold starts are faster (~8s) because ECR image layers are cached.

7. Observability Setup

AgentCore supports three observability pillars, all routed through the runtime role’s IAM permissions:

Pillar    Where It Goes        What You See
Logs      CloudWatch Logs      Two streams: runtime-logs (your print output) and otel-rt-logs (OpenTelemetry)
Traces    X-Ray via OTel       End-to-end request traces with timing breakdown
Metrics   CloudWatch Metrics   Invocation count, latency, errors under the bedrock-agentcore namespace

The runtime role needs permissions for all three: log group/stream creation and writes, X-Ray segment and telemetry submission, and CloudWatch metric publishing scoped to the bedrock-agentcore namespace.

Easy to miss: The runtime config must set the server protocol to HTTP for runtime-logs to be captured. Without this, your agent’s print statements go nowhere.

The GenAI Observability Dashboard in the CloudWatch console provides a unified view across all your AgentCore agents — worth bookmarking.

8. Deployment Notes

  • CodeBuild is the default and easiest path. It builds ARM64 containers in the cloud — no local Docker required. Build time is ~40 seconds.
  • Use your venv Python. On macOS, the system Python is typically 3.9.x. The SDK needs 3.11+. I wasted time on cryptic import errors before realizing I was using the wrong interpreter.
  • Memory resources are idempotent-ish. Creating a memory that already exists throws a ValidationException. You need to catch it and look up the existing one via list. Not a big deal, but not obvious from the API.
  • The agent ARN lives in the YAML config (.bedrock_agentcore.yaml), not in your deploy info JSON. You need it for direct boto3 calls. The SDK handles this internally when you use its invoke method.
  • Redeployment is seamless. Update your code, run deploy, the agent updates in place. Session IDs reset, but durable memory persists. Warm microVMs from the old version drain naturally.

9. What I’d Do Differently Next Time

  1. Build non-streaming first, add streaming last. Streaming is purely a transport layer concern. Get your agent logic, memory, and tools working with simple request/response, then layer streaming on top. Debugging streaming issues while your core logic is also broken is miserable.
  2. Assume everything the SDK touches will be JSON-encoded. Return values, streaming yields, headers — if the SDK handles it, expect encoding. Design your data formats around this from day one.
  3. Keep non-streaming responses tiny. The 1024-byte truncation limit means you should design response formats to be compact from the start, not try to shrink them after the fact.
  4. Write the ping test first. A no-LLM ping mode that returns cache status and invocation count is invaluable for debugging microVM behavior. It isolates platform issues from agent logic issues. Every new agent I build will start with this.
  5. Use tags, not JSON, for non-streaming metadata. A simple delimited format like <<META:key=val,key=val>> survives the SDK’s encoding gauntlet intact. JSON metadata in non-streaming responses is a losing battle.
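
The tag format from point 5 can be round-tripped with a few lines; both helper names and the exact `<<META:...>>` grammar here are illustrative, not a standard:

```python
import re

# A delimited metadata tag appended to the response text survives the SDK's
# JSON encoding as plain characters, unlike nested JSON.
META_RE = re.compile(r"<<META:(.*?)>>")

def attach_meta(text, **fields):
    body = ",".join(f"{k}={v}" for k, v in fields.items())
    return f"{text}<<META:{body}>>"

def extract_meta(raw):
    """Split a response into (text, metadata dict)."""
    m = META_RE.search(raw)
    if not m:
        return raw, {}
    meta = dict(pair.split("=", 1) for pair in m.group(1).split(","))
    return raw[:m.start()] + raw[m.end():], meta
```

The obvious limitation: values must not contain `,`, `=`, or `>>`. For simple fields like a memory source or token count, that trade is worth avoiding the decode loop.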

End result: A production-ready agent with 12/12 tests passing, hybrid memory that works seamlessly across sessions and restarts, streaming that drops perceived latency from 5.4s to 1.5s, and a clear understanding of every sharp edge in the platform.

[... 1,836 words]
