Sunday, 22nd February 2026
From Vision to Torques: How NVIDIA’s GR00T Stack Controls a Humanoid Robot
NVIDIA’s GR00T stack for humanoid robots has three layers: a Vision-Language-Action model that understands what to do, a whole-body controller that figures out how to move, and a physics simulator that validates it all before touching real hardware. Here’s how they connect.
[... 976 words]
VLA → WBC → MuJoCo: Two Ways to Wire Up NVIDIA’s GR00T Humanoid Stack
There are two ways to wire up NVIDIA’s GR00T stack from the Vision-Language-Action model all the way down to physics simulation: the official NVIDIA eval pipeline and a custom pipeline using the SONIC C++ binary. I’ve set up both. Here’s how they work and where they differ.
[... 674 words]