Akshay Parkhi's Weblog

Sunday, 22nd February 2026

From Vision to Torques: How NVIDIA’s GR00T Stack Controls a Humanoid Robot

NVIDIA’s GR00T stack for humanoid robots has three layers: a Vision-Language-Action model that understands what to do, a whole-body controller that figures out how to move, and a physics simulator that validates it all before touching real hardware. Here’s how they connect.

[... 976 words]

VLA → WBC → MuJoCo: Two Ways to Wire Up NVIDIA’s GR00T Humanoid Stack

There are two ways to wire up NVIDIA’s GR00T stack from the vision-language-action model all the way down to physics simulation: the official NVIDIA eval pipeline and a custom pipeline built around the SONIC C++ binary. I’ve set up both. Here’s how they work and where they differ.

[... 674 words]
