Akshay Parkhi's Weblog


NVIDIA’s GR00T Whole-Body Control stack in MuJoCo

20th February 2026

I’ve been running NVIDIA’s GR00T Whole-Body Control stack in MuJoCo — the sim-to-real bridge for humanoid robot locomotion. A MuJoCo viewer showing a simulated robot walking might look like a toy, but the neural network policy inside it is the same binary that runs on a real Unitree G1. Here’s what’s actually going on.

What sim2mujoco Actually Is

It’s the sim-to-real transfer bridge. The workflow:

  1. Train a neural network policy in NVIDIA Isaac Sim (GPU-accelerated, thousands of robots in parallel)
  2. Validate in MuJoCo — lightweight, accurate physics, catches policy failures before touching hardware
  3. Deploy the same ONNX policy onto a real Unitree G1 robot

The policy file is the same binary (.onnx) at every stage. MuJoCo is just the test bench, not the end product. Think of it like watching a self-driving car in a simulator — looks like a video game, but the model inside it drives real cars.
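The "same binary at every stage" claim is cheap to verify in practice: hash the .onnx file in each environment and compare digests. A minimal sketch (the stand-in file and its contents here are made up for the demo; on a real setup you would point this at the GR00T .onnx in both places):

```python
import hashlib
import os
import tempfile

def file_sha256(path):
    """Hash a policy file so the sim-validated .onnx and the on-robot
    .onnx can be confirmed byte-identical."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(8192), b""):
            h.update(chunk)
    return h.hexdigest()

# Demo with a throwaway stand-in file, not the real policy.
with tempfile.NamedTemporaryFile(suffix=".onnx", delete=False) as f:
    f.write(b"stand-in policy bytes")
    path = f.name
digest = file_sha256(path)
os.unlink(path)
```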

Real-World Applications

This exact stack (GR00T WBC) is used for:

  • Warehouse automation — humanoid robots walking, picking, placing in unstructured environments
  • Manufacturing — robots that can navigate factory floors, climb stairs, handle objects
  • Hazardous environments — inspection in places unsafe for humans (nuclear, disaster zones)
  • General-purpose humanoid robotics — NVIDIA’s GR00T project is their bet on foundation models for humanoid control

Why It Matters Technically

  • The policy running here is the same binary (.onnx) that runs on real hardware — not a separate simulation-only thing
  • Whole-body control with 29 DOFs (legs + arms + torso) is a hard, largely unsolved problem; this is the state of the art
  • The PD controller, observation space, action scaling are all tuned to match real actuator dynamics
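The PD step in that last bullet is a one-liner: stiffness times position error, minus velocity damping. A minimal sketch with illustrative gains (the real stack uses per-joint gains tuned to the G1's actuators, not these numbers):

```python
import numpy as np

def pd_torques(q_target, q, qd, kp, kd):
    """PD control law: torque = kp * (position error) - kd * (velocity).
    Gains here are illustrative, not the tuned G1 values."""
    return kp * (q_target - q) - kd * qd

# One joint, 2 rad of position error, currently at rest:
tau = pd_torques(q_target=np.array([1.0]), q=np.array([-1.0]),
                 qd=np.array([0.0]), kp=np.array([100.0]), kd=np.array([2.0]))
# 100 * 2 - 2 * 0 = 200 N·m commanded
```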

What Runs When You Launch the Simulation

Every 0.005 s timestep (200 Hz), the simulation executes this loop:

reads sensors: joint positions, velocities, IMU (gravity, angular vel)
        ↓
builds observation vector (86 dims × 6 history frames = 516)
        ↓
feeds into ONNX policy (neural net)
        ↓
target joint angles (15 values)
        ↓
PD controller → joint torques
        ↓
MuJoCo physics engine steps the simulation
        ↓
repeat
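The loop above can be sketched in a few lines of Python. Everything below is a stand-in (zero-filled sensors, a dummy policy, generic gains) so the structure is the point, not the numbers:

```python
import numpy as np

DT = 0.005    # control timestep from the loop above
OBS_DIM = 86  # per-frame observation size
HISTORY = 6   # stacked frames: 86 * 6 = 516 policy inputs
N_ACT = 15    # target joint angles produced by the policy

def read_sensors():
    """Stand-in sensor read: joint positions/velocities + IMU, 86 dims."""
    return np.zeros(OBS_DIM)

def policy(obs):
    """Stand-in for the ONNX policy: 516-dim observation -> 15 targets."""
    assert obs.shape == (OBS_DIM * HISTORY,)
    return np.zeros(N_ACT)

def pd_controller(q_target, q, qd, kp=100.0, kd=2.0):
    return kp * (q_target - q) - kd * qd

obs_history = np.zeros(OBS_DIM * HISTORY)
for _ in range(3):                                 # three 0.005 s ticks
    frame = read_sensors()
    obs_history = np.roll(obs_history, -OBS_DIM)   # drop the oldest frame
    obs_history[-OBS_DIM:] = frame                 # append the newest
    q_target = policy(obs_history)
    tau = pd_controller(q_target, np.zeros(N_ACT), np.zeros(N_ACT))
    # here mujoco.mj_step(model, data) would apply tau and advance physics
```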

What each piece is:

  • GR00T-WholeBodyControl-Balance.onnx: the trained neural network, the same file you'd load on a real G1 robot
  • g1_gear_wbc.xml: MuJoCo model of the G1 (masses, joint limits, meshes, actuators), standing in for real hardware
  • PD controller: converts target joint angles to torques; the same math runs on real motor controllers
  • compute_observation: simulates what real sensors (IMU, joint encoders) would report
  • Keyboard input: simulates the command interface (joystick/autonomy stack on the real robot)

MuJoCo replaces the physical robot. Everything else — the policy, the PD controller, the observation pipeline — is identical to what runs on real hardware. If the robot walks here, it has a high chance of walking on the real G1. If it falls here, it would fall in real life too. That’s the whole point: break things in simulation, not on a $50k+ robot.
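"If it falls here, it would fall in real life" is also cheap to check automatically. A sketch of the kind of fall test a MuJoCo validation run might apply — the thresholds and observation layout are illustrative assumptions, not the repo's values:

```python
import numpy as np

def has_fallen(base_height, projected_gravity, min_height=0.4, max_tilt=0.7):
    """Simple fall check. projected_gravity is the gravity direction in
    the torso frame; upright, its z component is close to -1. Thresholds
    here are illustrative, not tuned values."""
    tilted = projected_gravity[2] > -max_tilt   # torso pitched/rolled too far
    low = base_height < min_height              # pelvis collapsed toward floor
    return bool(tilted or low)

upright = has_fallen(0.75, np.array([0.0, 0.0, -1.0]))   # standing tall
fallen = has_fallen(0.20, np.array([0.0, -0.9, -0.3]))   # on the ground
```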

Which Policy Is Actually Running

The gait script only loads one policy: GR00T-WholeBodyControl-Balance.onnx — the balance/standing policy. It does not load GR00T-WholeBodyControl-Walk.onnx.

Compare with the other script which loads both:

# run_mujoco_gear_wbc.py
self.policy = self.load_onnx_policy(self.config["policy_path"])        # Balance
self.walk_policy = self.load_onnx_policy(self.config["walk_policy_path"])  # Walk

So the gait script runs the Balance policy with gait logic layered on top in Python code. The Walk neural network (GR00T-WholeBodyControl-Walk.onnx) sits unused. These are NVIDIA’s pre-trained policies, trained in Isaac Sim using reinforcement learning on the Unitree G1.
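What might "gait logic layered on top in Python" look like? One common pattern is a phase oscillator that adds antiphase leg offsets to the policy's target angles. This is a sketch of the idea only — the frequency, amplitude, and joint choice are assumptions, not the repo's actual gait code:

```python
import math

GAIT_FREQ = 1.4   # steps per second (illustrative)
AMP = 0.25        # radians of hip-pitch swing (illustrative)

def gait_offsets(t):
    """Phase oscillator producing antiphase hip-pitch offsets that would
    be added to the Balance policy's target angles."""
    phase = 2.0 * math.pi * GAIT_FREQ * t
    left_hip = AMP * math.sin(phase)
    right_hip = AMP * math.sin(phase + math.pi)  # opposite leg, opposite phase
    return left_hip, right_hip

# A quarter period in: left leg at peak swing, right leg mirrored.
l, r = gait_offsets(0.25 / GAIT_FREQ)
```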

Where Does GR00T Fit In

GR00T = Generalist Robot 00 Technology. It’s NVIDIA’s initiative to build foundation models that can control any humanoid robot. The hierarchy:

NVIDIA GR00T (project/platform)
    └── GR00T-WholeBodyControl (this repo)
            └── decoupled_wbc (the control framework)
                    └── sim2mujoco (what we're running)

The full GR00T project has multiple layers:

  • GR00T Foundation Model: large multimodal model that understands language and vision and generates robot actions
  • GR00T Whole-Body Control: locomotion policies — walk, balance, recover from pushes
  • GR00T Dexterity: hand and manipulation policies

The .onnx files in this codebase are low-level controllers produced by the GR00T project. They handle: don’t fall over, walk when told to walk, track height/orientation commands.

The Bigger Picture

In a full GR00T deployment:

Human says: "Go pick up that box"
        ↓
GR00T Foundation Model (vision + language → high-level plan)
        ↓
Commands: walk forward, turn left, reach arm...
        ↓
GR00T WholeBodyControl ← YOU ARE HERE
(balance + locomotion neural net)
        ↓
Joint torques → real robot moves

We’re running the legs of the stack. GR00T is the brain that would sit on top, sending the locomotion commands that you currently send with the keyboard.
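The keyboard stand-in is essentially a mapping from keys to the command vector the policy tracks. A hypothetical layout — the actual keys and command ranges in the repo may differ:

```python
# Hypothetical key -> (vx, vy, yaw_rate) command mapping.
# Velocities in m/s, yaw rate in rad/s; values are illustrative.
KEYMAP = {
    "w": (0.5, 0.0, 0.0),    # walk forward
    "s": (-0.5, 0.0, 0.0),   # walk backward
    "a": (0.0, 0.3, 0.0),    # strafe left
    "d": (0.0, -0.3, 0.0),   # strafe right
    "q": (0.0, 0.0, 0.5),    # turn left
    "e": (0.0, 0.0, -0.5),   # turn right
}

def command_from_key(key):
    """Return the commanded (vx, vy, yaw_rate); unknown keys hold still."""
    return KEYMAP.get(key, (0.0, 0.0, 0.0))
```

A GR00T foundation model sitting on top would publish the same (vx, vy, yaw_rate) commands in place of this lookup.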

