Teaching a Humanoid Robot to Wave: Custom Motions with GEAR-SONIC
24th February 2026
GEAR-SONIC can track arbitrary motions — not just its built-in locomotion styles. You define joint angles in a CSV, preview them with direct replay in MuJoCo, then deploy through the SONIC neural network. Here’s how the three-stage pipeline works.
Stage 1: Create the Motion
The script create_motion.py lets you define joint angles for each frame at 50 Hz. The Unitree G1 has 29 joints in IsaacLab ordering:
Legs (0-11) → Waist (12-14) → Left Arm (15-21) → Right Arm (22-28)
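The four groups above can be written down as index ranges and sanity-checked (names are mine; the ranges come straight from the ordering above):

```python
# 29 G1 joints in IsaacLab ordering, grouped per the article.
G1_JOINT_GROUPS = {
    "legs": range(0, 12),       # joints 0-11
    "waist": range(12, 15),     # joints 12-14
    "left_arm": range(15, 22),  # joints 15-21
    "right_arm": range(22, 29), # joints 22-28
}

# The groups must tile all 29 joints exactly.
assert sum(len(r) for r in G1_JOINT_GROUPS.values()) == 29
```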
For example, a wave gesture builds up per frame using a smooth envelope that ramps 0→1, holds, then ramps 1→0:
for each frame:
    t = frame / 50                         # time in seconds at 50 Hz
    env = smooth_envelope(t)               # ramps up 0→1, holds, ramps down 1→0
    poses[R_SH_P] = env * -0.8             # shoulder pitch (arm slightly forward)
    poses[R_SH_R] = env * -1.2             # shoulder roll (arm out to the side)
    poses[R_ELB]  = env * 1.5              # elbow bent ~90° (forearm up)
    poses[R_WR_Y] = env * sin(2π · 2.5 · t)  # wrist oscillates side-to-side at 2.5 Hz
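The pseudocode above can be fleshed out into a runnable sketch. The joint indices, envelope timings, and the 0.4 rad wrist amplitude are my assumptions, not the actual values in create_motion.py:

```python
import numpy as np

FPS = 50
NUM_FRAMES = 250   # 5 seconds at 50 Hz
NUM_JOINTS = 29
# Hypothetical IsaacLab indices within the right-arm block (22-28).
R_SH_P, R_SH_R, R_ELB, R_WR_Y = 22, 23, 25, 28

def smooth_envelope(t, ramp=1.0, hold=3.0):
    """Ramp 0→1 over `ramp` seconds, hold at 1, then ramp back down to 0."""
    total = 2 * ramp + hold
    if t < ramp:
        return t / ramp
    if t < ramp + hold:
        return 1.0
    if t < total:
        return (total - t) / ramp
    return 0.0

poses = np.zeros((NUM_FRAMES, NUM_JOINTS))
for frame in range(NUM_FRAMES):
    t = frame / FPS
    env = smooth_envelope(t)
    poses[frame, R_SH_P] = env * -0.8                          # arm slightly forward
    poses[frame, R_SH_R] = env * -1.2                          # arm out to the side
    poses[frame, R_ELB]  = env * 1.5                           # elbow bent ~90°
    poses[frame, R_WR_Y] = env * 0.4 * np.sin(2 * np.pi * 2.5 * t)  # 2.5 Hz wave
```

Because the envelope starts and ends at zero, the first and last frames blend smoothly with a neutral standing pose.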
This outputs six CSV files plus a metadata file:
wave/
├── joint_pos.csv (250 × 29) — joint angles per frame
├── joint_vel.csv (250 × 29) — finite-difference velocities
├── body_pos.csv (250 × 42) — 14 body positions (x,y,z)
├── body_quat.csv (250 × 56) — 14 body orientations (w,x,y,z)
├── body_lin_vel.csv (250 × 42) — body linear velocities
├── body_ang_vel.csv (250 × 42) — body angular velocities
└── metadata.txt — body-part indices, frame count
250 frames at 50 Hz gives a 5-second motion clip. Every quantity the SONIC encoder needs — joint positions, velocities, body orientations, angular velocities — is precomputed and written to disk.
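One of those precomputed quantities, joint_vel.csv, can be derived from joint_pos.csv by finite differences at 50 Hz. A minimal sketch, assuming a central-difference scheme (the exact scheme in create_motion.py may differ) and using random placeholder data:

```python
import numpy as np

DT = 1.0 / 50.0  # 50 Hz frame spacing

def finite_difference(positions: np.ndarray, dt: float = DT) -> np.ndarray:
    """Central differences in the interior, one-sided at the endpoints."""
    return np.gradient(positions, dt, axis=0)

# Placeholder joint angles standing in for the real (250 x 29) motion.
joint_pos = np.random.default_rng(0).normal(size=(250, 29))
joint_vel = finite_difference(joint_pos)

np.savetxt("joint_pos.csv", joint_pos, delimiter=",")
np.savetxt("joint_vel.csv", joint_vel, delimiter=",")
```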
Stage 2: Visualize the Motion
Before involving any neural network, you verify the motion with direct replay:
python visualize_motion.py --motion_dir wave/
This reads joint_pos.csv, reorders the joints from IsaacLab ordering to MuJoCo ordering, sets qpos directly, and renders. No policy, no controller — just raw joint angles applied frame by frame.
The reordering is necessary because IsaacLab and MuJoCo number joints differently:
isaaclab_to_mujoco = [0,3,6,9,13,17, 1,4,7,10,14,18, 2,5,8,11,15,19, ...]
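Applying the mapping is a single fancy-indexing operation per frame. A toy 6-joint sketch (the real 29-entry table lives in visualize_motion.py; I'm assuming `isaaclab_to_mujoco[i]` holds the IsaacLab index of the joint at MuJoCo position i):

```python
import numpy as np

# Made-up 6-joint permutation, standing in for the full 29-entry table.
isaaclab_to_mujoco = [0, 3, 1, 4, 2, 5]

def reorder(frame_isaaclab: np.ndarray, mapping) -> np.ndarray:
    """Permute one frame of joint angles from IsaacLab order into MuJoCo order."""
    return frame_isaaclab[np.asarray(mapping)]

frame = np.arange(6, dtype=float)  # IsaacLab-ordered angles 0..5
mujoco_frame = reorder(frame, isaaclab_to_mujoco)
# mujoco_frame is [0., 3., 1., 4., 2., 5.]
```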
What you see here is the ground truth — exactly the motion you defined. If it looks wrong at this stage, the CSV needs fixing before going further.
Stage 3: Deploy Through SONIC
This is the full pipeline with the neural network. Two terminals:
# Terminal 1: MuJoCo physics simulation
python gear_sonic/scripts/run_sim_loop.py
# Terminal 2: SONIC C++ inference with your motion data
bash deploy.sh --motion-data wave/ sim
Here’s what happens inside the SONIC C++ binary every 20ms (50 Hz):
┌──────────────────────────────────────────────────────┐
│ 1. Read CSV files (joint_pos, body_quat, etc.) │
│ │
│ 2. Every 20ms (50Hz): │
│ ┌──────────────────────────────────────────┐ │
│ │ Planner assembles reference: │ │
│ │ - next motion reference frames from CSV │ │
│ │ - current robot joint positions │ │
│ │ - current robot joint velocities │ │
│ │ - body angular velocity │ │
│ │ - last actions │ │
│ └──────────────────┬───────────────────────┘ │
│ ↓ │
│ ┌──────────────────────────────────────────┐ │
│ │ Encoder (TensorRT GPU): │ │
│ │ 290D motion joints + velocities │ │
│ │ → 64D latent token │ │
│ └──────────────────┬───────────────────────┘ │
│ ↓ │
│ ┌──────────────────────────────────────────┐ │
│ │ Decoder (TensorRT GPU): │ │
│ │ latent + robot state → 29 joint targets │ │
│ └──────────────────┬───────────────────────┘ │
│ ↓ │
│ Send motor commands via DDS │
└──────────────────────────────────────────────────────┘
↓
┌──────────────────────────────────────────────────────┐
│ MuJoCo (Terminal 1): │
│ Receive motor commands (kp, kd, q_target) │
│ torque = kp*(q_target - q) + kd*(dq_target - dq) │
│ Step physics → render │
└──────────────────────────────────────────────────────┘
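The PD law in the final box takes only a few lines. The gains here are placeholders, not the values deployed on the G1:

```python
import numpy as np

def pd_torque(q, dq, q_target, dq_target, kp, kd):
    """Per-joint PD control: torque = kp*(q_target - q) + kd*(dq_target - dq)."""
    return kp * (q_target - q) + kd * (dq_target - dq)

q         = np.zeros(29)          # current joint positions
dq        = np.zeros(29)          # current joint velocities
q_target  = np.full(29, 0.1)      # targets from the SONIC decoder
dq_target = np.zeros(29)
tau = pd_torque(q, dq, q_target, dq_target, kp=100.0, kd=2.0)
# tau = 100 * 0.1 = 10.0 Nm for every joint
```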
Direct Replay vs Neural Network
This is the critical distinction: SONIC does not directly replay your CSV. It uses your motion as a hint to a trained neural network. The encoder compresses the motion reference into a 64-dimensional latent token. The decoder combines that latent with the robot’s actual physical state to produce joint targets that are physically feasible — respecting balance, joint limits, and contact dynamics.
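The shape flow, stripped of the learned weights, looks like this. The real encoder and decoder are TensorRT engines; these random matrices only illustrate the 290 → 64 → 29 dimensionality, and the 93D robot-state size is my assumption:

```python
import numpy as np

rng = np.random.default_rng(0)
W_enc = rng.normal(size=(64, 290))      # encoder: motion reference → latent
W_dec = rng.normal(size=(29, 64 + 93))  # decoder: latent + robot state → targets

motion_ref  = rng.normal(size=290)      # reference joints + velocities from CSV
latent      = W_enc @ motion_ref        # 64D latent motion token
robot_state = rng.normal(size=93)       # actual joint positions, velocities, etc.
joint_targets = W_dec @ np.concatenate([latent, robot_state])  # 29 joint targets
```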
The policy decides how closely to follow your motion. It was trained primarily on locomotion data, so:
- Leg tracking — accurate, the policy has seen thousands of walking/running examples
- Arm gestures — may be attenuated, the policy is conservative about upper-body motions it hasn’t seen during training
This is a feature, not a bug. The neural network keeps the robot balanced even if your motion CSV would cause it to fall over with direct replay.
Summary
| Stage | What It Does | Command |
|---|---|---|
| Create | Define joint angles at 50 Hz, output CSVs | python create_motion.py |
| Verify | Direct replay in MuJoCo — no neural net | python visualize_motion.py --motion_dir wave/ |
| Deploy | Through SONIC neural network | bash deploy.sh --motion-data wave/ sim |
Create defines the intent. Verify confirms the intent looks right. Deploy runs it through a physics-aware neural network that keeps the robot on its feet.
Links
- Sim2Sim repo: github.com/avparkhi/GEAR-SONIC-Sim2Sim
- SONIC paper: arxiv.org/abs/2511.07820
- Model weights: huggingface.co/nvidia/GEAR-SONIC