Akshay Parkhi's Weblog


Wednesday, 18th February 2026

GR00T N1.6 Architecture and Parameter Distribution

GR00T uses a large vision-language “backbone” to understand its surroundings, combining SigLIP 2 (for vision) and Qwen 3 (for language). The vision encoder (the “eyes”) is frozen to keep perception stable, while the language (reasoning) layers are partially trainable so the robot can learn specific tasks.
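To make the frozen versus trainable split concrete, here is a minimal sketch of how one might tally parameters by group. The group names, the toy parameter counts, and the choice of which language layers are trainable are all hypothetical and are not GR00T's real figures.

```python
from dataclasses import dataclass


@dataclass
class ParamGroup:
    """One block of model weights and whether training may update it."""
    name: str
    num_params: int
    trainable: bool


def summarize(groups):
    """Return total, trainable, and frozen parameter counts for the groups."""
    total = sum(g.num_params for g in groups)
    trainable = sum(g.num_params for g in groups if g.trainable)
    return {
        "total": total,
        "trainable": trainable,
        "frozen": total - trainable,
        "trainable_fraction": trainable / total,
    }


# Toy sizes chosen for easy arithmetic (hypothetical, not real GR00T counts).
groups = [
    ParamGroup("vision_encoder", 400, trainable=False),         # frozen "eyes"
    ParamGroup("language_frozen_layers", 1200, trainable=False),
    ParamGroup("language_trainable_layers", 400, trainable=True),
]

summary = summarize(groups)
print(summary)
```

In a real training framework the same idea is usually expressed by marking the frozen weights as not requiring gradients, so only the trainable groups are updated.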

[... 362 words]

How GR00T Merges Vision, Chat, and Action

The biggest challenge is that vision models speak “Image-ish” (features computed from pixels) while chat models speak “Text-ish” (token embeddings). GR00T uses a specialized component called a Projector as a real-time translator, mapping image features into the representation the chat model expects.
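The translator idea can be sketched in a few lines of plain Python. This is an illustration only: it assumes the projector is a single linear layer, and the dimensions, random weights, and helper names (`make_projector`, `project`) are hypothetical. The actual GR00T projector may be structured differently.

```python
import random

VISION_DIM = 4  # hypothetical width of one image-patch feature
TEXT_DIM = 6    # hypothetical width of one text-token embedding


def make_projector(in_dim, out_dim, seed=0):
    """Build a toy linear projector: an out_dim x in_dim weight matrix plus a bias."""
    rng = random.Random(seed)
    weight = [[rng.uniform(-0.1, 0.1) for _ in range(in_dim)] for _ in range(out_dim)]
    bias = [0.0] * out_dim
    return weight, bias


def project(features, weight, bias):
    """Map each vision feature vector into the text embedding space."""
    return [
        [sum(w * x for w, x in zip(row, feat)) + b for row, b in zip(weight, bias)]
        for feat in features
    ]


# Three image-patch features ("Image-ish") and two token embeddings ("Text-ish").
image_features = [
    [1.0, 0.0, 0.5, -0.5],
    [0.2, 0.3, 0.1, 0.0],
    [0.0, 0.0, 0.0, 0.0],
]
text_embeddings = [[0.1] * TEXT_DIM, [0.2] * TEXT_DIM]

weight, bias = make_projector(VISION_DIM, TEXT_DIM)
projected = project(image_features, weight, bias)

# After projection, both modalities share one width and can form a single sequence.
sequence = projected + text_embeddings
print(len(sequence), len(sequence[0]))
```

Once every vector has the same width, the language model can process image patches and words as one sequence, which is the point of the translation step.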

[... 377 words]
