Archive for Wednesday, 18th February 2026

Wednesday, 18th February 2026

GR00T N1.6 Architecture and Parameter Distribution

GR00T uses a large “backbone” to understand its surroundings, combining SigLIP 2 (for vision) and Qwen 3 (for language). The vision encoder (the “eyes”) is frozen to keep perception stable, while the reasoning layers are partially trainable so the robot can learn specific tasks.
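To make the frozen-versus-trainable split concrete, here is a minimal sketch in plain Python. Everything in it is an assumption for illustration: the block names, the parameter counts, and the choice of which language layers are trainable are placeholders, not real GR00T N1.6 figures.

```python
from dataclasses import dataclass


@dataclass
class Block:
    name: str
    num_params: int   # placeholder count, NOT a real GR00T figure
    trainable: bool   # False means the block is frozen during fine-tuning


def trainable_fraction(blocks):
    """Return the share of parameters that would receive gradient updates."""
    total = sum(b.num_params for b in blocks)
    trainable = sum(b.num_params for b in blocks if b.trainable)
    return trainable / total


# Hypothetical layout: frozen vision encoder, partially trainable language model.
# Which language layers are unfrozen is an assumption made for this sketch.
blocks = [
    Block("vision_encoder", 400, trainable=False),
    Block("language_lower_layers", 1200, trainable=False),
    Block("language_upper_layers", 400, trainable=True),
]

fraction = trainable_fraction(blocks)
frozen_names = [b.name for b in blocks if not b.trainable]
```

With these made-up numbers, only the 400 parameters in the unfrozen language layers out of 2000 in total would be updated, which is the general idea behind keeping perception stable while still adapting to a task.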

[... 362 words]

2:16 am / physical-ai

How GR00T Merges Vision, Chat, and Action

The biggest challenge is that vision models speak “Image-ish” (pixels) while chat models speak “Text-ish” (tokens). GR00T uses a dedicated component called a Projector as a real-time translator between the two.
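One common way to build such a translator is a learned linear map from the vision feature size to the language model's embedding size. The sketch below assumes that design; the real GR00T Projector may be an MLP or something else entirely. The names `make_projector` and `project`, the dimensions, and the random weights are all hypothetical.

```python
import random


def make_projector(vision_dim, text_dim, seed=0):
    """Build random weights for a linear layer (stand-in for learned weights)."""
    rng = random.Random(seed)
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(vision_dim)]
               for _ in range(text_dim)]
    bias = [0.0] * text_dim
    return weights, bias


def project(patch_features, weights, bias):
    """Map each vision feature vector into the language embedding space."""
    return [
        [sum(w * x for w, x in zip(row, patch)) + b
         for row, b in zip(weights, bias)]
        for patch in patch_features
    ]


VISION_DIM = 4   # toy size, assumption for this sketch
TEXT_DIM = 6     # toy size, assumption for this sketch

weights, bias = make_projector(VISION_DIM, TEXT_DIM)

# Three fake image patch features, as a vision encoder might emit them.
patches = [[0.5] * VISION_DIM, [1.0, 0.0, -1.0, 0.0], [0.0] * VISION_DIM]
image_tokens = project(patches, weights, bias)

# Two fake text token embeddings, already in the language model's space.
text_tokens = [[0.0] * TEXT_DIM for _ in range(2)]

# After projection, both kinds of token share one width and can be joined.
sequence = image_tokens + text_tokens
```

The point of the sketch is the shape change: every patch goes in with `VISION_DIM` numbers and comes out with `TEXT_DIM` numbers, so image and text tokens can sit in the same sequence.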

[... 377 words]

2:24 am / physical-ai

Akshay Parkhi's Weblog
