GR00T N1.6 — PickOrange (SO-101)

Fine-tuned NVIDIA GR00T N1.6 (3B) for the LeIsaac PickOrange task in Isaac Lab simulation.

Task

Pick oranges from a kitchen counter and place them on a plate using an SO-101 5-DOF robot arm + gripper.

Environment: LeIsaac-SO101-PickOrange-v0 (NVIDIA Isaac Lab)
Robot: SO-101 follower (5 arm joints + 1 gripper)
Cameras: Front (480x640) + Wrist (480x640)
Language instruction: "Pick the orange and place it on the plate"

Training

Parameter	Value
Base model	`nvidia/GR00T-N1.6-3B`
Training steps	10,000 (3 phases: 3K + 4K + 3K)
Learning rates	1e-4 → 5e-5 → 2e-5
Final loss	0.017
Batch size	8
Action horizon	16
Frozen	Diffusion decoder (`--no-tune-diffusion-model`)
GPU	RTX 4090 (24GB)
Dataset	60 teleoperation demos, dual camera
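
The three-phase schedule in the table can be read as a piecewise-constant function of the global training step. The sketch below is for illustration only: the name `phase_learning_rate` is hypothetical, and it assumes each phase used a constant rate with no warmup or decay inside the phase, which the table does not state.

```python
# Phase lengths and learning rates taken from the training table.
# Assumption: the rate is constant within each phase.
PHASES = [
    (3_000, 1e-4),  # phase 1: steps 0..2,999
    (4_000, 5e-5),  # phase 2: steps 3,000..6,999
    (3_000, 2e-5),  # phase 3: steps 7,000..9,999
]

TOTAL_STEPS = sum(num_steps for num_steps, _ in PHASES)


def phase_learning_rate(step: int) -> float:
    """Return the learning rate for a 0-indexed global training step."""
    if step < 0:
        raise ValueError("step must be non-negative")
    phase_start = 0
    for num_steps, lr in PHASES:
        if step < phase_start + num_steps:
            return lr
        phase_start += num_steps
    raise ValueError(f"step {step} is past the end of training ({TOTAL_STEPS} steps)")
```

For example, `phase_learning_rate(3000)` falls in the second phase and returns `5e-5`.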

Loss Curve

Step    | Loss
--------|-------
    250 | 0.854
  1,000 | 0.082
  3,000 | 0.050
  5,000 | 0.030
  7,000 | 0.023
 10,000 | 0.017

Results

The model successfully reaches toward oranges and grasps them; placing them on the plate is still in progress. It was evaluated over 3 episodes of 900 sim steps each (15 seconds at 60 Hz).

Eval Videos

See leisaac-pick-orange-learnings for recorded eval episodes.

Comparison with Other Approaches

Approach	Params	Grasp	Place
BC-RNN-GMM (no vision)	~1M	0%	0%
BC-RNN + ResNet18	~12M	0%	0%
SmolVLA	450M	60%	0%
GR00T N1.6 (this model)	3B	Reaching + grasping	In progress

Usage

Inference Server (GR00T client-server architecture)

cd /path/to/Isaac-GR00T

python gr00t/eval/run_gr00t_server.py \
  --model_path /path/to/this/checkpoint \
  --embodiment_tag NEW_EMBODIMENT \
  --port 5555 \
  --use_sim_policy_wrapper

Eval Client (Isaac Lab)

from gr00t.policy.server_client import PolicyClient
import numpy as np

client = PolicyClient(host="localhost", port=5555, timeout_ms=15000, strict=False)

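# Observation format: every array carries batch (B) and temporal (T) dims.
# The zeros are placeholders; substitute real camera frames and joint states.
obs = {
    "video.front": np.zeros((1, 1, 480, 640, 3), dtype=np.uint8),  # (B, T, H, W, C)
    "video.wrist": np.zeros((1, 1, 480, 640, 3), dtype=np.uint8),
    "state.single_arm": np.zeros((1, 1, 5), dtype=np.float32),     # (B, T, D)
    "state.gripper": np.zeros((1, 1, 1), dtype=np.float32),
    "annotation.human.task_description": ["Pick the orange and place it on the plate"],
}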

action_dict, info = client._get_action(obs)
# Returns: action.single_arm (1, 16, 5), action.gripper (1, 16, 1)
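
Because each call returns a 16-step action chunk, a client usually executes several actions before querying the server again. The loop below sketches that pattern with a stub in place of the real client and simulator, so it runs without GR00T or Isaac Lab. `StubPolicy`, `run_episode`, and the choice to execute the whole chunk before re-querying are assumptions for illustration, not this model's eval script.

```python
ACTION_HORIZON = 16  # actions returned per query (from the training table)
EPISODE_STEPS = 900  # sim steps per eval episode


class StubPolicy:
    """Hypothetical stand-in for the policy client; returns zero actions."""

    def __init__(self):
        self.num_queries = 0

    def get_chunk(self, obs):
        self.num_queries += 1
        # One 6-D action (5 arm joints + 1 gripper) per horizon step.
        return [[0.0] * 6 for _ in range(ACTION_HORIZON)]


def run_episode(policy, env_step, get_obs,
                num_steps=EPISODE_STEPS, execute_per_query=ACTION_HORIZON):
    """Query the policy, execute up to `execute_per_query` actions, repeat."""
    if execute_per_query < 1:
        raise ValueError("execute_per_query must be at least 1")
    step = 0
    while step < num_steps:
        chunk = policy.get_chunk(get_obs())
        for action in chunk[:execute_per_query]:
            if step >= num_steps:
                break
            env_step(action)
            step += 1
    return step


policy = StubPolicy()
executed = []  # records every action passed to the (stubbed) simulator
steps_run = run_episode(policy, env_step=executed.append, get_obs=lambda: {})
```

Executing fewer actions per query (for example `execute_per_query=8`) re-plans more often at the cost of more server round trips.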

Modality Config

Requires a custom modality config for the SO-101 embodiment. See so101_pick_orange_config.py.

Setup Notes

Install GR00T with `pip install -e . --no-deps` so that it does not replace the torch version Isaac Lab depends on.
Start the server with `--use_sim_policy_wrapper`; it is required for the flat observation format (keys such as `video.front` and `state.gripper`).
State arrays must be float32 (not float64) and include the temporal dim: (B, T, D).
Video arrays must include the temporal dim: (B, T, H, W, C).
The server uses msgpack serialization, which is NOT compatible with LeIsaac's torch pickle client.
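
The dtype and shape requirements above can be handled in one place before each request. This is a minimal sketch, assuming the simulator returns unbatched data: one uint8 `(H, W, C)` image per camera and a 6-element joint vector with the gripper last. The helper name `to_groot_obs` and that input layout are assumptions for illustration.

```python
import numpy as np


def to_groot_obs(front, wrist, joint_pos, instruction):
    """Build a flat observation dict with (B, T, ...) dims and expected dtypes.

    Assumed inputs: uint8 images shaped (H, W, C) and a 6-element joint
    vector ordered as 5 arm joints followed by the gripper.
    """
    images = {}
    for name, img in (("front", front), ("wrist", wrist)):
        img = np.asarray(img)
        if img.dtype != np.uint8 or img.ndim != 3:
            raise ValueError(f"{name} image must be uint8 with shape (H, W, C)")
        images[name] = img[None, None]  # -> (1, 1, H, W, C)

    joints = np.asarray(joint_pos, dtype=np.float32)  # float64 -> float32
    if joints.shape != (6,):
        raise ValueError("joint_pos must have 6 elements (5 arm + 1 gripper)")

    return {
        "video.front": images["front"],
        "video.wrist": images["wrist"],
        "state.single_arm": joints[None, None, :5],  # (1, 1, 5)
        "state.gripper": joints[None, None, 5:],     # (1, 1, 1)
        "annotation.human.task_description": [instruction],
    }
```

Rejecting non-uint8 images, rather than casting them silently, avoids sending frames that were normalized to floats in [0, 1] and would truncate to zeros.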

Citation

@misc{groot-n1.6-pick-orange,
  title={GR00T N1.6 Fine-tuned for PickOrange},
  author={Rajesh Kumar},
  year={2026},
  url={https://huggingface.co/rajeshramana/groot-n1.6-pick-orange}
}
